Re: key/value store optimized for disk storage

2012-05-06 Thread Paul Rubin
John Nagle writes: >That's awful. There's no point in compressing six characters > with zlib. Zlib has a minimum overhead of 11 bytes. You just > made the data bigger. This hack is about avoiding the initialization overhead--do you really get 11 bytes after every SYNC_FLUSH? I do remember
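John's 11-byte figure and Paul's question about what each SYNC_FLUSH costs can be measured directly. The following is a minimal Python 3 sketch, so the data are bytes rather than str. The helper names are made up for this illustration. The exact sizes depend on the zlib version and compression level, so no figures are quoted here.

```python
import zlib

def standalone_size(record: bytes) -> int:
    """Length of `record` compressed as its own complete zlib stream
    (2-byte header + deflate data + 4-byte Adler-32 trailer)."""
    return len(zlib.compress(record))

def flushed_size(compressor, record: bytes) -> int:
    """Bytes emitted for `record` by an already-started compressor,
    including the Z_SYNC_FLUSH marker that makes the chunk decodable."""
    return len(compressor.compress(record) + compressor.flush(zlib.Z_SYNC_FLUSH))

record = b"foobar"
print("raw:", len(record))
print("standalone:", standalone_size(record))

comp = zlib.compressobj()
comp.compress(b"unrelated warm-up data")
comp.flush(zlib.Z_SYNC_FLUSH)  # the stream header is paid once, here
print("after sync flush:", flushed_size(comp, record))
```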

Re: key/value store optimized for disk storage

2012-05-06 Thread John Nagle
On 5/4/2012 12:14 AM, Steve Howell wrote: On May 3, 11:59 pm, Paul Rubin wrote: Steve Howell writes: compressor = zlib.compressobj() s = compressor.compress("foobar") s += compressor.flush(zlib.Z_SYNC_FLUSH) s_start = s compressor2 = compressor.copy() That's a
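The copy() trick in the quoted snippet only helps if the decompressing side is primed the same way. The following is a minimal round-trip sketch in Python 3. It assumes the primer is a hypothetical sample of text that records tend to resemble. The function names are illustrative only.

```python
import zlib

def make_primed_pair(primer: bytes):
    """Run `primer` through a compressor/decompressor pair so both share the
    same history window; afterwards the pair is only ever copied."""
    comp = zlib.compressobj()
    head = comp.compress(primer) + comp.flush(zlib.Z_SYNC_FLUSH)
    decomp = zlib.decompressobj()
    decomp.decompress(head)  # replay the primer so the two states match
    return comp, decomp

def encode(primed_comp, record: bytes) -> bytes:
    c = primed_comp.copy()  # fresh copy per record; primed state stays untouched
    return c.compress(record) + c.flush(zlib.Z_SYNC_FLUSH)

def decode(primed_decomp, blob: bytes) -> bytes:
    d = primed_decomp.copy()
    return d.decompress(blob)

PRIMER = b"foobar " * 10  # hypothetical sample of typical record content
comp, decomp = make_primed_pair(PRIMER)
blob = encode(comp, b"foobar")
print(len(blob), decode(decomp, blob))
```

Since Python 3.3, zlib.compressobj() and zlib.decompressobj() also accept a zdict argument (a preset dictionary). This gives a similar effect without the copy() step, but each record then becomes a complete stream with its own header and checksum.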

Re: Creating a directory structure and modifying files automatically in Python

2012-05-06 Thread Paul Rubin
Javier writes: > Or not... Using directories may be a way to do rapid prototyping, and > check quickly how things are going internally, without needing to resort > to complex database interfaces. dbm and shelve are extremely simple to use. Using the file system for a million item db is ridiculou
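To illustrate Paul's point that shelve is about as simple as a directory tree, here is a minimal sketch of a persistent key/value store. The function names and sample data are hypothetical. Keys must be str, and values can be any picklable object.

```python
import os
import shelve
import tempfile

def build_store(path, items):
    """Write (key, value) pairs into a shelf; keys must be str."""
    with shelve.open(path) as db:  # default flag "c": create if missing
        for key, value in items:
            db[key] = value

def lookup(path, key, default=None):
    """Read one entry without loading the rest of the store."""
    with shelve.open(path, flag="r") as db:
        return db.get(key, default)

with tempfile.TemporaryDirectory() as tmp:
    store = os.path.join(tmp, "items")
    build_store(store, ((f"item{i}", {"id": i}) for i in range(1000)))
    print(lookup(store, "item42"))
```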

Re: Creating a directory structure and modifying files automatically in Python

2012-05-06 Thread Javier
>Learn how to use a database. Creating and managing a > big collection of directories to handle small data items is the > wrong approach to data storage. > >John Nagle Or not... Using directories may be a way to do rapid prototyp

new to Python - modules to leverage Perl scripts?

2012-05-06 Thread Rogelio
I've got quite a few Perl scripts that I would like to leverage, and I'd like to make some Python wrapper scripts for them. The Perl scripts shell into various network appliances, run certain commands, and then output those commands into a file. I recently found out about the subprocess module (
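A thin wrapper around subprocess.run() is one way to drive existing Perl scripts from Python. The sketch below assumes Python 3.7+ (for capture_output and text). The script name "get_config.pl" and its argument are invented for illustration. The runnable demo calls the current Python interpreter rather than perl, so it does not depend on perl being installed.

```python
import subprocess
import sys

def run_script(cmd, timeout=600):
    """Run an external command and return its stdout as text.

    Raises subprocess.CalledProcessError on a non-zero exit status and
    subprocess.TimeoutExpired if it runs longer than `timeout` seconds.
    """
    completed = subprocess.run(
        cmd,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode bytes to str
        timeout=timeout,
        check=True,           # raise on non-zero exit
    )
    return completed.stdout

# Hypothetical call for one of the Perl scripts (name and argument invented):
#   output = run_script(["perl", "get_config.pl", "router1"])
print(run_script([sys.executable, "-c", "print('hello from a child process')"]))
```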

Re: sorting 1172026 entries

2012-05-06 Thread Chris Angelico
On Mon, May 7, 2012 at 10:31 AM, Cameron Simpson wrote: > I didn't mean per .append() call (which I'd expect to be O(n) for large > n), I meant overall for the completed list. > > Don't the realloc()s make it O(n^2) overall for large n? The list > must get copied when the underlying space fills. I
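Cameron's realloc() question can be checked empirically in CPython. The list over-allocates in proportion to its current size, so the number of resizes grows roughly logarithmically with n. As a result, the total copying stays O(n) overall, which is amortized O(1) per append. The sketch below depends on sys.getsizeof reporting the list's allocated capacity, which is a CPython implementation detail.

```python
import sys

def count_resizes(n):
    """Append n items and count how often the list's allocated size changes."""
    items = []
    resizes = 0
    last_size = sys.getsizeof(items)
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:  # capacity grew, i.e. CPython reallocated the array
            resizes += 1
            last_size = size
    return resizes

for n in (1_000, 100_000, 1_172_026):
    print(n, count_resizes(n))
```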

Re: sorting 1172026 entries

2012-05-06 Thread Cameron Simpson
On 06May2012 17:10, Chris Rebert wrote: | On Sun, May 6, 2012 at 4:54 PM, Cameron Simpson wrote: | > On 06May2012 18:36, J. Mwebaze wrote: | > | > for filename in txtfiles: | > | >    temp=[] | > | >    f=open(filename) | > | >    for line in f.readlines(): | > | >      line = line.strip() | > |

Re: sorting 1172026 entries

2012-05-06 Thread Chris Rebert
On Sun, May 6, 2012 at 4:54 PM, Cameron Simpson wrote: > On 06May2012 18:36, J. Mwebaze wrote: > | > for filename in txtfiles: > | >    temp=[] > | >    f=open(filename) > | >    for line in f.readlines(): > | >      line = line.strip() > | >      line=line.split() > | >      temp.append((parser.

Re: sorting 1172026 entries

2012-05-06 Thread Cameron Simpson
On 06May2012 18:36, J. Mwebaze wrote: | > for filename in txtfiles: | >temp=[] | >f=open(filename) | >for line in f.readlines(): | > line = line.strip() | > line=line.split() | > temp.append((parser.parse(line[0]), float(line[1]))) Have you timed the different parts of
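Following Cameron's suggestion to time each part separately, here is a minimal sketch that measures parsing and sorting on their own. It makes two assumptions. First, timestamps are ISO-style "YYYY-MM-DDTHH:MM:SS" tokens. Second, datetime.strptime with a fixed format stands in for dateutil.parser.parse; a format-guessing parser like dateutil's is usually slower. The input here is generated, not the poster's data.

```python
import time
from datetime import datetime, timedelta

ISO_FMT = "%Y-%m-%dT%H:%M:%S"  # assumed timestamp layout

def time_phases(lines):
    """Parse then sort, returning (parse_seconds, sort_seconds, records)."""
    t0 = time.perf_counter()
    records = []
    for line in lines:
        stamp, value = line.split()
        records.append((datetime.strptime(stamp, ISO_FMT), float(value)))
    t1 = time.perf_counter()
    records.sort()
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, records

# Hypothetical input: 50,000 lines in reverse chronological order.
base = datetime(2012, 5, 6)
lines = [f"{(base + timedelta(seconds=i)).strftime(ISO_FMT)} {i}"
         for i in range(50_000, 0, -1)]
parse_s, sort_s, records = time_phases(lines)
print(f"parse: {parse_s:.3f}s  sort: {sort_s:.3f}s")
```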

Re: sorting 1172026 entries

2012-05-06 Thread Dan Stromberg
How much physical RAM (not the virtual memory, but the physical memory) does your machine have available? We know the number of elements in your dataset, but how big are the individual elements? If a sort is never completing, you're probably swapping. list.sort() is preferable to sorted(list),
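Dan's question about element size can be answered roughly with sys.getsizeof. This is a CPython-specific, shallow estimate. It assumes each record is a (datetime, float) tuple whose elements are not shared with other records, and the results vary by platform and Python version.

```python
import sys
from datetime import datetime

def approx_bytes_per_record(record):
    """Shallow size of a tuple plus each of its elements."""
    return sys.getsizeof(record) + sum(sys.getsizeof(item) for item in record)

N = 1172026
sample = (datetime(2012, 5, 6, 18, 36), 3.5)  # hypothetical record
per_record = approx_bytes_per_record(sample)
# The list itself holds one pointer per element on top of the records.
total = per_record * N + sys.getsizeof([None] * N)
print(f"~{per_record} bytes/record, ~{total / 2**20:.0f} MiB total")
```

If the estimate comes to a few hundred MiB, a machine with a few GB of RAM should not need to swap for the sort alone. Note that sorted() additionally allocates a second list of N pointers.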

Re: sorting 1172026 entries

2012-05-06 Thread Mark Lawrence
On 06/05/2012 20:11, Alec Taylor wrote: Also, is there a reason you are sorting the data-set after insert rather than using a self-sorting data-structure? A well chosen self-sorting data-structure is always more efficient when full data flow is controlled. I.e.: first insert can be modified to

Re: sorting 1172026 entries

2012-05-06 Thread Alec Taylor
Also, is there a reason you are sorting the data-set after insert rather than using a self-sorting data-structure? A well chosen self-sorting data-structure is always more efficient when full data flow is controlled. I.e.: first insert can be modified to use the self-sorting data-structure I can
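The standard library has no balanced-tree container. The closest tools for a "self-sorting" structure are bisect (kept-sorted lists) and heapq (priority queues). Below is a minimal sketch of the bisect approach. The trade-off should be weighed: for a one-off bulk load of about 1.2 million records, a single list.sort() at the end (O(n log n)) is generally the better fit. Incremental insertion pays off mainly when the data must stay ordered while it arrives.

```python
import bisect
import random

def insert_in_order(stream):
    """Keep a list sorted as items arrive.

    bisect finds each slot in O(log n), but inserting into a Python list
    shifts the tail, so n insertions cost O(n^2) element moves in the worst case.
    """
    ordered = []
    for item in stream:
        bisect.insort(ordered, item)
    return ordered

values = [random.random() for _ in range(10_000)]
print(insert_in_order(values)[:3])
```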

Re: sorting 1172026 entries

2012-05-06 Thread Stefan Behnel
J. Mwebaze, 06.05.2012 18:29: > sorry see, corrected code > > for filename in txtfiles: >temp=[] >f=open(filename) >for line in f.readlines(): > line = line.strip() > line=line.split() > temp.append((parser.parse(line[0]), float(line[1]))) >temp=sorted(temp) >wit

Re: sorting 1172026 entries

2012-05-06 Thread Chris Rebert
On Sun, May 6, 2012 at 9:29 AM, J. Mwebaze wrote: > sorry see, corrected code > > > for filename in txtfiles: >    temp=[] >    f=open(filename) Why not use `with` here too? >    for line in f.readlines(): readlines() reads *the entire file contents* into memory all at once! Use `for line in f:
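Chris's advice to iterate over the file object instead of calling readlines() can be packaged as a generator. In the sketch below, datetime.fromisoformat (Python 3.7+) replaces dateutil.parser.parse. It assumes the timestamps are ISO 8601 strings, which the thread does not confirm.

```python
import io
from datetime import datetime

def iter_records(f):
    """Yield (timestamp, value) pairs one line at a time; blank lines are skipped."""
    for line in f:  # file objects iterate lazily, unlike f.readlines()
        fields = line.split()
        if not fields:
            continue
        yield datetime.fromisoformat(fields[0]), float(fields[1])

demo = io.StringIO("2012-05-06T18:36:00 3.5\n\n2012-05-06T09:00:00 1.0\n")
records = sorted(iter_records(demo))  # sorted() on a generator builds just one list
print(records)
```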

Re: sorting 1172026 entries

2012-05-06 Thread xDog Walker
On Sunday 2012 May 06 09:29, J. Mwebaze wrote: > temp=sorted(temp) Change to: temp.sort() RTFM on sorted() and .sort(). -- http://mail.python.org/mailman/listinfo/python-list
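The difference behind xDog's suggestion fits in a few lines. sorted() returns a new list and leaves its input alone. list.sort() reorders the existing list and returns None, so writing temp = temp.sort() would discard the data.

```python
records = [(3, "c"), (1, "a"), (2, "b")]

# sorted() allocates and returns a second list; `records` is unchanged here.
copy = sorted(records)

# list.sort() reorders the existing list in place and returns None,
# so no second list of references is allocated.
result = records.sort()

print(copy == records, result)  # prints: True None
```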

Re: sorting 1172026 entries

2012-05-06 Thread Gary Herron
On 05/06/2012 09:29 AM, J. Mwebaze wrote: sorry see, corrected code for filename in txtfiles: temp=[] f=open(filename) for line in f.readlines(): line = line.strip() line=line.split() temp.append((parser.parse(line[0]), float(line[1]))) temp=sorted(temp) with open(

Re: sorting 1172026 entries

2012-05-06 Thread J. Mwebaze
I noticed the error in the code; please ignore this post. On Sun, May 6, 2012 at 6:29 PM, J. Mwebaze wrote: > sorry see, corrected code > > > for filename in txtfiles: >temp=[] >f=open(filename) >for line in f.readlines(): > line = line.strip() > line=line.split() > temp.a

Re: sorting 1172026 entries

2012-05-06 Thread J. Mwebaze
sorry see, corrected code for filename in txtfiles: temp=[] f=open(filename) for line in f.readlines(): line = line.strip() line=line.split() temp.append((parser.parse(line[0]), float(line[1]))) temp=sorted(temp) with open(filename.strip('.txt')+ '.sorted', 'wb') as
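The posted code builds the output name with filename.strip('.txt'). str.strip() removes any of the given characters from both ends rather than removing the suffix, so some names get mangled. A minimal fix uses os.path.splitext. The helper name sorted_name is illustrative.

```python
import os

def sorted_name(filename):
    """Swap the extension for '.sorted', e.g. 'data.txt' -> 'data.sorted'."""
    root, _ext = os.path.splitext(filename)
    return root + ".sorted"

print("test.txt".strip(".txt"))  # prints: es   (leading 't', trailing 't.txt' removed)
print(sorted_name("test.txt"))   # prints: test.sorted
```

On Python 3.9+, "test.txt".removesuffix(".txt") is another option.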

Re: sorting 1172026 entries

2012-05-06 Thread J. Mwebaze
I have attached one of the files, try to sort and let me know the results. Kindly sort by date. ooops - am told the file exceeds 25M. below is the code import glob txtfiles =glob.glob('*.txt') import dateutil.parser as parser for filename in txtfiles: temp=[] f=open(filename) for line
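Below is a cleaned-up per-file version of the posted loop, offered as a sketch rather than the poster's exact program. It streams lines, sorts in place, and derives the output name safely. Several assumptions apply. Timestamps are taken to be ISO 8601, so datetime.fromisoformat stands in for dateutil. The output line format is hypothetical, because the original write call is cut off. Text mode replaces the original 'wb', and malformed lines are skipped.

```python
import os
import tempfile
from datetime import datetime

def sort_file(filename):
    """Sort one 'timestamp value' file by timestamp; write <name>.sorted beside it."""
    records = []
    with open(filename) as f:
        for line in f:  # stream lines instead of readlines()
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            records.append((datetime.fromisoformat(fields[0]), float(fields[1])))
    records.sort()  # in place: no second list
    out_name = os.path.splitext(filename)[0] + ".sorted"
    with open(out_name, "w") as out:  # hypothetical output format
        for stamp, value in records:
            out.write(f"{stamp.isoformat()} {value}\n")
    return out_name

# Demo on a throwaway file; for real data, call sort_file() on each glob.glob("*.txt") match.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample.txt")
    with open(src, "w") as f:
        f.write("2012-05-06T18:36:00 3.5\n2012-05-06T09:00:00 1.0\n")
    with open(sort_file(src)) as f:
        print(f.read(), end="")
```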

Re: sorting 1172026 entries

2012-05-06 Thread Devin Jeanpierre
On Sun, May 6, 2012 at 12:11 PM, J. Mwebaze wrote: > [ (datetime, int) ] * 1172026 I can't duplicate slowness. It finishes fairly quickly here. Maybe you could try posting specific code? It might be something else that is making your program take forever. >>> x = [(datetime.datetime.now() + date
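Devin's reproduction can be repeated with a self-contained script. It builds 1172026 (datetime, int) tuples in random order and times only the sort. The data here are synthetic, with a seeded generator and an arbitrary date range, so timings will vary from machine to machine.

```python
import random
import time
from datetime import datetime, timedelta

def make_records(n, seed=0):
    """n (datetime, int) tuples in random order (synthetic data)."""
    rng = random.Random(seed)
    start = datetime(2012, 5, 6)
    return [(start + timedelta(seconds=rng.randrange(10**8)), rng.randrange(1000))
            for _ in range(n)]

def time_sort(records):
    """Sort in place and return the elapsed wall time in seconds."""
    t0 = time.perf_counter()
    records.sort()
    return time.perf_counter() - t0

records = make_records(1172026)
print(f"sorted {len(records)} records in {time_sort(records):.2f}s")
```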

Re: sorting 1172026 entries

2012-05-06 Thread J. Mwebaze
On Sun, May 6, 2012 at 6:09 PM, Devin Jeanpierre wrote: > On Sun, May 6, 2012 at 11:57 AM, J. Mwebaze wrote: > > I have several lists with approx 1172026 entries. I have been trying to > sort > > the records, but have failed.. I tried lists.sort() i also tried sorted > > python's inbuilt method.

Re: sorting 1172026 entries

2012-05-06 Thread J. Mwebaze
On Sun, May 6, 2012 at 6:07 PM, Benjamin Schollnick wrote: > > On May 6, 2012, at 11:57 AM, J. Mwebaze wrote: > > I have several lists with approx 1172026 entries. I have been trying to > sort the records, but have failed.. I tried lists.sort() i also tried > sorted python's inbuilt method. This

Re: sorting 1172026 entries

2012-05-06 Thread Devin Jeanpierre
On Sun, May 6, 2012 at 11:57 AM, J. Mwebaze wrote: > I have several lists with approx 1172026 entries. I have been trying to sort > the records, but have failed.. I tried lists.sort() i also tried sorted > python's inbuilt method. This has been running for weeks. Sorting 1172026 random floats ta

Re: sorting 1172026 entries

2012-05-06 Thread Benjamin Schollnick
On May 6, 2012, at 11:57 AM, J. Mwebaze wrote: > I have several lists with approx 1172026 entries. I have been trying to sort > the records, but have failed.. I tried lists.sort() i also tried sorted > python's inbuilt method. This has been running for weeks. > > Any one knows of method that

sorting 1172026 entries

2012-05-06 Thread J. Mwebaze
I have several lists with approx 1172026 entries. I have been trying to sort the records, but have failed.. I tried lists.sort() i also tried sorted python's inbuilt method. This has been running for weeks. Any one knows of method that can handle such lists. cheers

Re: Problem with time.time() standing still

2012-05-06 Thread Bob Cowdery
On 06/05/2012 09:49, Cameron Simpson wrote: > On 06May2012 09:18, Bob Cowdery wrote: > | On 05/05/2012 23:05, Cameron Simpson wrote: > | > On 05May2012 20:33, Bob Cowdery wrote: > | > | [...] calls to time.time() always return the same > | > | time which is usually several seconds in the past or

Re: Problem with time.time() standing still

2012-05-06 Thread Bob Cowdery
On 06/05/2012 09:24, Chris Angelico wrote: > On Sun, May 6, 2012 at 6:18 PM, Bob Cowdery wrote: >> On 05/05/2012 23:05, Cameron Simpson wrote: >>> Thought #1: you are calling time.time() and haven't unfortunately >>> renamed it? (I doubt this scenario, though the lack of fractional part >>> is int

Re: Problem with time.time() standing still

2012-05-06 Thread Cameron Simpson
On 06May2012 09:18, Bob Cowdery wrote: | On 05/05/2012 23:05, Cameron Simpson wrote: | > On 05May2012 20:33, Bob Cowdery wrote: | > | [...] calls to time.time() always return the same | > | time which is usually several seconds in the past or future and always | > | has no fractional part. | > |

Re: Problem with time.time() standing still

2012-05-06 Thread Chris Angelico
On Sun, May 6, 2012 at 6:18 PM, Bob Cowdery wrote: > On 05/05/2012 23:05, Cameron Simpson wrote: >> Thought #1: you are calling time.time() and haven't unfortunately >> renamed it? (I doubt this scenario, though the lack of fractional part >> is interesting.) > Not sure what you mean by renamed it
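Cameron's "renamed it" question concerns shadowing: something else bound to the name that the code calls as time.time. A quick check is to inspect the object actually being called. The sketch below uses inspect.isbuiltin and the function's __module__ and __name__. The helper looks_like_stdlib_time and the fake_time stand-in are hypothetical.

```python
import inspect
import time

def looks_like_stdlib_time(fn):
    """True if `fn` is the C-implemented time.time rather than a replacement."""
    return inspect.isbuiltin(fn) and fn.__module__ == "time" and fn.__name__ == "time"

def fake_time():  # stands in for an accidental redefinition
    return 0

print(repr(time.time))                    # prints: <built-in function time>
print(looks_like_stdlib_time(time.time))  # prints: True
print(looks_like_stdlib_time(fake_time))  # prints: False
```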

Re: Problem with time.time() standing still

2012-05-06 Thread Bob Cowdery
On 06/05/2012 00:11, Chris Angelico wrote: > On Sun, May 6, 2012 at 6:51 AM, Bob Cowdery wrote: >> The time.clock() function does increment correctly. CPU is around 30% > 30% of how many cores? If that's a quad-core processor, that could > indicate one core completely pegged plus a little usage el
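One way to confirm the symptom Bob describes is to compare time.time() against a clock that the system wall time cannot move. time.clock() was deprecated in Python 3.3 and removed in 3.8. Its modern replacements are time.perf_counter() and time.process_time(), and time.monotonic() is the usual choice for measuring elapsed intervals. The minimal diagnostic sketch below uses an arbitrary sleep interval.

```python
import time

def wall_vs_monotonic(interval=0.5):
    """Sleep for `interval` seconds; return (wall_elapsed, monotonic_elapsed).

    On a healthy system the two agree closely; a wall clock that is
    'standing still' shows wall_elapsed near 0 while monotonic_elapsed
    is close to `interval`.
    """
    wall0, mono0 = time.time(), time.monotonic()
    time.sleep(interval)
    return time.time() - wall0, time.monotonic() - mono0

wall, mono = wall_vs_monotonic()
print(f"time.time advanced {wall:.3f}s, time.monotonic advanced {mono:.3f}s")
print("fractional part present:", time.time() % 1 != 0)
```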

Re: Problem with time.time() standing still

2012-05-06 Thread Bob Cowdery
On 05/05/2012 23:05, Cameron Simpson wrote: > On 05May2012 20:33, Bob Cowdery wrote: > | I've written a straightforward extension that wraps a vendor's SDK for a > | video capture card. All works well except that in the Python thread on > | which I call the extension, after certain calls that I be