Re: Memory efficient tuple storage

2009-03-19 Thread psaff...@googlemail.com
In the end, I used a cStringIO object to store the chromosomes - because there are only 23, I can use one character for each chromosome and represent the whole lot with a giant string and a dictionary to say what each character means. Then I used numpy arrays for the data and coordinates. This sque
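The one-character-per-chromosome idea could be sketched roughly as below. This is an illustration only, not the poster's code: `encode_chromosomes` and the sample names are hypothetical, and a `bytearray` stands in for the cStringIO object mentioned above (cStringIO exists only in Python 2). The numeric columns are left out here.

```python
def encode_chromosomes(names):
    """Pack a sequence of chromosome names into one byte per record.

    Returns (codes, name_of): `codes` is a bytes object holding one byte
    per input name, and `name_of` maps each byte value back to its name.
    Hypothetical helper; works for up to 256 distinct names.
    """
    code_of = {}          # chromosome name -> small integer code
    codes = bytearray()   # one byte per record
    for name in names:
        # Assign the next free code the first time a name is seen.
        code = code_of.setdefault(name, len(code_of))
        codes.append(code)
    name_of = {code: name for name, code in code_of.items()}
    return bytes(codes), name_of


codes, name_of = encode_chromosomes(["chr1", "chr1", "chr2", "chrX", "chr2"])
decoded = [name_of[c] for c in codes]  # iterating over bytes yields integers
```

With only a couple of dozen distinct chromosomes, each record costs a single byte for its chromosome, plus one small dictionary shared by the whole data set.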

Re: Memory efficient tuple storage

2009-03-13 Thread Aaron Brady
On Mar 13, 1:13 pm, "psaff...@googlemail.com" wrote: > Thanks for all the replies. > > First of all, can anybody recommend a good way to show memory usage? I > tried heapy, but couldn't make much sense of the output and it didn't > seem to change too much for different usages. Maybe I was just mak

Re: Memory efficient tuple storage

2009-03-13 Thread Paul Rubin
"psaff...@googlemail.com" writes: > However, I still need the coordinates. If I don't keep them in a list, > where can I keep them? See the docs for the array module: http://docs.python.org/library/array.html -- http://mail.python.org/mailman/listinfo/python-list

Re: Memory efficient tuple storage

2009-03-13 Thread Kurt Smith
On Fri, Mar 13, 2009 at 1:13 PM, psaff...@googlemail.com wrote: > Thanks for all the replies. > [snip] > > The numpy solution does work, but it uses more than 1GB of memory for > one of my 130MB files. I'm using > > np.dtype({'names': ['chromo', 'position', 'dpoint'], 'formats': ['S6', > 'i4', 'f8
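For reference, the per-record cost of such a dtype can be checked directly. The dtype in the quoted message is cut off in this archive, so the three formats used below ('S6', 'i4', 'f8') are an assumption about the complete definition, and the record count is arbitrary.

```python
import numpy as np

# Assumed completion of the truncated dtype: one format per field name.
record = np.dtype({"names": ["chromo", "position", "dpoint"],
                   "formats": ["S6", "i4", "f8"]})

# Bytes used by a single record (fields are packed without padding by default).
bytes_per_record = record.itemsize

# Bytes used by the data buffer of an array of 1000 such records.
table = np.zeros(1000, dtype=record)
total_bytes = table.nbytes
```

Multiplying `bytes_per_record` by the number of lines in a file gives the size of the finished array; memory used temporarily while parsing is not counted here.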

Re: Memory efficient tuple storage

2009-03-13 Thread Gabriel Genellina
On Fri, 13 Mar 2009 14:49:51 -0200, Tim Wintle wrote: If the same chromosome string is being used multiple times then you may find it more efficient to reference the same string, so you don't need to have multiple copies of the same string in memory. That may be what is taking up the space
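One way to make every tuple reference the same string object is a small cache keyed by the string itself. This is a sketch under assumptions: the line layout (whitespace-separated chromosome, position, value) is not shown in the thread, and `read_coords` is a hypothetical name.

```python
def read_coords(lines):
    """Build (chromosome, position) tuples that share chromosome strings."""
    seen = {}     # maps each chromosome string to its first-seen object
    coords = []
    for line in lines:
        chromo, position, _value = line.split()
        # split() creates a fresh string for every line; swap it for the
        # object stored the first time this chromosome name appeared.
        chromo = seen.setdefault(chromo, chromo)
        coords.append((chromo, int(position)))
    return coords


coords = read_coords(["chr1 100 0.5", "chr1 200 0.7", "chr2 300 0.1"])
```

In Python 3, `sys.intern()` has a similar effect without the explicit dictionary.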

Re: Memory efficient tuple storage

2009-03-13 Thread Benjamin Peterson
psaff...@googlemail.com writes: > > First of all, can anybody recommend a good way to show memory usage? Python 2.6 has a function called sys.getsizeof().
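Note that sys.getsizeof() is shallow: for a tuple it counts the tuple object itself but not the objects it refers to. A small sketch of adding the contents by hand follows; `shallow_and_contents` is a hypothetical helper, it only goes one level deep, and exact byte counts vary by Python version and platform.

```python
import sys


def shallow_and_contents(obj):
    """Return (shallow size, shallow size plus sizes of direct items)."""
    shallow = sys.getsizeof(obj)
    contents = sum(sys.getsizeof(item) for item in obj)
    return shallow, shallow + contents


coord = ("chr1", 123456)
shallow, with_items = shallow_and_contents(coord)
```

Items shared between several tuples would be counted once per tuple by this helper, so summing it over a whole list can overestimate.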

Re: Memory efficient tuple storage

2009-03-13 Thread psaff...@googlemail.com
Thanks for all the replies. First of all, can anybody recommend a good way to show memory usage? I tried heapy, but couldn't make much sense of the output and it didn't seem to change too much for different usages. Maybe I was just making the h.heap() call in the wrong place. I also tried getrusag

Re: Memory efficient tuple storage

2009-03-13 Thread Tim Chase
While Kurt gave some excellent ideas for using numpy, there were some missing details in your original post that might help folks come up with a "work smarter, not harder" solution. Clearly, you're not loading it into memory just for giggles -- surely you're *doing* something with it once it's

Re: Memory efficient tuple storage

2009-03-13 Thread Kurt Smith
On Fri, Mar 13, 2009 at 11:33 AM, Kurt Smith wrote: [snip OP] > > Assuming your data is in a plaintext file something like > 'genomedata.txt' below, the following will load it into a numpy array > with a customized dtype.  You can access the different fields by name > ('chromo', 'position', and 'd
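The code from this message is not preserved in the archive. As a rough stand-in, the sketch below shows one way to get records into a structured array and read a field by name. The whitespace-separated line layout, the field formats, and the helper name `parse_records` are all assumptions; the field names follow the dtype quoted elsewhere in the thread.

```python
import numpy as np

# Assumed record layout: 6-byte string, 32-bit int, 64-bit float.
RECORD = np.dtype([("chromo", "S6"), ("position", "i4"), ("dpoint", "f8")])


def parse_records(lines):
    """Parse lines such as "chr1 1000 0.25" into a structured array."""
    rows = []
    for line in lines:
        chromo, position, dpoint = line.split()
        rows.append((chromo.encode("ascii"), int(position), float(dpoint)))
    return np.array(rows, dtype=RECORD)


data = parse_records(["chr1 1000 0.25", "chr2 2500 1.5"])
positions = data["position"]  # access one field of every record by name
```

The intermediate list of tuples is itself memory-hungry for very large files; this sketch favors clarity over peak memory use.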

Re: Memory efficient tuple storage

2009-03-13 Thread Tim Wintle
On Fri, 2009-03-13 at 08:59 -0700, psaff...@googlemail.com wrote: > I'm reading in some rather large files (28 files each of 130MB). Each > file is a genome coordinate (chromosome (string) and position (int)) > and a data point (float). I want to read these into a list of > coordinates (each a tupl

Re: Memory efficient tuple storage

2009-03-13 Thread Kurt Smith
On Fri, Mar 13, 2009 at 10:59 AM, psaff...@googlemail.com wrote: > I'm reading in some rather large files (28 files each of 130MB). Each > file is a genome coordinate (chromosome (string) and position (int)) > and a data point (float). I want to read these into a list of > coordinates (each a tupl

Memory efficient tuple storage

2009-03-13 Thread psaff...@googlemail.com
I'm reading in some rather large files (28 files each of 130MB). Each file is a genome coordinate (chromosome (string) and position (int)) and a data point (float). I want to read these into a list of coordinates (each a tuple of (chromosome, position)) and a list of data points. This has taught m