In the end, I used a cStringIO object to store the chromosomes -
because there are only 23, I can use one character for each chromosome
and represent the whole lot with a giant string and a dictionary to
say what each character means. Then I used numpy arrays for the data
and coordinates. This sque
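
The one-character-per-chromosome idea can be sketched roughly as follows. This is not the poster's code: it targets Python 3, where cStringIO no longer exists, so a `bytearray` stands in for the cStringIO buffer, and the names `encode_chromosomes`, `code_for` and `name_for` are made up for illustration.

```python
def encode_chromosomes(names):
    """Store one small integer code per record instead of one string per record."""
    code_for = {}        # chromosome name -> code (hypothetical helper mapping)
    codes = bytearray()  # one byte per record; enough for up to 256 distinct names
    for name in names:
        # Assign the next free code the first time a name is seen.
        codes.append(code_for.setdefault(name, len(code_for)))
    # Reverse mapping, so a stored code can be turned back into its name.
    name_for = {code: name for name, code in code_for.items()}
    return codes, name_for

# Made-up sample input, not data from the thread.
codes, name_for = encode_chromosomes(["chr1", "chr2", "chr1", "chrX"])
print(name_for[codes[3]])
```

Each record then costs a single byte for its chromosome, and the dictionary holds one entry per distinct chromosome.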
On Mar 13, 1:13 pm, "psaff...@googlemail.com"
wrote:
> Thanks for all the replies.
>
> First of all, can anybody recommend a good way to show memory usage? I
> tried heapy, but couldn't make much sense of the output and it didn't
> seem to change too much for different usages. Maybe I was just mak
"psaff...@googlemail.com" writes:
> However, I still need the coordinates. If I don't keep them in a list,
> where can I keep them?
See the docs for the array module:
http://docs.python.org/library/array.html
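
A small sketch of what that could look like, assuming the positions fit in a C long and the data points in a C double; the variable names and sample values are invented for illustration:

```python
from array import array

positions = array("l")  # signed C long, stored as raw machine values
data = array("d")       # C double

# Made-up sample records, not data from the thread.
for pos, val in [(1200, 0.25), (1350, 0.75)]:
    positions.append(pos)
    data.append(val)

# Payload size in bytes: no per-element Python object is kept around.
bytes_used = positions.itemsize * len(positions) + data.itemsize * len(data)
print(bytes_used)
```

Unlike a list of tuples, the arrays hold the numbers directly, so the cost per record is just the two item sizes.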
--
http://mail.python.org/mailman/listinfo/python-list
On Fri, Mar 13, 2009 at 1:13 PM, psaff...@googlemail.com
wrote:
> Thanks for all the replies.
>
[snip]
>
> The numpy solution does work, but it uses more than 1GB of memory for
> one of my 130MB files. I'm using
>
> np.dtype({'names': ['chromo', 'position', 'dpoint'], 'formats': ['S6',
> 'i4', 'f8
On Fri, 13 Mar 2009 14:49:51 -0200, Tim Wintle
wrote:
If the same chromosome string is being used multiple times then you may
find it more efficient to reference the same string, so you don't need
to have multiple copies of the same string in memory. That may be what
is taking up the space
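
One way to share the strings, sketched for Python 3 with `sys.intern` (in the Python 2 of this thread the equivalent was the builtin `intern`). The whitespace-separated line format and the function name `read_chromosomes` are assumptions for illustration:

```python
import sys

def read_chromosomes(lines):
    """Return the chromosome field of each line, one shared str object per distinct name."""
    # split() creates a fresh string for every line; sys.intern swaps it
    # for the single canonical copy so the duplicates can be freed.
    return [sys.intern(line.split()[0]) for line in lines]

# Made-up sample rows, not data from the thread.
lines = ["chr1 100 0.5", "chr1 200 0.7", "chr2 50 1.1"]
chromos = read_chromosomes(lines)
print(chromos[0] is chromos[1])
```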
psaffrey at googlemail.com writes:
>
> First of all, can anybody recommend a good way to show memory usage?
Python 2.6 has a function called sys.getsizeof().
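
A short sketch of how it might be used. Note that `sys.getsizeof()` reports only the object itself, not the objects it refers to, so containers need their items added separately; the helper `shallow_total` is hypothetical and goes only one level deep.

```python
import sys

def shallow_total(container):
    """Size of a container plus the size of each item it directly references."""
    return sys.getsizeof(container) + sum(sys.getsizeof(item) for item in container)

coord = ("chr1", 12345)      # made-up coordinate tuple
print(sys.getsizeof(coord))  # the tuple object only, not the str and int inside
print(shallow_total(coord))  # the tuple plus its two elements
```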
Thanks for all the replies.
First of all, can anybody recommend a good way to show memory usage? I
tried heapy, but couldn't make much sense of the output and it didn't
seem to change too much for different usages. Maybe I was just making
the h.heap() call in the wrong place. I also tried getrusag
While Kurt gave some excellent ideas for using numpy, there were
some missing details in your original post that might help folks
come up with a "work smarter, not harder" solution.
Clearly, you're not loading it into memory just for giggles --
surely you're *doing* something with it once it's
On Fri, Mar 13, 2009 at 11:33 AM, Kurt Smith wrote:
[snip OP]
>
> Assuming your data is in a plaintext file something like
> 'genomedata.txt' below, the following will load it into a numpy array
> with a customized dtype. You can access the different fields by name
> ('chromo', 'position', and 'd
On Fri, 2009-03-13 at 08:59 -0700, psaff...@googlemail.com wrote:
> I'm reading in some rather large files (28 files each of 130MB). Each
> file is a genome coordinate (chromosome (string) and position (int))
> and a data point (float). I want to read these into a list of
> coordinates (each a tupl
On Fri, Mar 13, 2009 at 10:59 AM, psaff...@googlemail.com
wrote:
> I'm reading in some rather large files (28 files each of 130MB). Each
> file is a genome coordinate (chromosome (string) and position (int))
> and a data point (float). I want to read these into a list of
> coordinates (each a tupl
I'm reading in some rather large files (28 files each of 130MB). Each
file is a genome coordinate (chromosome (string) and position (int))
and a data point (float). I want to read these into a list of
coordinates (each a tuple of (chromosome, position)) and a list of
data points.
This has taught m