I've got a large text processing task to attack (it's actually a genomics
task: matching DNA probes against bacterial genomes). I've got roughly
200,000 probes, each of which is a 25-character text string. My first
thought is to compile these into 200,000 regexes, but before I launch
Hi,
What is your 'static' data (database), and what is your input-data?
Those 200,000 probes are your database? Perhaps they can be stored as
pickled compiled regexes and thus be loaded in pickled form; then you
don't need to keep them all in memory at once -- if you fear that
memory usage will
Roy Smith [EMAIL PROTECTED] wrote:
...
> Is there any easy way to find out how much memory a Python object takes?
No, but there are a few early attempts out there at supplying SOME ways
(not necessarily easy, but SOME). For example, PySizer, at
http://pysizer.8325.org/.
Alex
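
As a rough illustration of the question above: Python versions from 2.6 onward provide sys.getsizeof(), which reports the shallow size of a single object. The sketch below adds a hypothetical helper, deep_size(), that walks common containers to produce an estimate. It is not PySizer's approach, the probe string is made up, and the numbers it yields vary by platform and Python build.

```python
import sys


def deep_size(obj, seen=None):
    """Estimate the total size in bytes of obj plus the objects it contains.

    Only dicts, lists, tuples, sets and frozensets are traversed; this is an
    approximation, not an exact accounting of interpreter memory.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:  # count shared objects once
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)  # shallow size of this object only
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_size(key, seen) + deep_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_size(item, seen)
    return size


# Hypothetical data: one made-up 25-character probe stored in a dict.
probes = {"ACGTACGTACGTACGTACGTACGTA": 1}
print(sys.getsizeof(probes), deep_size(probes))
```

The shallow figure covers only the dict itself, while the deep figure also includes the key and value it holds.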
--
Roy Smith wrote:
> I've already discovered one (very) surprising thing -- if I build a dict
> containing all my regexes (takes about 3 minutes on my PowerBook) and
> pickle them to a file, re-loading the pickle takes just about as long as
> compiling them did in the first place.
the internal RE byte
There is a function mx_sizeof() in the mx.Tools module from eGenix
which may be helpful. More at
http://www.egenix.com/files/python/eGenix-mx-Extensions.html#mxTools
/Jean Brouwers
PS) This is an approximation for memory usage which is useful in
certain, simple cases.
Each built-in type has
The name of the function in mx.Tools is sizeof() and not mx_sizeof().
My apologies.
Also, it turns out that the return value of the mx.Tools.sizeof() function
is non-aligned. For example, mx.Tools.sizeof("abcde") returns 29, which
is fine, but not entirely accurate.
/Jean Brouwers
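
To make the alignment remark concrete, here is a small arithmetic sketch. It assumes an 8-byte allocation granularity, which is only an assumption for illustration; the real granularity depends on the platform and the allocator. The helper name align_up() is hypothetical and is not part of mx.Tools.

```python
def align_up(size, alignment=8):
    """Round size up to the next multiple of alignment (assumed to be 8 bytes)."""
    return (size + alignment - 1) // alignment * alignment


# The 29 bytes reported for "abcde" would occupy 32 bytes under this assumption.
print(align_up(29))
```

Under that assumption the reported 29 bytes slightly understates the memory actually consumed.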
--
[Fredrik Lundh]
> the internal RE byte code format is version dependent, so pickle
> stores the patterns instead.
Oh! Nice to know. That explains why, when I was learning Python, my
initial experiment with pickles left me with the (probably wrong)
feeling that they were not worth the trouble.
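
The behaviour Fredrik describes can be checked with a short sketch. The probe string below is made up for illustration. Pickling a compiled pattern records the pattern text and flags, and unpickling hands them back to the re module to compile again, which is why reloading saves no time over compiling from scratch.

```python
import pickle
import re

# Hypothetical 25-character probe, for illustration only.
probe = "ACGTACGTACGTACGTACGTACGTA"

compiled = re.compile(probe)
payload = pickle.dumps(compiled)

# The pickle carries the pattern source text, not compiled byte code.
print(probe.encode("ascii") in payload)

# Unpickling rebuilds the pattern object from that text.
restored = pickle.loads(payload)
print(restored.pattern == probe)
print(restored.search("TT" + probe + "GG") is not None)
```

Given this, pickling a plain list of pattern strings and compiling them after loading should cost about the same as pickling the compiled objects.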