On 4/23/13 10:54 PM, Gordon wrote:


On Tuesday, 23 April 2013 00:00:33 UTC+8, Volker Braun wrote:

    The first question is, are you actually running out of ram? The
    garbage collector seems to have triggered full collections at the
    4gb mark, and memory fragmentation might have left you with 900mb of
    address space that is mainly empty. Also, do you really need all 4
    million graphs in memory simultaneously? Use @parallel to iterate
    over them if that's all you need.


No, I am not running out of RAM with the small sample, but I will
certainly run out of RAM if everything scales as it appears to be doing.

The real issue for me is that the file of 4.2 million graphs occupies
about 0.6 Gb on disk, and so when I was roughly working out what to do,
it never occurred to me that there would be the slightest problem in
storing it in memory (my machine has 16Gb RAM). Ultimately I need to
create matroids from the graphs and keep only pairwise non-isomorphic
ones. With those sizes, it seemed that it would be easy to just store
everything in memory, and work from there.  I could write a more
complicated and clever routine to work in batches or use some additional
theory etc, but I like to do that only when necessary.

So when this happened to my tiny sample file, I assumed that I must be
doing something spectacularly stupid, for example accidentally calling a
method that keeps producing new objects (rather than mutating an
existing object), hence the posting.



I did some more experiments: I took a file with about 1 million lines
and deleted all references to Graph, so that it was just a big bunch of
tuples, wrapped in one big list rather than built up by repeatedly
calling "append".

gs = [
[(0,1,0),(0,1,1),(0,1,2),(0,1,3),(0,1,4),(0,1,5),(0,1,6),(0,2,7),(0,2,8),(0,2,9),(0,1,10),(0,1,11),(1,2,12),(1,2,13),(1,2,14),(3,4,15)],
[(0,1,0),(0,1,1),(0,1,2),(0,1,3),(0,1,4),(0,1,5),(0,1,6),(0,2,7),(0,2,8),(0,2,9),(0,1,10),(0,1,11),(1,2,12),(1,2,13),(1,3,14),(2,4,15)],
... a million more lines

This file occupies 157Mb on disk.

I made a variant of this file suitable for input into another computer
algebra system, in this case Magma

gs := [
[[0,1,0],[0,1,1],[0,1,2],[0,1,3],[0,1,4],[0,1,5],[0,1,6],[0,2,7],[0,2,8],[0,2,9],[0,1,10],[0,1,11],[1,2,12],[1,2,13],[1,2,14],[3,4,15]],
[[0,1,0],[0,1,1],[0,1,2],[0,1,3],[0,1,4],[0,1,5],[0,1,6],[0,2,7],[0,2,8],[0,2,9],[0,1,10],[0,1,11],[1,2,12],[1,2,13],[1,3,14],[2,4,15]],
... a million more lines


Then I tried

load "tst.magma" (in Magma)
%runfile tst.sage (in Sage)

to see the difference...

With Magma, it took 45 seconds to read in the file, and the memory usage
(as reported by ps) rose roughly monotonically from about 10Mb to
about 4.8Gb over that period.
With Sage, I had to kill the job after 12 minutes because the process
had blown out to 12Gb of real memory and 36Gb of virtual memory and the
computer was barely responsive.


This is making it hard for me to work with large data sets, but perhaps
Sage is simply the wrong tool for this job?

I think if you used Python ints instead of Sage Integers, it would make a big difference in storage. Try renaming that test file to tst.py and running %runfile tst.py. Then Sage won't preparse the integer literals into Sage Integers (based on MPIR), but will instead interpret them as plain Python ints.

Some preliminary experiments suggest that %runfile tst.py should take dramatically less time and memory.
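
A side note on why plain ints are so much cheaper (my illustration, not from the thread): CPython interns small ints, so millions of edge tuples built from small plain ints all share the same handful of int objects, whereas each preparsed Sage Integer is a separate MPIR-backed object:

```python
import sys

a = 100
b = 100
# CPython caches small ints (-5 through 256): both names point at the
# same object, so big lists of small ints pay only the 8-byte pointer
# per slot, not a fresh object per literal.
print(a is b)  # True

edge = (0, 1, 15)
# Per-edge cost is roughly the tuple header plus three pointers; the
# int payloads themselves are shared across all edges.
print(sys.getsizeof(edge), "bytes for the tuple object itself")
```

Sage Integers get no such sharing: every literal in the file becomes its own object, which is consistent with the blowup Gordon observed.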

You'll probably see even more memory savings if you do something like:

import numpy
gs = numpy.asarray(dtype=numpy.int8, a=[
<<million lines>>
])


(assuming your integers fit into an 8-bit integer)
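
To put a rough number on that (my own illustration, using a few sample edges rather than the real data): an int8 array stores one byte per entry, versus an 8-byte pointer plus tuple and int object overhead per entry in a nested Python list:

```python
import numpy

# Four edges of three small ints each, as in the sample data above.
edges = [(0, 1, 0), (0, 1, 1), (1, 2, 12), (3, 4, 15)]
a = numpy.asarray(edges, dtype=numpy.int8)

print(a.shape)   # (4, 3)
print(a.nbytes)  # 12 -- one byte per entry; the equivalent nested
                 # list needs an 8-byte pointer per entry on top of
                 # the tuple and int object headers
```

At a million edge lists of sixteen edges each, that single byte per entry rather than tens of bytes per entry is the difference between tens of megabytes and multiple gigabytes.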

Thanks,

Jason



--
Jason Grout



--
You received this message because you are subscribed to the Google Groups 
"sage-support" group.

