On Tuesday, 23 April 2013 00:00:33 UTC+8, Volker Braun wrote:
>
> The first question is, are you actually running out of ram? The garbage 
> collector seems to have triggered full collections at the 4gb mark, and 
> memory fragmentation might have left you with 900mb of address space that 
> is mainly empty. Also, do you really need all 4 million graphs in memory 
> simultaneously? Use @parallel to iterate over them if that's all you need.
>
>
No, I am not running out of RAM with the small sample, but I will certainly 
run out of RAM if everything scales as it appears to be doing.

The real issue for me is that the file of 4.2 million graphs occupies about 
0.6 GB on disk, so when I was roughly working out what to do, it never 
occurred to me that there would be the slightest problem in storing it in 
memory (my machine has 16 GB of RAM). Ultimately I need to create matroids 
from the graphs and keep only the pairwise non-isomorphic ones. With those 
sizes, it seemed easy to just store everything in memory and work from 
there. I could write a more complicated and clever routine that works in 
batches, or brings in some additional theory, etc., but I prefer to do that 
only when necessary.

So when this happened to my tiny sample file, I assumed that I must be 
doing something spectacularly stupid, for example accidentally calling a 
method that keeps producing new objects (rather than mutating an existing 
object), hence the posting.



I did some more experiments: I took a file with about 1 million lines and 
deleted all references to Graph, so that it was just a big bunch of tuples, 
wrapped in one big list rather than built up by repeatedly calling 
"append".

gs = [
[(0,1,0),(0,1,1),(0,1,2),(0,1,3),(0,1,4),(0,1,5),(0,1,6),(0,2,7),(0,2,8),(0,2,9),(0,1,10),(0,1,11),(1,2,12),(1,2,13),(1,2,14),(3,4,15)],
[(0,1,0),(0,1,1),(0,1,2),(0,1,3),(0,1,4),(0,1,5),(0,1,6),(0,2,7),(0,2,8),(0,2,9),(0,1,10),(0,1,11),(1,2,12),(1,2,13),(1,3,14),(2,4,15)],
... a million more lines

This file occupies 157 MB on disk.
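As an aside, plain CPython object overhead already goes some way towards explaining why 157 MB on disk can expand many-fold in memory: every small tuple is a full heap object with its own header and pointer slots, before counting the int objects it refers to. A quick standard-library check (not Sage-specific; exact numbers vary by Python build):

```python
# Rough illustration of per-object overhead in CPython: a 3-element
# tuple of small ints costs far more in memory than the ~10 bytes its
# text takes on disk.
import sys

edge = (0, 1, 15)       # one edge, as stored in the file
row = [edge] * 16       # one graph: a list of 16 edge tuples

tuple_size = sys.getsizeof(edge)   # size of the tuple object itself
list_size = sys.getsizeof(row)     # size of the list's pointer array only
print(tuple_size, list_size)
```

Multiply that by 16 million tuples and the gigabytes are less surprising, though it still doesn't account for the difference from Magma.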

I made a variant of this file suitable for input into another computer 
algebra system, in this case Magma:

gs := [
[[0,1,0],[0,1,1],[0,1,2],[0,1,3],[0,1,4],[0,1,5],[0,1,6],[0,2,7],[0,2,8],[0,2,9],[0,1,10],[0,1,11],[1,2,12],[1,2,13],[1,2,14],[3,4,15]],
[[0,1,0],[0,1,1],[0,1,2],[0,1,3],[0,1,4],[0,1,5],[0,1,6],[0,2,7],[0,2,8],[0,2,9],[0,1,10],[0,1,11],[1,2,12],[1,2,13],[1,3,14],[2,4,15]],
... a million more lines


Then I tried 

load "tst.magma" (in Magma)
%runfile tst.sage (in Sage)

to see the difference...

With Magma, it took 45 seconds to read in the file, and the memory usage 
(as reported by ps) rose, apparently monotonically, from about 10 MB to 
about 4.8 GB over that period.
With Sage, I had to kill the job after 12 minutes because the process had 
grown to 12 GB of real memory and 36 GB of virtual memory, and the 
computer was barely responsive.
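For anyone wanting to reproduce this without watching ps by hand, the peak memory of the process can also be read from inside it using only the standard library. A sketch (the helper name is mine; note that ru_maxrss is reported in kilobytes on Linux but bytes on macOS):

```python
# Standard-library helper for checking memory growth from inside the
# process, as an alternative to polling ps externally.
import resource
import sys

def peak_rss_mb():
    """Peak resident set size of this process, in megabytes."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is kilobytes on Linux, but bytes on macOS.
    if sys.platform == 'darwin':
        rss //= 1024
    return rss / 1024.0

# Allocate something noticeable, then report.
data = [(0, 1, i) for i in range(100000)]
print("peak RSS ~ %.1f MB" % peak_rss_mb())
```

Calling it before and after the load "..." / %runfile step would give the same comparison with exact numbers.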


This is making it hard for me to work with large data sets, but perhaps 
Sage is simply the wrong tool for this job?




-- 
You received this message because you are subscribed to the Google Groups 
"sage-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-support?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

