On Tuesday, 23 April 2013 00:00:33 UTC+8, Volker Braun wrote:

> The first question is, are you actually running out of RAM? The garbage
> collector seems to have triggered full collections at the 4 GB mark, and
> memory fragmentation might have left you with 900 MB of address space that
> is mainly empty. Also, do you really need all 4 million graphs in memory
> simultaneously? Use @parallel to iterate over them if that's all you need.

No, I am not running out of RAM with the small sample, but I will certainly run out of RAM if everything scales the way it appears to be doing.
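The iterate-instead-of-store idea could be sketched as a generator, so that peak memory stays proportional to one graph rather than four million. (The file name, and the assumption of one edge-list literal per line, are hypothetical.)

```python
import ast

def iter_edge_lists(path):
    """Yield one edge list at a time instead of building one giant list.

    Assumes (hypothetically) that each data line of the file holds one
    Python literal such as [(0, 1, 0), (0, 1, 1), ...], possibly with a
    trailing comma; wrapper lines like "gs = [" and "]" are skipped.
    """
    with open(path) as f:
        for line in f:
            line = line.strip().rstrip(',')
            if line.startswith('['):
                yield ast.literal_eval(line)

# Hypothetical usage: process each graph and let it be collected immediately.
# for edges in iter_edge_lists("graphs.txt"):
#     G = Graph(edges)   # Sage's Graph constructor
#     process(G)         # e.g. build the matroid, compare, discard
```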
The real issue for me is that the file of 4.2 million graphs occupies about 0.6 GB on disk, so when I was roughly working out what to do, it never occurred to me that there would be the slightest problem storing it in memory (my machine has 16 GB of RAM). Ultimately I need to create matroids from the graphs and keep only pairwise non-isomorphic ones. With those sizes, it seemed easy to just store everything in memory and work from there. I could write a more complicated and clever routine that works in batches, or uses some additional theory, etc., but I like to do that only when necessary.

So when this happened to my tiny sample file, I assumed that I must be doing something spectacularly stupid, for example accidentally calling a method that keeps producing new objects (rather than mutating an existing object), hence the posting.

I did some more experiments: I took a file with about 1 million lines and deleted all references to Graph, so that it was just a big bunch of tuples (and I wrapped everything in one big list rather than repeatedly calling "append"):

gs = [
[(0,1,0),(0,1,1),(0,1,2),(0,1,3),(0,1,4),(0,1,5),(0,1,6),(0,2,7),(0,2,8),(0,2,9),(0,1,10),(0,1,11),(1,2,12),(1,2,13),(1,2,14),(3,4,15)],
[(0,1,0),(0,1,1),(0,1,2),(0,1,3),(0,1,4),(0,1,5),(0,1,6),(0,2,7),(0,2,8),(0,2,9),(0,1,10),(0,1,11),(1,2,12),(1,2,13),(1,3,14),(2,4,15)],
... a million more lines

This file occupies 157 MB on disk. I made a variant of this file suitable for input into another computer algebra system, in this case Magma:

gs := [
[[0,1,0],[0,1,1],[0,1,2],[0,1,3],[0,1,4],[0,1,5],[0,1,6],[0,2,7],[0,2,8],[0,2,9],[0,1,10],[0,1,11],[1,2,12],[1,2,13],[1,2,14],[3,4,15]],
[[0,1,0],[0,1,1],[0,1,2],[0,1,3],[0,1,4],[0,1,5],[0,1,6],[0,2,7],[0,2,8],[0,2,9],[0,1,10],[0,1,11],[1,2,12],[1,2,13],[1,3,14],[2,4,15]],
... a million more lines

Then I tried

load "tst.magma"    (in Magma)
%runfile tst.sage   (in Sage)

to see the difference...
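For scale, here is a back-of-the-envelope illustration of why the in-memory size can dwarf the on-disk size: in CPython, every tuple and every integer is a separate heap object with its own header, so an edge that takes a few characters on disk costs well over a hundred bytes in RAM. (The figures in the comments are typical 64-bit CPython values, not guarantees.)

```python
import sys

edge = (0, 1, 15)   # one edge as stored in the big list

tuple_bytes = sys.getsizeof(edge)                  # the tuple object itself
int_bytes = sum(sys.getsizeof(x) for x in edge)    # the int objects it references
per_edge = tuple_bytes + int_bytes                 # ignores small-int caching

print(per_edge)     # typically around 140-150 bytes on 64-bit CPython

# 16 edges per graph, ~4.2 million graphs:
print(per_edge * 16 * 4_200_000 / 1e9)   # on the order of 10 GB, before list overhead
```

Packing the same integers into one flat NumPy array (8 bytes per entry at dtype int64, so 24 bytes per edge) would shrink that by roughly a factor of six, at the cost of giving up the per-graph list structure.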
With Magma, it took 45 seconds to read in the file, and the memory usage (as reported by ps) rose seemingly monotonically from about 10 MB to about 4.8 GB over that period. With Sage, I had to kill the job after 12 minutes because the process had blown out to 12 GB of real memory and 36 GB of virtual memory, and the computer was barely responsive.

This is making it hard for me to work with large data sets, but perhaps Sage is simply the wrong tool for this job?

--
You received this message because you are subscribed to the Google Groups "sage-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-support?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
