There is actually a way to solve the problem and still keep garbage collection enabled. You could use the gc.set_threshold function to make collections run less often. Another option is to disable the gc and manually call gc.collect() at points in your code where it makes sense. In both cases, don't forget to put gc back into the state you found it in after you finish (which might not be the default state). The garbage collector is doing good things, just apparently way too often in your code.
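For instance, both approaches can be sketched like this (the x10 scaling factor and the list-building workload are illustrative stand-ins, not tuned values):

```python
import gc

# Option 1: raise the generation-0 threshold so automatic collections
# run less often. (700, 10, 10) is CPython's default; the x10 factor
# here is just an example.
old_thresholds = gc.get_threshold()
gc.set_threshold(old_thresholds[0] * 10, *old_thresholds[1:])

# Option 2: disable automatic collection around an allocation-heavy
# section and collect manually afterwards.
was_enabled = gc.isenabled()
gc.disable()
try:
    data = [[i] for i in range(10000)]  # stand-in for the real workload
finally:
    if was_enabled:
        gc.enable()
    gc.collect()  # one explicit collection to clean up cycles

# Put gc back into the state we found it in.
gc.set_threshold(*old_thresholds)
```

The try/finally is what guarantees the "put gc back" step even if the workload raises.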
On Sep 20, 3:14 pm, dmharvey <dmhar...@cims.nyu.edu> wrote:
> On Sep 20, 3:47 am, koffie <m.derickx.stud...@gmail.com> wrote:
> > David: you are 100% right about the garbage collector.
> > The following example sort of shows what is going on:
> >
> > from time import time
> > import gc
> > K = Qp(13, 10)
> > M = [K.random_element() for j in range(200)]
> > save(M, "thing.sobj")
> > L = []
> > for i in range(4000):
> >     L.append(load("thing.sobj"))
> > for i in range(40):
> >     print gc.get_count()
> >     last = time()
> >     L.append(load("thing.sobj"))
> >     now = time()
> >     print "took %.2f seconds" % (now - last)
> >     last = now
> >
> > Gives as output:
> >
> > (361, 4, 10)
> > took 0.00 seconds
> > (562, 4, 10)
> > took 0.00 seconds
> > (62, 5, 10)
> > took 0.00 seconds
> > (262, 5, 10)
> > took 0.00 seconds
> > (463, 5, 10)
> > took 0.00 seconds
> > (664, 5, 10)
> > took 0.00 seconds
> > (165, 6, 10)
> > took 0.00 seconds
> > (366, 6, 10)
> > took 0.00 seconds
> > (567, 6, 10)
> > took 0.00 seconds
> > (68, 7, 10)
> > took 0.00 seconds
> > (269, 7, 10)
> > took 0.00 seconds
> > (470, 7, 10)
> > took 0.00 seconds
> > (671, 7, 10)
> > took 0.00 seconds
> > (172, 8, 10)
> > took 0.00 seconds
> > (373, 8, 10)
> > took 0.00 seconds
> > (574, 8, 10)
> > took 0.00 seconds
> > (75, 9, 10)
> > took 0.00 seconds
> > (276, 9, 10)
> > took 0.00 seconds
> > (477, 9, 10)
> > took 0.00 seconds
> > (678, 9, 10)
> > took 0.00 seconds
> > (179, 10, 10)
> > took 0.00 seconds
> > (380, 10, 10)
> > took 0.00 seconds
> > (581, 10, 10)
> > took 0.00 seconds
> > (82, 11, 10)
> > took 0.00 seconds
> > (283, 11, 10)
> > took 0.00 seconds
> > (484, 11, 10)
> > took 0.00 seconds
> > (685, 11, 10)
> > took 0.00 seconds
> > (186, 0, 11)
> > took 0.00 seconds
> > (387, 0, 11)
> > took 0.00 seconds
> > (588, 0, 11)
> > took 0.24 seconds
> > (90, 0, 0)
> > took 0.00 seconds
> > (295, 0, 0)
> > took 0.00 seconds
> > (496, 0, 0)
> > took 0.00 seconds
> > (697, 0, 0)
> > took 0.00 seconds
> > (198, 1, 0)
> > took 0.00 seconds
> > (399, 1, 0)
> > took 0.00 seconds
> > (600, 1, 0)
> > took 0.00 seconds
> > (101, 2, 0)
> > took 0.00 seconds
> > (302, 2, 0)
> > took 0.00 seconds
> > (503, 2, 0)
> > took 0.00 seconds
> >
> > Python's garbage collector is an incremental (generational) one, and
> > gc.get_count() outputs (a, b, c). Here a is the number of objects
> > created since the last collection (the objects in its level-1
> > history). When a exceeds 700 it will run its garbage collection
> > algorithm on those 700 objects and put the non-collectable ones into
> > its level-2 history (causing no measurable overhead, as the output
> > above shows). The number b is the number of times the garbage
> > collector has run on its level-1 objects; when it exceeds 10, the gc
> > will run on the objects in its level-1 history. Finally, c is the
> > number of times the gc has run on its level-2 objects; when this
> > exceeds 10 it will run on all objects in memory. The test code above
> > shows that this is exactly where we get the overhead.
> >
> > Notice that I had to put
> >
> > for i in range(4000):
> >     L.append(load("thing.sobj"))
> >
> > there to have enough objects in memory to make the overhead visible.
> > So my conclusion is that I wouldn't worry about it that much, since
> > the overhead is so small compared to the number of objects in memory.
> > Every run of the loop adds about 200 objects, which leads to the
> > conclusion that you need about 800000 objects in memory to get an
> > overhead of 0.24 seconds. Furthermore, this overhead will be there
> > only once in the creation of 70000 objects. The only reason the
> > overhead is really visible and significant in your example is that
> > your code is not really typical: most of the time when programs
> > create that many objects, they also have to do calculations on them,
> > so the overhead becomes insignificant compared to the calculation
> > time.
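The counters koffie describes can be inspected directly in plain Python, without Sage (a small sketch; the 50-list allocation is only there to nudge the counters):

```python
import gc

# gc.get_threshold() gives the trigger counts for the three generations;
# on a stock CPython this is (700, 10, 10), matching the thread's numbers.
print(gc.get_threshold())

# gc.get_count() returns (a, b, c): a counts tracked allocations since the
# last generation-0 collection; b and c count how many times the younger
# generation has been collected since the next one last ran.
before = gc.get_count()
garbage = [[i] for i in range(50)]  # lists are gc-tracked container objects
after = gc.get_count()
print(before, "->", after)

# The expensive event in the thread is a full (oldest-generation)
# collection, which can also be forced explicitly:
unreachable = gc.collect()  # returns the number of unreachable objects found
print("collected:", unreachable)
```

Watching `gc.get_count()` tick over while a loop allocates is the quickest way to see which generation a pause comes from.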
> > The reason why the theorem about N=500 holds is probably because
> > Python makes all small numbers unique. As the following example shows:
> >
> > L1 = range(245, 265); L2 = range(245, 265)
> > for i in xrange(20):
> >     print L1[i], L2[i], L1[i] is L2[i]
> >
> > has as output:
> >
> > 245 245 True
> > 246 246 True
> > 247 247 True
> > 248 248 True
> > 249 249 True
> > 250 250 True
> > 251 251 True
> > 252 252 True
> > 253 253 True
> > 254 254 True
> > 255 255 True
> > 256 256 True
> > 257 257 False
> > 258 258 False
> > 259 259 False
> > 260 260 False
> > 261 261 False
> > 262 262 False
> > 263 263 False
> > 264 264 False
>
> Ah, this explains a lot.
>
> You are right about the small integers; for integer mod rings it looks
> like someone decided to do a similar trick with caching small values
> for small rings:
>
> sage: for n in range(490, 510):
> ....:     R = Integers(n)
> ....:     print n, R(3) is R(3)
> ....:
> 490 True
> 491 True
> 492 True
> 493 True
> 494 True
> 495 True
> 496 True
> 497 True
> 498 True
> 499 True
> 500 False
> 501 False
> 502 False
> 503 False
> 504 False
> 505 False
> 506 False
> 507 False
> 508 False
> 509 False
>
> This means that most of the 50x slowdown I'm seeing is probably due to
> something similar to a ticket I reported recently
> (http://trac.sagemath.org/sage_trac/ticket/9887).
>
> But anyway, back to the main problem re: garbage collection.
>
> You are right that I am in a slightly unusual use case, but actually
> not that weird. I have a program that forks into, say, 12 worker tasks
> that do some computation (perhaps for a few hours) and generate a
> bunch of matrices saved as sobj files (say 1000 of them). When the
> workers are finished, the main task loads in all the matrices to
> finish the computation. The loading should take only a minute or two
> based on the cost of a single load; but because of this garbage
> collection business it actually takes hours!
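Outside Sage, the bulk-load pattern David describes can be mimicked with pickle (a sketch: the nested-list `matrix`, the file name, and the 1000-load count are made-up stand-ins for the real sobj workflow). Disabling automatic collection during the load phase and collecting once at the end avoids triggering the collector on every batch of deserialised objects:

```python
import gc
import os
import pickle
import tempfile

# Write one "matrix" (here just a nested list) the way a worker would.
matrix = [[i * j for j in range(20)] for i in range(20)]
path = os.path.join(tempfile.mkdtemp(), "thing.pickle")
with open(path, "wb") as f:
    pickle.dump(matrix, f)

# Bulk-load phase: switch off automatic collection while deserialising
# many objects, then collect once when done.
gc.disable()
try:
    loaded = []
    for _ in range(1000):
        with open(path, "rb") as f:
            loaded.append(pickle.load(f))
finally:
    gc.enable()
    gc.collect()

print(len(loaded))  # 1000
```

The same try/finally shape works whether the load phase is a minute or hours; the one explicit gc.collect() at the end bounds how long cycles can pile up.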
> I guess there is no way around this apart from disabling the gc,
> possibly only during the matrix load phase or possibly over the whole
> computation.
>
> thanks everyone for helping to clear up this mystery!
>
> david

--
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org