eh, this is not what I need I need a max of TIME it took for a gc-minor and the TOTAL time it took for a gc-minor (per query) (ideally same for gc-walkroots and gc-collect-step)
On Mon, Mar 17, 2014 at 4:19 PM, Martin Koch <[email protected]> wrote: > Here are the collated results of running each query. For each run, I count > how many of each of the pypy debug lines i get. I.e. there were 668 runs > that printed 58 loglines that contain "{gc-minor" which was eventually > followed by "gc-minor}". I have also counted if the query was slow; > interestingly, not all the queries with many gc-minors were slow (but all > slow queries had a gc-minor). > > Please let me know if this is unclear :) > > 668 gc-minor:58 gc-minor-walkroots:58 > 10 gc-minor:58 gc-minor-walkroots:58 gc-collect-step:5 > 140 gc-minor:59 gc-minor-walkroots:59 > 1 gc-minor:8441 gc-minor-walkroots:8441 gc-collect-step:8403 > 1 gc-minor:9300 gc-minor-walkroots:9300 gc-collect-step:9249 > 9 gc-minor:9643 slow:1 gc-minor-walkroots:9643 gc-collect-step:9589 > 1 gc-minor:9644 slow:1 gc-minor-walkroots:9644 gc-collect-step:9590 > 10 gc-minor:9647 slow:1 gc-minor-walkroots:9647 gc-collect-step:9609 > 1 gc-minor:9663 gc-minor-walkroots:9663 gc-collect-step:9614 > 1 jit-backend-dump:5 gc-minor:58 gc-minor-walkroots:58 > 1 jit-log-compiling-loop:1 gc-collect-step:8991 jit-backend-dump:78 > jit-backend:3 jit-log-noopt-loop:6 jit-log-virtualstate:3 gc-minor:9030 > jit-tracing:3 gc-minor-walkroots:9030 jit-optimize:6 > jit-log-short-preamble:2 jit-backend-addr:3 jit-log-opt-loop:1 > jit-mem-looptoken-alloc:3 jit-abort:3 jit-log-rewritten-bridge:2 > jit-log-rewritten-loop:1 jit-log-opt-bridge:2 jit-log-compiling-bridge:2 > jit-resume:84 > 1 jit-log-compiling-loop:1 jit-backend-dump:13 jit-backend:1 > jit-log-noopt-loop:2 gc-minor:60 jit-tracing:1 gc-minor-walkroots:60 > jit-optimize:2 jit-log-short-preamble:1 jit-backend-addr:1 > jit-log-opt-loop:1 jit-mem-looptoken-alloc:1 jit-log-rewritten-loop:1 > jit-resume:14 > 1 jit-log-compiling-loop:1 jit-backend-dump:73 jit-backend:3 > jit-log-noopt-loop:6 jit-log-virtualstate:3 gc-minor:60 jit-tracing:3 > gc-minor-walkroots:60 jit-optimize:6 jit-log-short-preamble:2 > jit-backend-addr:3 jit-log-opt-loop:1 jit-mem-looptoken-alloc:3 jit-abort:3 > jit-log-rewritten-bridge:2 jit-log-rewritten-loop:1 jit-log-opt-bridge:2 > jit-log-compiling-bridge:2 jit-resume:84 > 2 jit-log-compiling-loop:1 jit-backend-dump:78 jit-backend:3 > jit-log-noopt-loop:6 jit-log-virtualstate:3 gc-minor:61 jit-tracing:3 > gc-minor-walkroots:61 jit-optimize:6 jit-log-short-preamble:2 > jit-backend-addr:3 jit-log-opt-loop:1 jit-mem-looptoken-alloc:3 jit-abort:3 > jit-log-rewritten-bridge:2 jit-log-rewritten-loop:1 jit-log-opt-bridge:2 > jit-log-compiling-bridge:2 jit-resume:84 > 1 jit-log-short-preamble:2 jit-log-compiling-loop:2 > jit-backend-dump:92 jit-log-noopt-loop:7 jit-log-virtualstate:3 gc-minor:61 > jit-tracing:4 gc-minor-walkroots:61 jit-optimize:7 jit-backend:4 > jit-backend-addr:4 jit-log-opt-loop:2 jit-mem-looptoken-alloc:4 jit-abort:3 > jit-log-rewritten-bridge:2 jit-log-rewritten-loop:2 jit-log-opt-bridge:2 > jit-log-compiling-bridge:2 jit-resume:104 > > > Thanks, > /Martin > > > > On Mon, Mar 17, 2014 at 2:23 PM, Maciej Fijalkowski <[email protected]> > wrote: >> >> On Mon, Mar 17, 2014 at 3:20 PM, Maciej Fijalkowski <[email protected]> >> wrote: >> > are you *sure* it's the walkroots that take that long and not >> > something else (like gc-minor)? More of those mean that you allocate a >> > lot more surviving objects. Can you do two things: >> > >> > a) take a max of gc-minor (and gc-minor-stackwalk), per request >> > b) take the sum of those >> > >> > and plot them >> >> ^^^ or just paste the results actually >> >> > >> > On Mon, Mar 17, 2014 at 3:18 PM, Martin Koch <[email protected]> wrote: >> >> Well, then it works out to around 2.5GHz, which seems reasonable. But >> >> it >> >> doesn't alter the conclusion from the previous email: The slow queries >> >> then >> >> all have a duration around 34*10^9 units, 'normal' queries 1*10^9 >> >> units, or >> >> .4 seconds at this conversion. Also, the log shows that a slow query >> >> performs many more gc-minor operations than a 'normal' one: 9600 >> >> gc-collect-step/gc-minor/gc-minor-walkroots operations vs 58. >> >> >> >> So the question becomes: Why do we get this large spike in >> >> gc-minor-walkroots, and, in particular, is there any way to avoid it :) >> >> ? >> >> >> >> Thanks, >> >> /Martin >> >> >> >> >> >> On Mon, Mar 17, 2014 at 1:53 PM, Maciej Fijalkowski <[email protected]> >> >> wrote: >> >>> >> >>> I think it's the cycles of your CPU >> >>> >> >>> On Mon, Mar 17, 2014 at 2:48 PM, Martin Koch <[email protected]> wrote: >> >>> > What is the unit? Perhaps I'm being thick here, but I can't >> >>> > correlate it >> >>> > with seconds (which the program does print out). Slow runs are >> >>> > around 13 >> >>> > seconds, but are around 34*10^9(dec), 0x800000000 timestamp units >> >>> > (e.g. >> >>> > from >> >>> > 0x2b994c9d31889c to 0x2b9944ab8c4f49). >> >>> > >> >>> > >> >>> > >> >>> > On Mon, Mar 17, 2014 at 12:09 PM, Maciej Fijalkowski >> >>> > <[email protected]> >> >>> > wrote: >> >>> >> >> >>> >> The number of lines is nonsense. This is a timestamp in hex. >> >>> >> >> >>> >> On Mon, Mar 17, 2014 at 12:46 PM, Martin Koch <[email protected]> >> >>> >> wrote: >> >>> >> > Based On Maciej's suggestion, I tried the following >> >>> >> > >> >>> >> > PYPYLOG=- pypy mem.py 10000000 > out >> >>> >> > >> >>> >> > This generates a logfile which looks something like this >> >>> >> > >> >>> >> > start--> >> >>> >> > [2b99f1981b527e] {gc-minor >> >>> >> > [2b99f1981ba680] {gc-minor-walkroots >> >>> >> > [2b99f1981c2e02] gc-minor-walkroots} >> >>> >> > [2b99f19890d750] gc-minor} >> >>> >> > [snip] >> >>> >> > ... >> >>> >> > <--stop >> >>> >> > >> >>> >> > >> >>> >> > It turns out that the culprit is a lot of MINOR collections. >> >>> >> > >> >>> >> > I base this on the following observations: >> >>> >> > >> >>> >> > I can't understand the format of the timestamp on each logline >> >>> >> > (the >> >>> >> > "[2b99f1981b527e]"). From what I can see in the code, this should >> >>> >> > be >> >>> >> > output >> >>> >> > from time.clock(), but that doesn't return a number like that >> >>> >> > when I >> >>> >> > run >> >>> >> > pypy interactively >> >>> >> > Instead, I count the number of debug lines between start--> and >> >>> >> > the >> >>> >> > corresponding <--stop. >> >>> >> > Most runs have a few hundred lines of output between start/stop >> >>> >> > All slow runs have very close to 57800 lines out output between >> >>> >> > start/stop >> >>> >> > One such sample does 9609 gc-collect-step operations, 9647 >> >>> >> > gc-minor >> >>> >> > operations, and 9647 gc-minor-walkroots operations. >> >>> >> > >> >>> >> > >> >>> >> > Thanks, >> >>> >> > /Martin >> >>> >> > >> >>> >> > >> >>> >> > On Mon, Mar 17, 2014 at 8:21 AM, Maciej Fijalkowski >> >>> >> > <[email protected]> >> >>> >> > wrote: >> >>> >> >> >> >>> >> >> there is an environment variable PYPYLOG=gc:- (where - is >> >>> >> >> stdout) >> >>> >> >> which will do that for you btw. >> >>> >> >> >> >>> >> >> maybe you can find out what's that using profiling or valgrind? >> >>> >> >> >> >>> >> >> On Sun, Mar 16, 2014 at 11:34 PM, Martin Koch <[email protected]> >> >>> >> >> wrote: >> >>> >> >> > I have tried getting the pypy source and building my own >> >>> >> >> > version >> >>> >> >> > of >> >>> >> >> > pypy. I >> >>> >> >> > have modified >> >>> >> >> > rpython/memory/gc/incminimark.py:major_collection_step() >> >>> >> >> > to >> >>> >> >> > print out when it starts and when it stops. Apparently, the >> >>> >> >> > slow >> >>> >> >> > queries >> >>> >> >> > do >> >>> >> >> > NOT occur during major_collection_step; at least, I have not >> >>> >> >> > observed >> >>> >> >> > major >> >>> >> >> > step output during a query execution. So, apparently, >> >>> >> >> > something >> >>> >> >> > else >> >>> >> >> > is >> >>> >> >> > blocking. This could be another aspect of the GC, but it could >> >>> >> >> > also >> >>> >> >> > be >> >>> >> >> > anything else. >> >>> >> >> > >> >>> >> >> > Just to be sure, I have tried running the same application in >> >>> >> >> > python >> >>> >> >> > with >> >>> >> >> > garbage collection disabled. I don't see the problem there, so >> >>> >> >> > it >> >>> >> >> > is >> >>> >> >> > somehow >> >>> >> >> > related to either GC or the runtime somehow. >> >>> >> >> > >> >>> >> >> > Cheers, >> >>> >> >> > /Martin >> >>> >> >> > >> >>> >> >> > >> >>> >> >> > On Fri, Mar 14, 2014 at 4:19 PM, Martin Koch <[email protected]> >> >>> >> >> > wrote: >> >>> >> >> >> >> >>> >> >> >> We have hacked up a small sample that seems to exhibit the >> >>> >> >> >> same >> >>> >> >> >> issue. >> >>> >> >> >> >> >>> >> >> >> We basically generate a linked list of objects. To increase >> >>> >> >> >> connectedness, >> >>> >> >> >> elements in the list hold references (dummy_links) to 10 >> >>> >> >> >> randomly >> >>> >> >> >> chosen >> >>> >> >> >> previous elements in the list. >> >>> >> >> >> >> >>> >> >> >> We then time a function that traverses 50000 elements from >> >>> >> >> >> the >> >>> >> >> >> list >> >>> >> >> >> from a >> >>> >> >> >> random start point. If the traversal reaches the end of the >> >>> >> >> >> list, >> >>> >> >> >> we >> >>> >> >> >> instead >> >>> >> >> >> traverse one of the dummy links. Thus, exactly 50K elements >> >>> >> >> >> are >> >>> >> >> >> traversed >> >>> >> >> >> every time. To generate some garbage, we build a list holding >> >>> >> >> >> the >> >>> >> >> >> traversed >> >>> >> >> >> elements and a dummy list of characters. >> >>> >> >> >> >> >>> >> >> >> Timings for the last 100 runs are stored in a circular >> >>> >> >> >> buffer. If >> >>> >> >> >> the >> >>> >> >> >> elapsed time for the last run is more than twice the average >> >>> >> >> >> time, >> >>> >> >> >> we >> >>> >> >> >> print >> >>> >> >> >> out a line with the elapsed time, the threshold, and the 90% >> >>> >> >> >> runtime >> >>> >> >> >> (we >> >>> >> >> >> would like to see that the mean runtime does not increase >> >>> >> >> >> with >> >>> >> >> >> the >> >>> >> >> >> number of >> >>> >> >> >> elements in the list, but that the max time does increase >> >>> >> >> >> (linearly >> >>> >> >> >> with the >> >>> >> >> >> number of object, i guess); traversing 50K elements should be >> >>> >> >> >> independent of >> >>> >> >> >> the memory size). >> >>> >> >> >> >> >>> >> >> >> We have tried monitoring memory consumption by external >> >>> >> >> >> inspection, >> >>> >> >> >> but >> >>> >> >> >> cannot consistently verify that memory is deallocated at the >> >>> >> >> >> same >> >>> >> >> >> time >> >>> >> >> >> that >> >>> >> >> >> we see slow requests. Perhaps the pypy runtime doesn't always >> >>> >> >> >> return >> >>> >> >> >> freed >> >>> >> >> >> pages back to the OS? >> >>> >> >> >> >> >>> >> >> >> Using top, we observe that 10M elements allocates around 17GB >> >>> >> >> >> after >> >>> >> >> >> building, 20M elements 26GB, 30M elements 28GB (and grows to >> >>> >> >> >> 35GB >> >>> >> >> >> shortly >> >>> >> >> >> after building). >> >>> >> >> >> >> >>> >> >> >> Here is output from a few runs with different number of >> >>> >> >> >> elements: >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> pypy mem.py 10000000 >> >>> >> >> >> start build >> >>> >> >> >> end build 84.142424 >> >>> >> >> >> that took a long time elapsed: 13.230586 slow_threshold: >> >>> >> >> >> 1.495401 >> >>> >> >> >> 90th_quantile_runtime: 0.421558 >> >>> >> >> >> that took a long time elapsed: 13.016531 slow_threshold: >> >>> >> >> >> 1.488160 >> >>> >> >> >> 90th_quantile_runtime: 0.423441 >> >>> >> >> >> that took a long time elapsed: 13.032537 slow_threshold: >> >>> >> >> >> 1.474563 >> >>> >> >> >> 90th_quantile_runtime: 0.419817 >> >>> >> >> >> >> >>> >> >> >> pypy mem.py 20000000 >> >>> >> >> >> start build >> >>> >> >> >> end build 180.823105 >> >>> >> >> >> that took a long time elapsed: 27.346064 slow_threshold: >> >>> >> >> >> 2.295146 >> >>> >> >> >> 90th_quantile_runtime: 0.434726 >> >>> >> >> >> that took a long time elapsed: 26.028852 slow_threshold: >> >>> >> >> >> 2.283927 >> >>> >> >> >> 90th_quantile_runtime: 0.374190 >> >>> >> >> >> that took a long time elapsed: 25.432279 slow_threshold: >> >>> >> >> >> 2.279631 >> >>> >> >> >> 90th_quantile_runtime: 0.371502 >> >>> >> >> >> >> >>> >> >> >> pypy mem.py 30000000 >> >>> >> >> >> start build >> >>> >> >> >> end build 276.217811 >> >>> >> >> >> that took a long time elapsed: 40.993855 slow_threshold: >> >>> >> >> >> 3.188464 >> >>> >> >> >> 90th_quantile_runtime: 0.459891 >> >>> >> >> >> that took a long time elapsed: 41.693553 slow_threshold: >> >>> >> >> >> 3.183003 >> >>> >> >> >> 90th_quantile_runtime: 0.393654 >> >>> >> >> >> that took a long time elapsed: 39.679769 slow_threshold: >> >>> >> >> >> 3.190782 >> >>> >> >> >> 90th_quantile_runtime: 0.393677 >> >>> >> >> >> that took a long time elapsed: 43.573411 slow_threshold: >> >>> >> >> >> 3.239637 >> >>> >> >> >> 90th_quantile_runtime: 0.393654 >> >>> >> >> >> >> >>> >> >> >> Code below >> >>> >> >> >> >> >>> >> >> >> -------------------------------------------------------------- >> >>> >> >> >> import time >> >>> >> >> >> from random import randint, choice >> >>> >> >> >> import sys >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> allElems = {} >> >>> >> >> >> >> >>> >> >> >> class Node: >> >>> >> >> >> def __init__(self, v_): >> >>> >> >> >> self.v = v_ >> >>> >> >> >> self.next = None >> >>> >> >> >> self.dummy_data = [randint(0,100) >> >>> >> >> >> for _ in xrange(randint(50,100))] >> >>> >> >> >> allElems[self.v] = self >> >>> >> >> >> if self.v > 0: >> >>> >> >> >> self.dummy_links = [allElems[randint(0, >> >>> >> >> >> self.v-1)] >> >>> >> >> >> for _ >> >>> >> >> >> in >> >>> >> >> >> xrange(10)] >> >>> >> >> >> else: >> >>> >> >> >> self.dummy_links = [self] >> >>> >> >> >> >> >>> >> >> >> def set_next(self, l): >> >>> >> >> >> self.next = l >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> def follow(node): >> >>> >> >> >> acc = [] >> >>> >> >> >> count = 0 >> >>> >> >> >> cur = node >> >>> >> >> >> assert node.v is not None >> >>> >> >> >> assert cur is not None >> >>> >> >> >> while count < 50000: >> >>> >> >> >> # return a value; generate some garbage >> >>> >> >> >> acc.append((cur.v, >> >>> >> >> >> [choice("abcdefghijklmnopqrstuvwxyz") >> >>> >> >> >> for >> >>> >> >> >> x >> >>> >> >> >> in >> >>> >> >> >> xrange(100)])) >> >>> >> >> >> >> >>> >> >> >> # if we have reached the end, chose a random link >> >>> >> >> >> cur = choice(cur.dummy_links) if cur.next is None >> >>> >> >> >> else >> >>> >> >> >> cur.next >> >>> >> >> >> count += 1 >> >>> >> >> >> >> >>> >> >> >> return acc >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> def build(num_elems): >> >>> >> >> >> start = time.time() >> >>> >> >> >> print "start build" >> >>> >> >> >> root = Node(0) >> >>> >> >> >> cur = root >> >>> >> >> >> for x in xrange(1, num_elems): >> >>> >> >> >> e = Node(x) >> >>> >> >> >> cur.next = e >> >>> >> >> >> cur = e >> >>> >> >> >> print "end build %f" % (time.time() - start) >> >>> >> >> >> return root >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> num_timings = 100 >> >>> >> >> >> if __name__ == "__main__": >> >>> >> >> >> num_elems = int(sys.argv[1]) >> >>> >> >> >> build(num_elems) >> >>> >> >> >> total = 0 >> >>> >> >> >> timings = [0.0] * num_timings # run times for the last >> >>> >> >> >> num_timings >> >>> >> >> >> runs >> >>> >> >> >> i = 0 >> >>> >> >> >> beginning = time.time() >> >>> >> >> >> while time.time() - beginning < 600: >> >>> >> >> >> start = time.time() >> >>> >> >> >> elem = allElems[randint(0, num_elems - 1)] >> >>> >> >> >> assert(elem is not None) >> >>> >> >> >> >> >>> >> >> >> lst = follow(elem) >> >>> >> >> >> >> >>> >> >> >> total += choice(lst)[0] # use the return value for >> >>> >> >> >> something >> >>> >> >> >> >> >>> >> >> >> end = time.time() >> >>> >> >> >> >> >>> >> >> >> elapsed = end-start >> >>> >> >> >> timings[i % num_timings] = elapsed >> >>> >> >> >> if (i > num_timings): >> >>> >> >> >> slow_time = 2 * sum(timings)/num_timings # slow >> >>> >> >> >> defined >> >>> >> >> >> as >> >>> >> >> >> > >> >>> >> >> >> 2*avg run time >> >>> >> >> >> if (elapsed > slow_time): >> >>> >> >> >> print "that took a long time elapsed: %f >> >>> >> >> >> slow_threshold: >> >>> >> >> >> %f 90th_quantile_runtime: %f" % \ >> >>> >> >> >> (elapsed, slow_time, >> >>> >> >> >> sorted(timings)[int(num_timings*.9)]) >> >>> >> >> >> i += 1 >> >>> >> >> >> print total >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> On Thu, Mar 13, 2014 at 7:45 PM, Maciej Fijalkowski >> >>> >> >> >> <[email protected]> >> >>> >> >> >> wrote: >> >>> >> >> >>> >> >>> >> >> >>> On Thu, Mar 13, 2014 at 1:45 PM, Martin Koch <[email protected]> >> >>> >> >> >>> wrote: >> >>> >> >> >>> > Hi Armin, Maciej >> >>> >> >> >>> > >> >>> >> >> >>> > Thanks for responding. >> >>> >> >> >>> > >> >>> >> >> >>> > I'm in the process of trying to determine what (if any) of >> >>> >> >> >>> > the >> >>> >> >> >>> > code >> >>> >> >> >>> > I'm >> >>> >> >> >>> > in a >> >>> >> >> >>> > position to share, and I'll get back to you. >> >>> >> >> >>> > >> >>> >> >> >>> > Allowing hinting to the GC would be good. Even better >> >>> >> >> >>> > would be >> >>> >> >> >>> > a >> >>> >> >> >>> > means >> >>> >> >> >>> > to >> >>> >> >> >>> > allow me to (transparently) allocate objects in unmanaged >> >>> >> >> >>> > memory, >> >>> >> >> >>> > but I >> >>> >> >> >>> > would expect that to be a tall order :) >> >>> >> >> >>> > >> >>> >> >> >>> > Thanks, >> >>> >> >> >>> > /Martin >> >>> >> >> >>> >> >>> >> >> >>> Hi Martin. >> >>> >> >> >>> >> >>> >> >> >>> Note that in case you want us to do the work of isolating >> >>> >> >> >>> the >> >>> >> >> >>> problem, >> >>> >> >> >>> we do offer paid support to do that (then we can sign NDAs >> >>> >> >> >>> and >> >>> >> >> >>> stuff). >> >>> >> >> >>> Otherwise we would be more than happy to fix bugs once you >> >>> >> >> >>> isolate >> >>> >> >> >>> a >> >>> >> >> >>> part you can share freely :) >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> > >> >>> >> > >> >>> >> > >> >>> > >> >>> > >> >> >> >> > > _______________________________________________ pypy-dev mailing list [email protected] https://mail.python.org/mailman/listinfo/pypy-dev
