Ah. I had misunderstood. I'll get back to you on that :) thanks /Martin
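[For concreteness, here is a minimal sketch of the per-query max/total aggregation Maciej asks for below. It assumes the `[hex-timestamp] {section` / `[hex-timestamp] section}` line format shown further down the thread, and that the timestamps are CPU cycle counts at roughly 2.5 GHz; `section_stats` and the cycle rate are my assumptions, not a documented PyPy API.]

```python
import re
from collections import defaultdict

# assumed PYPYLOG line shapes: "[hex] {section" opens, "[hex] section}" closes
OPEN = re.compile(r"\[([0-9a-f]+)\] \{([\w-]+)$")
CLOSE = re.compile(r"\[([0-9a-f]+)\] ([\w-]+)\}$")

def section_stats(lines, cycles_per_sec=2.5e9):
    """Per-section (max_seconds, total_seconds) for one query's log lines."""
    opened = {}                      # section name -> timestamp of its open line
    durations = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        m = OPEN.match(line)
        if m:
            opened[m.group(2)] = int(m.group(1), 16)
            continue
        m = CLOSE.match(line)
        if m and m.group(2) in opened:
            cycles = int(m.group(1), 16) - opened.pop(m.group(2))
            durations[m.group(2)].append(cycles / cycles_per_sec)
    return {name: (max(ds), sum(ds)) for name, ds in durations.items()}

# the gc-minor excerpt quoted later in this thread
sample = [
    "[2b99f1981b527e] {gc-minor",
    "[2b99f1981ba680] {gc-minor-walkroots",
    "[2b99f1981c2e02] gc-minor-walkroots}",
    "[2b99f19890d750] gc-minor}",
]
stats = section_stats(sample)
```

Running this over the lines between one `start-->`/`<--stop` pair gives the max and total gc-minor time for that query.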
> On 17/03/2014, at 15.21, Maciej Fijalkowski <fij...@gmail.com> wrote:
>
> eh, this is not what I need
>
> I need a max of TIME it took for a gc-minor and the TOTAL time it took
> for a gc-minor (per query) (ideally same for gc-walkroots and
> gc-collect-step)
>
>> On Mon, Mar 17, 2014 at 4:19 PM, Martin Koch <m...@issuu.com> wrote:
>> Here are the collated results of running each query. For each run, I count
>> how many of each of the pypy debug lines I get. I.e. there were 668 runs
>> that printed 58 log lines containing "{gc-minor", each eventually followed
>> by "gc-minor}". I have also counted whether the query was slow;
>> interestingly, not all the queries with many gc-minors were slow (but all
>> slow queries had many gc-minors).
>>
>> Please let me know if this is unclear :)
>>
>> 668 gc-minor:58 gc-minor-walkroots:58
>> 10 gc-minor:58 gc-minor-walkroots:58 gc-collect-step:5
>> 140 gc-minor:59 gc-minor-walkroots:59
>> 1 gc-minor:8441 gc-minor-walkroots:8441 gc-collect-step:8403
>> 1 gc-minor:9300 gc-minor-walkroots:9300 gc-collect-step:9249
>> 9 gc-minor:9643 slow:1 gc-minor-walkroots:9643 gc-collect-step:9589
>> 1 gc-minor:9644 slow:1 gc-minor-walkroots:9644 gc-collect-step:9590
>> 10 gc-minor:9647 slow:1 gc-minor-walkroots:9647 gc-collect-step:9609
>> 1 gc-minor:9663 gc-minor-walkroots:9663 gc-collect-step:9614
>> 1 jit-backend-dump:5 gc-minor:58 gc-minor-walkroots:58
>> 1 jit-log-compiling-loop:1 gc-collect-step:8991 jit-backend-dump:78 jit-backend:3 jit-log-noopt-loop:6 jit-log-virtualstate:3 gc-minor:9030 jit-tracing:3 gc-minor-walkroots:9030 jit-optimize:6 jit-log-short-preamble:2 jit-backend-addr:3 jit-log-opt-loop:1 jit-mem-looptoken-alloc:3 jit-abort:3 jit-log-rewritten-bridge:2 jit-log-rewritten-loop:1 jit-log-opt-bridge:2 jit-log-compiling-bridge:2 jit-resume:84
>> 1 jit-log-compiling-loop:1 jit-backend-dump:13 jit-backend:1 jit-log-noopt-loop:2 gc-minor:60 jit-tracing:1 gc-minor-walkroots:60 jit-optimize:2 jit-log-short-preamble:1 jit-backend-addr:1 jit-log-opt-loop:1 jit-mem-looptoken-alloc:1 jit-log-rewritten-loop:1 jit-resume:14
>> 1 jit-log-compiling-loop:1 jit-backend-dump:73 jit-backend:3 jit-log-noopt-loop:6 jit-log-virtualstate:3 gc-minor:60 jit-tracing:3 gc-minor-walkroots:60 jit-optimize:6 jit-log-short-preamble:2 jit-backend-addr:3 jit-log-opt-loop:1 jit-mem-looptoken-alloc:3 jit-abort:3 jit-log-rewritten-bridge:2 jit-log-rewritten-loop:1 jit-log-opt-bridge:2 jit-log-compiling-bridge:2 jit-resume:84
>> 2 jit-log-compiling-loop:1 jit-backend-dump:78 jit-backend:3 jit-log-noopt-loop:6 jit-log-virtualstate:3 gc-minor:61 jit-tracing:3 gc-minor-walkroots:61 jit-optimize:6 jit-log-short-preamble:2 jit-backend-addr:3 jit-log-opt-loop:1 jit-mem-looptoken-alloc:3 jit-abort:3 jit-log-rewritten-bridge:2 jit-log-rewritten-loop:1 jit-log-opt-bridge:2 jit-log-compiling-bridge:2 jit-resume:84
>> 1 jit-log-short-preamble:2 jit-log-compiling-loop:2 jit-backend-dump:92 jit-log-noopt-loop:7 jit-log-virtualstate:3 gc-minor:61 jit-tracing:4 gc-minor-walkroots:61 jit-optimize:7 jit-backend:4 jit-backend-addr:4 jit-log-opt-loop:2 jit-mem-looptoken-alloc:4 jit-abort:3 jit-log-rewritten-bridge:2 jit-log-rewritten-loop:2 jit-log-opt-bridge:2 jit-log-compiling-bridge:2 jit-resume:104
>>
>> Thanks,
>> /Martin
>>
>> On Mon, Mar 17, 2014 at 2:23 PM, Maciej Fijalkowski <fij...@gmail.com> wrote:
>>>
>>> On Mon, Mar 17, 2014 at 3:20 PM, Maciej Fijalkowski <fij...@gmail.com> wrote:
>>>> are you *sure* it's the walkroots that take that long and not
>>>> something else (like gc-minor)? More of those mean that you allocate a
>>>> lot more surviving objects.
>>>> Can you do two things:
>>>>
>>>> a) take a max of gc-minor (and gc-minor-stackwalk), per request
>>>> b) take the sum of those
>>>>
>>>> and plot them
>>>
>>> ^^^ or just paste the results actually
>>>
>>>>> On Mon, Mar 17, 2014 at 3:18 PM, Martin Koch <m...@issuu.com> wrote:
>>>>> Well, then it works out to around 2.5GHz, which seems reasonable. But it
>>>>> doesn't alter the conclusion from the previous email: the slow queries
>>>>> all have a duration of around 34*10^9 units, 'normal' queries 1*10^9
>>>>> units, or 0.4 seconds at this conversion. Also, the log shows that a slow
>>>>> query performs many more gc-minor operations than a 'normal' one: 9600
>>>>> gc-collect-step/gc-minor/gc-minor-walkroots operations vs 58.
>>>>>
>>>>> So the question becomes: why do we get this large spike in
>>>>> gc-minor-walkroots, and, in particular, is there any way to avoid it :) ?
>>>>>
>>>>> Thanks,
>>>>> /Martin
>>>>>
>>>>> On Mon, Mar 17, 2014 at 1:53 PM, Maciej Fijalkowski <fij...@gmail.com> wrote:
>>>>>>
>>>>>> I think it's the cycles of your CPU
>>>>>>
>>>>>>> On Mon, Mar 17, 2014 at 2:48 PM, Martin Koch <m...@issuu.com> wrote:
>>>>>>> What is the unit? Perhaps I'm being thick here, but I can't correlate
>>>>>>> it with seconds (which the program does print out). Slow runs are
>>>>>>> around 13 seconds, but span around 34*10^9 (dec), or 0x800000000,
>>>>>>> timestamp units (e.g. from 0x2b9944ab8c4f49 to 0x2b994c9d31889c).
>>>>>>>
>>>>>>> On Mon, Mar 17, 2014 at 12:09 PM, Maciej Fijalkowski <fij...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> The number of lines is nonsense. This is a timestamp in hex.
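[Taking Maciej's hint that these are hex cycle counts, a quick sanity check against the numbers in this thread. This is a sketch: the ~2.5 GHz figure is the clock rate Martin inferred, not something PyPy guarantees, and `delta_seconds` is a name I made up.]

```python
CYCLES_PER_SEC = 2.5e9  # assumed: timestamps are CPU cycles at ~2.5 GHz

def delta_seconds(start_hex, end_hex, cycles_per_sec=CYCLES_PER_SEC):
    """Elapsed seconds between two hex PYPYLOG timestamps."""
    return (int(end_hex, 16) - int(start_hex, 16)) / cycles_per_sec

# the slow-run pair from this thread: ~0x800000000 units comes out
# around 13.6 s, consistent with the ~13 s slow runs the program reports
elapsed = delta_seconds("2b9944ab8c4f49", "2b994c9d31889c")
```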
>>>>>>>>
>>>>>>>> On Mon, Mar 17, 2014 at 12:46 PM, Martin Koch <m...@issuu.com> wrote:
>>>>>>>>> Based on Maciej's suggestion, I tried the following:
>>>>>>>>>
>>>>>>>>> PYPYLOG=- pypy mem.py 10000000 > out
>>>>>>>>>
>>>>>>>>> This generates a logfile which looks something like this:
>>>>>>>>>
>>>>>>>>> start-->
>>>>>>>>> [2b99f1981b527e] {gc-minor
>>>>>>>>> [2b99f1981ba680] {gc-minor-walkroots
>>>>>>>>> [2b99f1981c2e02] gc-minor-walkroots}
>>>>>>>>> [2b99f19890d750] gc-minor}
>>>>>>>>> [snip]
>>>>>>>>> ...
>>>>>>>>> <--stop
>>>>>>>>>
>>>>>>>>> It turns out that the culprit is a lot of MINOR collections.
>>>>>>>>>
>>>>>>>>> I base this on the following observations:
>>>>>>>>>
>>>>>>>>> - I can't understand the format of the timestamp on each log line
>>>>>>>>> (the "[2b99f1981b527e]"). From what I can see in the code, this
>>>>>>>>> should be output from time.clock(), but that doesn't return a number
>>>>>>>>> like that when I run pypy interactively.
>>>>>>>>> - Instead, I count the number of debug lines between start--> and
>>>>>>>>> the corresponding <--stop.
>>>>>>>>> - Most runs have a few hundred lines of output between start/stop.
>>>>>>>>> - All slow runs have very close to 57800 lines of output between
>>>>>>>>> start/stop.
>>>>>>>>> - One such sample does 9609 gc-collect-step operations, 9647
>>>>>>>>> gc-minor operations, and 9647 gc-minor-walkroots operations.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> /Martin
>>>>>>>>>
>>>>>>>>> On Mon, Mar 17, 2014 at 8:21 AM, Maciej Fijalkowski <fij...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> there is an environment variable PYPYLOG=gc:- (where - is stdout)
>>>>>>>>>> which will do that for you btw.
>>>>>>>>>>
>>>>>>>>>> maybe you can find out what that is using profiling or valgrind?
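[The per-run collation shown earlier in the thread can be reproduced from such a log with a few lines. A sketch, assuming exactly the `start-->`/`<--stop` markers and `[timestamp] {section` line shapes in the excerpt above; `collate` is my own helper name.]

```python
import re
from collections import Counter

# an opening debug line: "[hex-timestamp] {section-name"
SECTION_OPEN = re.compile(r"\[[0-9a-f]+\] \{([\w-]+)")

def collate(log_lines):
    """Count '{section' openings between each start--> / <--stop pair."""
    runs = []
    counts = None
    for raw in log_lines:
        line = raw.strip()
        if line == "start-->":
            counts = Counter()
        elif line == "<--stop":
            if counts is not None:
                runs.append(counts)
            counts = None
        elif counts is not None:
            m = SECTION_OPEN.match(line)
            if m:
                counts[m.group(1)] += 1
    return runs

sample_log = """\
start-->
[2b99f1981b527e] {gc-minor
[2b99f1981ba680] {gc-minor-walkroots
[2b99f1981c2e02] gc-minor-walkroots}
[2b99f19890d750] gc-minor}
<--stop
""".splitlines()
runs = collate(sample_log)
```

A `Counter` of the resulting per-run Counters (as tuples) then yields the "668 gc-minor:58 ..." style summary.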
>>>>>>>>>>
>>>>>>>>>> On Sun, Mar 16, 2014 at 11:34 PM, Martin Koch <m...@issuu.com> wrote:
>>>>>>>>>>> I have tried getting the pypy source and building my own version
>>>>>>>>>>> of pypy. I have modified
>>>>>>>>>>> rpython/memory/gc/incminimark.py:major_collection_step() to print
>>>>>>>>>>> out when it starts and when it stops. Apparently, the slow queries
>>>>>>>>>>> do NOT occur during major_collection_step; at least, I have not
>>>>>>>>>>> observed major step output during a query execution. So,
>>>>>>>>>>> apparently, something else is blocking. This could be another
>>>>>>>>>>> aspect of the GC, but it could also be anything else.
>>>>>>>>>>>
>>>>>>>>>>> Just to be sure, I have tried running the same application in
>>>>>>>>>>> CPython with garbage collection disabled. I don't see the problem
>>>>>>>>>>> there, so it is somehow related to either the GC or the PyPy
>>>>>>>>>>> runtime.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> /Martin
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 14, 2014 at 4:19 PM, Martin Koch <m...@issuu.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> We have hacked up a small sample that seems to exhibit the same
>>>>>>>>>>>> issue.
>>>>>>>>>>>>
>>>>>>>>>>>> We basically generate a linked list of objects. To increase
>>>>>>>>>>>> connectedness, elements in the list hold references (dummy_links)
>>>>>>>>>>>> to 10 randomly chosen previous elements in the list.
>>>>>>>>>>>>
>>>>>>>>>>>> We then time a function that traverses 50000 elements of the list
>>>>>>>>>>>> from a random start point. If the traversal reaches the end of
>>>>>>>>>>>> the list, we instead traverse one of the dummy links. Thus,
>>>>>>>>>>>> exactly 50K elements are traversed every time. To generate some
>>>>>>>>>>>> garbage, we build a list holding the traversed elements and a
>>>>>>>>>>>> dummy list of characters.
>>>>>>>>>>>>
>>>>>>>>>>>> Timings for the last 100 runs are stored in a circular buffer. If
>>>>>>>>>>>> the elapsed time for the last run is more than twice the average
>>>>>>>>>>>> time, we print out a line with the elapsed time, the threshold,
>>>>>>>>>>>> and the 90th-percentile runtime. (We would like to see that the
>>>>>>>>>>>> mean runtime does not increase with the number of elements in the
>>>>>>>>>>>> list, but that the max time does increase, linearly with the
>>>>>>>>>>>> number of objects, I guess; traversing 50K elements should be
>>>>>>>>>>>> independent of the memory size.)
>>>>>>>>>>>>
>>>>>>>>>>>> We have tried monitoring memory consumption by external
>>>>>>>>>>>> inspection, but cannot consistently verify that memory is
>>>>>>>>>>>> deallocated at the same time that we see slow requests. Perhaps
>>>>>>>>>>>> the pypy runtime doesn't always return freed pages back to the OS?
>>>>>>>>>>>>
>>>>>>>>>>>> Using top, we observe that 10M elements allocate around 17GB
>>>>>>>>>>>> after building, 20M elements 26GB, and 30M elements 28GB (growing
>>>>>>>>>>>> to 35GB shortly after building).
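[One way to make that external memory inspection systematic is to poll the process RSS while the benchmark runs, so RSS drops can be lined up with slow requests. A minimal Linux-only sketch; it reads /proc, and the helper name and sampling approach are mine.]

```python
import os

def rss_kb(pid):
    """Resident set size of a process in kB, read from /proc (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # line looks like: "VmRSS:   123456 kB"
                return int(line.split()[1])
    return None

# sample our own process; in the experiment you would poll the benchmark's
# pid in a loop, timestamping each sample to correlate with slow requests
current = rss_kb(os.getpid())
```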
>>>>>>>>>>>>
>>>>>>>>>>>> Here is output from a few runs with different numbers of elements:
>>>>>>>>>>>>
>>>>>>>>>>>> pypy mem.py 10000000
>>>>>>>>>>>> start build
>>>>>>>>>>>> end build 84.142424
>>>>>>>>>>>> that took a long time elapsed: 13.230586 slow_threshold: 1.495401 90th_quantile_runtime: 0.421558
>>>>>>>>>>>> that took a long time elapsed: 13.016531 slow_threshold: 1.488160 90th_quantile_runtime: 0.423441
>>>>>>>>>>>> that took a long time elapsed: 13.032537 slow_threshold: 1.474563 90th_quantile_runtime: 0.419817
>>>>>>>>>>>>
>>>>>>>>>>>> pypy mem.py 20000000
>>>>>>>>>>>> start build
>>>>>>>>>>>> end build 180.823105
>>>>>>>>>>>> that took a long time elapsed: 27.346064 slow_threshold: 2.295146 90th_quantile_runtime: 0.434726
>>>>>>>>>>>> that took a long time elapsed: 26.028852 slow_threshold: 2.283927 90th_quantile_runtime: 0.374190
>>>>>>>>>>>> that took a long time elapsed: 25.432279 slow_threshold: 2.279631 90th_quantile_runtime: 0.371502
>>>>>>>>>>>>
>>>>>>>>>>>> pypy mem.py 30000000
>>>>>>>>>>>> start build
>>>>>>>>>>>> end build 276.217811
>>>>>>>>>>>> that took a long time elapsed: 40.993855 slow_threshold: 3.188464 90th_quantile_runtime: 0.459891
>>>>>>>>>>>> that took a long time elapsed: 41.693553 slow_threshold: 3.183003 90th_quantile_runtime: 0.393654
>>>>>>>>>>>> that took a long time elapsed: 39.679769 slow_threshold: 3.190782 90th_quantile_runtime: 0.393677
>>>>>>>>>>>> that took a long time elapsed: 43.573411 slow_threshold: 3.239637 90th_quantile_runtime: 0.393654
>>>>>>>>>>>>
>>>>>>>>>>>> Code below
>>>>>>>>>>>>
>>>>>>>>>>>> --------------------------------------------------------------
>>>>>>>>>>>> import time
>>>>>>>>>>>> from random import randint, choice
>>>>>>>>>>>> import sys
>>>>>>>>>>>>
>>>>>>>>>>>> allElems = {}
>>>>>>>>>>>>
>>>>>>>>>>>> class Node:
>>>>>>>>>>>>     def __init__(self, v_):
>>>>>>>>>>>>         self.v = v_
>>>>>>>>>>>>         self.next = None
>>>>>>>>>>>>         self.dummy_data = [randint(0, 100)
>>>>>>>>>>>>                            for _ in xrange(randint(50, 100))]
>>>>>>>>>>>>         allElems[self.v] = self
>>>>>>>>>>>>         if self.v > 0:
>>>>>>>>>>>>             self.dummy_links = [allElems[randint(0, self.v - 1)]
>>>>>>>>>>>>                                 for _ in xrange(10)]
>>>>>>>>>>>>         else:
>>>>>>>>>>>>             self.dummy_links = [self]
>>>>>>>>>>>>
>>>>>>>>>>>>     def set_next(self, l):
>>>>>>>>>>>>         self.next = l
>>>>>>>>>>>>
>>>>>>>>>>>> def follow(node):
>>>>>>>>>>>>     acc = []
>>>>>>>>>>>>     count = 0
>>>>>>>>>>>>     cur = node
>>>>>>>>>>>>     assert node.v is not None
>>>>>>>>>>>>     assert cur is not None
>>>>>>>>>>>>     while count < 50000:
>>>>>>>>>>>>         # return a value; generate some garbage
>>>>>>>>>>>>         acc.append((cur.v, [choice("abcdefghijklmnopqrstuvwxyz")
>>>>>>>>>>>>                             for x in xrange(100)]))
>>>>>>>>>>>>         # if we have reached the end, choose a random link
>>>>>>>>>>>>         cur = choice(cur.dummy_links) if cur.next is None else cur.next
>>>>>>>>>>>>         count += 1
>>>>>>>>>>>>     return acc
>>>>>>>>>>>>
>>>>>>>>>>>> def build(num_elems):
>>>>>>>>>>>>     start = time.time()
>>>>>>>>>>>>     print "start build"
>>>>>>>>>>>>     root = Node(0)
>>>>>>>>>>>>     cur = root
>>>>>>>>>>>>     for x in xrange(1, num_elems):
>>>>>>>>>>>>         e = Node(x)
>>>>>>>>>>>>         cur.next = e
>>>>>>>>>>>>         cur = e
>>>>>>>>>>>>     print "end build %f" % (time.time() - start)
>>>>>>>>>>>>     return root
>>>>>>>>>>>>
>>>>>>>>>>>> num_timings = 100
>>>>>>>>>>>> if __name__ == "__main__":
>>>>>>>>>>>>     num_elems = int(sys.argv[1])
>>>>>>>>>>>>     build(num_elems)
>>>>>>>>>>>>     total = 0
>>>>>>>>>>>>     timings = [0.0] * num_timings  # run times for the last num_timings runs
>>>>>>>>>>>>     i = 0
>>>>>>>>>>>>     beginning = time.time()
>>>>>>>>>>>>     while time.time() - beginning < 600:
>>>>>>>>>>>>         start = time.time()
>>>>>>>>>>>>         elem = allElems[randint(0, num_elems - 1)]
>>>>>>>>>>>>         assert elem is not None
>>>>>>>>>>>>         lst = follow(elem)
>>>>>>>>>>>>         total += choice(lst)[0]  # use the return value for something
>>>>>>>>>>>>         end = time.time()
>>>>>>>>>>>>         elapsed = end - start
>>>>>>>>>>>>         timings[i % num_timings] = elapsed
>>>>>>>>>>>>         if i > num_timings:
>>>>>>>>>>>>             slow_time = 2 * sum(timings) / num_timings  # slow defined as > 2*avg run time
>>>>>>>>>>>>             if elapsed > slow_time:
>>>>>>>>>>>>                 print "that took a long time elapsed: %f slow_threshold: %f 90th_quantile_runtime: %f" % \
>>>>>>>>>>>>                     (elapsed, slow_time, sorted(timings)[int(num_timings * .9)])
>>>>>>>>>>>>         i += 1
>>>>>>>>>>>>     print total
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Mar 13, 2014 at 7:45 PM, Maciej Fijalkowski <fij...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Mar 13, 2014 at 1:45 PM, Martin Koch <m...@issuu.com> wrote:
>>>>>>>>>>>>>> Hi Armin, Maciej
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for responding.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm in the process of trying to determine what (if any) of the
>>>>>>>>>>>>>> code I'm in a position to share, and I'll get back to you.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Allowing hinting to the GC would be good.
>>>>>>>>>>>>>> Even better would be a means to allow me to (transparently)
>>>>>>>>>>>>>> allocate objects in unmanaged memory, but I would expect that
>>>>>>>>>>>>>> to be a tall order :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> /Martin
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Martin.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Note that in case you want us to do the work of isolating the
>>>>>>>>>>>>> problem, we do offer paid support to do that (then we can sign
>>>>>>>>>>>>> NDAs and stuff). Otherwise we would be more than happy to fix
>>>>>>>>>>>>> bugs once you isolate a part you can share freely :)

_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev