On Mon, Mar 17, 2014 at 3:20 PM, Maciej Fijalkowski <[email protected]> wrote: > are you *sure* it's the walkroots that take that long and not > something else (like gc-minor)? More of those mean that you allocate a > lot more surviving objects. Can you do two things: > > a) take a max of gc-minor (and gc-minor-stackwalk), per request > b) take the sum of those > > and plot them
^^^ or just paste the results actually > > On Mon, Mar 17, 2014 at 3:18 PM, Martin Koch <[email protected]> wrote: >> Well, then it works out to around 2.5GHz, which seems reasonable. But it >> doesn't alter the conclusion from the previous email: The slow queries then >> all have a duration around 34*10^9 units, 'normal' queries 1*10^9 units, or >> .4 seconds at this conversion. Also, the log shows that a slow query >> performs many more gc-minor operations than a 'normal' one: 9600 >> gc-collect-step/gc-minor/gc-minor-walkroots operations vs 58. >> >> So the question becomes: Why do we get this large spike in >> gc-minor-walkroots, and, in particular, is there any way to avoid it :) ? >> >> Thanks, >> /Martin >> >> >> On Mon, Mar 17, 2014 at 1:53 PM, Maciej Fijalkowski <[email protected]> >> wrote: >>> >>> I think it's the cycles of your CPU >>> >>> On Mon, Mar 17, 2014 at 2:48 PM, Martin Koch <[email protected]> wrote: >>> > What is the unit? Perhaps I'm being thick here, but I can't correlate it >>> > with seconds (which the program does print out). Slow runs are around 13 >>> > seconds, but are around 34*10^9(dec), 0x800000000 timestamp units (e.g. >>> > from >>> > 0x2b994c9d31889c to 0x2b9944ab8c4f49). >>> > >>> > >>> > >>> > On Mon, Mar 17, 2014 at 12:09 PM, Maciej Fijalkowski <[email protected]> >>> > wrote: >>> >> >>> >> The number of lines is nonsense. This is a timestamp in hex. >>> >> >>> >> On Mon, Mar 17, 2014 at 12:46 PM, Martin Koch <[email protected]> wrote: >>> >> > Based On Maciej's suggestion, I tried the following >>> >> > >>> >> > PYPYLOG=- pypy mem.py 10000000 > out >>> >> > >>> >> > This generates a logfile which looks something like this >>> >> > >>> >> > start--> >>> >> > [2b99f1981b527e] {gc-minor >>> >> > [2b99f1981ba680] {gc-minor-walkroots >>> >> > [2b99f1981c2e02] gc-minor-walkroots} >>> >> > [2b99f19890d750] gc-minor} >>> >> > [snip] >>> >> > ... >>> >> > <--stop >>> >> > >>> >> > >>> >> > It turns out that the culprit is a lot of MINOR collections. >>> >> > >>> >> > I base this on the following observations: >>> >> > >>> >> > I can't understand the format of the timestamp on each logline (the >>> >> > "[2b99f1981b527e]"). From what I can see in the code, this should be >>> >> > output >>> >> > from time.clock(), but that doesn't return a number like that when I >>> >> > run >>> >> > pypy interactively >>> >> > Instead, I count the number of debug lines between start--> and the >>> >> > corresponding <--stop. >>> >> > Most runs have a few hundred lines of output between start/stop >>> >> > All slow runs have very close to 57800 lines out output between >>> >> > start/stop >>> >> > One such sample does 9609 gc-collect-step operations, 9647 gc-minor >>> >> > operations, and 9647 gc-minor-walkroots operations. >>> >> > >>> >> > >>> >> > Thanks, >>> >> > /Martin >>> >> > >>> >> > >>> >> > On Mon, Mar 17, 2014 at 8:21 AM, Maciej Fijalkowski >>> >> > <[email protected]> >>> >> > wrote: >>> >> >> >>> >> >> there is an environment variable PYPYLOG=gc:- (where - is stdout) >>> >> >> which will do that for you btw. >>> >> >> >>> >> >> maybe you can find out what's that using profiling or valgrind? >>> >> >> >>> >> >> On Sun, Mar 16, 2014 at 11:34 PM, Martin Koch <[email protected]> wrote: >>> >> >> > I have tried getting the pypy source and building my own version >>> >> >> > of >>> >> >> > pypy. I >>> >> >> > have modified >>> >> >> > rpython/memory/gc/incminimark.py:major_collection_step() >>> >> >> > to >>> >> >> > print out when it starts and when it stops. Apparently, the slow >>> >> >> > queries >>> >> >> > do >>> >> >> > NOT occur during major_collection_step; at least, I have not >>> >> >> > observed >>> >> >> > major >>> >> >> > step output during a query execution. So, apparently, something >>> >> >> > else >>> >> >> > is >>> >> >> > blocking. This could be another aspect of the GC, but it could >>> >> >> > also >>> >> >> > be >>> >> >> > anything else. >>> >> >> > >>> >> >> > Just to be sure, I have tried running the same application in >>> >> >> > python >>> >> >> > with >>> >> >> > garbage collection disabled. I don't see the problem there, so it >>> >> >> > is >>> >> >> > somehow >>> >> >> > related to either GC or the runtime somehow. >>> >> >> > >>> >> >> > Cheers, >>> >> >> > /Martin >>> >> >> > >>> >> >> > >>> >> >> > On Fri, Mar 14, 2014 at 4:19 PM, Martin Koch <[email protected]> >>> >> >> > wrote: >>> >> >> >> >>> >> >> >> We have hacked up a small sample that seems to exhibit the same >>> >> >> >> issue. >>> >> >> >> >>> >> >> >> We basically generate a linked list of objects. To increase >>> >> >> >> connectedness, >>> >> >> >> elements in the list hold references (dummy_links) to 10 randomly >>> >> >> >> chosen >>> >> >> >> previous elements in the list. >>> >> >> >> >>> >> >> >> We then time a function that traverses 50000 elements from the >>> >> >> >> list >>> >> >> >> from a >>> >> >> >> random start point. If the traversal reaches the end of the list, >>> >> >> >> we >>> >> >> >> instead >>> >> >> >> traverse one of the dummy links. Thus, exactly 50K elements are >>> >> >> >> traversed >>> >> >> >> every time. To generate some garbage, we build a list holding the >>> >> >> >> traversed >>> >> >> >> elements and a dummy list of characters. >>> >> >> >> >>> >> >> >> Timings for the last 100 runs are stored in a circular buffer. If >>> >> >> >> the >>> >> >> >> elapsed time for the last run is more than twice the average >>> >> >> >> time, >>> >> >> >> we >>> >> >> >> print >>> >> >> >> out a line with the elapsed time, the threshold, and the 90% >>> >> >> >> runtime >>> >> >> >> (we >>> >> >> >> would like to see that the mean runtime does not increase with >>> >> >> >> the >>> >> >> >> number of >>> >> >> >> elements in the list, but that the max time does increase >>> >> >> >> (linearly >>> >> >> >> with the >>> >> >> >> number of object, i guess); traversing 50K elements should be >>> >> >> >> independent of >>> >> >> >> the memory size). >>> >> >> >> >>> >> >> >> We have tried monitoring memory consumption by external >>> >> >> >> inspection, >>> >> >> >> but >>> >> >> >> cannot consistently verify that memory is deallocated at the same >>> >> >> >> time >>> >> >> >> that >>> >> >> >> we see slow requests. Perhaps the pypy runtime doesn't always >>> >> >> >> return >>> >> >> >> freed >>> >> >> >> pages back to the OS? >>> >> >> >> >>> >> >> >> Using top, we observe that 10M elements allocates around 17GB >>> >> >> >> after >>> >> >> >> building, 20M elements 26GB, 30M elements 28GB (and grows to 35GB >>> >> >> >> shortly >>> >> >> >> after building). >>> >> >> >> >>> >> >> >> Here is output from a few runs with different number of elements: >>> >> >> >> >>> >> >> >> >>> >> >> >> pypy mem.py 10000000 >>> >> >> >> start build >>> >> >> >> end build 84.142424 >>> >> >> >> that took a long time elapsed: 13.230586 slow_threshold: >>> >> >> >> 1.495401 >>> >> >> >> 90th_quantile_runtime: 0.421558 >>> >> >> >> that took a long time elapsed: 13.016531 slow_threshold: >>> >> >> >> 1.488160 >>> >> >> >> 90th_quantile_runtime: 0.423441 >>> >> >> >> that took a long time elapsed: 13.032537 slow_threshold: >>> >> >> >> 1.474563 >>> >> >> >> 90th_quantile_runtime: 0.419817 >>> >> >> >> >>> >> >> >> pypy mem.py 20000000 >>> >> >> >> start build >>> >> >> >> end build 180.823105 >>> >> >> >> that took a long time elapsed: 27.346064 slow_threshold: >>> >> >> >> 2.295146 >>> >> >> >> 90th_quantile_runtime: 0.434726 >>> >> >> >> that took a long time elapsed: 26.028852 slow_threshold: >>> >> >> >> 2.283927 >>> >> >> >> 90th_quantile_runtime: 0.374190 >>> >> >> >> that took a long time elapsed: 25.432279 slow_threshold: >>> >> >> >> 2.279631 >>> >> >> >> 90th_quantile_runtime: 0.371502 >>> >> >> >> >>> >> >> >> pypy mem.py 30000000 >>> >> >> >> start build >>> >> >> >> end build 276.217811 >>> >> >> >> that took a long time elapsed: 40.993855 slow_threshold: >>> >> >> >> 3.188464 >>> >> >> >> 90th_quantile_runtime: 0.459891 >>> >> >> >> that took a long time elapsed: 41.693553 slow_threshold: >>> >> >> >> 3.183003 >>> >> >> >> 90th_quantile_runtime: 0.393654 >>> >> >> >> that took a long time elapsed: 39.679769 slow_threshold: >>> >> >> >> 3.190782 >>> >> >> >> 90th_quantile_runtime: 0.393677 >>> >> >> >> that took a long time elapsed: 43.573411 slow_threshold: >>> >> >> >> 3.239637 >>> >> >> >> 90th_quantile_runtime: 0.393654 >>> >> >> >> >>> >> >> >> Code below >>> >> >> >> -------------------------------------------------------------- >>> >> >> >> import time >>> >> >> >> from random import randint, choice >>> >> >> >> import sys >>> >> >> >> >>> >> >> >> >>> >> >> >> allElems = {} >>> >> >> >> >>> >> >> >> class Node: >>> >> >> >> def __init__(self, v_): >>> >> >> >> self.v = v_ >>> >> >> >> self.next = None >>> >> >> >> self.dummy_data = [randint(0,100) >>> >> >> >> for _ in xrange(randint(50,100))] >>> >> >> >> allElems[self.v] = self >>> >> >> >> if self.v > 0: >>> >> >> >> self.dummy_links = [allElems[randint(0, self.v-1)] >>> >> >> >> for _ >>> >> >> >> in >>> >> >> >> xrange(10)] >>> >> >> >> else: >>> >> >> >> self.dummy_links = [self] >>> >> >> >> >>> >> >> >> def set_next(self, l): >>> >> >> >> self.next = l >>> >> >> >> >>> >> >> >> >>> >> >> >> def follow(node): >>> >> >> >> acc = [] >>> >> >> >> count = 0 >>> >> >> >> cur = node >>> >> >> >> assert node.v is not None >>> >> >> >> assert cur is not None >>> >> >> >> while count < 50000: >>> >> >> >> # return a value; generate some garbage >>> >> >> >> acc.append((cur.v, [choice("abcdefghijklmnopqrstuvwxyz") >>> >> >> >> for >>> >> >> >> x >>> >> >> >> in >>> >> >> >> xrange(100)])) >>> >> >> >> >>> >> >> >> # if we have reached the end, chose a random link >>> >> >> >> cur = choice(cur.dummy_links) if cur.next is None else >>> >> >> >> cur.next >>> >> >> >> count += 1 >>> >> >> >> >>> >> >> >> return acc >>> >> >> >> >>> >> >> >> >>> >> >> >> def build(num_elems): >>> >> >> >> start = time.time() >>> >> >> >> print "start build" >>> >> >> >> root = Node(0) >>> >> >> >> cur = root >>> >> >> >> for x in xrange(1, num_elems): >>> >> >> >> e = Node(x) >>> >> >> >> cur.next = e >>> >> >> >> cur = e >>> >> >> >> print "end build %f" % (time.time() - start) >>> >> >> >> return root >>> >> >> >> >>> >> >> >> >>> >> >> >> num_timings = 100 >>> >> >> >> if __name__ == "__main__": >>> >> >> >> num_elems = int(sys.argv[1]) >>> >> >> >> build(num_elems) >>> >> >> >> total = 0 >>> >> >> >> timings = [0.0] * num_timings # run times for the last >>> >> >> >> num_timings >>> >> >> >> runs >>> >> >> >> i = 0 >>> >> >> >> beginning = time.time() >>> >> >> >> while time.time() - beginning < 600: >>> >> >> >> start = time.time() >>> >> >> >> elem = allElems[randint(0, num_elems - 1)] >>> >> >> >> assert(elem is not None) >>> >> >> >> >>> >> >> >> lst = follow(elem) >>> >> >> >> >>> >> >> >> total += choice(lst)[0] # use the return value for >>> >> >> >> something >>> >> >> >> >>> >> >> >> end = time.time() >>> >> >> >> >>> >> >> >> elapsed = end-start >>> >> >> >> timings[i % num_timings] = elapsed >>> >> >> >> if (i > num_timings): >>> >> >> >> slow_time = 2 * sum(timings)/num_timings # slow >>> >> >> >> defined >>> >> >> >> as >>> >> >> >> > >>> >> >> >> 2*avg run time >>> >> >> >> if (elapsed > slow_time): >>> >> >> >> print "that took a long time elapsed: %f >>> >> >> >> slow_threshold: >>> >> >> >> %f 90th_quantile_runtime: %f" % \ >>> >> >> >> (elapsed, slow_time, >>> >> >> >> sorted(timings)[int(num_timings*.9)]) >>> >> >> >> i += 1 >>> >> >> >> print total >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> On Thu, Mar 13, 2014 at 7:45 PM, Maciej Fijalkowski >>> >> >> >> <[email protected]> >>> >> >> >> wrote: >>> >> >> >>> >>> >> >> >>> On Thu, Mar 13, 2014 at 1:45 PM, Martin Koch <[email protected]> >>> >> >> >>> wrote: >>> >> >> >>> > Hi Armin, Maciej >>> >> >> >>> > >>> >> >> >>> > Thanks for responding. >>> >> >> >>> > >>> >> >> >>> > I'm in the process of trying to determine what (if any) of the >>> >> >> >>> > code >>> >> >> >>> > I'm >>> >> >> >>> > in a >>> >> >> >>> > position to share, and I'll get back to you. >>> >> >> >>> > >>> >> >> >>> > Allowing hinting to the GC would be good. Even better would be >>> >> >> >>> > a >>> >> >> >>> > means >>> >> >> >>> > to >>> >> >> >>> > allow me to (transparently) allocate objects in unmanaged >>> >> >> >>> > memory, >>> >> >> >>> > but I >>> >> >> >>> > would expect that to be a tall order :) >>> >> >> >>> > >>> >> >> >>> > Thanks, >>> >> >> >>> > /Martin >>> >> >> >>> >>> >> >> >>> Hi Martin. >>> >> >> >>> >>> >> >> >>> Note that in case you want us to do the work of isolating the >>> >> >> >>> problem, >>> >> >> >>> we do offer paid support to do that (then we can sign NDAs and >>> >> >> >>> stuff). >>> >> >> >>> Otherwise we would be more than happy to fix bugs once you >>> >> >> >>> isolate >>> >> >> >>> a >>> >> >> >>> part you can share freely :) >>> >> >> >> >>> >> >> >> >>> >> >> > >>> >> > >>> >> > >>> > >>> > >> >> _______________________________________________ pypy-dev mailing list [email protected] https://mail.python.org/mailman/listinfo/pypy-dev
