What is the unit? Perhaps I'm being thick here, but I can't correlate it with seconds (which the program does print out). Slow runs are around 13 seconds, but are around 34*10^9(dec), 0x800000000 timestamp units (e.g. from 0x2b994c9d31889c to 0x2b9944ab8c4f49).
On Mon, Mar 17, 2014 at 12:09 PM, Maciej Fijalkowski <[email protected]>wrote: > The number of lines is nonsense. This is a timestamp in hex. > > On Mon, Mar 17, 2014 at 12:46 PM, Martin Koch <[email protected]> wrote: > > Based On Maciej's suggestion, I tried the following > > > > PYPYLOG=- pypy mem.py 10000000 > out > > > > This generates a logfile which looks something like this > > > > start--> > > [2b99f1981b527e] {gc-minor > > [2b99f1981ba680] {gc-minor-walkroots > > [2b99f1981c2e02] gc-minor-walkroots} > > [2b99f19890d750] gc-minor} > > [snip] > > ... > > <--stop > > > > > > It turns out that the culprit is a lot of MINOR collections. > > > > I base this on the following observations: > > > > I can't understand the format of the timestamp on each logline (the > > "[2b99f1981b527e]"). From what I can see in the code, this should be > output > > from time.clock(), but that doesn't return a number like that when I run > > pypy interactively > > Instead, I count the number of debug lines between start--> and the > > corresponding <--stop. > > Most runs have a few hundred lines of output between start/stop > > All slow runs have very close to 57800 lines out output between > start/stop > > One such sample does 9609 gc-collect-step operations, 9647 gc-minor > > operations, and 9647 gc-minor-walkroots operations. > > > > > > Thanks, > > /Martin > > > > > > On Mon, Mar 17, 2014 at 8:21 AM, Maciej Fijalkowski <[email protected]> > > wrote: > >> > >> there is an environment variable PYPYLOG=gc:- (where - is stdout) > >> which will do that for you btw. > >> > >> maybe you can find out what's that using profiling or valgrind? > >> > >> On Sun, Mar 16, 2014 at 11:34 PM, Martin Koch <[email protected]> wrote: > >> > I have tried getting the pypy source and building my own version of > >> > pypy. I > >> > have modified rpython/memory/gc/incminimark.py:major_collection_step() > >> > to > >> > print out when it starts and when it stops. Apparently, the slow > queries > >> > do > >> > NOT occur during major_collection_step; at least, I have not observed > >> > major > >> > step output during a query execution. So, apparently, something else > is > >> > blocking. This could be another aspect of the GC, but it could also be > >> > anything else. > >> > > >> > Just to be sure, I have tried running the same application in python > >> > with > >> > garbage collection disabled. I don't see the problem there, so it is > >> > somehow > >> > related to either GC or the runtime somehow. > >> > > >> > Cheers, > >> > /Martin > >> > > >> > > >> > On Fri, Mar 14, 2014 at 4:19 PM, Martin Koch <[email protected]> wrote: > >> >> > >> >> We have hacked up a small sample that seems to exhibit the same > issue. > >> >> > >> >> We basically generate a linked list of objects. To increase > >> >> connectedness, > >> >> elements in the list hold references (dummy_links) to 10 randomly > >> >> chosen > >> >> previous elements in the list. > >> >> > >> >> We then time a function that traverses 50000 elements from the list > >> >> from a > >> >> random start point. If the traversal reaches the end of the list, we > >> >> instead > >> >> traverse one of the dummy links. Thus, exactly 50K elements are > >> >> traversed > >> >> every time. To generate some garbage, we build a list holding the > >> >> traversed > >> >> elements and a dummy list of characters. > >> >> > >> >> Timings for the last 100 runs are stored in a circular buffer. If the > >> >> elapsed time for the last run is more than twice the average time, we > >> >> print > >> >> out a line with the elapsed time, the threshold, and the 90% runtime > >> >> (we > >> >> would like to see that the mean runtime does not increase with the > >> >> number of > >> >> elements in the list, but that the max time does increase (linearly > >> >> with the > >> >> number of object, i guess); traversing 50K elements should be > >> >> independent of > >> >> the memory size). > >> >> > >> >> We have tried monitoring memory consumption by external inspection, > but > >> >> cannot consistently verify that memory is deallocated at the same > time > >> >> that > >> >> we see slow requests. Perhaps the pypy runtime doesn't always return > >> >> freed > >> >> pages back to the OS? > >> >> > >> >> Using top, we observe that 10M elements allocates around 17GB after > >> >> building, 20M elements 26GB, 30M elements 28GB (and grows to 35GB > >> >> shortly > >> >> after building). > >> >> > >> >> Here is output from a few runs with different number of elements: > >> >> > >> >> > >> >> pypy mem.py 10000000 > >> >> start build > >> >> end build 84.142424 > >> >> that took a long time elapsed: 13.230586 slow_threshold: 1.495401 > >> >> 90th_quantile_runtime: 0.421558 > >> >> that took a long time elapsed: 13.016531 slow_threshold: 1.488160 > >> >> 90th_quantile_runtime: 0.423441 > >> >> that took a long time elapsed: 13.032537 slow_threshold: 1.474563 > >> >> 90th_quantile_runtime: 0.419817 > >> >> > >> >> pypy mem.py 20000000 > >> >> start build > >> >> end build 180.823105 > >> >> that took a long time elapsed: 27.346064 slow_threshold: 2.295146 > >> >> 90th_quantile_runtime: 0.434726 > >> >> that took a long time elapsed: 26.028852 slow_threshold: 2.283927 > >> >> 90th_quantile_runtime: 0.374190 > >> >> that took a long time elapsed: 25.432279 slow_threshold: 2.279631 > >> >> 90th_quantile_runtime: 0.371502 > >> >> > >> >> pypy mem.py 30000000 > >> >> start build > >> >> end build 276.217811 > >> >> that took a long time elapsed: 40.993855 slow_threshold: 3.188464 > >> >> 90th_quantile_runtime: 0.459891 > >> >> that took a long time elapsed: 41.693553 slow_threshold: 3.183003 > >> >> 90th_quantile_runtime: 0.393654 > >> >> that took a long time elapsed: 39.679769 slow_threshold: 3.190782 > >> >> 90th_quantile_runtime: 0.393677 > >> >> that took a long time elapsed: 43.573411 slow_threshold: 3.239637 > >> >> 90th_quantile_runtime: 0.393654 > >> >> > >> >> Code below > >> >> -------------------------------------------------------------- > >> >> import time > >> >> from random import randint, choice > >> >> import sys > >> >> > >> >> > >> >> allElems = {} > >> >> > >> >> class Node: > >> >> def __init__(self, v_): > >> >> self.v = v_ > >> >> self.next = None > >> >> self.dummy_data = [randint(0,100) > >> >> for _ in xrange(randint(50,100))] > >> >> allElems[self.v] = self > >> >> if self.v > 0: > >> >> self.dummy_links = [allElems[randint(0, self.v-1)] for _ > in > >> >> xrange(10)] > >> >> else: > >> >> self.dummy_links = [self] > >> >> > >> >> def set_next(self, l): > >> >> self.next = l > >> >> > >> >> > >> >> def follow(node): > >> >> acc = [] > >> >> count = 0 > >> >> cur = node > >> >> assert node.v is not None > >> >> assert cur is not None > >> >> while count < 50000: > >> >> # return a value; generate some garbage > >> >> acc.append((cur.v, [choice("abcdefghijklmnopqrstuvwxyz") for > x > >> >> in > >> >> xrange(100)])) > >> >> > >> >> # if we have reached the end, chose a random link > >> >> cur = choice(cur.dummy_links) if cur.next is None else > cur.next > >> >> count += 1 > >> >> > >> >> return acc > >> >> > >> >> > >> >> def build(num_elems): > >> >> start = time.time() > >> >> print "start build" > >> >> root = Node(0) > >> >> cur = root > >> >> for x in xrange(1, num_elems): > >> >> e = Node(x) > >> >> cur.next = e > >> >> cur = e > >> >> print "end build %f" % (time.time() - start) > >> >> return root > >> >> > >> >> > >> >> num_timings = 100 > >> >> if __name__ == "__main__": > >> >> num_elems = int(sys.argv[1]) > >> >> build(num_elems) > >> >> total = 0 > >> >> timings = [0.0] * num_timings # run times for the last > num_timings > >> >> runs > >> >> i = 0 > >> >> beginning = time.time() > >> >> while time.time() - beginning < 600: > >> >> start = time.time() > >> >> elem = allElems[randint(0, num_elems - 1)] > >> >> assert(elem is not None) > >> >> > >> >> lst = follow(elem) > >> >> > >> >> total += choice(lst)[0] # use the return value for something > >> >> > >> >> end = time.time() > >> >> > >> >> elapsed = end-start > >> >> timings[i % num_timings] = elapsed > >> >> if (i > num_timings): > >> >> slow_time = 2 * sum(timings)/num_timings # slow defined > as > >> >> > > >> >> 2*avg run time > >> >> if (elapsed > slow_time): > >> >> print "that took a long time elapsed: %f > >> >> slow_threshold: > >> >> %f 90th_quantile_runtime: %f" % \ > >> >> (elapsed, slow_time, > >> >> sorted(timings)[int(num_timings*.9)]) > >> >> i += 1 > >> >> print total > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> On Thu, Mar 13, 2014 at 7:45 PM, Maciej Fijalkowski < > [email protected]> > >> >> wrote: > >> >>> > >> >>> On Thu, Mar 13, 2014 at 1:45 PM, Martin Koch <[email protected]> wrote: > >> >>> > Hi Armin, Maciej > >> >>> > > >> >>> > Thanks for responding. > >> >>> > > >> >>> > I'm in the process of trying to determine what (if any) of the > code > >> >>> > I'm > >> >>> > in a > >> >>> > position to share, and I'll get back to you. > >> >>> > > >> >>> > Allowing hinting to the GC would be good. Even better would be a > >> >>> > means > >> >>> > to > >> >>> > allow me to (transparently) allocate objects in unmanaged memory, > >> >>> > but I > >> >>> > would expect that to be a tall order :) > >> >>> > > >> >>> > Thanks, > >> >>> > /Martin > >> >>> > >> >>> Hi Martin. > >> >>> > >> >>> Note that in case you want us to do the work of isolating the > problem, > >> >>> we do offer paid support to do that (then we can sign NDAs and > stuff). > >> >>> Otherwise we would be more than happy to fix bugs once you isolate a > >> >>> part you can share freely :) > >> >> > >> >> > >> > > > > > >
_______________________________________________ pypy-dev mailing list [email protected] https://mail.python.org/mailman/listinfo/pypy-dev
