Re: Tremendous slowdown due to garbage collection

2008-05-01 Thread Dieter Maurer
John Nagle [EMAIL PROTECTED] writes on Mon, 28 Apr 2008 11:41:41 -0700: Dieter Maurer wrote: Christian Heimes [EMAIL PROTECTED] writes on Sat, 12 Apr 2008 18:47:32 +0200: [EMAIL PROTECTED] schrieb: which made me suggest to use these as defaults, but then We observed similar very bad

Re: Tremendous slowdown due to garbage collection

2008-04-30 Thread s0suk3
On Apr 12, 11:11 am, [EMAIL PROTECTED] wrote: I should have been more specific about possible fixes. python2.5 -m timeit 'gc.disable();l=[(i,) for i in range(2000000)]' 10 loops, best of 3: 662 msec per loop python2.5 -m timeit 'gc.enable();l=[(i,) for i in range(2000000)]' 10

Re: Tremendous slowdown due to garbage collection

2008-04-30 Thread Aaron Watters
I do not argue that Python's default GC parameters must change -- only that applications with lots of objects may want to consider a reconfiguration. I would argue that changing the GC to some sort of adaptive strategy should at least be investigated. Having an app which doesn't need gc

Re: Tremendous slowdown due to garbage collection

2008-04-28 Thread Dieter Maurer
Martin v. Löwis wrote at 2008-4-27 19:33 +0200: Martin said it but nevertheless it might not be true. We observed similar very bad behaviour -- in a Web application server. Apparently, the standard behaviour is far from optimal when the system contains a large number of objects and

Re: Tremendous slowdown due to garbage collection

2008-04-28 Thread John Nagle
Dieter Maurer wrote: Christian Heimes [EMAIL PROTECTED] writes on Sat, 12 Apr 2008 18:47:32 +0200: [EMAIL PROTECTED] schrieb: which made me suggest to use these as defaults, but then We observed similar very bad behaviour -- in a Web application server. Apparently, the standard behaviour is

Re: Tremendous slowdown due to garbage collection

2008-04-28 Thread Martin v. Löwis
I do not argue that Python's default GC parameters must change -- only that applications with lots of objects may want to consider a reconfiguration. That's exactly what I was trying to say: it's not that the parameters are useful for *all* applications (that's why they are tunable

Re: Tremendous slowdown due to garbage collection

2008-04-27 Thread Dieter Maurer
Christian Heimes [EMAIL PROTECTED] writes on Sat, 12 Apr 2008 18:47:32 +0200: [EMAIL PROTECTED] schrieb: which made me suggest to use these as defaults, but then Martin v. Löwis wrote that "No, the defaults are correct for typical applications." At that point I felt lost and as the

Re: Tremendous slowdown due to garbage collection

2008-04-27 Thread Martin v. Löwis
Martin said it but nevertheless it might not be true. We observed similar very bad behaviour -- in a Web application server. Apparently, the standard behaviour is far from optimal when the system contains a large number of objects and occasionally, large numbers of objects are created in a

Re: Tremendous slowdown due to garbage collection

2008-04-27 Thread Terry Reedy
Dieter Maurer [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] | We observed similar very bad behaviour -- in a Web application server. | Apparently, the standard behaviour is far from optimal when the | system contains a large number of objects and occasionally, large | numbers of

Re: Tremendous slowdown due to garbage collection

2008-04-27 Thread Paul Rubin
Terry Reedy [EMAIL PROTECTED] writes: Can this alternative be made easier by adding a context manager to gc module to use with 'with' statements? Something like with gc.delay() as dummy: block That sounds worth adding as a hack, but really I hope there can be an improved gc someday. --
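
A context manager along the lines Terry Reedy sketches is easy to hand-roll today; `gc.delay` does not exist in the `gc` module, so the `gc_paused` name below is invented for illustration:

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    """Disable the cyclic collector for the duration of a block,
    restoring its previous state (and collecting once) on exit."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()
            gc.collect()

# Usage: build a large structure without incremental collections.
with gc_paused():
    data = [(i,) for i in range(100000)]
```

Restoring the previous state (rather than unconditionally enabling) makes the manager safe to nest.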

Re: Tremendous slowdown due to garbage collection

2008-04-15 Thread Aaron Watters
On Apr 14, 11:18 pm, Carl Banks [EMAIL PROTECTED] wrote: However, that is for the OP to decide. The reason I don't like the sort of question I posed is it's presumptuous--maybe the OP already considered and rejected this, and has taken steps to ensure the in memory data structure won't be

Re: Tremendous slowdown due to garbage collection

2008-04-15 Thread Paul Rubin
Aaron Watters [EMAIL PROTECTED] writes: Even with Btree's if you jump around in the tree the performance can be awful. The Linux file cache really helps. The simplest approach is to just cat the index files to /dev/null a few times an hour. Slightly faster (what I do with Solr) is mmap the

Re: Tremendous slowdown due to garbage collection

2008-04-14 Thread Aaron Watters
A question often asked--and I am not a big fan of these sorts of questions, but it is worth thinking about--of people who are creating very large data structures in Python is "Why are you doing that?" That is, you should consider whether some kind of database solution would be better. You

Re: Tremendous slowdown due to garbage collection

2008-04-14 Thread Carl Banks
On Apr 14, 4:27 pm, Aaron Watters [EMAIL PROTECTED] wrote: A question often asked--and I am not a big fan of these sorts of questions, but it is worth thinking about--of people who are creating very large data structures in Python is "Why are you doing that?" That is, you should consider

Re: Tremendous slowdown due to garbage collection

2008-04-13 Thread Rhamphoryncus
On Apr 12, 6:58 pm, Steve Holden [EMAIL PROTECTED] wrote: Paul Rubin wrote: Steve Holden [EMAIL PROTECTED] writes: I believe you are making surmises outside your range of competence there. While your faith in the developers is touching, the garbage collection scheme is something that has

Re: Tremendous slowdown due to garbage collection

2008-04-12 Thread Steve Holden
[...] I would suggest to configure the default behaviour of the garbage collector in such a way that this squared complexity is avoided without requiring specific knowledge and intervention by the user. Not being an expert in these details I would like to ask the gurus how this could be done.

Re: Tremendous slowdown due to garbage collection

2008-04-12 Thread Carl Banks
On Apr 12, 7:02 am, [EMAIL PROTECTED] wrote: I would suggest to configure the default behaviour of the garbage collector in such a way that this squared complexity is avoided without requiring specific knowledge and intervention by the user. Not being an expert in these details I would like to

Re: Tremendous slowdown due to garbage collection

2008-04-12 Thread andreas . eisele
I should have been more specific about possible fixes. python2.5 -m timeit 'gc.disable();l=[(i,) for i in range(2000000)]' 10 loops, best of 3: 662 msec per loop python2.5 -m timeit 'gc.enable();l=[(i,) for i in range(2000000)]' 10 loops, best of 3: 15.2 sec per loop In the latter
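
The comparison above can be reproduced from a script rather than the shell; this is a sketch of the same measurement, with N reduced from the original two million so it runs quickly:

```python
import gc
import timeit

# The thread's benchmark builds a large list of one-element tuples;
# every tuple is a gc-tracked container, which is what triggers the
# repeated collection passes being discussed.
N = 200000

def build():
    return [(i,) for i in range(N)]

gc.disable()
t_off = min(timeit.repeat(build, number=1, repeat=3))
gc.enable()
t_on = min(timeit.repeat(build, number=1, repeat=3))

print(f"gc disabled: {t_off:.3f}s, gc enabled: {t_on:.3f}s")
```

The gap between the two timings depends heavily on interpreter version; the quoted ~20x slowdown is specific to Python 2.5's collector.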

Re: Tremendous slowdown due to garbage collection

2008-04-12 Thread John Nagle
[EMAIL PROTECTED] wrote: In an application dealing with very large text files, I need to create dictionaries indexed by tuples of words (bi-, tri-, n-grams) or nested dictionaries. The number of different data structures in memory grows into orders beyond 1E7. It turns out that the default
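
The kind of structure described (dictionaries keyed by word tuples) can be sketched as follows; the six-word sample corpus is invented for illustration:

```python
from collections import defaultdict

def ngram_counts(tokens, n):
    """Count n-grams as tuple-keyed dictionary entries. Every tuple
    key is a gc-tracked container, which is what puts pressure on
    the cyclic collector at the 1E7 scale the poster describes."""
    counts = defaultdict(int)
    for i in range(len(tokens) - n + 1):
        counts[tuple(tokens[i:i + n])] += 1
    return dict(counts)

tokens = "the cat sat on the mat".split()
bigrams = ngram_counts(tokens, 2)  # bigrams[("the", "cat")] == 1
```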

Re: Tremendous slowdown due to garbage collection

2008-04-12 Thread Christian Heimes
[EMAIL PROTECTED] schrieb: which made me suggest to use these as defaults, but then Martin v. Löwis wrote that "No, the defaults are correct for typical applications." At that point I felt lost and as the general wish in that thread was to move discussion to comp.lang.python, I brought it

Re: Tremendous slowdown due to garbage collection

2008-04-12 Thread andreas . eisele
Sorry, I have to correct my last posting again: Disabling the gc may not be a good idea in a real application; I suggest you play with the gc.set_threshold function and set larger values, at least while building the dictionary. (700, 1000, 10) seems to yield good results. python2.5
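
The threshold tuning quoted above can be applied temporarily around a build phase; a minimal sketch, with a dictionary comprehension standing in for any large allocation burst:

```python
import gc

# CPython's defaults are (700, 10, 10): generation 0 is collected
# after net allocations exceed 700, and the later values control how
# often the older generations are swept. Raising the middle value,
# as suggested in the thread, makes full collections much rarer
# while a large structure is being built.
defaults = gc.get_threshold()
gc.set_threshold(700, 1000, 10)

big = {(i, i + 1): i for i in range(100000)}  # many tracked tuple keys

gc.set_threshold(*defaults)  # restore the previous settings afterwards
```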

Re: Tremendous slowdown due to garbage collection

2008-04-12 Thread andreas . eisele
Martin said that the default settings for the cyclic gc work for most people. I agree. Your test case has found a pathological corner case which is *not* typical for common applications but typical for an artificial benchmark. I agree that my corner is not typical, but I strongly disagree

Re: Tremendous slowdown due to garbage collection

2008-04-12 Thread Steve Holden
[EMAIL PROTECTED] wrote: Martin said that the default settings for the cyclic gc work for most people. I agree. Your test case has found a pathological corner case which is *not* typical for common applications but typical for an artificial benchmark. I agree that my corner is not

Re: Tremendous slowdown due to garbage collection

2008-04-12 Thread Paul Rubin
Steve Holden [EMAIL PROTECTED] writes: I believe you are making surmises outside your range of competence there. While your faith in the developers is touching, the garbage collection scheme is something that has received a lot of attention with respect to performance under typical workloads

Re: Tremendous slowdown due to garbage collection

2008-04-12 Thread Steve Holden
Paul Rubin wrote: Steve Holden [EMAIL PROTECTED] writes: I believe you are making surmises outside your range of competence there. While your faith in the developers is touching, the garbage collection scheme is something that has received a lot of attention with respect to performance under

Re: Tremendous slowdown due to garbage collection

2008-04-12 Thread Martin v. Löwis
I still don't see what is so good about defaults that lead to O(N*N) computation for an O(N) problem, and I like Amaury's suggestion a lot, so I would like to see comments on its disadvantages. Please don't tell me that O(N*N) is good enough. For N > 1E7 it isn't. Please understand that changing