Kevin Jacobs <[EMAIL PROTECTED]> wrote:

> I can say with complete certainty that of the 20+ programmers I've had working for me, many who have used Python for 3+ years, not a single one would think to question the garbage collector if they observed the kind of quadratic time complexity I've demonstrated. This is not because they are stupid, but because they have only a vague idea that Python even has a garbage collector, never mind that it could be behaving badly for such innocuous looking code.

As I understand it, gc is needed now more than ever because new-style classes make reference cycles more common. On the other hand, greatly increased RAM size (compared with some years ago) makes bursts of a million or more objects possible. Such large bursts move the hidden quadratic do-nothing drag out of the relatively flat part of the curve (total time just double or triple what it should be) to where it can really bite. Leaving aside what you do for your local group, can we better warn Python programmers now, for the upcoming 2.5, 2.6, and 3.0 releases?

Paragraph 3 of the Reference Manual chapter on the Data Model (3.0 version) says:
"Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable. (Implementation note: the current implementation uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage.)"
I am not sure what to add here (especially for those who do not read it ;-).
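For anyone who wants to see what that implementation note means in practice, here is a minimal sketch, assuming CPython. The names Node, make_cycle, and count_cycle_garbage are made up for illustration. Two objects that refer to each other survive reference counting after they become unreachable, and only the cyclic collector finds them.

```python
import gc


class Node:
    """Hypothetical class used only to build a reference cycle."""
    pass


def make_cycle():
    # a and b refer to each other, so their reference counts never
    # drop to zero when this function returns.
    a = Node()
    b = Node()
    a.other = b
    b.other = a


def count_cycle_garbage():
    """Return how many unreachable objects gc.collect() finds after one cycle."""
    was_enabled = gc.isenabled()
    gc.disable()              # keep automatic collections out of the way
    try:
        gc.collect()          # clear any garbage that already exists
        make_cycle()
        return gc.collect()   # number of unreachable objects found
    finally:
        if was_enabled:
            gc.enable()


print(count_cycle_garbage())
```

The exact count depends on the interpreter version, since instance dictionaries may or may not be counted separately, but it should be at least two.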

The Library Manual gc section says "Since the collector supplements the reference counting already used in Python, you can disable the collector if you are sure your program does not create reference cycles." Perhaps it should also say "You should disable it when creating millions of objects without cycles."
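What that extra sentence would be recommending looks something like the following. This is only a sketch of the workaround, not a proposed fix to the collector; build_without_gc is a made-up helper name, and the tuple-per-item payload is just a stand-in for whatever cycle-free objects a program really creates.

```python
import gc


def build_without_gc(n):
    """Create n small cycle-free objects with the cyclic collector paused."""
    was_enabled = gc.isenabled()
    gc.disable()          # reference counting still frees non-cyclic garbage
    try:
        # Each tuple is a container object the collector would otherwise
        # track and rescan as the allocation count grows.
        return [(i, str(i)) for i in range(n)]
    finally:
        # Restore the previous state rather than enabling unconditionally,
        # so the helper is safe to call when gc was already off.
        if was_enabled:
            gc.enable()


pairs = build_without_gc(100000)
```

The try/finally matters: if the allocation raises, the collector is still put back the way the caller left it.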

The installed documentation set (on Windows, at least) includes some Python HOWTOs. If one were added on Space Management (implementations, problems, and solutions), would your developers read it?

> Maybe we should consider more carefully before declaring the status quo sufficient. Average developers do allocate millions of objects in bursts and super-linear time complexity for such operations is not acceptable. Thankfully I am around to help my programmers work around such issues or else they'd be pushing to switch to Java, Ruby, C#, or whatever since Python was inexplicably "too slow" for "real work". This being open source, I'm certainly willing to help in the effort to do so, but not if potential solutions will be ruled out as being unnecessary.

To me, 'sufficient' (time-dependent) and 'necessary' are either too vague or too strict to bring about what you want -- change. This is the third thread I have read (here + c.l.p) on default-mode gc problems, all in the last couple of years or so. So, especially with the nice table someone posted recently of times with and without gc, and considering that installed RAM continues to grow, I am persuaded that an improvement to the default behavior that does not negatively impact the vast majority would be desirable.

Terry Jan Reedy