Re: [Python-Dev] Fixing the XML batteries

2011-12-16 Thread Nick Coghlan
On Fri, Dec 16, 2011 at 4:53 PM, Stefan Behnel stefan...@behnel.de wrote:
 If these changes are considered acceptable, I'll copy the above over to the
 documentation bug I opened at

 http://bugs.python.org/issue11379

 Can these doc changes go into both 2.7 and 3.3? Given that there is no
 important difference between the implementations, I don't see why the
 documentation should differ in Py2.

Your suggested tweaks look good to me and could go into all of 2.7, 3.2 and 3.3.

 b) cElementTree should finally lose its special status as a separate
 library and disappear as an accelerator module behind ElementTree.

 There was no opposition and a general agreement on this in the thread,
 except for the warning that Fredrik Lundh should have a word in this. I
 wrote him an e-mail and didn't get a response so far. We can wait a little
 longer, I guess, there's still time before 3.3beta.

Having ElementTree implicitly do "from _elementtree import *" is a
3.3-only change, though. (Note that xml.etree.cElementTree isn't the
actual acceleration module; that honor already goes to _elementtree.
The only bits missing are the automatic import in
xml.etree.ElementTree and the appropriate test updates to ensure the
Python version still gets tested.)
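
A minimal sketch of the accelerator pattern discussed here: a pure-Python
module defines its API first, then an optional compiled module overrides
whatever names it provides. The helper apply_accelerator() and the use of
the math module as a stand-in accelerator are assumptions made purely for
illustration; in the stdlib the change would presumably amount to a guarded
"from _elementtree import *" at the end of xml.etree.ElementTree.

```python
import importlib


def apply_accelerator(namespace, accelerator_name):
    """Overlay public names from an optional accelerator module.

    Returns True if the accelerator was found, False if the pure-Python
    definitions already in `namespace` were left in place.
    (Hypothetical helper, not part of the stdlib.)
    """
    try:
        accel = importlib.import_module(accelerator_name)
    except ImportError:
        return False  # no accelerator available: keep the Python version
    names = getattr(accel, "__all__", None)
    if names is None:
        names = [n for n in dir(accel) if not n.startswith("_")]
    for name in names:
        namespace[name] = getattr(accel, name)
    return True


def slow_sqrt(x):
    """Stand-in for a pure-Python implementation."""
    return x ** 0.5


# "math" plays the role of the C accelerator in this toy example.
accelerated = {"sqrt": slow_sqrt}
found = apply_accelerator(accelerated, "math")

# A module name that does not exist leaves the fallback untouched.
fallback = {"sqrt": slow_sqrt}
missing = apply_accelerator(fallback, "_no_such_accelerator_module")
```

Keeping the Python version tested then means running the test suite once
with the accelerator import blocked and once with it available.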

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] French sprint this week-end

2011-12-16 Thread Stefan Krah
Victor Stinner victor.stin...@haypocalc.com wrote:
 Do you know a simple task to start contributing to Python? Something
 useful and not boring if possible :-) There is the easy tag on the bug
 tracker, but many issues have a long history, already have a patch, etc.
 Do you know other generic tasks like improving code coverage or support of
 some rare platforms?

On some buildbots compiler warnings are starting to accumulate. Installing
a recent version of gcc and fixing those might be a good task. If the
participants are new to buildbot, it might even be interesting for them. :)


Stefan Krah




Re: [Python-Dev] French sprint this week-end

2011-12-16 Thread Eli Bendersky
On Fri, Dec 16, 2011 at 11:00, Stefan Krah ste...@bytereef.org wrote:

 Victor Stinner victor.stin...@haypocalc.com wrote:
  Do you know a simple task to start contributing to Python? Something
  useful and not boring if possible :-) There is the easy tag on the bug
  tracker, but many issues have a long history, already have a patch, etc.
  Do you know other generic tasks like improving code coverage or support of
  some rare platforms?

 On some buildbots compiler warnings are starting to accumulate. Installing
 a recent version of gcc and fixing those might be a good task. If the
 participants are new to buildbot, it might even be interesting for them. :)


Do we have buildbots that build Python with Clang instead of GCC? The
reason I'm asking is that Clang's diagnostics are usually better, and
fixing all its warnings could nicely complement fixing GCC's qualms.

Eli


Re: [Python-Dev] French sprint this week-end

2011-12-16 Thread Dirkjan Ochtman
On Fri, Dec 16, 2011 at 10:17, Eli Bendersky eli...@gmail.com wrote:
 Do we have buildbots that build Python with Clang instead of GCC? The reason
 I'm asking is that Clang's diagnostics are usually better, and fixing all
 its warnings could nicely complement fixing GCC's qualms.

The box running my buildslave has clang installed, so someone with
access to the buildmaster could probably set that up without too much
trouble.

Cheers,

Dirkjan


Re: [Python-Dev] A new dict for Xmas?

2011-12-16 Thread Mark Shannon

Greg Ewing wrote:

Mark Shannon wrote:

I have a new dict implementation which allows sharing of keys between 
objects of the same class.


We already have the __slots__ mechanism for memory savings.
Have you done any comparisons with that?



You can't make Python programmers use slots, neither can you
automatically change existing programs.

Are you suggesting that because the __slots__ mechanism exists,
the dict implementation doesn't have to be efficient?


Seems to me that __slots__ ought to save even more memory,
since it eliminates the per-instance dict altogether rather
than just the keys half of it.



Of course using __slots__ saves more memory,
but people don't use them much.
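
To make the trade-off concrete, here is a small sketch comparing a regular
class with a __slots__ class. The class names and the per_instance_bytes()
helper are hypothetical, and sys.getsizeof() figures are specific to the
interpreter and version, so treat any numbers it reports as rough
indications only.

```python
import sys


class PointDict:
    """Ordinary class: attributes live in a per-instance __dict__."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


class PointSlots:
    """Same attributes, stored in fixed slots with no __dict__."""

    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


def per_instance_bytes(obj):
    """Shallow size of the object plus its attribute dict, if it has one."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


p = PointDict(1, 2, 3)
q = PointSlots(1, 2, 3)
```

The slotted class gives up the ability to add arbitrary attributes, which
is part of why existing programs cannot be converted automatically.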

Cheers,
Mark.



Re: [Python-Dev] Fixing the XML batteries

2011-12-16 Thread Stefan Behnel

Stefan Behnel, 14.12.2011 20:41:

It's clear from the
discussion that there are still users and that new code is still being
written that uses MiniDOM. However, I would argue that this cannot possibly
be performance critical code and that it only deals with somewhat small
documents. I say that because MiniDOM is evidently not suitable for large
documents or performance critical applications, so this is the only
explanation I have why the performance problems would not be obvious in the
cases where it is still being used. And if they do show, it appears to be
much more likely that users rewrite their code using ElementTree or lxml
than that they try to fix MiniDOM's performance issues.


Out of curiosity, I reran my benchmarks under PyPy 1.7.

http://blog.behnel.de/index.php?p=210

In short: MiniDOM performs substantially better there, both in terms of 
time and space. That by itself doesn't make PyPy an interesting platform 
for XML processing (using lxml in CPython is way faster), but I found it 
interesting to note that the problem is not strictly inherent in MiniDOM. 
It also depends a lot on the runtime environment, even when it comes to 
memory usage.


Stefan



Re: [Python-Dev] Fixing the XML batteries

2011-12-16 Thread Baptiste Carvello
Le 16/12/2011 07:53, Stefan Behnel a écrit :

 Additionally, the documentation on the xml.sax page would benefit from
 the following paragraph:
 
 
 [[Note: The xml.sax package provides an implementation of the SAX
 interface whose API is similar to that in other programming languages.
 Users who are unfamiliar with the SAX interface or who would like to
 write less code for efficient stream processing of XML files should
 consider using the iterparse() function in the xml.etree.ElementTree
 module instead.]]
 
 

A small caveat to note about iterparse(), which I otherwise like a lot:
when processing very big data (I encountered this with a region-wide
openstreetmap XML dump), you have to remove the processed nodes from the
root element. Otherwise, its memory footprint increases with the size of
the document.
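
A minimal sketch of that workaround, assuming the records of interest are
direct children of the root element (as in an OpenStreetMap dump). The
function name iter_elements() and the sample document are made up for
illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_elements(source, tag):
    """Yield each completed <tag> element, then drop it from the root.

    The yielded element is only valid until the generator is advanced,
    because root.clear() discards all children parsed so far.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            root.clear()  # keep the root from accumulating processed nodes


sample = io.BytesIO(
    b"<osm><node id='1'/><node id='2'/><node id='3'/></osm>"
)
ids = [elem.get("id") for elem in iter_elements(sample, "tag_placeholder")]
sample.seek(0)
node_ids = [elem.get("id") for elem in iter_elements(sample, "node")]
```

Without the root.clear() call the result is the same, but every processed
node stays attached to the root until parsing finishes.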



[Python-Dev] Summary of Python tracker Issues

2011-12-16 Thread Python tracker

ACTIVITY SUMMARY (2011-12-09 - 2011-12-16)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    3175 ( +6)
  closed 0 (+40)
  total  25395 (+46)

Open issues with patches: 1360 


Issues opened (31)
==

#11886: test_time.test_tzset() fails on x86 FreeBSD 7.2 3.x: AEST ti
http://bugs.python.org/issue11886  reopened by haypo

#13571: Backup files support in IDLE
http://bugs.python.org/issue13571  opened by maniram.maniram

#13572: import _curses fails because of UnicodeDecodeError('utf8' code
http://bugs.python.org/issue13572  opened by haypo

#13573: csv.writer uses str() for floats instead of repr()
http://bugs.python.org/issue13573  opened by rhettinger

#13574: refresh example in doc for Extending and Embedding
http://bugs.python.org/issue13574  opened by flox

#13576: Handling of broken condcoms in HTMLParser
http://bugs.python.org/issue13576  opened by ezio.melotti

#13577: __qualname__ is not present on builtin methods and functions
http://bugs.python.org/issue13577  opened by meador.inge

#13578: Add subprocess.iter_output() convenience function
http://bugs.python.org/issue13578  opened by ncoghlan

#13579: string.Formatter doesn't understand the !a conversion specifie
http://bugs.python.org/issue13579  opened by ncoghlan

#13581: help() appears to be broken; doesn't display __doc__ for class
http://bugs.python.org/issue13581  opened by christopherthemagnificent

#13582: IDLE and pythonw.exe stderr problem
http://bugs.python.org/issue13582  opened by serwy

#13583: sqlite3.Row doesn't support slice indexes
http://bugs.python.org/issue13583  opened by xapple

#13585: Add contextlib.CleanupManager
http://bugs.python.org/issue13585  opened by Nikratio

#13586: Replace selected not working/consistent with find
http://bugs.python.org/issue13586  opened by marco

#13587: Correcting the typos error in Doc/howto/urllib2.rst
http://bugs.python.org/issue13587  opened by Bithin.A

#13588: Change name of internal closure functions in importlib
http://bugs.python.org/issue13588  opened by brett.cannon

#13589: Aifc low level serialization primitives fix
http://bugs.python.org/issue13589  opened by Oleg.Plakhotnyuk

#13590: Prebuilt python-2.7.2 binaries for macosx can not compile c ex
http://bugs.python.org/issue13590  opened by teamnoir

#13592: repr(regex) doesn't include actual regex
http://bugs.python.org/issue13592  opened by dwt

#13594: Aifc markers write fix
http://bugs.python.org/issue13594  opened by Oleg.Plakhotnyuk

#13598: string.Formatter doesn't support empty curly braces {}
http://bugs.python.org/issue13598  opened by maniram.maniram

#13601: sys.stderr should be unbuffered (or always line-buffered)
http://bugs.python.org/issue13601  opened by pitrou

#13604: update PEP 393 (match implementation)
http://bugs.python.org/issue13604  opened by Jim.Jewett

#13605: document argparse's nargs=REMAINDER
http://bugs.python.org/issue13605  opened by bethard

#13607: Move generator specific sections out of ceval.
http://bugs.python.org/issue13607  opened by ron_adam

#13608: remove born-deprecated PyUnicode_AsUnicodeAndSize
http://bugs.python.org/issue13608  opened by Jim.Jewett

#13609: Add os.get_terminal_size() function
http://bugs.python.org/issue13609  opened by denilsonsa

#13610: On Python parsing numbers.
http://bugs.python.org/issue13610  opened by Jean-Michel.Fauth

#13611: Integrate ElementC14N module into xml.etree package
http://bugs.python.org/issue13611  opened by scoder

#13612: xml.etree.ElementTree says unknown encoding of a regular encod
http://bugs.python.org/issue13612  opened by dongying

#13613: Small error in regular expression poker hand example
http://bugs.python.org/issue13613  opened by Eddie E



Most recent 15 issues with no replies (15)
==

#13611: Integrate ElementC14N module into xml.etree package
http://bugs.python.org/issue13611

#13608: remove born-deprecated PyUnicode_AsUnicodeAndSize
http://bugs.python.org/issue13608

#13605: document argparse's nargs=REMAINDER
http://bugs.python.org/issue13605

#13594: Aifc markers write fix
http://bugs.python.org/issue13594

#13590: Prebuilt python-2.7.2 binaries for macosx can not compile c ex
http://bugs.python.org/issue13590

#13587: Correcting the typos error in Doc/howto/urllib2.rst
http://bugs.python.org/issue13587

#13586: Replace selected not working/consistent with find
http://bugs.python.org/issue13586

#13583: sqlite3.Row doesn't support slice indexes
http://bugs.python.org/issue13583

#13579: string.Formatter doesn't understand the !a conversion specifie
http://bugs.python.org/issue13579

#13576: Handling of broken condcoms in HTMLParser
http://bugs.python.org/issue13576

#13574: refresh example in doc for Extending and Embedding
http://bugs.python.org/issue13574

#13565: test_multiprocessing.test_notify_all() hangs on AMD64 Snow Le

[Python-Dev] A new dict for Xmas?

2011-12-16 Thread Jim Jewett
 Greg Ewing wrote:
 Mark Shannon wrote:

 I have a new dict implementation which allows sharing of keys between
 objects of the same class.

 We already have the __slots__ mechanism for memory savings.
 Have you done any comparisons with that?

 You can't make Python programmers use slots, neither can you
 automatically change existing programs.

The automatic change is exactly what a dictionary upgrade provides.

I haven't read your patch in detail yet, but it sounds like you're
replacing the array of keys + array of values with just an array of
values, and getting the numerical index from a single per-class array
of keys.

That would normally be sensible (so thanks!), but it isn't a drop-in
replacement.  If you have a Data class intended to take arbitrary
per-instance attributes, it just forces them all to keep resizing up,
even though individual instances would be small with the current dict.

How is this more extreme than replacing a pure dict with some
auto-calculated slots and an other_attrs dict that would normally
remain empty?

[It may be harder to implement, because of the difficulty of
calculating the slots in advance ... but I don't see it as any worse,
once implemented.]

Of course, maybe your shared dict just points to sequential array
positions (rather than matching the key position) ... in which case,
it may well beat slots, though the Data class would still be a
problem.

-jJ


Re: [Python-Dev] A new dict for Xmas?

2011-12-16 Thread Terry Reedy

On 12/16/2011 5:03 AM, Mark Shannon wrote:


Of course using __slots__ saves more memory,
but people don't use them much.


Do you think the stdlib should be using __slots__ more?

--
Terry Jan Reedy



Re: [Python-Dev] A new dict for Xmas?

2011-12-16 Thread Mark Shannon

Jim Jewett wrote:

Greg Ewing wrote:

Mark Shannon wrote:



I have a new dict implementation which allows sharing of keys between
objects of the same class.



We already have the __slots__ mechanism for memory savings.
Have you done any comparisons with that?



You can't make Python programmers use slots, neither can you
automatically change existing programs.


The automatic change is exactly what a dictionary upgrade provides.

I haven't read your patch in detail yet, but it sounds like you're
replacing the array of keys + array of values with just an array of
values, and getting the numerical index from a single per-class array
of keys.


Each dictionary has key/hash/values as before, but instead of one array,
they are broken into two: a key/hash array and a value array.
The key/hash arrays can be shared amongst dicts;
this happens for well-behaved classes and completely empty dicts.
Otherwise each dict gets two arrays of its own.
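
As a toy model of that layout (not the actual C implementation, which uses
hash tables rather than Python lists), the split can be pictured as one
shared key table plus a private value array per dict. The class names
SharedKeys and SplitDict are invented for this sketch.

```python
_MISSING = object()  # marks a slot this particular dict has not filled


class SharedKeys:
    """Key -> slot index table that many SplitDicts can share."""

    def __init__(self):
        self.index = {}

    def slot_for(self, key, create=False):
        if create:
            # New keys get the next free slot; existing keys keep theirs.
            return self.index.setdefault(key, len(self.index))
        return self.index.get(key)


class SplitDict:
    """Per-instance value array addressed through a SharedKeys table."""

    def __init__(self, shared):
        self.shared = shared
        self.values = []

    def __setitem__(self, key, value):
        slot = self.shared.slot_for(key, create=True)
        if slot >= len(self.values):
            padding = slot + 1 - len(self.values)
            self.values.extend([_MISSING] * padding)
        self.values[slot] = value

    def __getitem__(self, key):
        slot = self.shared.slot_for(key)
        if (slot is None or slot >= len(self.values)
                or self.values[slot] is _MISSING):
            raise KeyError(key)
        return self.values[slot]


shared = SharedKeys()
a, b = SplitDict(shared), SplitDict(shared)
a["x"] = 1
b["x"] = 2
b["y"] = 3
```

The model also shows the concern raised about classes with arbitrary
per-instance attributes: every new key grows the shared table, and any
instance that uses a late key needs a value array long enough to reach it.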



That would normally be sensible (so thanks!), but it isn't a drop-in
replacement.  If you have a Data class intended to take arbitrary


It is a drop-in replacement. It conforms to the current API.


per-instance attributes, it just forces them all to keep resizing up,
even though individual instances would be small with the current dict.

There is a cut-off point. At the moment it's quite unsophisticated about
how it does this, but it could easily be improved.

Suggestions are welcome.



How is this more extreme than replacing a pure dict with some
auto-calculated slots and an other_attrs dict that would normally
remain empty?


It's less extreme, but equally effective.



[It may be harder to implement, because of the difficulty of
calculating the slots in advance ... but I don't see it as any worse,
once implemented.]

It's a trade-off between ease of implementation and effectiveness.
I think the shared key/hash array approach gets most of the advantages of
a full map implementation (like PyPy or V8) with much less hassle.



Of course, maybe your shared dict just points to sequential array
positions (rather than matching the key position) ... in which case,
it may well beat slots, though the Data class would still be a
problem.


It won't beat slots, mainly due to the extra space required to minimise 
collisions, but it is a lot more compact than the present approach.


For a well-behaved class with lots of instances, each with 3 or 4
attributes (i.e. the minimum size dict), it cuts the space used by the
per-instance dict from 136 bytes (32-bit machine) to 64 bytes plus the
shared key/hash array. Slots would only require 12 or 16 bytes.


(When verifying these numbers I found a bug in the resizing,
which I have just fixed)

The next enhancement would be to store the naked value array directly 
into an instance, trimming the space cost down to just 32 bytes, but 
this would cause compatibility issues as the (internal) API would need 
to change.


Cheers,
Mark.


Re: [Python-Dev] A new dict for Xmas?

2011-12-16 Thread Mark Shannon

Terry Reedy wrote:

On 12/16/2011 5:03 AM, Mark Shannon wrote:


Of course using __slots__ saves more memory,
but people don't use them much.


Do you think the stdlib should be using __slots__ more?


For some things yes, but where it's critical, slots are already used.
Take the ordered dict: its nodes use slots.

The advantage of improving things in the VM is that
we don't have to rewrite half of the stdlib.

Cheers,
Mark.
