Re: [Python-Dev] Is this safe enough? Re: [Python-checkins] cpython: _Py_Identifier are always ASCII strings

2012-02-07 Thread Martin v. Löwis
 _Py_IDENTIFIER(xxx) defines a variable called PyId_xxx, so xxx can
 only be ASCII: the C language doesn't accept non-ASCII identifiers.

That's not exactly true. In C89, source code is in the source character
set, which is implementation-defined, except that it must contain
the basic character set. I believe that it allows for
implementation-defined characters in identifiers. In C99, this is
extended to include universal character names (\u escapes). They may
appear in identifiers
as long as the characters named are listed in annex D.59 (which I cannot
locate).

In C 2011, annexes D.1 and D.2 specify the characters that you can use
in an identifier:

D.1 Ranges of characters allowed
1. 00A8, 00AA, 00AD, 00AF, 00B2−00B5, 00B7−00BA, 00BC−00BE, 00C0−00D6,
00D8−00F6, 00F8−00FF
2. 0100−167F, 1681−180D, 180F−1FFF
3. 200B−200D, 202A−202E, 203F−2040, 2054, 2060−206F
4. 2070−218F, 2460−24FF, 2776−2793, 2C00−2DFF, 2E80−2FFF
5. 3004−3007, 3021−302F, 3031−303F
6. 3040−D7FF
7. F900−FD3D, FD40−FDCF, FDF0−FE44, FE47−FFFD
8. 10000−1FFFD, 20000−2FFFD, 30000−3FFFD, 40000−4FFFD, 50000−5FFFD,
60000−6FFFD, 70000−7FFFD, 80000−8FFFD, 90000−9FFFD, A0000−AFFFD,
B0000−BFFFD, C0000−CFFFD, D0000−DFFFD, E0000−EFFFD

D.2 Ranges of characters disallowed initially
1. 0300−036F, 1DC0−1DFF, 20D0−20FF, FE20−FE2F
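
For illustration only, here is a rough Python sketch of a membership test over
the ranges transcribed above (the table and helper name are made up for this
note, not part of any compiler):

D1_ALLOWED = [
    (0x00A8, 0x00A8), (0x00AA, 0x00AA), (0x00AD, 0x00AD), (0x00AF, 0x00AF),
    (0x00B2, 0x00B5), (0x00B7, 0x00BA), (0x00BC, 0x00BE), (0x00C0, 0x00D6),
    (0x00D8, 0x00F6), (0x00F8, 0x00FF), (0x0100, 0x167F), (0x1681, 0x180D),
    (0x180F, 0x1FFF), (0x200B, 0x200D), (0x202A, 0x202E), (0x203F, 0x2040),
    (0x2054, 0x2054), (0x2060, 0x206F), (0x2070, 0x218F), (0x2460, 0x24FF),
    (0x2776, 0x2793), (0x2C00, 0x2DFF), (0x2E80, 0x2FFF), (0x3004, 0x3007),
    (0x3021, 0x302F), (0x3031, 0x303F), (0x3040, 0xD7FF), (0xF900, 0xFD3D),
    (0xFD40, 0xFDCF), (0xFDF0, 0xFE44), (0xFE47, 0xFFFD),
] + [(plane, plane + 0xFFFD) for plane in range(0x10000, 0xF0000, 0x10000)]

D2_NOT_INITIAL = [(0x0300, 0x036F), (0x1DC0, 0x1DFF), (0x20D0, 0x20FF),
                  (0xFE20, 0xFE2F)]

def extended_identifier_char(ch, initial=False):
    # Only covers the Annex D "extended" characters; the basic ASCII letters,
    # digits and underscore are allowed by the core grammar anyway.
    cp = ord(ch)
    if initial and any(lo <= cp <= hi for lo, hi in D2_NOT_INITIAL):
        return False
    return any(lo <= cp <= hi for lo, hi in D1_ALLOWED)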

Regards,
Martin


Re: [Python-Dev] Is this safe enough? Re: [Python-checkins] cpython: _Py_Identifier are always ASCII strings

2012-02-07 Thread Victor Stinner
2012/2/7 Martin v. Löwis mar...@v.loewis.de:
 _Py_IDENTIFIER(xxx) defines a variable called PyId_xxx, so xxx can
 only be ASCII: the C language doesn't accept non-ASCII identifiers.

 That's not exactly true. In C89, source code is in the source character
 set, which is implementation-defined, except that it must contain
 the basic character set. I believe that it allows for
 implementation-defined characters in identifiers.

Hum, I hope that these C89 compilers use UTF-8.

 In C99, this is
 extended to include universal character names (\u escapes). They may
 appear in identifiers
 as long as the characters named are listed in annex D.59 (which I cannot
 locate).

Does C99 specify the encoding? Can we expect UTF-8?

Python is supposed to work on many platforms and so must support a lot of
compilers, not only compilers supporting non-ASCII identifiers.

Victor


[Python-Dev] [Python-ideas] matrix operations on dict :)

2012-02-07 Thread Mark Janssen
On Mon, Feb 6, 2012 at 6:12 PM, Steven D'Aprano st...@pearwood.info wrote:

 On Mon, Feb 06, 2012 at 09:01:29PM +0100, julien tayon wrote:
  Hello,
 
  Proposing vector operations on dict, and acknowledging there was an
  homeomorphism from rooted n-ary trees to dict, was inducing the
  possibility of making matrix of dict / trees.

 This seems interesting to me, but I don't see that they are important
 enough to be built-in to dicts. [...]


 Otherwise, this looks rather like a library of functions looking for a
 use. It might help if you demonstrate what concrete problems this helps
 you solve.


I have the problem looking for this solution!

The application for this functionality is in coding a fractal graph (or
multigraph in the literature).  This is the most powerful structure that
Computer Science has ever conceived.  If you look at the evolution of data
structures in compsci, the fractal graph is the ultimate.  From lists to
trees to graphs to multigraphs.  The latter elements can always encompass
the former with only O(1) extra cost.

It has the potential to encode *any* relationship from the very small to
the very large (as well as across or *laterally*) in one unified structure.
 Optimize this one data structure and the whole standard library could be
refactored and simplified by an order of magnitude.  Not only that, it will
pave the way for the re-factored internet that's being worked on which
creates a content-centric Internet beyond the graph-level, hypertext
internet.

Believe me, it will be awesome.

Slowing down

mark


Re: [Python-Dev] importlib quest

2012-02-07 Thread Brett Cannon
On Mon, Feb 6, 2012 at 14:49, Antoine Pitrou solip...@pitrou.net wrote:

 On Mon, 6 Feb 2012 09:57:56 -0500
 Brett Cannon br...@python.org wrote:
  Thanks for any help people can provide me on this now 5 year quest to get
  this work finished.

 Do you have any plan to solve the performance issue?


I have not even looked at performance or attempted to profile the code, so
I suspect there is room for improvement.



 $ ./python -m timeit -s "import sys; mod='struct'" \
  "__import__(mod); del sys.modules[mod]"
 10000 loops, best of 3: 75.3 usec per loop
 $ ./python -m timeit -s "import sys; mod='struct'; from importlib import __import__" \
  "__import__(mod); del sys.modules[mod]"
 1000 loops, best of 3: 421 usec per loop

 Startup time is already much worse in 3.3 than in 2.7. With such a
 slowdown in importing fresh modules, applications using many batteries
 (third-party or not) will be heavily impacted.


I have a benchmark suite for importing modules directly at
importlib.test.benchmark, but it doesn't explicitly cover searching far
down sys.path. I will see if any of the existing tests implicitly do that
and if not add it.


Re: [Python-Dev] Is this safe enough? Re: [Python-checkins] cpython: _Py_Identifier are always ASCII strings

2012-02-07 Thread Gregory P. Smith
Why do we still care about C89?  It is 2012 and we're talking about
Python 3.  What compiler on what platform that anyone actually cares
about does not support C99?

-gps


Re: [Python-Dev] Is this safe enough? Re: [Python-checkins] cpython: _Py_Identifier are always ASCII strings

2012-02-07 Thread Amaury Forgeot d'Arc
2012/2/7 Gregory P. Smith g...@krypto.org

 Why do we still care about C89?  It is 2012 and we're talking about
 Python 3.  What compiler on what platform that anyone actually cares
 about does not support C99?


The Microsoft compilers on Windows do not support C99:
- Declarations must be at the start of a block
- No designated initializers for structures
- ASCII-only identifiers:
http://msdn.microsoft.com/en-us/library/e7f8y25b.aspx

-- 
Amaury Forgeot d'Arc


[Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Brett Cannon
I'm going to start this off with the caveat that
hg.python.org/sandbox/bcannon#bootstrap_importlib is not completely at
feature parity, but getting there shouldn't be hard. There is a FAILING
file that has a list of the tests that are not passing because of importlib
bootstrapping, and a comment as to why (I think) they are failing. But no
switch would ever happen until the test suite passes.

Anyway, to start this conversation I'm going to open with why I think
removing most of the C code in Python/import.c and replacing it with
importlib/_bootstrap.py is a positive thing.

One is maintainability. Antoine mentioned how, if a change occurs, everyone is
going to have to be able to fix code in importlib, and that's the point! I
don't know about the rest of you but I find Python code easier to work with
than C code (and if you don't you might be subscribed to the wrong mailing
list =). I would assume the ability to make changes or to fix bugs will be
a lot easier with importlib than import.c. So maintainability should be
easier when it comes to imports.

Two is APIs. PEP 302 introduced this idea of an API for objects that can
perform imports so that people can control it, enhance it, introspect it,
etc. But as it stands right now, import.c implements none of PEP 302 for
any built-in import mechanism. This mostly stems from positive thing #1 I
just mentioned, but since I was able to do this code from scratch I was
able to design for (and extend) PEP 302 compliance in order to make sure
the entire import system was exposed cleanly. This means it is much easier
now to write a custom importer for quirky syntax, a different storage
mechanism, etc.
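
For example, a minimal meta-path importer now only needs the two-method PEP 302
finder/loader protocol. Here is a sketch (the in-memory SOURCE table and the
class name are invented for illustration; newer Pythons would use find_spec,
but find_module/load_module is the protocol being discussed here):

import sys
import types

SOURCE = {'quirky': 'x = 42\n'}   # pretend storage mechanism

class DictImporter:
    # PEP 302 finder: return a loader (here, ourselves) or None.
    def find_module(self, fullname, path=None):
        return self if fullname in SOURCE else None

    # PEP 302 loader: create, register and execute the module.
    def load_module(self, fullname):
        if fullname in sys.modules:
            return sys.modules[fullname]
        module = types.ModuleType(fullname)
        module.__loader__ = self
        sys.modules[fullname] = module
        try:
            exec(SOURCE[fullname], module.__dict__)
        except BaseException:
            del sys.modules[fullname]   # PEP 302 asks loaders to clean up on failure
            raise
        return module

sys.meta_path.append(DictImporter())
import quirky          # found and executed by DictImporter
print(quirky.x)        # -> 42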

Third is multi-VM support. IronPython, Jython, and PyPy have all said they
would love importlib to become the default import implementation so that
all VMs have the same implementation. Some people have even said they will
use importlib regardless of what CPython does simply to ease their coding
burden, but obviously that still leads to the possibility of subtle
semantic differences that would go away if all VMs used the same
implementation. So switching would lead to one less possible semantic
difference between the various VMs.

So, that is the positives. What are the negatives? Performance, of course.

Now I'm going to be upfront and say I really did not want to have this
performance conversation now as I have done *NO* profiling or analysis of
the algorithms used in importlib in order to tune performance (e.g. the
function that handles case-sensitivity, which is on the critical path for
importing source code, has a platform check which could go away if I
instead had platform-specific versions of the function that were assigned
to a global variable at startup). I also know that people have a bad habit
of latching on to micro-benchmark numbers, especially for something like
import which involves startup or can easily be measured. I mean I wrote
importlib.test.benchmark to help measure performance changes in any
algorithmic changes I might make, but it isn't a real-world benchmark like
what Unladen Swallow gave us (e.g. the two start-up benchmarks that use
real-world apps -- hg and bzr -- aren't available on Python 3 so only
normal_startup and nosite_startup can be used ATM).

IOW I really do not look forward to someone saying importlib is so much
slower at importing a module containing ``pass`` when (a) that never
happens, and (b) most programs do not spend their time importing but
instead doing interesting work.

For instance, right now importlib does ``python -c "import decimal"``
(which, BTW, is the largest module in the stdlib) 25% slower on my machine
with a pydebug build (a non-debug build would probably be in my favor as I
have more Python objects being used in importlib and thus more sanity
checks). But if you do something (very) slightly more interesting like
``python -m calendar`` where there is a slight amount of work then importlib is
currently only 16% slower. So it all depends on how we measure (as usual).
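
For the curious, that kind of whole-process comparison can be reproduced with
something as crude as the following (the helper and the two binary names are
only an illustration, not the exact setup I used):

import subprocess
import time

def best_run(args, repeat=20):
    # Time whole-process runs and keep the best, which filters out most noise.
    best = float('inf')
    for _ in range(repeat):
        start = time.time()
        subprocess.check_call(args)
        best = min(best, time.time() - start)
    return best

old = best_run(['./python-import-c', '-c', 'import decimal'])   # hypothetical builds
new = best_run(['./python-importlib', '-c', 'import decimal'])
print('slowdown: %.0f%%' % ((new / old - 1) * 100))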

So, if there is going to be some baseline performance target I need to hit
to make people happy I would prefer to know what that (real-world)
benchmark is and what the performance target is going to be on a non-debug
build. And if people are not worried about the performance then I'm happy
with that as well. =)


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Barry Warsaw
Brett, thanks for persevering on importlib!  Given how complicated imports are
in Python, I really appreciate you pushing this forward.  I've been knee deep
in both import.c and importlib at various times. ;)

On Feb 07, 2012, at 03:07 PM, Brett Cannon wrote:

One is maintainability. Antoine mentioned how if change occurs everyone is
going to have to be able to fix code  in importlib, and that's the point! I
don't know about the rest of you but I find Python code easier to work with
than C code (and if you don't you might be subscribed to the wrong mailing
list =). I would assume the ability to make changes or to fix bugs will be
a lot easier with importlib than import.c. So maintainability should be
easier when it comes to imports.

I think it's *really* critical that importlib be well-documented.  Not just
its API, but also design documents (what classes are there, and why it's
decomposed that way), descriptions of how to extend and subclass, maybe even
examples for doing some typical hooks.  Maybe even a guided tour or tutorial
for people digging into importlib for the first time.

So, that is the positives. What are the negatives? Performance, of course.

That's okay.  Get it complete, right, and usable first and then unleash the
Pythonic hordes to bang on performance.

IOW I really do not look forward to someone saying importlib is so much
slower at importing a module containing ``pass`` when (a) that never
happens, and (b) most programs do not spend their time importing but
instead doing interesting work.

Identifying the use cases is important here.  For example, even if it were a
lot slower, Mailman wouldn't care (*I* might care because it takes longer to
run my test, but my users wouldn't).  But Bazaar or Mercurial users would care
a lot.

-Barry


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Dirkjan Ochtman
On Tue, Feb 7, 2012 at 21:24, Barry Warsaw ba...@python.org wrote:
 Identifying the use cases are important here.  For example, even if it were a
 lot slower, Mailman wouldn't care (*I* might care because it takes longer to
 run my test, but my users wouldn't).  But Bazaar or Mercurial users would care
 a lot.

Yeah, startup performance getting worse kinda sucks for command-line
apps. And IIRC it's been getting worse over the past few releases...

Anyway, I think there was enough of a python3 port for Mercurial (from
various GSoC students) that you can probably run some of the very
simple commands (like hg parents or hg id), which should be enough for
your purposes, right?

Cheers,

Dirkjan


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Antoine Pitrou
On Tue, 7 Feb 2012 15:07:24 -0500
Brett Cannon br...@python.org wrote:
 
 Now I'm going to be upfront and say I really did not want to have this
 performance conversation now as I have done *NO* profiling or analysis of
 the algorithms used in importlib in order to tune performance (e.g. the
 function that handles case-sensitivity, which is on the critical path for
 importing source code, has a platform check which could go away if I
 instead had platform-specific versions of the function that were assigned
 to a global variable at startup).

From a cursory look, I think you're gonna have to break (special-case)
some abstractions and have some inner loop coded in C for the common
cases.

That said, I think profiling and solving performance issues is critical
*before* integrating this work. It doesn't need to be done by you, but
the python-dev community shouldn't feel strong-armed to solve the issue.

 IOW I really do not look forward to someone saying importlib is so much
 slower at importing a module containing ``pass`` when (a) that never
 happens, and (b) most programs do not spend their time importing but
 instead doing interesting work.

Well, import time is so important that the Mercurial developers have
written an on-demand import mechanism, to reduce the latency of
command-line operations.

But it's not only important for Mercurial and the like. Even if you're
developing a Web app, making imports slower will make restarts slower,
and development more tedious in the first place.

 So, if there is going to be some baseline performance target I need to hit
 to make people happy I would prefer to know what that (real-world)
 benchmark is and what the performance target is going to be on a non-debug
 build.

- No significant slowdown in startup time.

- Within 25% of current performance when importing, say, the struct
  module (Lib/struct.py) from bytecode.
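
A rough sketch of how that second target could be measured (assuming the .pyc
files are already up to date, so the import really happens from bytecode):

import sys
import timeit

def fresh_import():
    # Drop the cached module so a real (bytecode) import happens every time.
    sys.modules.pop('struct', None)
    __import__('struct')

# Best of three runs of 1000 fresh imports each.
print(min(timeit.repeat(fresh_import, number=1000, repeat=3)))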

Regards

Antoine.




Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Paul Moore
On 7 February 2012 20:49, Antoine Pitrou solip...@pitrou.net wrote:
 Well, import time is so important that the Mercurial developers have
 written an on-demand import mechanism, to reduce the latency of
 command-line operations.

One question here, I guess - does the importlib integration do
anything to make writing on-demand import mechanisms easier (I'd
suspect not, but you never know...) If it did, then performance issues
might be somewhat less of a sticking point, as usual depending on use
cases.

Paul.


Re: [Python-Dev] Is this safe enough? Re: [Python-checkins] cpython: _Py_Identifier are always ASCII strings

2012-02-07 Thread Martin v. Löwis
 Does C99 specify the encoding? Can we expect UTF-8?

No, it's implementation-defined. However, that really doesn't matter
much for the macro (it does matter for the Mercurial repository):

The files on disk are mapped, in an implementation-defined manner,
into the source character set. All processing is done there, including
any stringification. Then, for string literals, the source character set
is converted into the execution character set. So for the definition of
the _Py_IDENTIFIER macro, what really matters is the run-time encoding
of the stringified identifiers.

 Python is supposed to work on many platforms ans so support a lot of
 compilers, not only compilers supporting non-ASCII identifiers.

And your point is?

Regards,
Martin


Re: [Python-Dev] Is this safe enough? Re: [Python-checkins] cpython: _Py_Identifier are always ASCII strings

2012-02-07 Thread Martin v. Löwis
Am 07.02.2012 20:10, schrieb Gregory P. Smith:
 Why do we still care about C89?  It is 2012 and we're talking about
 Python 3.  What compiler on what platform that anyone actually cares
 about does not support C99?

As Amaury says: Visual Studio still doesn't support C99. The story is
both funny and sad: In Visual Studio 2002, the release notes included
a comment that they couldn't consider C99 (in 2002), because of lack of
time, and the standard came so quickly. In 2003, they kept this notice.
In VS 2005 (IIRC), they said that there is too little customer demand
for C99 so that they didn't implement it; they recommended to use C++
or C#, anyway. Now C2011 has been published.

Regards,
Martin


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread PJ Eby
On Tue, Feb 7, 2012 at 3:07 PM, Brett Cannon br...@python.org wrote:

 So, if there is going to be some baseline performance target I need to hit
 to make people happy I would prefer to know what that (real-world)
 benchmark is and what the performance target is going to be on a non-debug
 build. And if people are not worried about the performance then I'm happy
 with that as well. =)


One thing I'm a bit worried about is repeated imports, especially ones that
are inside frequently-called functions.  In today's versions of Python,
this is a performance win for command-line tool platform systems like
Mercurial and PEAK, where you want to delay importing as long as possible,
in case the code that needs the import is never called at all...  but, if
it *is* used, you may still need to use it a lot of times.

When writing that kind of code, I usually just unconditionally import
inside the function, because the C code check for an already-imported
module is faster than the Python if statement I'd have to clutter up my
otherwise-clean function with.
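
Concretely, the two styles being compared are roughly these (json is just a
stand-in for whatever expensive module is being deferred):

def handle_request_simple(data):
    import json                  # cheap after the first time: a C-level sys.modules check
    return json.loads(data)

_json = None

def handle_request_guarded(data):
    global _json
    if _json is None:            # the Python-level check I'd rather not write
        import json
        _json = json
    return _json.loads(data)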

So, in addition to the things other people have mentioned as performance
targets, I'd like to keep the slowdown factor low for this type of scenario
as well.  Specifically, the slowdown shouldn't be so much as to motivate
lazy importers like Mercurial and PEAK to need to rewrite in-function
imports to do the already-imported check ourselves.  ;-)

(Disclaimer: I haven't actually seen Mercurial's delayed/dynamic import
code, so I can't say for 100% sure if they'd be affected the same way.)


Re: [Python-Dev] Is this safe enough? Re: [Python-checkins] cpython: _Py_Identifier are always ASCII strings

2012-02-07 Thread Victor Stinner
 I'd rather restore support for allowing UTF-8 source here (I don't think
 that requiring ASCII really improves much), than rename the macro.

Done, I reverted my change.

Victor


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Brett Cannon
On Tue, Feb 7, 2012 at 15:49, Antoine Pitrou solip...@pitrou.net wrote:

 On Tue, 7 Feb 2012 15:07:24 -0500
 Brett Cannon br...@python.org wrote:
 
  Now I'm going to be upfront and say I really did not want to have this
  performance conversation now as I have done *NO* profiling or analysis of
  the algorithms used in importlib in order to tune performance (e.g. the
  function that handles case-sensitivity, which is on the critical path for
  importing source code, has a platform check which could go away if I
  instead had platform-specific versions of the function that were assigned
  to a global variable at startup).

 From a cursory look, I think you're gonna have to break (special-case)
 some abstractions and have some inner loop coded in C for the common
 cases.


Wouldn't shock me if it came to that, but obviously I would like to try to
avoid it.



 That said, I think profiling and solving performance issues is critical
 *before* integrating this work. It doesn't need to be done by you, but
 the python-dev community shouldn't feel strong-armed to solve the issue.


That part of the discussion I'm staying out of since I want to see this in
so I'm biased.


   IOW I really do not look forward to someone saying importlib is so much
  slower at importing a module containing ``pass`` when (a) that never
  happens, and (b) most programs do not spend their time importing but
  instead doing interesting work.

 Well, import time is so important that the Mercurial developers have
 written an on-demand import mechanism, to reduce the latency of
 command-line operations.


Sure, but they are a somewhat extreme case.



 But it's not only important for Mercurial and the like. Even if you're
 developing a Web app, making imports slower will make restarts slower,
 and development more tedious in the first place.


Fine, startup cost from a hard crash I can buy when you are getting 1000
QPS, but development more tedious?


   So, if there is going to be some baseline performance target I need to
 hit
  to make people happy I would prefer to know what that (real-world)
  benchmark is and what the performance target is going to be on a
 non-debug
  build.

 - No significant slowdown in startup time.


What's significant and measuring what exactly? I mean startup already has a
ton of imports as it is, so this would wash out the point of measuring
practically anything else for anything small. This is why I said I want a
benchmark to target which does actual work since flat-out startup time
measures nothing meaningful but busy work. I would get more out of code
that just stat'ed every file in Lib since at least that did some work.



 - Within 25% of current performance when importing, say, the struct
  module (Lib/struct.py) from bytecode.


Why struct? It's such a small module that it isn't really a typical module.
The median file size of Lib is 11K (e.g. tabnanny.py), not 238 bytes (which
is barely past Hello World). And is this just importing struct or is this
from startup, e.g. ``python -c "import struct"``?


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Brett Cannon
On Tue, Feb 7, 2012 at 15:24, Barry Warsaw ba...@python.org wrote:

 Brett, thanks for persevering on importlib!  Given how complicated imports
 are
 in Python, I really appreciate you pushing this forward.  I've been knee
 deep
 in both import.c and importlib at various times. ;)

 On Feb 07, 2012, at 03:07 PM, Brett Cannon wrote:

 One is maintainability. Antoine mentioned how if change occurs everyone is
 going to have to be able to fix code  in importlib, and that's the point!
 I
 don't know about the rest of you but I find Python code easier to work
 with
 than C code (and if you don't you might be subscribed to the wrong mailing
 list =). I would assume the ability to make changes or to fix bugs will be
 a lot easier with importlib than import.c. So maintainability should be
 easier when it comes to imports.

 I think it's *really* critical that importlib be well-documented.  Not just
 its API, but also design documents (what classes are there, and why it's
 decomposed that way), descriptions of how to extend and subclass, maybe
 even
 examples for doing some typical hooks.  Maybe even a guided tour or
 tutorial
 for people digging into importlib for the first time.


That's fine and not difficult to do.



 So, that is the positives. What are the negatives? Performance, of course.

 That's okay.  Get it complete, right, and usable first and then unleash the
 Pythonic hoards to bang on performance.

 IOW I really do not look forward to someone saying importlib is so much
 slower at importing a module containing ``pass`` when (a) that never
 happens, and (b) most programs do not spend their time importing but
 instead doing interesting work.

 Identifying the use cases are important here.  For example, even if it
 were a
 lot slower, Mailman wouldn't care (*I* might care because it takes longer
 to
 run my test, but my users wouldn't).  But Bazaar or Mercurial users would
 care
 a lot.


Right, which is why I'm looking for some agreed upon, concrete benchmark I
can use which isn't fluff.

-Brett



 -Barry



Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Brett Cannon
On Tue, Feb 7, 2012 at 16:19, Paul Moore p.f.mo...@gmail.com wrote:

 On 7 February 2012 20:49, Antoine Pitrou solip...@pitrou.net wrote:
  Well, import time is so important that the Mercurial developers have
  written an on-demand import mechanism, to reduce the latency of
  command-line operations.

 One question here, I guess - does the importlib integration do
 anything to make writing on-demand import mechanisms easier (I'd
 suspect not, but you never know...) If it did, then performance issues
 might be somewhat less of a sticking point, as usual depending on use
 cases.


Depends on what your feature set is. I have a fully working mixin you can
add to any loader which makes it lazy if you trigger the import on reading
an attribute from the module:
http://code.google.com/p/importers/source/browse/importers/lazy.py . But if
you want to trigger the import on *writing* an attribute then I have yet to
make that work in Python source (maybe people have an idea on how to make
that work since __setattr__ doesn't mix well with __getattribute__).
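
(For the read-triggered case, the idea boils down to something like the
following sketch -- not the mixin above, just a stand-alone placeholder that
delegates to the real module once an attribute is touched:)

import importlib
import sys
import types

class _LazyModule(types.ModuleType):
    def __getattribute__(self, attr):
        sget = super().__getattribute__
        real = sget('__dict__').get('_real')
        if real is None:
            name = sget('__name__')
            sys.modules.pop(name, None)           # let the real import run
            real = importlib.import_module(name)
            sget('__dict__')['_real'] = real
        return getattr(real, attr)

def lazy(name):
    return sys.modules.setdefault(name, _LazyModule(name))

struct = lazy('struct')        # nothing is loaded yet
print(struct.calcsize('i'))    # the real import happens on this first read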


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Brett Cannon
On Tue, Feb 7, 2012 at 15:28, Dirkjan Ochtman dirk...@ochtman.nl wrote:

 On Tue, Feb 7, 2012 at 21:24, Barry Warsaw ba...@python.org wrote:
  Identifying the use cases are important here.  For example, even if it
 were a
  lot slower, Mailman wouldn't care (*I* might care because it takes
 longer to
  run my test, but my users wouldn't).  But Bazaar or Mercurial users
 would care
  a lot.

 Yeah, startup performance getting worse kinda sucks for command-line
 apps. And IIRC it's been getting worse over the past few releases...

 Anyway, I think there was enough of a python3 port for Mercurial (from
 various GSoC students) that you can probably run some of the very
 simple commands (like hg parents or hg id), which should be enough for
 your purposes, right?


Possibly. Where is the code?


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Brett Cannon
On Tue, Feb 7, 2012 at 16:51, PJ Eby p...@telecommunity.com wrote:

 On Tue, Feb 7, 2012 at 3:07 PM, Brett Cannon br...@python.org wrote:

 So, if there is going to be some baseline performance target I need to
 hit to make people happy I would prefer to know what that (real-world)
 benchmark is and what the performance target is going to be on a non-debug
 build. And if people are not worried about the performance then I'm happy
 with that as well. =)


 One thing I'm a bit worried about is repeated imports, especially ones
 that are inside frequently-called functions.  In today's versions of
 Python, this is a performance win for command-line tool platform systems
 like Mercurial and PEAK, where you want to delay importing as long as
 possible, in case the code that needs the import is never called at all...
  but, if it *is* used, you may still need to use it a lot of times.

 When writing that kind of code, I usually just unconditionally import
 inside the function, because the C code check for an already-imported
 module is faster than the Python if statement I'd have to clutter up my
 otherwise-clean function with.

 So, in addition to the things other people have mentioned as performance
 targets, I'd like to keep the slowdown factor low for this type of scenario
 as well.  Specifically, the slowdown shouldn't be so much as to motivate
 lazy importers like Mercurial and PEAK to need to rewrite in-function
 imports to do the already-imported check ourselves.  ;-)

 (Disclaimer: I haven't actually seen Mercurial's delayed/dynamic import
 code, so I can't say for 100% sure if they'd be affected the same way.)


IOW you want the sys.modules case fast, which I will never be able to match
compared to C code since that is pure execution with no I/O.


Re: [Python-Dev] which C language standard CPython must conform to

2012-02-07 Thread Gregory P. Smith
On Tue, Feb 7, 2012 at 1:41 PM, Martin v. Löwis mar...@v.loewis.de wrote:
 Am 07.02.2012 20:10, schrieb Gregory P. Smith:
 Why do we still care about C89?  It is 2012 and we're talking about
 Python 3.  What compiler on what platform that anyone actually cares
 about does not support C99?

 As Amaury says: Visual Studio still doesn't support C99. The story is
 both funny and sad: In Visual Studio 2002, the release notes included
 a comment that they couldn't consider C99 (in 2002), because of lack of
 time, and the standard came so quickly. In 2003, they kept this notice.
 In VS 2005 (IIRC), they said that there is too little customer demand
 for C99 so that they didn't implement it; they recommended to use C++
 or C#, anyway. Now C2011 has been published.

Thanks!  I've probably asked this question before.  Maybe I'll learn
this time. ;)

Some quick searching shows that there is at least hope Microsoft is on
board with C++11 (not so surprising, their crown jewels are written
in C++).  We should at some point demand a C++ compiler for CPython
and pick a subset of C++ features to allow the use of, but that is likely
reserved for the Python 4 timeframe (a topic for another thread and
time entirely; it isn't feasible for today's codebase).

In that timeframe another alternative question may make sense to ask:
do we need a single unified all-platform-from-one-codebase Python
interpreter?

If we can get other VM implementations up to date language-feature-wise
and manage to sufficiently decouple standard library development
from CPython itself, that becomes possible.  One of the difficulties
with that would obviously be new language feature development if it
meant updating more than one VM at a time in order to ship an
implementation of a new PEP.

-gps


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Antoine Pitrou
On Tue, 7 Feb 2012 17:24:21 -0500
Brett Cannon br...@python.org wrote:
 
 IOW you want the sys.modules case fast, which I will never be able to match
 compared to C code since that is pure execution with no I/O.

Why wouldn't you continue using C code for that? It's trivial (just a dict
lookup).

Regards

Antoine.




Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Barry Warsaw
On Feb 07, 2012, at 09:19 PM, Paul Moore wrote:

One question here, I guess - does the importlib integration do
anything to make writing on-demand import mechanisms easier (I'd
suspect not, but you never know...) If it did, then performance issues
might be somewhat less of a sticking point, as usual depending on use
cases.

It might even be a feature-win if a standard on-demand import mechanism could
be added on top of importlib so all these projects wouldn't have to roll their
own.

-Barry


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Antoine Pitrou
On Tue, 7 Feb 2012 17:16:18 -0500
Brett Cannon br...@python.org wrote:
 
IOW I really do not look forward to someone saying importlib is so much
   slower at importing a module containing ``pass`` when (a) that never
   happens, and (b) most programs do not spend their time importing but
   instead doing interesting work.
 
  Well, import time is so important that the Mercurial developers have
  written an on-demand import mechanism, to reduce the latency of
  command-line operations.
 
 
 Sure, but they are a somewhat extreme case.

I don't think Mercurial is extreme. Any command-line tool written in
Python applies. For example, yum (Fedora's apt-get) is written in
Python. And I'm sure many people do small administration scripts in
Python. These tools may then be run in a loop by whatever other script.

  But it's not only important for Mercurial and the like. Even if you're
  developing a Web app, making imports slower will make restarts slower,
  and development more tedious in the first place.
 
 
 Fine, startup cost from a hard crash I can buy when you are getting 1000
 QPS, but development more tedious?

Well, waiting several seconds when reloading a development server is
tedious. Anyway, my point was that other cases (than command-line
tools) can be negatively impacted by import time.

So, if there is going to be some baseline performance target I need to
  hit
   to make people happy I would prefer to know what that (real-world)
   benchmark is and what the performance target is going to be on a
  non-debug
   build.
 
  - No significant slowdown in startup time.
 
 
 What's significant and measuring what exactly? I mean startup already has a
 ton of imports as it is, so this would wash out the point of measuring
 practically anything else for anything small.

I don't understand your sentence. Yes, startup has a ton of imports and
that's why I'm fearing it may be negatively impacted :)

(a ton being a bit less than 50 currently)

 This is why I said I want a
 benchmark to target which does actual work since flat-out startup time
 measures nothing meaningful but busy work.

Actual work can be very small in some cases. For example, if you run
hg branch I'm quite sure it doesn't do a lot of work except importing
many modules and then reading a single file in .hg (the one named
.hg/branch probably, but I'm not a Mercurial dev).

In the absence of more real world benchmarks, I think the startup
benchmarks in the benchmarks repo are a good baseline. 

That said you could also install my 3.x port of Twisted here:
https://bitbucket.org/pitrou/t3k/

and then run e.g. python3 bin/trial -h.

 I would get more out of code
 that just stat'ed every file in Lib since at least that did some work.

stat()ing files is not really representative of import work. There are
many indirections in the import machinery.
(actually, even import.c appears quite slower than a bunch of stat()
calls would imply)

  - Within 25% of current performance when importing, say, the struct
   module (Lib/struct.py) from bytecode.
 
 
 Why struct? It's such a small module that it isn't really a typical module.

Precisely to measure the overhead. Typical module size will vary
depending on development style. Some people may prefer writing many
small modules. Or they may be using many small libraries, or using
libraries that have adopted such a development style.

Measuring the overhead on small modules will make sure we aren't overly
confident.

 The median file size of Lib is 11K (e.g. tabnanny.py), not 238 bytes (which
 is barely past Hello World). And is this just importing struct or is this
 from startup, e.g. ``python -c import struct``?

Just importing struct, as with the timeit snippets in the other thread.

Regards

Antoine.


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Alex Gaynor
Brett Cannon brett at python.org writes:


 IOW you want the sys.modules case fast, which I will never be able to match 
compared to C code since that is pure execution with no I/O.
 


Sure you can: have a really fast Python VM.

Constructive: if you can run this code under PyPy it'd be easy to just:

$ pypy -mtimeit "import struct"
$ pypy -mtimeit -s "import importlib" "importlib.import_module('struct')"

Or whatever the right API is.

Alex



Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Terry Reedy

On 2/7/2012 4:51 PM, PJ Eby wrote:


One thing I'm a bit worried about is repeated imports, especially ones
that are inside frequently-called functions.  In today's versions of
Python, this is a performance win for command-line tool platform
systems like Mercurial and PEAK, where you want to delay importing as
long as possible, in case the code that needs the import is never called
at all...  but, if it *is* used, you may still need to use it a lot of
times.

When writing that kind of code, I usually just unconditionally import
inside the function, because the C code check for an already-imported
module is faster than the Python if statement I'd have to clutter up
my otherwise-clean function with.


importlib could provide a parameterized decorator for functions that are 
the only consumers of an import. It could operate much like this:


def imps(mod):
    def makewrap(f):
        def wrapped(*args, **kwds):
            print('first/only call to wrapper')
            g = globals()
            g[mod] = __import__(mod)
            g[f.__name__] = f
            f(*args, **kwds)
        wrapped.__name__ = f.__name__
        return wrapped
    return makewrap

@imps('itertools')
def ic():
    print(itertools.count)

ic()
ic()
#
first/only call to wrapper
<class 'itertools.count'>
<class 'itertools.count'>

--
Terry Jan Reedy



[Python-Dev] Add a new locale codec?

2012-02-07 Thread Victor Stinner
Hi,

I added PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize() and
PyUnicode_EncodeLocale() to Python 3.3 to fix bugs. I hesitate to
expose this codec in Python: it can be useful in some cases,
especially if you need to interact with C functions.

The glib library has functions using the *current* locale encoding,
g_locale_from_utf8() for example.

Related issue with more information:
http://bugs.python.org/issue13619
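
Until such a codec exists, the closest pure-Python spelling of the same idea
is to ask the locale module for the current encoding explicitly (this only
illustrates what the codec would wrap, it is not a proposed API):

import locale

enc = locale.getpreferredencoding(False)    # current locale encoding, without calling setlocale()
raw = b'caf\xc3\xa9'                        # e.g. bytes handed back by some C API
text = raw.decode(enc, 'surrogateescape')   # never fails; undecodable bytes become lone surrogates
back = text.encode(enc, 'surrogateescape')  # round-trips to the original bytes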

Victor


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread PJ Eby
On Tue, Feb 7, 2012 at 5:24 PM, Brett Cannon br...@python.org wrote:


 On Tue, Feb 7, 2012 at 16:51, PJ Eby p...@telecommunity.com wrote:

 On Tue, Feb 7, 2012 at 3:07 PM, Brett Cannon br...@python.org wrote:

 So, if there is going to be some baseline performance target I need to
 hit to make people happy I would prefer to know what that (real-world)
 benchmark is and what the performance target is going to be on a non-debug
 build. And if people are not worried about the performance then I'm happy
 with that as well. =)


 One thing I'm a bit worried about is repeated imports, especially ones
 that are inside frequently-called functions.  In today's versions of
 Python, this is a performance win for command-line tool platform systems
 like Mercurial and PEAK, where you want to delay importing as long as
 possible, in case the code that needs the import is never called at all...
  but, if it *is* used, you may still need to use it a lot of times.

 When writing that kind of code, I usually just unconditionally import
 inside the function, because the C code check for an already-imported
 module is faster than the Python if statement I'd have to clutter up my
 otherwise-clean function with.

 So, in addition to the things other people have mentioned as performance
 targets, I'd like to keep the slowdown factor low for this type of scenario
 as well.  Specifically, the slowdown shouldn't be so much as to motivate
 lazy importers like Mercurial and PEAK to need to rewrite in-function
 imports to do the already-imported check ourselves.  ;-)

 (Disclaimer: I haven't actually seen Mercurial's delayed/dynamic import
 code, so I can't say for 100% sure if they'd be affected the same way.)


 IOW you want the sys.modules case fast, which I will never be able to
 match compared to C code since that is pure execution with no I/O.


Couldn't you just prefix the __import__ function with something like this:

    ...
    try:
        module = sys.modules[name]
    except KeyError:
        # slow code path

(Admittedly, the import lock is still a problem; initially I thought you
could just skip it for this case, but the problem is that another thread
could be in the middle of executing the module.)
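
A self-contained variant of that sketch, wrapping importlib's __import__
rather than editing it (the fast_import name is made up, and as noted above
the import lock / partially-executed-module issue is simply ignored):

import importlib
import sys

def fast_import(name, globals=None, locals=None, fromlist=(), level=0):
    if level == 0 and not fromlist:
        try:
            return sys.modules[name]        # the common, already-imported case
        except KeyError:
            pass
    # Slow code path: the full importlib machinery (finders, loaders, locking).
    return importlib.__import__(name, globals, locals, fromlist, level)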


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread PJ Eby
On Tue, Feb 7, 2012 at 6:40 PM, Terry Reedy tjre...@udel.edu wrote:

 importlib could provide a parameterized decorator for functions that are
 the only consumers of an import. It could operate much like this:

 def imps(mod):
     def makewrap(f):
         def wrapped(*args, **kwds):
             print('first/only call to wrapper')
             g = globals()
             g[mod] = __import__(mod)
             g[f.__name__] = f
             f(*args, **kwds)
         wrapped.__name__ = f.__name__
         return wrapped
     return makewrap

 @imps('itertools')
 def ic():
     print(itertools.count)

 ic()
 ic()
 #
 first/only call to wrapper
 <class 'itertools.count'>
 <class 'itertools.count'>


If I were going to rewrite code, I'd just use lazy imports (see
http://pypi.python.org/pypi/Importing ).  They're even faster than this
approach (or using plain import statements), as they have zero per-call
function call overhead.  It's just that not everything I write can depend
on Importing.

Throw an equivalent into the stdlib, though, and I guess I wouldn't have to
worry about dependencies...

(To be clearer; I'm talking about the
http://peak.telecommunity.com/DevCenter/Importing#lazy-imports feature,
which sticks a dummy module subclass instance into sys.modules, whose
__getattribute__ does a reload() of the module, forcing the normal import
process to run, after first changing the dummy object's type to something
that doesn't have the __getattribute__ any more.  This ensures that all
accesses after the first one are at normal module attribute access speed.
 That, and the whenImported decorator from Importing would probably be of
general stdlib usefulness too.)
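
From memory, that trick reads roughly like this -- a sketch of the idea, not
the actual Importing code, and spelled with importlib.reload rather than the
imp.reload/reload() of the time:

import importlib
import sys
import types

class _ActivatedModule(types.ModuleType):
    pass

class _DummyModule(types.ModuleType):
    def __getattribute__(self, attr):
        # Switch to a type without this hook first, so lookups made while the
        # module body executes don't land back here, then re-run the real
        # import into this very object (reload() keeps the identity stable).
        self.__class__ = _ActivatedModule
        importlib.reload(self)
        return getattr(self, attr)

def lazy_module(name):
    if name not in sys.modules:
        sys.modules[name] = _DummyModule(name)
    return sys.modules[name]

struct = lazy_module('struct')   # placeholder only, nothing imported yet
print(struct.calcsize('i'))      # first access loads the real code into the same object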


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Terry Reedy

On 2/7/2012 9:35 PM, PJ Eby wrote:

On Tue, Feb 7, 2012 at 6:40 PM, Terry Reedy tjre...@udel.edu
mailto:tjre...@udel.edu wrote:

importlib could provide a parameterized decorator for functions that
are the only consumers of an import. It could operate much like this:

def imps(mod):
    def makewrap(f):
        def wrapped(*args, **kwds):
            print('first/only call to wrapper')
            g = globals()
            g[mod] = __import__(mod)
            g[f.__name__] = f
            f(*args, **kwds)
        wrapped.__name__ = f.__name__
        return wrapped
    return makewrap

@imps('itertools')
def ic():
    print(itertools.count)

ic()
ic()
#
first/only call to wrapper
<class 'itertools.count'>
<class 'itertools.count'>


If I were going to rewrite code, I'd just use lazy imports (see
http://pypi.python.org/pypi/Importing ).  They're even faster than this
approach (or using plain import statements), as they have zero per-call
function call overhead.


My code above and Importing, as I understand it, both delay imports 
until needed by using a dummy object that gets replaced at first access. 
(Now that I am reminded, sys.modules is the better place for the dummy 
objects. I just wanted to show that there is a simple solution (though 
more specialized) even for existing code.) The cost of delay, which 
might mean never, is a bit of one-time extra overhead. Both have no 
extra overhead after the first call. Unless delayed importing is made 
standard, both require a bit of extra code somewhere.



 It's just that not everything I write can depend on Importing.
Throw an equivalent into the stdlib, though, and I guess I wouldn't have
to worry about dependencies...


And that is what I think (agree?) should be done to counteract the 
likely slowdown from using importlib.



(To be clearer; I'm talking about the
http://peak.telecommunity.com/DevCenter/Importing#lazy-imports feature,
which sticks a dummy module subclass instance into sys.modules, whose
__getattribute__ does a reload() of the module, forcing the normal
import process to run, after first changing the dummy object's type to
something that doesn't have the __getattribute__ any more.  This ensures
that all accesses after the first one are at normal module attribute
access speed.  That, and the whenImported decorator from Importing
would probably be of general stdlib usefulness too.)


--
Terry Jan Reedy



Re: [Python-Dev] Fixing the XML batteries

2012-02-07 Thread Eli Bendersky
 On one hand I agree that ET should be emphasized since it's the better
 API with a much faster implementation. But I also understand Martin's
 point of view that minidom has its place, so IMHO some sort of
 compromise should be reached. Perhaps we can recommend using ET for
 those not specifically interested in the DOM interface, but for those
 who *are*, minidom is still a good stdlib option (?).


 If you can, go ahead and write a patch saying something like that. It should
 not be hard to come up with something that is a definite improvement. Create
 a tracker issue for comment, but don't let it sit forever.



A tracker issue already exists for this -
http://bugs.python.org/issue11379 - I see no reason to open a new one.
I will add my opinion there - feel free to do that too.

 Since the current policy seems to be to hide C behind Python when there is
 both, I assume that finishing the transition here is something just not
 gotten around to yet. Open another issue if there is not one.


I will open a separate discussion on this.

Eli


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Nick Coghlan
On Wed, Feb 8, 2012 at 12:54 PM, Terry Reedy tjre...@udel.edu wrote:
 On 2/7/2012 9:35 PM, PJ Eby wrote:
  It's just that not everything I write can depend on Importing.
 Throw an equivalent into the stdlib, though, and I guess I wouldn't have
 to worry about dependencies...

 And that is what I think (agree?) should be done to counteract the likely
 slowdown from using importlib.

Yeah, this is one frequently reinvented wheel that could definitely do
with a standard implementation. Christian Heimes made an initial
attempt at such a thing years ago with PEP 369, but an importlib based
__import__ would let the implementation largely be pure Python (with
all the increase in power and flexibility that implies).

I'm not sure such an addition would help much with the base
interpreter start up time though - most of the modules we bring in are
because we're actually using them for some reason.

The other thing that shouldn't be underrated here is the value in
making the builtin import system PEP 302 compliant from a
*documentation* perspective. I've made occasional attempts at fully
documenting the import system over the years, and I always end up
giving up because the combination of the pre-PEP 302 builtin
mechanisms in import.c and the PEP 302 compliant mechanisms for things
like zipimport just degenerate into a mess of special cases that are
impossible to justify beyond nobody got around to fixing this yet.
The fact that we have an undocumented PEP 302 based reimplementation
of imports squirrelled away in pkgutil to make pkgutil and runpy work
is sheer insanity (replacing *that* with importlib might actually be a
good first step towards full integration).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


[Python-Dev] folding cElementTree behind ElementTree in 3.3

2012-02-07 Thread Eli Bendersky
Hello,

Here's a note from What's new in Python 3.0:

A common pattern in Python 2.x is to have one version of a module
implemented in pure Python, with an optional accelerated version
implemented as a C extension; for example, pickle and cPickle. This
places the burden of importing the accelerated version and falling
back on the pure Python version on each user of these modules. In
Python 3.0, the accelerated versions are considered implementation
details of the pure Python versions. Users should always import the
standard version, which attempts to import the accelerated version and
falls back to the pure Python version. The pickle / cPickle pair
received this treatment. The profile module is on the list for 3.1.
The StringIO module has been turned into a class in the io module.
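
The 2.x pattern being talked about is the familiar fallback dance, which 3.0
moved into the pure Python module itself -- roughly:

# What 2.x user code had to write:
try:
    import cPickle as pickle
except ImportError:
    import pickle

# What a 3.x stdlib module does internally instead (a sketch, not the exact pickle.py code):
try:
    from _pickle import *          # C accelerator, if it was built
except ImportError:
    pass                           # fall back to the pure Python definitions above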

Is there a good reason why xml.etree.ElementTree /
xml.etree.cElementTree did not receive this treatment?

In the case of this module, it's quite unfortunate because:

1. The accelerated module is much faster and more memory efficient (see
recent benchmarks here: http://bugs.python.org/issue11379), and XML
processing is an area where performance matters
2. The accelerated module implements the same API
3. It's very hard to even find out about the existence of the
accelerated module. Its sole mention in the docs is this un-emphasized
line in http://docs.python.org/dev/py3k/library/xml.etree.elementtree.html:

A C implementation of this API is available as xml.etree.cElementTree.

Even to an experienced user who carefully reads the whole
documentation it's not easy to notice. For the typical user who just
jumps around to functions/methods he's interested in, it's essentially
invisible.

Eli


Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3

2012-02-07 Thread Nick Coghlan
On Wed, Feb 8, 2012 at 1:59 PM, Eli Bendersky eli...@gmail.com wrote:
 Is there a good reason why xml.etree.ElementTree /
 xml.etree.cElementTree did not receive this treatment?

See PEP 360, which lists Externally Maintained Packages. In the past
we allowed additions to the standard library without requiring that
the standard library version become the master version. These days we
expect python.org to become the master version, perhaps with backports
and experimental features published on PyPI (cf. packaging vs
distutils2, unittest vs unittest2, contextlib vs contextlib2).

ElementTree was one of the last of those externally maintained modules
added to the standard library - as documented in the PEP, it's still
officially maintained by Fredrik Lundh. Folding the two
implementations together in the standard library would mean officially
declaring that xml.etree is now an independently maintained fork of
Fredrik's version rather than just a snapshot in time of a
particular version (which is what it has been historically).

So the reason for keeping these two separate to date isn't technical,
it's because Fredrik publishes them as separate modules.

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] requirements for moving __import__ over to importlib?

2012-02-07 Thread Eric Snow
On Tue, Feb 7, 2012 at 8:47 PM, Nick Coghlan ncogh...@gmail.com wrote:
 On Wed, Feb 8, 2012 at 12:54 PM, Terry Reedy tjre...@udel.edu wrote:
 On 2/7/2012 9:35 PM, PJ Eby wrote:
  It's just that not everything I write can depend on Importing.
 Throw an equivalent into the stdlib, though, and I guess I wouldn't have
 to worry about dependencies...

 And that is what I think (agree?) should be done to counteract the likely
 slowdown from using importlib.

 Yeah, this is one frequently reinvented wheel that could definitely do
 with a standard implementation. Christian Heimes made an initial
 attempt at such a thing years ago with PEP 369, but an importlib based
 __import__ would let the implementation largely be pure Python (with
 all the increase in power and flexibility that implies).

 I'm not sure such an addition would help much with the base
 interpreter start up time though - most of the modules we bring in are
 because we're actually using them for some reason.

 The other thing that shouldn't be underrated here is the value in
 making the builtin import system PEP 302 compliant from a
 *documentation* perspective. I've made occasional attempts at fully
 documenting the import system over the years, and I always end up
 giving up because the combination of the pre-PEP 302 builtin
 mechanisms in import.c and the PEP 302 compliant mechanisms for things
 like zipimport just degenerate into a mess of special cases that are
 impossible to justify beyond "nobody got around to fixing this yet".
 The fact that we have an undocumented PEP 302 based reimplementation
 of imports squirrelled away in pkgutil to make pkgutil and runpy work
 is sheer insanity (replacing *that* with importlib might actually be a
 good first step towards full integration).

+1 on all counts

-eric


Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3

2012-02-07 Thread Fred Drake
On Tue, Feb 7, 2012 at 11:31 PM, Eli Bendersky eli...@gmail.com wrote:
 Besides, in 
 http://mail.python.org/pipermail/python-dev/2011-December/114812.html
 Stefan Behnel said [...] Today, ET is *only* being maintained in the
 stdlib by Florent Xicluna [...]. Is this not true?

I don't know.  I took this to be an observation rather than a declaration
of intent by the package owner (Fredrik Lundh).

 P.S. Would declaring that xml.etree is now independently maintained by
 pydev be a bad thing? Why?

So long as Fredrik owns the package, I think forking it for the standard
library would be a bad thing, though not for technical reasons.  Fredrik
provided his libraries for the standard library in good faith, and we still
list him as the external maintainer.  Until *that* changes, forking would
be inappropriate.  I'd much rather see a discussion with Fredrik about the
future maintenance plan for ElementTree and cElementTree.


  -Fred

-- 
Fred L. Drake, Jr.    fdrake at acm.org
A person who won't read has no advantage over one who can't read.
   --Samuel Langhorne Clemens


Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3

2012-02-07 Thread Eli Bendersky
On Wed, Feb 8, 2012 at 06:41, Fred Drake fdr...@acm.org wrote:
 On Tue, Feb 7, 2012 at 11:31 PM, Eli Bendersky eli...@gmail.com wrote:
 Besides, in 
 http://mail.python.org/pipermail/python-dev/2011-December/114812.html
 Stefan Behnel said [...] Today, ET is *only* being maintained in the
 stdlib by Florent Xicluna [...]. Is this not true?

 I don't know.  I took this to be an observation rather than a declaration
 of intent by the package owner (Fredrik Lundh).

 P.S. Would declaring that xml.etree is now independently maintained by
 pydev be a bad thing? Why?

 So long as Fredrik owns the package, I think forking it for the standard
 library would be a bad thing, though not for technical reasons.  Fredrik
 provided his libraries for the standard library in good faith, and we still
 list him as the external maintainer.  Until *that* changes, forking would
 be inappropriate.  I'd much rather see a discussion with Fredrik about the
 future maintenance plan for ElementTree and cElementTree.


Yes, I realize this is a loaded issue and I agree that all steps in
this direction should be taken with Fredrik's agreement.

However, to re-focus: the initial proposal is to change *the stdlib
import facade* for xml.etree.ElementTree to use the C accelerator
(_elementtree) by default. Will that somehow harm Fredrik's
sovereignty over ET? Are there any other problems hidden here? Because
if not, it appears like a change of only a few lines of code could
provide a significantly better XML processing experience in 3.3 for a
lot of users (and save some keystrokes for the ones who already know
to look for cElementTree).

Eli


Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3

2012-02-07 Thread Brian Curtin
On Tue, Feb 7, 2012 at 22:15, Nick Coghlan ncogh...@gmail.com wrote:
 Folding the two
 implementations together in the standard library would mean officially
 declaring that xml.etree is now an independently maintained fork of
 Fredrik's version rather than just a snapshot in time of a
 particular version (which is what it has been historically).

Is ElementTree even still maintained externally? I seem to remember
Florent going through headaches to get changes into this area, and I
can't find an external repository for this code.


Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3

2012-02-07 Thread Eli Bendersky
On Wed, Feb 8, 2012 at 06:15, Nick Coghlan ncogh...@gmail.com wrote:
 On Wed, Feb 8, 2012 at 1:59 PM, Eli Bendersky eli...@gmail.com wrote:
 Is there a good reason why xml.etree.ElementTree /
 xml.etree.cElementTree did not receive this treatment?

 See PEP 360, which lists Externally Maintained Packages. In the past
 we allowed additions to the standard library without requiring that
 the standard library version become the master version. These days we
 expect python.org to become the master version, perhaps with backports
 and experimental features published on PyPI (cf. packaging vs
 distutils2, unittest vs unittest2, contextlib vs contextlib2).

 ElementTree was one of the last of those externally maintained modules
 added to the standard library - as documented in the PEP, it's still
 officially maintained by Fredrik Lundh. Folding the two
 implementations together in the standard library would mean officially
 declaring that xml.etree is now an independently maintained fork of
 Fredrik's version rather than just a snapshot in time of a
 particular version (which is what it has been historically).

 So the reason for keeping these two separate to date isn't technical,
 it's because Fredrik publishes them as separate modules.


The idea is to import the C module when xml.etree.ElementTree is
imported, falling back to the Python module if that fails for some
reason. So this is not modifying the modules, just the Python stdlib
facade for them.

Besides, in 
http://mail.python.org/pipermail/python-dev/2011-December/114812.html
Stefan Behnel said [...] Today, ET is *only* being maintained in the
stdlib by Florent
Xicluna [...]. Is this not true?

Eli

P.S. Would declaring that xml.etree is now independently maintained by
pydev be a bad thing? Why?


Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3

2012-02-07 Thread Fred Drake
On Tue, Feb 7, 2012 at 11:46 PM, Eli Bendersky eli...@gmail.com wrote:
 The initial proposal of changing *the stdlib
 import facade* for xml.etree.ElementTree to use the C accelerator
 (_elementtree) by default.

I guess this is one source of confusion: what are you referring to as
an "import façade"?  When I look in Lib/xml/etree/, I see the ElementTree,
ElementPath, and ElementInclude modules, and a wrapper for cElementTree's
extension module.

There isn't any sort of façade for ElementTree; are you proposing to add
one, perhaps in xml.etree/__init__.py?


  -Fred

-- 
Fred L. Drake, Jr.    fdrake at acm.org
A person who won't read has no advantage over one who can't read.
   --Samuel Langhorne Clemens


Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3

2012-02-07 Thread Eli Bendersky
On Wed, Feb 8, 2012 at 07:10, Fred Drake fdr...@acm.org wrote:
 On Tue, Feb 7, 2012 at 11:46 PM, Eli Bendersky eli...@gmail.com wrote:
 The initial proposal of changing *the stdlib
 import facade* for xml.etree.ElementTree to use the C accelerator
 (_elementtree) by default.

 I guess this is one source of confusion: what are you referring to as
 an "import façade"?  When I look in Lib/xml/etree/, I see the ElementTree,
 ElementPath, and ElementInclude modules, and a wrapper for cElementTree's
 extension module.

 There isn't any sort of façade for ElementTree; are you proposing to add
 one, perhaps in xml.etree/__init__.py?


AFAICS ElementPath is a helper used by ElementTree, and cElementTree
has one of its own. It's not documented for stand-alone use.
ElementInclude also isn't documented and doesn't appear to be used
anywhere.

The facade can be added to xml/etree/ElementTree.py since that's the
only documented module. It can attempt to do:

from _elementtree import *

(which is what cElementTree.py does), and on failure, just go on doing
what it does now.
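
A minimal sketch of that facade, assuming the accelerator keeps its
current _elementtree name and that an import failure should simply be
ignored:

    # At the end of Lib/xml/etree/ElementTree.py (a sketch, not a patch):
    try:
        # Let the C accelerator override the pure-Python definitions above.
        from _elementtree import *
    except ImportError:
        # No accelerator available; keep the pure-Python implementation.
        pass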

Eli


Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3

2012-02-07 Thread Stefan Behnel
Eli Bendersky, 08.02.2012 07:07:
 On Wed, Feb 8, 2012 at 07:10, Fred Drake wrote:
 On Tue, Feb 7, 2012 at 11:46 PM, Eli Bendersky wrote:
 The initial proposal of changing *the stdlib
 import facade* for xml.etree.ElementTree to use the C accelerator
 (_elementtree) by default.

 I guess this is one source of confusion: what are you referring to as
 an "import façade"?  When I look in Lib/xml/etree/, I see the ElementTree,
 ElementPath, and ElementInclude modules, and a wrapper for cElementTree's
 extension module.

 There isn't any sort of façade for ElementTree; are you proposing to add
 one, perhaps in xml.etree/__init__.py?
 
 
 AFAICS ElementPath is a helper used by ElementTree, and cElementTree
 has one of its own. It's not documented for stand-alone use.
 ElementInclude also isn't documented and doesn't appear to be used
 anywhere.
 
 The facade can be added to xml/etree/ElementTree.py since that's the
 only documented module. It can attempt to do:
 
 from _elementtree import *
 
 (which is what cElementTree.py does), and on failure, just go on doing
 what it does now.

Basically, cElementTree (actually the accelerator module) reuses everything
from ElementTree that it does not implement itself, e.g. the serialiser or
the ElementPath implementation in ElementPath.py (which is not commonly
being used by itself anyway).

ElementInclude is meant to be independently imported by user code and works
with both implementations, although it uses plain ElementTree by default
and currently needs explicit configuring for cElementTree. It looks like
that need would vanish when ElementTree uses the accelerator module internally.

So, ElementTree.py is a superset of cElementTree's C module, and importing
that C module into ElementTree.py instead of only importing it into
cElementTree.py would just make ElementTree.py faster, that's basically it.

Stefan



Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3

2012-02-07 Thread Eli Bendersky
 The facade can be added to xml/etree/ElementTree.py since that's the
 only documented module. It can attempt to do:

 from _elementtree import *

 (which is what cElementTree.py does), and on failure, just go on doing
 what it does now.

 Basically, cElementTree (actually the accelerator module) reuses everything
 from ElementTree that it does not implement itself, e.g. the serialiser or
 the ElementPath implementation in ElementPath.py (which is not commonly
 being used by itself anyway).

 ElementInclude is meant to be independently imported by user code and works
 with both implementations, although it uses plain ElementTree by default
 and currently needs explicit configuring for cElementTree. It looks like
 that need would vanish when ElementTree uses the accelerator module 
 internally.

 So, ElementTree.py is a superset of cElementTree's C module, and importing
 that C module into ElementTree.py instead of only importing it into
 cElementTree.py would just make ElementTree.py faster, that's basically it.


Yep. Any objections from pydev?

Stefan, in the other thread ("... XML batteries") you said you would
contact Fredrik; did you manage to get hold of him?

Eli


Re: [Python-Dev] Add a new locale codec?

2012-02-07 Thread Simon Cross
Is the idea to have:

  b"foo".decode("locale")

be roughly equivalent to

  encoding = locale.getpreferredencoding(False)
  b"foo".decode(encoding)

?
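
If so, a self-contained longhand spelling with today's stdlib would be
something like this (decode_with_locale is just an illustrative name):

    import locale

    def decode_with_locale(data):
        # Ask for the current locale's encoding without re-running setlocale().
        encoding = locale.getpreferredencoding(False)
        return data.decode(encoding)

    print(decode_with_locale(b"foo"))   # 'foo' under any ASCII-compatible locale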


Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3

2012-02-07 Thread Stefan Behnel
Fred Drake, 08.02.2012 05:41:
 On Tue, Feb 7, 2012 at 11:31 PM, Eli Bendersky wrote:
 Besides, in 
 http://mail.python.org/pipermail/python-dev/2011-December/114812.html
 Stefan Behnel said [...] Today, ET is *only* being maintained in the
 stdlib by Florent Xicluna [...]. Is this not true?
 
 I don't know.  I took this to be an observation rather than a declaration
 of intent by the package owner (Fredrik Lundh).

This observation resulted from the fact that Fredrik hasn't updated the
code in his public ElementTree repository(ies) since 2009, i.e. well before
the releases of Python 2.7 and 3.2 that integrated these changes.

https://bitbucket.org/effbot/et-2009-provolone/overview

The integration of ElementTree 1.3 into the standard library was almost
exclusively done by Florent, with some supporting comments by Fredrik. Note
that ElementTree 1.3 has not even been officially released yet, so the only
final public release of it is the one in the standard library. Since then,
Florent has been actively working on bug tickets, most of which have not
received any reaction from Fredrik.

That makes me consider it the reality that today, ET is only being
maintained in the stdlib.


 P.S. Would declaring that xml.etree is now independently maintained by
 pydev be a bad thing? Why?
 
 So long as Fredrik owns the package, I think forking it for the standard
 library would be a bad thing, though not for technical reasons.  Fredrik
 provided his libraries for the standard library in good faith, and we still
 list him as the external maintainer.  Until *that* changes, forking would
 be inappropriate.  I'd much rather see a discussion with Fredrik about the
 future maintenance plan for ElementTree and cElementTree.

I haven't received a response to my e-mails to him since early 2010. Maybe
others will have more luck if they try, but I don't have the impression that
waiting another two years will get us anywhere interesting.

Given that it was two months ago that I started the "Fixing the XML
batteries" thread (and years since I first brought up the topic), it
seems hard enough already to get anyone on python-dev to actually do
something for Python's XML support, instead of just actively
discouraging those who invest time and work into it.

Stefan
