[issue12747] Move devguide into /Docs of cpython repo

2011-08-13 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

Actually if we move the devguide to Doc/ we will have to maintain a copy for 
each branch -- that's the real reason why it's in a separate repo.

So I think it's better to leave the devguide in a separate repo, and keep using 
it to document things that are not strictly dependent on specific Python 
releases.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12747
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12747] Move devguide into cpython repo

2011-08-13 Thread Eric Snow

Eric Snow ericsnowcurren...@gmail.com added the comment:

I suppose it doesn't have to be in Doc/.

--
title: Move devguide into /Docs of cpython repo -> Move devguide into cpython 
repo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12747
___



[issue12747] Move devguide into cpython repo

2011-08-13 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

The possible options I see are:
  1) move it in Doc/ or some other dir -- but we will have
 to maintain it in all the branches;
  2) keep it only in the default branch -- but we will have
 to remove it from the old branch when we cut a release;
  3) make a separate branch for the devguide -- I'm not sure
 this makes even sense and/or if it solves anything;

Also if it's not in Doc/, where would the test.support doc go?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12747
___



[issue12747] Move devguide into cpython repo

2011-08-13 Thread Nick Coghlan

Nick Coghlan ncogh...@gmail.com added the comment:

I'd say the main reason the dev guide is in a separate repo is the historical 
one (i.e. Brett was working on it as a separate repo prior to the hg migration 
and we never merged it).

However, the version independent nature of the material is the main argument 
against merging it into the Docs tree - it's a document about the development 
community around CPython, not a document about CPython itself.

Personally, I'm happy with the resolution in the python-dev thread - tagging 
the test.support docs to keep them out of indices and search results, while 
leaving the dev guide in a separate version independent repo.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12747
___



[issue11866] race condition in threading._newname()

2011-08-13 Thread Peter Saveliev

Peter Saveliev svinota.savel...@gmail.com added the comment:

counter.next() is a C routine and it is atomic from Python's point of view — if 
I understand right.

The test shows that original threading.py leads to a (rare) race here, while 
with counter object there is no race condition.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11866
___



[issue11866] race condition in threading._newname()

2011-08-13 Thread Raymond Hettinger

Raymond Hettinger raymond.hettin...@gmail.com added the comment:

I think the patch is correct.

FWIW, my style is to prebind the next method, making the counter completely 
self-contained (like a closure):

+_counter = itertools.count().next
 def _newname(template="Thread-%d"):
     global _counter
-    _counter = _counter + 1
-    return template % _counter
+    return template % _counter()
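
For readers on Python 3, where iterators no longer expose a .next method, the same idea can be sketched as follows. This is an illustration and not the committed patch: new_name and collect_names are hypothetical names, the counter is started at 1 by choice, and the assumption that advancing itertools.count is atomic rests on CPython's global interpreter lock.

```python
import itertools
import threading

# Prebind the counter's __next__ method so that each call hands out the next
# integer. count(1) makes the first generated name "Thread-1".
_counter = itertools.count(1).__next__


def new_name(template="Thread-%d"):
    """Hypothetical stand-in for threading._newname()."""
    return template % _counter()


def collect_names(n_threads=8, per_thread=1000):
    """Generate names from several threads and return them all."""
    names = []

    def worker():
        for _ in range(per_thread):
            names.append(new_name())  # list.append is also atomic under the GIL

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return names
```

If the counter behaves atomically, every name returned by collect_names() is distinct, which is the property the original read-increment-write sequence could not guarantee.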

--
assignee:  -> rhettinger
nosy: +rhettinger
resolution:  -> accepted

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11866
___



[issue12746] normalization is affected by unicode width

2011-08-13 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
components: +Unicode
nosy: +ezio.melotti

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12746
___



[issue2857] Add java modified utf-8 codec

2011-08-13 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> Python does have other weird encodings like bz2 or rot13.

No, it has no more such weird encodings.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2857
___



[issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a

2011-08-13 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +haypo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12731
___



[issue11513] chained exception/incorrect exception from tarfile.open on a non-existent file

2011-08-13 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 843cd43206b4 by Georg Brandl in branch '3.2':
Fix #11513: wrong exception handling for the case that GzipFile itself raises 
an IOError.
http://hg.python.org/cpython/rev/843cd43206b4
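
A quick way to check the behaviour this issue asks for, on a current Python 3 where IOError is an alias of OSError: opening a missing archive should surface the underlying OSError, not a misleading tarfile.ReadError. The helper name and the temporary path are assumptions made for the illustration.

```python
import os
import tarfile
import tempfile


def open_missing_archive(mode="r:gz"):
    """Try to open a tar archive that does not exist; return the exception."""
    with tempfile.TemporaryDirectory() as tmp:
        missing = os.path.join(tmp, "does-not-exist.tar.gz")
        try:
            tarfile.open(missing, mode).close()
        except Exception as exc:  # handed back to the caller for inspection
            return exc
    return None
```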

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11513
___



[issue11513] chained exception/incorrect exception from tarfile.open on a non-existent file

2011-08-13 Thread Georg Brandl

Georg Brandl ge...@python.org added the comment:

Fixed in 3.2/default.

2.7 has even more primitive error handling; should the gzopen() be adapted to 
the 3.x case?

--
nosy: +georg.brandl

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11513
___



[issue10799] Improve webbrowser (.open) doc and behavior

2011-08-13 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +ezio.melotti

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10799
___



[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-08-13 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +haypo, loewis
stage:  -> needs patch
versions: +Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12737
___



[issue12746] normalization is affected by unicode width

2011-08-13 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +haypo, lemburg

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12746
___



[issue12646] zlib.Decompress.decompress/flush do not raise any exceptions when given truncated input streams

2011-08-13 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset bb6c2d5c811d by Nadeem Vawda in branch 'default':
Issue #12646: Add an 'eof' attribute to zlib.Decompress.
http://hg.python.org/cpython/rev/bb6c2d5c811d
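
A minimal sketch of how the new attribute can be used to detect a truncated stream, assuming Python 3.3 or later. decompress_and_report is a hypothetical helper and the payload is made up for the illustration.

```python
import zlib


def decompress_and_report(blob):
    """Return (output, reached_end_of_stream) for a zlib-compressed blob."""
    d = zlib.decompressobj()
    out = d.decompress(blob) + d.flush()
    # eof is True only if the end-of-stream marker was actually seen.
    return out, d.eof


payload = b"spam and eggs " * 200
complete = zlib.compress(payload)
truncated = complete[:-10]  # drop the tail of the compressed stream
```

Neither call raises for the truncated input, so checking eof afterwards is what tells the two cases apart.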

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12646
___



[issue12669] test_curses skipped on buildbots

2011-08-13 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 4358909ee221 by Nadeem Vawda in branch 'default':
Issue #12669: Fix test_curses so that it can run on the buildbots.
http://hg.python.org/cpython/rev/4358909ee221

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12669
___



[issue12646] zlib.Decompress.decompress/flush do not raise any exceptions when given truncated input streams

2011-08-13 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 65d61ed991d9 by Nadeem Vawda in branch 'default':
Fix incorrect comment in zlib.Decompress.flush().
http://hg.python.org/cpython/rev/65d61ed991d9

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12646
___



[issue12723] Provide an API in tkSimpleDialog for defining custom validation functions

2011-08-13 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

A bit of both, I think.  The current function is actually 'getvalue' and is 
responsible for retrieving the value, validating its type, and converting to 
that type (the current ones do both in the same operation).  It feels to me 
like a cleaner interface to decouple retrieval and validation/conversion, so 
that the validation function gets passed a string and returns the desired type. 
 But in that case, having the string dialog take the validation/coercion 
function makes the name of the askstring function just wrong.

So, I still think the cleaner API is to expose the class and let the 
application subclass to provide the validation function.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12723
___



[issue12646] zlib.Decompress.decompress/flush do not raise any exceptions when given truncated input streams

2011-08-13 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed
type: behavior -> feature request

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12646
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-13 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Tom, note that nobody is arguing that what you are requesting is a bad thing :)

As far as I know, Matthew is the only one currently working on the regex 
support in Python.  (Other developers will commit small fixes if someone 
proposes a patch, but no one that I've seen other than Matthew is working on 
the deeper issues.)  If you want to help out that would be great.

And as far as this particular issue goes, yes the difference between the narrow 
and wide build has been a known issue for a long time, but has become less and 
less ignorable as unicode adoption has grown. Martin's PEP that Matthew 
references is the only proposed fix that I know of.  There is a GSoc project 
working on it, but I have no idea what the status is.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12740] Add struct.Struct.nmemb

2011-08-13 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

As a new feature, this could only go into 3.3.

--
nosy: +r.david.murray
versions:  -Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12740
___



[issue12740] Add struct.Struct.nmemb

2011-08-13 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

I had never heard of nmemb. nmembers would be less cryptic.
The patch needs a versionadded directive in the docs.
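
Without such an attribute, the member count can be derived from the existing API. The sketch below is one workaround, not the proposed patch; member_count is a hypothetical helper and is not part of the struct module.

```python
import struct


def member_count(fmt):
    """Number of values that unpack() returns for the given struct format."""
    s = struct.Struct(fmt)
    # Unpacking a zero-filled buffer of the right size yields one item per
    # member; pad bytes ("x") contribute nothing, and "5s" counts as one item.
    return len(s.unpack(bytes(s.size)))
```

The cost is one throwaway unpack per query, which is the overhead a dedicated attribute would avoid.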

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12740
___



[issue12745] Python2 or Python3 page

2011-08-13 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

It is a wiki page, so you can edit it yourself (you probably need to register, 
though).
If you think your modifications would be too drastic, perhaps you want to 
launch a discussion on the python-dev mailing-list about that page and its 
current contents.

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12745
___



[issue12740] Add struct.Struct.nmemb

2011-08-13 Thread Georg Brandl

Georg Brandl ge...@python.org added the comment:

While we're at it, let's add str.pbrk() ;)

--
nosy: +georg.brandl

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12740
___



[issue12744] inefficient pickling of long integers on 64-bit builds

2011-08-13 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +alexandre.vassalotti

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12744
___



[issue9552] ssl build under Windows always rebuilds OpenSSL

2011-08-13 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
resolution:  -> fixed
stage:  -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9552
___



[issue12740] Add struct.Struct.nmemb

2011-08-13 Thread Raymond Hettinger

Raymond Hettinger raymond.hettin...@gmail.com added the comment:

How about __len__()?

--
nosy: +rhettinger

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12740
___



[issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a

2011-08-13 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> However, because the \wc issues are bigger, Java addressed the tr18 RL1.2a
> issues differently, this time by creating a new compilation flag called
> UNICODE_CHARACTER_CLASSES (with corresponding embedded (?U) regex flag.)
>
> Truth be told, even Perl has secret pattern compilation flags to govern
> this sort of thing (ascii, locale, unicode), but we (well, I) hope you
> never have to use or even notice them.
>
> That too might be a route forward for Python, although I am not quite sure
> how much flexibility and control of your lexical scope you have.  However,
> the from __future__ imports suggest you may have enough to do something
> slick so that only people who ask for it get it, and also importantly that
> they get it all over the place so don't have to add an extra flag or u'...'
> or whatever every single time.

If the current behaviour is buggy or sub-optimal, I think we should
simply fix it (which might be done by replacing re with regex if
someone wants to shepherd its inclusion in the stdlib).
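
For reference, Python 3's re module already has a flag of this kind, though it points the other way from Java's: str patterns are Unicode-aware by default and re.ASCII (inline (?a)) opts out. A small sketch, with a made-up sample string:

```python
import re

text = "na\u00efve caf\u00e9"  # "naïve café"

# Default for str patterns in Python 3: \w follows the Unicode database.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w only matches [a-zA-Z0-9_], so accented letters split words.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
```

This only shows that the flag mechanism exists; it says nothing about whether the default \w definition matches what UTS#18 RL1.2a requires.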

By the way, thanks for the detailed explanations, Tom.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12731
___



[issue12740] Add struct.Struct.nmemb

2011-08-13 Thread Meador Inge

Meador Inge mead...@gmail.com added the comment:

The functionality part of the patch looks reasonable.  However, the 
pseudo-randomization in the unit tests seems like a bad idea.  Say someone is 
adding a new feature X.  Runs the unit tests to find one of them failing.  Then 
runs them again to investigate and they are now passing.  Unit tests should be 
repeatable.
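
One common way to keep randomized inputs while making runs repeatable is to seed a private generator. The sketch below assumes nothing about the actual patch; random_formats, SEED and the list of format codes are made up for the illustration.

```python
import random

SEED = 12740  # fixed and arbitrary; the issue number is used here


def random_formats(count, seed=SEED):
    """Return `count` pseudo-random struct format strings, reproducibly."""
    rng = random.Random(seed)  # private generator; global state is untouched
    codes = "bBhHiIlLqQfd"
    return [
        "<" + "".join(rng.choice(codes) for _ in range(rng.randint(1, 8)))
        for _ in range(count)
    ]
```

Because the seed is fixed, a failing format string shows up again on the next run, which addresses the "fails once, passes on re-run" scenario described above.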

--
nosy: +meador.inge

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12740
___



[issue12744] inefficient pickling of long integers on 64-bit builds

2011-08-13 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 8e824e09924a by Antoine Pitrou in branch 'default':
Issue #12744: Fix inefficient representation of integers
http://hg.python.org/cpython/rev/8e824e09924a

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12744
___



[issue12744] inefficient pickling of long integers on 64-bit builds

2011-08-13 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
resolution:  -> fixed
stage: needs patch -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12744
___



[issue11241] ctypes: subclassing an already subclassed ArrayType generates AttributeError

2011-08-13 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc amaur...@gmail.com added the comment:

Yes, the patch looks good!

--
resolution:  -> accepted

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11241
___



[issue12659] Add tests for packaging.tests.support

2011-08-13 Thread Francisco Martín Brugué

Francisco Martín Brugué franci...@email.de added the comment:

I've started with tests for “fake_dec” and “TempdirManager”. Please let me know 
if that is in line with what you want.

Thanks in advance

Francis

--
keywords: +patch
nosy: +francismb
Added file: http://bugs.python.org/file22895/issue12659_v1.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12659
___



[issue12747] Move devguide into cpython repo

2011-08-13 Thread Eric Snow

Eric Snow ericsnowcurren...@gmail.com added the comment:

That's fine.  The discussion had moved away from the devguide, so I figured it 
would be worth following up.  You guys have made some good points.

--
resolution:  -> rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12747
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-13 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

David Murray rep...@bugs.python.org wrote:

> Tom, note that nobody is arguing that what you are requesting is a bad
> thing :)

There looked to be some minor resistance, based on absolute backwards
compatibility even if wrong, regarding changing anything *at all* in re,
even things that to my jaded eye seem like actual bugs.

There are bugs, and then there are bugs.

In my survey of Unicode support across 7 programming languages for OSCON

http://training.perl.com/OSCON/index.html

I came across a lot of weirdnesses, especially at first when the learning
curve was high.  Sure, I found it odd that unlike Java, Perl, and Ruby,
Python didn't offer regular casemapping on strings, only the simple
character-based mapping.  But that doesn't make it a bug, which is why
I filed it as a feature/enhancement request/wish, not as a bug.

I always count as bugs not handling Unicode text the way Unicode says
it must be handled.  Such things would be:

Emitting CESU-8 when told to emit UTF-8.

Violating the rule that UTF-8 must be in the shortest possible encoding.

Not treating a code point as a letter when the supported version of the
UCD says it is.  (This can happen if internal rules get out of sync
with the current UCD.)

Claiming one does the expected thing on Unicode for case-insensitive
matches when not doing what Unicode says you must minimally do: use at
least the simple casefolds, if not in fact the full ones.

Saying \w matches Unicode word characters when one's definition of
word characters differs from that of the supported version of the UCD.
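
The first two items in this list can be checked directly against a current Python 3, whose strict UTF-8 decoder is expected to reject both forms. A small sketch; decodes_as_utf8 is a hypothetical helper and the byte strings are hand-built examples.

```python
def decodes_as_utf8(raw):
    """Return True if `raw` is accepted by Python's strict UTF-8 decoder."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# U+0000 encoded in two bytes: violates the shortest-form rule.
overlong_nul = b"\xc0\x80"

# U+10000 written as two 3-byte surrogates, which is CESU-8, not UTF-8.
cesu8_u10000 = b"\xed\xa0\x80\xed\xb0\x80"

# The same code point in well-formed UTF-8 (four bytes).
proper_u10000 = "\U00010000".encode("utf-8")
```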

Supporting Unicode vX.Y.Z is more than adding more characters.  All the
behaviors specified in the UCD have to be updated too, or else you are just
ISO 10646.  I believe some of Python's Unicode bugs happened because folks
weren't aware which things in Python were defined by the UCD or by various
UTS reports yet were not being directly tracked that way.  That's why it's
important to always fully state which version of these things you follow.

Other bugs, many actually, are a result of the narrow/wide-build untransparency.

There is wiggle room in some of these.  Take, for example, the one that
applies to re: you could -- in a sense -- remove the bug by no longer
claiming to do case-insensitive matches on Unicode.  I do not find that very
useful. Javascript works this way: it doesn't do Unicode casefolding.  Java you
have to ask nicely with the extra UNICODE_CASE flag, aka (?u), used with the
CASE_INSENSITIVE, aka (?i).
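
The gap between simple and full casefolding can be seen on a current Python 3 (str.casefold needs 3.3 or later). The sketch below uses a made-up word pair; it illustrates the distinction and makes no claim about what re is documented to do.

```python
import re

lower = "stra\u00dfe"  # "straße"
upper = "STRASSE"

# Full case folding maps U+00DF to "ss", so the folded strings compare equal.
full_fold_equal = lower.casefold() == upper.casefold()

# re.IGNORECASE compares character by character, so a six-character pattern
# cannot match a seven-character string in full.
regex_match = re.fullmatch(lower, upper, flags=re.IGNORECASE)
```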

Sometimes languages provide different but equivalent interfaces to the same
functionality.  For example, you may not support the Unicode property
\p{NAME=foobar} in patterns but instead support \N{foobar} in patterns and
hopefully also in strings.  That's just fine.  On slightly shakier ground but
still I think defensible is how one approaches support for the standard UCD
properties:

    Case_Folding         Simple_Case_Folding
    Titlecase_Mapping    Simple_Titlecase_Mapping
    Uppercase_Mapping    Simple_Uppercase_Mapping
    Lowercase_Mapping    Simple_Lowercase_Mapping

One can support folding, for example, via (?i) and not have to
directly support a Case_Folding property like \p{Case_Folding=s},
since (?i)s should be the same thing as \p{Case_Folding=s}.

> As far as I know, Matthew is the only one currently working on the
> regex support in Python.  (Other developers will commit small fixes if
> someone proposes a patch, but no one that I've seen other than Matthew
> is working on the deeper issues.)  If you want to help out that would
> be great.

Yes, I actually would.  At least as I find time for it.  I'm a competent C
programmer and Matthew's C code is very well documented, but that's very
time consuming.  For bang-for-buck, I do best on test and doc work, making
sure things are actually working the way they say they do.

I was pretty surprised and disappointed by how much trouble I had with
Unicode work in Python.  A bit of that is learning curve, a bit of it is
suboptimal defaults, but quite a bit of it is that things either don't work
the way Unicode says, or because something is altogether missing.  I'd like
to help at least make the Python documentation clearer about what it is
or is not doing in this regard.

But be warned: one reason that Java 1.7 handles Unicode more according to
the published Unicode Standard in its Character, String, and Pattern
classes is because when they said they'd be supporting Unicode 6.0.0,
I went through those classes and every time I found something in violation
of that Standard, I filed a bug report that included a documentation patch
explaining what they weren't doing right.  Rather than apply my rather
embarrassing doc patches, they instead fixed the code. :)

> And as far as this particular issue goes, yes the difference between
> the narrow and wide build has been a known issue for a long time, but
 

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-13 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> Here's why I say that Python uses UTF-16 not UCS-2 on its narrow builds.
> Perhaps someone could tell me why the Python documentation says it uses
> UCS-2 on a narrow build.

There's a disagreement on that point between several developers. See an example 
sub-thread at:
http://mail.python.org/pipermail/python-dev/2010-November/105751.html

> Since you are already using a variable-width encoding, why the
> supercilious attitude toward UTF-8?

I think you are reading too much into these decisions. It's simply that no-one 
took the time to write an alternative implementation and demonstrate its 
superiority. I also believe the original implementation was UCS-2 and surrogate 
support was added progressively during the years. Hence the terminological mess 
and the ad-hoc semantics.
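
The distinction at stake can be illustrated on a current Python 3 (3.3 or later, where PEP 393 removed the narrow/wide split). A code point outside the BMP is one string element there, but it takes two 16-bit units in UTF-16, and strict UCS-2 cannot represent it at all. The variable names are made up for the sketch.

```python
ch = "\U0001F40D"  # a code point outside the Basic Multilingual Plane

utf16 = ch.encode("utf-16-le")

code_points = len(ch)          # elements seen by Python 3.3+ string indexing
utf16_units = len(utf16) // 2  # 16-bit units needed: a surrogate pair
utf8_bytes = len(ch.encode("utf-8"))

# Split the UTF-16 bytes into the two surrogate code units.
high = int.from_bytes(utf16[0:2], "little")
low = int.from_bytes(utf16[2:4], "little")
```

On the narrow builds discussed here, len() counted the two code units and not the single code point, which is why the UCS-2 versus UTF-16 label matters.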

I agree that going with UTF-8 and a clever indexing scheme would be a better 
solution.

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-13 Thread Matthew Barnett

Matthew Barnett pyt...@mrabarnett.plus.com added the comment:

There are occasions when you want to do string slicing, often of the form:

pos = my_str.index(x)
endpos = my_str.index(y)
substring = my_str[pos : endpos]

To me that suggests that if UTF-8 is used then it may be worth profiling to see 
whether caching the last 2 positions would be beneficial.
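
A rough sketch of what such a cache might look like, written in pure Python over a UTF-8 byte string. It remembers a single (character index, byte offset) pair, not two, and CachedUtf8 is a hypothetical class, not anything in CPython.

```python
class CachedUtf8:
    """UTF-8 bytes plus a cache of the most recently resolved position."""

    def __init__(self, text):
        self.data = text.encode("utf-8")
        self._last = (0, 0)  # (character index, byte offset)

    def _offset(self, index):
        """Byte offset of character `index`, scanning from the cached spot."""
        char_i, byte_i = self._last
        if index < char_i:
            char_i, byte_i = 0, 0  # cache is past the target; restart
        while char_i < index:
            byte_i += 1
            # Skip continuation bytes (10xxxxxx) of the current character.
            while byte_i < len(self.data) and self.data[byte_i] & 0xC0 == 0x80:
                byte_i += 1
            char_i += 1
        self._last = (char_i, byte_i)
        return byte_i

    def slice(self, start, stop):
        """Equivalent of text[start:stop] for 0 <= start <= stop."""
        a = self._offset(start)
        b = self._offset(stop)  # resumes from `a` instead of rescanning
        return self.data[a:b].decode("utf-8")
```

For the index-then-slice pattern above, the second lookup only scans the characters between pos and endpos. Whether that pays for itself is exactly what profiling would have to show.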

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-13 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> There are occasions when you want to do string slicing, often of the form:
>
> pos = my_str.index(x)
> endpos = my_str.index(y)
> substring = my_str[pos : endpos]
>
> To me that suggests that if UTF-8 is used then it may be worth
> profiling to see whether caching the last 2 positions would be
> beneficial.

And/or a lookup table giving the byte offset of, say, every 16th
character. It gives you a O(1) lookup with a relatively reasonable
constant cost (you have to scan for less than 16 characters after the
lookup).

On small strings (< 256 UTF-8 bytes) the space overhead for the lookup
table would be 1/16. It could also be constructed lazily whenever more
than 2 positions are cached.
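
The lookup-table idea can be sketched in pure Python as follows. This is an illustration of the scheme only, not a proposal for the C implementation; the function names and the eager construction of the table are assumptions.

```python
STRIDE = 16  # one table entry per 16 characters, as suggested above


def build_offsets(data, stride=STRIDE):
    """Byte offset of every `stride`-th character in UTF-8 bytes `data`."""
    offsets = []
    count = 0
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte: a character starts
            if count % stride == 0:
                offsets.append(pos)
            count += 1
    return offsets


def byte_offset(data, offsets, index, stride=STRIDE):
    """Byte offset of character `index`: table jump plus a short scan."""
    pos = offsets[index // stride]
    for _ in range(index % stride):  # fewer than `stride` characters to skip
        pos += 1
        while pos < len(data) and data[pos] & 0xC0 == 0x80:
            pos += 1
    return pos


def char_at(data, offsets, index, stride=STRIDE):
    """Decode the single character at position `index`."""
    start = byte_offset(data, offsets, index, stride)
    end = start + 1
    while end < len(data) and data[end] & 0xC0 == 0x80:
        end += 1
    return data[start:end].decode("utf-8")
```

Each lookup costs one table access plus a scan over at most 15 characters, which is the bounded constant described above.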

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12740] Add struct.Struct.nmemb

2011-08-13 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
assignee:  -> rhettinger
priority: normal -> low

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12740
___



[issue12744] inefficient pickling of long integers on 64-bit builds

2011-08-13 Thread Raymond Hettinger

Raymond Hettinger raymond.hettin...@gmail.com added the comment:

Nice.

--
nosy: +rhettinger

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12744
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-13 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Matthew Barnett rep...@bugs.python.org wrote
   on Sat, 13 Aug 2011 20:57:40 -: 

> There are occasions when you want to do string slicing, often of the form:
>
>   pos = my_str.index(x)
>   endpos = my_str.index(y)
>   substring = my_str[pos : endpos]

Me, I would probably give the second call to index the first  
index position to guarantee the end comes after the start:

    str  = "for finding the biggest of all the strings"
    x_at = str.index("big")
    y_at = str.index("the", x_at)
    some = str[x_at:y_at]
    print("GOT", some)

But here's a serious question: is that *actually* a common usage pattern
for accessing strings in Python?  I ask because it wouldn't even *occur* to
me to go at such a problem in that way.  I would have always just written
it this way instead:

    import re
    str  = "for finding the biggest of all the strings"
    some = re.search("(big.*?)the", str).group(1)
    print("GOT", some)

I know I would use the pattern approach, just because that's 
how I always do such things in Perl:

    $str  = "for finding the biggest of all the strings";
    ($some) = $str =~ /(big.*?)the/;
    print "GOT $some\n";

Which is obviously a *whole* lot simpler than the index approach:

    $str  = "for finding the biggest of all the strings";
    $x_at = index($str, "big");
    $y_at = index($str, "the", $x_at);
    $len  = $y_at - $x_at;
    $some = substr($str, $x_at, $len);
    print "GOT $some\n";

With no arithmetic and no need for temporary variables (you can't really
escape needing x_at to pass to the second call to index), it's all a
lot more WYSIWYG.  See how much easier that is?

Sure, it's a bit cleaner and less noisy in Perl than it is in Python by
virtue of Perl's integrated pattern matching, but I would still use
patterns in Python for this, not index.  

I honestly find the equivalent pattern operations a lot easier to read and write
and maintain than I find the index/substring version.  It's a visual thing.  
I find patterns a win in maintainability over all that busy index monkeywork.  
The index/rindex and substring approach is one I almost never ever turn to.
I bet I use pattern matching 100 or 500 times for each time I use index, and
maybe even more.

I happen to think in patterns.  I don't expect other people to do so.  But
because of this, I usually end up picking patterns even if they might be a
little bit slower, because I think the gain in flexibility and especially
maintainability more than makes up for any minor performance concerns.

This might also show you why patterns are so important to me: they're one
of the most important tools we have for processing text.  Index isn't,
which is why I really don't care about whether it has O(1) access.  

> To me that suggests that if UTF-8 is used then it may be worth
> profiling to see whether caching the last 2 positions would be
> beneficial.

Notice how with the pattern approach, which is inherently sequential, you don't
have all that concern about running over the string more than once.  Once you
have the first piece (here, "big"), you proceed directly from there looking for
the second piece in a straightforward, WYSIWYG way.  There is no need to keep an
extra index or even two around on the string structure itself, going at it this
way.

I would be pretty surprised if Perl could gain any speed by caching a pair of
MRU index values against its UTF-8 [but see footnote], because again, I think
the normal access pattern wouldn't make use of them.  Maybe Python programmers
don't think of strings the same way, though.  That, I really couldn't tell you.

But here's something to think about:

If it *is* true that you guys do all this index stuff that Perl programmers
just never see or do because of our differing comfort levels with regexes,
and so you think that Python might still benefit from that sort of caching
because its culture has promoted a different access pattern, then that caching
benefit would still apply even if you were to retain the current UTF-16
representation instead of going to UTF-8 (which might want it) or to UTF-32
(which wouldn't).

After all, you have the same variable-width caching issue with UTF-16 as with
UTF-8, so if it makes sense to have an MRU cache mapping character indices to
byte indices, then it doesn't matter whether you use UTF-8 or UTF-16!

However, I'd want some passive comparative benchmarks using real programs with
real data, because I would be suspicious of incurring the memory cost of two
more pointers in every string in the whole program.  That's serious.

--tom

FOOTNOTE: The Perl 6 people are thinking about clever ways to set up byte
  offset indices.  You have to do this if you want O(1) access to the
  Nth element for elements that are not simple code points even if you
  use UTF-32.  That's because they want the default string element to be
  a user visible grapheme, not a code point.  I know they have clever
  

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-13 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Antoine Pitrou rep...@bugs.python.org wrote
   on Sat, 13 Aug 2011 21:09:52 -: 

> And/or a lookup table giving the byte offset of, say, every 16th
> character. It gives you a O(1) lookup with a relatively reasonable
> constant cost (you have to scan for less than 16 characters after the
> lookup).
>
> On small strings (< 256 UTF-8 bytes) the space overhead for the lookup
> table would be 1/16. It could also be constructed lazily whenever more
> than 2 positions are cached.
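
A minimal sketch of the lookup-table idea quoted above, assuming a UTF-8 byte buffer plus a table holding the byte offset of every 16th character. The class `IndexedUTF8`, its `STRIDE` constant, and the `char_at` method are hypothetical names for illustration; the table is built eagerly here rather than lazily, and this is not how CPython or Perl actually stores strings.

```python
class IndexedUTF8:
    """UTF-8 bytes plus the byte offset of every STRIDE-th character."""

    STRIDE = 16  # assumed spacing, taken from the suggestion above

    def __init__(self, text):
        self.data = text.encode("utf-8")
        self.offsets = []
        char_index = 0
        for byte_pos, byte in enumerate(self.data):
            # Continuation bytes look like 0b10xxxxxx; anything else
            # starts a new code point.
            if byte & 0xC0 != 0x80:
                if char_index % self.STRIDE == 0:
                    self.offsets.append(byte_pos)
                char_index += 1
        self.length = char_index

    def char_at(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # Jump to the nearest recorded offset at or before the target...
        pos = self.offsets[index // self.STRIDE]
        # ...then step over at most STRIDE - 1 characters.
        for _ in range(index % self.STRIDE):
            pos += 1
            while self.data[pos] & 0xC0 == 0x80:
                pos += 1
        end = pos + 1
        while end < len(self.data) and self.data[end] & 0xC0 == 0x80:
            end += 1
        return self.data[pos:end].decode("utf-8")


sample = IndexedUTF8("na\u00efve caf\u00e9, and then some more text")
print(sample.char_at(2), sample.char_at(20))
```

The lookup cost is bounded by the stride rather than by the string length, at the price of one stored offset per 16 characters.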

You really should talk to the Perl 6 people to see whether their current
strategy for caching offset maps for grapheme positions might be of use to
you.  Larry explained it to me once but I no longer recall any details.

I notice though that they don't seem to think it worth doing for UTF-8 
or UTF-16, just for their synthetic NFG (Grapheme Normalization Form)
strings, where it would be needed even if they used UTF-32 underneath.

--tom

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-13 Thread Matthew Barnett

Matthew Barnett pyt...@mrabarnett.plus.com added the comment:

You're right about starting the second search from where the first finished. 
Caching the position would be an advantage there.

The memory cost of extra pointers wouldn't be so bad if UTF-8 took less space 
than the current format.

Regex isn't used as much as in Perl. BTW, the current re module was introduced 
in Python 1.5, the previous regex and regsub modules being removed in Python 
2.5.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-13 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

>> Here's why I say that Python uses UTF-16 not UCS-2 on its narrow builds.
>> Perhaps someone could tell me why the Python documentation says it uses
>> UCS-2 on a narrow build.

> There's a disagreement on that point between several developers.
> See an example sub-thread at:
>
>    http://mail.python.org/pipermail/python-dev/2010-November/105751.html

Some of those folks know what they're talking about, and some do not.

Most of the postings miss the mark.

Python uses UTF-16 for its narrow builds.  It does not use UCS-2.

The argument that it must be UCS-2 because it can store lone surrogates
in memory is spurious.

You have to read The Unicode Standard very *very* closely, but it is not
necessary that all internal buffers always be in well-formed UTF-whatever.
Otherwise it would be impossible to append a code unit at a time to a buffer.
I could pull out the reference if I worked at it, because I've had to find
it before.  It's in there.  Trust me.  I know.

It is also spurious to pretend that because you can produce illegal output
when telling it to generate something in UTF-16 that it is somehow not using
UTF-16.  You have simply made a mistake.  You have generated something that
you have promised you would not generate.  I have more to say about this below.

Finally, it is spurious to argue against UTF-16 because of the code unit
interface.  Java does exactly the same thing as Python does *in all regards*
here, and no one pretends that Java is UCS-2.  Both are UTF-16.

It is simply a design error to pretend that the number of characters
is the number of code units instead of code points.  A terrible and
ugly one, but it does not mean you are UCS-2.

You are not.  Python uses UTF-16 on narrow builds.  

The ugly terrible design error is disgusting and wrong, just as much in
Python as in Java, and perhaps more so because of the idiocy of narrow
builds even existing.  But that doesn't make them UCS-2.

If I could wave a magic wand, I would have Python undo its code unit
blunder and go back to code points, no matter what.  That means to stop
talking about serialization schemes and start talking about logical code
points.  It means that slicing and index and length and everything only
report true code points.  This horrible code unit botch from narrow builds
is most easily cured by moving to wide builds only.

However, there is more.

I haven't checked its UTF-16 codecs, but Python's UTF-8 codec is broken
in a bunch of ways.  You should be raising an exception in all kinds of
places and you aren't.  I can see I need to bug report this stuff too.
I don't want to be mean about this.  HONEST!  It's just the way it is.

Unicode currently reserves 66 code points as noncharacters, which it 
guarantees will never be in a legal UTF-anything stream.  I am not talking 
about surrogates, either.

To start with, no code point which when bitwise ANDed with 0xFFFE returns
0xFFFE can ever appear in a valid UTF-* stream, but Python allows this
without any error.

That means that both 0xNN_FFFE and 0xNN_FFFF are illegal in all planes,
where NN is 00 through 10 in hex.  So that's 2 noncharacters times 17
planes = 34 code points illegal for interchange that Python is passing
through illegally.

The remaining 32 nonsurrogate code points illegal for open interchange
are 0xFDD0 through 0xFDEF.  Those are not allowed either, but Python
doesn't seem to care.
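
The arithmetic above can be checked with a short Python sketch. The function name `is_noncharacter` is an illustrative helper, not part of any Python library; the two ranges are the ones given in the message.

```python
def is_noncharacter(code_point):
    """Return True for code points that Unicode designates as noncharacters."""
    # The last two code points of every plane: U+nnFFFE and U+nnFFFF.
    if (code_point & 0xFFFE) == 0xFFFE:
        return True
    # The contiguous block U+FDD0..U+FDEF in the BMP.
    return 0xFDD0 <= code_point <= 0xFDEF


# Scan all 17 planes (U+0000 through U+10FFFF).
noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]
print(len(noncharacters))
```

The scan counts 2 code points in each of 17 planes plus the 32 in the U+FDD0..U+FDEF block, matching the total of 66 stated above.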

You simply cannot say you are generating UTF-8 and then generate a byte
sequence that UTF-8 guarantees can never occur.  This is a violation.

***SIGH***

--tom

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12725] Docs: Odd phrase floating seconds in socket.html

2011-08-13 Thread Ben Hayden

Ben Hayden hayden...@gmail.com added the comment:

I made the suggested second change, both in the docs and in the socketmodule.c
file. If there's a different way to patch documentation, someone let me know. :D

--
keywords: +patch
nosy: +beardedp
Added file: http://bugs.python.org/file22896/issue12725.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12725
___



[issue12748] IDLE halts on osx when copy and paste

2011-08-13 Thread hy

New submission from hy hoyeung...@gmail.com:

IDLE halts on OS X whenever I copy and paste.
I tried on 10.6.8 and 10.7.
For now I can only use IDLE on Windows in VMware.

--
assignee: ronaldoussoren
components: IDLE, Macintosh
messages: 142046
nosy: hoyeung, ronaldoussoren
priority: normal
severity: normal
status: open
title: IDLE halts on osx when copy and paste
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12748
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-13 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

> It is simply a design error to pretend that the number of characters
> is the number of code units instead of code points.  A terrible and
> ugly one, but it does not mean you are UCS-2.

If you are referring to the value returned by len(unicode_string), it is the
number of code units on a narrow build.  This is a matter of "practicality
beats purity".  Returning the number of code units is O(1) (num_of_bytes/2).
To calculate the number of characters it is instead necessary to scan the whole
string looking for surrogates and then count each surrogate pair as 1
character.  It was therefore decided that it was not worth slowing down the
common case just to be 100% accurate in the uncommon case.
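
The difference between the two counts can be sketched in Python as follows. The helpers `utf16_code_units` and `count_code_points` are hypothetical names for illustration; the sketch works on an explicit list of UTF-16 code units, so it behaves the same regardless of how the interpreter stores strings internally.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def count_code_points(units):
    """Count code points by scanning; a surrogate pair counts once."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        i += 2 if (is_high and has_low) else 1
        count += 1
    return count


units = utf16_code_units("a\U0001D4B3b")  # one non-BMP character
print(len(units), count_code_points(units))
```

Taking `len(units)` needs no scan at all, while `count_code_points` has to visit every code unit, which is the trade-off described above.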

That said it would be nice to have an API (maybe in unicodedata or as new str 
methods?) able to return the number of code units, code points, graphemes, etc, 
but I'm not sure that it should be the default behavior of len().

> The ugly terrible design error is disgusting and wrong, just as much
> in Python as in Java, and perhaps more so because of the idiocy of
> narrow builds even existing.

Again, wide builds use twice as much space as narrow ones, but on the
other hand you get fast and correct behavior with e.g. len().  If people
don't care about or don't need non-BMP chars and would rather use less
space, they can do so.  Until we agree that the difference in space used/speed
is no longer relevant and/or that non-BMP characters have become common enough
to prefer the correct behavior over the fast-but-inaccurate one, we will
probably keep both.

> I haven't checked its UTF-16 codecs, but Python's UTF-8 codec is
> broken in a bunch of ways.  You should be raising an exception in
> all kinds of places and you aren't.

I am aware of some problems with the UTF-8 codec on Python 2.  It used to follow
RFC 2279 until last year and has now been updated to follow RFC 3629.
However, for backward compatibility, it still encodes/decodes surrogate pairs.
This broken behavior has been kept because on Python 2 you can encode every
code point with UTF-8 and decode it back without errors:
>>> x = [unichr(c).encode('utf-8') for c in range(0x110000)]

and breaking this invariant would probably do more harm than good.  I
proposed adding a real utf-8 codec on Python 2, but no one seems to care
enough about it.

Also note that this is fixed in Python 3:
>>> x = [chr(c).encode('utf-8') for c in range(0x110000)]
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position
0: surrogates not allowed
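
A minimal Python 3 sketch of the behavior described above, using only the standard `str.encode` method. The `surrogatepass` error handler is a standard Python 3 codec option that explicitly opts back in to the permissive behavior; the variable names are illustrative.

```python
lone_surrogate = "\ud800"

# The strict default refuses to encode a lone surrogate.
try:
    lone_surrogate.encode("utf-8")
    rejected = False
except UnicodeEncodeError:
    rejected = True

# Opting in explicitly reproduces the old pass-through encoding.
passed_through = lone_surrogate.encode("utf-8", "surrogatepass")

print(rejected, passed_through)
```

Keeping the permissive path behind a named error handler means invalid output can only be produced on request, not by default.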

> I can see I need to bug report this stuff too.

If you find other places where it's broken (both on Python 2 and/or Python 3), 
please do and feel free to add me to the nosy.  If you can also provide a 
failing test case and/or point to the relevant parts of the Unicode standard, 
it would be great.

--
nosy: +ezio.melotti

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12748] IDLE halts on osx when copy and paste

2011-08-13 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

Can you specify what version of Python you are using, how you copy/paste
(e.g. ctrl+c/v, from the menu), and whether it halts regardless of what you
copy/paste?

--
nosy: +ezio.melotti

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12748
___



[issue12725] Docs: Odd phrase floating seconds in socket.html

2011-08-13 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset dfe6f0a603d2 by Ezio Melotti in branch '2.7':
#12725: fix working. Patch by Ben Hayden.
http://hg.python.org/cpython/rev/dfe6f0a603d2

New changeset ab3432a81c26 by Ezio Melotti in branch '3.2':
#12725: fix working. Patch by Ben Hayden.
http://hg.python.org/cpython/rev/ab3432a81c26

New changeset 49e9e34da512 by Ezio Melotti in branch 'default':
#12725: merge with 3.2.
http://hg.python.org/cpython/rev/49e9e34da512

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12725
___



[issue12725] Docs: Odd phrase floating seconds in socket.html

2011-08-13 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

Fixed, thanks for the report and the patch!

--
nosy: +ezio.melotti
resolution:  -> fixed
stage: needs patch -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12725
___



[issue12748] IDLE halts on osx when copy and paste

2011-08-13 Thread hy

hy hoyeung...@gmail.com added the comment:

I use the latest Python 2.7.2 binary on a freshly installed OS X.
I use command-c and command-v, and also the menu.
It also halts when I cut.
No matter what I cut, copy, or paste, it halts.
It happens both in the shell and in the editor.

I have to remind myself not to use copy and paste now. Once I forget, IDLE
halts, I have to force quit it, and I lose everything unsaved.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12748
___



[issue12748] IDLE halts on osx when copy and paste

2011-08-13 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +kbk, ned.deily

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12748
___