[issue12747] Move devguide into /Docs of cpython repo
Ezio Melotti ezio.melo...@gmail.com added the comment: Actually if we move the devguide to Doc/ we will have to maintain a copy for each branch -- that's the real reason why it's in a separate repo. So I think it's better to leave the devguide in a separate repo, and keep using it to document things that are not strictly dependent on specific Python releases. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12747 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12747] Move devguide into cpython repo
Eric Snow ericsnowcurren...@gmail.com added the comment: I suppose it doesn't have to be in Doc/. -- title: Move devguide into /Docs of cpython repo -> Move devguide into cpython repo
[issue12747] Move devguide into cpython repo
Ezio Melotti ezio.melo...@gmail.com added the comment: The possible options I see are:
1) move it into Doc/ or some other dir -- but we will have to maintain it in all the branches;
2) keep it only in the default branch -- but we will have to remove it from the old branch when we cut a release;
3) make a separate branch for the devguide -- I'm not sure this even makes sense and/or if it solves anything.
Also, if it's not in Doc/, where would the test.support doc go?
[issue12747] Move devguide into cpython repo
Nick Coghlan ncogh...@gmail.com added the comment: I'd say the main reason the dev guide is in a separate repo is the historical one (i.e. Brett was working on it as a separate repo prior to the hg migration and we never merged it). However, the version independent nature of the material is the main argument against merging it into the Docs tree - it's a document about the development community around CPython, not a document about CPython itself. Personally, I'm happy with the resolution in the python-dev thread - tagging the test.support docs to keep them out of indices and search results, while leaving the dev guide in a separate version independent repo.
[issue11866] race condition in threading._newname()
Peter Saveliev svinota.savel...@gmail.com added the comment: counter.next() is a C routine and it is atomic from Python's point of view, if I understand correctly. The test shows that the original threading.py leads to a (rare) race here, while with the counter object there is no race condition.
[issue11866] race condition in threading._newname()
Raymond Hettinger raymond.hettin...@gmail.com added the comment: I think the patch is correct. FWIW, my style is to prebind the next method, making the counter completely self-contained (like a closure):

    +_counter = itertools.count().next
     def _newname(template="Thread-%d"):
         global _counter
    -    _counter = _counter + 1
    -    return template % _counter
    +    return template % _counter()

-- assignee: -> rhettinger nosy: +rhettinger resolution: -> accepted
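For readers on Python 3, the same idea can be sketched as follows. This is only an illustration, not the committed patch: `count().next` is spelled `count().__next__` in Python 3, starting the count at 1 is a choice made here, and `collect_names` is a hypothetical helper added to exercise the counter from several threads. The approach relies on the C-implemented `__next__` call not being interrupted partway, which is a CPython implementation detail rather than a language guarantee.

```python
import itertools
import threading

# Prebind the iterator's __next__ method (the Python 3 spelling of
# itertools.count().next). Starting at 1 is an assumption of this sketch.
_counter = itertools.count(1).__next__


def _newname(template="Thread-%d"):
    # No "global" statement and no read-modify-write on a module-level int:
    # each call simply pulls the next value from the shared iterator.
    return template % _counter()


def collect_names(n_threads=8, per_thread=1000):
    """Hypothetical helper: request names from many threads at once."""
    names = []
    lock = threading.Lock()

    def worker():
        local = [_newname() for _ in range(per_thread)]
        with lock:  # the lock protects the result list, not the counter
            names.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return names
```

If the counter hands out each value once, every collected name is distinct, which is the property the original `_counter = _counter + 1` code could occasionally violate.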
[issue12746] normalization is affected by unicode width
Changes by Ezio Melotti ezio.melo...@gmail.com: -- components: +Unicode nosy: +ezio.melotti
[issue2857] Add java modified utf-8 codec
STINNER Victor victor.stin...@haypocalc.com added the comment:
> Python does have other weird encodings like bz2 or rot13.
No, it has no more such weird encodings.
[issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a
Changes by Antoine Pitrou pit...@free.fr: -- nosy: +haypo
[issue11513] chained exception/incorrect exception from tarfile.open on a non-existent file
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 843cd43206b4 by Georg Brandl in branch '3.2': Fix #11513: wrong exception handling for the case that GzipFile itself raises an IOError. http://hg.python.org/cpython/rev/843cd43206b4 -- nosy: +python-dev
[issue11513] chained exception/incorrect exception from tarfile.open on a non-existent file
Georg Brandl ge...@python.org added the comment: Fixed in 3.2/default. 2.7 has even more primitive error handling; should the gzopen() be adapted to the 3.x case? -- nosy: +georg.brandl
[issue10799] Improve webbrowser (.open) doc and behavior
Changes by Ezio Melotti ezio.melo...@gmail.com: -- nosy: +ezio.melotti
[issue12737] str.title() is overzealous by upcasing combining marks inappropriately
Changes by Antoine Pitrou pit...@free.fr: -- nosy: +haypo, loewis stage: -> needs patch versions: +Python 3.3
[issue12746] normalization is affected by unicode width
Changes by Antoine Pitrou pit...@free.fr: -- nosy: +haypo, lemburg
[issue12646] zlib.Decompress.decompress/flush do not raise any exceptions when given truncated input streams
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset bb6c2d5c811d by Nadeem Vawda in branch 'default': Issue #12646: Add an 'eof' attribute to zlib.Decompress. http://hg.python.org/cpython/rev/bb6c2d5c811d -- nosy: +python-dev
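As a rough illustration of how the new `eof` attribute addresses the issue title, the sketch below checks it after feeding all input to a decompression object. The function name `decompress_checked` and the choice of `EOFError` are assumptions made here for the example; only `zlib.decompressobj()`, `decompress()`, `flush()` and `eof` (available from Python 3.3) come from the zlib module itself.

```python
import zlib


def decompress_checked(data):
    """Decompress a complete zlib stream, rejecting truncated input.

    Hypothetical helper: decompress()/flush() do not raise on truncated
    data, so the end-of-stream marker is checked explicitly via .eof.
    """
    d = zlib.decompressobj()
    out = d.decompress(data) + d.flush()
    if not d.eof:
        # The compressed data ran out before the end-of-stream marker.
        raise EOFError("compressed stream is truncated")
    return out
```

A caller that reads from a socket or file in chunks could use the same attribute to decide whether more input is still expected.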
[issue12669] test_curses skipped on buildbots
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 4358909ee221 by Nadeem Vawda in branch 'default': Issue #12669: Fix test_curses so that it can run on the buildbots. http://hg.python.org/cpython/rev/4358909ee221 -- nosy: +python-dev
[issue12646] zlib.Decompress.decompress/flush do not raise any exceptions when given truncated input streams
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 65d61ed991d9 by Nadeem Vawda in branch 'default': Fix incorrect comment in zlib.Decompress.flush(). http://hg.python.org/cpython/rev/65d61ed991d9
[issue12723] Provide an API in tkSimpleDialog for defining custom validation functions
R. David Murray rdmur...@bitdance.com added the comment: A bit of both, I think. The current function is actually 'getvalue' and is responsible for retrieving the value, validating its type, and converting to that type (the current ones do both in the same operation). It feels to me like a cleaner interface to decouple retrieval and validation/conversion, so that the validation function gets passed a string and returns the desired type. But in that case, having the string dialog take the validation/coercion function makes the name of the askstring function just wrong. So, I still think the cleaner API is to expose the class and let the application subclass to provide the validation function.
[issue12646] zlib.Decompress.decompress/flush do not raise any exceptions when given truncated input streams
Changes by Nadeem Vawda nadeem.va...@gmail.com: -- resolution: -> fixed stage: patch review -> committed/rejected status: open -> closed type: behavior -> feature request
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
R. David Murray rdmur...@bitdance.com added the comment: Tom, note that nobody is arguing that what you are requesting is a bad thing :) As far as I know, Matthew is the only one currently working on the regex support in Python. (Other developers will commit small fixes if someone proposes a patch, but no one that I've seen other than Matthew is working on the deeper issues.) If you want to help out that would be great. And as far as this particular issue goes, yes the difference between the narrow and wide build has been a known issue for a long time, but has become less and less ignorable as unicode adoption has grown. Martin's PEP that Matthew references is the only proposed fix that I know of. There is a GSoC project working on it, but I have no idea what the status is.
[issue12740] Add struct.Struct.nmemb
R. David Murray rdmur...@bitdance.com added the comment: As a new feature, this could only go into 3.3. -- nosy: +r.david.murray versions: -Python 3.2
[issue12740] Add struct.Struct.nmemb
Antoine Pitrou pit...@free.fr added the comment: I had never heard of nmemb. nmembers would be less cryptic. The patch needs a versionadded directive in the docs. -- nosy: +pitrou
[issue12745] Python2 or Python3 page
Antoine Pitrou pit...@free.fr added the comment: It is a wiki page, so you can edit it yourself (you probably need to register, though). If you think your modifications would be too drastic, perhaps you want to launch a discussion on the python-dev mailing-list about that page and its current contents. -- nosy: +pitrou
[issue12740] Add struct.Struct.nmemb
Georg Brandl ge...@python.org added the comment: While we're at it, let's add str.pbrk() ;) -- nosy: +georg.brandl
[issue12744] inefficient pickling of long integers on 64-bit builds
Changes by Antoine Pitrou pit...@free.fr: -- nosy: +alexandre.vassalotti
[issue9552] ssl build under Windows always rebuilds OpenSSL
Changes by Antoine Pitrou pit...@free.fr: -- resolution: -> fixed stage: -> committed/rejected status: open -> closed
[issue12740] Add struct.Struct.nmemb
Raymond Hettinger raymond.hettin...@gmail.com added the comment: How about __len__()? -- nosy: +rhettinger
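Whatever name is chosen, the quantity under discussion is the number of values that `pack()` expects and `unpack()` returns for a given format. A minimal sketch of how that number can be obtained with the existing API, assuming nothing beyond `struct.Struct`, `size` and `unpack()`, is to unpack a zero-filled buffer of the right size and count the results. The helper name `field_count` is hypothetical and is not the attribute proposed in the patch.

```python
import struct


def field_count(fmt):
    """Hypothetical helper: number of values a struct format describes.

    Works by unpacking a zero-filled buffer of exactly the right size
    and counting the resulting tuple; pad bytes ('x') yield no value and
    's' yields one value regardless of its repeat count.
    """
    s = struct.Struct(fmt)
    return len(s.unpack(bytes(s.size)))
```

This costs one throwaway unpack per call, which is the overhead a dedicated attribute or `__len__()` would avoid.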
[issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a
Antoine Pitrou pit...@free.fr added the comment:
> However, because the \wc issues are bigger, Java addressed the tr18 RL1.2a issues differently, this time by creating a new compilation flag called UNICODE_CHARACTER_CLASSES (with corresponding embedded (?U) regex flag.)
>
> Truth be told, even Perl has secret pattern compilation flags to govern this sort of thing (ascii, locale, unicode), but we (well, I) hope you never have to use or even notice them.
>
> That too might be a route forward for Python, although I am not quite sure how much flexibility and control of your lexical scope you have. However, the from __future__ imports suggest you may have enough to do something slick so that only people who ask for it get it, and also importantly that they get it all over the place so don't have to add an extra flag or u'...' or whatever every single time.
If the current behaviour is buggy or sub-optimal, I think we should simply fix it (which might be done by replacing re with regex if someone wants to shepherd its inclusion in the stdlib). By the way, thanks for the detailed explanations, Tom.
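To make the idea of a flag governing the meaning of \w concrete, here is a small sketch using the flag that Python 3's re module already provides, `re.ASCII`. It only shows that the same pattern matches differently under the two modes; it says nothing about whether the Unicode mode satisfies UTS#18 RL1.2a, which is the question raised in the issue. The sample text is made up for the example.

```python
import re

# Sample text written with escapes so the precomposed characters are explicit.
text = "na\u00efve caf\u00e9"

# For str patterns in Python 3, \w uses the Unicode definition by default.
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], so accented letters split the words.
ascii_words = re.findall(r"\w+", text, re.ASCII)
```

With this input, `unicode_words` holds the two whole words while `ascii_words` holds the fragments `na`, `ve` and `caf`.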
[issue12740] Add struct.Struct.nmemb
Meador Inge mead...@gmail.com added the comment: The functionality part of the patch looks reasonable. However, the pseudo-randomization in the unit tests seems like a bad idea. Say someone is adding a new feature X. Runs the unit tests to find one of them failing. Then runs them again to investigate and they are now passing. Unit tests should be repeatable. -- nosy: +meador.inge
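One common way to keep randomized test inputs while making runs repeatable is to draw them from a private generator with a fixed seed. The sketch below illustrates that general technique; it is not taken from the patch under review, and `make_formats`, `FORMAT_CODES` and the seed value are all hypothetical.

```python
import random
import struct

# Assumption of this sketch: a handful of fixed-size struct format codes.
FORMAT_CODES = "bBhHiIlLqQfd"


def make_formats(seed, count=5, max_fields=6):
    """Generate pseudo-random struct formats that depend only on the seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    formats = []
    for _ in range(count):
        n_fields = rng.randint(1, max_fields)
        formats.append("<" + "".join(rng.choice(FORMAT_CODES) for _ in range(n_fields)))
    return formats
```

A failing run can then be reproduced exactly by reusing the seed, for instance by printing it in the test's failure message.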
[issue12744] inefficient pickling of long integers on 64-bit builds
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 8e824e09924a by Antoine Pitrou in branch 'default': Issue #12744: Fix inefficient representation of integers http://hg.python.org/cpython/rev/8e824e09924a -- nosy: +python-dev
[issue12744] inefficient pickling of long integers on 64-bit builds
Changes by Antoine Pitrou pit...@free.fr: -- resolution: -> fixed stage: needs patch -> committed/rejected status: open -> closed
[issue11241] ctypes: subclassing an already subclassed ArrayType generates AttributeError
Amaury Forgeot d'Arc amaur...@gmail.com added the comment: Yes, the patch looks good! -- resolution: -> accepted
[issue12659] Add tests for packaging.tests.support
Francisco Martín Brugué franci...@email.de added the comment: I've started with tests for “fake_dec” and “TempdirManager”. Please let me know if that is along the lines you want. Thanks in advance, Francis -- keywords: +patch nosy: +francismb Added file: http://bugs.python.org/file22895/issue12659_v1.patch
[issue12747] Move devguide into cpython repo
Eric Snow ericsnowcurren...@gmail.com added the comment: That's fine. The discussion had moved away from the devguide, so I figured it would be worth following up. You guys have made some good points. -- resolution: -> rejected status: open -> closed
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Tom Christiansen tchr...@perl.com added the comment: David Murray rep...@bugs.python.org wrote:
> Tom, note that nobody is arguing that what you are requesting is a bad thing :)
There looked to be some minor resistance, based on absolute backwards compatibility even if wrong, regarding changing anything *at all* in re, even things that to my jaded eye seem like actual bugs. There are bugs, and then there are bugs.

In my survey of Unicode support across 7 programming languages for OSCON

http://training.perl.com/OSCON/index.html

I came across a lot of weirdnesses, especially at first when the learning curve was high. Sure, I found it odd that unlike Java, Perl, and Ruby, Python didn't offer regular casemapping on strings, only the simple character-based mapping. But that doesn't make it a bug, which is why I filed it as a feature/enhancement request/wish, not as a bug.

I always count as bugs not handling Unicode text the way Unicode says it must be handled. Such things would be:

 * Emitting CESU-8 when told to emit UTF-8.
 * Violating the rule that UTF-8 must be in the shortest possible encoding.
 * Not treating a code point as a letter when the supported version of the UCD says it is. (This can happen if internal rules get out of sync with the current UCD.)
 * Claiming one does the expected thing on Unicode for case-insensitive matches when not doing what Unicode says you must minimally do: use at least the simple casefolds, if not in fact the full ones.
 * Saying \w matches Unicode word characters when one's definition of word characters differs from that of the supported version of the UCD.

Supporting Unicode vX.Y.Z is more than adding more characters. All the behaviors specified in the UCD have to be updated too, or else you are just ISO 10646. I believe some of Python's Unicode bugs happened because folks weren't aware which things in Python were defined by the UCD or by various UTS reports yet were not being directly tracked that way.
That's why it's important to always fully state which version of these things you follow. Other bugs, many actually, are a result of the narrow/wide-build untransparency.

There is wiggle room in some of these. For example, in the one that applies to re, you could -- in a sense -- remove the bug by no longer claiming to do case-insensitive matches on Unicode. I do not find that very useful. Javascript works this way: it doesn't do Unicode casefolding. In Java you have to ask nicely with the extra UNICODE_CASE flag, aka (?u), used with CASE_INSENSITIVE, aka (?i).

Sometimes languages provide different but equivalent interfaces to the same functionality. For example, you may not support the Unicode property \p{NAME=foobar} in patterns but instead support \N{foobar} in patterns and hopefully also in strings. That's just fine. On slightly shakier ground, but still I think defensible, is how one approaches support for the standard UCD properties:

    Case_Folding         Simple_Case_Folding
    Titlecase_Mapping    Simple_Titlecase_Mapping
    Uppercase_Mapping    Simple_Uppercase_Mapping
    Lowercase_Mapping    Simple_Lowercase_Mapping

One can support folding, for example, via (?i) without having to directly support a Case_Folding property like \p{Case_Folding=s}, since (?i)s should be the same thing as \p{Case_Folding=s}.

> As far as I know, Matthew is the only one currently working on the regex support in Python. (Other developers will commit small fixes if someone proposes a patch, but no one that I've seen other than Matthew is working on the deeper issues.) If you want to help out that would be great.
Yes, I actually would. At least as I find time for it. I'm a competent C programmer and Matthew's C code is very well documented, but that's very time consuming. For bang-for-buck, I do best on test and doc work, making sure things are actually working the way they say they do. I was pretty surprised and disappointed by how much trouble I had with Unicode work in Python.
A bit of that is learning curve, a bit of it is suboptimal defaults, but quite a bit of it is that things either don't work the way Unicode says, or because something is altogether missing. I'd like to help at least make the Python documentation clearer about what it is or is not doing in this regard. But be warned: one reason that Java 1.7 handles Unicode more according to the published Unicode Standard in its Character, String, and Pattern classes is because when they said they'd be supporting Unicode 6.0.0, I went through those classes and every time I found something in violation of that Standard, I filed a bug report that included a documentation patch explaining what they weren't doing right. Rather than apply my rather embarrassing doc patches, they instead fixed the code. :) And as far as this particular issue goes, yes the difference between the narrow and wide build has been a known issue for a long time, but
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Antoine Pitrou pit...@free.fr added the comment:
> Here's why I say that Python uses UTF-16 not UCS-2 on its narrow builds. Perhaps someone could tell me why the Python documentation says it uses UCS-2 on a narrow build.
There's a disagreement on that point between several developers. See an example sub-thread at: http://mail.python.org/pipermail/python-dev/2010-November/105751.html
> Since you are already using a variable-width encoding, why the supercilious attitude toward UTF-8?
I think you are reading too much into these decisions. It's simply that no-one took the time to write an alternative implementation and demonstrate its superiority. I also believe the original implementation was UCS-2 and surrogate support was added progressively during the years. Hence the terminological mess and the ad-hoc semantics. I agree that going with UTF-8 and a clever indexing scheme would be a better solution. -- nosy: +pitrou
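The UCS-2 versus UTF-16 distinction comes down to what happens above U+FFFF: UCS-2 has no representation for such code points, while UTF-16 encodes each as a surrogate pair of two 16-bit code units. The sketch below computes that pair by hand and compares it with Python's own UTF-16 encoder. It does not reproduce a narrow build (it assumes a Python 3 interpreter where `str` is indexed by code point), and the helper names are made up for the example.

```python
import struct


def surrogate_pair(code_point):
    """Split a supplementary code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    v = code_point - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def utf16_code_units(s):
    """Return the 16-bit code units Python's UTF-16 encoder produces for s."""
    raw = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF) is one code point but two UTF-16 code units, which is why length and indexing differed between narrow and wide builds.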
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Matthew Barnett pyt...@mrabarnett.plus.com added the comment: There are occasions when you want to do string slicing, often of the form:

    pos = my_str.index(x)
    endpos = my_str.index(y)
    substring = my_str[pos : endpos]

To me that suggests that if UTF-8 is used then it may be worth profiling to see whether caching the last 2 positions would be beneficial.
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Antoine Pitrou pit...@free.fr added the comment:
> There are occasions when you want to do string slicing, often of the form:
>
>     pos = my_str.index(x)
>     endpos = my_str.index(y)
>     substring = my_str[pos : endpos]
>
> To me that suggests that if UTF-8 is used then it may be worth profiling to see whether caching the last 2 positions would be beneficial.
And/or a lookup table giving the byte offset of, say, every 16th character. It gives you a O(1) lookup with a relatively reasonable constant cost (you have to scan for less than 16 characters after the lookup). On small strings (< 256 UTF-8 bytes) the space overhead for the lookup table would be 1/16. It could also be constructed lazily whenever more than 2 positions are cached.
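The lookup-table idea can be sketched in pure Python as follows. This is an illustration of the scheme as described in the message, not code from CPython: the function names, the eager (non-lazy) construction and the list-of-ints table are all choices made here. It relies only on the UTF-8 property that continuation bytes have the bit pattern 10xxxxxx.

```python
STRIDE = 16  # one table entry per 16 characters, as suggested above


def build_index(data, stride=STRIDE):
    """Byte offsets of characters 0, stride, 2*stride, ... in UTF-8 data."""
    offsets = []
    char_no = 0
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte: a character starts here
            if char_no % stride == 0:
                offsets.append(pos)
            char_no += 1
    return offsets


def byte_offset(data, offsets, n, stride=STRIDE):
    """Byte offset of character n: one table lookup plus a scan of < stride chars."""
    pos = offsets[n // stride]
    for _ in range(n % stride):
        pos += 1
        while data[pos] & 0xC0 == 0x80:  # skip the rest of the current character
            pos += 1
    return pos


def char_at(data, offsets, n):
    """Decode the single character at index n."""
    start = byte_offset(data, offsets, n)
    end = start + 1
    while end < len(data) and data[end] & 0xC0 == 0x80:
        end += 1
    return data[start:end].decode("utf-8")
```

The scan after the lookup is bounded by the stride, so indexing cost no longer grows with the length of the string; a real implementation would presumably store the table more compactly than a Python list.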
[issue12740] Add struct.Struct.nmemb
Changes by Raymond Hettinger raymond.hettin...@gmail.com: -- assignee: -> rhettinger priority: normal -> low
[issue12744] inefficient pickling of long integers on 64-bit builds
Raymond Hettinger raymond.hettin...@gmail.com added the comment: Nice. -- nosy: +rhettinger
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Tom Christiansen tchr...@perl.com added the comment: Matthew Barnett rep...@bugs.python.org wrote on Sat, 13 Aug 2011 20:57:40 -:
> There are occasions when you want to do string slicing, often of the form:
>
>     pos = my_str.index(x)
>     endpos = my_str.index(y)
>     substring = my_str[pos : endpos]

Me, I would probably give the second call to index the first index position to guarantee the end comes after the start:

    str = "for finding the biggest of all the strings"
    x_at = str.index("big")
    y_at = str.index("the", x_at)
    some = str[x_at:y_at]
    print("GOT", some)

But here's a serious question: is that *actually* a common usage pattern for accessing strings in Python? I ask because it wouldn't even *occur* to me to go at such a problem in that way. I would have always just written it this way instead:

    import re
    str = "for finding the biggest of all the strings"
    some = re.search("(big.*?)the", str).group(1)
    print("GOT", some)

I know I would use the pattern approach, just because that's how I always do such things in Perl:

    $str = "for finding the biggest of all the strings";
    ($some) = $str =~ /(big.*?)the/;
    print "GOT $some\n";

Which is obviously a *whole* lot simpler than the index approach:

    $str = "for finding the biggest of all the strings";
    $x_at = index($str, "big");
    $y_at = index($str, "the", $x_at);
    $len = $y_at - $x_at;
    $some = substr($str, $x_at, $len);
    print "GOT $some\n";

With no arithmetic and no need for temporary variables (you can't really escape needing x_at to pass to the second call to index), it's all a lot more WYSIWYG. See how much easier that is? Sure, it's a bit cleaner and less noisy in Perl than it is in Python by virtue of Perl's integrated pattern matching, but I would still use patterns in Python for this, not index. I honestly find the equivalent pattern operations a lot easier to read and write and maintain than I find the index/substring version. It's a visual thing. I find patterns a win in maintainability over all that busy index monkeywork.
The index/rindex and substring approach is one I almost never ever turn to. I bet I use pattern matching 100 or 500 times for each time I use index, and maybe even more. I happen to think in patterns. I don't expect other people to do so. But because of this, I usually end up picking patterns even if they might be a little bit slower, because I think the gain in flexibility and especially maintainability more than makes up for any minor performance concerns. This might also show you why patterns are so important to me: they're one of the most important tools we have for processing text. Index isn't, which is why I really don't care about whether it has O(1) access.

> To me that suggests that if UTF-8 is used then it may be worth profiling to see whether caching the last 2 positions would be beneficial.
Notice how with the pattern approach, which is inherently sequential, you don't have all that concern about running over the string more than once. Once you have the first piece (here, "big"), you proceed directly from there looking for the second piece in a straightforward, WYSIWYG way. There is no need to keep an extra index or even two around on the string structure itself, going at it this way. I would be pretty surprised if Perl could gain any speed by caching a pair of MRU index values against its UTF-8 [but see footnote], because again, I think the normal access pattern wouldn't make use of them. Maybe Python programmers don't think of strings the same way, though. That, I really couldn't tell you.
But here's something to think about: If it *is* true that you guys do all this index stuff that Perl programmers just never see or do because of our differing comfort levels with regexes, and so you think that Python might still benefit from that sort of caching because its culture has promoted a different access pattern, then that caching benefit would still apply even if you were to retain the current UTF-16 representation instead of going to UTF-8 (which might want it) or to UTF-32 (which wouldn't). After all, you have the same variable-width caching issue with UTF-16 as with UTF-8, so if it makes sense to have an MRU cache mapping character indices to byte indices, then it doesn't matter whether you use UTF-8 or UTF-16! However, I'd want some passive comparative benchmarks using real programs with real data, because I would be suspicious of incurring the memory cost of two more pointers in every string in the whole program. That's serious. --tom

FOOTNOTE: The Perl 6 people are thinking about clever ways to set up byte offset indices. You have to do this if you want O(1) access to the Nth element for elements that are not simple code points even if you use UTF-32. That's because they want the default string element to be a user visible grapheme, not a code point. I know they have clever
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Tom Christiansen tchr...@perl.com added the comment: Antoine Pitrou rep...@bugs.python.org wrote on Sat, 13 Aug 2011 21:09:52 -0000:
> And/or a lookup table giving the byte offset of, say, every 16th character. It gives you a O(1) lookup with a relatively reasonable constant cost (you have to scan for less than 16 characters after the lookup). On small strings (< 256 UTF-8 bytes) the space overhead for the lookup table would be 1/16. It could also be constructed lazily whenever more than 2 positions are cached.
You really should talk to the Perl 6 people to see whether their current strategy for caching offset maps for grapheme positions might be of use to you. Larry explained it to me once but I no longer recall any details. I notice though that they don't seem to think it worth doing for UTF-8 or UTF-16, just for their synthetic NFG (Grapheme Normalization Form) strings, where it would be needed even if they used UTF-32 underneath. --tom
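The lookup-table idea quoted in this message can be sketched roughly as follows. This is only an illustration under stated assumptions, not how CPython or Perl actually stores strings: the input is assumed to be valid UTF-8, the stride of 16 comes from the quoted suggestion, and the helper names build_offset_table and byte_offset are hypothetical.

```python
def build_offset_table(data, stride=16):
    """Record the byte offset of every `stride`-th code point in valid UTF-8 `data`."""
    table = []
    count = 0
    for pos, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a code point.
        if byte & 0xC0 != 0x80:
            if count % stride == 0:
                table.append(pos)
            count += 1
    return table


def byte_offset(data, table, index, stride=16):
    """Return the byte offset of code point number `index` (0-based, must be in range)."""
    pos = table[index // stride]      # jump close to the target in one step
    remaining = index % stride        # then scan fewer than `stride` code points
    while remaining:
        pos += 1
        while data[pos] & 0xC0 == 0x80:
            pos += 1                  # skip continuation bytes
        remaining -= 1
    return pos


text = "a\u00e9\u20ac\U0001F600" * 10    # 1-, 2-, 3- and 4-byte characters
data = text.encode("utf-8")
table = build_offset_table(data)
print(byte_offset(data, table, 17))
```

The table costs one entry per 16 code points, and each lookup scans at most 15 code points past the recorded offset, which is the trade-off the quoted text describes.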
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Matthew Barnett pyt...@mrabarnett.plus.com added the comment: You're right about starting the second search from where the first finished. Caching the position would be an advantage there. The memory cost of extra pointers wouldn't be so bad if UTF-8 took less space than the current format. Regex isn't used as much as in Perl. BTW, the current re module was introduced in Python 1.5, the previous regex and regsub modules being removed in Python 2.5.
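For readers who want to see the two access patterns side by side, here is a small sketch. The sample string and the helper names find_with_index and find_with_pattern are made up for illustration; the thread's original Perl example is not reproduced here.

```python
import re


def find_with_index(text, first, second):
    """Index-style search: find `first`, then look for `second` after it."""
    start = text.find(first)
    if start == -1:
        return -1
    # Resume from the end of the first match instead of rescanning from 0.
    return text.find(second, start + len(first))


def find_with_pattern(text, first, second):
    """Pattern-style search: a single pattern that walks left to right."""
    pattern = re.escape(first) + r".*?(" + re.escape(second) + r")"
    match = re.search(pattern, text, re.DOTALL)
    return match.start(1) if match else -1


text = "a big red balloon and a big blue ball"
print(find_with_index(text, "big", "ball"))
print(find_with_pattern(text, "big", "ball"))
```

Both functions report the same position. The index version has to carry the resume position explicitly, which is the position a cache on the string object would be trying to speed up.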
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Tom Christiansen tchr...@perl.com added the comment: Here's why I say that Python uses UTF-16 not UCS-2 on its narrow builds. Perhaps someone could tell me why the Python documentation says it uses UCS-2 on a narrow build. There's a disagreement on that point between several developers. See an example sub-thread at: http://mail.python.org/pipermail/python-dev/2010-November/105751.html Some of those folks know what they're talking about, and some do not. Most of the postings miss the mark. Python uses UTF-16 for its narrow builds. It does not use UCS-2. The argument that it must be UCS-2 because it can store lone surrogates in memory is spurious. You have to read The Unicode Standard very *very* closely, but it is not necessary that all internal buffers always be in well-formed UTF-whatever. Otherwise it would be impossible to append a code unit at a time to a buffer. I could pull out the reference if I worked at it, because I've had to find it before. It's in there. Trust me. I know. It is also spurious to pretend that because you can produce illegal output when telling it to generate something in UTF-16 that it is somehow not using UTF-16. You have simply made a mistake. You have generated something that you have promised you would not generate. I have more to say about this below. Finally, it is spurious to argue against UTF-16 because of the code unit interface. Java does exactly the same thing as Python does *in all regards* here, and no one pretends that Java is UCS-2. Both are UTF-16. It is simply a design error to pretend that the number of characters is the number of code units instead of code points. A terrible and ugly one, but it does not mean you are UCS-2. You are not. Python uses UTF-16 on narrow builds. The ugly terrible design error is disgusting and wrong, just as much in Python as in Java, and perhaps more so because of the idiocy of narrow builds even existing. But that doesn't make them UCS-2.
If I could wave a magic wand, I would have Python undo its code unit blunder and go back to code points, no matter what. That means to stop talking about serialization schemes and start talking about logical code points. It means that slicing and index and length and everything only report true code points. This horrible code unit botch from narrow builds is most easily cured by moving to wide builds only. However, there is more. I haven't checked its UTF-16 codecs, but Python's UTF-8 codec is broken in a bunch of ways. You should be raising an exception in all kinds of places and you aren't. I can see I need to bug report this stuff too. I don't want to be mean about this. HONEST! It's just the way it is. Unicode currently reserves 66 code points as noncharacters, which it guarantees will never be in a legal UTF-anything stream. I am not talking about surrogates, either. To start with, no code point which when bitwise ANDed with 0xFFFE returns 0xFFFE can ever appear in a valid UTF-* stream, but Python allows this without any error. That means that both 0xNN_FFFE and 0xNN_FFFF are illegal in all planes, where NN is 00 through 10 in hex. So that's 2 noncharacters times 17 planes = 34 code points illegal for interchange that Python is passing through illegally. The remaining 32 nonsurrogate code points illegal for open interchange are 0xFDD0 through 0xFDEF. Those are not allowed either, but Python doesn't seem to care. You simply cannot say you are generating UTF-8 and then generate a byte sequence that UTF-8 guarantees can never occur. This is a violation. ***SIGH*** --tom
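The count of 66 noncharacters given in this message can be reproduced with a few lines of Python. This is just a sketch of the arithmetic described above (the last two code points of each of the 17 planes, plus the block 0xFDD0 through 0xFDEF); the function name is_noncharacter is hypothetical and is not part of any codec.

```python
def is_noncharacter(cp):
    """True for the code points the message lists as Unicode noncharacters."""
    # Last two code points of every plane: the low 16 bits are FFFE or FFFF.
    if cp & 0xFFFE == 0xFFFE:
        return True
    # The contiguous block of 32 noncharacters in the BMP.
    return 0xFDD0 <= cp <= 0xFDEF


noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]
per_plane = [cp for cp in noncharacters if cp & 0xFFFE == 0xFFFE]

print(len(per_plane))        # 2 per plane over 17 planes
print(len(noncharacters))    # per-plane ones plus the 0xFDD0..0xFDEF block
```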
[issue12725] Docs: Odd phrase floating seconds in socket.html
Ben Hayden hayden...@gmail.com added the comment: I made the suggested second change, both in the docs and in the socketmodule.c file. If there's a different way to patch documentation, someone let me know. :D -- keywords: +patch nosy: +beardedp Added file: http://bugs.python.org/file22896/issue12725.patch
[issue12748] IDLE halts on osx when copy and paste
New submission from hy hoyeung...@gmail.com: IDLE halts on OS X when I copy and paste. I tried it on 10.6.8 and 10.7. Now I can only use IDLE on Windows in VMware. -- assignee: ronaldoussoren components: IDLE, Macintosh messages: 142046 nosy: hoyeung, ronaldoussoren priority: normal severity: normal status: open title: IDLE halts on osx when copy and paste versions: Python 2.7
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Ezio Melotti ezio.melo...@gmail.com added the comment:
> It is simply a design error to pretend that the number of characters is the number of code units instead of code points. A terrible and ugly one, but it does not mean you are UCS-2.
If you are referring to the value returned by len(unicode_string), it is the number of code units. This is a matter of "practicality beats purity". Returning the number of code units is O(1) (num_of_bytes/2). To calculate the number of characters it's instead necessary to scan the whole string looking for surrogates and then count each surrogate pair as 1 character. It was therefore decided that it was not worth slowing down the common case just to be 100% accurate in the uncommon case. That said it would be nice to have an API (maybe in unicodedata or as new str methods?) able to return the number of code units, code points, graphemes, etc, but I'm not sure that it should be the default behavior of len().
> The ugly terrible design error is disgusting and wrong, just as much in Python as in Java, and perhaps more so because of the idiocy of narrow builds even existing.
Again, wide builds use twice as much space as narrow ones, but on the other hand you can have fast and correct behavior with e.g. len(). If people don't care about/don't need to use non-BMP chars and would rather use less space, they can do so. Until we agree that the difference in space used/speed is no longer relevant and/or that non-BMP characters become common enough to prefer the correct behavior over the fast-but-inaccurate one, we will probably keep both.
> I haven't checked its UTF-16 codecs, but Python's UTF-8 codec is broken in a bunch of ways. You should be raising an exception in all kinds of places and you aren't.
I am aware of some problems of the UTF-8 codec on Python 2. It used to follow RFC 2279 until last year and now it's been updated to follow RFC 3629. However, for backward compatibility, it still encodes/decodes surrogate pairs.
This broken behavior has been kept because on Python 2, you can encode every code point with UTF-8, and decode it back without errors:
    x = [unichr(c).encode('utf-8') for c in range(0x110000)]
and breaking this invariant would probably do more harm than good. I proposed to add a real utf-8 codec on Python 2, but no one seems to care enough about it. Also note that this is fixed in Python 3:
    x = [chr(c).encode('utf-8') for c in range(0x110000)]
    UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
> I can see I need to bug report this stuff too.
If you find other places where it's broken (both on Python 2 and/or Python 3), please do and feel free to add me to the nosy list. If you can also provide a failing test case and/or point to the relevant parts of the Unicode standard, it would be great.
-- nosy: +ezio.melotti
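To make the len() trade-off discussed in this message concrete, here is a small sketch of the two counts. It assumes Python 3, where the narrow-build behavior has to be simulated by encoding to UTF-16 first; the helper names utf16_code_units and count_code_points are hypothetical and are not an existing unicodedata or str API.

```python
import struct


def utf16_code_units(s):
    """Return the UTF-16 code units of `s` as a tuple of 16-bit integers."""
    raw = s.encode("utf-16-le")
    return struct.unpack("<%dH" % (len(raw) // 2), raw)


def count_code_points(units):
    """Count code points in a sequence of UTF-16 code units with one linear scan."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        # A high surrogate followed by a low surrogate is one code point.
        i += 2 if (is_high and has_low) else 1
        count += 1
    return count


sample = "a\U0001F600b"            # one non-BMP character between two ASCII letters
units = utf16_code_units(sample)
print(len(units))                  # code units: the O(1) number a narrow build reports
print(count_code_points(units))    # code points: requires scanning for surrogate pairs
```

The first count only needs the buffer length, while the second has to visit every code unit, which is the cost the message weighs against accuracy.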
[issue12748] IDLE halts on osx when copy and paste
Ezio Melotti ezio.melo...@gmail.com added the comment: Can you specify what version of Python you are using, how you copy/paste (e.g. ctrl+c/v, from the menu), and whether it halts regardless of what you copy/paste? -- nosy: +ezio.melotti
[issue12725] Docs: Odd phrase floating seconds in socket.html
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset dfe6f0a603d2 by Ezio Melotti in branch '2.7': #12725: fix working. Patch by Ben Hayden. http://hg.python.org/cpython/rev/dfe6f0a603d2 New changeset ab3432a81c26 by Ezio Melotti in branch '3.2': #12725: fix working. Patch by Ben Hayden. http://hg.python.org/cpython/rev/ab3432a81c26 New changeset 49e9e34da512 by Ezio Melotti in branch 'default': #12725: merge with 3.2. http://hg.python.org/cpython/rev/49e9e34da512 -- nosy: +python-dev
[issue12725] Docs: Odd phrase floating seconds in socket.html
Ezio Melotti ezio.melo...@gmail.com added the comment: Fixed, thanks for the report and the patch! -- nosy: +ezio.melotti resolution: -> fixed stage: needs patch -> committed/rejected status: open -> closed
[issue12748] IDLE halts on osx when copy and paste
hy hoyeung...@gmail.com added the comment: I use the latest Python 2.7.2 binary on a freshly installed OS X. I use Command-C and Command-V, and also the menu. It also halts when I cut. No matter what I cut, copy, or paste, it halts. It happens in both the shell and the editor. I have to remind myself not to use copy and paste now. Once I forget, IDLE halts, I have to force quit it, and I lose everything unsaved.
[issue12748] IDLE halts on osx when copy and paste
Changes by Ezio Melotti ezio.melo...@gmail.com: -- nosy: +kbk, ned.deily