[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Tom Christiansen
Tom Christiansen added the comment: Ezio Melotti wrote on Mon, 15 Aug 2011 04:56:55 -: > Another thing I noticed is that (at least on wide builds) surrogate pairs are > not joined "on the fly": > >>> p > '\ud800\udc00' > >>> len(p) > 2 > >>> p.encode('utf-16').decode('utf-16') > '𐀀' > >>
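The round trip quoted above can be sketched on a current Python 3. This is only an illustration, not the session from the message: it assumes Python 3.4 or later, where a strict 'utf-16' encode rejects surrogates, so the 'surrogatepass' error handler is used instead. The names pair and joined are made up for the example.

```python
# Two surrogate code points written as separate escapes are NOT joined
# "on the fly": the string holds two code points.
pair = '\ud800\udc00'
pair_len = len(pair)

# Encoding with 'surrogatepass' writes the two UTF-16 code units as-is;
# decoding then reads them back as one supplementary code point (U+10000).
joined = pair.encode('utf-16', 'surrogatepass').decode('utf-16')
joined_len = len(joined)
```

After the round trip, joined compares equal to '\U00010000' and has length 1, while pair still has length 2.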

[issue12266] str.capitalize contradicts oneself

2011-08-14 Thread Ezio Melotti
Ezio Melotti added the comment: Fixed, thanks for the report! -- resolution: duplicate -> fixed status: open -> closed ___ Python tracker ___ ___

[issue12266] str.capitalize contradicts oneself

2011-08-14 Thread Roundup Robot
Roundup Robot added the comment: New changeset 1ea72da11724 by Ezio Melotti in branch 'default': #12266: merge with 3.2. http://hg.python.org/cpython/rev/1ea72da11724

[issue12266] str.capitalize contradicts oneself

2011-08-14 Thread Roundup Robot
Roundup Robot added the comment: New changeset c34772013c53 by Ezio Melotti in branch '3.2': #12266: Fix str.capitalize() to correctly uppercase/lowercase titlecased and cased non-letter characters. http://hg.python.org/cpython/rev/c34772013c53 New changeset eab17979a586 by Ezio Melotti in bra

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Ezio Melotti
Ezio Melotti added the comment: Keep in mind that we should be able to access and use lone surrogates too, therefore: s = '\ud800' # should be valid len(s) # should this raise an error? (or return 0.5 ;)? s[0] # error here too? list(s) # here too? p = s + '\udc00' len(p) # 1? s[0] # '\U0
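For reference, a small sketch of how a lone surrogate behaves on a current Python 3 (3.3 or later is assumed here, which postdates this thread). It does not answer the design questions raised in the message; it only shows today's behavior. The variable names are illustrative.

```python
# A lone surrogate is a valid str element: it has length 1 and can be
# indexed and iterated like any other code point.
lone = '\ud800'
lone_len = len(lone)
first = lone[0]
as_list = list(lone)

# It cannot, however, be encoded with a strict UTF codec.
try:
    lone.encode('utf-8')
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False
```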

[issue12672] Some problems in documentation extending/newtypes.html

2011-08-14 Thread Terry J. Reedy
Changes by Terry J. Reedy : -- stage: -> patch review ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://m

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Terry J. Reedy
Terry J. Reedy added the comment: >It is always better to deliver more than you say than to deliver less. Except when promising too little is a copout. >Everyone always talks about how important they're sure O(1) access must be, I thought that too until your challenge. But now that you mention it,

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Tom Christiansen
Tom Christiansen added the comment: I wrote: >> Python's narrow builds are, in a sense, 'between' UCS-2 and UTF-16. > So I'm finding. Perhaps that's why I keep getting confused. I do have a > pretty firm > notion of what UCS-2 and UTF-16 are, and so I get sometimes > self-contradictory resu

[issue12750] datetime.datetime timezone problems

2011-08-14 Thread Daniel O'Connor
New submission from Daniel O'Connor : It isn't possible to add a timezone to a naive datetime object, which means that if you are getting them from some place you can't directly control there is no way to set the TZ. E.g. pywws' DataStore returns naive datetimes which are in UTC. There is no way
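A minimal sketch of one way to attach a timezone to a naive datetime, assuming Python 3.2 or later (where datetime.timezone exists) and assuming the naive value is already known to be in UTC, as in the report. The sample timestamp is made up; replace() only labels the value and performs no conversion.

```python
from datetime import datetime, timezone

# A naive datetime, e.g. as returned by a library that assumes UTC.
naive = datetime(2011, 8, 14, 12, 0, 0)

# replace() returns a new, aware datetime with the same wall-clock fields.
aware = naive.replace(tzinfo=timezone.utc)
```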

[issue12693] test.support.transient_internet prints to stderr when verbose is false

2011-08-14 Thread Brett Cannon
Brett Cannon added the comment: The line from the source I am talking about is http://hg.python.org/cpython/file/49e9e34da512/Lib/test/support.py#l943 . And as for the output: > ./python.exe -m test -uall test_ssl > [1/1] test_ssl Resource 'i

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Tom Christiansen
Tom Christiansen added the comment: "Terry J. Reedy" wrote on Mon, 15 Aug 2011 00:26:53 -: > PS: The OSCON link in msg142036 currently gives me 404 not found Sorry, I wrote http://training.perl.com/OSCON/index.html but meant http://training.perl.com/OSCON2011/index.html

[issue11835] python (x64) ctypes incorrectly pass structures parameter

2011-08-14 Thread Vlad Riscutia
Vlad Riscutia added the comment: Changing type to behavior as it doesn't crash on 3.3. I believe issue was opened against 2.6 and Santoso changed it to 2.7 and up where there is no crash. Another data point: there is similar fix in current version of libffi here: https://github.com/atgreen/li

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Matthew Barnett
Matthew Barnett added the comment: Have a look here: http://98.245.80.27/tcpc/OSCON2011/gbu/index.html

[issue12738] Bug in multiprocessing.JoinableQueue() implementation on Ubuntu 11.04

2011-08-14 Thread Michael Hall
Michael Hall added the comment: I tried switching from joining on the work_queue to just joining on the individual child processes, and it seems to work now. Weird. Anyway, it'd be nice to see the JoinableQueue fixed, but it's not pressing any more.

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Terry J. Reedy
Terry J. Reedy added the comment: Python's narrow builds are, in a sense, 'between' UCS-2 and UTF-16. They support non-BMP chars but only partially, because, BY DESIGN*, indexing and len are by code units, not codepoints. They are documented as being UCS-2 because that is what M-A Lemburg, th
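The difference between counting code units and counting code points can be sketched as follows. This assumes a current Python 3 (3.3 or later), where len() counts code points on every build; the helper utf16_code_units is a made-up name that reproduces what len() reported on a narrow build.

```python
def utf16_code_units(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    # Each UTF-16 code unit is two bytes; 'surrogatepass' lets lone
    # surrogates through so the helper works on any str.
    return len(text.encode('utf-16-le', 'surrogatepass')) // 2

# One BMP character plus one non-BMP character (U+1D49C).
s = 'a\U0001D49C'
code_points = len(s)
code_units = utf16_code_units(s)
```

Here code_points is 2, while code_units is 3 because the non-BMP character needs a surrogate pair.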

[issue12672] Some problems in documentation extending/newtypes.html

2011-08-14 Thread Terry J. Reedy
Terry J. Reedy added the comment: I agree that the sentence is a bit confusing and the 'object method' ambiguous. I suspect that the sentence was written years ago. In current Python, [].append is a bound method of class 'builtin_function_or_method'. I *suspect* that the intended contrast, an
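The statement about [].append can be checked directly. A small sketch, assuming CPython 3; the type name is a CPython detail and the variable names are illustrative.

```python
items = []
method = items.append

# In CPython, the bound method of a built-in type reports this class name.
type_name = type(method).__name__

# The method remembers the object it is bound to.
bound_to_items = method.__self__ is items
```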

[issue12748] IDLE halts on osx when copy and paste

2011-08-14 Thread hy
hy added the comment: Thank you. I kinda know what happens now. First, I didn't make any changes to IDLE after installing it. Second, I'm using dvorak-qwerty. Normally the keyboard layout changes to qwerty when I press the Cmd key so that I can type in Dvorak and use the shortcuts in qwerty. But in I

[issue12740] Add struct.Struct.nmemb

2011-08-14 Thread Raymond Hettinger
Raymond Hettinger added the comment: In general, I think we can prevent confusion about the meaning of __len__ by sticking to the general rule: len(object)==len(list(obj)) for anything that produces an iterable result. In the case of struct, that would be the length of the tuple returned by
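The quantity under discussion, the length of the tuple returned by unpack, can be computed today without any new attribute. A rough sketch: the helper name nitems is hypothetical (borrowed from one of the names floated in this thread) and is not an existing struct API.

```python
import struct

def nitems(fmt):
    """Number of values that unpacking the format fmt would return."""
    s = struct.Struct(fmt)
    # Unpack a zero-filled buffer of the right size and count the results.
    return len(s.unpack(bytes(s.size)))

three = nitems('<2hI')   # two shorts and one unsigned int
one = nitems('<4s')      # a 4-byte string counts as a single item
padded = nitems('<3xh')  # pad bytes produce no items
```

Note the contrast with Struct.size, which counts bytes rather than items.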

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Terry J. Reedy
Terry J. Reedy added the comment: This is off-topic, but there was discussion on whether or not to have a 2.7. The decision was to focus on back-porting things that would make the eventual transition to 3.x easier.

[issue11835] python (x64) ctypes incorrectly pass structures parameter

2011-08-14 Thread Stefan Krah
Changes by Stefan Krah : -- nosy: +amaury.forgeotdarc, belopolsky

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Terry J. Reedy
Terry J. Reedy added the comment: Tom, I appreciate your taking the time to help us improve our Unicode story. I agree that the compromises made a decade ago need to be revisited and revised. I think it will help if you better understand our development process. Our current *intent* is that '

[issue11835] python (x64) ctypes incorrectly pass structures parameter

2011-08-14 Thread Vlad Riscutia
Vlad Riscutia added the comment: Attached patch for this issue. This only happens on MSVC x64 (I actually tried to repro on Arch Linux x64 before starting work on it and it didn't repro). What happens is that MSVC on x64 always passes structures larger than 8 bytes by reference. See here: h

[issue12740] Add struct.Struct.nmemb

2011-08-14 Thread Stefan Krah
Stefan Krah added the comment: Just to throw in a new name: Struct.nitems would also be possible.

[issue10744] ctypes arrays have incorrect buffer information (PEP-3118)

2011-08-14 Thread Stefan Krah
Stefan Krah added the comment: Thanks for the patch. I agree with the interpretation of the format string. One thing is unclear though: Using this interpretation the multi-dimensional array notation in format strings only seems useful for pointers to arrays. The PEP isn't so clear on that, wou

[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Ezio Melotti
Ezio Melotti added the comment: BTW, you can find more information about the one-dir-per-clone setup (and other useful info) here: http://docs.python.org/devguide/committing.html#using-several-working-copies

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Ezio Melotti
Ezio Melotti added the comment: 2.7 is the last 2.x. There won't be any 2.8 (also I never heard that 2.6 was supposed to be the last). We already have 2.7.2, and we will continue with 2.7.3, 2.7.4, etc for a few more years. Eventually 2.7 will only get security fixes and the development wil

[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Ezio Melotti
Ezio Melotti added the comment: > Perhaps I am doing something wrong? That's weird, I tried on a wide Python 2.6.6 too and it works even there. Maybe a bug that got fixed between 2.6.2 and 2.6.6? Or maybe something else? > Is there a way to easily have these co-exist on the same system? He

[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Matthew Barnett
Matthew Barnett added the comment: On a narrow build, "\N{MATHEMATICAL SCRIPT CAPITAL A}" is stored as 2 code units, and neither re nor regex recombine them when compiling a regex or looking for a match. regex supports \xNN, \u and \U and \N{XYZ} itself, so they can be used in a

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Tom Christiansen
Tom Christiansen added the comment: Ezio Melotti wrote on Sun, 14 Aug 2011 17:46:55 -: >> I'm a bit confused on this. You no longer fix bugs in Python 2? > We do, but it's unlikely that we will introduce major changes in behavior. > Even if we had to get rid of narrow builds and/or

[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Tom Christiansen
Tom Christiansen added the comment: Ezio Melotti wrote on Sun, 14 Aug 2011 17:15:52 -: >> You're right: my wide build is not Python3, just Python2. > And is it failing? Here the tests pass on the wide builds, on both Python 2 > and 3. Perhaps I am doing something wrong? linux% py

[issue12740] Add struct.Struct.nmemb

2011-08-14 Thread Antoine Pitrou
Antoine Pitrou added the comment: > It looks like the choice is between s.nmembers and len(s). I thought > about len(s), but since Struct.pack() returns a bytes object, this > might be confusing. I agree there's a risk of confusion between len()-number-of-elements and size()-number-of-bytes. We

[issue12740] Add struct.Struct.nmemb

2011-08-14 Thread Stefan Krah
Stefan Krah added the comment: I like random tests in the stdlib, otherwise the same thing gets tested over and over again. `make buildbottest` prints the seed, and you can do it for a single test as well: $ ./python -m test -r test_heapq Using random seed 5857004 [1/1] test_heapq 1 test OK.
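The value of printing the seed is that a randomized run can be replayed. A minimal sketch of that idea, independent of the test runner: the helper sample_inputs is hypothetical, and the seed is simply the one shown in the output above.

```python
import random

SEED = 5857004  # the seed printed by the run quoted above

def sample_inputs(seed, count=5):
    """Generate pseudo-random test inputs from an explicit seed."""
    rng = random.Random(seed)  # private generator, no global state
    return [rng.randrange(1000) for _ in range(count)]

# Re-running with the same seed reproduces exactly the same inputs.
first_run = sample_inputs(SEED)
second_run = sample_inputs(SEED)
```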

[issue12611] 2to3 crashes when converting doctest using reduce()

2011-08-14 Thread Catalin Iacob
Catalin Iacob added the comment: I looked at this and understood why it's happening. I don't know exactly how to fix it though, so here's what I found out. When a doctest appears in a docstring at line n in a file, RefactorTool.parse_block will return a tree corresponding to n - 1 newline ch

[issue12266] str.capitalize contradicts oneself

2011-08-14 Thread Ezio Melotti
Ezio Melotti added the comment: Attached patch + tests. -- keywords: +patch Added file: http://bugs.python.org/file22898/issue12266.diff

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Ezio Melotti
Ezio Melotti added the comment: > I'm a bit confused on this. You no longer fix bugs in Python 2? We do, but it's unlikely that we will introduce major changes in behavior. Even if we had to get rid of narrow builds and/or fix len(), we would probably only do it in the next 3.x version (i.e.

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Antoine Pitrou
Antoine Pitrou added the comment: > > The UTF-8 codec described by RFC 2279 didn't say so, so, since our > > codec was following RFC 2279, it was producing valid UTF-8. With RFC > > 3629 a number of things changed in a non-backward compatible way. > > Therefore we couldn't just change the behav

[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Antoine Pitrou
Antoine Pitrou added the comment: > I have private builds that are 2.7 and 3.2, but those are both narrow. > I do not have a 3.3 build. Should I? I don't know if you *should*. But you can make one easily by passing "--with-wide-unicode" to ./configure.

[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Ezio Melotti
Ezio Melotti added the comment: > You're right: my wide build is not Python3, just Python2. And is it failing? Here the tests pass on the wide builds, on both Python 2 and 3. > In fact, it's even worse, because it's the stock build on Linux, > which seems on this machine to be 2.6 not 2.7.

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Tom Christiansen
Tom Christiansen added the comment: Ezio Melotti wrote on Sun, 14 Aug 2011 07:15:09 -: > For example I don't think removing the 0x10 upper limit is going to > happen -- even if it might be useful for other things. I agree entirely. That's why I appended a triple exclamation poin

[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Tom Christiansen
Tom Christiansen added the comment: >Ezio Melotti added the comment: >On wide 3.2 it passes too, so the failure is limited to narrow builds (are = >you sure that it fails on wide builds for you?). You're right: my wide build is not Python3, just Python2. In fact, it's even worse, because it'

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Tom Christiansen
Tom Christiansen added the comment: Ezio Melotti wrote on Sun, 14 Aug 2011 07:15:09 -: >> Unicode says you can't put surrogates or noncharacters in a >> UTF-anything stream. It's a bug to do so and pretend it's a >> UTF-whatever. > The UTF-8 codec described by RFC 2279 didn't say so,

[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Ezio Melotti
Ezio Melotti added the comment: The error on 3.2 comes from the lru_cache, here's a minimal testcase to reproduce it: >>> from functools import lru_cache >>> @lru_cache() ... def func(arg): raise ValueError() ... >>> func(3) Traceback (most recent call last): File "/home/wolf/dev/py/3.2/Lib/
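A self-contained variant of this testcase, sketched for a current Python 3. It does not reproduce the chained traceback seen on 3.2; it only shows that the exception from the wrapped function propagates and that nothing is stored in the cache. The calls list and the errors counter are additions for illustration.

```python
from functools import lru_cache

calls = []

@lru_cache()
def func(arg):
    calls.append(arg)      # record every real invocation
    raise ValueError(arg)

errors = 0
for _ in range(2):
    try:
        func(3)
    except ValueError:
        errors += 1

# A call that raises leaves no entry behind, so the second call
# invokes the function again instead of hitting the cache.
cached_entries = func.cache_info().currsize
```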

[issue12748] IDLE halts on osx when copy and paste

2011-08-14 Thread Ned Deily
Ned Deily added the comment: That is encouraging. This is almost certainly a problem with Tk. The Cocoa Tcl/Tk 8.5 used by Apple and ActiveState has been known to have issues with composite characters. There are a couple of IDLE things to ask about first. Have you made any Custom Key Bind

[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Ezio Melotti
Ezio Melotti added the comment: I haven't looked at the code, but I think that the re module is just trying to calculate the range between the low surrogate of 𝒜 and the high surrogate of 𝒵. If this is the case, this is the "usual bug" that narrow builds have. Also note that re.search(u"[\N{MA

[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Ezio Melotti
Ezio Melotti added the comment: On wide 3.2 it passes too, so the failure is limited to narrow builds (are you sure that it fails on wide builds for you?). On a narrow 2.7 I get a slightly different error though: match 1 passed Traceback (most recent call last): File "bigrange.py", line 16,

[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Ezio Melotti
Ezio Melotti added the comment: On a wide 2.7 and 3.3 all the 3 tests pass. On a narrow 3.2 I get match 1 passed Traceback (most recent call last): File "/home/wolf/dev/py/3.2/Lib/functools.py", line 176, in wrapper result = cache[key] KeyError: (, '[𝒜-𝒵]', 32) During handling of the ab

[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Tom Christiansen
New submission from Tom Christiansen : On neither narrow nor wide builds does this UTF8-encoded bit run without raising an exception: if re.search("[𝒜-𝒵]", "𝒞", re.UNICODE): print("match 1 passed") else: print("match 2 failed") The best you can possibly do is to use both
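For comparison, the same range written with \U escapes so that it does not depend on the source file encoding. This sketch assumes a current Python 3 (3.3 or later), where strings are indexed by code point and the range compiles; it says nothing about the builds discussed in the report.

```python
import re

# U+1D49C MATHEMATICAL SCRIPT CAPITAL A .. U+1D4B5 MATHEMATICAL SCRIPT CAPITAL Z
script_capitals = re.compile('[\U0001D49C-\U0001D4B5]')

# U+1D49E MATHEMATICAL SCRIPT CAPITAL C lies inside the range.
match = script_capitals.search('\U0001D49E')

# An ordinary ASCII letter lies outside it.
no_match = script_capitals.search('A')
```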

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Jeremy Kloth
Changes by Jeremy Kloth : -- nosy: +jkloth

[issue12748] IDLE halts on osx when copy and paste

2011-08-14 Thread hy
hy added the comment: Thanks, but the problem is not completely solved. I followed your instructions and I can now use the mouse to click the menu to copy and paste without problems. But it still halts when using the keyboard to do so. Is there a complete solution? -- resolution: works for me ->

[issue12748] IDLE halts on osx when copy and paste

2011-08-14 Thread Ned Deily
Ned Deily added the comment: Chances are that you used the python.org 2.7.2 64-bit/32-bit installer but you did not install the latest ActiveState Tcl, currently 8.5.10, as documented here: http://www.python.org/download/mac/tcltk/ On OS X 10.6, there should have been a warning message a

[issue12743] C API marshalling doc contains XXX

2011-08-14 Thread Martin v. Löwis
Martin v. Löwis added the comment: Would you just remove the "XXX" string, or the entire comment? "XXX" is typically used to indicate that something needs to be done, and the comment makes a clear statement as to what it is that needs to be done. -- nosy: +loewis

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Ezio Melotti
Ezio Melotti added the comment: > If speed is more important than correctness, I can make any algorithm > infinitely fast. Given the choice between correct and quick, I will > take correct every single time. It's a trade-off. Using non-BMP chars is fairly unusual (many real-world applicatio