Re: [Python-Dev] PEP 393 Summer of Code Project
On 9/8/2011 6:15 PM, fwierzbi...@gmail.com wrote:
> Oops, forgot to add the link for the gory details for Java and
> > 2 byte unicode:
> http://java.sun.com/developer/technicalArticles/Intl/Supplementary/

This is dated 2004. Basically, they considered several options, tried out 4, and ended up sticking with char[] (sequences) as UTF-16 with char = 16 bit code unit and added 32-bit Character(int) class for low-level manipulation of code points. I did not see the indexing problem mentioned. I get the impression that they encourage sequence forward-backward iteration (cursor-based access) rather than random-access indexing.

--
Terry Jan Reedy
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 393 Summer of Code Project
On Fri, Aug 26, 2011 at 3:00 PM, Guido van Rossum wrote:
> I have a different question about IronPython and Jython now. Do their
> regular expression libraries support Unicode better than CPython's?
> E.g. does "." match a surrogate pair? Tom C suggests that Java's regex
> libraries get this and many other details right despite Java's use of
> UTF-16 to represent strings. So hopefully Jython's re library is built
> on top of Java's?

Even bigger oops - I answered the thread questions and not this specific one. Currently Jython's re is a Jython-specific implementation and so is not likely to benefit from the improvements in Java's re implementation. I think in terms of PEP 393 this should probably be considered a bug that we need to fix...

-Frank Wierzbicki
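As an illustration of the question about "." and surrogate pairs, here is a minimal sketch in CPython (3.4 or later, i.e. after PEP 393 landed). It says nothing about how Jython's or Java's regex engines behave; it only shows the difference between matching one code point and matching two separate UTF-16 code units. The variable names are made up for this example.

```python
import re

# One code point outside the Basic Multilingual Plane (U+1F600).
astral = "\U0001F600"

# The same character written as its two UTF-16 code units (a surrogate pair).
# Built with chr() so that Python keeps them as two separate lone surrogates.
pair = chr(0xD83D) + chr(0xDE00)

# "." consumes exactly one element of the string being searched.
matches_code_point = re.fullmatch(".", astral) is not None
matches_pair_as_one = re.fullmatch(".", pair) is not None
matches_pair_as_two = re.fullmatch("..", pair) is not None

print(len(astral), matches_code_point)
print(len(pair), matches_pair_as_one, matches_pair_as_two)
```

An engine that works on raw 16-bit code units sees the second form, which is why a code-unit-based "." would need two matches to cover a single astral character.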
Re: [Python-Dev] PEP 393 Summer of Code Project
Oops, forgot to add the link for the gory details for Java and > 2 byte unicode: http://java.sun.com/developer/technicalArticles/Intl/Supplementary/
Re: [Python-Dev] PEP 393 Summer of Code Project
On Fri, Aug 26, 2011 at 3:00 PM, Guido van Rossum wrote:
> I have a different question about IronPython and Jython now. Do their
> regular expression libraries support Unicode better than CPython's?
> E.g. does "." match a surrogate pair? Tom C suggests that Java's regex
> libraries get this and many other details right despite Java's use of
> UTF-16 to represent strings. So hopefully Jython's re library is built
> on top of Java's?
>
> PS. Is there a better contact for Jython?

The best contact for Unicode and Jython is Jim Baker (I added him to the cc) - I'll do my best to answer though:

Java 5 added a bunch of methods for dealing with Unicode that doesn't fit into 2 bytes - and looking at our code for our Unicode object, I see that we are using methods like the codePointCount method of java.lang.String to compute length[1], and using similar methods all through that code to make sure we deal in code points when dealing with unicode. So it looks pretty good for us as far as I can tell.

[1] http://download.oracle.com/javase/6/docs/api/java/lang/String.html#codePointCount(int, int)

-Frank Wierzbicki
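To make the distinction between the two kinds of length concrete, here is a small Python sketch. It is not Jython's code; `utf16_unit_count` is a hypothetical helper that counts 16-bit code units the way a UTF-16 string's raw length would, while `len()` on a CPython 3.3+ str counts code points, which is the quantity codePointCount reports in Java.

```python
def utf16_unit_count(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    # Each UTF-16 code unit is two bytes; "utf-16-le" adds no byte order mark.
    return len(text.encode("utf-16-le")) // 2


sample = "a\U0001F600"  # one BMP character plus one astral character

code_points = len(sample)              # counts code points in CPython 3.3+
code_units = utf16_unit_count(sample)  # astral character takes two units

print(code_points, code_units)
```

The two numbers differ by one for every character outside the Basic Multilingual Plane, which is why length and indexing have to go through the code-point methods on a UTF-16 platform.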
Re: [Python-Dev] [Python-checkins] cpython: Issue #12567: Fix curses.unget_wch() tests
Hi,

On Tue, Sep 6, 2011 at 11:08 AM, victor.stinner wrote:
> http://hg.python.org/cpython/rev/786668a4fb6b
> changeset:   72301:786668a4fb6b
> user:        Victor Stinner
> date:        Tue Sep 06 10:08:28 2011 +0200
> summary:
>   Issue #12567: Fix curses.unget_wch() tests
>
> Skip the test if the function is missing. Use U+0061 (a) instead of U+00E9 (é)
> because U+00E9 raises a _curses.error('unget_wch() returned ERR') on some
> buildbots. It's maybe because of the locale encoding.
>
> files:
>   Lib/test/test_curses.py |  6 ++++--
>   1 files changed, 4 insertions(+), 2 deletions(-)
>
>
> diff --git a/Lib/test/test_curses.py b/Lib/test/test_curses.py
> --- a/Lib/test/test_curses.py
> +++ b/Lib/test/test_curses.py
> @@ -265,14 +265,16 @@
>      stdscr.getkey()
>
>  def test_unget_wch(stdscr):
> -    ch = '\xe9'
> +    if not hasattr(curses, 'unget_wch'):
> +        return
>

This should be a skip, not a bare return.

> +    ch = 'a'
>      curses.unget_wch(ch)
>      read = stdscr.get_wch()
>      read = chr(read)
>      if read != ch:
>          raise AssertionError("%r != %r" % (read, ch))
>

Why not just assertEqual?

>
> -    ch = ord('\xe9')
> +    ch = ord('a')
>      curses.unget_wch(ch)
>      read = stdscr.get_wch()
>      if read != ch:
>
>

Best Regards,
Ezio Melotti
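The two review comments above can be sketched as follows. This is not the actual test_curses.py code, which at the time used plain functions taking stdscr; it assumes a unittest.TestCase style instead. `fake_curses` is a hypothetical stand-in for a curses build that lacks unget_wch(), used so the sketch runs without a terminal.

```python
import unittest
from types import SimpleNamespace

# Hypothetical stand-in for a curses module built without unget_wch().
fake_curses = SimpleNamespace()


class UngetWchTest(unittest.TestCase):
    # Report the test as skipped instead of silently returning.
    @unittest.skipUnless(hasattr(fake_curses, "unget_wch"),
                         "requires curses.unget_wch()")
    def test_unget_wch(self):
        ch = "a"
        fake_curses.unget_wch(ch)
        read = fake_curses.get_wch()
        # assertEqual produces the "%r != %r" style message on failure.
        self.assertEqual(read, ch)


def run_suite():
    """Run the test case and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UngetWchTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With a skip, the missing function shows up in the test report with its reason, whereas a bare return makes the test look like it passed.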
Re: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot
Hello Jesus,

> I'm sorry to bother you with these details
> and waste of time, but could you possibly change my buildbot
> configuration to launch, let's say, 4 test processes in parallel, just
> for testing?

Ok, I've added "-j4", let's see how that works.

> Another option would be to have a single Python process and "fork" for
> each test. That would launch each test in a separate process without
> requiring a full python interpreter launching each time. Is this the
> way "-j" is implemented?

It uses subprocess actually, so fork() + exec() is used.

> BTW, the (nice and helpful) OpenIndiana folks have told me a few hours
> ago that they would increase my swap limit to 16GB. I am now waiting
> for this change to be done.

Good news :)

Regards

Antoine.
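For readers unfamiliar with the difference, here is a minimal sketch of running a piece of work in a child interpreter through the subprocess module. It is not the regrtest implementation of "-j"; `run_in_child` is a hypothetical helper that only illustrates that each child is a freshly started Python process rather than a fork of the parent's already loaded state.

```python
import subprocess
import sys


def run_in_child(code):
    """Run code in a new Python interpreter; return (exit status, stdout)."""
    result = subprocess.run(
        [sys.executable, "-c", code],  # start a fresh interpreter
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


status, output = run_in_child("print(1 + 1)")
print(status, output)
```

Starting a full interpreter per child costs startup time, but each child begins from a clean state and its memory is returned to the system when it exits.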