Re: [Python-Dev] PEP 393 Summer of Code Project

2011-09-08 Thread Terry Reedy

On 9/8/2011 6:15 PM, fwierzbi...@gmail.com wrote:

> Oops, forgot to add the link for the gory details for Java and > 2 byte unicode:
>
> http://java.sun.com/developer/technicalArticles/Intl/Supplementary/


This is dated 2004. Basically, they considered several options, tried 
out 4, and ended up sticking with char[] (sequences) as UTF-16, with char 
= 16-bit code unit, and added int-based (32-bit) methods to the Character 
class for low-level manipulation of code points.


I did not see the indexing problem mentioned. I get the impression that 
they encourage sequence forward-backward iteration (cursor-based access) 
rather than random-access indexing.
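
To make the indexing issue concrete, here is a minimal Python sketch of 
cursor-style iteration over UTF-16 code units. The helper names 
(utf16_units, iter_code_points) are made up for illustration and are not 
Java or CPython APIs. Finding the n-th code point this way means walking 
from the start, which is why indexing by code point is not O(1) on a 
UTF-16 sequence.

```python
import struct


def utf16_units(s):
    """Return the UTF-16 code units of s as a tuple of ints (hypothetical helper)."""
    data = s.encode('utf-16-le')
    return struct.unpack('<%dH' % (len(data) // 2), data)


def iter_code_points(units):
    """Walk the code units forward, joining each surrogate pair into one code point."""
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High surrogate followed by low surrogate: one supplementary code point.
            yield 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # BMP character, or an unpaired surrogate passed through as-is.
            yield unit
            i += 1


units = utf16_units('a\U00010000b')
code_points = list(iter_code_points(units))
```

Here the three-character string takes four code units, and the cursor has 
to advance by two units to step over the middle character.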


--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 393 Summer of Code Project

2011-09-08 Thread fwierzbi...@gmail.com
On Fri, Aug 26, 2011 at 3:00 PM, Guido van Rossum wrote:
> I have a different question about IronPython and Jython now. Do their
> regular expression libraries support Unicode better than CPython's?
> E.g. does "." match a surrogate pair? Tom C suggests that Java's regex
> libraries get this and many other details right despite Java's use of
> UTF-16 to represent strings. So hopefully Jython's re library is built
> on top of Java's?
Even bigger oops - I answered the thread questions and not this
specific one.  Currently Jython's re is a Jython specific
implementation and so is not likely to benefit from the improvements
in Java's re implementation. I think in terms of PEP 393 this should
probably be considered a bug that we need to fix...
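
For reference, this is the behaviour the question is about, sketched in 
Python. It assumes an interpreter whose strings are indexed by code point 
(CPython 3.3 or later, i.e. with PEP 393); on a UTF-16 based 
implementation the same pattern could match only the first half of the 
pair.

```python
import re

# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP,
# so UTF-16 needs a surrogate pair (two code units) to store it.
clef = '\U0001D11E'
utf16_unit_count = len(clef.encode('utf-16-le')) // 2

# With code-point strings, "." consumes the whole character,
# so the pattern reaches the end of the string after one match.
match = re.match(r'.\Z', clef)
```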

-Frank Wierzbicki


Re: [Python-Dev] PEP 393 Summer of Code Project

2011-09-08 Thread fwierzbi...@gmail.com
Oops, forgot to add the link for the gory details for Java and > 2 byte unicode:

http://java.sun.com/developer/technicalArticles/Intl/Supplementary/


Re: [Python-Dev] PEP 393 Summer of Code Project

2011-09-08 Thread fwierzbi...@gmail.com
On Fri, Aug 26, 2011 at 3:00 PM, Guido van Rossum wrote:
> I have a different question about IronPython and Jython now. Do their
> regular expression libraries support Unicode better than CPython's?
> E.g. does "." match a surrogate pair? Tom C suggests that Java's regex
> libraries get this and many other details right despite Java's use of
> UTF-16 to represent strings. So hopefully Jython's re library is built
> on top of Java's?
>
> PS. Is there a better contact for Jython?
The best contact for Unicode and Jython is Jim Baker (I added him to
the cc) - I'll do my best to answer though: Java 5 added a bunch of
methods for dealing with Unicode that doesn't fit into 2 bytes - and
looking at our code for our Unicode object, I see that we are using
methods like the codePointCount method off of java.lang.String to
compute length[1] and using similar methods all through that code to
make sure we deal in code points when dealing with unicode.  So it
looks pretty good for us as far as I can tell.

[1] http://download.oracle.com/javase/6/docs/api/java/lang/String.html#codePointCount(int, int)
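
As a rough illustration of what a code point count has to do on UTF-16 
data, here is a Python sketch that counts code points in a sequence of 
16-bit code units. The function name code_point_count and the 
struct-based setup are my own for illustration, not Jython's code. 
Unpaired surrogates count as one code point each, which is how the Java 
documentation describes codePointCount.

```python
import struct


def code_point_count(units):
    """Count code points in a sequence of UTF-16 code units (illustrative only)."""
    count = 0
    prev_was_high = False
    for unit in units:
        is_low = 0xDC00 <= unit <= 0xDFFF
        # A low surrogate that completes a pair was already counted with its high half.
        if not (is_low and prev_was_high):
            count += 1
        prev_was_high = 0xD800 <= unit <= 0xDBFF
    return count


# 'a', U+1D11E (stored as the pair D834 DD1E), 'b'
data = 'a\U0001D11Eb'.encode('utf-16-le')
sample_units = struct.unpack('<%dH' % (len(data) // 2), data)
sample_count = code_point_count(sample_units)
```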

-Frank Wierzbicki


Re: [Python-Dev] [Python-checkins] cpython: Issue #12567: Fix curses.unget_wch() tests

2011-09-08 Thread Ezio Melotti
Hi,

On Tue, Sep 6, 2011 at 11:08 AM, victor.stinner wrote:

> http://hg.python.org/cpython/rev/786668a4fb6b
> changeset:   72301:786668a4fb6b
> user:        Victor Stinner
> date:        Tue Sep 06 10:08:28 2011 +0200
> summary:
>  Issue #12567: Fix curses.unget_wch() tests
>
> Skip the test if the function is missing. Use U+0061 (a) instead of U+00E9 (é)
> because U+00E9 raises a _curses.error('unget_wch() returned ERR') on some
> buildbots. It's maybe because of the locale encoding.
>
> files:
>  Lib/test/test_curses.py |  6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
>
>
> diff --git a/Lib/test/test_curses.py b/Lib/test/test_curses.py
> --- a/Lib/test/test_curses.py
> +++ b/Lib/test/test_curses.py
> @@ -265,14 +265,16 @@
>      stdscr.getkey()
>
>  def test_unget_wch(stdscr):
> -    ch = '\xe9'
> +    if not hasattr(curses, 'unget_wch'):
> +        return
>

This should be a skip, not a bare return.
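
Something along these lines, as a sketch. The helper name 
require_function is hypothetical, and it takes the module as an argument 
only so the example runs without curses installed. Raising 
unittest.SkipTest makes the missing function show up as a skipped test in 
the report instead of a silent pass.

```python
import unittest


def require_function(module, name):
    """Raise SkipTest if module lacks the named attribute (hypothetical helper)."""
    if not hasattr(module, name):
        raise unittest.SkipTest('%s() is not available' % name)
```

In the test above this would be called as require_function(curses, 
'unget_wch') before the first curses.unget_wch(ch).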


> +    ch = 'a'
>      curses.unget_wch(ch)
>      read = stdscr.get_wch()
>      read = chr(read)
>      if read != ch:
>          raise AssertionError("%r != %r" % (read, ch))
>

Why not just assertEqual?
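
A minimal sketch of what I mean. It assumes the check lives in a 
unittest.TestCase, which the current test does not since it is a plain 
function taking stdscr. FakeScreen is a made-up stand-in for the curses 
push-back behaviour so the example runs without a terminal.

```python
import unittest


class FakeScreen:
    """Hypothetical stand-in for curses: a simple push-back buffer."""

    def __init__(self):
        self._pending = []

    def unget_wch(self, ch):
        self._pending.append(ch)

    def get_wch(self):
        return self._pending.pop()


class UngetWchTest(unittest.TestCase):
    def test_round_trip(self):
        screen = FakeScreen()
        for ch in ('a', ord('a')):
            screen.unget_wch(ch)
            # assertEqual reports both values on failure, no manual message needed.
            self.assertEqual(screen.get_wch(), ch)
```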


>
> -    ch = ord('\xe9')
> +    ch = ord('a')
>      curses.unget_wch(ch)
>      read = stdscr.get_wch()
>      if read != ch:
>
>
>
Best Regards,
Ezio Melotti


Re: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot

2011-09-08 Thread Antoine Pitrou

Hello Jesus,

> I am sorry to bother you with these details
> and waste your time, but could you possibly change my buildbot
> configuration to launch, let's say, 4 test processes in parallel, just
> for testing?

Ok, I've added "-j4", let's see how that works.

> Another option would be to have a single Python process and "fork" for
> each test. That would launch each test in a separate process without
> requiring a full python interpreter launching each time. Is this the
> way "-j" is implemented?

It uses subprocess actually, so fork() + exec() is used.
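
Roughly, the idea looks like the sketch below: each job is a fresh 
interpreter started through subprocess, with a small pool limiting how 
many run at once. This is only an illustration, not regrtest's actual 
code; run_one, run_parallel and the use of ThreadPoolExecutor are 
assumptions of mine.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(code):
    """Run a snippet in a new interpreter; return (exit status, stdout)."""
    proc = subprocess.run(
        [sys.executable, '-c', code],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


def run_parallel(snippets, jobs=4):
    """Run the snippets with at most `jobs` child interpreters alive at a time."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() keeps the results in the same order as the inputs.
        return list(pool.map(run_one, snippets))
```

Since every child is a full interpreter, memory use scales with the 
number of jobs, which matters on a box with a tight swap limit.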

> BTW, the (nice and helpful) OpenIndiana folks have told me a few hours
> ago that they would increase my swap limit to 16GB. I am now waiting
> for this change to be done.

Good news :)

Regards

Antoine.

