Re: [Python-3000] Regular expressions, py3k and unicode

Antoine Pitrou Sun, 29 Jun 2008 07:18:30 -0700

Mark Dickinson <dickinsm <at> gmail.com> writes:
> 
> Is there a quick way to convert a general Unicode digit to its
> ascii equivalent?  Having to run str(int(c)) on each numeric character
> sounds painful, and the Decimal constructor doesn't need to
> be any slower right now.


In C it looks like PyUnicode_EncodeDecimal() does the trick (it's used by float
and int conversion functions). What is the status of C-accelerated Decimal?

In plain Python I don't know. Perhaps you could keep the fast path for ASCII
strings and have a slow fallback for Unicode digits, or suggest exporting the
above C function as a str method.
(Or perhaps simply disallow non-ASCII digits by using [0-9] instead of
\d. I'm not sure anybody really cares.)
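
To make the two pure-Python options concrete, here is a rough sketch, not a
proposal for the actual Decimal code. The name to_ascii_digits is made up for
illustration, and str.isascii() assumes a recent Python 3 (3.7 or later). The
fallback relies on unicodedata.decimal(), which returns the decimal value of a
character or the supplied default if it has none.

```python
import re
import unicodedata


def to_ascii_digits(s):
    """Return s with every Unicode decimal digit replaced by its ASCII digit.

    Hypothetical helper, for illustration only.
    """
    # Fast path: a pure-ASCII string cannot contain non-ASCII digits.
    if s.isascii():
        return s
    # Slow fallback: translate character by character.
    out = []
    for ch in s:
        value = unicodedata.decimal(ch, None)  # None if ch is not a decimal digit
        out.append(ch if value is None else str(value))
    return "".join(out)


# The alternative: refuse non-ASCII digits outright by spelling out the range.
ASCII_DIGITS = re.compile(r"[0-9]+")
# For str patterns in py3k, \d matches any Unicode decimal digit.
UNICODE_DIGITS = re.compile(r"\d+")
```

With this, to_ascii_digits("\u0661\u0662\u0663") (ARABIC-INDIC DIGIT ONE to
THREE) gives "123", while ASCII_DIGITS.fullmatch() rejects that same input and
UNICODE_DIGITS.fullmatch() accepts it.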

> I'm more worried, perhaps
> needlessly, about what other unidentified problems might be
> lurking deep in the standard library.  Any use of '\d', '\w', '\s', etc.
> might potentially be a problem.

Yes, we should do a scan of the standard library for this kind of pattern and
try to find out where there might be a problem.
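
A first pass at such a scan could be as crude as the sketch below. It is only a
textual search, so it will also flag comments and docstrings, and it cannot tell
str patterns from bytes patterns; every hit still needs a human look. The
function names are invented for this example.

```python
import re
from pathlib import Path

# Shorthand classes whose meaning depends on Unicode for str patterns.
SHORTHAND = re.compile(r"\\[dwsDWS]")


def find_shorthand_classes(text):
    """Return (line_number, stripped_line) for each line using a shorthand class."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), 1)
        if SHORTHAND.search(line)
    ]


def scan_tree(root):
    """Yield (path, line_number, line) for every hit in .py files under root."""
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file, skip it
        for number, line in find_shorthand_classes(text):
            yield path, number, line
```

Pointing scan_tree() at the standard library directory (for instance
sysconfig.get_paths()["stdlib"]) would give a list of candidates to review.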



