Re: [Python-Dev] New Py_UNICODE doc

M.-A. Lemburg Tue, 10 May 2005 02:19:23 -0700

Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
> 
>>On sre character classes: I don't think that these provide
>>a good approach to XML lexical classes - custom functions
>>or methods or maybe even a codec mapping the characters
>>to their XML lexical class are much more efficient in
>>practice.
> 
> 
> That isn't my experience: functions that scan XML strings
> are much slower than regular expressions.  I can't imagine
> how a custom codec could work, so I cannot comment on that.


If all you're interested in is the lexical class of the code points
in a string, you could use such a codec to map each code point
to a code point representing the lexical class. Then run re
as usual on the mapped Unicode string. Since the indices of
the matches found in the resulting string will be the same as
in the original string, it's easy to extract the corresponding
data from the original string.
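
As a rough illustration of the idea, here is a minimal sketch in
present-day Python. It uses str.translate with a lazily filled table as a
stand-in for the codec, and the class codes ("S", "C", "W", "O"), the
helper names and the character tests are all assumptions made up for this
example; they are a simplification, not the real XML character tables.
The trick only works because the mapping is strictly one code point in,
one code point out, so match offsets carry over unchanged.

```python
import re
import unicodedata


def lexical_class(ch):
    """Return a one-character class code for ch.

    Simplified, hypothetical classes; NOT the full XML name tables.
    """
    if ch.isalpha() or ch in "_:":
        return "S"  # may start a name
    if ch.isdigit() or ch in "-." or unicodedata.combining(ch):
        return "C"  # may continue a name
    if ch in " \t\r\n":
        return "W"  # whitespace
    return "O"      # anything else


class ClassMap(dict):
    """Translation table for str.translate, filled on first use."""

    def __missing__(self, codepoint):
        cls = lexical_class(chr(codepoint))
        self[codepoint] = cls
        return cls


CLASS_MAP = ClassMap()

# A "name" in the mapped string: one start character, then any number
# of start or continuation characters.
NAME_RE = re.compile(r"S[SC]*")


def find_names(text):
    """Return the name-like substrings of text, using the mapped string."""
    mapped = text.translate(CLASS_MAP)
    # One code point maps to exactly one code point, so indices agree.
    assert len(mapped) == len(text)
    return [text[m.start():m.end()] for m in NAME_RE.finditer(mapped)]


print(find_names("<\u00e9t\u00e9 id='x1'>"))  # ['été', 'id', 'x1']
```

The regular expression itself only ever sees a four-letter alphabet, so
it stays short no matter how large the underlying character classes are.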

-- 
Marc-Andre Lemburg
eGenix.com
