Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>
>> On sre character classes: I don't think that these provide
>> a good approach to XML lexical classes - custom functions
>> or methods or maybe even a codec mapping the characters
>> to their XML lexical class are much more efficient in
>> practice.
>
> That isn't my experience: functions that scan XML strings
> are much slower than regular expressions. I can't imagine
> how a custom codec could work, so I cannot comment on that.

If all you're interested in is the lexical class of the code points
in a string, you could use such a codec to map each code point to a
code point representing the lexical class. Then run re as usual on
the mapped Unicode string.

Since the indices of the matches found in the resulting string will
be the same as in the original string, it's easy to extract the
corresponding data from the original string.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 10 2005)
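
A minimal sketch of the mapping idea described above, not an actual codec: each code point is replaced by a single character standing for its lexical class, re runs on the mapped string, and the match indices are used to slice the original string. The class codes ("S", "C", "W", "O"), the helper names, and the classification rules are assumptions made for illustration; the rules are a rough stand-in based on Unicode general categories, not the real XML character tables.

```python
import re
import unicodedata

# Hypothetical class codes: one character per lexical class.
NAME_START, NAME_CHAR, SPACE, OTHER = "S", "C", "W", "O"


def lexical_class(ch):
    """Return the class code for one code point.

    Simplified stand-in for the XML tables (an assumption):
    letters, '_' and ':' start a name; digits, combining marks,
    '-' and '.' may continue one.
    """
    if ch in " \t\r\n":
        return SPACE
    category = unicodedata.category(ch)
    if ch in "_:" or category.startswith("L"):
        return NAME_START
    if ch in "-." or category in ("Nd", "Mn", "Mc"):
        return NAME_CHAR
    return OTHER


def map_to_classes(text):
    """Map every code point to exactly one class code.

    The mapping is one-to-one in length, so index i in the result
    refers to index i in the original text.
    """
    return "".join(lexical_class(ch) for ch in text)


# A name is one name-start character followed by any number of
# name-start or name characters, expressed over the class codes.
NAME_RE = re.compile("S[SC]*")


def find_names(text):
    """Match on the mapped string, slice the original string."""
    mapped = map_to_classes(text)
    return [text[m.start():m.end()] for m in NAME_RE.finditer(mapped)]


print(find_names("größe x-1 .5 _a:b"))
```

The pure-Python mapping step here only illustrates the index-preserving property; any speed benefit would depend on doing that step with a table-driven codec instead.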