[issue1170] shlex have problems with parsing unicode

Andrew Jewett Thu, 15 Sep 2011 04:38:56 -0700

Andrew Jewett <[email protected]> added the comment:

Not to get side-tracked, but on a related note, it would be nice if there were a 
Python module that defined sets of Unicode characters corresponding to 
different categories (similar to the categories listed here: 
http://www.fileformat.info/info/unicode/category/index.htm).
That way, for example, if the user wants to categorically ignore ALL 
mathematical symbols or punctuation marks, they could assign:


self.wordterminators = unicode_math + unicode_punctuation
(Here + means set union.)

If somebody tried to specify all of them manually, it would be painful.  
There are hundreds of punctuation symbols in Unicode, for example.  (I suppose 
most of the time, one does not need to be so thorough.  This feature is not 
really necessary for getting shlex to work, but I think it would be an easy 
feature to add.)
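
For what it's worth, sets like these can already be approximated with the 
standard unicodedata module. The following is only a rough sketch, assuming 
Python 3 and assuming that "math" means general category Sm and "punctuation" 
means every category starting with P. The helper name chars_in_categories is 
made up for illustration, and wordterminators here is a plain variable, not an 
existing shlex attribute. Note that Python sets use | for union rather than +.

```python
import sys
import unicodedata


def chars_in_categories(prefixes):
    """Collect every character whose Unicode general category starts
    with one of the given prefixes (e.g. "P" or "Sm").

    Hypothetical helper for illustration; not part of shlex or unicodedata.
    """
    prefixes = tuple(prefixes)
    return {
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)).startswith(prefixes)
    }


# Assumption: "math" is taken to mean category Sm (Symbol, math).
unicode_math = chars_in_categories(["Sm"])

# Assumption: "punctuation" is taken to mean all P* categories
# (Pc, Pd, Ps, Pe, Pi, Pf, Po).
unicode_punctuation = chars_in_categories(["P"])

# Set union is spelled | for Python sets.
wordterminators = unicode_math | unicode_punctuation
```

This scans the whole code point range once per call, so the sets are best 
built once and reused. Whether shlex should accept such sets is a separate 
question from building them.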

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue1170>
_______________________________________