[issue1170] shlex have problems with parsing unicode

Andrew Jewett Thu, 15 Sep 2011 12:52:09 -0700

Andrew Jewett <jewett....@gmail.com> added the comment:

> That can be done programmatically using the unicodedata module.
> The regex module (that will hopefully be included in 3.3) is
> also able to match characters that belong to specific categories.


Ezio: Thanks. (This is new to me, actually.) Is this what you mean?
http://www.regular-expressions.info/unicode.html
For the purposes of patching shlex, should we use a regex instead of sets of
characters (or strings) to test for membership in shlex.wordterminators? (Or
should we create a different class member? Unfortunately, I guess
shlex.wordchars has to remain some kind of container object to maintain
backwards compatibility.)
Something like that would definitely solve the problem nicely.
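
To make the unicodedata suggestion concrete, here is a minimal sketch, assuming
Python 3 and assuming that shlex only ever tests wordchars with the "in"
operator once the lexer has been constructed. The names UnicodeWordChars and
tokenize are hypothetical; they are not part of shlex or of any posted patch.

```python
import shlex
import unicodedata


class UnicodeWordChars:
    """Container-like object: a character is a member if Unicode
    classifies it as a letter (L*) or a number (N*), or if it is '_'.

    Hypothetical helper, not part of the shlex module.
    """

    def __contains__(self, ch):
        if len(ch) != 1:
            return False
        # unicodedata.category returns a two-letter code such as
        # 'Ll' (lowercase letter) or 'Nd' (decimal digit).
        return ch == "_" or unicodedata.category(ch)[0] in ("L", "N")


def tokenize(text):
    """Split text with a non-POSIX shlex lexer whose wordchars
    attribute is replaced after construction (an assumption about
    how the lexer uses that attribute)."""
    lexer = shlex.shlex(text, posix=False)
    lexer.wordchars = UnicodeWordChars()
    return list(lexer)


if __name__ == "__main__":
    # "na\u00efve caf\u00e9" is "naive cafe" with accented i and e.
    print(tokenize("na\u00efve caf\u00e9"))
```

Because the replacement object still answers "in" tests, code that only checks
membership in wordchars keeps working, while code that concatenates strings
onto wordchars would not. A regex-based test would need a separate class
member for that reason.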

> Andrew: Thanks for your contribution, but your patch cannot 
> go into 2.7, as we don’t add new features in stable versions

Eric: That's fine. I just posted here because this page is currently the top
hit when searching for "shlex unicode". If you think it's appropriate to
repost my message for Python 3.4, let me know. The issue with
shlex.wordchars that I raised is valid for any version of Python. I'm not sure
my solution is optimal. (I like the regex idea.)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue1170>
_______________________________________