[issue1170] shlex have problems with parsing unicode

2021-09-29 Thread Andrew Jewett


Andrew Jewett  added the comment:

Alright.  I'll think about it a little more and post my suggestion there, 
perhaps.  Thanks Victor.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1170] shlex have problems with parsing unicode

2021-09-29 Thread Andrew Jewett

Andrew Jewett  added the comment:

After posting that, I noticed that the second example I listed in my previous 
post (a language where words contain any non-whitespace, non-parenthesis 
character) can now be implemented in the current version of shlex.py by setting 
"whitespace_split" and "punctuation_chars".  (Sorry, it's been a while since I looked 
at shlex.py, and it's had some useful new features.)
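
A minimal sketch of that setup, assuming the two settings meant here are the 
shlex attributes whitespace_split and punctuation_chars, and assuming 
Python 3.8 or later (where the two were made compatible).  The helper name 
tokenize and the sample input are illustrative only.

```python
import shlex


def tokenize(text):
    """Split text into words made of any non-whitespace, non-parenthesis chars."""
    # punctuation_chars makes "(" and ")" tokens of their own.
    lexer = shlex.shlex(text, posix=True, punctuation_chars="()")
    # whitespace_split lets every other non-whitespace character join a word,
    # so wordchars does not need to list the non-ASCII characters.
    lexer.whitespace_split = True
    return list(lexer)


print(tokenize("(定义 变量 42)"))
```

Note that a run of adjacent punctuation characters such as "((" comes back as 
one token, so a nested-parenthesis language would need an extra splitting step.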

--




[issue1170] shlex have problems with parsing unicode

2021-09-29 Thread STINNER Victor


STINNER Victor  added the comment:

> I would like to suggest making this change (or something similar) to the 
> official version of "shlex.py".  Would sending an email to 
> "python-id...@python.org" be a good place to make this proposal?

Yes, python-ideas is a good place to start discussing such an idea. This issue is 
closed; if you discuss it here, you will get a limited audience.

--




[issue1170] shlex have problems with parsing unicode

2021-09-29 Thread wombat


wombat  added the comment:

The error messages may have gone away, but the underlying unicode limitations I 
mentioned remain:

Suppose you wanted to use shlex to build a parser for Chinese text.  Would you 
have to set "wordchars" to a string containing every possible Chinese character?

I myself wrote a parser for a crude language where words can contain any 
character except for whitespace and parentheses.  I needed a way to specify the 
characters which cannot belong to a word.  (That's how I solved the problem.  I 
modified shlex.py and added a "wordterminators" member.  If "wordterminators" 
was left blank, then "wordchars" were used instead.  This was a trivial change 
to "shlex.py" and it added a lot of functionality.)

I would like to suggest making this change (or something similar) to the 
official version of "shlex.py".  Would sending an email to 
"python-id...@python.org" be a good place to make this proposal?

--
nosy: +jewett-aij




[issue1170] shlex have problems with parsing unicode

2021-09-20 Thread STINNER Victor


STINNER Victor  added the comment:

This issue has been fixed in Python 3 by using Unicode rather than bytes in 
shlex. Python 2 users: it's time to upgrade to Python 3 ;-)
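
A short illustration of the Python 3 behaviour described above (the sample 
command line is made up): shlex.split() takes a str, so non-ASCII input works 
directly, and bytes need to be decoded first.

```python
import shlex

# A str containing non-ASCII characters is split like any other text.
command = "cp 'Grüße aus Köln.txt' ./архив/"
tokens = shlex.split(command)
print(tokens)

# Bytes are not accepted by shlex in Python 3; decode them before splitting.
raw = command.encode("utf-8")
tokens_from_bytes = shlex.split(raw.decode("utf-8"))
```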

--
resolution:  -> fixed
stage: needs patch -> resolved
status: open -> closed




[issue1170] shlex have problems with parsing unicode

2021-09-19 Thread Matej Cepl


Matej Cepl  added the comment:

I cannot reproduce it with the current 3.* version. Did anybody reproduce with 
3.5?

Otherwise, I suggest closing this as a 2.* bug.

--




[issue1170] shlex have problems with parsing unicode

2019-07-29 Thread STINNER Victor


STINNER Victor  added the comment:

This issue is 12 years old and has 3 patches: it's far from being "newcomer 
friendly", so I am removing the "Easy" label.

--
keywords:  -easy




[issue1170] shlex have problems with parsing unicode

2014-06-29 Thread Alexander Belopolsky

Changes by Alexander Belopolsky alexander.belopol...@gmail.com:


--
assignee: belopolsky -> 
versions: +Python 3.5

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1170
___



[issue1170] shlex have problems with parsing unicode

2011-10-22 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

$ ./python 
Python 2.7.2+ (2.7:27ae7d4e1983+, Oct 23 2011, 00:09:06) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shlex
>>> shlex.split(u'Hello, World!')
['Hello,', 'World!']

This bug was fixed indirectly by a StringIO fix in 27ae7d4e1983, for #1548891.  
BTW, this report was a duplicate of #6988, closed a year ago.

Python 2.7.3 will finally support unicode in shlex, so the doc change requested 
in this report is outdated.  However, I still want to do something for this.  
I’ve noticed that shlex.split’s argument can be a file-like object, and I 
wonder if passing a StringIO.StringIO(my_unicode_string) wouldn’t work.  If 
such a short recipe works, I’m all for including it in the 2.7 docs for users 
of older versions.  If a longer recipe is needed, then ActiveState’s Python 
Cookbook would be more appropriate, and I’ll add a link to the docs.  If it’s 
very long and requires a PyPI project, then I’m willing to link to that.

--




[issue1170] shlex have problems with parsing unicode

2011-10-22 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

The second message in this page reports that StringIO.StringIO works, but when 
I pass a unicode string with non-ASCII chars there’s a method call that fails 
because of implicit unicode-to-str conversion:

Traceback (most recent call last):
  File "/usr/lib/python2.7/shlex.py", line 150, in read_token
    elif nextchar in self.wordchars:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdf in position 63: ordinal not in range(128)

I’ll try to create a Shlex instance, replace self.wordchars with a decoded 
version and try again.

--




[issue1170] shlex have problems with parsing unicode

2011-09-17 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

Andrew: Ezio means http://docs.python.org/2.7/library/unicodedata

> For the purposes of patching shlex
Sorry, but we are not talking about patching shlex.

> I just posted here because this page currently gets the top hit
> when searching for "shlex unicode".
It’s okay.  A recipe on ActiveState and a “shlexu” module on PyPI would also be 
good things to have.

> If you think it's appropriate to repost my message for python version 3.4,
> let me know.
shlex supports Unicode in 3.x.  If there is a bug, can you please open another 
bug report?  This one is already too long, and I’d prefer to keep it focused on 
the need for a documentation patch.

--




[issue1170] shlex have problems with parsing unicode

2011-09-17 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Ezio, I don't see any indication in this ticket that this bug was actually 
*fixed* in 3.x.  Unicode doesn't cause immediate errors in 3.x, but it isn't 
recognized as wordchars, etc.  Am I missing something?
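
The point can be seen with a small experiment in Python 3 (the sample text is 
illustrative): with the default, non-POSIX lexer, characters outside wordchars 
come back as one-character tokens unless whitespace_split is enabled.

```python
import shlex

text = "你好 world"

# Default shlex.shlex(): wordchars holds only ASCII letters, digits and "_",
# so each Chinese character is returned as a separate token.
default_tokens = list(shlex.shlex(text))
print(default_tokens)

# With whitespace_split, only whitespace separates tokens.
lexer = shlex.shlex(text)
lexer.whitespace_split = True
split_tokens = list(lexer)
print(split_tokens)
```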

--




[issue1170] shlex have problems with parsing unicode

2011-09-17 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

I haven't looked at the shlex code (yet), my comment was just about the idea of 
adding constants with chars that belong to different Unicode categories.

--




[issue1170] shlex have problems with parsing unicode

2011-09-15 Thread Andrew Jewett

Andrew Jewett jewett@gmail.com added the comment:

Proposed solution and patch to follow.  Please let me know if I am posting it 
in the wrong place.

The main problem with shlex is that the shlex interface is inadequate to handle 
unicode.  Specifically it is no longer feasible to provide a list of every 
possible character that the user could want to appear within a token.  Suppose 
the user wants the ability to parse words in simplified Chinese.  If I 
understand correctly, then currently, they would have to set self.wordchars 
to a string (or some other container) of 6000 (unicode) characters, and this 
enormous string would need to be searched each time a new character is read.  
This was a problem with shlex from the beginning, but it became more acute when 
support for unicode was added.  Generally, in some cases, it is much more 
convenient instead to specify a short list of characters you -don't- want to 
appear in a word (word delimiters), than to list all the characters you do.

An obvious (although perhaps not optimal) solution is to add an additional data 
member to shlex, consisting of the characters which terminate the reading of a 
token.  (In other words, the set-inverse of wordchars.)  In the attached 
example code, I call it self.wordterminators.  To remain backwards-compatible 
with shlex, self.wordterminators is empty by default.  But if not-empty, 
self.wordterminators overrides self.wordchars.
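
The idea can be sketched independently of shlex.  The function below is NOT 
the attached shlex_wt.py and ignores quoting, escapes and comments; the name 
split_words and its default terminator set are assumptions made only for 
illustration.

```python
def split_words(text, wordterminators=" \t\r\n()"):
    """Split text into words, ending a word at any character in wordterminators.

    Whitespace terminators are dropped; other terminators become
    one-character tokens.  Every other character may appear in a word.
    """
    tokens = []
    word = []
    for ch in text:
        if ch in wordterminators:
            if word:
                # A terminator closes the word collected so far.
                tokens.append("".join(word))
                word = []
            if not ch.isspace():
                tokens.append(ch)
        else:
            word.append(ch)
    if word:
        tokens.append("".join(word))
    return tokens


print(split_words("(你好 世界)"))
```

The membership test runs against the short terminator string rather than a 
list of every allowed character, which is the saving described above.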

I've been distributing a customized version of shlex with my own software which 
implements this modest change (shlex_wt).  (See attachment.)  It is otherwise 
identical to the version of shlex.py that ships with python 3.2.2.  (It has 
been further modified only slightly to be compatible with both python 2.7 and 
python 3.)  It's not beautiful code, but it seems to be a successful kluge for 
this particular issue.  I don't know if it makes a worthy patch, but perhaps 
somebody out there finds it useful.  To make it easy to spot the changes, each 
of the lines I changed ends in a comment #WORDTERMINATORS.  (There are only 
15 of these lines.)
-Andrew Jewett

--
nosy: +wombat
versions:  -Python 2.7
Added file: http://bugs.python.org/file23161/shlex_wt.py




[issue1170] shlex have problems with parsing unicode

2011-09-15 Thread Andrew Jewett

Andrew Jewett jewett@gmail.com added the comment:

Not to get side-tracked, but on a related note, it would be nice if there was a 
python module which defined sets of unicode characters corresponding to 
different categories (similar to the categories listed here: 
http://www.fileformat.info/info/unicode/category/index.htm)
That way, for example, if the user wants to categorically ignore ALL 
mathematical symbols or punctuation marks, they could assign: 

self.wordterminators = unicode_math + unicode_punctuation.
(The + means set union.)

If somebody tried to specify all of them manually, this would be painful.  
There are hundreds of punctuation symbols in unicode, for example.  (I suppose 
most of the time, one does not need to be so thorough.  This feature is not really 
necessary for getting shlex to work.  But I think this would be an easy feature 
to add.)

--




[issue1170] shlex have problems with parsing unicode

2011-09-15 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

That can be done programmatically using the unicodedata module.  The regex 
module (that will hopefully be included in 3.3) is also able to match characters 
that belong to specific categories.
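
A hedged sketch of the unicodedata approach: instead of building large sets of 
characters, test each character's general category on demand.  The function 
name is_word_terminator and the choice of categories (all punctuation, "P*", 
plus math symbols, "Sm") are assumptions for illustration.

```python
import unicodedata


def is_word_terminator(ch):
    """Return True for whitespace, any punctuation, or a math symbol."""
    category = unicodedata.category(ch)
    return ch.isspace() or category.startswith("P") or category == "Sm"


sample = "a+b。(c)"
print([ch for ch in sample if is_word_terminator(ch)])
```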

--




[issue1170] shlex have problems with parsing unicode

2011-09-15 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

Andrew: Thanks for your contribution, but your patch cannot go into 2.7, as we 
don’t add new features in stable versions (re-read the whole thread if you need 
more info).  This report is still open because we need a doc patch to explain 
how to work around that.

--




[issue1170] shlex have problems with parsing unicode

2011-09-15 Thread Andrew Jewett

Andrew Jewett jewett@gmail.com added the comment:

> That can be done programmatically using the unicodedata module.  
> The regex module (that will hopefully be included in 3.3) is 
> also able to match characters that belong to specific categories.

Ezio:  Thanks.  (New to me, actually)  Is this what you mean?:
http://www.regular-expressions.info/unicode.html
For the purposes of patching shlex, should we use regex instead of sets of 
characters (or strings) to test for membership in shlex.wordterminators?  (Or 
should we create a different class member?  Unfortunately, I guess 
shlex.wordchars has to be left as some kind of container object to maintain 
backwards compatibility.)
Something like that would definitely solve the problem nicely.

> Andrew: Thanks for your contribution, but your patch cannot 
> go into 2.7, as we don’t add new features in stable versions

Eric: That's fine.  I just posted here because this page currently gets the top 
hit when searching for "shlex unicode".  If you think it's appropriate to 
repost my message for python version 3.4, let me know.  The issue with 
shlex.wordchars that I raised is valid for any version of python.  I'm not sure 
my solution is optimal.  (I like the regex idea).

--




[issue1170] shlex have problems with parsing unicode

2011-09-02 Thread Éric Araujo

Changes by Éric Araujo mer...@netwok.org:


--
components: +Documentation -Library (Lib), Unicode
keywords: +easy
stage: test needed -> needs patch
versions: +Python 2.7 -Python 3.1, Python 3.2




[issue1170] shlex have problems with parsing unicode

2011-07-18 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

We all recognize that ASCII is very much limited and that the real way to work 
with strings is Unicode.  However, here our hands are tied by our development 
process: shlex in 2.x does not support Unicode, adding that support would be a 
new feature, and 2.7 is closed to new features.  If shlex was supposed to 
support Unicode, then this would be a bug that could be fixed in 2.7, but it’s 
not.  All we can do is improve the 2.7 doc to show how to work around that 
(splitting on bytes and then decoding each chunk, for example).

--




[issue1170] shlex have problems with parsing unicode

2011-07-18 Thread Doug Hellmann

Doug Hellmann doug.hellm...@gmail.com added the comment:

Is unicode supported by shlex in 3.x already? It's curious that unicode support 
is considered a new feature, rather than a bug. I understand wanting to 
allocate development resources carefully, though. If someone were to prepare a 
patch, would it even have a chance of being accepted in 2.7?

--




[issue1170] shlex have problems with parsing unicode

2011-07-18 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

See http://bugs.python.org/issue1170#msg106424 and following.

--




[issue1170] shlex have problems with parsing unicode

2011-07-18 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

It’s not about allocating resources, it’s about following process.  The first 
part is that we don’t add new features to stable releases, the second item is 
that this is not considered a bug fix: The code pre-dates Unicode, was not 
updated to support it, and the docs say “The shlex module currently does not 
support Unicode input”.

--




[issue1170] shlex have problems with parsing unicode

2011-07-17 Thread Santiago Romero

Santiago Romero srom...@gmail.com added the comment:

> It would be good to hear a strong argument from
> the user: how did he end up passing
> unicode to shlex.split? It is for parsing command
> line args for programs and personally have not
> seen those cases.

I'm from Spain: I personally write programs and python/bash scripts that 
accept unicode input arguments. And I'm currently writing a wxpython gui 
program (SSH/rdesktop launcher, in Spanish) that needs to (and cannot) run 
shlex.split on a given text string.

Remember that English is NOT the most spoken language in the world. It's the 
THIRD most spoken language, after Chinese and Spanish, languages that need and 
use unicode support.

--




[issue1170] shlex have problems with parsing unicode

2011-07-17 Thread Chris Rebert

Changes by Chris Rebert pyb...@rebertia.com:


--
nosy:  -cvrebert




[issue1170] shlex have problems with parsing unicode

2011-07-17 Thread Doug Hellmann

Doug Hellmann doug.hellm...@gmail.com added the comment:

Right. Any program that needs to parse command lines containing filenames or 
other arguments with unicode characters will encounter this problem.

--




Re: [issue1170] shlex have problems with parsing unicode

2011-07-16 Thread Senthil Kumaran
TypeError should be okay. But I am still -0 on that. It would be good
to hear a strong argument from the user: how did he end up passing
unicode to shlex.split? It is for parsing command line args for
programs and I personally have not seen those cases. Or did he want
unicode everywhere even if he was just using ascii characters?



[issue1170] shlex have problems with parsing unicode

2011-07-13 Thread Santiago Romero

Santiago Romero srom...@gmail.com added the comment:

I think I'm suffering the same problem in some small programs that use shlex:


>>> import shlex
>>>
>>> text = "python and shlex"
>>> shlex.split(text)
['python', 'and', 'shlex']

>>> text = u"python and shlex"
>>> shlex.split(text)
['p\x00\x00\x00y\x00\x00\x00t\x00\x00\x00h\x00\x00\x00o\x00\x00\x00n\x00\x00\x00',
 '\x00\x00\x00a\x00\x00\x00n\x00\x00\x00d\x00\x00\x00', 
'\x00\x00\x00s\x00\x00\x00h\x00\x00\x00l\x00\x00\x00e\x00\x00\x00x\x00\x00\x00']


 I'm currently using the following basic workaround (while assuming that my 
strings have only ascii chars):

>>> [ x.replace("\0", "") for x in shlex.split(text) ]
['python', 'and', 'shlex']

 It would be very nice if shlex could work with unicode strings ...

 Thanks.

--
nosy: +Santiago.Romero




[issue1170] shlex have problems with parsing unicode

2011-07-13 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +ezio.melotti




[issue1170] shlex have problems with parsing unicode

2011-07-13 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

This isn't going to get fixed in 2.x (shlex doesn't support unicode in 2.x, and 
doing so would be a new feature).  In 3.x all strings are unicode, so the 
problem you are seeing doesn't exist.  This issue is about the broader problem 
of what counts as a word character when more than ASCII is involved.

--




[issue1170] shlex have problems with parsing unicode

2011-07-13 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

Would raising a TypeError if the given argument is a unicode object be 
unacceptable for 2.7?  It would at least make things clear.

--




[issue1170] shlex have problems with parsing unicode

2011-01-14 Thread Doug Hellmann

Changes by Doug Hellmann doug.hellm...@gmail.com:


--
nosy: +doughellmann




[issue1170] shlex have problems with parsing unicode

2010-11-30 Thread Alexander Belopolsky

Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

Adding #10587 because we need to figure out the exact meaning of str.isspace() 
etc. first.  It is possible that for proper operation shlex should consult 
unicodedata directly.

--
dependencies: +Document the meaning of str methods




[issue1170] shlex have problems with parsing unicode

2010-11-30 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

The key requirement to consider in POSIX-compatible mode is, well, POSIX 
compatibility, which is defined in

http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html
http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_03

Now, POSIX declares that what counts as a "blank" depends on LC_CTYPE (character 
class "blank"). I'd argue that if the objective is to behave exactly like the 
shell, it really should be doing that (i.e. work in a locale-aware manner).

--




[issue1170] shlex have problems with parsing unicode

2010-08-04 Thread Alexander Belopolsky

Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

I don't like my patch anymore because it breaks code that manipulates the public 
wordchars attribute.  Users may want to set it to their own alphabet or append 
additional characters to the default list.  Maybe wordchars should always be the 
non-posix wordchars, and the iswordchar test in posix mode should be 
c.isalnum() or c in self.wordchars?
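
The test suggested here can be written as a standalone predicate.  This is only 
a sketch of the idea, not code from the attached patch; the name is_word_char 
and its signature are hypothetical.

```python
def is_word_char(c, wordchars):
    """POSIX-mode word test: any Unicode letter or digit, or a listed char."""
    # str.isalnum() is Unicode-aware in Python 3, so letters such as "é" or
    # "你" pass without being listed in wordchars.
    return c.isalnum() or c in wordchars


print([c for c in "né_你-" if is_word_char(c, "_")])
```

Under this scheme, code that appends characters to wordchars keeps working, 
because the attribute stays a plain string of extra word characters.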

--




[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread Alexander Belopolsky

Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

Fernando,

Is this a 2.7-only problem?  In 3.2


>>> list(shlex.shlex('ab'))
['ab']

and bytes are not supported.

>>> list(shlex.shlex(b'ab'))
Traceback (most recent call last):
..
AttributeError: 'bytes' object has no attribute 'read'

It is debatable whether either is a bug.

--




[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread Fernando Perez

Fernando Perez fdo.pe...@gmail.com added the comment:

Yes, sorry that I failed to mention the example I gave applies only to 2.x, not 
to 3.x.

--




[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread Alexander Belopolsky

Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

On Tue, Jul 27, 2010 at 2:26 PM, Fernando Perez rep...@bugs.python.org wrote:
..
> Yes, sorry that I failed to mention the example I gave applies only to 2.x, 
> not to 3.x.

Why do you expect shlex to work with unicode in 2.x?  The
documentation clearly says that the argument should be a string.
Supporting unicode is not an unreasonable RFE, but won't be considered
for 2.x anymore.

What's your take on accepting bytes in 3.x?

--




[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread Fernando Perez

Fernando Perez fdo.pe...@gmail.com added the comment:

On Tue, Jul 27, 2010 at 11:52, Alexander Belopolsky
rep...@bugs.python.org wrote:
> Why do you expect shlex to work with unicode in 2.x?  The
> documentation clearly says that the argument should be a string.
> Supporting unicode is not an unreasonable RFE, but won't be considered
> for 2.x anymore.

Well, I didn't make the original report, just provided a short,
illustrative example :)  It's easy enough to work around the issue for
2.x that I don't care too much about it, so I have no problem with 2.x
staying as it is.

> What's your take on accepting bytes in 3.x?

Mmh... Not too sure.  I'd think about it from the perspective of what
possible sources of input could produce raw bytes, that would be
reasonable use cases for shlex.  Is it common in 3.x to read a file in
bytes mode?  If so, then it might be a good reason to have shlex parse
bytes as well, since I can imagine reading inputs from files to be
parsed via shlex.

But take my opinion on 3.x with a big grain of salt, I have very
little experience with it as of yet.

Cheers,

f

--




[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread Raymond Hettinger

Raymond Hettinger rhettin...@users.sourceforge.net added the comment:

+1 on getting shlex to work better with Unicode.  The core concepts of this module 
are general purpose and applicable to all kinds of text.

--
nosy: +rhettinger




[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread Alexander Belopolsky

Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

On Tue, Jul 27, 2010 at 3:04 PM, Raymond Hettinger
rep...@bugs.python.org wrote:

> Raymond Hettinger rhettin...@users.sourceforge.net added the comment:
>
> +1 on getting shlex to work better with Unicode.

In 2.7.x?  It more or less works in 3.x already.

--




[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Alexander: the "more or less" is on the "less" side when dealing with non-ASCII 
letters, I think.  See my msg109292 and your own followups.

--




[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread Alexander Belopolsky

Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

David,

What do you think about the attached patch?  Would that be a change in the "more" 
direction?

--
Added file: http://bugs.python.org/file18224/issue1170.diff




[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread Alexander Belopolsky

Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

I am adding MvL to nosy.

Martin,

I believe you are the ultimate authority on how to tokenize a unicode stream.

--
nosy: +loewis




[issue1170] shlex have problems with parsing unicode

2010-07-25 Thread Fernando Perez

Fernando Perez fdo.pe...@gmail.com added the comment:

Here is an illustration of the problem with a simple test case (the value of 
the posix flag doesn't make any difference):

Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shlex
>>> list(shlex.shlex('ab'))
['ab']
>>> list(shlex.shlex(u'ab', posix=True))
['a', '\x00', '\x00', '\x00', 'b', '\x00', '\x00', '\x00']
>>> list(shlex.shlex(u'ab', posix=False))
['a', '\x00', '\x00', '\x00', 'b', '\x00', '\x00', '\x00']


--
nosy: +fperez




[issue1170] shlex have problems with parsing unicode

2010-07-19 Thread Alexander Belopolsky

Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

As discussed in msg110828 under issue9308, it is not clear whether the logic 
identifying word characters in shlex is correct in the presence of Unicode.

--
assignee:  -> belopolsky
keywords: +patch
nosy: +belopolsky




[issue1170] shlex have problems with parsing unicode

2010-07-19 Thread Alexander Belopolsky

Changes by Alexander Belopolsky belopol...@users.sourceforge.net:





[issue1170] shlex have problems with parsing unicode

2010-07-19 Thread Alexander Belopolsky

Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

I believe the e-mail thread that culminated in r32284, "Implemented posix-mode 
parsing support in shlex.py", was "shellwords" from April 2003:
http://mail.python.org/pipermail/python-dev/2003-April/034670.html

I scanned through the messages, but could not find a reference to the standard 
that was implemented.




[issue1170] shlex have problems with parsing unicode

2010-07-04 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

shlex may use unicode in py3k, but since the file still starts with a latin-1 
coding cookie and the posix logic hasn't been changed, I suspect that it does 
not work correctly (i.e., it does not correctly identify word characters, per 
msg55969).

It's too late for 2.7 I think, but it seems there is work still to do in py3k.
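
For readers on Python 3, a minimal sketch of the behaviour being discussed, assuming a current CPython 3 (exact token boundaries of the default lexer may differ between versions, so no output is claimed for it). With whitespace_split enabled, as shlex.split() does, ordinary characters are not checked against wordchars, so non-ASCII words stay intact. The default lexer still decides word membership from wordchars.

```python
import shlex

text = "echo 日本語 'Ωmega point'"

# shlex.split() turns on whitespace_split, so any run of non-whitespace
# characters is kept together regardless of what wordchars contains.
split_tokens = shlex.split(text)
print(split_tokens)  # → ['echo', '日本語', 'Ωmega point']

# The default lexer consults wordchars, so a letter that is not listed
# there (here the Greek capital omega) can end a word early.
default_tokens = list(shlex.shlex("aΩb", posix=True))
print(default_tokens)
```

If splitting on whitespace is all that is needed, shlex.split() sidesteps the wordchars question entirely.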

--
nosy: +mark.dickinson, r.david.murray
stage:  -> unit test needed
versions: +Python 3.1, Python 3.2 -Python 2.5




[issue1170] shlex have problems with parsing unicode

2010-07-04 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


--
nosy: +haypo




[issue1170] shlex have problems with parsing unicode

2010-05-25 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

shlex in 3.x works with Unicode strings. Is it still time to try to fix this 
bug for 2.7?




[issue1170] shlex have problems with parsing unicode

2009-08-22 Thread Benjamin Peterson

Benjamin Peterson benja...@python.org added the comment:

The patch needs tests before it can be applied. Additionally, I'm not
sure if having a utf option is helpful. Is there a reason not to have
unicode support by default?

--
nosy: +benjamin.peterson




[issue1170] shlex have problems with parsing unicode

2009-08-21 Thread Chris Rebert

Changes by Chris Rebert pyb...@rebertia.com:


--
nosy: +cvrebert




[issue1170] shlex have problems with parsing unicode

2008-12-17 Thread Nicolau Leal Werneck

Nicolau Leal Werneck nwern...@gmail.com added the comment:

Hello. I tried to patch my own shlex, and this doesn't seem to be
working properly. When I try the patched module instead of the original,
in my otherwise working program, I get the result below.

Is there any conversion step missing?


mymachine$ python interp.py  exemplo.prg
Traceback (most recent call last):
  File "interp.py", line 11, in <module>
    tok = ss.get_token()
  File "shlexutf.py", line 103, in get_token
    raw = self.read_token()
  File "shlexutf.py", line 139, in read_token
    nextcategory = unicodedata.category(nextchar)
TypeError: category() argument 1 must be unicode, not str

--
nosy: +nwerneck




[issue1170] shlex have problems with parsing unicode

2008-12-17 Thread Nicolau Leal Werneck

Nicolau Leal Werneck nwern...@gmail.com added the comment:

OK, it worked after I found out I didn't know how to open unicode
files... Sorry for the noise, and thanks for this patch! :]
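
For anyone hitting the same TypeError, a rough sketch of the fix in Python 3 terms: decode the file explicitly when opening it, so the lexer receives text rather than bytes. The helper name tokens_from_file is hypothetical and not part of the patch. In Python 2 the analogous step would have been io.open or codecs.open with an encoding argument.

```python
import shlex


def tokens_from_file(path, encoding="utf-8"):
    """Tokenize a text file, decoding it explicitly (hypothetical helper)."""
    # Text mode with an explicit encoding yields str objects, which is
    # what character-category lookups such as unicodedata.category expect.
    with open(path, encoding=encoding) as stream:
        lexer = shlex.shlex(stream, posix=True)
        # Split on whitespace only, so non-ASCII words are kept whole.
        lexer.whitespace_split = True
        return list(lexer)
```

The encoding defaults to UTF-8 here as an assumption; pass whatever encoding the program file actually uses.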




[issue1170] shlex have problems with parsing unicode

2008-02-27 Thread Matej Cepl

Changes by Matej Cepl:


--
nosy: +mcepl




[issue1170] shlex have problems with parsing unicode

2007-09-18 Thread Sean Reifschneider

Changes by Sean Reifschneider:


--
priority:  -> normal




[issue1170] shlex have problems with parsing unicode

2007-09-17 Thread dexen deVries

Changes by dexen deVries:


--
components: Library (Lib), Unicode
severity: normal
status: open
title: shlex have problems with parsing unicode
type: behavior
versions: Python 2.5




[issue1170] shlex have problems with parsing unicode

2007-09-17 Thread dexen deVries

New submission from dexen deVries:

Feeding unicode to a shlex object created in POSIX compat mode causes 
UnicodeDecodeError to be raised. It appears that the shlex object defines a 
string, .wordchars, containing latin-1 (iso8859-1) encoded characters 
with charcodes >=128, which is used to check whether a character from the 
input constitutes a word character or not.
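
A small sketch for inspecting the attribute in question, written for Python 3 (where wordchars is a str rather than a latin-1 byte string, so the decode error itself does not occur). It only reads the public wordchars attribute. The variable names are illustrative.

```python
import shlex

posix_lexer = shlex.shlex("", posix=True)
plain_lexer = shlex.shlex("")

# Characters that POSIX mode adds on top of the default word characters.
extra = sorted(set(posix_lexer.wordchars) - set(plain_lexer.wordchars))

# All of the additions fall in the Latin-1 range (code points 128..255).
print(all(128 <= ord(ch) <= 255 for ch in extra))

# Letters outside Latin-1, such as Greek or CJK, are not listed at all.
print("Ω" in posix_lexer.wordchars, "日" in posix_lexer.wordchars)
```

This shows why a fixed list cannot scale: supporting another script would mean appending every one of its letters to wordchars.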

--
nosy: +dexen




[issue1170] shlex have problems with parsing unicode

2007-09-17 Thread dexen deVries

dexen deVries added the comment:

A quick paste to illustrate: the exception is raised only when a unicode 
object is passed to shlex. Warning: the cStringIO module is unsuitable 
for use here; only StringIO works. cStringIO does not output unicode.


dexen!muraena!~$ python
Python 2.5.1 (r251:54863, May  4 2007, 16:52:23)
[GCC 4.1.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from StringIO import StringIO
>>> import shlex
>>> lx = shlex.shlex( StringIO( unicode( "abc" ) ) )
>>> lx.get_token()
u'abc'
>>> lx = shlex.shlex( StringIO( unicode( "abc" ) ), None, True )
>>> lx.get_token()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/shlex.py", line 96, in get_token
    raw = self.read_token()
  File "/usr/lib/python2.5/shlex.py", line 150, in read_token
    elif nextchar in self.wordchars:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdf in position 63: ordinal not in range(128)





[issue1170] shlex have problems with parsing unicode

2007-09-17 Thread dexen deVries

dexen deVries added the comment:

One remark on the previous message:
the first time, I created the shlex object in non-POSIX mode (the default); 
the second time, it is in POSIX mode (due to the third parameter to shlex being 
True). The bug in question manifests only in POSIX mode.

BTW, that so-called POSIX mode would be more POSIX-ish if, instead of 
comparing characters with a fixed, short list, it used the ctype 
functions (isalpha(), isspace(), ispunct() and so on) found in the standard 
C library. These functions take the current locale (settable per process) 
into account when deciding what is a letter, whitespace, punctuation etc.
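
To make the idea concrete, here is a rough Python 3 sketch of classifying characters by property rather than by a fixed list. Note the substitution: it uses Python's str methods, which consult the Unicode character database and ignore the process locale, so it is an analogue of the ctype approach and not the same thing. The names char_class and classify_runs are made up for illustration.

```python
from itertools import groupby


def char_class(ch):
    """Return a coarse class for one character (hypothetical helper)."""
    if ch.isspace():
        return "space"
    if ch.isalnum() or ch == "_":
        return "word"
    return "punct"


def classify_runs(text):
    """Group consecutive characters of the same class into runs."""
    return [(cls, "".join(run)) for cls, run in groupby(text, key=char_class)]


runs = classify_runs("Ωmega, 日本語!")
print(runs)
# → [('word', 'Ωmega'), ('punct', ','), ('space', ' '), ('word', '日本語'), ('punct', '!')]
```

A lexer built this way would accept letters from any script as word characters without enumerating them.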
