[issue1170] shlex have problems with parsing unicode

2021-09-29 Thread Andrew Jewett
Andrew Jewett added the comment: Alright. I'll think about it a little more and post my suggestion there, perhaps. Thanks Victor.

[issue1170] shlex have problems with parsing unicode

2021-09-29 Thread Andrew Jewett
Andrew Jewett added the comment: After posting that, I noticed that the second example I listed in my previous post (a language where words contain any non-whitespace, non-parenthesis character) can now be implemented in the current version of shlex.py by setting "whitespace_true" and "punct
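A minimal sketch of the tokenizer described in the comment above, assuming it refers to the `whitespace_split` and `punctuation_chars` attributes of `shlex.shlex` as they behave in Python 3.8 and later. The helper name `tokenize_words_and_parens` and the sample input are hypothetical, chosen only for illustration.

```python
import shlex


def tokenize_words_and_parens(text):
    """Split text into parentheses and runs of other non-whitespace characters.

    Hypothetical helper, not part of shlex itself.
    """
    # punctuation_chars="()" makes parentheses tokens of their own.
    lexer = shlex.shlex(text, punctuation_chars="()")
    # whitespace_split=True lets every other non-whitespace character,
    # including non-ASCII ones, be part of a word.
    lexer.whitespace_split = True
    # Turn off comment handling so '#' is treated like any other character.
    lexer.commenters = ""
    return list(lexer)


tokens = tokenize_words_and_parens("(foo 你好) bar")
print(tokens)
```

Two caveats with this sketch: quote characters keep their special meaning, and consecutive parentheses such as `((` are returned as a single token.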

[issue1170] shlex have problems with parsing unicode

2021-09-29 Thread STINNER Victor
STINNER Victor added the comment: > I would like to suggest making this change (or something similar) to the > official version of "shlex.py". Would sending an email to > "python-id...@python.org" be a good place to make this proposal? Yes, python-ideas is a good place to start discussion s

[issue1170] shlex have problems with parsing unicode

2021-09-29 Thread wombat
wombat added the comment: The error messages may have gone away, but the underlying unicode limitations I mentioned remain: Suppose you wanted to use shlex to build a parser for Chinese text. Would you have to set "wordchars" to a string containing every possible Chinese character? I mysel
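The limitation described above can be illustrated with a short Python 3 sketch. The sample text is an assumption made for the example; the point is only that characters outside `wordchars` come back as one-character tokens unless they are added by hand.

```python
import shlex

text = "你好 world"

# Default (non-POSIX) lexer: characters that are not in wordchars are
# returned as single-character tokens.
default_tokens = list(shlex.shlex(text))

# Workaround: append the needed characters to wordchars manually, which
# does not scale to a whole script such as Chinese.
lexer = shlex.shlex(text)
lexer.wordchars += "你好"
extended_tokens = list(lexer)

print(default_tokens)
print(extended_tokens)
```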

[issue1170] shlex have problems with parsing unicode

2021-09-20 Thread STINNER Victor
STINNER Victor added the comment: This issue has been fixed in Python 3 by using Unicode rather than bytes in shlex. Python 2 users: it's time to upgrade to Python 3 ;-) -- resolution: -> fixed stage: needs patch -> resolved status: open -> closed

[issue1170] shlex have problems with parsing unicode

2021-09-19 Thread Matej Cepl
Matej Cepl added the comment: I cannot reproduce it with the current 3.* version. Did anybody reproduce it with 3.5? Otherwise, I suggest closing this as a 2.* bug.

[issue1170] shlex have problems with parsing unicode

2019-07-29 Thread STINNER Victor
STINNER Victor added the comment: This issue is 12 years old and has 3 patches: it's far from being "newcomer friendly", so I am removing the "Easy" label. -- keywords: -easy

[issue1170] shlex have problems with parsing unicode

2014-06-29 Thread Alexander Belopolsky
Changes by Alexander Belopolsky: -- assignee: belopolsky -> versions: +Python 3.5

[issue1170] shlex have problems with parsing unicode

2011-10-22 Thread Éric Araujo
Éric Araujo added the comment: The second message in this page reports that StringIO.StringIO works, but when I pass a unicode string with non-ASCII chars there’s a method call that fails because of implicit unicode-to-str conversion: Traceback (most recent call last): File "/usr/lib/python

[issue1170] shlex have problems with parsing unicode

2011-10-22 Thread Éric Araujo
Éric Araujo added the comment: $ ./python Python 2.7.2+ (2.7:27ae7d4e1983+, Oct 23 2011, 00:09:06) [GCC 4.6.1] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import shlex >>> shlex.split(u'Hello, World!') ['Hello,', 'World!'] This bug was fixed indirectly

[issue1170] shlex have problems with parsing unicode

2011-09-17 Thread Ezio Melotti
Ezio Melotti added the comment: I haven't looked at the shlex code (yet), my comment was just about the idea of adding constants with chars that belong to different Unicode categories.

[issue1170] shlex have problems with parsing unicode

2011-09-17 Thread R. David Murray
R. David Murray added the comment: Ezio, I don't see any indication in this ticket that this bug was actually *fixed* in 3.x. Unicode doesn't cause immediate errors in 3.x, but it isn't recognized as wordchars, etc. Am I missing something?

[issue1170] shlex have problems with parsing unicode

2011-09-17 Thread Éric Araujo
Éric Araujo added the comment: Andrew: Ezio means http://docs.python.org/2.7/library/unicodedata > For the purposes of patching shlex Sorry, but we are not talking about patching shlex. > I just posted here because this page currently gets the top hit > when searching for "shlex unicode". It’s

[issue1170] shlex have problems with parsing unicode

2011-09-15 Thread Andrew Jewett
Andrew Jewett added the comment: > That can be done programmatically using the unicodedata module. > The regex module (that will hopefully be included in 3.3) is > also able to match characters that belong to specific categories. Ezio: Thanks. (New to me, actually) Is this what you mean?:

[issue1170] shlex have problems with parsing unicode

2011-09-15 Thread Éric Araujo
Éric Araujo added the comment: Andrew: Thanks for your contribution, but your patch cannot go into 2.7, as we don’t add new features in stable versions (re-read the whole thread if you need more info). This report is still open because we need a doc patch to explain how to work around that.

[issue1170] shlex have problems with parsing unicode

2011-09-15 Thread Ezio Melotti
Ezio Melotti added the comment: That can be done programmatically using the unicodedata module. The regex module (that will hopefully be included in 3.3) is also able to match characters that belong to specific categories.
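A rough sketch of the unicodedata approach mentioned above, written for Python 3. The helper `chars_in_categories`, its `max_codepoint` cutoff, and the sample text are assumptions made for illustration; none of them come from the thread or from any patch attached to it.

```python
import shlex
import unicodedata


def chars_in_categories(prefixes, max_codepoint=0x3000):
    """Return every character below max_codepoint whose Unicode general
    category starts with one of the given prefixes (for example "L", "N").

    Hypothetical helper; the cutoff only keeps the resulting string small.
    """
    return "".join(
        chr(cp)
        for cp in range(max_codepoint)
        if unicodedata.category(chr(cp)).startswith(prefixes)
    )


# Letters and numbers, built programmatically instead of typed by hand.
letters_and_digits = chars_in_categories(("L", "N"))

# Use the generated set as the lexer's word characters.
lexer = shlex.shlex("naïve жук")
lexer.wordchars = letters_and_digits + "_"
tokens = list(lexer)
print(tokens)
```

With the cutoff used here, CJK ideographs are not included; raising `max_codepoint` would cover them at the cost of a much longer `wordchars` string.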

[issue1170] shlex have problems with parsing unicode

2011-09-15 Thread Andrew Jewett
Andrew Jewett added the comment: Not to get side-tracked, but on a related note, it would be nice if there was a python module which defined sets of unicode characters corresponding to different categories (similar to the categories listed here: http://www.fileformat.info/info/unicode/categor

[issue1170] shlex have problems with parsing unicode

2011-09-15 Thread Andrew Jewett
Andrew Jewett added the comment: Proposed solution and patch to follow. Please let me know if I am posting it in the wrong place. The main problem with shlex is that the shlex interface is inadequate to handle unicode. Specifically it is no longer feasible to provide a list of every possib

[issue1170] shlex have problems with parsing unicode

2011-09-02 Thread Éric Araujo
Changes by Éric Araujo: -- components: +Documentation -Library (Lib), Unicode keywords: +easy stage: test needed -> needs patch versions: +Python 2.7 -Python 3.1, Python 3.2

[issue1170] shlex have problems with parsing unicode

2011-07-18 Thread Éric Araujo
Éric Araujo added the comment: It’s not about allocating resources, it’s about following process. The first part is that we don’t add new features to stable releases, the second item is that this is not considered a bug fix: The code pre-dates Unicode, was not updated to support it, and the

[issue1170] shlex have problems with parsing unicode

2011-07-18 Thread Éric Araujo
Éric Araujo added the comment: See http://bugs.python.org/issue1170#msg106424 and following.

[issue1170] shlex have problems with parsing unicode

2011-07-18 Thread Doug Hellmann
Doug Hellmann added the comment: Is unicode supported by shlex in 3.x already? It's curious that unicode support is considered a new feature, rather than a bug. I understand wanting to allocate development resources carefully, though. If someone were to prepare a patch, would it even have a c

[issue1170] shlex have problems with parsing unicode

2011-07-18 Thread Éric Araujo
Éric Araujo added the comment: We all recognize that ASCII is very much limited and that the real way to work with strings is Unicode. However, here our hands are tied by our development process: shlex in 2.x does not support Unicode, adding that support would be a new feature, and 2.7 is cl

[issue1170] shlex have problems with parsing unicode

2011-07-17 Thread Doug Hellmann
Doug Hellmann added the comment: Right. Any program that needs to parse command lines containing filenames or other arguments with unicode characters will encounter this problem.
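For comparison, a small Python 3 sketch of the use case described above: splitting a command line whose arguments contain non-ASCII filenames. The command string is made up for the example. In Python 3, `shlex.split` accepts `str` input and splits on whitespace, so such arguments stay intact.

```python
import shlex

# Hypothetical command line with Spanish and Chinese filenames.
command = 'cp "año nuevo.txt" 文件.txt /tmp/copia'

# shlex.split uses POSIX mode with whitespace splitting, so quoted and
# unquoted non-ASCII arguments are each kept as one token.
args = shlex.split(command)
print(args)
```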

[issue1170] shlex have problems with parsing unicode

2011-07-17 Thread Chris Rebert
Changes by Chris Rebert: -- nosy: -cvrebert

[issue1170] shlex have problems with parsing unicode

2011-07-17 Thread Santiago Romero
Santiago Romero added the comment: > It would be good to hear a strong argument from > the user that how did he end up passing > unicode to shlex.split? It is for parsing command > line args for programs and personally have not > seen those cases. I'm from Spain: I personally write programs an

Re: [issue1170] shlex have problems with parsing unicode

2011-07-16 Thread Senthil Kumaran
TypeError should be okay. But I am still -0 on that. It would be good to hear a strong argument from the user that how did he end up passing unicode to shlex.split? It is for parsing command line args for programs and personally have not seen those cases. Or did he want unicode everywhere if we was

[issue1170] shlex have problems with parsing unicode

2011-07-13 Thread Éric Araujo
Éric Araujo added the comment: Would raising a TypeError if the given argument is a unicode be unacceptable for 2.7? It would at least make things clear.

[issue1170] shlex have problems with parsing unicode

2011-07-13 Thread R. David Murray
R. David Murray added the comment: This isn't going to get fixed in 2.x (shlex doesn't support unicode in 2.x, and doing so would be a new feature). In 3.x all strings are unicode, so the problem you are seeing doesn't exist. This issue is about the broader problem of what counts as a word

[issue1170] shlex have problems with parsing unicode

2011-07-13 Thread Ezio Melotti
Changes by Ezio Melotti: -- nosy: +ezio.melotti

[issue1170] shlex have problems with parsing unicode

2011-07-13 Thread Santiago Romero
Santiago Romero added the comment: I think I'm suffering the same problem in some small programs that use shlex: >>> import shlex >>> text = "python and shlex" >>> shlex.split(text) ['python', 'and', 'shlex'] >>> text = u"python and shlex" >>> shlex.split(text) ['p\x00\x00\x00y\x00\x00\x00t\

[issue1170] shlex have problems with parsing unicode

2011-01-14 Thread Doug Hellmann
Changes by Doug Hellmann: -- nosy: +doughellmann

[issue1170] shlex have problems with parsing unicode

2010-11-30 Thread Martin v . Löwis
Martin v. Löwis added the comment: The key requirement to consider for in POSIX compatible mode is, well, POSIX compatibility, which is defined in http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag

[issue1170] shlex have problems with parsing unicode

2010-11-30 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: Adding #10587 because we need to figure out the exact meaning of str.isspace() etc. first. It is possible that for proper operation shlex should consult unicodedata directly. -- dependencies: +Document the meaning of str methods

[issue1170] shlex have problems with parsing unicode

2010-08-04 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: I don't like my patch anymore because it breaks code that manipulates public wordchars attribute. Users may want to set it to their own alphabet or append additional characters to the default list. Maybe wordchars should always be "non-posix" wordchar

[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: I am adding MvL to nosy. Martin, I believe you are the ultimate authority on how to tokenize a unicode stream. -- nosy: +loewis

[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: David, What do you think about the attached patch? Would that be a change in the "more" direction? -- Added file: http://bugs.python.org/file18224/issue1170.diff

[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread R. David Murray
R. David Murray added the comment: Alexander: the "more or less" is on the "less" side when dealing with non-ASCII letters, I think. See my msg109292 and your own followups.

[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Tue, Jul 27, 2010 at 3:04 PM, Raymond Hettinger wrote: > > Raymond Hettinger added the comment: > > +1 on getting shlex to work better with Unicode. In 2.7.x? It more or less works in 3.x already.

[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread Raymond Hettinger
Raymond Hettinger added the comment: +1 on getting shlex to work better with Unicode. The core concepts of this module are general purpose and applicable to all kinds of text. -- nosy: +rhettinger

[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread Fernando Perez
Fernando Perez added the comment: On Tue, Jul 27, 2010 at 11:52, Alexander Belopolsky wrote: > Why do you expect shlex to work with unicode in 2.x? The > documentation clearly says that the argument should be a string. > Supporting unicode is not an unreasonable RFE, but won't be considered

[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: On Tue, Jul 27, 2010 at 2:26 PM, Fernando Perez wrote: .. > Yes, sorry that I failed to mention the example I gave applies only to 2.x, > not to 3.x. Why do you expect shlex to work with unicode in 2.x? The documentation clearly says that the argument

[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread Fernando Perez
Fernando Perez added the comment: Yes, sorry that I failed to mention the example I gave applies only to 2.x, not to 3.x.

[issue1170] shlex have problems with parsing unicode

2010-07-27 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: Fernando, Is this a 2.7-only problem? In 3.2 >>> list(shlex.shlex('ab')) ['ab'] and bytes are not supported. >>> list(shlex.shlex(b'ab')) Traceback (most recent call last): .. AttributeError: 'bytes' object has no attribute 'read' It is debatable wheth
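The behaviour quoted above can be reproduced in a short script. This is a sketch for Python 3; the final step, decoding the bytes before tokenizing, is a suggestion added here rather than something proposed in the thread, and the UTF-8 encoding is an assumption.

```python
import shlex

# A str argument is tokenized normally.
str_tokens = list(shlex.shlex("ab"))

# A bytes argument is not wrapped in a stream, so reading from it fails
# with AttributeError once tokenizing starts.
try:
    list(shlex.shlex(b"ab"))
    bytes_error = None
except AttributeError as exc:
    bytes_error = exc

# One way to handle bytes input: decode it first (UTF-8 assumed here).
decoded_tokens = list(shlex.shlex(b"ab".decode("utf-8")))

print(str_tokens, type(bytes_error).__name__, decoded_tokens)
```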

[issue1170] shlex have problems with parsing unicode

2010-07-25 Thread Fernando Perez
Fernando Perez added the comment: Here is an illustration of the problem with a simple test case (the value of the posix flag doesn't make any difference): Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) [GCC 4.4.3] on linux2 Type "help", "copyright", "credits" or "license" for more informat

[issue1170] shlex have problems with parsing unicode

2010-07-19 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: I believe the e-mail thread that culminated in r32284, "Implemented posix-mode parsing support in shlex.py", was "shellwords" from April 2003: http://mail.python.org/pipermail/python-dev/2003-April/034670.html I scanned through the messages, but could no

[issue1170] shlex have problems with parsing unicode

2010-07-19 Thread Alexander Belopolsky
Changes by Alexander Belopolsky:

[issue1170] shlex have problems with parsing unicode

2010-07-19 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: As discussed in msg110828 under issue9308, it is not clear whether the logic identifying word characters in shlex is correct in the presence of unicode. -- assignee: -> belopolsky keywords: +patch nosy: +belopolsky

[issue1170] shlex have problems with parsing unicode

2010-07-04 Thread STINNER Victor
Changes by STINNER Victor: -- nosy: +haypo

[issue1170] shlex have problems with parsing unicode

2010-07-04 Thread R. David Murray
R. David Murray added the comment: shlex may use unicode in py3k, but since the file still starts with a latin-1 coding cookie and the posix logic hasn't been changed, I suspect that it does not work correctly (ie: does not correctly identify word characters, per msg55969). It's too late for

[issue1170] shlex have problems with parsing unicode

2010-05-25 Thread Éric Araujo
Éric Araujo added the comment: shlex in 3.x works with Unicode strings. Is it still time to try to fix this bug for 2.7?

[issue1170] shlex have problems with parsing unicode

2009-08-21 Thread Benjamin Peterson
Benjamin Peterson added the comment: The patch needs tests before it can be applied. Additionally, I'm not sure if having a "utf" option is helpful. Is there a reason not to have unicode support by default? -- nosy: +benjamin.peterson

[issue1170] shlex have problems with parsing unicode

2009-08-21 Thread Chris Rebert
Changes by Chris Rebert: -- nosy: +cvrebert

[issue1170] shlex have problems with parsing unicode

2008-12-17 Thread Nicolau Leal Werneck
Nicolau Leal Werneck added the comment: OK, it worked after I found out I didn't know how to open unicode files... Sorry for the noise, and thanks for this patch! :]

[issue1170] shlex have problems with parsing unicode

2008-12-17 Thread Nicolau Leal Werneck
Nicolau Leal Werneck added the comment: Hello. I tried to patch my own shlex, and this doesn't seem to be working properly. When I try the patched module instead of the original, in my otherwise working program, I get the result below. Are there any conversion steps missing?... mymachine$ pyth

[issue1170] shlex have problems with parsing unicode

2008-02-27 Thread Matej Cepl
Changes by Matej Cepl: -- nosy: +mcepl

[issue1170] shlex have problems with parsing unicode

2007-12-22 Thread Colin Walters
Colin Walters added the comment: Patch to add Unicode support. Note: this patch recodes shlex.py from iso-8859-1 to utf-8, so it has mixed encodings. -- nosy: +cgwalters Added file: http://bugs.python.org/file9025/shlex-unicode.patch

[issue1170] shlex have problems with parsing unicode

2007-09-18 Thread Sean Reifschneider
Changes by Sean Reifschneider: -- priority: -> normal

[issue1170] shlex have problems with parsing unicode

2007-09-17 Thread dexen deVries
dexen deVries added the comment: One remark to previous message: the first time I created the shlex object in non-POSIX mode (the default); later it is in POSIX mode (due to the third parameter to shlex being True). The bug in question manifests only in POSIX mode. BTW, that so-called POSIX mode

[issue1170] shlex have problems with parsing unicode

2007-09-17 Thread dexen deVries
dexen deVries added the comment: A quick paste to illustrate: the exception is raised only when unicode object is passed to shlex. Warning: the cStringIO module is unsuitable for use there, only the StringIO. cStringIO does not output unicode. dexen!muraena!~$ python Python 2.5.1 (r251:54863,

[issue1170] shlex have problems with parsing unicode

2007-09-17 Thread dexen deVries
New submission from dexen deVries: Feeding unicode to shlex object created in POSIX compat mode causes UnicodeDecodeError to be raised. It appears that shlex object defines string .wordchars, containing latin-1 (iso8859-1) encoded characters with charcodes >=128, which is used to check whether
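The extra word characters mentioned in the report can still be inspected in Python 3, where `wordchars` is a `str` rather than a latin-1 byte string. The following is a small sketch; the variable names and the sample word are chosen here for illustration only.

```python
import shlex

posix_lexer = shlex.shlex("", posix=True)
plain_lexer = shlex.shlex("")

# Characters that POSIX mode adds on top of the default wordchars.
extra = sorted(set(posix_lexer.wordchars) - set(plain_lexer.wordchars))

# Check that every added character falls in the upper half of Latin-1.
all_latin1 = all(0x80 <= ord(ch) <= 0xFF for ch in extra)

# An accented Latin-1 letter is therefore part of a word in POSIX mode.
cafe_tokens = list(shlex.shlex("café", posix=True))

print("".join(extra))
print(all_latin1, cafe_tokens)
```

Characters outside that range, for example Polish `ł`, are not in this list, which is why the thread keeps returning to the question of what should count as a word character.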

[issue1170] shlex have problems with parsing unicode

2007-09-17 Thread dexen deVries
Changes by dexen deVries: -- components: Library (Lib), Unicode severity: normal status: open title: shlex have problems with parsing unicode type: behavior versions: Python 2.5