[issue7089] shlex behaves unexpected if newlines are not whitespace

2009-12-31 Thread Jan David Mol

Jan David Mol jjd...@gmail.com added the comment:

As there seems to be some interest, I've continued working on patching
this issue.

Attached is an improved version of the patch, including additions to
test_shlex.py. It is improved in that newlines after a comment are no
longer considered part of the comment (per POSIX), which makes a
difference when newlines are tokens.

To accomplish this, I had to add an ungetc buffer to shlex, in order to
push back any newlines read by the readline() routine used when a
comment is encountered.
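The pushback idea can be sketched in a few lines of Python. This is a hypothetical, simplified lexer, not the patch itself and not shlex's API. The names PushbackReader and tokenize are made up for illustration, and words are just alphanumeric runs. When a comment character is seen, readline() consumes the rest of the line and any trailing newline is pushed back, so that newline still comes out as a token when newlines are not whitespace.

```python
import io


class PushbackReader:
    """Character reader with an ungetc-style pushback buffer (hypothetical)."""

    def __init__(self, stream):
        self.stream = stream
        self.pushback = []  # pushed-back characters, most recent last

    def read(self):
        # Serve pushed-back characters first (LIFO, like C's ungetc).
        if self.pushback:
            return self.pushback.pop()
        return self.stream.read(1)  # "" at end of input

    def unread(self, ch):
        self.pushback.append(ch)

    def skip_comment(self):
        # Consume the rest of the line, then push the newline back so the
        # caller still sees it. Assumes the pushback buffer is empty here.
        line = self.stream.readline()
        if line.endswith("\n"):
            self.unread("\n")


def tokenize(text, whitespace=" ", commenters="#"):
    """Toy lexer: alphanumeric words; any other non-whitespace char is a token."""
    reader = PushbackReader(io.StringIO(text))
    tokens, word = [], ""
    while True:
        ch = reader.read()
        if ch and ch.isalnum():
            word += ch
            continue
        if word:  # a non-word character ends the current word
            tokens.append(word)
            word = ""
        if not ch:  # end of input
            return tokens
        if ch in commenters:
            reader.skip_comment()
        elif ch not in whitespace:
            tokens.append(ch)  # e.g. "\n" when newlines are not whitespace


print(tokenize("a # comment \n b"))  # ['a', '\n', 'b']
```

A comment at the very end of the input with no trailing newline simply yields nothing to push back, so tokenize("a # comment") gives ['a'].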

@Gabriel: the test case of no newline at the end of the file after a
comment is addressed.

Relevant POSIX sections are:
Shell & Utilities 2.3(10)
Rationale C.2.3

--
Added file: 
http://bugs.python.org/file15708/lexer-newline-tokens-patch-2.0.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7089
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7611] shlex not posix compliant when parsing foo#bar

2009-12-31 Thread Jan David Mol

New submission from Jan David Mol jjd...@gmail.com:

The shlex parser parses "foo#bar" as "foo", discarding the rest as a
comment. This is actually one of the test cases, even in POSIX mode.

However, POSIX (see below) only allows comments to start at the
beginning of a token, so "foo#bar" must result in a single "foo#bar"
token. To see this, compare "echo foo#bar" with "echo foo #bar" in bash.

Fixing this might break some applications that rely on this broken
behaviour, even though they're not strictly POSIX compliant.

POSIX 2008, Rationale C.2.3 (which refers to Shell & Utilities 2.3(10)):

The (10) rule about '#' as the current character is the first in the
sequence in which a new token is being assembled. The '#' starts a
comment only when it is at the beginning of a token. This rule is also
written to indicate that the search for the end-of-comment does not
consider escaped <newline> specially, so that a comment cannot be
continued to the next line.
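Rule (10) can be illustrated with a minimal Python sketch. posix_words is a hypothetical helper, not part of shlex, and it assumes only blanks, tabs and newlines delimit tokens while deliberately ignoring quoting, escapes and operators. A '#' starts a comment only when no token is currently being assembled; otherwise it is just another character of the word.

```python
def posix_words(text, commenters="#"):
    """Split text into tokens; '#' starts a comment only at a token start."""
    tokens = []
    current = ""
    in_comment = False
    for ch in text:
        if in_comment:
            # The comment runs up to the end of the line.
            if ch == "\n":
                in_comment = False
            continue
        if ch in " \t\n":
            # A blank or newline delimits the token being assembled.
            if current:
                tokens.append(current)
                current = ""
        elif ch in commenters and not current:
            # Beginning of a token: this '#' starts a comment.
            in_comment = True
        else:
            # Inside a token, '#' is an ordinary character.
            current += ch
    if current:
        tokens.append(current)
    return tokens


print(posix_words("foo#bar"))   # ['foo#bar']
print(posix_words("foo #bar"))  # ['foo']
```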

--
components: Library (Lib)
messages: 97081
nosy: jjdmol2
severity: normal
status: open
title: shlex not posix compliant when parsing foo#bar
type: behavior
versions: Python 2.5, Python 2.6, Python 2.7, Python 3.1, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7611
___



[issue7611] shlex not posix compliant when parsing foo#bar

2009-12-31 Thread Jan David Mol

Jan David Mol jjd...@gmail.com added the comment:

Attached is a program which shows the relevant behaviour:

import shlex

tests = ["foo#bar", "foo #bar"]

for t in tests:
    print("%s - %s" % (t, list(shlex.shlex(t, posix=True))))

results in

$ python lexer_test.py
foo#bar - ['foo']
foo #bar - ['foo']

(expected of course is ['foo#bar'] on the first line).

--
versions: +Python 2.5
Added file: http://bugs.python.org/file15709/lexer_test.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7611
___



[issue7089] shlex behaves unexpected if newlines are not whitespace

2009-10-09 Thread Jan David Mol

New submission from Jan David Mol jjd...@gmail.com:

The shlex module does not function as expected in the presence of
comments when newlines are not whitespace. An example (attached):

>>> from shlex import shlex
>>>
>>> lexer = shlex("a \n b")
>>> print(",".join(lexer))
a,b
>>>
>>> lexer = shlex("a # comment \n b")
>>> print(",".join(lexer))
a,b
>>>
>>> lexer = shlex("a \n b")
>>> lexer.whitespace = " "
>>> print(",".join(lexer))
a,
,b
>>>
>>> lexer = shlex("a # comment \n b")
>>> lexer.whitespace = " "
>>> print(",".join(lexer))
a,b

Now where did my newline go? The comment ate it, even though the docs
seem to indicate that the newline is not part of the comment itself:

shlex.commenters:
The string of characters that are recognized as comment beginners.
All characters from the comment beginner to end of line are ignored.
Includes just '#' by default.

--
files: lexertest.py
messages: 93776
nosy: jjdmol2
severity: normal
status: open
title: shlex behaves unexpected if newlines are not whitespace
type: behavior
versions: Python 2.4, Python 2.5, Python 2.6, Python 2.7, Python 3.0, Python 3.1, Python 3.2
Added file: http://bugs.python.org/file15087/lexertest.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7089
___



[issue7089] shlex behaves unexpected if newlines are not whitespace

2009-10-09 Thread Jan David Mol

Jan David Mol jjd...@gmail.com added the comment:

Attached is a patch which fixes this for me. When it encounters a
comment, it basically falls through as if '\n' had been read. That may be
a bit of a hack (who says '\n' is the only newline character in there,
and not '\r'?), but I'll leave the more intricate details to you experts.
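One way to avoid hard-coding '\n' is to end the comment at any line-terminator character without consuming it, so the lexer reads that character next. Below is a hypothetical, string-index based sketch; shlex itself reads from a stream, and skip_comment and its newlines parameter are assumptions made for illustration only.

```python
def skip_comment(text, pos, newlines="\r\n"):
    """Return the index where a comment starting at text[pos] ends.

    The terminating line-end character (any char in `newlines`) is NOT
    consumed, so a lexer that treats newlines as tokens still sees it.
    Returns len(text) if the comment runs to the end of the input.
    """
    end = pos
    while end < len(text) and text[end] not in newlines:
        end += 1
    return end


s = "a # c\r\nb"
end = skip_comment(s, s.index("#"))
print(repr(s[end]))  # '\r'
```

With "\r\n" line endings the caller still sees '\r' first and has to decide whether to fold the pair into a single newline token.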

--
keywords: +patch
Added file: http://bugs.python.org/file15088/lexer-newline-tokens.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7089
___



[issue7089] shlex behaves unexpected if newlines are not whitespace

2009-10-09 Thread Jan David Mol

Changes by Jan David Mol jjd...@gmail.com:


--
components: +Library (Lib)

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7089
___