[issue7615] unicode_escape codec does not escape quotes

2021-08-09 Thread Irit Katriel


Change by Irit Katriel :


--
resolution: out of date -> 
versions: +Python 3.10, Python 3.11, Python 3.9 -Python 2.6, Python 2.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2021-08-09 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Nothing was changed. Backslashes in your output are backslashes in the bytes 
object repr.

>>> print('a\'b"c\'\'\'d"""e'.encode('unicode_escape').decode())
a'b"c'''d"""e

--
nosy: +serhiy.storchaka
status: pending -> open

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2021-08-09 Thread Irit Katriel


Irit Katriel  added the comment:

Looks like this has been fixed by now:

>>> print(u'a\'b"c\'\'\'d"""e'.encode('unicode_escape'))
b'a\'b"c\'\'\'d"""e'

Let me know if there is a reason not to close this issue.

--
nosy: +iritkatriel
resolution:  -> out of date
status: open -> pending

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2014-02-03 Thread Mark Lawrence

Changes by Mark Lawrence breamore...@yahoo.co.uk:


--
nosy:  -BreamoreBoy

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-07-28 Thread Mark Lawrence

Mark Lawrence breamore...@yahoo.co.uk added the comment:

Could we please have some responses to msg98327 as there are some very positive 
comments there.

--
nosy: +BreamoreBoy

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-07-28 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

Mark Lawrence wrote:
 
 Mark Lawrence breamore...@yahoo.co.uk added the comment:
 
 Could we please have some responses to msg98327 as there are some very 
 positive comments there.

A patch implementing the suggestions would be even better :-)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-26 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

Amaury Forgeot d'Arc wrote:
 I feel uneasy to change the default unicode-escape encoding.
 I think that we mix two features here; to transfer a unicode string between 
 two points, programs must agree on where the data ends, and how characters 
 are represented as bytes.
 All codecs including unicode-escape only dealt with byte conversion; 
 (unicode-escape converts everything to printable 7bit ascii);
 these patches want to add a feature related to the where does the string 
 end issue, and is only aimed at python code containers. Other transports 
 and protocols may choose different delimiters.
 
 My point is that unicode-escape used to not change printable 7-bit ascii 
 characters, and the patches will change this.
 
 And actually this will break existing code. It did not take me long to find 
 two examples of programs which embed unicode_escape-encoded text between 
 quotes, and take care themselves of escaping quotes. First example generates 
 javascript code, the second generates SQL statements:
 http://github.com/chriseppstein/pywebmvc/blob/master/src/code/pywebmvc/tools/searchtool.py#L450
 http://gitweb.sabayon.org/?p=entropy.git;a=blob;f=libraries/entropy/db/__init__.py;h=2d818455efa347f35b2e96d787fefd338055d066;hb=HEAD#l6463

Ouch... these codecs should not have been used outside
Python. I wonder why these applications don't use repr(text)
to format the JavaScript/SQL strings.

I guess this is the result of documenting them in
http://docs.python.org/library/codecs.html#standard-encodings

Too bad that the docs actually say Produce a string that is
suitable as Unicode literal in Python source code. The codecs
main intent was to *decode* Unicode literals in Python source
code to Unicode objects...

The naming in the examples you mention also suggest that the
programmers used the table from the docs - they use
unicode_escape as codec name, not the standard
unicode-escape name which we use throughout the Python
code.

The fact that the demonstrated actual use already does apply the
extra quote escaping suggests that we cannot easily add this
to the existing codecs. It would break those applications, since
they'd be applying double-escaping.

 This does not prevent the creation of a new codec, let's call it 
 'python-unicode-escape' [ or 'repr' :-) ]

I think that's a good idea to move forward.

Python 3.x comes with a new Unicode repr() format which we could
turn into a new codec: it automatically adds the quotes, processes
the in-string quotes and backslashes and also escapes \t, \n and \r
as well as all non-printable code points.

As for naming the new codec, I'd suggest unicode-repr since
that's what it implements.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-25 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

Richard Hansen wrote:
 
 Richard Hansen rhan...@bbn.com added the comment:
 
 Any comments on the patches?  I'd love to see at least patches 1-3 make it 
 into Python 2.7.  :)

Sorry, I haven't had a chance to review them yet. Will try today.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-25 Thread Florent Xicluna

Changes by Florent Xicluna la...@yahoo.fr:


--
nosy: +flox

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-25 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc amaur...@gmail.com added the comment:

I feel uneasy to change the default unicode-escape encoding.
I think that we mix two features here; to transfer a unicode string between two 
points, programs must agree on where the data ends, and how characters are 
represented as bytes.
All codecs including unicode-escape only dealt with byte conversion; 
(unicode-escape converts everything to printable 7bit ascii);
these patches want to add a feature related to the where does the string end 
issue, and is only aimed at python code containers. Other transports and 
protocols may choose different delimiters.

My point is that unicode-escape used to not change printable 7-bit ascii 
characters, and the patches will change this.

And actually this will break existing code. It did not take me long to find two 
examples of programs which embed unicode_escape-encoded text between quotes, 
and take care themselves of escaping quotes. First example generates javascript 
code, the second generates SQL statements:
http://github.com/chriseppstein/pywebmvc/blob/master/src/code/pywebmvc/tools/searchtool.py#L450
http://gitweb.sabayon.org/?p=entropy.git;a=blob;f=libraries/entropy/db/__init__.py;h=2d818455efa347f35b2e96d787fefd338055d066;hb=HEAD#l6463

This does not prevent the creation of a new codec, let's call it 
'python-unicode-escape' [ or 'repr' :-) ]

--
nosy: +amaury.forgeotdarc

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-24 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Any comments on the patches?  I'd love to see at least patches 1-3 make it into 
Python 2.7.  :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: 
http://bugs.python.org/file15729/unicode_escape_single_and_double_quotes.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: http://bugs.python.org/file15771/unicode_escape_reorg.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: http://bugs.python.org/file15774/unicode_escape_tests.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: http://bugs.python.org/file15775/raw_unicode_escape_tests.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching updated unit tests.  The tests now check to make sure that single and 
double quotes are escaped.

--
Added file: http://bugs.python.org/file15809/unicode_escape_tests.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching a minimal patch:
  * unicode_escape now backslash-escapes single and double quotes
  * raw_unicode_escape now unicode-escapes single and double quotes
  * raw_unicode_escape now unicode-escapes backslashes
  * removes pickle's escaping workarounds so that it continues to work
  * updates documentation

--
Added file: http://bugs.python.org/file15810/unicode_escape_1_minimal.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching a patch for an issue discovered while looking at the code:
  * The UTF-16 decode logic in the Unicode escape encoders no longer reads past 
the end of the provided Py_UNICODE buffer if the last character's value is 
between 0xD800 and 0xDC00.

This patch is meant to be applied after unicode_escape_1_minimal.patch.

--
Added file: 
http://bugs.python.org/file15811/unicode_escape_2_utf-16_invalid_read.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching a patch for another issue discovered while looking at the code:
  * The Unicode escape encoders now check to make sure that the provided size 
is nonnegative.
  * C API documentation updated to make it clear that size must be nonnegative.

This patch is meant to be applied after 
unicode_escape_2_utf-16_invalid_read.patch.

--
Added file: 
http://bugs.python.org/file15812/unicode_escape_3_check_size_nonnegative.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-09 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching a patch that eliminates duplicate code:
  * Merge unicodeescape_string(), PyUnicode_EncodeRawUnicodeEscape(), and 
modified_EncodeRawUnicodeEscape() into one function called 
_PyUnicode_EncodeCustomUnicodeEscape().

This patch is meant to be applied after 
unicode_escape_3_check_size_nonnegative.patch.

--
Added file: 
http://bugs.python.org/file15813/unicode_escape_4_eliminate_duplicate_code.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: http://bugs.python.org/file15748/unicode_escape_reorg.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching updated unicode_escape_reorg.patch.  This addresses two additional 
issues:
  * removes pickle's workaround of raw-unicode-escape's broken escaping
  * eliminates duplicated code (the raw escape encode function was copied with 
only a slight modification in cPickle.c)

With this, all regression tests pass.

--
Added file: http://bugs.python.org/file15771/unicode_escape_reorg.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

I believe this issue is ready to be bumped to the patch review stage.  
Thoughts?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Does the last patch obsolete the first two?  If so please delete the obsolete 
ones.

I imagine there are might be small doc updates required, as well.

I haven't looked at the patch itself, but concerning your test patch:  your 
try/except style is unnecessary, I think.  Better to just let the syntax error 
bubble up on its own.  After all, you don't *know* that the SyntaxError is 
because the quotes aren't escaped.  I've run into other unit tests in the test 
suite where the author made such an assumption, which turned out to be false 
and confused me for a bit while debugging the failure.  I had to remove the 
code producing the mistaken reason message in order to see the real problem.  
So it's better to just let the real error show up in the first place, in my 
experience.

--
nosy: +r.david.murray
stage: test needed - patch review

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

 Does the last patch obsolete the first two?  If so please delete the 
 obsolete ones.

Yes and no -- it depends on what the core Python developers want and are 
comfortable with:
  * unicode_escape_single_quotes.patch:  Only escapes single quotes, simple 
patch.
  * unicode_escape_single_and_double_quotes:  Superset of the above (also 
escapes double quotes), but probably unnecessary.  Still a relatively simple 
patch.
  * unicode_escape_reorg.patch:  Superset of unicode_escape_single_quotes.patch 
that also fixes raw_unicode_escape and other small issues.  It's a bigger patch 
with a greater potential for backwards-compatibility issues.  (Pickle is an 
example:  It implemented its own workaround to address raw_unicode_escape's 
broken escaping, so fixing raw_unicode_escape ended up breaking pickle.  The 
reorg patch removes pickle's workaround, but there will probably be similar 
workarounds in other existing code.)

My preference is to have unicode_escape_reorg.patch committed, but I'm not sure 
how conservative the core developers are.  The release of Python 2.7 is 
approaching, and they may not want to take on the risk right now.  If that's 
the case, I'd be happy with applying unicode_escape_single_quotes.patch for now 
and moving unicode_escape_reorg.patch to a new issue report.

 I imagine there are might be small doc updates required, as well.

Certainly Misc/NEWS will need to be patched.  I'm unfamiliar with what else the 
devs might want for documentation, so I'd love to get some additional guidance. 
 I would also appreciate additional feedback on the technical merits of the 
reorg patch before investing too much time on updating documentation.

 I haven't looked at the patch itself, but concerning your test 
 patch:  your try/except style is unnecessary, I think.  Better to 
 just let the syntax error bubble up on its own.

OK, I'll make that change.  I added the try/except as an attempt to convert a 
known ERROR to a FAIL in case that was important for some reason.

Thanks for the feedback!

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: http://bugs.python.org/file15746/unicode_escape_tests.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching updated unit tests for the unicode_escape codec.  I removed the 
raw_unicode_escape tests and will attach a separate patch for those.

--
Added file: http://bugs.python.org/file15774/unicode_escape_tests.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching updated unit tests for the raw_unicode_escape codec.

--
Added file: http://bugs.python.org/file15775/raw_unicode_escape_tests.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

Richard Hansen wrote:
 
 Richard Hansen rhan...@bbn.com added the comment:
 
 Does the last patch obsolete the first two?  If so please delete the 
 obsolete ones.
 
 Yes and no -- it depends on what the core Python developers want and are 
 comfortable with:
   * unicode_escape_single_quotes.patch:  Only escapes single quotes, simple 
 patch.
   * unicode_escape_single_and_double_quotes:  Superset of the above (also 
 escapes double quotes), but probably unnecessary.  Still a relatively simple 
 patch.
   * unicode_escape_reorg.patch:  Superset of 
 unicode_escape_single_quotes.patch that also fixes raw_unicode_escape and 
 other small issues.  It's a bigger patch with a greater potential for 
 backwards-compatibility issues.  (Pickle is an example:  It implemented its 
 own workaround to address raw_unicode_escape's broken escaping, so fixing 
 raw_unicode_escape ended up breaking pickle.  The reorg patch removes 
 pickle's workaround, but there will probably be similar workarounds in other 
 existing code.)
 
 My preference is to have unicode_escape_reorg.patch committed, but I'm not 
 sure how conservative the core developers are.  The release of Python 2.7 is 
 approaching, and they may not want to take on the risk right now.  If that's 
 the case, I'd be happy with applying unicode_escape_single_quotes.patch for 
 now and moving unicode_escape_reorg.patch to a new issue report.

We'll need a patch that implements single and double quote escaping for
unicode_escape and a \u style escaping of quotes for the raw_unicode_escape
encoder.

Other changes are not necessary. The pickle copy of the codec can be
left untouched (both cPickle.c and pickle.py) - it doesn't matter
whether quotes are escaped or not in the pickle data stream.

It's only important that the decoders can reliably decode the
additional escapes (which AFAIK, they do automatically anyway).

 I imagine there are might be small doc updates required, as well.
 
 Certainly Misc/NEWS will need to be patched.  I'm unfamiliar with what else 
 the devs might want for documentation, so I'd love to get some additional 
 guidance.  I would also appreciate additional feedback on the technical 
 merits of the reorg patch before investing too much time on updating 
 documentation.

Misc/NEWS needs an entry which makes the changes clear.

The codecs' encode direction is not defined anywhere in the
documentation, AFAIK, and basically an implementation detail.
The codecs were originally only meant for Python's internal
use to decode Python literal byte strings into Unicode.
In PEP 100, I only defined the decoding direction. The
main idea was to have a set of codecs which read and produce
Latin-1 compatible text - Python 2.x used to accept Latin-1
Unicode literals without source code encoding marker.

 I haven't looked at the patch itself, but concerning your test 
 patch:  your try/except style is unnecessary, I think.  Better to 
 just let the syntax error bubble up on its own.
 
 OK, I'll make that change.  I added the try/except as an attempt to convert a 
 known ERROR to a FAIL in case that was important for some reason.

I'll have to have a look at the patch itself as well. No time for
that now, perhaps tomorrow.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com



::: Try our new mxODBC.Connect Python Database Interface for free ! 

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

 We'll need a patch that implements single and double quote escaping 
 for unicode_escape and a \u style escaping of quotes for the 
 raw_unicode_escape encoder.

OK, I'll remove unicode_escape_single_quotes.patch and update 
unicode_escape_reorg.patch.

 Other changes are not necessary.

Would you please clarify?  There are a few other (minor) bugs that were 
discovered while writing unicode_escape_reorg.patch that I think should be 
fixed:
  * the UTF-16 surrogate pair decoding logic could read past the end of the 
provided Py_UNICODE character array if the last character is between 0xD800 and 
0xDC00
  * _PyString_Resize() will be called on an empty string if the size argument 
of unicodeescape_string() is 0.  This will raise a SystemError because 
_PyString_Resize() can only be called if the object's ref count is 1 (even if 
no resizing is to take place) yet PyString_FromStringAndSize() returns a shared 
empty string instance if size is 0.
  * it is unclear what unicodeescape_string() should do if size  0

Beyond those issues, I'm worried about manageability stemming from the amount 
of code duplication.  If a bug is found in one of those encoding functions, the 
other two will likely need updating.

 The pickle copy of the codec can be left untouched (both cPickle.c 
 and pickle.py) - it doesn't matter whether quotes are escaped or not 
 in the pickle data stream.

Unfortunately, pickle.py must be modified because it does its own backslash 
escaping before encoding with the raw_unicode_escape codec.  This means that 
backslashes would become double escaped and the decoded value would differ 
(confirmed by running the pickle unit tests).

The (minor) bugs in PyUnicode_EncodeRawUnicodeEscape() are also present in 
cPickle.c, so they should probably be fixed as well.

 The codecs' encode direction is not defined anywhere in the 
 documentation, AFAIK, and basically an implementation detail.

I read the escape codec documentation (see the original post) as implying that 
the encoders can generate eval-able string literals.  I'll add some clarifying 
statements.

Thanks for the feedback!

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: 
http://bugs.python.org/file15716/unicode_escape_single_quotes.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-05 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attached is a patch to the unicode unit tests.  It adds tests for the following:
  * unicode_escape escapes single quotes
  * raw_unicode_escape escapes single quotes
  * raw_unicode_escape escapes backslashes

--
Added file: http://bugs.python.org/file15746/unicode_escape_tests.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-05 Thread Richard Hansen

Changes by Richard Hansen rhan...@bbn.com:


Removed file: http://bugs.python.org/file15742/unicode_escape_reorg.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-05 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

Attaching updated unicode_escape_reorg.patch.  This fixes two additional issues:
  * don't call _PyString_Resize() on an empty string because there is only one 
empty string instance, and that instance is returned when creating an empty 
string
  * make sure the size parameter is nonnegative

--
Added file: http://bugs.python.org/file15748/unicode_escape_reorg.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-04 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

I thought about raw_unicode_escape more, and there's a way to escape quotes:  
use unicode escape sequences (e.g., ur'\u0027').  I've attached a new patch 
that does the following:

 * backslash-escapes single quotes when encoding with the unicode_escape codec 
(the original subject of this bug)
 * replaces single quotes with \u0027 when encoding with the raw_unicode_escape 
codec (a separate bug not related to the original report, but brought up in 
comments)
 * replaces backslashes with \u005c when encoding with the raw_unicode_escape 
codec (a separate bug not related to the original report)
 * fixes a corner-case bug where the UTF-16 surrogate pair decoding logic could 
read past the end of the provided Py_UNICODE character array (a separate bug 
not related to the original report)
 * eliminates redundant code in PyUnicode_EncodeRawUnicodeEscape() and 
unicodeescape_string()
 * general cleanup in unicodeescape_string()

The changes in the patch are non-trivial and have only been lightly tested.

--
Added file: http://bugs.python.org/file15742/unicode_escape_reorg.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-03 Thread Richard Hansen

Richard Hansen rhan...@bbn.com added the comment:

 If we change this, the encoder should quote both single and double 
 quotes - simply because it is not known whether the literal 
 will use single or double quotes.

Or document that single quotes are always escaped so that the user knows he/she 
can safely use u''.  I'm not sure if there is a use case where both would 
*need* to be escaped, and escaping both has a size penalty.

I've attached an untested patch that escapes both.

If both are escaped, then the behavior of the string_escape codec should also 
be changed for consistency (it only escapes single quotes).

 The raw_unicode_escape codec would have to be fixed as well.

I'm not sure there's anything to fix.  Adding backslashes to quotes in raw 
strings changes the value of the string -- the backslashes prevent the quotes 
from ending the string literal, but they are not removed when the raw literal 
is evaluated.

Perhaps raw_unicode_escape should be fixed by raising an exception when it 
contains any quotes.

--
Added file: 
http://bugs.python.org/file15729/unicode_escape_single_and_double_quotes.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-02 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

Richard Hansen wrote:
 
 New submission from Richard Hansen rhan...@bbn.com:
 
 The description of the unicode_escape codec says that it produces a
 string that is suitable as Unicode literal in Python source code. [1] 
 Unfortunately, this is not true as it does not escape quotes.  For example:
 
   print u'a\'bc\'\'\'de'.encode('unicode_escape')
 
 outputs:
 
   a'bc'''de

Indeed. Python only uses the decoder of that codec internally.

 I have attached a patch that fixes this issue by escaping single quotes.
  With the patch applied, the output is:
 
   a\'bc\'\'\'de
 
 I chose to only escape single quotes because:
   1.  it simplifies the patch, and
   2.  it matches string_escape's behavior.

If we change this, the encoder should quote both single and double
quotes - simply because it is not known whether the literal
will use single or double quotes.

The raw_unicode_escape codec would have to be fixed as well.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2010-01-01 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +lemburg

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2009-12-31 Thread Richard Hansen

New submission from Richard Hansen rhan...@bbn.com:

The description of the unicode_escape codec says that it produces a
string that is suitable as Unicode literal in Python source code. [1] 
Unfortunately, this is not true as it does not escape quotes.  For example:

  print u'a\'bc\'\'\'de'.encode('unicode_escape')

outputs:

  a'bc'''de

I have attached a patch that fixes this issue by escaping single quotes.
 With the patch applied, the output is:

  a\'bc\'\'\'de

I chose to only escape single quotes because:
  1.  it simplifies the patch, and
  2.  it matches string_escape's behavior.


[1] http://docs.python.org/library/codecs.html#standard-encodings

--
components: Unicode
files: unicode_escape_single_quotes.patch
keywords: patch
messages: 97112
nosy: rhansen
severity: normal
status: open
title: unicode_escape codec does not escape quotes
type: behavior
versions: Python 2.6, Python 2.7
Added file: http://bugs.python.org/file15716/unicode_escape_single_quotes.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7615] unicode_escape codec does not escape quotes

2009-12-31 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +ezio.melotti
priority:  - normal
stage:  - test needed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7615
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com