[issue22746] cgitb html: wrong encoding for utf-8

2014-12-02 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

We can convert cgitb.hook to produce ASCII-compatible output with charrefs in 
3.x. But there is a problem with str in 2.7: an 8-bit string can contain 
non-ASCII data, and its encoding is not known in the general case.
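
A minimal sketch of the 2.7 difficulty, written in Python 3 terms with bytes standing in for a Python 2 8-bit str. The sample bytes and the two candidate encodings are assumptions chosen only for illustration:

```python
# Bytes stand in for a Python 2 8-bit str whose encoding was not recorded.
raw = b"\xc3\xa4"

# The same bytes yield different text depending on the guessed encoding.
as_utf8 = raw.decode("utf-8")      # one character: U+00E4
as_latin1 = raw.decode("latin-1")  # two characters: U+00C3 U+00A4

# The resulting charrefs differ as well, so a wrong guess produces
# wrong output even though the result is pure ASCII.
ref_utf8 = as_utf8.encode("ascii", "xmlcharrefreplace")
ref_latin1 = as_latin1.encode("ascii", "xmlcharrefreplace")
print(ref_utf8, ref_latin1)
```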

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22746
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22746] cgitb html: wrong encoding for utf-8

2014-10-31 Thread Ezio Melotti

Ezio Melotti added the comment:

> In normal HTML utf-8 works fine, doesn't it?

It does. In fact, as long as the encoding used by the browser matches the one 
used in the file, no charrefs need to be used (except &gt;, &lt;, and &quot;).  
Of course, if non-Unicode encodings are used, the range of characters that can 
go directly in the HTML will be more limited, but this can be solved by using 
charrefs -- the browser will display the corresponding character no matter 
what the encoding is.  This also means that if charrefs are used for all 
non-ASCII characters, then the browser will be able to display the page no 
matter what encoding is being used (as long as it's ASCII-compatible, and most 
encodings are).  The downside is that it will make the source less readable and 
possibly longer, especially if there are a lot of non-ASCII characters, but if 
most of the characters are expected to be ASCII, using charrefs might be ok.
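
A small sketch of that point, assuming a made-up sample string: once every non-ASCII character is a charref, the bytes decode to the same text under any ASCII-compatible encoding, and resolving the references recovers the original.

```python
import html

# Hypothetical sample text with a latin-1 character, a CJK character,
# and markup characters that need escaping.
text = "caf\u00e9 \u4e0e <tag>"

# Escape markup characters first, then replace every non-ASCII character
# with a numeric character reference.
ascii_bytes = html.escape(text).encode("ascii", "xmlcharrefreplace")

# Pure-ASCII bytes decode identically under ASCII-compatible encodings.
decoded = {enc: ascii_bytes.decode(enc) for enc in ("utf-8", "latin-1", "cp1252")}

# Resolving the references (roughly what a browser does) restores the text.
roundtrip = html.unescape(decoded["latin-1"])
print(ascii_bytes)
```

The cost is visible in the output: each non-ASCII character grows to several bytes of reference text.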

--




[issue22746] cgitb html: wrong encoding for utf-8

2014-10-28 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

What about
  open(..., encoding='latin-1', errors='xmlcharrefreplace')
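
Spelled out as a runnable Python 3 sketch (the file name and sample text are assumptions for illustration), characters that fit latin-1 are written directly and everything else becomes a charref:

```python
import os
import tempfile

# Hypothetical report text; U+4E0E is outside latin-1, U+00E4 is inside it.
report = "<p>\u4e0e and \u00e4</p>"

path = os.path.join(tempfile.mkdtemp(), "traceback.html")
with open(path, "w", encoding="latin-1", errors="xmlcharrefreplace") as f:
    f.write(report)

# Read the raw bytes back to see what actually landed on disk.
with open(path, "rb") as f:
    data = f.read()
print(data)
```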

--
nosy: +amaury.forgeotdarc
stage: resolved -> needs patch




[issue22746] cgitb html: wrong encoding for utf-8

2014-10-28 Thread Wolfgang Rohdewald

Wolfgang Rohdewald added the comment:

> What about
>   open(..., encoding='latin-1', errors='xmlcharrefreplace')

That works fine. I tested with a Chinese character 与

But I do not think the application should work around something that cgitb is 
supposed to handle. More so since the documentation is dead silent about this. 
You need to use codecs.open instead of open and add those kw arguments. As long 
as this is not explained in the documentation, I guess it is a bug for everyone 
not using latin-1.
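
For code that has to run on both 2.7 and 3.x, one possible sketch uses io.open rather than codecs.open (a substitution made here, not something proposed in this thread); io.open takes the same encoding and errors keyword arguments on both versions. The path and sample text are assumptions:

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "traceback.html")

# io.open accepts encoding= and errors= on Python 2.7 and 3.x alike.
with io.open(path, "w", encoding="latin-1", errors="xmlcharrefreplace") as f:
    f.write(u"<p>\u4e0e</p>")

# Read the raw bytes back to confirm the character became a charref.
with io.open(path, "rb") as f:
    written = f.read()
print(written)
```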

--




[issue22746] cgitb html: wrong encoding for utf-8

2014-10-28 Thread Wolfgang Rohdewald

Wolfgang Rohdewald added the comment:

correction: A bug for everyone using non-ascii characters.

--




[issue22746] cgitb html: wrong encoding for utf-8

2014-10-28 Thread Serhiy Storchaka

Changes by Serhiy Storchaka <storch...@gmail.com>:


--
nosy: +ezio.melotti, serhiy.storchaka




[issue22746] cgitb html: wrong encoding for utf-8

2014-10-28 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

> You need to use codecs.open instead of open
No, why? In Python 3, open() supports the errors handler.

--




[issue22746] cgitb html: wrong encoding for utf-8

2014-10-28 Thread STINNER Victor

Changes by STINNER Victor <victor.stin...@gmail.com>:


--
components: +Unicode
nosy: +haypo




[issue22746] cgitb html: wrong encoding for utf-8

2014-10-28 Thread R. David Murray

R. David Murray added the comment:

In normal HTML utf-8 works fine, doesn't it? It's only when reading from a 
file (where the browser doesn't know the encoding) that it fails.  Do you have 
a use case for xmlcharrefreplace in the HTML context (which is what cgitb is 
primarily targeted at)?  Some place where the web page can't be declared as 
utf-8, perhaps?

I suppose it might be a not-unreasonable enhancement request to have a 
parameter to Hook that says "do xmlcharrefreplace", but since the workaround is 
actually simpler than that, I don't know if that is worthwhile or not.  Or do 
people feel that doing the replacement all the time (it's only in tracebacks, 
after all) would be the right thing to do?
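
As a rough illustration of what "doing the replacement all the time" could look like without changing cgitb itself, here is a hypothetical wrapper around a text stream. The class name CharrefWriter is invented for this sketch. cgitb.Hook accepts a file argument, so an object like this could presumably be passed there, but that combination is untested here and would not cover reports written via logdir.

```python
import io


class CharrefWriter:
    """Hypothetical wrapper: forwards text with non-ASCII as charrefs."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, text):
        # Replace every non-ASCII character with a numeric charref,
        # then hand plain ASCII text to the underlying stream.
        ascii_text = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
        return self._stream.write(ascii_text)

    def flush(self):
        self._stream.flush()


# Usage with an in-memory stream standing in for the real output file.
buffer = io.StringIO()
out = CharrefWriter(buffer)
out.write("<p>\u4e0e</p>")
result = buffer.getvalue()
print(result)
```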

--
resolution: remind -> 
versions: +Python 3.4, Python 3.5




[issue22746] cgitb html: wrong encoding for utf-8

2014-10-28 Thread Wolfgang Rohdewald

Wolfgang Rohdewald added the comment:

>> You need to use codecs.open instead of open
> No, why? in python3 open() supports the errors handler.

Right, but not in python2, which has the same problem. I need my code to run 
with both.

> Do you have a use case for xmlcharrefreplace in the HTML context?

No, my only use case is the local file.

--




[issue22746] cgitb html: wrong encoding for utf-8

2014-10-27 Thread Wolfgang Rohdewald

New submission from Wolfgang Rohdewald:

The attached script shows the non-ascii characters wrong wherever they occur, 
including the exception message and the comment in the source code.

Looking at the produced .html, I can say that cgitb simply passes the single 
byte utf-8 codes without encoding them as needed.

Same happens with Python3.4 (after applying some quick and dirty changes to 
cgitb.py, see bug #22745).

--
components: Library (Lib)
files: cgibug.py
messages: 230085
nosy: wrohdewald
priority: normal
severity: normal
status: open
title: cgitb html: wrong encoding for utf-8
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file37044/cgibug.py




[issue22746] cgitb html: wrong encoding for utf-8

2014-10-27 Thread R. David Murray

R. David Murray added the comment:

If you look at the file, you'll find that the data is in utf-8 (at least if 
your locale is a utf-8 locale).  However, html is by default interpreted as 
latin-1, so that's what the web browser displays when you pass the file on disk 
to it.  If you add encoding='latin-1' to your open call, your script will 
work.  What you do if you need to display non-latin1 characters, I don't know.  
(See https://bugzil.la/760050, for example).

Note: the above is for python3.  I don't remember how you do the equivalent in 
python2...a naive codecs.open call just got me a UnicodeDecodeError.
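
The mismatch described above can be reproduced in a few lines of Python 3. The sample characters are assumptions for illustration: UTF-8 bytes read as latin-1 turn one character into two, and a character outside latin-1 cannot be written with that encoding at all.

```python
text = "\u00e4"                    # LATIN SMALL LETTER A WITH DIAERESIS
on_disk = text.encode("utf-8")     # two bytes: 0xC3 0xA4
shown = on_disk.decode("latin-1")  # what a latin-1 reader makes of them

# Writing latin-1 instead round-trips for this character...
latin1_roundtrip = text.encode("latin-1").decode("latin-1")

# ...but a character outside latin-1 cannot be encoded that way.
try:
    "\u4e0e".encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False
print(repr(shown), fits_latin1)
```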

--
nosy: +r.david.murray
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed




[issue22746] cgitb html: wrong encoding for utf-8

2014-10-27 Thread Wolfgang Rohdewald

Wolfgang Rohdewald added the comment:

If you cannot offer a solution for arbitrary unicode, you have no solution at 
all. After all, that is what unicode is about: support ALL languages, not only 
your own.

I do not quite understand why you think this is not a bug.

If cgitb encodes unicode like & # x e 4 ; (remove spaces), the browser does not 
have to guess the encoding, it will always show the correct character. This 
works for all of unicode. See 
https://en.wikipedia.org/wiki/Unicode_and_HTML#Numeric_character_references

So this bug is fixable, I am reopening it.

For Python3, the fix is actually very simple: Do not write doc but 
str(doc.encode('ascii', 'xmlcharrefreplace')), like in the attached patch. This 
patch works for me but there might be yet uncovered code paths. And my source 
file is encoded in utf-8, other source file encodings should be tested too. I 
do not know if cgitb correctly honors the source file header like # -*- coding: 
utf-8 -*-

Fixing this for Python2 is certainly doable too but perhaps more difficult 
because a Python2 str() may have an unknown encoding.
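
A minimal Python 3 sketch of the charref idea, not a reproduction of the attached patch. One caveat worth noting: in Python 3, calling str() on a bytes object returns its repr (including the b'...' wrapper), so this sketch decodes the ASCII bytes instead. The sample doc string is an assumption.

```python
# Hypothetical HTML fragment standing in for cgitb's generated document.
doc = "<p>\u00e4 \u4e0e</p>"

# Replace every non-ASCII character with a numeric character reference.
encoded = doc.encode("ascii", "xmlcharrefreplace")

# str() on bytes gives the repr, which is not what should be written out.
as_repr = str(encoded)

# Decoding the pure-ASCII bytes gives the text that is safe to write.
as_text = encoded.decode("ascii")
print(as_text)
```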

--
keywords: +patch
resolution: not a bug -> 
status: closed -> open
Added file: http://bugs.python.org/file37047/22746.patch




[issue22746] cgitb html: wrong encoding for utf-8

2014-10-27 Thread Wolfgang Rohdewald

Changes by Wolfgang Rohdewald <wolfg...@rohdewald.de>:


--
resolution:  -> remind
