Re: [PATCH] Documentation/sphinx: fix kernel-doc decode for non-utf-8 locale

2017-09-01 Thread Markus Heiser

> On 31.08.2017 at 21:43, Jonathan Corbet wrote:
> 
> On Thu, 31 Aug 2017 22:21:29 +0300
> Jani Nikula  wrote:
> 
>> On python3, Popen() universal_newlines=True converts the subprocess
>> stdout to unicode text using a codec based on user preferences. Given
>> LANG indicating ascii and utf-8 stdout from the subprocess, you'd get:
>> 
>> WARNING: kernel-doc '../scripts/kernel-doc -rst -enable-lineno
>> ../drivers/media/dvb-core/demux.h' processing failed with: 'ascii' codec can't
>> decode byte 0xe2 in position 6368: ordinal not in range(128)
>> 
>> Fix this by dropping universal_newlines=True and replacing the implicit
>> LANG specific decode with an explicit utf-8 decode. This also gets rid
>> of the annoying conditional code for python 2 vs. 3.
>> 
>> Fixes: ba3501859354 ("Documentation/sphinx: fix kernel-doc extension on python3")
>> Reference: http://mid.mail-archive.com/54c23e8e-89c0-5cea-0dcc-e938952c5642@infradead.org
>> Reported-and-tested-by: Randy Dunlap 
>> Cc: Jonathan Corbet 
>> Cc: Mauro Carvalho Chehab 
>> Signed-off-by: Jani Nikula 
> 
> Cool...I go out to run some errands and the problem's fixed! :)
> 
> Patch applied, thanks to everybody for figuring this out.

Sorry for my absence on this; I have been travelling. Let me just
add my 5 cents:

  Sometimes I really hate Python for crap like
  "universal newline support" [4].

It started in Python 2 with the universal-newlines feature [1]:

"""If universal_newlines is True, the file objects stdout and
 stderr are opened as text files in universal newlines mode.
 Lines may be terminated by any of '\n', the Unix end-of-line convention,
 '\r', the old Macintosh convention or '\r\n', the Windows convention. All
 of these external representations are seen as '\n' by the Python program."""

At first it sounds fine ... but read on:

"""Note:
 This feature is only available if Python is built with universal newline
 support (the default). Also, the newlines attribute of the file objects
 stdout, stdin and stderr are not updated by the communicate() method.
"""

Then came Python 3 and, after a while (in Python 3.3), the crappy
"universal newline" handling was revised [2]:

"""Changed in version 3.3: When universal_newlines is True, the
 class uses the encoding locale.getpreferredencoding(False) instead
 of locale.getpreferredencoding(). ..."""

And here is what the documentation of locale.getpreferredencoding() says [3]:

"""locale.getpreferredencoding(do_setlocale=True)
 Return the encoding used for text data, according to user preferences.
 User preferences are expressed differently on different systems, and might
 not be available programmatically on some systems, so this function
 only returns a guess.

 On some systems, it is necessary to invoke setlocale() to obtain the user
 preferences, so this function is not thread-safe. If invoking setlocale is
 not necessary or desired, do_setlocale should be set to False."""
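
To make the 'guess' concrete, here is a minimal sketch of the failure mode,
not the kernel-doc code itself. The sample string is an arbitrary assumption;
it only needs to contain a character whose UTF-8 encoding starts with byte
0xe2, like the one named in the warning. try_decode() is a hypothetical
helper for illustration.

```python
import locale

# The encoding Python guesses for text data under the current locale.
# With LANG indicating ascii this is typically an ASCII codec.
guessed = locale.getpreferredencoding(False)
print("guessed encoding:", guessed)

# U+2019 (right single quotation mark) encodes to 0xe2 0x80 0x99 in UTF-8.
data = "demux\u2019s".encode("utf-8")


def try_decode(raw, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return "failed: %s" % exc


print(try_decode(data, "ascii"))   # fails on byte 0xe2
print(try_decode(data, "utf-8"))   # decodes cleanly
```

The same bytes decode or fail depending only on which codec is picked, which
is why leaving that choice to the locale is fragile.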

To summarize: 

1. in Python 2, universal-newline support is an option which has to be
   compiled into the interpreter
2. in Python 3, the decoding that comes with universal_newlines=True is
   based on a 'guess' of the preferred encoding
3. the way to get that 'guess' was changed in Python 3.3

I have been suspicious of the "universal" stuff since I first came across it.
This confirms my impression that it is crap and that Jani's patch, which
removes it, is the one right answer. Thanks!
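
For reference, a minimal sketch of the approach the patch takes: read raw
bytes from the subprocess and decode them explicitly as UTF-8. It assumes
Python 3 for the stand-in child process; run_and_decode() and the child
command are hypothetical and only stand in for the kernel-doc call.

```python
import codecs
import subprocess
import sys


def run_and_decode(cmd):
    """Run cmd and return (returncode, stdout, stderr) decoded as UTF-8."""
    # No universal_newlines=True: communicate() returns bytes, so the
    # locale's preferred encoding is never consulted.
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    return p.returncode, codecs.decode(out, "utf-8"), codecs.decode(err, "utf-8")


# Stand-in child process (not kernel-doc) that writes UTF-8 bytes to stdout.
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(b'caf\\xc3\\xa9\\n')"]
rc, out, err = run_and_decode(child)
print(rc, repr(out))
```

The result no longer depends on LANG, because the codec is named in the code
instead of being guessed from the environment.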


-- Markus --

[1] https://docs.python.org/2/library/subprocess.html 
[2] https://docs.python.org/3.3/library/subprocess.html#frequently-used-arguments
[3] https://docs.python.org/3/library/locale.html#locale.getpreferredencoding
[4] https://www.python.org/dev/peps/pep-0278/



--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Documentation/sphinx: fix kernel-doc decode for non-utf-8 locale

2017-08-31 Thread Jonathan Corbet
On Thu, 31 Aug 2017 22:21:29 +0300
Jani Nikula  wrote:

> On python3, Popen() universal_newlines=True converts the subprocess
> stdout to unicode text using a codec based on user preferences. Given
> LANG indicating ascii and utf-8 stdout from the subprocess, you'd get:
> 
> WARNING: kernel-doc '../scripts/kernel-doc -rst -enable-lineno
> ../drivers/media/dvb-core/demux.h' processing failed with: 'ascii' codec can't
> decode byte 0xe2 in position 6368: ordinal not in range(128)
> 
> Fix this by dropping universal_newlines=True and replacing the implicit
> LANG specific decode with an explicit utf-8 decode. This also gets rid
> of the annoying conditional code for python 2 vs. 3.
> 
> Fixes: ba3501859354 ("Documentation/sphinx: fix kernel-doc extension on python3")
> Reference: http://mid.mail-archive.com/54c23e8e-89c0-5cea-0dcc-e938952c5642@infradead.org
> Reported-and-tested-by: Randy Dunlap 
> Cc: Jonathan Corbet 
> Cc: Mauro Carvalho Chehab 
> Signed-off-by: Jani Nikula 

Cool...I go out to run some errands and the problem's fixed! :)

Patch applied, thanks to everybody for figuring this out.

jon


[PATCH] Documentation/sphinx: fix kernel-doc decode for non-utf-8 locale

2017-08-31 Thread Jani Nikula
On python3, Popen() universal_newlines=True converts the subprocess
stdout to unicode text using a codec based on user preferences. Given
LANG indicating ascii and utf-8 stdout from the subprocess, you'd get:

WARNING: kernel-doc '../scripts/kernel-doc -rst -enable-lineno
../drivers/media/dvb-core/demux.h' processing failed with: 'ascii' codec can't
decode byte 0xe2 in position 6368: ordinal not in range(128)

Fix this by dropping universal_newlines=True and replacing the implicit
LANG specific decode with an explicit utf-8 decode. This also gets rid
of the annoying conditional code for python 2 vs. 3.

Fixes: ba3501859354 ("Documentation/sphinx: fix kernel-doc extension on python3")
Reference: http://mid.mail-archive.com/54c23e8e-89c0-5cea-0dcc-e938952c5642@infradead.org
Reported-and-tested-by: Randy Dunlap 
Cc: Jonathan Corbet 
Cc: Mauro Carvalho Chehab 
Signed-off-by: Jani Nikula 
---
 Documentation/sphinx/kerneldoc.py | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/Documentation/sphinx/kerneldoc.py b/Documentation/sphinx/kerneldoc.py
index d15e07f36881..39aa9e8697cc 100644
--- a/Documentation/sphinx/kerneldoc.py
+++ b/Documentation/sphinx/kerneldoc.py
@@ -27,6 +27,7 @@
 # Please make sure this works on both python2 and python3.
 #
 
+import codecs
 import os
 import subprocess
 import sys
@@ -88,13 +89,10 @@ class KernelDocDirective(Directive):
         try:
             env.app.verbose('calling kernel-doc \'%s\'' % (" ".join(cmd)))
 
-            p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
+            p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
             out, err = p.communicate()
 
-            # python2 needs conversion to unicode.
-            # python3 with universal_newlines=True returns strings.
-            if sys.version_info.major < 3:
-                out, err = unicode(out, 'utf-8'), unicode(err, 'utf-8')
+            out, err = codecs.decode(out, 'utf-8'), codecs.decode(err, 'utf-8')
 
             if p.returncode != 0:
                 sys.stderr.write(err)
-- 
2.11.0
