[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2019-06-19 Thread STINNER Victor


STINNER Victor  added the comment:


New changeset 15e7d2432294ec46f1ad84ce958fdeb9d4ca78b1 by Victor Stinner 
(Michael Felt) in branch '3.7':
[3.7] bpo-34347: Fix test_utf8_mode.test_cmd_line for AIX (GH-8923) (GH-14233)
https://github.com/python/cpython/commit/15e7d2432294ec46f1ad84ce958fdeb9d4ca78b1


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2019-06-19 Thread Michael Felt


Change by Michael Felt :


--
pull_requests: +14069
pull_request: https://github.com/python/cpython/pull/14233




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2019-03-10 Thread Michael Felt


Michael Felt  added the comment:

Could this be backported to version 3.7?

--




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-31 Thread STINNER Victor


STINNER Victor  added the comment:

> The buildbots seem happy. This may be closed.

Cool, thank you for checking, and thanks for your fix! I'm closing the issue.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-31 Thread Michael Felt


Michael Felt  added the comment:

The buildbots seem happy. This may be closed.

--




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-27 Thread Michael Osipov


Michael Osipov <1983-01...@gmx.net> added the comment:

Interestingly, the very same approach does not work on HP-UX, even if I 
swap out the parameters for HP-UX:

$ ./python -m test test_utf8_mode
Run tests sequentially
0:00:00 [1/1] test_utf8_mode
test test_utf8_mode failed -- Traceback (most recent call last):
  File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 226, in 
test_cmd_line
check('utf8=0', [c_arg], LC_ALL='C')
  File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 217, in check
self.assertEqual(args, ascii(expected), out)
AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != 
"['h\\xfb\\u02cb\\xe3\\x82\\u02dc']"
- ['h\xc3\xa9\xe2\x82\xac']
+ ['h\xfb\u02cb\xe3\x82\u02dc']
 : roman8:['h\xc3\xa9\xe2\x82\xac']

--




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-27 Thread STINNER Victor


STINNER Victor  added the comment:


New changeset 7ef1697be54a74314d5214d9ba0580d4e620694c by Victor Stinner 
(Michael Felt) in branch 'master':
bpo-34347: Fix test_utf8_mode.test_cmd_line for AIX (GH-8923)
https://github.com/python/cpython/commit/7ef1697be54a74314d5214d9ba0580d4e620694c


--




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-25 Thread Michael Osipov


Michael Osipov <1983-01...@gmx.net> added the comment:

This is a very thorough analysis. Kudos for that.

--
nosy: +michael-o




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-25 Thread Michael Felt


Michael Felt  added the comment:

Solution much simpler than I thought:

not arg.decode('ascii', 'surrogateescape'), but arg.decode('iso-8859-1')
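
To make the difference between the two candidate expectations concrete, here is a minimal sketch. It only uses the byte string from the test and the AIX output quoted elsewhere in this thread; the variable names are illustrative and are not taken from the actual patch.

```python
# The argument the test passes to the child interpreter.
arg = 'h\xe9\u20ac'.encode('utf-8')          # b'h\xc3\xa9\xe2\x82\xac'

# Original expectation: ASCII with surrogateescape (bytes >= 0x80 become U+DCxx).
expected_ascii = arg.decode('ascii', 'surrogateescape')

# Proposed expectation for AIX: every byte maps to the code point of the same value.
expected_latin1 = arg.decode('iso-8859-1')

# What the child interpreter printed on AIX, as reported in this thread.
observed_on_aix = "['h\\xc3\\xa9\\xe2\\x82\\xac']"

print(ascii([expected_ascii]))    # the \udc.. form
print(ascii([expected_latin1]))   # the \x.. form
print(ascii([expected_latin1]) == observed_on_aix)
```

Only the ISO-8859-1 decoding reproduces the string reported for AIX, which is consistent with the child reporting ISO8859-1 as its preferred encoding.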

--




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-25 Thread Michael Felt


Change by Michael Felt :


--
keywords: +patch
pull_requests: +8396
stage:  -> patch review




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-24 Thread Michael Felt


Michael Felt  added the comment:

On 23/08/2018 19:14, Michael Felt wrote:
> Michael Felt  added the comment:
>
> On 23/08/2018 12:51, STINNER Victor wrote:
>> STINNER Victor  added the comment:
>>
>> Your issue is about decoding command line argument which is done from main() 
>> function. It doesn't use Python codecs, but functions like Py_DecodeLocale().
> This is beyond my understanding atm.
> Early on I tried making the expected just be 'arg' and went from
> situation A to situation B - which looked much closer, BUT, the 'types'
> differed:
>
> Situation A (original)
> AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != 
> "['h\\udcc3\\udca9\\udce2\\udc82\\udcac']"
> - ['h\xc3\xa9\xe2\x82\xac']
> + ['h\udcc3\udca9\udce2\udc82\udcac']
>  : ISO8859-1:['h\xc3\xa9\xe2\x82\xac']
>
> I tried saying the "expected" is arg, but arg is still a byte object, the 
> cmd_line result is not (printed as such).
>
> Situation B
> AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != 
> "[b'h\\xc3\\xa9\\xe2\\x82\\xac']"
> - ['h\xc3\xa9\xe2\x82\xac']
> + [b'h\xc3\xa9\xe2\x82\xac']
> ?  +
>  : ISO8859-1:['h\xc3\xa9\xe2\x82\xac']
>
> After further digging - to understand why it was coming as "\x encoding 
> rather than \udc"
>
> I looked at what was happening here:
>
> out = self.get_output('-X', utf8_opt, '-c', code, arg, **kw)
> becomes
> out = self.get_output('-X', utf8_opt, '-c', code, 
> 'h\xe9\u20ac'.encode('utf-8'), **kw)
> becomes
> out = self.get_output('-X', utf8_opt, '-c', code, b'h\xc3\xa9\xe2\x82\xac', 
> **kw)
>
> And finally, at the CLI becomes:
> ['/data/prj/python/python3-3.8/python', '-X', 'faulthandler', '-X', 'utf8', 
> '-c', 'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), 
> ascii(sys.argv[1:])))', b'h\xc3\xa9\xe2\x82\xac']
>
> /data/prj/python/python3-3.8/python '-X' 'faulthandler' '-X' 'utf8' '-c' 
> 'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), 
> ascii(sys.ar
> gv[1:])))', b'h\xc3\xa9\xe2\x82\xac'
> UTF-8:['bh\\xc3\\xa9\\xe2\\x82\\xac']
>
> /data/prj/python/python3-3.8/python '-X' 'faulthandler' '-X' 'utf8=0' '-c' 
> 'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), 
> ascii(sys.
> argv[1:])))', b'h\xc3\xa9\xe2\x82\xac'
> ISO8859-1:['bh\\xc3\\xa9\\xe2\\x82\\xac']
>
> Note:
> /data/prj/python/python3-3.8/python '-X' 'faulthandler' '-X' 'utf8=0' '-c' 
> 'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), 
> ascii(sys.
> argv[1:])))', 'h\udcc3\udca9\udce2\udc82\udcac'
> ISO8859-1:['h\\udcc3\\udca9\\udce2\\udc82\\udcac']
>
> /data/prj/python/python3-3.8/python '-X' 'faulthandler' '-X' 'utf8=0' '-c' 
> 'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), 
> ascii(sys.
> argv[1:])))', b'h\udcc3\udca9\udce2\udc82\udcac'
> ISO8859-1:['bh\\udcc3\\udca9\\udce2\\udc82\\udcac']
>
> root@x066:[/data/prj/python/python3-3.8]/data/prj/python/python3-3.8/python 
> '-X' 'faulthandler' '-X' 'utf8' '-c' 'import locale, sys; print("%s:%s" % (>
> UTF-8:['bh\\udcc3\\udca9\\udce2\\udc82\\udcac']
>
> Summary:
> a) concerned about how b'h' becomes 'bh'
> b) whatever argv[1] is, it is very close to what is returned - so whatever 
> happens during the transformation from 
> self.get_output('-X', utf8_opt, '-c', code, arg, **kw)
>  determines the output and the (failed) comparison.
p.s. also tried:
michael@x071:[/data/prj/python/git/python3-3.8]/data/prj/python/python3-3.8/python
'-X' 'faulthandler' '-X' 'utf8=0' '-c' 'import locale, sys;
print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.argv[1:])))',
'h\xe9\u20ac'.encode\('utf-8'\)
ISO8859-1:['h\\xe9\\u20ac.encode(utf-8)']
michael@x071:[/data/prj/python/git/python3-3.8]/data/prj/python/python3-3.8/python
'-X' 'faulthandler' '-X' 'utf8=1' '-c' 'import locale, sys;
print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.argv[1:])))',
'h\xe9\u20ac'.encode\('utf-8'\)
UTF-8:['h\\xe9\\u20ac.encode(utf-8)']

Really unclear to me what this test is trying to verify. The CLI seems
to just 'echo' what it is provided.
>>> Question 1: why is Windows excluded? Because it does not use UTF-8 as its 
>>> default (its default is CP1252)
>> Windows uses wmain() which gets command line arguments as wchar_t* strings: 
>> Unicode. No decoding is needed.
>>

--




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-23 Thread Michael Felt


Michael Felt  added the comment:

On 23/08/2018 12:51, STINNER Victor wrote:
> STINNER Victor  added the comment:
>
> Your issue is about decoding command line argument which is done from main() 
> function. It doesn't use Python codecs, but functions like Py_DecodeLocale().
This is beyond my understanding atm.
Early on I tried making the expected just be 'arg' and went from
situation A to situation B - which looked much closer, BUT, the 'types'
differed:

Situation A (original)
AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != 
"['h\\udcc3\\udca9\\udce2\\udc82\\udcac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ ['h\udcc3\udca9\udce2\udc82\udcac']
 : ISO8859-1:['h\xc3\xa9\xe2\x82\xac']

I tried saying the "expected" is arg, but arg is still a byte object, the 
cmd_line result is not (printed as such).

Situation B
AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != 
"[b'h\\xc3\\xa9\\xe2\\x82\\xac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ [b'h\xc3\xa9\xe2\x82\xac']
?  +
 : ISO8859-1:['h\xc3\xa9\xe2\x82\xac']

After further digging, to understand why the output used \x escapes rather 
than \udc escapes, I looked at what was happening here:

out = self.get_output('-X', utf8_opt, '-c', code, arg, **kw)
becomes
out = self.get_output('-X', utf8_opt, '-c', code, 
'h\xe9\u20ac'.encode('utf-8'), **kw)
becomes
out = self.get_output('-X', utf8_opt, '-c', code, b'h\xc3\xa9\xe2\x82\xac', 
**kw)

And finally, at the CLI becomes:
['/data/prj/python/python3-3.8/python', '-X', 'faulthandler', '-X', 'utf8', 
'-c', 'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), 
ascii(sys.argv[1:])))', b'h\xc3\xa9\xe2\x82\xac']

/data/prj/python/python3-3.8/python '-X' 'faulthandler' '-X' 'utf8' '-c' 
'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), 
ascii(sys.ar
gv[1:])))', b'h\xc3\xa9\xe2\x82\xac'
UTF-8:['bh\\xc3\\xa9\\xe2\\x82\\xac']

/data/prj/python/python3-3.8/python '-X' 'faulthandler' '-X' 'utf8=0' '-c' 
'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.
argv[1:])))', b'h\xc3\xa9\xe2\x82\xac'
ISO8859-1:['bh\\xc3\\xa9\\xe2\\x82\\xac']

Note:
/data/prj/python/python3-3.8/python '-X' 'faulthandler' '-X' 'utf8=0' '-c' 
'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.
argv[1:])))', 'h\udcc3\udca9\udce2\udc82\udcac'
ISO8859-1:['h\\udcc3\\udca9\\udce2\\udc82\\udcac']

/data/prj/python/python3-3.8/python '-X' 'faulthandler' '-X' 'utf8=0' '-c' 
'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.
argv[1:])))', b'h\udcc3\udca9\udce2\udc82\udcac'
ISO8859-1:['bh\\udcc3\\udca9\\udce2\\udc82\\udcac']

root@x066:[/data/prj/python/python3-3.8]/data/prj/python/python3-3.8/python 
'-X' 'faulthandler' '-X' 'utf8' '-c' 'import locale, sys; print("%s:%s" % (>
UTF-8:['bh\\udcc3\\udca9\\udce2\\udc82\\udcac']

Summary:
a) concerned about how b'h' becomes 'bh'
b) whatever argv[1] is, it is very close to what is returned - so whatever 
happens during the transformation from 
self.get_output('-X', utf8_opt, '-c', code, arg, **kw)
 determines the output and the (failed) comparison.
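
On point a): a likely explanation is shell quoting rather than anything Python does. The sketch below uses shlex.split() as a stand-in for POSIX shell word splitting (an approximation, not the real shell) to show that a bare b directly followed by a single-quoted word is joined into one argument, with the backslashes left as literal characters. The command line is a shortened, illustrative version of the ones above.

```python
import shlex

# A shortened form of the command typed at the shell prompt (illustrative only).
typed = r"""python -X utf8=0 -c 'import sys; print(ascii(sys.argv[1:]))' b'h\xc3\xa9\xe2\x82\xac'"""

argv = shlex.split(typed)   # POSIX-style splitting: quotes removed, adjacent parts joined
last = argv[-1]

print(last)                 # the literal text the child would receive as argv[1]
print(ascii([last]))        # what the child would then print for sys.argv[1:]
```

So the shell never sees a bytes literal: it passes the characters b, h, a backslash, x, c, 3 and so on. To pass the actual bytes from a shell, something like printf or $'...' quoting would be needed instead.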

>> Question 1: why is Windows excluded? Because it does not use UTF-8 as its 
>> default (its default is CP1252)
> Windows uses wmain() which gets command line arguments as wchar_t* strings: 
> Unicode. No decoding is needed.
>

--




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-23 Thread STINNER Victor


STINNER Victor  added the comment:

Your issue is about decoding command-line arguments, which is done in the 
main() function. It doesn't use Python codecs, but functions like 
Py_DecodeLocale().

> Question 1: why is Windows excluded? Because it does not use UTF-8 as its 
> default (its default is CP1252)

Windows uses wmain() which gets command line arguments as wchar_t* strings: 
Unicode. No decoding is needed.
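
As a rough Python-level approximation of what happens on Unix (the real work is done in C by Py_DecodeLocale(), which is documented to use the surrogateescape error handler), the sketch below decodes the raw argument bytes with a few candidate locale encodings. The helper name decode_argv_like is hypothetical and the codec calls only imitate the C code.

```python
RAW = b'h\xc3\xa9\xe2\x82\xac'   # the bytes the test passes as argv[1]

def decode_argv_like(raw, encoding):
    """Imitate locale decoding of one argv entry (hypothetical helper).

    Undecodable bytes are kept as lone surrogates U+DC80..U+DCFF.
    """
    return raw.decode(encoding, 'surrogateescape')

for encoding in ('ascii', 'utf-8', 'iso-8859-1'):
    text = decode_argv_like(RAW, encoding)
    # Encoding back with the same handler recovers the original bytes.
    assert text.encode(encoding, 'surrogateescape') == RAW
    print('%-10s %s' % (encoding, ascii(text)))
```

The three results differ even though the input bytes are identical, so the value of sys.argv depends on which encoding the C locale resolves to on a given platform.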

--




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-23 Thread STINNER Victor


STINNER Victor  added the comment:

I fixed bpo-34207.

--
nosy: +vstinner




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-21 Thread Łukasz Langa

Change by Łukasz Langa :


--
keywords: +3.7regression




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-21 Thread Łukasz Langa

Change by Łukasz Langa :


--
dependencies: +test_cmd_line test_utf8_mode test_warnings fail in all FreeBSD 
3.x (3.8) buildbots




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-21 Thread Łukasz Langa

Łukasz Langa  added the comment:

I have no idea what's going on here yet but just wanted to report that we are 
seeing this issue on one FreeBSD buildbot, too:

https://buildbot.python.org/all/#/builders/124/builds/508/steps/4/logs/stdio

I can also reproduce on CentOS 7.

Could this be related to LC_ALL= or related environment variables?
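
One quick way to check that on an affected machine is to look at which locale variable actually wins. The sketch below follows the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG); the function name is made up for illustration and it only inspects the environment, it does not query the C library.

```python
import os

def effective_ctype_setting(environ=os.environ):
    """Return (variable, value) for the first non-empty locale variable.

    Precedence for the character-type category: LC_ALL, LC_CTYPE, LANG.
    """
    for name in ('LC_ALL', 'LC_CTYPE', 'LANG'):
        value = environ.get(name)
        if value:
            return name, value
    return None, 'C'   # nothing set: the C/POSIX locale applies

print(effective_ctype_setting())
```

Note that the failing check passes LC_ALL='C' explicitly, so in the test the child runs in the C locale regardless of what the buildbot environment sets.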

--
nosy: +lukasz.langa




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-09 Thread Michael Felt

Michael Felt  added the comment:

Starting this discussion again. Please take time to read. I have spent hours 
trying to understand what is failing. Please spend a few minutes reading.

Sadly, there is a lot of text - but I do not know what I could leave out 
without damaging the process of discovery.

The failing result is:

self.assertEqual(args, ascii(expected), out)
AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != 
"['h\\udcc3\\udca9\\udce2\\udc82\\udcac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ ['h\udcc3\udca9\udce2\udc82\udcac']
 : ISO8859-1:['h\xc3\xa9\xe2\x82\xac']

The test code is:
  +207  @unittest.skipIf(MS_WINDOWS, 'test specific to Unix')
  +208  def test_cmd_line(self):
  +209  arg = 'h\xe9\u20ac'.encode('utf-8')
  +210  arg_utf8 = arg.decode('utf-8')
  +211  arg_ascii = arg.decode('ascii', 'surrogateescape')
  +212  code = 'import locale, sys; print("%s:%s" % 
(locale.getpreferredencoding(), ascii(sys.argv[1:])))'
  +213
  +214  def check(utf8_opt, expected, **kw):
  +215  out = self.get_output('-X', utf8_opt, '-c', code, arg, **kw)
  +216  args = out.partition(':')[2].rstrip()
  +217  self.assertEqual(args, ascii(expected), out)
  +218
  +219  check('utf8', [arg_utf8])
  +220  if sys.platform == 'darwin' or support.is_android:
  +221  c_arg = arg_utf8
  +222  else:
  +223  c_arg = arg_ascii
  +224  check('utf8=0', [c_arg], LC_ALL='C')

Question 1: why is Windows excluded? Because it does not use UTF-8 as its 
default (its default is CP1252)

Question 2: It seems that what the test is 'checking' is that 
object.encode('utf-8') gets decoded by ascii() based on the utf8_mode set.

 +215  out = self.get_output('-X', utf8_opt, '-c', code, arg, **kw)

rewrites (less indent) as:
 +215  out = self.get_output('-X', utf8_opt, '-c', code, 
'h\xe9\u20ac'.encode('utf-8'), **kw)

or
out = self.get_output('-X', utf8_opt, '-c', code, b'h\xc3\xa9\xe2\x82\xac', 
**kw)

Finally, in  Lib/test/support/script_helper.py we have
  +127  print("\n", cmd_line) # debug info, ignore
  +128  proc = subprocess.Popen(cmd_line, stdin=subprocess.PIPE,
  +129   stdout=subprocess.PIPE, stderr=subprocess.PIPE,
  +130   env=env, cwd=cwd)

Which gives:

 ['/data/prj/python/python3-3.8/python', '-X', 'faulthandler', '-X', 'utf8', 
'-c', 'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), 
ascii(sys.argv[1:])))', b'h\xc3\xa9\xe2\x82\xac']

Above - utf8=1 - is successful

 ['/data/prj/python/python3-3.8/python', '-X', 'faulthandler', '-X', 'utf8=0', 
'-c', 'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), 
ascii(sys.argv[1:])))', b'h\xc3\xa9\xe2\x82\xac']

Here: utf8=0 fails. The arg to the CLI is equal in both cases.
FAIL
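
The two invocations can also be reproduced outside the test suite. The following is a minimal sketch for Unix-like systems, not the code from script_helper.py: it launches the running interpreter (sys.executable) with the same -X option, code string and byte argument, and the helper name run_child is an assumption made for this example.

```python
import os
import subprocess
import sys

CODE = ('import locale, sys; '
        'print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.argv[1:])))')
ARG = 'h\xe9\u20ac'.encode('utf-8')   # passed to the child as raw bytes (POSIX only)

def run_child(utf8_opt, **env_overrides):
    """Run a child interpreter with -X <utf8_opt> and return its stdout line."""
    env = dict(os.environ, **env_overrides)
    proc = subprocess.run(
        [sys.executable, '-X', utf8_opt, '-c', CODE, ARG],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env, check=True)
    return proc.stdout.decode('ascii').rstrip()

if __name__ == '__main__':
    print(run_child('utf8'))                 # UTF-8 mode forced on
    print(run_child('utf8=0', LC_ALL='C'))   # UTF-8 mode off, C locale
```

The second line is the interesting one: per this thread AIX reports ISO8859-1 with \x escapes, while a platform whose C locale is plain ASCII would be expected to show the \udc form instead.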

## Going back to check() and what it has:
## Add some debug. The first line is the 'raw' expected,
## the second line is ascii(decoded)
## the final is the value extracted from get_output

  +214  def check(utf8_opt, expected, **kw):
  +215  out = self.get_output('-X', utf8_opt, '-c', code, arg, **kw)
  +216  args = out.partition(':')[2].rstrip()
  +217  print("")
  +218  print("%s: expected\n%s:ascii(expected)\n%s:out" % 
(expected, ascii(expected), out))
  +219  self.assertEqual(args, ascii(expected), out)

For: utf8 mode true, it works:
['h▒\u20ac']: expected
['h\xe9\u20ac']:ascii(expected)
UTF-8:['h\xe9\u20ac']:out

  +221  check('utf8', [arg_utf8])

But not for utf8=0
  +226  check('utf8=0', [c_arg], LC_ALL='C')
 # note, different values for LC_ALL='C' have been tried
['h\udcc3\udca9\udce2\udc82\udcac']: expected
['h\udcc3\udca9\udce2\udc82\udcac']:ascii(expected)
ISO8859-1:['h\xc3\xa9\xe2\x82\xac']:out

## re: expected and ascii(expected)
When utf8=1, expected and ascii(expected) differ. "arg" looks different from 
both, but after processing by get_output() expected and out match.

When utf8=0 there is no difference in the "arg" passed to "code".
However, with check() the values for both expected and ascii(expected) are 
identical. And, sadly, the value coming back via get_output() looks nothing 
like 'expected'.

In short, when utf8=1 ascii(b'h\xc3\xa9\xe2\x82\xac') becomes 'h\xe9\u20ac', 
which is what is desired. But when utf8=0 ascii(b'h\xc3\xa9\xe2\x82\xac') is 
b'h\xc3\xa9\xe2\x82\xac', not 'h\udcc3\udca9\udce2\udc82\udcac'.

Finally, when I run the command from the command line (after rewrites)

What passes:
./python '-X' 'faulthandler' '-X' 'utf8=1' '-c' 'import locale, sys; 
print("%s:%s" % (locale.getpreferredencoding(), ascii(
sys.argv[1:])))' b'h\xc3\xa9\xe2\x82\xac'
UTF-8:['bh\\xc3\\xa9\\xe2\\x82\\xac']

encoding is UTF-8, but the result of ascii(argv[1]) is the same as argv[1]

./python '-X' 'faulthandler' '-X' 'utf8=0' '-c' 'import locale, sys; 
print("%s:%s" % 

[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-07 Thread Michael Felt


Michael Felt  added the comment:

Common "experts" - feedback needed!

Original
test test_utf8_mode failed -- Traceback (most recent call last):
  File "/data/prj/python/git/python3-3.8/Lib/test/test_utf8_mode.py", line 225, 
in test_cmd_line
check('utf8=0', [c_arg], LC_ALL='C')
  File "/data/prj/python/git/python3-3.8/Lib/test/test_utf8_mode.py", line 217, 
in check
self.assertEqual(args, ascii(expected), out)
AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != 
"['h\\udcc3\\udca9\\udce2\\udc82\\udcac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ ['h\udcc3\udca9\udce2\udc82\udcac']
 : ISO8859-1:['h\xc3\xa9\xe2\x82\xac']

Modification #1:
if sys.platform == 'darwin' or support.is_android:
    c_arg = arg_utf8
elif sys.platform.startswith("aix"):
    c_arg = arg_ascii.encode('utf-8', 'surrogateescape')
else:
    c_arg = arg_ascii
check('utf8=0', [c_arg], LC_ALL='C')

Result:
AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != 
"[b'h\\xc3\\xa9\\xe2\\x82\\xac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ [b'h\xc3\xa9\xe2\x82\xac']
?  +
 : ISO8859-1:['h\xc3\xa9\xe2\x82\xac']

Modification #2:
if sys.platform == 'darwin' or support.is_android:
    c_arg = arg_utf8
elif sys.platform.startswith("aix"):
    c_arg = arg
else:
    c_arg = arg_ascii
check('utf8=0', [c_arg], LC_ALL='C')

AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != 
"[b'h\\xc3\\xa9\\xe2\\x82\\xac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ [b'h\xc3\xa9\xe2\x82\xac']
?  +
 : ISO8859-1:['h\xc3\xa9\xe2\x82\xac']

The "expected" continues to be a "bytes" object, while the CLI code returns a 
non-byte string.
Or - the original has an ascii string object but uses \udc rather than \x

\udc is common (i.e., I see it frequently in googled results on other things) - 
should something in ascii() be changed to output \udc rather than \x ?
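
A short sketch may help with that question. It suggests ascii() is not choosing between \x and \udc: it escapes whatever code points the string already contains, so the two spellings come from two different strings. The helper code_points is only for illustration.

```python
raw = b'h\xc3\xa9\xe2\x82\xac'

via_latin1 = raw.decode('iso-8859-1')               # bytes -> U+00xx
via_ascii = raw.decode('ascii', 'surrogateescape')  # bytes >= 0x80 -> U+DCxx

def code_points(text):
    """List the code points of a string as U+XXXX labels (illustrative helper)."""
    return ['U+%04X' % ord(ch) for ch in text]

print(code_points(via_latin1), ascii(via_latin1))
print(code_points(via_ascii), ascii(via_ascii))
```

Since U+00C3 and U+DCC3 are different characters, changing ascii() would not make the two sides equal; the expected value in the test has to be built with the encoding the platform really uses.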

Thx!

--




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-06 Thread Michael Felt

Michael Felt  added the comment:

On 8/6/2018 10:10 PM, Michael Felt wrote:
> Michael Felt  added the comment:
>
> In short, I do not understand how this passes on Linux.
>
> This is python3-3.4.6 on sles12:
>
> >>> 'h\xe9\u20ac'.encode('utf-8')
> b'h\xc3\xa9\xe2\x82\xac'
> >>> ascii('h\xe9\u20ac'.encode('utf-8'))
> "b'h\\xc3\\xa9\\xe2\\x82\\xac'"
> >>> 'h\xe9\u20ac'.encode('utf-8').decode('us-ascii', 'surrogateescape')
> 'h\udcc3\udca9\udce2\udc82\udcac'
>
> This is python3-3.7.0 on AIX:
> >>> 'h\xe9\u20ac'.encode('utf-8')
> b'h\xc3\xa9\xe2\x82\xac'
> >>> ascii('h\xe9\u20ac'.encode('utf-8'))
> "b'h\\xc3\\xa9\\xe2\\x82\\xac'"
> >>> 'h\xe9\u20ac'.encode('utf-8').decode('us-ascii', 'surrogateescape')
> 'h\udcc3\udca9\udce2\udc82\udcac'
>
> If I am missing something essential here - please be blunt!
Also seeing the same with Windows.
C:\Users\MICHAELFelt>python
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:06:47) [MSC v.1914 32
bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> 'h\xe9\u20ac'.encode('utf-8')
b'h\xc3\xa9\xe2\x82\xac'
>>> ascii('h\xe9\u20ac'.encode('utf-8'))
"b'h\\xc3\\xa9\\xe2\\x82\\xac'"
>>> 'h\xe9\u20ac'.encode('utf-8').decode('ascii','surrogateescape')
'h\udcc3\udca9\udce2\udc82\udcac'
>>>
>

--
Added file: https://bugs.python.org/file47733/pEpkey.asc




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-06 Thread Michael Felt


Michael Felt  added the comment:

In short, I do not understand how this passes on Linux.

This is python3-3.4.6 on sles12:

>>> 'h\xe9\u20ac'.encode('utf-8')
b'h\xc3\xa9\xe2\x82\xac'
>>> ascii('h\xe9\u20ac'.encode('utf-8'))
"b'h\\xc3\\xa9\\xe2\\x82\\xac'"
>>> 'h\xe9\u20ac'.encode('utf-8').decode('us-ascii', 'surrogateescape')
'h\udcc3\udca9\udce2\udc82\udcac'
>>>

This is python3-3.7.0 on AIX:
>>> 'h\xe9\u20ac'.encode('utf-8')
b'h\xc3\xa9\xe2\x82\xac'
>>> ascii('h\xe9\u20ac'.encode('utf-8'))
"b'h\\xc3\\xa9\\xe2\\x82\\xac'"
>>> 'h\xe9\u20ac'.encode('utf-8').decode('us-ascii', 'surrogateescape')
'h\udcc3\udca9\udce2\udc82\udcac'

If I am missing something essential here - please be blunt!

--




[issue34347] AIX: test_utf8_mode.test_cmd_line fails

2018-08-06 Thread Michael Felt


New submission from Michael Felt :

The test fails because

byte_str.decode('ascii', 'surrogateescape')

is not what ascii(byte_str) returns when called from the command line.

Assumption: since " check('utf8', [arg_utf8])" succeeds I assume the parsing of 
the command-line is correct.

DETAILS
>>> arg = 'h\xe9\u20ac'.encode('utf-8')
>>> arg
b'h\xc3\xa9\xe2\x82\xac'

>>> arg.decode('ascii', 'surrogateescape')
'h\udcc3\udca9\udce2\udc82\udcac'


I am having a difficult time getting the syntax correct for all the "escapes", 
so I added a print statement in the check routine:

test_cmd_line (test.test_utf8_mode.UTF8ModeTests) ...
code:import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), 
ascii(sys.argv[1:]))) arg:b'h\xc3\xa9\xe2\x82\xac'
out:UTF-8:['h\xe9\u20ac']

code:import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), 
ascii(sys.argv[1:]))) arg:b'h\xc3\xa9\xe2\x82\xac'
out:ISO8859-1:['h\xc3\xa9\xe2\x82\xac']

test code with my debug statement (to generate above):

def test_cmd_line(self):
    arg = 'h\xe9\u20ac'.encode('utf-8')
    arg_utf8 = arg.decode('utf-8')
    arg_ascii = arg.decode('ascii', 'surrogateescape')
    code = 'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.argv[1:])))'

    def check(utf8_opt, expected, **kw):
        out = self.get_output('-X', utf8_opt, '-c', code, arg, **kw)
        print("\ncode:%s arg:%s\nout:%s" % (code, arg, out))
        args = out.partition(':')[2].rstrip()
        self.assertEqual(args, ascii(expected), out)

    check('utf8', [arg_utf8])
    if sys.platform == 'darwin' or support.is_android:
        c_arg = arg_utf8
    else:
        c_arg = arg_ascii
    check('utf8=0', [c_arg], LC_ALL='C')

So the first check succeeds:

check('utf8', [arg_utf8])

But the second does not:

FAIL: test_cmd_line (test.test_utf8_mode.UTF8ModeTests)
--
Traceback (most recent call last):
  File "/data/prj/python/src/python3-3.7.0/Lib/test/test_utf8_mode.py", line 
225, in test_cmd_line
check('utf8=0', [c_arg], LC_ALL='C')
  File "/data/prj/python/src/python3-3.7.0/Lib/test/test_utf8_mode.py", line 
218, in check
self.assertEqual(args, ascii(expected), out)
AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != 
"['h\\udcc3\\udca9\\udce2\\udc82\\udcac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ ['h\udcc3\udca9\udce2\udc82\udcac']
 : ISO8859-1:['h\xc3\xa9\xe2\x82\xac']

I tried saying the "expected" is arg, but arg is still a byte object, the 
cmd_line result is not (printed as such).

AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != 
"[b'h\\xc3\\xa9\\xe2\\x82\\xac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ [b'h\xc3\xa9\xe2\x82\xac']
?  +
 : ISO8859-1:['h\xc3\xa9\xe2\x82\xac']

--
components: Interpreter Core, Tests
messages: 323214
nosy: Michael.Felt
priority: normal
severity: normal
status: open
title: AIX: test_utf8_mode.test_cmd_line fails
type: behavior
versions: Python 3.7, Python 3.8
