[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-03-07 Thread STINNER Victor

STINNER Victor added the comment:

It looks like PyUnicode_FromUnicode() should accept invalid UTF-16 surrogates, 
because the array module indirectly relies on that behavior:

On Windows (16-bit wchar_t/Py_UNICODE), len(array.array('u', '\U0010ffff')) is 
2 and array.array('u', '\U0010ffff')[0] is '\udbff' (a lone surrogate).
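As a rough illustration, the arithmetic below shows why U+10FFFF becomes two 16-bit units on such a platform. `to_surrogate_pair` is a hypothetical helper name used only for this sketch:

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError(f"U+{code_point:X} is not a supplementary code point")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x10FFFF)
print(hex(high), hex(low))  # 0xdbff 0xdfff
```

The first unit, 0xDBFF, is what indexing a 16-bit 'u' array returns on its own: a lone high surrogate.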

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17223
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-03-07 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 1fd165883a65 by Victor Stinner in branch '3.3':
Issue #17223: the test is specific to 32-bit wchar_t type
http://hg.python.org/cpython/rev/1fd165883a65

New changeset 42970cbfc982 by Victor Stinner in branch 'default':
(Merge 3.3) Issue #17223: the test is specific to 32-bit wchar_t type
http://hg.python.org/cpython/rev/42970cbfc982

--




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-03-07 Thread STINNER Victor

STINNER Victor added the comment:

The test should now pass on Windows.

--
status: open -> closed




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-03-05 Thread Ezio Melotti

Ezio Melotti added the comment:

Tests are still failing on Windows:
http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/1558/steps/test/logs/stdio

--




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-03-05 Thread STINNER Victor

STINNER Victor added the comment:

> ezio.melotti: Tests are still failing on Windows

Oh, I read PyUnicode_FromUnicode() twice and there is a bug :-( With a 16-bit 
wchar_t type (on Windows), find_maxchar_surrogates() doesn't fail if the 
wchar_t* string contains an invalid surrogate pair.
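A minimal pure-Python sketch of the kind of scan involved, assuming the input is a sequence of 16-bit code units. `scan_utf16_units` is a hypothetical name; it only mirrors the idea behind find_maxchar_surrogates(), not its C implementation. Valid high/low pairs are combined, while lone surrogates are counted rather than rejected:

```python
def scan_utf16_units(units):
    """Return (maxchar, valid_pairs, lone_surrogates) for 16-bit code units.

    Hypothetical helper: a valid high+low pair counts as one supplementary
    character; any other surrogate is counted as lone instead of rejected.
    """
    maxchar = valid_pairs = lone = 0
    i, n = 0, len(units)
    while i < n:
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Combine the pair into a single code point in U+10000..U+10FFFF.
            ch = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            valid_pairs += 1
            i += 2
        else:
            ch = unit
            if 0xD800 <= unit <= 0xDFFF:
                lone += 1  # unpaired high or low surrogate
            i += 1
        maxchar = max(maxchar, ch)
    return maxchar, valid_pairs, lone


print(scan_utf16_units([0xDBFF, 0xDFFF]))  # (1114111, 1, 0)
print(scan_utf16_units([0xDFFF, 0x61]))    # (57343, 0, 1)
```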

--




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-03-05 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 15190138d3f3 by Victor Stinner in branch 'default':
Issue #17223: Add another test to check that _PyUnicode_Ready() rejects
http://hg.python.org/cpython/rev/15190138d3f3

New changeset b9f7b1bf36aa by Victor Stinner in branch 'default':
Issue #17223: Fix PyUnicode_FromUnicode() on Windows (16-bit wchar_t type)
http://hg.python.org/cpython/rev/b9f7b1bf36aa

--




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-03-05 Thread STINNER Victor

STINNER Victor added the comment:

Changeset b9f7b1bf36aa should fix the test on Windows. My Windows VM is dead, so 
I cannot test it myself. If the fix works, it must be backported to Python 3.3.

--




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-03-05 Thread STINNER Victor

STINNER Victor added the comment:

> Changeset b9f7b1bf36aa should fix the test on Windows.

Oh, many tests are failing because of this change, so I reverted it.

======================================================================
ERROR: test_surrogatepass_handler (test.test_codecs.CP65001Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_codecs.py", line 799, in test_surrogatepass_handler
    "abc\ud800def")
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\unittest\case.py", line 642, in assertEqual
    assertion_func(first, second, msg=msg)
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\unittest\case.py", line 1007, in assertMultiLineEqual
    if first != second:
ValueError: illegal UTF-16 surrogate


======================================================================
ERROR: test_unicode (test.test_array.ArrayReconstructorTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_array.py", line 183, in test_unicode
    msg="{0!r} != {1!r}; testcase={2!r}".format(a, b, testcase))
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\unittest\case.py", line 642, in assertEqual
    assertion_func(first, second, msg=msg)
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\unittest\case.py", line 632, in _baseAssertEqual
    if not first == second:
ValueError: illegal UTF-16 surrogate

======================================================================
ERROR: test_byteswap (test.test_array.UnicodeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_array.py", line 239, in test_byteswap
    self.assertNotEqual(a, b)
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\unittest\case.py", line 648, in assertNotEqual
    if not first != second:
ValueError: illegal UTF-16 surrogate

======================================================================
FAIL: test_issue17223 (test.test_array.UnicodeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_array.py", line 1082, in test_issue17223
    self.assertRaises(ValueError, a.tounicode)
AssertionError: ValueError not raised by tounicode
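For context, Python str objects may legitimately hold lone surrogates, and the codec tests exercise exactly that through the surrogatepass error handler. A small sketch using UTF-8 (the failing test above uses the Windows-only cp65001 codec instead); `roundtrip_surrogatepass` is a hypothetical helper:

```python
def roundtrip_surrogatepass(text: str) -> tuple[bytes, str]:
    """Encode and decode text as UTF-8, letting lone surrogates through."""
    encoded = text.encode("utf-8", "surrogatepass")
    return encoded, encoded.decode("utf-8", "surrogatepass")


s = "abc\ud800def"                 # contains a lone high surrogate
encoded, decoded = roundtrip_surrogatepass(s)
print(encoded)                     # b'abc\xed\xa0\x80def'
print(decoded == s)                # True
```

A strict s.encode("utf-8") would raise UnicodeEncodeError here, which is why these tests need a str that tolerates lone surrogates in the first place.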

--




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-26 Thread Richard Oudkerk

Richard Oudkerk added the comment:

The new test seems to be reliably failing on Windows:

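======================================================================
FAIL: test_issue17223 (__main__.UnicodeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Repos\cpython-dirty\lib\test\test_array.py", line 1075, in test_issue17223
    self.assertRaises(ValueError, a.tounicode)
AssertionError: ValueError not raised by tounicode

----------------------------------------------------------------------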

--
nosy: +sbt
status: closed -> open




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-26 Thread STINNER Victor

STINNER Victor added the comment:

On Windows, test_array can use an invalid surrogate pair to test this
issue: b'\xff\xdf\x61\x00' for example.
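Read as little-endian 16-bit units (an assumption matching x86 Windows), the suggested bytes decode to a lone low surrogate followed by 'a'. A sketch with the standard struct module and the UTF-16 codec:

```python
import struct

raw = b"\xff\xdf\x61\x00"
units = struct.unpack("<2H", raw)       # two little-endian 16-bit units
print([hex(u) for u in units])          # ['0xdfff', '0x61']

# 0xDFFF is a low surrogate with no preceding high surrogate, so a strict
# UTF-16 decode rejects it...
try:
    raw.decode("utf-16-le")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

# ...while surrogatepass keeps it as a lone surrogate in the str.
print(ascii(raw.decode("utf-16-le", "surrogatepass")))  # '\udfffa'
```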

I don't know how to easily check the size of wchar_t.
ctypes.sizeof(ctypes.c_wchar) can be used, but ctypes is not always
available. sys.maxunicode is always 0x10ffff since Python 3.3.
PyUnicode_GetMax() is not accessible from Python.
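One possible workaround, offered only as a sketch: since Python 3.3 the 'u' type code stores Py_UNICODE, i.e. wchar_t, so the item size of an empty 'u' array should reflect sizeof(wchar_t) without needing ctypes. The helper names below are hypothetical, and because newer Python versions may emit a DeprecationWarning for 'u', the sketch silences it locally:

```python
import array
import warnings


def wchar_size() -> int:
    """Return sizeof(wchar_t) as seen by the array module's 'u' type code."""
    with warnings.catch_warnings():
        # 'u' is deprecated; silence the warning only inside this block.
        warnings.simplefilter("ignore", DeprecationWarning)
        return array.array("u").itemsize


def wchar_size_ctypes():
    """Cross-check via ctypes when it is available, else return None."""
    try:
        import ctypes
    except ImportError:
        return None
    return ctypes.sizeof(ctypes.c_wchar)


print(wchar_size())  # typically 2 on Windows, 4 on Linux and macOS
```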

--




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-26 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 66e9d0185b0f by Victor Stinner in branch '3.3':
Issue #17223: Fix test_array on Windows (16-bit wchar_t/Py_UNICODE)
http://hg.python.org/cpython/rev/66e9d0185b0f

New changeset 5aaf6bc1d502 by Victor Stinner in branch 'default':
(Merge 3.3) Issue #17223: Fix test_array on Windows (16-bit wchar_t/Py_UNICODE)
http://hg.python.org/cpython/rev/5aaf6bc1d502

--




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-26 Thread STINNER Victor

STINNER Victor added the comment:

Adding new UCS1, UCS2 and UCS4 formats to the array module was discussed, but 
nobody implemented the idea. The u format is kept unchanged (it uses 
Py_UNICODE / wchar_t) for backward compatibility with Python 3.2.

See also issue #13072 for this discussion.
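To illustrate the fixed-width idea only (this is not the proposed array format, which was never implemented), a sequence of UCS-4 code points can be converted with chr(), which already rejects anything above U+10FFFF. `ucs4_to_str` is a hypothetical helper:

```python
def ucs4_to_str(code_points) -> str:
    """Build a str from UCS-4 code points; chr() raises ValueError if invalid."""
    return "".join(map(chr, code_points))


print(ascii(ucs4_to_str([0x61, 0x10FFFF])))   # 'a\U0010ffff'
try:
    ucs4_to_str([0x66647361])
except ValueError as exc:
    print("rejected:", exc)
```

Unlike the u format, the width here does not depend on the platform's wchar_t.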

--




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-25 Thread STINNER Victor

STINNER Victor added the comment:

If the problem is that PyUnicode_FromUnicode() rejects characters outside the 
range [U+0000; U+10ffff], it would be better to use the byte string '\xff' * 
sizeof_PY_UNICODE. U+66647361 may become valid in a future version of Unicode, 
but I don't think that U+FFFFFFFF would ever become valid.

Since Python 3.3, sizeof_PY_UNICODE is ctypes.sizeof(ctypes.c_wchar). '\xff' * 4 
works on any platform.
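The arithmetic behind both values can be checked with struct, assuming a little-endian platform with a 4-byte wchar_t (which is how b'asdf' produced U+66647361 in the original report):

```python
import struct
import sys

for raw in (b"asdf", b"\xff" * 4):
    # Interpret the 4 bytes as one little-endian 32-bit code unit.
    (code_point,) = struct.unpack("<I", raw)
    print(f"{raw!r} -> U+{code_point:08X}, valid: {code_point <= sys.maxunicode}")

# b'asdf' -> U+66647361, valid: False
# b'\xff\xff\xff\xff' -> U+FFFFFFFF, valid: False
```

sys.maxunicode is 0x10FFFF on every Python 3.3+ build, so both values fall outside the valid range.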

--




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-25 Thread Ezio Melotti

Ezio Melotti added the comment:

> If the problem is that PyUnicode_FromUnicode() rejects character
> outside range [U+0000; U+10ffff],

But this used to return two valid characters:
>>> str(array('u', b'asdf'))
array('u', '獡晤')

so I think it still should -- unless the operation was already nonsensical 
and/or there's no way to do the same thing on 3.3+ due to the change introduced 
by PEP 393.

> it would be better to use the byte string '\xff' * sizeof_PY_UNICODE. 

What for?

> U+66647361 may become valid in a future version of Unicode,

It won't.

--




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-25 Thread Ezio Melotti

Ezio Melotti added the comment:

We discussed this on IRC, and apparently the seemingly valid result I got on 
3.2 was because I had a narrow build.  On a wide 3.2 build I get:
>>> str(array('u', b'asdf'))
array('u', '\\U66647361')

Since 3.3+ behaves like a wide build and since \U66647361 is not valid, I now 
agree that raising an error is the right thing to do.

If possible, even 3.2 should raise an error, rather than returning an invalid 
codepoint.
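The narrow/wide difference can be reproduced without a 3.2 build by decoding the same four bytes with an explicit code-unit width. This sketch assumes little-endian byte order, and the codecs merely stand in for what a 2-byte versus a 4-byte Py_UNICODE would see:

```python
raw = b"asdf"

# Narrow build: two 16-bit units, both ordinary BMP characters.
narrow = raw.decode("utf-16-le")
print(ascii(narrow))            # '\u7361\u6664'

# Wide build: one 32-bit unit, 0x66647361, outside the Unicode range.
try:
    raw.decode("utf-32-le")
except UnicodeDecodeError as exc:
    print("wide:", exc.reason)
```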

--




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-25 Thread Roundup Robot

Roundup Robot added the comment:

New changeset c354afedb866 by Victor Stinner in branch '3.3':
Issue #17223: Fix PyUnicode_FromUnicode() for string of 1 character outside
http://hg.python.org/cpython/rev/c354afedb866

New changeset a4295ab52427 by Victor Stinner in branch 'default':
(Merge 3.3) Issue #17223: Fix PyUnicode_FromUnicode() for string of 1 character
http://hg.python.org/cpython/rev/a4295ab52427

--
nosy: +python-dev




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-25 Thread STINNER Victor

STINNER Victor added the comment:

> I think this should be updated to work with the PEP 393 implementation, 
> rather than raising an error.

Adding new UCS1, UCS2 and UCS4 formats to the array module was discussed, but 
nobody implemented the idea. The u format is kept unchanged (it uses 
Py_UNICODE / wchar_t) for backward compatibility with Python 3.2.

--

I found another bug while trying Manuel's patch :-/ It's now fixed.

@Manuel: Thanks for your patch!

--
resolution:  -> fixed
status: open -> closed




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-25 Thread Roundup Robot

Roundup Robot added the comment:

New changeset ebeed44702ec by Victor Stinner in branch '3.3':
Issue #17223: array module: Fix a crasher when converting an array containing
http://hg.python.org/cpython/rev/ebeed44702ec

New changeset 381de621ff6a by Victor Stinner in branch 'default':
(Merge 3.3) Issue #17223: array module: Fix a crasher when converting an array
http://hg.python.org/cpython/rev/381de621ff6a

--




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-24 Thread Ezio Melotti

Ezio Melotti added the comment:

Even if deprecated, it should continue to work (if possible).

--




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-23 Thread Manuel Jacob

Manuel Jacob added the comment:

http://docs.python.org/3/library/array.html states that the 'u' type code is 
deprecated together with the rest of the Py_UNICODE API (which includes 
PyUnicode_FromUnicode), so keeping this code based on PyUnicode_FromUnicode 
should be fine.

--




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-22 Thread Ezio Melotti

Ezio Melotti added the comment:

Shouldn't this still work on 3.3/3.4?

In Modules/arraymodule.c:1562, in the array_tounicode function, there is:

    return PyUnicode_FromUnicode((Py_UNICODE *) self->ob_item, Py_SIZE(self));

I think this should be updated to work with the PEP 393 implementation, rather 
than raising an error.
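One way to picture such a conversion, purely as a hypothetical Python model of array_tounicode() and not CPython's actual code: decode the raw item bytes with the codec that matches the wchar_t width, letting lone surrogates through (as PyUnicode_FromUnicode() does) while still rejecting code points above U+10FFFF. `tounicode_sketch` and its parameters are illustrative names:

```python
import sys


def tounicode_sketch(raw: bytes, itemsize: int, byteorder: str = sys.byteorder) -> str:
    """Model of a 'u' array -> str conversion for a given wchar_t width."""
    suffix = "le" if byteorder == "little" else "be"
    if itemsize == 2:
        codec = f"utf-16-{suffix}"   # valid surrogate pairs are combined
    elif itemsize == 4:
        codec = f"utf-32-{suffix}"   # one code unit per code point
    else:
        raise ValueError(f"unsupported wchar_t size: {itemsize}")
    # surrogatepass keeps lone surrogates; values above U+10FFFF still raise
    # UnicodeDecodeError, which is a subclass of ValueError.
    return raw.decode(codec, "surrogatepass")


print(ascii(tounicode_sketch(b"\xff\xdf\x61\x00", 2, "little")))  # '\udfffa'
```

Raising ValueError for out-of-range values, rather than crashing, is the behavior the thread converged on for 32-bit wchar_t.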

--
versions:  -Python 2.7, Python 3.2




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-18 Thread Jesús Cea Avión

Changes by Jesús Cea Avión j...@jcea.es:


--
nosy: +jcea




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-17 Thread Manuel Jacob

New submission from Manuel Jacob:

>>> from array import array
>>> str(array('u', b'asdf'))
[1]    19411 segmentation fault (core dumped)  python

This error occurs with Python 3.3 and hg tip but not with Python 3.2.

--
components: Library (Lib), Unicode
messages: 182291
nosy: ezio.melotti, mjacob
priority: normal
severity: normal
status: open
title: Initializing array.array with unicode type code and buffer segfaults
type: crash
versions: Python 3.3, Python 3.4




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-17 Thread Manuel Jacob

Manuel Jacob added the comment:

The attached patch fixes the crash.

Output:
>>> from array import array
>>> str(array('u', b'asdf'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: character U+66647361 is not in range [U+0000; U+10ffff]

--
keywords: +patch
Added file: http://bugs.python.org/file29109/issue17223.diff




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-17 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +haypo
stage:  -> patch review




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-17 Thread Ezio Melotti

Ezio Melotti added the comment:

Thanks for the report and the patch.
Could you also include a test for this?

--




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-17 Thread Manuel Jacob

Manuel Jacob added the comment:

I've attached a new patch with a test that segfaults on Python 3.3 and passes 
on hg tip with the patch applied.
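A regression test along those lines might look like the sketch below. It is a hypothetical reconstruction, not the test in issue17223_with_test.diff or the one later committed; it targets the 32-bit wchar_t case (U+FFFFFFFF built from b'\xff' * 4) and silences the DeprecationWarning that newer Pythons may emit for 'u':

```python
import array
import unittest
import warnings


class Issue17223Sketch(unittest.TestCase):
    """Hypothetical regression test for converting invalid 'u' arrays."""

    def make_u_array(self, raw):
        # The 'u' type code is deprecated; keep the warning out of the output.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", DeprecationWarning)
            return array.array("u", raw)

    def test_invalid_code_point_raises(self):
        if self.make_u_array(b"").itemsize != 4:
            self.skipTest("specific to 32-bit wchar_t")
        a = self.make_u_array(b"\xff" * 4)  # one item: U+FFFFFFFF
        # Before the fix this crashed; now it should raise ValueError.
        self.assertRaises(ValueError, a.tounicode)
        self.assertRaises(ValueError, str, a)
```

Run it with any unittest runner; on a platform with a 16-bit wchar_t the test is skipped.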

--
Added file: http://bugs.python.org/file29110/issue17223_with_test.diff




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-17 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@gmail.com:


--
versions: +Python 2.7, Python 3.2




[issue17223] Initializing array.array with unicode type code and buffer segfaults

2013-02-17 Thread STINNER Victor

STINNER Victor added the comment:

issue17223_with_test.diff looks good to me (we could just drop the {...} around 
return NULL).

--
