[issue15625] Support u and w codes in memoryview

2016-05-25 Thread Марк Коренберг

Марк Коренберг added the comment:

I triggered the same bug.

I want to slice a big Unicode string efficiently, so I decided to use memoryview
to avoid copying memory.

In [33]: a = array.array('u', 'превед')
In [34]: m = memoryview(a)
In [35]: m[2:]
Out[35]: 
In [36]: m[0]
...
NotImplementedError: memoryview: format w not supported


1. Why does the error mention format 'w' when I asked for 'u'?
2. Format 'w' is not listed in https://docs.python.org/3.5/library/array.html
3. What is the alternative for fast slicing, like memoryview(bytearray(b'test')),
but for Unicode?
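For question 3, one workaround that does not depend on the 'u' typecode (a sketch, not something proposed in this thread) is to encode the text once into a fixed-width UTF-32 buffer in native byte order and slice a memoryview cast to 4-byte unsigned integers ('I'). Slicing the view copies nothing; only decoding a slice back into a str copies. The helper names code_point_view and view_to_str are hypothetical, and the sketch assumes a 4-byte C unsigned int, which holds on mainstream platforms.

```python
import sys

# Native-endian UTF-32, so each 'I' item reads back as one code point.
CODEC = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"


def code_point_view(text: str) -> memoryview:
    """Encode text once; return a view whose items are code points."""
    view = memoryview(text.encode(CODEC)).cast("I")
    if view.itemsize != 4:
        raise RuntimeError("this sketch assumes a 4-byte unsigned int")
    return view


def view_to_str(view: memoryview) -> str:
    """Decode a (possibly sliced) code-point view back into a str (copies)."""
    return view.tobytes().decode(CODEC)


m = code_point_view("превед")
tail = m[2:]              # zero-copy slice sharing m's buffer
first = chr(m[0])         # "п": items are integer code points
rest = view_to_str(tail)  # "евед"
```

The trade-off is memory: every code point takes four bytes, however compactly CPython stores the original str.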

--
nosy: +mmarkk

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15625] Support u and w codes in memoryview

2015-04-15 Thread Steve Dower

Steve Dower added the comment:

Closing sounds good to me

--
nosy: +steve.dower
resolution:  -> out of date
status: open -> closed




[issue15625] Support u and w codes in memoryview

2015-04-15 Thread Arnon Yaari

Arnon Yaari added the comment:

The documentation already specifies that 'u' is deprecated and doesn't mention 
the 'w' code. I think we can close this issue.

--
nosy: +wiggin15




[issue15625] Support u and w codes in memoryview

2012-10-03 Thread Jesús Cea Avión

Changes by Jesús Cea Avión j...@jcea.es:


--
versions: +Python 3.4 -Python 3.3




[issue15625] Support u and w codes in memoryview

2012-08-16 Thread Martin v . Löwis

Martin v. Löwis added the comment:

> My current inclination is still to apply Victor's patch from #13072
> (which changes array to export the appropriate integer typecodes for
> 'u' arrays) and otherwise punt on this for 3.3 and try to sort out
> the mess for 3.4.

I think this would be the worst choice. It would mean that we change
the format for exported array.arrays now for 3.3, and then change it
in 3.4 again. So anybody who cares about this would have to deal
with three different behaviors.

Note that the array module had been using 'u' and 'w' essentially
forever (i.e. since 3.0).

> For 3.4, I'm inclined to favour Stefan's proposal of C, U, W mapping
> to multi-point sequences of UCS-1, UCS-2, UCS-4 code points (with
> corresponding typecodes in the array module).

Fine with me in principle, although I see a problem if NumPy uses
'U' for UCS-4 while CPython declares it to be UCS-2. I also think that
Travis' explicit agreement must be sought.




[issue15625] Support u and w codes in memoryview

2012-08-16 Thread Nick Coghlan

Nick Coghlan added the comment:

I wouldn't change the export formats used for the 'u' typecode at all in 3.4 - 
I'd add new typecodes to array that match any new struct format characters and 
are exported accordingly. 'u' would *never* become a formally defined struct 
character, instead lingering in the array module as a legacy of the narrow/wide 
build distinction.

And good point that U would need to match UCS-4 to be consistent with NumPy. 
Perhaps we can just add 'U' in 3.4 and forget about UCS-2 entirely?




[issue15625] Support u and w codes in memoryview

2012-08-16 Thread Martin v . Löwis

Martin v. Löwis added the comment:

> I wouldn't change the export formats used for the 'u' typecode at
> all in 3.4 - I'd add new typecodes to array that match any new
> struct format characters and are exported accordingly. 'u' would
> *never* become a formally defined struct character, instead
> lingering in the array module as a legacy of the narrow/wide build
> distinction.

I think it is a desirable property that for an array A and an index
I, that A[I] == memoryview(A)[I]. Exporting the elements of an 'u'
array as integers would break that property.
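The invariant can be checked mechanically. In the sketch below (index_invariant_holds is a hypothetical helper), a NotImplementedError from memoryview indexing, which CPython raises for export formats it cannot unpack, such as the 'w' in the 2016 report above, counts as a violation.

```python
import array


def index_invariant_holds(a: array.array) -> bool:
    """Return True if a[i] == memoryview(a)[i] for every index i."""
    view = memoryview(a)
    try:
        return all(a[i] == view[i] for i in range(len(a)))
    except NotImplementedError:
        # memoryview cannot unpack this export format at all
        return False
    finally:
        view.release()  # drop the export so the array can be resized again


print(index_invariant_holds(array.array("i", [1, 2, 3])))   # True
print(index_invariant_holds(array.array("d", [0.5, 2.0])))  # True
```

An integer export of 'u' arrays would also fail this check, since a[i] would be a str while view[i] would be an int.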

So if we do want to support Unicode arrays (which some people apparently
want to see; I haven't heard anybody saying they actually *need* such
a type), their buffer format should be Unicode in some form, not a
number.

I would be fine with deprecating the 'u' type arrays, acknowledging
that the Py_UNICODE element type is even more useless now than before.
If that is done, there is no point in fixing anything about it. If
it exports using the 'u' and 'w' codes - fine. If then memoryview
doesn't work properly - fine; this is a deprecated feature.

It should be fixed only if we want to support it properly (which I
believe this patch would do).




[issue15625] Support u and w codes in memoryview

2012-08-16 Thread Nick Coghlan

Nick Coghlan added the comment:

I guess the main alternative to deprecation that preserves the invariant you 
describe would be to propagate the u == Py_UNICODE definition to memoryview. 
Since we're trying to phase out Py_UNICODE, deprecation seems the more sensible 
course.

Perhaps just a documented deprecation for now, like the rest of the Py_UNICODE 
based APIs?




[issue15625] Support u and w codes in memoryview

2012-08-16 Thread Stefan Krah

Stefan Krah added the comment:

Martin v. Loewis rep...@bugs.python.org wrote:
> I would be fine with deprecating the 'u' type arrays, acknowledging
> that the Py_UNICODE element type is even more useless now than before.
> If that is done, there is no point in fixing anything about it. If
> it exports using the 'u' and 'w' codes - fine. If then memoryview
> doesn't work properly - fine; this is a deprecated feature.

From the perspective of memoryview backwards compatibility, deprecation is
fine. In 3.2, memoryview could really only handle one-dimensional buffers of
unsigned bytes:

>>> import array
>>> a = array.array('u', "ABC")
>>> x = memoryview(a)
>>> a[0] == x[0]
False
>>> a[0]
'A'

# Indexing returns bytes instead of str:
>>> x[0]
b'A\x00'

# Index assignment attempts to do slice assignment:
>>> x[0] = 'Z'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface

I'm +1 for deprecating 'u' and 'w' in the array module, accepting that memoryview
cannot handle 'u' and 'w', and fixing the situation properly in 3.4. I agree that
the latter would require people to come up with actual use cases.




[issue15625] Support u and w codes in memoryview

2012-08-16 Thread Stefan Krah

Stefan Krah added the comment:

Well, apparently people do use 'u', see #15035.

--
nosy: +ronaldoussoren




[issue15625] Support u and w codes in memoryview

2012-08-16 Thread Martin v . Löwis

Martin v. Löwis added the comment:

#15035 indicates that there is a need for UCS-2 arrays. Using 'u' arrays was
technically incorrect there, since 'u' is based on Py_UNICODE, whereas the API in
question uses UniChar (which apparently is a two-byte type).
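In the meantime, two-byte code units can be carried in the existing 'H' (unsigned short) typecode, which memoryview handles fully. A sketch under stated assumptions: the consumer (for example an API taking UniChar) expects native-endian UTF-16 code units, unsigned short is 2 bytes, and to_code_units / from_code_units are hypothetical names. Characters outside the BMP become surrogate pairs, so the result is UTF-16 rather than strict UCS-2.

```python
import array
import sys

# Native-endian UTF-16, so each 'H' item is one 16-bit code unit.
UTF16 = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"


def to_code_units(text: str) -> array.array:
    """Return an array('H') of UTF-16 code units for text."""
    units = array.array("H")
    units.frombytes(text.encode(UTF16))
    return units


def from_code_units(units: array.array) -> str:
    """Decode an array('H') of UTF-16 code units back into a str."""
    return units.tobytes().decode(UTF16)


units = to_code_units("AB\u00e9")
view = memoryview(units)
third = view[2]  # 233 (0xE9): 'H' items index as plain ints
```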




[issue15625] Support u and w codes in memoryview

2012-08-16 Thread Stefan Krah

Stefan Krah added the comment:

Martin v. Löwis rep...@bugs.python.org wrote:
> #15035 indicates that there is a need for UCS-2 arrays. Using 'u' arrays was
> technically incorrect there, since 'u' is based on Py_UNICODE, whereas the API in
> question uses UniChar (which apparently is a two-byte type).

Right, thanks for clearing that up. Then #15035 would indeed support deprecating
'u' and 'w' and moving on to UCS2 and UCS4 arrays.




[issue15625] Support u and w codes in memoryview

2012-08-16 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:


--
nosy: +Arfrever




[issue15625] Support u and w codes in memoryview

2012-08-15 Thread Stefan Krah

Stefan Krah added the comment:

Nick's comment in msg167963 got me thinking. Indeed, in NumPy the 'U'
specifier is similar to the struct module's 's' format code, only for
UCS4. So I'm questioning whether the current semantics of 'u' and 'w'
used by array.array were ever intended by the PEP authors:


>>> import numpy
>>> nd = numpy.array(["A", "B"], dtype='U')
>>> nd
array(['A', 'B'],
      dtype='<U1')
>>> nd.tostring()
b'A\x00\x00\x00B\x00\x00\x00'

>>> nd = numpy.array(["ABC", "D"], dtype='U')
>>> nd
array(['ABC', 'D'],
      dtype='<U3')
>>> nd.tostring()
b'A\x00\x00\x00B\x00\x00\x00C\x00\x00\x00D\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'



Internally, in NumPy 'U' is always UCS4, and the data type is a fixed
length string that has the length of the longest initializer element.
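The NumPy layout described above can be reproduced with the struct module's 's' code, which makes the contrast with array.array concrete: each element is a zero-padded field of n UCS-4 code points, with n taken from the longest element. A sketch; pack_ucs4_fixed and unpack_ucs4_fixed are hypothetical helpers, and little-endian UTF-32 is assumed to match the tostring() output shown.

```python
import struct


def pack_ucs4_fixed(words, width=None):
    """Pack strings into zero-padded fields of `width` UCS-4 code points."""
    if width is None:
        width = max(map(len, words), default=1)  # like NumPy's 'U<n>'
    fmt = f"{4 * width}s" * len(words)  # 's' pads short values with NULs
    return struct.pack(fmt, *(w.encode("utf-32-le") for w in words))


def unpack_ucs4_fixed(data, width):
    """Inverse of pack_ucs4_fixed; strips the NUL padding from each field."""
    count = len(data) // (4 * width)
    fields = struct.unpack(f"{4 * width}s" * count, data)
    return [f.decode("utf-32-le").rstrip("\x00") for f in fields]


packed = pack_ucs4_fixed(["ABC", "D"])  # width 3, as in dtype '<U3'
```

Because of the rstrip, this sketch cannot tell trailing NUL characters apart from padding.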


NumPy's use of 'U' seems vastly more useful for arrays than the behavior
of array.array:

>>> array.array('u', ['A', 'B'])
array('u', 'AB')
>>> array.array('u', ['ABC', 'D'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: array item must be unicode character


In NumPy, arrays of words are possible; with array.array they are not.

An additional thought: the convention in the struct module is to use
uppercase for unsigned types. So one possibility would be to use
'C', 'U' and 'W', where '3C' would denote the same as '3s' except
with UCS-1 code points instead of bytes.




[issue15625] Support u and w codes in memoryview

2012-08-15 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Travis: can you please comment on what the intended semantics of the 'u' and 
'w' specifiers is, in PEP 3118? More specifically:

- an array/memoryview with format 'u' can support exactly one-character values 
(i.e. unicode strings of length 1): true or false?
- in a struct, an element of type 'u' will use up two bytes exactly (ignoring 
padding): true or false?

--
nosy: +teoliphant




[issue15625] Support u and w codes in memoryview

2012-08-15 Thread Nick Coghlan

Nick Coghlan added the comment:

I admit that the main thing that bothers me about the proposal in PEP 3118 is
the inconsistency between 'c' -> bytes and 'u', 'w' -> str.

This was less of an issue in 2.x (which was the main frame of reference when 
the PEP was written), with implicit str/unicode interoperability, but seems 
quite jarring in the 3.x world.

Status quo:
struct module: 'c' = individual bytes, 's' = multi-byte sequence
array module: 'u' typecode may be either 2 bytes or 4 bytes (Py_UNICODE) (the 
addition of the 'w' typecode has been reverted)
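The struct half of this status quo is easy to confirm: 'c' unpacks to a bytes object of length 1, 's' to a longer bytes object, and neither 'u' nor 'w' is a struct format character at all. A quick check (is_struct_code is a hypothetical helper):

```python
import struct


def is_struct_code(fmt: str) -> bool:
    """Return True if the struct module accepts fmt as a format string."""
    try:
        struct.calcsize(fmt)
    except struct.error:
        return False
    return True


one = struct.unpack("c", b"A")       # (b'A',): bytes, never str
three = struct.unpack("3s", b"ABC")  # (b'ABC',)
```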

My current inclination is still to apply Victor's patch from #13072 (which 
changes array to export the appropriate integer typecodes for 'u' arrays) and 
otherwise punt on this for 3.3 and try to sort out the mess for 3.4.

For 3.4, I'm inclined to favour Stefan's proposal of C, U, W mapping to 
multi-point sequences of UCS-1, UCS-2, UCS-4 code points (with corresponding 
typecodes in the array module).

Support for lowercase 'u' would then never become an official part of the 
buffer API, existing only as an array typecode.
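To make the proposed semantics concrete, the sketch below encodes a hypothetical 'nC', 'nU' or 'nW' field using the widths in the proposal above (UCS-1, UCS-2, UCS-4). None of these codes exist in struct or array; the PROPOSED table, encode_field, and the little-endian byte order are assumptions for illustration. UCS-2 cannot represent code points above U+FFFF, so the sketch rejects them rather than emitting UTF-16 surrogate pairs.

```python
# Hypothetical mapping: format letter -> (bytes per code point, codec).
PROPOSED = {
    "C": (1, "latin-1"),    # UCS-1: code points 0..0xFF
    "U": (2, "utf-16-le"),  # UCS-2: BMP only (enforced below)
    "W": (4, "utf-32-le"),  # UCS-4: any code point
}


def encode_field(code: str, count: int, text: str) -> bytes:
    """Encode text as a hypothetical '<count><code>' field, NUL-padded."""
    width, codec = PROPOSED[code]
    if len(text) > count:
        raise ValueError(f"{text!r} is longer than {count} code points")
    if max(map(ord, text), default=0) >= 1 << (8 * width):
        raise ValueError(f"{text!r} does not fit in UCS-{width}")
    return text.encode(codec).ljust(width * count, b"\x00")


field = encode_field("C", 3, "AB")  # b'AB\x00', like struct's '3s'
```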




[issue15625] Support u and w codes in memoryview

2012-08-11 Thread Martin v . Löwis

New submission from Martin v. Löwis:

Currently, the following test case fails:

>>> import array
>>> a = array.array('u', 'foo')
>>> memoryview(a) == memoryview(a)
False

This is because the memoryview object doesn't support the u and w codes, as it 
should per PEP 3118. This patch fixes it.
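Since Python 3.3 the memoryview documentation states that if either format is not supported by the struct module, the views always compare unequal, even when format and contents are identical; that is why the comparison above is False. Where raw-byte identity is what matters, one fallback is to compare format, shape and bytes explicitly. A sketch (same_buffer_bytes is a hypothetical helper); this is byte equality, not value equality, so NaNs with identical bits compare equal while 0.0 and -0.0 compare unequal.

```python
import array


def same_buffer_bytes(x, y) -> bool:
    """Compare two buffer exporters by format, shape and raw bytes."""
    with memoryview(x) as mx, memoryview(y) as my:
        return (mx.format == my.format
                and mx.shape == my.shape
                and mx.tobytes() == my.tobytes())


a = array.array("d", [1.0, float("nan")])
print(memoryview(a) == memoryview(a))  # False: NaN != NaN element-wise
print(same_buffer_bytes(a, a))         # True: identical bytes
```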

--
files: uwcodes.diff
keywords: patch
messages: 168009
nosy: loewis, ncoghlan, skrah
priority: normal
severity: normal
status: open
title: Support u and w codes in memoryview
Added file: http://bugs.python.org/file26769/uwcodes.diff




[issue15625] Support u and w codes in memoryview

2012-08-11 Thread Martin v . Löwis

Changes by Martin v. Löwis mar...@v.loewis.de:


--
versions: +Python 3.3
