New submission from Martin Panter:

There are efforts under way to document and implement support for “bytes-like” 
objects in more APIs, such as the “io” module (Issue 20699) and http.client 
(Issue 23740). The glossary definition is currently “An object that supports the 
Buffer Protocol, like bytes, bytearray or memoryview.” This was originally 
added for Issue 16518. However, after reading Issue 23688, I realized that it 
should probably not mean absolutely _any_ object supporting the buffer 
protocol. For instance:

>>> from sys import stdout
>>> reverse_view = memoryview(b"123")[::-1]
>>> stdout.buffer.write(reverse_view)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BufferError: memoryview: underlying buffer is not C-contiguous

I think the definition should at least be tightened to only objects with a 
contiguous buffer, and “contiguous” should be defined (probably in the linked C 
API page or the memoryview.contiguous flag definition, not the glossary). So 
far, my understanding is these are contiguous:

* A zero-dimensional object, such as a ctypes object
* A multi-dimensional array with items stored contiguously in order of 
increasing indexes, i.e. a_2,2 is stored somewhere after both a_1,2 and a_2,1, 
and all the strides are positive.

and these are not contiguous:

* memoryview(contiguous)[::2], because there are memory gaps between the items
* memoryview(contiguous)[::-1], despite there being neither gaps nor 
overlapping items
* Views that set the “suboffsets” field (i.e. include pointers to further 
memory)
* Views where different array items overlap each other (e.g. 0 in view.strides)
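
A few of the cases above can be checked from Python itself. This is only a sketch using the memoryview attributes "strides", "contiguous" and "c_contiguous"; the variable names are made up for illustration, and it does not cover the suboffsets or overlapping-item cases.

```python
import ctypes

# A plain view over a bytes object is contiguous.
data = memoryview(b"123456")
print(data.contiguous, data.c_contiguous)

# Every second item: there are memory gaps between the items.
stepped = data[::2]
print(stepped.strides, stepped.contiguous)

# Reversed view: no gaps and no overlap, but the stride is negative.
reverse_view = data[::-1]
print(reverse_view.strides, reverse_view.contiguous)

# A ctypes scalar exports a zero-dimensional buffer.
scalar = memoryview(ctypes.c_int(7))
print(scalar.ndim, scalar.contiguous)
```

Both sliced views report contiguous == False, while the zero-dimensional ctypes view reports True.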

Perhaps the bytes-like definition should be tightened further, to match the above 
error message, to only “C-contiguous” buffers. I understand that C-contiguous 
means the strides tuple has to be in non-strict decreasing order, e.g. for 2 × 
1 × 3 arrays, strides == (3, 3, 1) is C-contiguous, but strides == (1, 3, 3) is 
not. This also needs documenting.
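
As a rough illustration of the strides example above, the following sketch builds a 2 × 1 × 3 view with memoryview.cast() and compares its strides with the row-major values. The helper c_contiguous_strides() is hypothetical, written only for this example, and is not part of any Python API.

```python
def c_contiguous_strides(shape, itemsize=1):
    """Hypothetical helper: strides of a row-major (C order) array."""
    strides = []
    step = itemsize
    # The last dimension varies fastest, so walk the shape backwards.
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))

# Cast six bytes to a 2 x 1 x 3 array of unsigned bytes.
view = memoryview(bytes(6)).cast("B", [2, 1, 3])

print(view.strides)                     # (3, 3, 1)
print(c_contiguous_strides(view.shape)) # (3, 3, 1)
print(view.c_contiguous)                # True
```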

I’m not so sure about these, but the definition could be tightened even further:

* Require memoryview(x).cast("B") to be supported. Otherwise, native Python 
code would have to use workarounds like struct.pack_into() to write to the 
“bytes-like” object. See Issue 15944.
* Require len(view) == view.nbytes. This would help avoid a bug I have seen, 
where code naively calls len(data) to get the size in bytes, but the downside 
is that ctypes objects would no longer be considered bytes-like objects.
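
To show what the two possible requirements above would mean in practice, here is a small sketch using an array.array of C ints as the exporter. The choice of array.array is an assumption made for the example; the item size is platform dependent, so no concrete byte counts are shown.

```python
import array

values = array.array("i", [1, 2, 3])
view = memoryview(values)

# len() counts items, while nbytes counts bytes, so they differ
# whenever the item size is larger than one byte.
print(len(view), view.itemsize, view.nbytes)

# Casting to unsigned bytes gives a view where len() == nbytes.
flat = view.cast("B")
print(len(flat), flat.nbytes)
```

Code that needs the size in bytes can use view.nbytes, or work on the cast("B") view, rather than calling len() on the original object.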

----------
assignee: docs@python
components: Documentation
messages: 239097
nosy: docs@python, vadmium
priority: normal
severity: normal
status: open
title: Tighten definition of bytes-like objects

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23756>
_______________________________________