On Wed, Oct 12, 2016 at 12:08 AM, INADA Naoki <songofaca...@gmail.com> wrote:
>
> Now I'm sure about bytes.frombuffer() is worth enough.

I would like to revive this thread (taking the liberty of shortening the subject line).

The issue of how the bytes(x) constructor should behave when given objects of various types has come up recently in issue 29159 (Regression in bytes constructor). [1] The regression was introduced in issue 27704 (bytes(x) is slow when x is bytearray), [2] which attempted to speed up creating bytes and bytearray from byte-like objects.

I think the core problem is that the bytes(x) constructor tries to be a Jack of All Trades. Here is how it is documented in the docstring:

 |  Construct an immutable array of bytes from:
 |    - an iterable yielding integers in range(256)
 |    - a text string encoded using the specified encoding
 |    - any object implementing the buffer API.
 |    - an integer
 |

The reference manual, on the other hand, does not have this description in the bytes section, but it has a similar list in the bytearray section. [3]

"""
The optional source parameter can be used to initialize the array in a few different ways:

* If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().
* If it is an integer, the array will have that size and will be initialized with null bytes.
* If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.
* If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created.
"""

Note that the integer case is listed before the buffer interface. Neither document mentions the possibility that the source type has a __bytes__ method.

This ambiguity between integer-like and buffer-like sources causes a problem when a 3rd party type is both integer-like and buffer-like.

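The precedence of the integer interpretation can be reproduced without any third-party package. The following is only a minimal sketch: OneElement is a hypothetical class invented for illustration, and an iterable stands in for the buffer-like side to keep the example dependency-free. On the CPython versions I am aware of, the __index__ path is taken before the contents are looked at.

```python
class OneElement:
    """Hypothetical stand-in for a single-element array (not a real library type)."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Makes the object integer-like.
        return self.value

    def __iter__(self):
        # Makes the object usable as an iterable of ints in range(256).
        return iter([self.value])


# The constructor sees an integer-like object and allocates that many null bytes.
as_int_source = bytes(OneElement(2))

# Going through an explicit list removes the integer interpretation.
as_contents = bytes(list(OneElement(2)))

print(as_int_source)  # expected: b'\x00\x00'
print(as_contents)    # expected: b'\x02'
```
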
This is what happens with numpy arrays:

>>> bytes(numpy.array([2], 'i1'))
b'\x00\x00'
>>> bytes(numpy.array([2, 2], 'i1'))
b'\x02\x02'

For better or worse, single-element numpy arrays have a working __index__ method

>>> numpy.array([2], 'i1').__index__()
2

and are interpreted as integers by the bytes(x) constructor.

I propose the following:

1. For 3.6, restore and document the 3.5 behavior. Recommend that 3rd party types that are both integer-like and buffer-like implement their own __bytes__ method to resolve the bytes(x) ambiguity.

2. For 3.7, I would like to see a drastically simplified bytes(x):

2.1. Accept only objects with a __bytes__ method or a sequence of ints in range(256).

2.2. Expand the __bytes__ definition to accept optional encoding and errors parameters. Implement str.__bytes__(self, [encoding[, errors]]).

2.3. Implement new specialized bytes.fromsize and bytes.frombuffer constructors as per PEP 467 and Inada Naoki's proposals.

2.4. Implement a memoryview.__bytes__ method so that bytes(memoryview(x)) works as before.

2.5. Implement a fast bytearray.__bytes__ method.

3. Consider promoting __bytes__ to a tp_bytes type slot.

[1]: http://bugs.python.org/issue29159
[2]: http://bugs.python.org/issue27704
[3]: https://docs.python.org/3/library/functions.html#bytearray
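
P.S. Recommendation 1 above can be sketched as follows. This is an illustration under stated assumptions, not numpy's implementation: OneElementWithBytes is a hypothetical class, and an iterable again stands in for the buffer-like side. Because bytes(x) consults __bytes__ before trying the integer interpretation, defining it lets the type decide what bytes(x) means.

```python
class OneElementWithBytes:
    """Hypothetical integer-like container that resolves the ambiguity itself."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Still integer-like, e.g. usable as a sequence index.
        return self.value

    def __iter__(self):
        return iter([self.value])

    def __bytes__(self):
        # Explicitly return the contents rather than `value` null bytes.
        return bytes([self.value])


resolved = bytes(OneElementWithBytes(2))
print(resolved)  # expected: b'\x02'

# The integer-like behavior is unaffected elsewhere.
picked = "abc"[OneElementWithBytes(2)]
print(picked)  # expected: c
```
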