[issue32431] Two bytes objects of zero length don't compare equal

2020-10-21 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> not a bug
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32431] Two bytes objects of zero length don't compare equal

2018-02-07 Thread Xiang Zhang

Xiang Zhang  added the comment:

In my mind I don't think here is any problem. The code

val = PyBytes_FromStringAndSize (NULL, 20);
Py_SIZE(val) = 0;

doesn't create a zero length bytes object. A resizing I think should always 
means reorganizing the internal representation, including changing the size 
attribute. Using Py_SIZE won't trigger a full resizing. The code is not 
resizing but just breaks the internal representation and then leads to bugs. In 
C level, if you want, it's easy to muck Python objects. And exposing that much 
implementation details in documentation seems unnecessary.

--
nosy: +xiang.zhang

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32431] Two bytes objects of zero length don't compare equal

2017-12-27 Thread Jonathan Underwood

Jonathan Underwood  added the comment:

Actually the commentary at the top of bytesobject.c for 
PyBytes_FromStringAndSize says:

"... If `str' is NULL then PyBytes_FromStringAndSize() will allocate `size+1' 
bytes (setting the last byte to the null terminating character)... "

So, perhaps that's as close to gospel as it gets - this does imply that bytes 
objects are expected to be null terminated. Why PyBytesAsString then adds an 
extra null terminator is a bit of a mystery.

Perhaps what's needed is some documentation clarifications:

1/ State early on that bytes objects are always expected to be null terminated.

2/ As such, the string pointer returned by PyBytes_AsString will point to a 
null terminated string - I think the current docs could be misinterpreted to 
suggest that _AsString *adds* an extra byte for the null, which it doesn't.

3/ Document that using Py_SIZE to reduce the length of a bytes object is 
dangerous, because the null terminator will be lost, and subsequent behaviour 
undefined.

4/ Document that the preferred way to resize is to use 
PyBytes_FromStringAndSize with a new size.

5/ Indicate clearly that _PyBytes_Resize is not a public interface and its use 
is discouraged in favour of PyBytes_FromStringAndSize

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32431] Two bytes objects of zero length don't compare equal

2017-12-27 Thread Jonathan Underwood

Jonathan Underwood  added the comment:

Py_SIZE is actually precisely specified and documented[1] as it stands; I don't 
think a change there is needed. The usage I outlined is in line with that 
documentation, and many other uses of that macro in the cpython sources.

The documentation issues are, at least:

1. There is no documentation specifying that bytes objects should be null 
terminated.

2. Nothing in the documentation of PyBytes_FromStringAndSize[2] specifies that 
passing 0 as the size results in a singleton being returned. This is 
undocumented behaviour, and it would seem fragile to rely on this.

But there are more implementation inconsistencies: the documentation for 
PyBytes_AsString()[3] returns a buffer which is one byte longer than the length 
of the object *in order to store a terminating null*, which implies that the 
object need not itself have a terminating null. I could go on with other 
examples, but this is very poorly defined behaviour.

Question: are bytes objects defined to be null terminated, or not?

Because if they're not defined to be null terminated, the fix I propose is 
correct even if it doesn't solve the other 100 bugs lurking in the code.

[Aside: even if bytes objects are in fact defined to be null terminated, I 
think the change proposed amounts to an optimization in any case.]

[1] https://docs.python.org/3/c-api/structures.html#c.Py_SIZE
[2] 
https://docs.python.org/3/c-api/bytes.html?highlight=pybytes_fromstringandsize#c.PyBytes_FromStringAndSize
[3] 
https://docs.python.org/3/c-api/bytes.html?highlight=pybytes_asstring#c.PyBytes_AsString

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32431] Two bytes objects of zero length don't compare equal

2017-12-27 Thread R. David Murray

R. David Murray  added the comment:

I think what Py_SIZE "means" depends on the object type in the general case, 
not just this one, so documenting that it is mucking with the internal 
representation is probably the way to go.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32431] Two bytes objects of zero length don't compare equal

2017-12-27 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

Much C code implies that bytes are null terminated. For example bytes-to-number 
converters and most functions that accept bytes as file path. Changing just the 
compare logic will not help.

The documented way of changing the size of the bytes object is 
_PyBytes_Resize(). But since this is a private function the only way of 
resizing the bytes object using public functions is creating a new object with 
PyBytes_FromStringAndSize(). If the new size is 0, it will return a singleton, 
thus this is a fast and memory efficient way.

Using the Py_SIZE() macro with the bytes object is not even documented. We 
should either document this way with its caveats (for example misusing it can 
lead to inefficient use of memory), or admit that this is working with internal 
representation which we shouldn't encourage.

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32431] Two bytes objects of zero length don't compare equal

2017-12-27 Thread Jonathan Underwood

Change by Jonathan Underwood :


--
keywords: +patch
pull_requests: +4911
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32431] Two bytes objects of zero length don't compare equal

2017-12-27 Thread Jonathan Underwood

Jonathan Underwood  added the comment:

https://github.com/python/cpython/pull/5021

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32431] Two bytes objects of zero length don't compare equal

2017-12-27 Thread R. David Murray

R. David Murray  added the comment:

My suspicion is that this behavior/code is left over from when the code was 
handling strings in python2, where strings were always null terminated and so 
the equal-bytes test would always pass.  I don't think this is appropriate for 
bytes objects, so I think the compare logic should be fixed.  But I don't deal 
with the C code much, so I'd like an opinion from a core dev who does.

--
nosy: +r.david.murray

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32431] Two bytes objects of zero length don't compare equal

2017-12-27 Thread Jonathan Underwood

New submission from Jonathan Underwood :

With the current logic in Objects/bytesobject.c in the function 
bytes_compare_eq it can be the case that zero length bytes object object 
created in an extension module like this:

val = PyBytes_FromStringAndSize (NULL, 20);
Py_SIZE(val) = 0;

won't compare equal to b'' because the memory is not initialized, so the first 
two bytes won't be equal. Nonetheless, the Python interpreter does return b'' 
for print(repr(val)), so this behaviour is very confusing. To get the correct 
behaviour, one would have to initialize the memory:

val = PyBytes_FromStringAndSize (NULL, 20);
c = PyBytes_AS_STRING (val);
c[0] = '\0';
Py_SIZE(val) = 0;

However, it would be more sensible to fix the logic in bytes_compare_eq in my 
opinion. That function should return true for two zero length bytes objects, 
irrespective of the memory contents.

--
components: Interpreter Core
messages: 309086
nosy: jonathanunderwood
priority: normal
severity: normal
status: open
title: Two bytes objects of zero length don't compare equal
type: behavior
versions: Python 3.6

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com