[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-16 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-16 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thank you Josh and Xiang for your contribution.

--
dependencies:  -Replace PyUnicode_CompareWithASCIIString with 
_PyUnicode_EqualToASCIIString
resolution:  -> fixed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-16 Thread Roundup Robot

Roundup Robot added the comment:

New changeset b995a6950139 by Serhiy Storchaka in branch '3.6':
Issue #21449: Removed private function _PyUnicode_CompareWithId.
https://hg.python.org/cpython/rev/b995a6950139

New changeset 9b053d3c74dc by Serhiy Storchaka in branch 'default':
Issue #21449: Removed private function _PyUnicode_CompareWithId.
https://hg.python.org/cpython/rev/9b053d3c74dc

--
nosy: +python-dev

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-16 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

New changeset faf04a995031 by Serhiy Storchaka in branch '3.5':
Issue #28701: Replace _PyUnicode_CompareWithId with _PyUnicode_EqualToASCIIId.
https://hg.python.org/cpython/rev/faf04a995031

New changeset ff3dacc98b3a by Serhiy Storchaka in branch '3.6':
Issue #28701: Replace _PyUnicode_CompareWithId with _PyUnicode_EqualToASCIIId.
https://hg.python.org/cpython/rev/ff3dacc98b3a

New changeset 765013f71bc4 by Serhiy Storchaka in branch 'default':
Issue #28701: Replace _PyUnicode_CompareWithId with _PyUnicode_EqualToASCIIId.
https://hg.python.org/cpython/rev/765013f71bc4

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-16 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

LGTM. Except that _PyUnicode_EqualToASCIIString() could be used for simplicity.

> _PyUnicode_CompareWithId is a private function. We can remove it if it has 
> issues.

I would left it in maintained releases and removed it in 3.7 (or 3.6?).

--
assignee:  -> serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-16 Thread Xiang Zhang

Xiang Zhang added the comment:

> _PyUnicode_CompareWithId is a private function. We can remove it if it has 
> issues.

It doesn't. But once there is _PyUnicode_EqualToASCIIId, it's can be rarely 
used.

The new patch implements a version of _PyUnicode_EqualToASCIIId.

--
Added file: http://bugs.python.org/file45499/_PyUnicode_EqualToASCIIId.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-15 Thread STINNER Victor

STINNER Victor added the comment:

_PyUnicode_CompareWithId is a private function. We can remove it if it has
issues.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-15 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

_PyUnicode_EqualToASCIIString() added in issue28701 would help in the patch for 
this issue.

--
dependencies: +Replace PyUnicode_CompareWithASCIIString with 
_PyUnicode_EqualToASCIIString

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-15 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> Callers should be fixed to handle errors.

This would make the code of callers more complex for small benefit. And perhaps 
we will add more callers (if replace PyUnicode_CompareWithASCII with new 
function).

> Since currently _PyUnicode_CompareWithId is used to compare a unicode with 
> ascii identifiers for all cases, how about introduce a more specific function 
> like _PyUnicode_EqualToASCIIId for this case?

Great idea Xiang!

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-15 Thread Xiang Zhang

Xiang Zhang added the comment:

Since currently _PyUnicode_CompareWithId is used to compare a unicode with 
ascii identifiers for all cases, how about introduce a more specific function 
like _PyUnicode_EqualToASCIIId for this case? We can preserve 
_PyUnicode_CompareWithId for more general purpose usage. It's much easier to 
write a error free _PyUnicode_EqualToASCIIId.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-15 Thread STINNER Victor

STINNER Victor added the comment:

> This issue is not just about readability or performance. It is about 
> correctness, since none of callers check for failure of 
> _PyUnicode_CompareWithId.

Callers should be fixed to handle errors.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-15 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> Please don't modify _PyUnicode_FromId(), I prefer to keep it as it is, decode 
> from UTF-8.

Then we should add handling of three special cases: PyUnicode_READY() fails, 
_PyUnicode_FromId() fails and both fail due to memory error. This means that 
should be implemented character-by-character encoding of UCS1, UCS2, UCS4 or 
wchar_t (with possible surrogate pairs) to UTF-8 and comparing with UTF-8 
encoded data.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-15 Thread STINNER Victor

STINNER Victor added the comment:

Please don't modify _PyUnicode_FromId(), I prefer to keep it as it is, decode 
from UTF-8.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-15 Thread Antoine Pitrou

Changes by Antoine Pitrou :


--
nosy:  -pitrou

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-15 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

_PyUnicode_FromId would not fail due to bad encoded data if use the Latin1 
encoding. Seems encoded data always is ASCII. We could use 
PyErr_WriteUnraisable() for output a warning if it is not.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-14 Thread Xiang Zhang

Xiang Zhang added the comment:

_PyUnicode_FromId could fail due to memoryerror or bad encoded data. They 
should be treated differently like PyUnicode_READY.

Could we treat it like PyDict_GetItem? If memoryerror happens it's highly 
possible other parts will fail too.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-14 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

There are yet few subtle details.

1. _PyUnicode_FromId() uses UTF-8 for decoding from C string, but 
PyUnicode_CompareWithASCIIString() uses Latin1. Two ways of comparison can 
return different results. Currently all identifiers are ASCII, thus perhaps we 
can ignore this issue for a time. Perhaps the simplest solution is to make 
PyUnicode_FromId() using ASCII or Latin1.

2. PyUnicode_READY() can fail either because Unicode object is misformed or due 
to MemoryError. The former case is unavoidable error and returning false is 
good. But the latter can be temporary error and we should add a fallback, 
compare wchar_t * representation of non-ready Unicode object with char * 
representation of identifier.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-13 Thread Xiang Zhang

Changes by Xiang Zhang :


Added file: http://bugs.python.org/file45480/_PyUnicode_EqualToId_v2.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-13 Thread Xiang Zhang

Xiang Zhang added the comment:

> The name _PyUnicode_CompareWithIdEqual looks too long to me. What about 
> _PyUnicode_EqualToId?

+1. I think this name is more clear.

Serhiy's idea on the implementation sounds good. As for _PyUnicode_FROM_ID, I 
think it's better for another issue.

--
nosy: +xiang.zhang
Added file: http://bugs.python.org/file45479/_PyUnicode_EqualToId.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2016-11-13 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

This issue is not just about readability or performance. It is about 
correctness, since none of callers check for failure of 
_PyUnicode_CompareWithId.

I just came to the same problem from other side (replacing 
PyUnicode_CompareWithASCII with PyUnicode_EqualToASCII or 
_PyUnicode_CompareWithId). Josh's idea in general LGTM, but there are few 
details:

1. None of callers check for failure of new functions as well. In boolean 
context the failure is interpreted as true. What is even worse, an exception is 
set, and this can cause a crash in debug build.

If we don't want to rewrite all the uses of _PyUnicode_CompareWithId by adding 
the code for checking return value for error, we should make new function never 
failing. _PyUnicode_CompareWithId can fail if the first argument is not valid 
string or if there is no memory for allocating a Unicode object for identifier. 
I think it is better to return false value in these circumstances.

If the first argument is not string, it isn't equal to an identifier. If it is 
an invalid string, it can't be equal too. If there is no memory for allocating 
a Unicode object for identifier, we should fallback to comparing the raw 
content (like in PyUnicode_CompareWithASCII).

2. It may be worth to replace _PyUnicode_FromId with the macro:

#define _PyUnicode_FROM_ID(id) ((id)->object ? (id)->object : 
_PyUnicode_FromId(id))

Or rename _PyUnicode_FromId to _PyUnicode_FromIdInternal and _PyUnicode_FROM_ID 
to _PyUnicode_FromId? This would save us from rewriting a lot of code that uses 
_PyUnicode_FromId.

3. The name _PyUnicode_CompareWithIdEqual looks too long to me. What about 
_PyUnicode_EqualToId?

4. Pointers could be checked for equality before checking 
PyUnicode_CHECK_INTERNED(left). The former check is much cheaper than the 
latter.

--
components: +Interpreter Core, Unicode
nosy: +ezio.melotti, serhiy.storchaka
type:  -> behavior
versions: +Python 3.6, Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2014-10-22 Thread Josh Rosenberg

Josh Rosenberg added the comment:

Is there someone else who should be looking at this? Having a fast path for 
identifier comparisons makes sense (and the concept of ordering between 
essentially unique identifiers makes no sense). It's not part of the public API 
(limited or not) so I don't think compatibility concerns apply, so it seems 
like this should be a simple change...

--
versions: +Python 3.5, Python 3.6

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21449
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2014-10-22 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Note that PyUnicode_CompareWithASCII should be quite fast in most cases (it 
uses memcmp() on UCS1 strings).

--
nosy: +pitrou
stage:  - patch review
versions:  -Python 3.6

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21449
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2014-10-22 Thread Antoine Pitrou

Antoine Pitrou added the comment:

That said, I think it's quite a good idea.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21449
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

2014-05-06 Thread Josh Rosenberg

New submission from Josh Rosenberg:

_PyUnicode_CompareWithId is used exclusively for equality comparisons (after 
all, identifiers aren't really sortable in a meaningful way; they're isolated 
values, not a continuum). But because _PyUnicode_CompareWithId maintains the 
general comparison behavior, not just ==/!=, it serves little purpose; while it 
checks the return of _PyUnicode_FromId, none of its callers check for failure 
anyway, so every use could just as well have been:

PyUnicode_Compare(left, _PyUnicode_FromId(right));

I've attached a patch that replaces _PyUnicode_CompareWithId with 
_PyUnicode_CompareWithIdEqual, that:

1. Only check equality vs. inequality
2. Can optimize for the case where left is an interned string by performing 
direct pointer comparison
3. Even when left is not interned, it can use the optimized unicode_compare_eq 
worker function instead of the slower generalized unicode_compare function

I've replaced all the uses of the old function I could find, and all unit tests 
pass. I don't expect to see any meaningful speed ups as a result of the change 
(the most commonly traversed code that would benefit appears to be the code 
that creates new classes, and the code that creates reprs for objects), but the 
goal here is not immediate speed ups, but enabling future speed ups.

I am looking into writing a PyDict_GetItem fastpath for looking up identifiers 
(that would remove the need to perform memory comparisons when the dictionary, 
as in keyword argument passing, is usually composed of interned keys), possibly 
in combination with making an identifier based version of 
PyArg_ParseTupleAndKeywords; with ArgumentClinic, it might become practical to 
swap in a new argument parser without having to manually change thousands of 
lines of code, and one of the simplest ways to improve speed would be to remove 
the overhead of constantly constructing, hashing, and comparing the same 
keyword strings every time a C function is called.

Adding haypo as nosy since he created the original function in #19512.

--
files: comparewithidequals.patch
keywords: patch
messages: 218022
nosy: haypo, josh.rosenberg
priority: normal
severity: normal
status: open
title: Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual
Added file: http://bugs.python.org/file35163/comparewithidequals.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21449
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com