Re: [Python-Dev] If you shadow a module in the standard library that IDLE depends on, bad things happen

2015-10-30 Thread Serhiy Storchaka

On 30.10.15 09:57, Nathaniel Smith wrote:

Unfortunately I think that (among other things) there are a lot of
scripts out there that blindly do sys.path.pop(0) to remove the ""
entry, so the backcompat costs of changing this would probably be
catastrophic.


You are right. There are too many occurrences, even in public libraries.

https://code.openhub.net/search?s=%22sys.path.pop(0)%22&p=0

A possible workaround is to add a fake path (or a duplicate of a system
path) at the start of sys.path. Then dropping the first element will not
break such scripts.





[Python-Dev] Unable to submit a patch to the tracker

2015-10-31 Thread Serhiy Storchaka
I'm unable to submit any file to any issue, either via the web form or
via e-mail. I checked with different browsers from different computers.
The meta-tracker doesn't work either.


http://psf.upfronthosting.co.za/roundup/meta/issue575



Re: [Python-Dev] Unable to submit a patch to the tracker

2015-11-01 Thread Serhiy Storchaka

On 01.11.15 08:29, Serhiy Storchaka wrote:

I'm unable to submit any file to any issue, either via the web form or
via e-mail. I checked with different browsers from different computers.
The meta-tracker doesn't work either.

http://psf.upfronthosting.co.za/roundup/meta/issue575


Sorry for the noise. The cause was a mistake on my side.



[Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Serhiy Storchaka
UTF-16 and UTF-32 source encodings are currently not supported. There is a 
comment in Parser/tokenizer.c:


/* Disable support for UTF-16 BOMs until a decision
   is made whether this needs to be supported.  */

Can we decide whether this support will be added in the foreseeable 
future (say, within the next 10 years), or not?


Removing this commented-out code and the code related to it would help 
refactor the tokenizer, and that could help fix some existing bugs (e.g. 
issue14811, issue18961, issue20115 and maybe others). The current 
tokenizing code is too tangled.


If support for UTF-16 and UTF-32 is planned, I'll take it into account 
during the refactoring. But many places besides the tokenizer expect an 
ASCII-compatible encoding of source files.




Re: [Python-Dev] Support of UTF-16 and UTF-32 source encodings

2015-11-14 Thread Serhiy Storchaka

On 15.11.15 00:56, Victor Stinner wrote:

These encodings are rarely used. I don't think that any text editor uses
them. Editors use ASCII, Latin-1, UTF-8 and... all the locale encodings.
But I don't know of any OS using UTF-16 as a locale encoding. UTF-32
wastes disk space.


AFAIK the standard Windows editor Notepad uses UTF-16. And I have often 
encountered Windows resource files in UTF-16. For some time UTF-16 was 
more popular than UTF-8 on Windows. If this horse is dead, I'll throw it away.





[Python-Dev] Reading Python source file

2015-11-16 Thread Serhiy Storchaka
I'm working on rewriting the Python tokenizer (in particular the part 
that reads and decodes a Python source file) [1]. The code is complicated. 
Currently there are the following cases:


* Reading from a string in memory.
* Interactive reading from a file.
* Reading from a file:
  - Raw reading, ignoring the encoding (in the parser generator).
  - Raw reading of a UTF-8 encoded file.
  - Reading and recoding to UTF-8.

The file is read line by line. This makes it hard to check the 
correctness of the first line if the encoding is specified on the second 
line, and it causes very hard problems with null bytes and with 
desynchronization of the buffered C and Python files. All these problems 
can easily be solved by reading the whole Python source file into memory 
and then parsing it as a string. This would allow dropping a large, 
complex and buggy part of the code.


Are there disadvantages to this solution? As for memory consumption, the 
source text itself will consume only a small part of the memory consumed 
by the AST and other structures. As for performance, reading and decoding 
the whole file can be faster than doing it line by line.

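For illustration, here is a minimal sketch of the whole-file approach
(read_source is an invented helper; the encoding detection reuses the
existing tokenize.detect_encoding()):

    import io
    import tokenize

    def read_source(path):
        # Read the whole file into memory, detect the encoding from the
        # first one or two lines, then decode it once as a single string.
        with open(path, 'rb') as f:
            data = f.read()
        encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
        return data.decode(encoding)

    # The whole source can then be compiled in one step:
    # code = compile(read_source('script.py'), 'script.py', 'exec')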

[1] http://bugs.python.org/issue25643



Re: [Python-Dev] Reading Python source file

2015-11-17 Thread Serhiy Storchaka

On 17.11.15 05:00, Guido van Rossum wrote:

If you free the memory used for the source buffer before starting code
generation you should be good.


Thank you. The buffer is freed just after the AST has been generated.




Re: [Python-Dev] Reading Python source file

2015-11-17 Thread Serhiy Storchaka

On 17.11.15 05:05, MRAB wrote:

As I understand it, *nix expects the shebang to be b'#!', which means 
that the first line should be ASCII-compatible (it's possible that the 
UTF-8 BOM might be present). This kind of suggests that encodings like 
UTF-16 would cause a problem on such systems.

The encoding line also needs to be ASCII-compatible.

I believe that the recent thread "Support of UTF-16 and UTF-32 source
encodings" also concluded that UTF-16 and UTF-32 shouldn't be supported.

This means that you could treat the first 2 lines as though they were 
some kind of extended ASCII (Latin-1?), the line ending being '\n' or 
'\r' or '\r\n'.

Once you'd identified the encoding, you could decode everything 
(including the shebang line) using that encoding.


Yes, that is what I was going to implement (and I'm already halfway 
there). My question is whether it is worth complicating the code further 
to preserve line-by-line reading. In any case, after reading a first line 
that contains neither a coding cookie nor non-comment tokens, we need to 
wait for the second line.



(What should happen if the encoding line then decodes differently, i.e.
encoding_line.decode(encoding) != encoding_line.decode('latin-1')?)


The parser should get the line decoded with the specified encoding.



Re: [Python-Dev] Reading Python source file

2015-11-17 Thread Serhiy Storchaka

On 17.11.15 11:59, M.-A. Lemburg wrote:

I don't think these situations are all that common, though,
so reading in the full source code before compiling it
sounds like a reasonable approach.

We use the same simplification in eGenix PyRun's emulation of
the Python command line interface and it has so far not
caused any problems.


The current implementation of the import system went the same way. As a 
result, importing a script as a module and running it from the command 
line can behave differently in corner cases.





Re: [Python-Dev] Reading Python source file

2015-11-17 Thread Serhiy Storchaka

On 17.11.15 17:22, Guido van Rossum wrote:

But more important is the interactive REPL, which parses your input
fully each time you hit ENTER.


The interactive REPL runs different code. It is simpler than the code for 
reading from a file, because it doesn't need to care about a BOM or a 
coding cookie.





Re: [Python-Dev] Reading Python source file

2015-11-17 Thread Serhiy Storchaka

On 17.11.15 18:06, Guido van Rossum wrote:

OK, but what are you going to do about the interactive REPL?


Nothing (except for some simplification). This is a separate branch of the code.




Re: [Python-Dev] Reading Python source file

2015-11-19 Thread Serhiy Storchaka

On 17.11.15 18:50, Guido van Rossum wrote:

On Tue, Nov 17, 2015 at 8:20 AM, Serhiy Storchaka  wrote:

The current implementation of the import system went the same way. As a result,
importing a script as a module and running it from the command line can behave
differently in corner cases.


I'm confused. *Of course* these two behaviors differ, since Python
uses a different __name__. Not sure how this relates to the REPL.


Sorry for the confusion. I meant the parser level. The file parser has a 
few bugs that can cause the source to be interpreted differently by the 
file and string parsers. For example, the attached script produces 
different output: "ä" if executed as a script, and "À" if imported as a 
module.


And there is a question about the null byte. Currently compile(), exec() 
and eval() raise an exception if the script contains a null byte. 
Formerly they accepted it, but the null byte ended the script. The 
behavior of the file parser is weirder: a null byte makes the parser 
ignore the end of the line, including the newline byte [1]. E.g. 
"#\0\nprint('a')" is interpreted as "#print('a')". This differs from 
PyPy (and maybe other implementations), which interprets the null byte 
just as an ordinary character.


The question is whether we should support the null byte.

[Python-Dev] Deleting with setting C API functions

2015-11-24 Thread Serhiy Storchaka
Slots like PyTypeObject.tp_setattr, PySequenceMethods.sq_ass_item and 
PyMappingMethods.mp_ass_subscript are used not only for setting an 
attribute/item value, but also for deleting an attribute/item (if the 
value is NULL). This is not documented, but should be. [1]  
Correspondingly, public API functions like PyObject_SetAttr, 
PyObject_SetItem, PySequence_SetItem, PySequence_SetSlice and 
PyMapping_SetItemString can be used for deleting. But all these functions 
also have special counterparts for deleting: PyObject_DelAttr etc.


The question is whether the deleting ability of the Set* functions is 
intentional. Should we document it, or deprecate it and later remove it?


[1] http://bugs.python.org/issue25701



Re: [Python-Dev] [Python-checkins] Daily reference leaks (e5e507a357a6): sum=103

2015-11-24 Thread Serhiy Storchaka

On 24.11.15 21:31, Brett Cannon wrote:

Someone just added a leak to pickle.


It was always there. I just added tests that expose it.

http://bugs.python.org/issue25725




Re: [Python-Dev] Deleting with setting C API functions

2015-12-01 Thread Serhiy Storchaka

On 25.11.15 08:39, Nick Coghlan wrote:

On 25 November 2015 at 07:33, Guido van Rossum  wrote:

Ooooh, that's probably really old code. I guess for the slots the
reasoning is to save on slots. For the public functions, alas it will
be hard to know if anyone is depending on it, even if it's
undocumented. Perhaps add a deprecation warning to these if the value
is NULL for one release cycle?


I did a quick scan for "PyObject_SetAttr", and it turns out
PyObject_DelAttr is only a convenience macro for calling
PyObject_SetAttr with NULL as the value argument. bltinmodule.c and
ceval.c also both include direct calls to PyObject_SetAttr with
"(PyObject *)NULL" as the value argument.

Investigating some of the uses that passed a variable as the value
argument, one case is the weakref proxy implementation, which uses
PyObject_SetAttr on the underlying object in its implementation of the
setattr slot in the proxy.

So it looks to me like replicating the NULL-handling behaviour of the
slots in the public Set* APIs was intentional, and it's just the
documentation of that detail that was missed (since most folks
presumably use the Del* convenience APIs instead).


I'm not sure. This looks rather like an implementation detail to me. The 
cases you found are the only cases in the core/stdlib that call 
PyObject_SetAttr with a NULL third argument. The tests pass after 
replacing the Set* functions with Del* functions in these cases and 
making the Set* functions reject a NULL value. [1]


Wouldn't it be worth deprecating deleting with the Set* functions? 
Neither the other abstract Set* APIs nor the concrete Set* APIs support 
deleting. Deleting with the Set* API can be unintentional and hide a bug.


[1] http://bugs.python.org/issue25773



Re: [Python-Dev] Deleting with setting C API functions

2015-12-02 Thread Serhiy Storchaka
On Wednesday, 02 Dec 2015 08:30:35, you wrote:
> On 1 Dec 2015 16:51, "Serhiy Storchaka" wrote:
> > Wouldn't it be worth deprecating deleting with the Set* functions? Neither
> > the other abstract Set* APIs nor the concrete Set* APIs support deleting.
> > Deleting with the Set* API can be unintentional and hide a bug.
> Wow wow wow, what? No, don't break the Python C API for purity. 8 years later,
> we are still porting projects to Python 3, and we are not done yet.

I suggest just deprecating this feature. I'm not suggesting removing it in 
the foreseeable future (at least not before 4.0).

> Practicality beats purity.

I don't think this argument applies here. Two things make this deprecation 
more painless than usual:

1. This feature has never been documented.

2. PyObject_DelAttr() has existed from the start (from the time the Generic 
Abstract Object Interface was added).

You have enough time to update your projects, and you can update them 
uniformly for all versions. And maybe you will find a few weird bugs related 
to misuse of the Set* API.



Re: [Python-Dev] Deleting with setting C API functions

2015-12-02 Thread Serhiy Storchaka

On 02.12.15 12:06, Victor Stinner wrote:

2015-12-02 9:42 GMT+01:00 Serhiy Storchaka :

You have enough time to update your projects, and you can update them
uniformly for all versions. And may be you will found few weird bugs related
to misuse of Set* API.


Did you check popular projects using C extensions to check if they
call Set*() functions to delete attributes/items?


I have checked the following projects.

regex, simplejson, Pillow, PyQt4, LibreOffice, PyGTK, PyICU, pyOpenSSL, 
libxml2, Boost, psutil and mercurial don't use PyObject_SetAttr at all.


NumPy and pgobject don't use PyObject_SetAttr for deleting.

PyYAML and lxml use PyObject_SetAttr only in code generated by Cython, 
and never use it for deleting.




Re: [Python-Dev] Python 4 musings (was Re: Deleting with setting C API functions)

2015-12-02 Thread Serhiy Storchaka

On 03.12.15 01:26, Gregory P. Smith wrote:

Except that we should skip version 4 and go directly to 5 in homage to
http://www.montypython.net/scripts/HG-handgrenade.php.


Good point! So now we can assign version 4 to any stupid ideas that are 
never to be realized.




Re: [Python-Dev] Python semantic: Is it ok to replace not x == y with x != y? (no)

2015-12-16 Thread Serhiy Storchaka

On 15.12.15 15:04, Victor Stinner wrote:

Should Python emit a warning when __eq__() is implemented but not __ne__()?


No. Actually, I removed a number of redundant (and often incorrect) 
__ne__ implementations after fixing object.__ne__.



Should Python be modified to call "not __eq__()" when __ne__() is not
implemented?


__ne__() is always implemented (it is inherited from object). The default 
__ne__ implementation calls __eq__() and negates its result (unless it is 
NotImplemented).


But a user class can define __ne__ with arbitrary semantics. That is the 
purpose of having a separate __ne__.

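As a minimal sketch of the behaviour described above (the class names are
invented for the example):

    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y
        def __eq__(self, other):
            if not isinstance(other, Point):
                return NotImplemented
            return (self.x, self.y) == (other.x, other.y)
        # No __ne__ defined: the inherited object.__ne__ calls
        # __eq__ and negates its result.

    assert Point(1, 2) != Point(3, 4)   # uses the inherited __ne__

    class Weird(Point):
        def __ne__(self, other):        # arbitrary semantics are allowed
            return True

    w = Weird(1, 2)
    assert w == w    # Point.__eq__: the coordinates are equal
    assert w != w    # Weird.__ne__: always True, not "not __eq__"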




[Python-Dev] New poll about a macro for safe reference replacing

2015-12-16 Thread Serhiy Storchaka
I'm bringing this up again, since the previous poll did not give an 
unambiguous result. Related links: [1], [2], [3], [4].


Let me remind you that we are talking about adding the following macro. 
It is needed for safely replacing references. For now there is at least 
one open crash report that can be solved with this macro [5] (I think 
there is one more, but I can't find it just now). And there are 50 
potential bugs for which we just don't have a reproducer yet.


#define Py_XXX(ptr, new_value)            \
    {                                     \
        PyObject *__tmp__ = (ptr);        \
        (ptr) = (new_value);              \
        Py_DECREF(__tmp__);               \
    }

The problem is only the macro name. There are objections against every 
proposed name, and no name has gained a convincing majority.


Here are the names that gained the largest numbers of votes, plus names 
proposed during the poll.


1. Py_SETREF
2. Py_DECREF_REPLACE
3. Py_REPLACE
4. Py_SET_POINTER
5. Py_SET_ATTR
6. Py_REPLACE_REF

Please give your vote (a floating-point number from -1 to 1 inclusive) 
for each of the proposed names. You can also propose a new name.



[1] https://mail.python.org/pipermail/python-dev/2008-May/079862.html
[2] http://comments.gmane.org/gmane.comp.python.devel/145346
[3] http://comments.gmane.org/gmane.comp.python.devel/145974
[4] http://bugs.python.org/issue20440
[5] http://bugs.python.org/issue24103



Re: [Python-Dev] New poll about a macro for safe reference replacing

2015-12-16 Thread Serhiy Storchaka

On 16.12.15 16:53, Random832 wrote:

At the risk of bikeshedding, this needs do { ... } while(0), or
it almost certainly will eventually be called incorrectly in an
if/else statement.  Yes, it's ugly, but that's part of the cost
of using macros.


Yes, of course, and the patch for issue20440 uses this idiom. Here it was 
omitted for clarity.



If it were implemented as below, then it could evaluate ptr only
once, at the cost of requiring it to refer to an addressable
pointer object:
     PyObject **__tmpp__ = &(ptr);
     PyObject *__tmp__ = *__tmpp__;
     *__tmpp__ = (new_value);
     Py_DECREF(__tmp__);

I'm not entirely sure of the benefit of a macro over an inline
function.


Because the first argument is passed by reference (as in Py_INCREF etc).


Or why it doesn't INCREF the new value, maintaining
the invariant that ptr is an owned reference.


Because in the majority of use cases stealing a reference is exactly what 
is needed. Otherwise we would virtually always need to decref the new 
reference just after using this macro, and we couldn't use it as 
Py_XXX(obj->attr, PySomething_New()).



I think "SET" names imply that it's safe if the original
reference is NULL. This isn't an objection to the names, but if
it is given one of those names I think it should use Py_XDECREF.


Originally I proposed pairs of macros with and without X in the name 
(as with Py_DECREF/Py_XDECREF). In this poll this detail was omitted for 
clarity. Later we can create a new poll if needed.




Re: [Python-Dev] New poll about a macro for safe reference replacing

2015-12-21 Thread Serhiy Storchaka

On 16.12.15 16:12, Serhiy Storchaka wrote:

Please give your vote (a floating-point number from -1 to 1 inclusive) for
each of the proposed names. You can also propose a new name.


Thank you all for your votes.

Results of the poll:

Py_SETREF:  +5 = +5 (Victor, Steve, Yury, Brett, Nick) +0 (Ryan, Martin)

Py_REPLACE_REF:  +2.5 = +2.5 (Ryan, Victor, Steve, Martin) -0 (Nick)

Py_REPLACE: +0 = +1 (Martin) -1 (Ryan) +0 (Nick)

Py_RESET:  0 = +1 (Ryan) -1 (Martin)

Py_DECREF_REPLACE: -2 = +1 (Ryan, Martin) -3 (Victor, Steve, Nick)

Py_SET_POINTER, Py_SET_ATTR: -5 (Ryan, Victor, Steve, Martin, Nick)

Therefore Py_SETREF is the winner.

But I also want to recall the objections against it that were formulated 
in the previous discussion.


1) By analogy with Py_INCREF and Py_DECREF, which increment and decrement 
the reference counter of an object, Py_SETREF looks as if it *sets* the 
reference counter of the object.


2) By analogy with PyList_SET_ITEM, PyTuple_SET_ITEM, PyCell_SET, etc., 
it is not expected that Py_SETREF decrements the refcount of the old 
value before overwriting it.





Re: [Python-Dev] New poll about a macro for safe reference replacing

2015-12-22 Thread Serhiy Storchaka

On 21.12.15 17:37, Nick Coghlan wrote:

Avoiding those misleading associations is a good argument in favour of
Py_REPLACE over Py_SETREF - they didn't occur to me before casting my
votes, and I can definitely see them causing confusion in the future.

So perhaps the combination that makes the most sense is to add
Py_REPLACE (uses Py_DECREF on destination) & Py_XREPLACE (uses
Py_XDECREF on destination) to the existing Py_CLEAR?


And we return to where we started. Although I personally prefer 
Py_REPLACE/Py_XREPLACE, I'm afraid that using them would look as if I 
just ignored the results of the poll. Since Py_SETREF looks good to most 
developers at first glance, I hope it will not lead to confusion in the 
future. If there are no new objections, I will commit the trivial 
auto-generated patch today and will provide a patch that covers more 
non-trivial cases. Now is better than never, and we have been 
bikeshedding this too long for "right now".





Re: [Python-Dev] New poll about a macro for safe reference replacing

2015-12-22 Thread Serhiy Storchaka

On 21.12.15 23:57, Steve Dower wrote:

Was Py_MOVEREF (or MOVE_REF) ever suggested?


That would be a nice name: the macro moves ownership. But I think it's 
too late. Otherwise we'll never finish the bikeshedding.





Re: [Python-Dev] New poll about a macro for safe reference replacing

2015-12-23 Thread Serhiy Storchaka

On 22.12.15 18:36, Meador Inge wrote:

On Tue, Dec 22, 2015 at 3:58 AM, Serhiy Storchaka wrote:

On 21.12.15 23:57, Steve Dower wrote:

Was Py_MOVEREF (or MOVE_REF) ever suggested?


That would be a nice name: the macro moves ownership. But I think
it's too late. Otherwise we'll never finish the bikeshedding.


FWIW, I like this name the best.  It is increasingly popular for
languages to talk about moving ownership (e.g. move semantics in C++,
Rust, etc...).


Oh, now I'm confused. Should I run a new poll? With new voters, 
Py_MOVEREF could get more votes than Py_SETREF.





Re: [Python-Dev] New poll about a macro for safe reference replacing

2015-12-23 Thread Serhiy Storchaka

On 23.12.15 16:52, Chris Angelico wrote:

On Thu, Dec 24, 2015 at 1:50 AM, Serhiy Storchaka  wrote:

Oh, now I'm confused. Should I run a new poll? With new voters, Py_MOVEREF
could get more votes than Py_SETREF.


I suggest cutting off the bikeshedding. Both of these options have
reasonable support. Pick either and run with it, and don't worry about
another vote.


That would be voluntarism.




Re: [Python-Dev] Update PEP 7 to require curly braces in C

2016-01-18 Thread Serhiy Storchaka

On 17.01.16 21:10, Brett Cannon wrote:

While doing a review of http://bugs.python.org/review/26129/
 I asked to have curly braces put
around all `if` statement bodies. Serhiy pointed out that PEP 7 says
curly braces are optional:
https://www.python.org/dev/peps/pep-0007/#id5. I would like to change that.

My argument is to require them to prevent bugs like the one Apple made
with OpenSSL about two years ago:
https://www.imperialviolet.org/2014/02/22/applebug.html. Skipping the
curly braces is purely an aesthetic thing while leaving them out can
lead to actual bugs.

Anyone object if I update PEP 7 to remove the optionality of curly
braces in PEP 7?


I'm -0. Code without braces looks clearer, especially if the body is a 
one-line return, break, continue or goto statement. Sometimes it is 
appropriate to add an empty line after it for even more clarity. On the 
other hand, there is no precedent for bugs like the one Apple made in the 
CPython sources. Mandatory braces *might* prevent a hypothetical bug, but 
they will certainly make a lot of correct code harder to read.




Re: [Python-Dev] Update PEP 7 to require curly braces in C

2016-01-18 Thread Serhiy Storchaka

On 18.01.16 13:42, Maciej Szulik wrote:

We'll be soon moving to github, which should simplify the process of
submitting PRs from other developers
interested in making our beautiful language even more awesome. I'm quite
positive that with current review
process that kind of bug should not happen, but you never know.


If moving to GitHub decreases the quality of the source code, it is a bad 
idea.





[Python-Dev] How to resolve distinguishing between documentation and implementation

2016-01-29 Thread Serhiy Storchaka
How do we resolve a discrepancy between documentation and implementation 
if the current implementation is incorrect, but third-party code can 
implicitly depend on it?


For example, see issue26198. Currently a buffer overflow of the 
predefined buffer for the "es#" and "et#" format units causes a TypeError 
(with a misleading message, but that is another story). The correct and 
*documented* exception is ValueError. User code can depend on the current 
behavior, because TypeError is what is raised now for this type of error, 
and it is what is raised for other types of errors. It is unlikely that 
the authors of such code read the documentation, otherwise this issue 
would have been reported earlier. On the other hand, it looks like these 
format units are rarely used with a predefined buffer (never in the 
stdlib since 3.5).


I think it is obvious that the code in the development branch should be 
changed to produce the documented and more logical exception. But what 
about bugfix releases? Changing the documentation would be misleading, 
and changing the code can break existing code (unlikely, but possible).




Re: [Python-Dev] More optimisation ideas

2016-01-30 Thread Serhiy Storchaka

On 29.01.16 19:05, Steve Dower wrote:

This is probably the code snippet that bothered me the most:

 ### Encoding table
 encoding_table=codecs.charmap_build(decoding_table)

It shows up in many of the encodings modules, and while it is not a bad
function in itself, we are obviously generating a known data structure
on every startup. Storing these in static data is a tradeoff between
disk space and startup performance, and one I think is likely to be
worthwhile.


$ ./python -m timeit -s "import codecs; from encodings.cp437 import decoding_table" -- "codecs.charmap_build(decoding_table)"

100000 loops, best of 3: 4.36 usec per loop

Getting rid of charmap_build() would save you at most 4.4 microseconds 
per encoding -- 0.0005 seconds if you have imported *all* standard encodings!


And how would you expect to store encoding_table in a more efficient way?



Re: [Python-Dev] More optimisation ideas

2016-01-30 Thread Serhiy Storchaka

On 30.01.16 18:31, Steve Dower wrote:

On 30Jan2016 0645, Serhiy Storchaka wrote:

$ ./python -m timeit -s "import codecs; from encodings.cp437 import decoding_table" -- "codecs.charmap_build(decoding_table)"
100000 loops, best of 3: 4.36 usec per loop

Getting rid of charmap_build() would save you at most 4.4 microseconds
per encoding -- 0.0005 seconds if you have imported *all* standard
encodings!


Just as happy to be proven wrong. Perhaps I misinterpreted my original
profiling and then, embarrassingly, ran with the result for a long time
without retesting.


AFAIK most of the time is spent in system calls like stat or open. 
Archiving the stdlib into a ZIP file and using zipimport can decrease 
Python startup time (perhaps there is an open issue about this).





Re: [Python-Dev] Opcode cache in ceval loop

2016-02-02 Thread Serhiy Storchaka

On 01.02.16 21:10, Yury Selivanov wrote:

To measure the max/average memory impact, I tuned my code to optimize
*every* code object on *first* run.  Then I ran the entire Python test
suite.  Python test suite + standard library both contain around 72395
code objects, which required 20Mb of memory for caches.  The test
process consumed around 400Mb of memory.  Thus, the absolute worst case
scenario, the overhead is about 5%.


The test process consumes so much memory because a few tests create huge 
objects. If you exclude these tests (note that tests that require more 
than 1Gb are already excluded by default) and the tests that create a 
number of threads (threads consume much memory too), the rest of the 
tests need less than 100Mb of memory. The absolute required minimum is 
about 25Mb. Thus, in the absolute worst case scenario, the overhead is 
about 100%.





Re: [Python-Dev] Opcode cache in ceval loop

2016-02-02 Thread Serhiy Storchaka

On 02.02.16 19:45, Yury Selivanov wrote:

On 2016-02-02 12:41 PM, Serhiy Storchaka wrote:

On 01.02.16 21:10, Yury Selivanov wrote:

To measure the max/average memory impact, I tuned my code to optimize
*every* code object on *first* run.  Then I ran the entire Python test
suite.  Python test suite + standard library both contain around 72395
code objects, which required 20Mb of memory for caches.  The test
process consumed around 400Mb of memory.  Thus, the absolute worst case
scenario, the overhead is about 5%.


The test process consumes so much memory because a few tests create huge
objects. If you exclude these tests (note that tests that require more
than 1Gb are already excluded by default) and the tests that create a
number of threads (threads consume much memory too), the rest of the tests
need less than 100Mb of memory. The absolute required minimum is about
25Mb. Thus, in the absolute worst case scenario, the overhead is about 100%.

Can you give me the exact configuration of tests (command line to run)
that would only consume 25mb?


I don't remember which exact tests consume the most memory, but the 
following tests fail when run with less than 30Mb of memory:


test___all__ test_asynchat test_asyncio test_bz2 test_capi 
test_concurrent_futures test_ctypes test_decimal test_descr 
test_distutils test_docxmlrpc test_eintr test_email test_fork1 
test_fstring test_ftplib test_functools test_gc test_gdb test_hashlib 
test_httplib test_httpservers test_idle test_imaplib test_import 
test_importlib test_io test_itertools test_json test_lib2to3 test_list 
test_logging test_longexp test_lzma test_mmap test_multiprocessing_fork 
test_multiprocessing_forkserver test_multiprocessing_main_handling 
test_multiprocessing_spawn test_os test_pickle test_poplib test_pydoc 
test_queue test_regrtest test_resource test_robotparser test_shutil 
test_smtplib test_socket test_sqlite test_ssl test_subprocess 
test_tarfile test_tcl test_thread test_threaded_import 
test_threadedtempfile test_threading test_threading_local 
test_threadsignals test_tix test_tk test_tools test_ttk_guionly 
test_ttk_textonly test_tuple test_unicode test_urllib2_localnet 
test_wait3 test_wait4 test_xmlrpc test_zipfile test_zlib





Re: [Python-Dev] Opcode cache in ceval loop

2016-02-02 Thread Serhiy Storchaka

On 02.02.16 21:23, Yury Selivanov wrote:

Alright, I modified the code to optimize ALL code objects, and ran unit
tests with the above tests excluded:

-- Max process mem (ru_maxrss) = 131858432
-- Opcode cache number of objects  = 42109
-- Opcode cache total extra mem= 10901106


Thank you for doing these tests. Now the results are more convincing to me.


And asyncio tests:

-- Max process mem (ru_maxrss) = 57081856
-- Opcode cache number of objects  = 4656
-- Opcode cache total extra mem= 1766681



FWIW, here are stats for asyncio with only hot objects being optimized:

-- Max process mem (ru_maxrss) = 54775808
-- Opcode cache number of objects  = 121
-- Opcode cache total extra mem= 43521


Interesting: 57081856 - 54775808 = 2306048, but 1766681 - 43521 = 
1723160. An additional 0.5Mb is lost to fragmentation.




Re: [Python-Dev] Opcode cache in ceval loop

2016-02-02 Thread Serhiy Storchaka

On 02.02.16 21:41, Yury Selivanov wrote:

I can write a ceval.txt file explaining what's going on
in ceval loop, with details on the opcode cache and other
things.  I think it's even better than a PEP, to be honest.


I totally agree.




Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-10 Thread Serhiy Storchaka

On 08.02.16 16:32, Victor Stinner wrote:

On Python 2, it wasn't possible to use Unicode for filenames, many
functions fail badly with Unicode, especially when you mix bytes and
Unicode.


Not even all os functions support Unicode.
See http://bugs.python.org/issue18695.




Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-11 Thread Serhiy Storchaka

On 11.02.16 00:20, Georg Brandl wrote:

**Group 1: liberal (like this PEP)**

* D [2]_
* Perl 5 (although docs say it's more restricted) [3]_
* Rust [4]_
* Swift (although textual description says "between digits") [5]_

**Group 2: only between digits, multiple consecutive underscores**

* C# (open proposal for 7.0) [6]_
* Java [7]_

**Group 3: only between digits, only one underscore**

* Ada [8]_
* Julia (but not in the exponent part of floats) [9]_
* Ruby (docs say "anywhere", in reality only between digits) [10]_


C++ is in this group too.

The documentation of Perl explicitly says that Perl is in this group too 
(23__500 is not legal), so perhaps there is a bug in the Perl 
implementation. And maybe Swift is intended to be in this group as well.


I think we should follow the majority of languages and use the simple 
rule: "only between digits".

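As a toy illustration of the strict rule for decimal integer literals
(the regular expression here is mine, not taken from any implementation):
an underscore is legal only with a digit on both sides of it.

    import re

    # Toy validator for decimal integer literals under the strict rule.
    STRICT_INT = re.compile(r'\d+(?:_\d+)*\Z')

    for s in ('1_000_000', '1000', '1__0', '_1', '1_'):
        print(s, bool(STRICT_INT.match(s)))
    # 1_000_000 True, 1000 True, 1__0 False, _1 False, 1_ False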

I have provided an implementation.



Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-11 Thread Serhiy Storchaka

On 11.02.16 14:14, Georg Brandl wrote:

On 02/11/2016 11:17 AM, Serhiy Storchaka wrote:


**Group 3: only between digits, only one underscore**

* Ada [8]_
* Julia (but not in the exponent part of floats) [9]_
* Ruby (docs say "anywhere", in reality only between digits) [10]_


C++ is in this group too.

The documentation of Perl explicitly says that Perl is in this group too
(23__500 is not legal), so perhaps there is a bug in the Perl
implementation. And maybe Swift is intended to be in this group as well.

I think we should follow the majority of languages and use the simple
rule: "only between digits".

I have provided an implementation.


Thanks for the alternate patch.  I used the two-function approach you took
in ast.c for my latest revision.

I still think that some cases (like two of the examples in the PEP,
0b__ and 1.5_j) are worth having, and therefore a more relaxed
rule is preferable.


Should I write an alternative PEP for the stricter rule?




Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-11 Thread Serhiy Storchaka

On 11.02.16 19:40, Georg Brandl wrote:

On 02/11/2016 06:19 PM, Serhiy Storchaka wrote:


Thanks for the alternate patch.  I used the two-function approach you took
in ast.c for my latest revision.

I still think that some cases (like two of the examples in the PEP,
0b__ and 1.5_j) are worth having, and therefore a more relaxed
rule is preferable.


Should I write an alternative PEP for the stricter rule?


That seems excessive for a minor point.  Let's collect feedback for
a few days, and we can also collect some informal votes.


I suspect that my arguments may be lost otherwise.



Re: [Python-Dev] PEP 515: Underscores in Numeric Literals

2016-02-11 Thread Serhiy Storchaka

On 11.02.16 10:22, Georg Brandl wrote:

Abstract and Rationale
==

This PEP proposes to extend Python's syntax so that underscores can be used in
integral, floating-point and complex number literals.

This is a common feature of other modern languages, and can aid readability of
long literals, or literals whose value should clearly separate into parts, such
as bytes or words in hexadecimal notation.


I have a strong preference for the more strict and simpler rule used by 
most other languages -- "only between two digits". Main arguments:


1. A simple rule is easier to understand, remember and recognize. I care 
not about the complexity of the implementation (there is no large 
difference there), but about cognitive complexity.


2. Most languages use this rule. It is better to follow this informal 
standard than to invent a rule that differs from the rule in every other 
language. This will help programmers that use multiple languages.


I have provided an alternative patch and can provide an alternative PEP 
if it is needed.



The production list for integer literals would therefore look like this::

integer: decimalinteger | octinteger | hexinteger | bininteger
decimalinteger: nonzerodigit (digit | "_")* | "0" ("0" | "_")*
nonzerodigit: "1"..."9"
digit: "0"..."9"
octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")*


octinteger: "0" ("o" | "O") octdigit (["_"] octdigit)*


hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")*


hexinteger: "0" ("x" | "X") hexdigit (["_"] hexdigit)*


bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")*


bininteger: "0" ("b" | "B") bindigit (["_"] bindigit)*


octdigit: "0"..."7"
hexdigit: digit | "a"..."f" | "A"..."F"
bindigit: "0" | "1"

For floating-point and complex literals::

floatnumber: pointfloat | exponentfloat
pointfloat: [intpart] fraction | intpart "."
exponentfloat: (intpart | pointfloat) exponent
intpart: digit (digit | "_")*


intpart: digit (["_"] digit)*


fraction: "." intpart
exponent: ("e" | "E") ["+" | "-"] intpart
imagnumber: (floatnumber | intpart) ("j" | "J")



**Group 1: liberal**

This group is the least homogeneous: the rules vary slightly between languages.
All of them allow trailing underscores.  Some allow underscores after non-digits
like the ``e`` or the sign in exponents.

* D [2]_
* Perl 5 (underscores basically allowed anywhere, although docs say it's more
   restricted) [3]_
* Rust (allows between exponent sign and digits) [4]_
* Swift (although textual description says "between digits") [5]_

**Group 2: only between digits, multiple consecutive underscores**

* C# (open proposal for 7.0) [6]_
* Java [7]_

**Group 3: only between digits, only one underscore**

* Ada [8]_
* Julia (but not in the exponent part of floats) [9]_
* Ruby (docs say "anywhere", in reality only between digits) [10]_


This classification is misleading. The difference between groups 2 and 3 
is smaller than the differences between the languages within group 1. To 
be fair, groups 2 and 3 should be united into one group. C++ should be 
included in this group. Perl 5 and Swift should be either included in 
both groups or excluded from any group, because they have inconsistencies 
between the documentation and the implementation, or between different 
parts of the documentation.


With a correct classification it is obvious which variant is the most 
popular.




[Python-Dev] Py_SETREF again

2016-02-12 Thread Serhiy Storchaka

Sorry for bringing this up again. I was hoping we were done with that.

When discussing the name of the Py_SETREF macro I was supposed to add a 
pair of macros: one for Py_DECREF and one for Py_XDECREF. But I got a lot 
of opinions that we should limit ourselves to only one macro.


On 28.02.14 15:58, Kristján Valur Jónsson wrote:
> Also, for the equivalence to hold there is no separate Py_XSETREF, the X
> behaviour is implied, which I favour.  Enough of this X-proliferation
> already!

On 16.12.15 16:53, Random832 wrote:
> I think "SET" names imply that it's safe if the original
> reference is NULL. This isn't an objection to the names, but if
> it is given one of those names I think it should use Py_XDECREF.

It was my initial intention. But then I got a number of voices for a 
single macro.


On 16.12.15 23:16, Victor Stinner wrote:
> I would prefer a single macro to avoid bugs, I don't think that such
> macro has a critical impact on performances. It's more designed for
> safety, no?

On 17.12.15 08:22, Nick Coghlan wrote:
>> 1. Py_SETREF
>
> +1 if it always uses Py_XDECREF on the previous value (as I'd expect
> this to work even if the previous value was NULL)

There was no (besides mine) clearly expressed vote for two macros.
As a result I replaced both Py_DECREF and Py_XDECREF with the macro 
that always uses Py_XDECREF.


Now Raymond, who was not involved in the previous discussions, has 
expressed the view that we should rename Py_SETREF to Py_XSETREF and add 
a new Py_SETREF that uses Py_DECREF, for use in code that previously used 
Py_DECREF. [1]


We should discuss whether this is needed, and maybe re-discuss the names 
for the macros.


[1] http://bugs.python.org/issue26200



Re: [Python-Dev] Py_SETREF again

2016-02-12 Thread Serhiy Storchaka

On 12.02.16 15:43, Georg Brandl wrote:

On 02/12/2016 10:45 AM, Serhiy Storchaka wrote:

Sorry for bringing this up again. I was hoping we were done with that.

When discussing the name of the Py_SETREF macro I was supposed to add a
pair of macros: one for Py_DECREF and one for Py_XDECREF. But I got a lot
of opinions that we should limit ourselves to only one macro.

There was no (besides mine) clearly expressed vote for two macros.


I would have voted in favor.

Spelling the SETREF out, as Nick proposes, kind of defies the purpose of
the macro: it's not strictly a convenience macro, it helps prevent
refcounting bugs.


As a result I have replaced both Py_DECREF and Py_XDECREF with the macro
that always uses Py_XDECREF.


Can you roughly say which fraction of replacements changed DECREF to an
implicit XDECREF?


Changesets c4e8751ce637, bc7c56a225de, 539ba7267701, b02d256b8827 and 
1118dfcbcc35. A rough estimation:

Py_DECREF   - 62
Py_XDECREF  - 57
Py_CLEAR    - 46

Total statistics of macro usage in the current code:

Py_SETREF    174    2.5%
Py_CLEAR     781   11%
Py_XDECREF  1443   20.5%
Py_DECREF   4631   66%




Re: [Python-Dev] PEP 515: Underscores in Numeric Literals (revision 3)

2016-02-13 Thread Serhiy Storchaka

On 13.02.16 10:48, Georg Brandl wrote:

Following the same rules for placement, underscores will be allowed in
the following constructors:

- ``int()`` (with any base)
- ``float()``
- ``complex()``
- ``Decimal()``


What about float.fromhex()? Should underscores be allowed in it (I think 
not)?





[Python-Dev] pickle and copy discrepancy

2016-03-01 Thread Serhiy Storchaka
The pickle and copy modules use the same protocol. They reconstruct an 
object from the data returned by its __reduce_ex__/__reduce__ method, but 
they do it in different and incompatible ways.


In the general case the result of __reduce__ includes:

1) The class of the object and the arguments for __new__().
2) The state passed to __setstate__() (or a dict of attributes and 
possibly a tuple of __slots__ values).
3) An iterator of list items that should be appended to the object by 
calling extend() or append().
4) An iterator of key-value pairs that should be set on the object by 
calling update() or __setitem__().


The difference is that the copy module sets the object's state before 
adding items and key-value pairs, while the pickle module sets the 
object's state after adding items and key-value pairs. If append() or 
__setitem__() depends on the state of the object, pickling is 
incompatible with copying, as the sketch below shows.

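Here is a minimal sketch of a class that hits this incompatibility
(ScalingDict and its factor attribute are invented for the example):

    import copy
    import pickle

    class ScalingDict(dict):
        # A dict subclass whose __setitem__ depends on instance state.
        def __init__(self, factor=1):
            self.factor = factor

        def __setitem__(self, key, value):
            # The stored value depends on self.factor (the state).
            super().__setitem__(key, value * self.factor)

        def __reduce__(self):
            # (callable, args, state, listitems, dictitems)
            return (ScalingDict, (), {'factor': self.factor},
                    None, iter([('x', 1)]))

    d = ScalingDict(10)
    c = copy.copy(d)                   # copy: state first, then items
    p = pickle.loads(pickle.dumps(d))  # pickle: items first, then state
    print(c['x'], p['x'])              # prints: 10 1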

The behaviour of copy was changed in issue1100562 [1] (see also 
issue1099746 [2]). But this caused a problem with other classes (see 
issue10131 [3]). Changing either pickle or copy will surely break 
existing code, but keeping the current discrepancy makes existing code 
incorrect and makes it too hard to write correct code that works with 
both pickle and copy. For sure, most existing code for which this matters 
is not correct.


The behaviour of the default reducing method for dict/list subclasses is 
not documented [4].


We should choose in which direction to break backward compatibility. The 
behaviour of the copy module looks more natural: it allows most naive 
implementations (as in [2]) to work correctly. The pickle module is more 
widely used, and breaking it can cause more harm. But the order of object 
reconstruction is determined at pickling time, so already-pickled data 
will be reconstructed in the old order; the change will only affect new 
pickles.


[1] http://bugs.python.org/issue1100562
[2] http://bugs.python.org/issue1099746
[3] http://bugs.python.org/issue10131
[4] http://bugs.python.org/issue4712



Re: [Python-Dev] pickle and copy discrepancy

2016-03-02 Thread Serhiy Storchaka

On 01.03.16 18:34, Ethan Furman wrote:

On 03/01/2016 03:14 AM, Serhiy Storchaka wrote:

The difference is that the copy module sets the object's state before adding
items and key-value pairs, while the pickle module sets the object's state
after adding items and key-value pairs. If append() or __setitem__()
depends on the state of the object, pickling is incompatible with
copying.


Aren't there tests to ensure the unpickled/copied objects are identical
to the original object?


We have no pickle/copy tests for every class. And of course we can't 
test third-party classes. But even if we write tests and they fail, what 
do we do? The problem is that for some classes pickle and copy contradict 
each other: an implementation that works with copy doesn't work with 
pickle, or vice versa.



Under which circumstances would they be different?


When append() or __setitem__() depends on the state or changes the state. 
See the examples in issue1099746 and issue10131.




Re: [Python-Dev] What does a double coding cookie mean?

2016-03-15 Thread Serhiy Storchaka

On 15.03.16 22:30, Guido van Rossum wrote:

I came across a file that had two different coding cookies -- one on
the first line and one on the second. CPython uses the first, but mypy
happens to use the second. I couldn't find anything in the spec or
docs ruling out the second interpretation. Does anyone have a
suggestion (apart from following CPython)?

Reference: https://github.com/python/mypy/issues/1281


There is a similar question. If a file has two different coding cookies 
on the same line, which should win? Currently the last cookie wins: in 
the CPython parser, in the tokenize module, in IDLE, and in a number of 
other places. I think this is a bug.





Re: [Python-Dev] What does a double coding cookie mean?

2016-03-16 Thread Serhiy Storchaka

On 16.03.16 08:34, Glenn Linderman wrote:

 From the PEP 263:


More precisely, the first or second line must match the regular
expression "coding[:=]\s*([-\w.]+)". The first group of this
expression is then interpreted as encoding name. If the encoding
is unknown to Python, an error is raised during compilation. There
must not be any Python statement on the line that contains the
encoding declaration.


Clearly the regular expression would only match the first of multiple
cookies on the same line, so the first one should always win... but
there should only be one, from the first PEP quote "a magic comment".


"The first group of this expression" means the first regular expression 
group. Only the part between parenthesis "([-\w.]+)" is interpreted as 
encoding name, not all expression.





Re: [Python-Dev] What does a double coding cookie mean?

2016-03-16 Thread Serhiy Storchaka

On 16.03.16 02:28, Guido van Rossum wrote:

I agree that the spirit of the PEP is to stop at the first coding
cookie found. Would it be okay if I updated the PEP to clarify this?
I'll definitely also update the docs.


Could you please also update the regular expression in PEP 263 to
"^[ \t\v]*#.*?coding[:=][ \t]*([-.a-zA-Z0-9]+)"?

The coding cookie must be in a comment, only the first occurrence on the 
line must be taken into account (here there is a bug in CPython), the 
encoding name must be ASCII, and there must not be any Python statement 
on the line that contains the encoding declaration. [1]


[1] https://bugs.python.org/issue18873
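For illustration, here is how the proposed expression behaves on a line
with two cookies (a minimal sketch; the non-greedy ".*?" is what makes
the first cookie win):

    import re

    COOKIE_RE = re.compile(r'^[ \t\v]*#.*?coding[:=][ \t]*([-.a-zA-Z0-9]+)')

    line = '# -*- coding: latin-1 -*- coding: utf-8'
    m = COOKIE_RE.match(line)
    print(m.group(1))   # latin-1 -- only the first cookie counts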



Re: [Python-Dev] What does a double coding cookie mean?

2016-03-16 Thread Serhiy Storchaka

On 16.03.16 09:46, Glenn Linderman wrote:

On 3/16/2016 12:09 AM, Serhiy Storchaka wrote:

On 16.03.16 08:34, Glenn Linderman wrote:

 From the PEP 263:


More precisely, the first or second line must match the regular
expression "coding[:=]\s*([-\w.]+)". The first group of this
expression is then interpreted as encoding name. If the encoding
is unknown to Python, an error is raised during compilation. There
must not be any Python statement on the line that contains the
encoding declaration.


Clearly the regular expression would only match the first of multiple
cookies on the same line, so the first one should always win... but
there should only be one, from the first PEP quote "a magic comment".


"The first group of this expression" means the first regular
expression group. Only the part between the parentheses, "([-\w.]+)", is
interpreted as the encoding name, not the whole expression.


Sure.  But there is no mention anywhere in the PEP of more than one
being legal: just more than one position for it, EITHER line 1 or line
2. So while the regular expression mentioned is not anchored, to allow
variation in syntax between emacs and vim, "must match the regular
expression" doesn't imply "several times", and when searching for a
regular expression that might not be anchored, one typically expects to
find the first.


Actually "must match the regular expression" is not correct, because 
re.match() implies anchoring at the start. I have proposed a more correct 
regular expression in another branch of this thread.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What does a double coding cookie mean?

2016-03-18 Thread Serhiy Storchaka

On 17.03.16 16:55, Guido van Rossum wrote:

On Thu, Mar 17, 2016 at 5:04 AM, Serhiy Storchaka  wrote:

Should we recommend that everyone use tokenize.detect_encoding()?


Likely. However the interface of tokenize.detect_encoding() is not very
simple.


I just found that out yesterday. You have to give it a readline()
function, which is cumbersome if all you have is a (byte) string and
you don't want to split it on lines just yet. And the readline()
function raises SyntaxError when the encoding isn't right. I wish
there were a lower-level helper that just took a line and told you
what the encoding in it was, if any. Then the rest of the logic can be
handled by the caller (including the logic of trying up to two lines).


The simplest way to detect the encoding of a bytes string:

lines = data.splitlines()
encoding = tokenize.detect_encoding(iter(lines).__next__)[0]

If you don't want to split all the data into lines, the most efficient way 
in Python 3.5 is:


encoding = tokenize.detect_encoding(io.BytesIO(data).readline)[0]

In Python 3.5 io.BytesIO(data) has constant complexity.

In older versions, to detect the encoding without copying the data or 
splitting it all into lines, you should write a line iterator. For example:


def iterlines(data):
start = 0
while True:
end = data.find(b'\n', start) + 1
if not end:
break
yield data[start:end]
start = end
yield data[start:]

encoding = tokenize.detect_encoding(iterlines(data).__next__)[0]

or

it = (m.group() for m in re.finditer(b'.*\n?', data))
encoding = tokenize.detect_encoding(it.__next__)[0]

I don't know which approach is more efficient.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What does a double coding cookie mean?

2016-03-19 Thread Serhiy Storchaka

On 17.03.16 19:23, M.-A. Lemburg wrote:

On 17.03.2016 15:02, Serhiy Storchaka wrote:

On 17.03.16 15:14, M.-A. Lemburg wrote:

On 17.03.2016 01:29, Guido van Rossum wrote:

Should we recommend that everyone use tokenize.detect_encoding()?


I'd prefer a separate utility for this somewhere, since
tokenize.detect_encoding() is not available in Python 2.

I've attached an example implementation with tests, which works
in Python 2.7 and 3.


Sorry, but this code doesn't match the behaviour of the Python interpreter,
nor of other tools. I suggest backporting tokenize.detect_encoding() (but
be aware that the default encoding in Python 2 is ASCII, not UTF-8).


Yes, I got the default for Python 3 wrong. I'll fix that. Thanks
for the note.

What other aspects are different than what Python implements ?


1. If there is a BOM and a coding cookie, the source encoding is "utf-8-sig".

2. If there is a BOM and the coding cookie is not 'utf-8', this is an error.

3. If the first line is not a blank or comment line, the coding cookie is 
not searched for on the second line.

4. The encoding name should be canonicalized. "UTF8", "utf8", "utf_8" and 
"utf-8" are the same encoding (and all are changed to "utf-8-sig" if there 
is a BOM).

5. There is no 400-byte limit. Actually there is a bug in handling long 
lines in the current code, but even with this bug the limit is larger.

6. I made a mistake in the regular expression: I missed the underscore.

tokenize.detect_encoding() is the closest imitation of the behavior of the 
Python interpreter.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What does a double coding cookie mean?

2016-03-19 Thread Serhiy Storchaka

On 17.03.16 02:29, Guido van Rossum wrote:

I've updated the PEP. Please review. I decided not to update the
Unicode howto (the thing is too obscure). Serhiy, you're probably in a
better position to fix the code looking for cookies to pick the first
one if there are two on the same line (or do whatever you think should
be done there).


http://bugs.python.org/issue26581


Should we recommend that everyone use tokenize.detect_encoding()?


Likely. However the interface of tokenize.detect_encoding() is not very 
simple.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What does a double coding cookie mean?

2016-03-19 Thread Serhiy Storchaka

On 17.03.16 21:11, Guido van Rossum wrote:

This will raise SyntaxError if the encoding is unknown. That needs to
be caught in mypy's case and then it needs to get the line number from
the exception.


Good point. The "lineno" and "offset" attributes of SyntaxError are set to 
None by tokenize.detect_encoding() and to 0 by the CPython interpreter. They 
should be set to useful values.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What does a double coding cookie mean?

2016-03-19 Thread Serhiy Storchaka

On 17.03.16 21:11, Guido van Rossum wrote:

I tried this and it was too painful, so now I've just
changed the regex that mypy uses to use non-eager matching
(https://github.com/python/mypy/commit/b291998a46d580df412ed28af1ba1658446b9fe5).


\s* matches newlines.

{0,1}? is the same as ??.
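
For example:

import re

# \s also matches "\n", so \s* can run past the end of the line:
m = re.match(r'coding[:=]\s*(\S+)', 'coding:\nutf-8')
assert m.group(1) == 'utf-8'

# {0,1}? is just the longhand spelling of the lazy optional ??:
assert re.match(r'ab{0,1}?', 'ab').group() == 'a'
assert re.match(r'ab??', 'ab').group() == 'a'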


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What does a double coding cookie mean?

2016-03-19 Thread Serhiy Storchaka

On 16.03.16 08:03, Serhiy Storchaka wrote:

On 15.03.16 22:30, Guido van Rossum wrote:

I came across a file that had two different coding cookies -- one on
the first line and one on the second. CPython uses the first, but mypy
happens to use the second. I couldn't find anything in the spec or
docs ruling out the second interpretation. Does anyone have a
suggestion (apart from following CPython)?

Reference: https://github.com/python/mypy/issues/1281


There is a similar question. If a file has two different coding cookies on
the same line, which should win? Currently the last cookie wins: in the
CPython parser, in the tokenize module, in IDLE, and in a number of other
places. I think this is a bug.


I just tested with Emacs, and it looks like when you specify different 
codings on two different lines, the first coding wins, but when you specify 
different codings on the same line, the last coding wins.

Therefore the current CPython behavior may be correct, and the regular 
expression in PEP 263 should be changed to use greedy repetition.
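
A quick illustration with a line that carries two cookies:

import re

line = "# -*- coding: utf-8 -*- this file does not use coding: latin-1"
# A non-greedy prefix stops at the first cookie:
assert re.match(r'#.*?coding[:=]\s*([-\w.]+)', line).group(1) == 'utf-8'
# A greedy prefix backtracks from the end of the line, so the last
# cookie wins, matching what Emacs does:
assert re.match(r'#.*coding[:=]\s*([-\w.]+)', line).group(1) == 'latin-1'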



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What does a double coding cookie mean?

2016-03-19 Thread Serhiy Storchaka

On 19.03.16 19:36, Glenn Linderman wrote:

On 3/19/2016 8:19 AM, Serhiy Storchaka wrote:

On 16.03.16 08:03, Serhiy Storchaka wrote:
I just tested with Emacs, and it looks like when you specify different
codings on two different lines, the first coding wins, but when you
specify different codings on the same line, the last coding wins.

Therefore the current CPython behavior may be correct, and the regular
expression in PEP 263 should be changed to use greedy repetition.


Just because emacs works that way (and even though I'm an emacs user),
that doesn't mean CPython should act like emacs.


Yes. But current CPython already works that way. The behavior of Emacs is 
an argument that maybe this is not a bug.



(4) there is no benefit to specifying the coding twice on a line, it
only adds confusion, whether in CPython, emacs, or vim.
(4a) Here's an untested line that emacs would interpret as utf-8, and
CPython with the greedy regulare expression would interpret as latin-1,
because emacs looks only between the -*- pair, and CPython ignores that.
   # -*- coding: utf-8 -*- this file does not use coding: latin-1


Since Emacs allows specifying the coding twice on a line, and this can 
be ambiguous, and CPython already detects some ambiguous situations 
(a UTF-8 BOM with a non-UTF-8 coding cookie), it may be worth adding a 
check that the coding is specified only once on a line.
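
A sketch of such a check (a hypothetical helper, not the actual patch):

import re

COOKIE_RE = re.compile(r'coding[:=][ \t]*([-_.a-zA-Z0-9]+)')

def check_cookie_ambiguity(line):
    # Reject a line that declares two different codings.
    names = COOKIE_RE.findall(line)
    if len(set(names)) > 1:
        raise SyntaxError('conflicting coding declarations: %s'
                          % ', '.join(names))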



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What does a double coding cookie mean?

2016-03-19 Thread Serhiy Storchaka

On 17.03.16 15:14, M.-A. Lemburg wrote:

On 17.03.2016 01:29, Guido van Rossum wrote:

Should we recommend that everyone use tokenize.detect_encoding()?


I'd prefer a separate utility for this somewhere, since
tokenize.detect_encoding() is not available in Python 2.

I've attached an example implementation with tests, which works
in Python 2.7 and 3.


Sorry, but this code doesn't match the behaviour of the Python interpreter, 
nor of other tools. I suggest backporting tokenize.detect_encoding() (but 
be aware that the default encoding in Python 2 is ASCII, not UTF-8).



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] cpython: hashtable.h now supports keys of any size

2016-03-23 Thread Serhiy Storchaka

On 21.03.16 23:01, victor.stinner wrote:

https://hg.python.org/cpython/rev/aca4e9af1ca6
changeset:   100640:aca4e9af1ca6
user:Victor Stinner 
date:Mon Mar 21 22:00:58 2016 +0100
summary:
   hashtable.h now supports keys of any size

Issue #26588: hashtable.h now supports keys of any size, not only
sizeof(void*). It allows to support key larger than sizeof(void*), but also to
use less memory for key smaller than sizeof(void*).


If the key size is a compile-time constant, Py_MEMCPY() and memcpy() can be 
optimized into one machine instruction. If it is ht->key_size, it adds 
more overhead. These changes can have a negative performance effect.


It can be eliminated by passing a compile-time constant to 
_Py_HASHTABLE_ENTRY_READ_KEY() etc.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] cpython: hashtable.h now supports keys of any size

2016-03-23 Thread Serhiy Storchaka

On 23.03.16 19:37, Serhiy Storchaka wrote:

On 21.03.16 23:01, victor.stinner wrote:

https://hg.python.org/cpython/rev/aca4e9af1ca6
changeset:   100640:aca4e9af1ca6
user:Victor Stinner 
date:Mon Mar 21 22:00:58 2016 +0100
summary:
   hashtable.h now supports keys of any size

Issue #26588: hashtable.h now supports keys of any size, not only
sizeof(void*). It allows to support key larger than sizeof(void*), but
also to
use less memory for key smaller than sizeof(void*).


If the key size is a compile-time constant, Py_MEMCPY() and memcpy() can be
optimized into one machine instruction. If it is ht->key_size, it adds
more overhead. These changes can have a negative performance effect.

It can be eliminated by passing a compile-time constant to
_Py_HASHTABLE_ENTRY_READ_KEY() etc.



Please ignore this message. It was sent to Python-Dev by mistake.

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Not receiving bug tracker emails

2016-03-29 Thread Serhiy Storchaka

On 30.03.16 03:23, Victor Stinner wrote:

same for me, I'm using gmail with a @gmail.com email.

Victor

2016-03-30 1:30 GMT+02:00 Martin Panter :

For the last ~36 hours I have stopped receiving emails for messages
posted in the bug tracker. Is anyone else having this problem? Has
anything changed recently?


Same for me.

This is very sad, because some comments can go unnoticed and some 
patches can go unreviewed.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] The next major Python version will be Python 8

2016-03-31 Thread Serhiy Storchaka

On 01.04.16 00:40, Victor Stinner wrote:

The PSF is happy to announce that the new Python release will be
Python 8!


Does it combine the base of Python 2 with the power of Python 3?


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Py_SETREF vs. Py_XSETREF

2016-04-03 Thread Serhiy Storchaka
Originally I proposed a pair of macros for safe reference replacement, 
reflecting the duality of Py_DECREF/Py_XDECREF. [1], [2]  One would use 
Py_DECREF and the other Py_XDECREF.


But then I got a number of voices for a single name [3], and not one 
voice (except mine) for the pair of names. Thus in the final patches the 
single name Py_SETREF, which uses Py_XDECREF, is used. Since it adds some 
overhead in comparison with plain Py_DECREF, this macro is not used in 
performance-critical code such as PyDict_SetItem().


Now Raymond says that we should have separate Py_SETREF/Py_XSETREF names 
to avoid any overhead. [4]  And so I'm raising this issue on Python-Dev.


Should we rename Py_SETREF to Py_XSETREF and introduce a new Py_SETREF 
that uses Py_DECREF?


[1] http://comments.gmane.org/gmane.comp.python.devel/145346
[2] http://comments.gmane.org/gmane.comp.python.devel/145974
[3] http://bugs.python.org/issue26200#msg259784
[4] http://bugs.python.org/issue26200

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] When should pathlib stop being provisional?

2016-04-05 Thread Serhiy Storchaka

On 06.04.16 01:41, Brett Cannon wrote:

After a rather extensive discussion on python-ideas about
pathlib.PurePath not inheriting from str, another point that came up was
that the use of pathlib has been rather light. Unfortunately even the
stdlib doesn't really use pathlib because it's currently marked as
provisional (or at least that's why I haven't tried to use it where
possible in importlib).

Do we have a plan of what is required to remove the provisional label
from pathlib?


The behavior of the Path.resolve() method likely should be changed, 
breaking backward compatibility. There is an open issue about this.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] When should pathlib stop being provisional?

2016-04-05 Thread Serhiy Storchaka

On 06.04.16 05:44, Nick Coghlan wrote:

The next challenge would then be to make a list of APIs to be updated
for 3.6 to implicitly accept "rich path" objects via the agreed
convention, with pathlib.PurePath used as a test class:

* open()
* codecs.open() (et al)
* io.*
* os.path.*
* other os functions
* shutil.*
* tempfile.*
* shelve.*
* csv.*


Not sure about os.path.*. The purpose of the os.path module is manipulating 
string paths. From the perspective of pathlib it can be seen as lower level.


Supporting pathlib.Path will complicate and slow down os.path functions 
(they are already more complex and slower than they were in Python 2). 
Since os.path functions are often called several times in a loop, their 
performance is important. On the other hand, some Path methods are more 
efficient than os.path functions, and Path-specialized code at a higher 
level can be preferable.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] When should pathlib stop being provisional?

2016-04-05 Thread Serhiy Storchaka

On 06.04.16 05:44, Nick Coghlan wrote:

The most promising option for that is probably "getattr(path, 'path',
path)", since the "path" attribute is being added to pathlib, and the
given idiom can be readily adopted in Python 2/3 compatible code
(since normal strings and any other object without a "path" attribute
are passed through unchanged). Alternatively, since it's a protocol,
double-underscores on the property name may be appropriate (i.e.
"getattr(path, '__path__', path)")


This was already discussed. The current conclusion is to use the "path" 
attribute. See http://bugs.python.org/issue22570 .
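
For illustration, the pass-through idiom with a stand-in object (FakePath 
here is hypothetical; pathlib provides the real attribute):

def fspath(path):
    # Pass str/bytes through unchanged; unwrap objects that expose
    # the "path" attribute.
    return getattr(path, 'path', path)

class FakePath:
    path = '/tmp/demo'   # stand-in for the attribute pathlib grows

assert fspath('/tmp/demo') == '/tmp/demo'
assert fspath(FakePath()) == '/tmp/demo'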



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] When should pathlib stop being provisional?

2016-04-05 Thread Serhiy Storchaka

On 06.04.16 08:52, Greg Ewing wrote:

Nick Coghlan wrote:

The most promising option for that is probably "getattr(path, 'path',
path)",


Is there something seriously wrong with str(path)?


What if path is None or bytes?
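
str() happily converts both, silently producing bogus paths:

>>> str(None)
'None'
>>> str(b'/tmp/demo')
"b'/tmp/demo'"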

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Question about the current implementation of str

2016-04-09 Thread Serhiy Storchaka

On 09.04.16 10:52, Victor Stinner wrote:

On 9 Apr 2016 at 03:04, Larry Hastings <la...@hastings.org> wrote:
 > Although the str object is immutable from Python's perspective, the C
object itself is mutable.  For example, for dynamically-created strings
the hash field may be lazy-computed and cached inside the object.

Yes, the hash is computed once on demand. It doesn't matter how you
build the string.

 > I was wondering if there were other fields like this.  For example,
are there similar lazy-computed cached objects for the different encoded
versions (utf8 utf16) of the str?

Cached utf8 is only cached when you call the C functions filling this
cache. The Python str.encode('utf8') doesn't fill the cache, but it uses it.

On Windows, there is a cache for wchar_t* which is utf16. This format is
used by all C functions of the Windows API (Python should only use the
Unicode flavor of the Windows API).

I don't recall other caches.

 > What would really help an exhaustive list of the fields of a str
object that may ever change after the object's initial creation.

I don't recall exactly what happens if a cache is created and then the
string is modified. If I recall correctly, the cache is invalidated.


You may remember that some bugs with desynchronized utf8 and wchar_t* 
caches were fixed just a few months ago.



But the hash is used as an heuristic to decide if a string is
"immutable" or not, the refcount is also used by the heuristic. If the
string is immutable, an operation like resize must create a new string.

You can document the PEP 393 in Include/unicodeobject.h.


In the normal case a string object can be mutated only at creation time. 
But CPython uses some tricks that modify already created strings if 
they have no external references and are not interned. For example, 
"a += b" or "a = a + b" can resize the "a" string.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-10 Thread Serhiy Storchaka

On 10.04.16 19:51, Jon Ribbens wrote:

On Sun, Apr 10, 2016 at 02:51:23PM +1000, Nick Coghlan wrote:

On 9 April 2016 at 22:43, Victor Stinner  wrote:

See the pysandbox test suite for a lot of ways to escape a sandbox. CPython has
a list of known code to crash CPython (I don't recall the directory in the
sources), even with the latest version of CPython.


They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers


Thanks. I take your point that sandboxing Python requires CPython to
be free of code execution bugs. However I will note that none of the
crashers in that directory will work inside my experiment (except
"infinite_loop_re.py", which isn't a crasher just a long loop).


Try the following example:

it = iter([1])
for i in range(1000000):
    it = filter(None, it)
next(it)


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-11 Thread Serhiy Storchaka

On 11.04.16 00:53, Jon Ribbens wrote:

Try the following example:

it = iter([1])
for i in range(1000000):
    it = filter(None, it)
next(it)


That does indeed segfault. I guess you should report that as a bug!


There is an old issue that doesn't have an adequate solution. And this is 
only one example; you can get a segfault with other recursive iterators.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [Python-checkins] cpython (2.7): Issue #25910: Fixed more links in the docs.

2016-04-11 Thread Serhiy Storchaka

On 11.04.16 17:41, Tim Golden wrote:

On 11/04/2016 15:38, serhiy.storchaka wrote:

-  `__.
+  `__.


Is there any intended irony in our link to openssl not being via https?

:)


http://bugs.python.org/issue26736


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Bytes path

2016-04-14 Thread Serhiy Storchaka

What types should be accepted as a bytes path?

For now os.path is strict and accepts only bytes and bytes subclasses 
(even bytearray is not accepted) as a bytes path. This is enough for 
working with low-level Posix paths and supporting backward compatibility.

On the other hand, most os functions have been too permissive since 3.3 and 
accept any type that supports the buffer protocol as a bytes path. Even 
such meaningless objects as array('h') are accepted.

Some functions (zipimport.zipimporter() in 3.x, _imp.load_dynamic() in 
3.3+, builtin compile() etc. in 3.4) accept even arbitrary iterables, 
e.g. [116, 101, 115, 116] (see http://bugs.python.org/issue26754).


I think we should accept only bytes (and subclasses). Even bytearray is 
less acceptable since it is mutable and can't be used as a key in caches.
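
A sketch of the strict check I have in mind (the helper name is 
hypothetical):

def _check_bytes_path(path):
    # Accept bytes and bytes subclasses only; reject bytearray,
    # memoryview and other buffer-protocol objects like array('h').
    if not isinstance(path, bytes):
        raise TypeError('path should be bytes, not %s'
                        % type(path).__name__)
    return path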


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Not receiving bug tracker emails

2016-04-14 Thread Serhiy Storchaka

On 13.04.16 07:39, Terry Reedy wrote:

On 4/4/2016 5:05 PM, Terry Reedy wrote:

Since a few days, I am getting bug tracker emails again, in my Inbox.  I
just got a Rietveld review in the Inbox and I believe it went there
directly instead of first to Junk.  Thank you to whoever made the
improvements.


AFAIK David just disabled IPv6 support.

Most bug tracker emails still went to the Spam folder. I have a filter 
for Roundup emails, but there isn't any mark that I can use for 
filtering Rietveld emails.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Most 3.x buildbots are green again, please don't break them and watch them!

2016-04-14 Thread Serhiy Storchaka

On 13.04.16 14:40, Victor Stinner wrote:

Last months, most 3.x buildbots failed randomly. Some of them were
always failing. I spent some time to fix almost all Windows and Linux
buildbots. There were a lot of different issues.


Excellent! Many thanks for doing this. And the new features of regrtest 
look nice.



So please try to not break buildbots again and remind to watch them sometimes:

   
http://buildbot.python.org/all/waterfall?category=3.x.stable&category=3.x.unstable


A desirable but nonexistent feature is to send emails to the authors of 
commits that broke buildbots. How hard would it be to implement this?



Next weeks, I will try to backport some fixes to Python 3.5 (if
needed) to make these buildbots more stable too.

Python 2.7 buildbots are also in a sad state (ex: test_marshal
segfaults on Windows, see issue #25264). But it's not easy to get a
Windows with the right compiler to develop on Python 2.7 on Windows.


What do you think about backporting the recent regrtest to 2.7? The most 
needed features for me are the -m and -G options.



Maybe it's time to move more 3.x buildbots to the "stable" category?
http://buildbot.python.org/all/waterfall?category=3.x.stable


+1


By the way, I don't understand why "AMD64 OpenIndiana 3.x" is
considered as stable since it's failing with multiple issues since
many months and nobody is working on these failures. I suggest to move
this buildbot back to the unstable category.


I think the main cause is the lack of memory on this buildbot. I tried 
to minimize memory consumption and leaks, but some leaks are left, and 
they provoke other test failures and additional resource leaks. It would 
be nice to add a feature for running every test in a separate subprocess. 
This would isolate the effect of failed tests.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Not receiving bug tracker emails

2016-04-14 Thread Serhiy Storchaka

On 14.04.16 13:33, Martin Panter wrote:

On 14 April 2016 at 08:51, Serhiy Storchaka  wrote:

Most bug tracker emails still went in the Spam folder. I have a filter for
Roundap emails, but there is no any mark that I can use for filtering
Rietveld emails.


FWIW I set up the following filter in Gmail for Rietveld reviews:

Matches: http://bugs.python.org/review
Do this: Never send it to Spam

I suspect it helps, but occasionally I think stuff still goes to spam.
(Just don’t tell this secret rule to actual spammers :)


Thank you and Victor for this advice.

But this filter is not quite robust; for example, it will cause this very 
mail to be moved to the folder for Rietveld reviews.

I was going to try a different approach: append "+py" to my address for 
the tracker, as in your address.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 8 updated on whether to break before or after a binary update

2016-04-15 Thread Serhiy Storchaka

On 15.04.16 20:03, Victor Stinner wrote:

Hum.

    if (width == 0
        and height == 0
        and color == 'red'
        and emphasis == 'strong'
        or highlight > 100):
        raise ValueError("sorry, you lose")

Please remove one space to vertically align "and" operators with the
opening parenthesis:

    if (width == 0
       and height == 0
       and color == 'red'
       and emphasis == 'strong'
       or highlight > 100):
        raise ValueError("sorry, you lose")


I would rather *add* spaces to wrapped condition lines.

    if (width == 0
            and height == 0
            and color == 'red'
            and emphasis == 'strong'
            or highlight > 100):
        raise ValueError("sorry, you lose")


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-23 Thread Serhiy Storchaka

On 13.04.16 19:33, Guido van Rossum wrote:

Nice work. I think that for CPython, speed is much more important than
memory use for the code. Disk space is practically free for anything
smaller than a video. :-)


I collected statistics on the use of opcodes with different arguments 
while running the CPython tests. [1]  The estimated size with wordcode is 
1.33 times less than with the current bytecode.


[1] http://comments.gmane.org/gmane.comp.python.ideas/38293

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Inconsistency of PyModule_AddObject()

2016-04-27 Thread Serhiy Storchaka
There are three functions (or at least three documented functions) in the C 
API that "steal" references: PyList_SetItem(), PyTuple_SetItem() and 
PyModule_AddObject(). The first two "steal" references even on failure, 
and this is well-known behaviour. But PyModule_AddObject() "steals" a 
reference only on success. There is nothing in the documentation that 
points this out. Most usages of PyModule_AddObject() in the stdlib don't 
decref the reference to the value on PyModule_AddObject() failure. The 
only exceptions are in the _json, _io, and _tkinter modules. In many cases, 
including examples in the documentation, the success of 
PyModule_AddObject() is not checked either, but that is a different issue.


We can just fix the documentation by adding a note that 
PyModule_AddObject() doesn't steal a reference on failure, and add 
explicit decrefs after PyModule_AddObject() in hundreds of places in the 
code.


But I think it would be better to "fix" PyModule_AddObject() by making 
it decref the reference on failure, as most developers expect. But this is 
a dangerous change, because if the author of some third-party code read 
not only the documentation but also the CPython code, and added an explicit 
decref on PyModule_AddObject() failure, we would get a double decref.


I think that we can resolve this issue with the following steps:

1. Add a new function PyModule_AddObject2(), that steals a reference 
even on failure.


2. Introduce a special macro like PY_SSIZE_T_CLEAN (any suggestions 
about a name?). If it is defined, define PyModule_AddObject as 
PyModule_AddObject2. Define this macro before including Python.h in all 
CPython modules except _json, _io, and _tkinter.


3. Make the old PyModule_AddObject emit a warning about a possible leak 
and a suggestion to define the above macro.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Inconsistency of PyModule_AddObject()

2016-04-27 Thread Serhiy Storchaka

On 27.04.16 10:14, Serhiy Storchaka wrote:

There are three functions (or at least three documented functions) in the C
API that "steal" references: PyList_SetItem(), PyTuple_SetItem() and
PyModule_AddObject(). The first two "steal" references even on failure,
and this is well-known behaviour. But PyModule_AddObject() "steals" a
reference only on success. There is nothing in the documentation that
points this out. Most usages of PyModule_AddObject() in the stdlib don't
decref the reference to the value on PyModule_AddObject() failure. The
only exceptions are in the _json, _io, and _tkinter modules. In many cases,
including examples in the documentation, the success of
PyModule_AddObject() is not checked either, but that is a different issue.

We can just fix the documentation by adding a note that
PyModule_AddObject() doesn't steal a reference on failure, and add
explicit decrefs after PyModule_AddObject() in hundreds of places in the
code.

But I think it would be better to "fix" PyModule_AddObject() by making
it decref the reference on failure, as most developers expect. But this is
a dangerous change, because if the author of some third-party code read
not only the documentation but also the CPython code, and added an explicit
decref on PyModule_AddObject() failure, we would get a double decref.

I think that we can resolve this issue with the following steps:

1. Add a new function PyModule_AddObject2(), that steals a reference
even on failure.

2. Introduce a special macro like PY_SSIZE_T_CLEAN (any suggestions
about a name?). If it is defined, define PyModule_AddObject as
PyModule_AddObject2. Define this macro before including Python.h in all
CPython modules except _json, _io, and _tkinter.

3. Make the old PyModule_AddObject emit a warning about a possible leak
and a suggestion to define the above macro.


Opened an issue: http://bugs.python.org/issue26871 .

The provided patch introduces a new macro PY_MODULE_ADDOBJECT_CLEAN that 
controls the behavior of PyModule_AddObject(), just as PY_SSIZE_T_CLEAN 
controls the behavior of the PyArg_Parse* functions. If the macro is 
defined before including "Python.h", PyModule_AddObject() steals a 
reference unconditionally.  Otherwise it steals a reference only on 
success, and the caller is responsible for decref'ing it on error (the 
current behavior).



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Inconsistency of PyModule_AddObject()

2016-04-27 Thread Serhiy Storchaka

On 27.04.16 15:31, Hrvoje Niksic wrote:

On 04/27/2016 09:14 AM, Serhiy Storchaka wrote:

There are three functions (or at least three documented functions) in the C
API that "steal" references: PyList_SetItem(), PyTuple_SetItem() and
PyModule_AddObject(). The first two "steal" references even on failure,
and this is well-known behaviour. But PyModule_AddObject() "steals" a
reference only on success. There is nothing in the documentation that
points this out.


This inconsistency has caused bugs (or, more fairly, potential leaks)
before, see http://bugs.python.org/issue1782


Glad to hear I'm not the first to face this problem.


Unfortunately, the suggested Python 3 change to PyModule_AddObject was
not accepted.


Too bad. Maybe it happened because of the risk of breaking working 
third-party code.


I propose a gradual path to change PyModule_AddObject.


1. Add a new function PyModule_AddObject2(), that steals a reference
even on failure.


This sounds like a good idea, except the name could be prettier :), e.g.
PyModule_InsertObject. PyModule_AddObject could be deprecated.


I have decided not to introduce a new public function, but just to control 
the behavior of the old function with the macro. This needs minimal changes 
to user code.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Inconsistency of PyModule_AddObject()

2016-04-27 Thread Serhiy Storchaka

On 27.04.16 16:08, Nick Coghlan wrote:

On 27 April 2016 at 17:14, Serhiy Storchaka  wrote:

I think that we can resolve this issue by following steps:

1. Add a new function PyModule_AddObject2(), that steals a reference even on
failure.


I'd suggest a variant on this that more closely matches the
PyList_SetItem and PyTuple_SetItem cases: PyModule_SetAttrString

The first two match the signature of PySequence_SetItem, but steal the
reference instead of making a new one, and the same relationship would
exist between PyObject_SetAttrString and the new
PyModule_SetAttrString.


I think it is better to keep the relation with PyModule_AddIntConstant() 
etc. than with PyObject_SetAttrString.


My patch doesn't introduce a new public function, but changes the behavior 
of the old function. This needs minimal changes to user code, which mostly 
uses PyModule_AddObject() incorrectly (not blaming the authors).



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Inconsistency of PyModule_AddObject()

2016-04-28 Thread Serhiy Storchaka

On 28.04.16 01:24, Case Van Horsen wrote:

On Wed, Apr 27, 2016 at 11:06 AM, Serhiy Storchaka  wrote:

I think it is better to keep the relation with PyModule_AddIntConstant()
etc. than with PyObject_SetAttrString.

My patch doesn't introduce a new public function, but changes the behavior
of the old function. This needs minimal changes to user code, which mostly
uses PyModule_AddObject() incorrectly (not blaming the authors).


How will this impact code that uses PyModule_AddObject() correctly?


No impact, except emitting a deprecation warning at build time. But we 
can remove the deprecation warning and add it in a future release if this 
is annoying.


But are you sure that your code uses PyModule_AddObject() correctly? 
Only two modules in the stdlib (_json and _tkinter) used it correctly. 
Other modules have bugs even in their attempts to use PyModule_AddObject() 
correctly for some operations.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Inconsistency of PyModule_AddObject()

2016-04-28 Thread Serhiy Storchaka

On 28.04.16 11:38, Stefan Krah wrote:

Serhiy Storchaka writes:

No impact, except emitting a deprecation warning at build time. But we
can remove the deprecation warning and add it in a future release if this
is annoying.

But are you sure that your code uses PyModule_AddObject() correctly?
Only two modules in the stdlib (_json and _tkinter) used it correctly.
Other modules have bugs even in their attempts to use PyModule_AddObject()
correctly for some operations.


Could you perhaps stop labeling this as a bug? Usually we are talking
about a *single* "leak" that a) does not even show up in Valgrind and
b) only occurs under severe memory pressure when the OOM-killer is
already waiting.


I'm honestly mystified by your terminology and it's beginning to feel
that you need to justify this patch at all costs.


I say this is a bug because

1. PyModule_AddObject() behavior doesn't match the documentation.

2. Most code that uses PyModule_AddObject() doesn't work as intended. 
Since the behavior of PyModule_AddObject() contradicts the documentation 
and is counterintuitive, we can't blame the authors for this.

I don't say this is a high-impact bug, and I even agree that there is no 
need to fix the second part in maintained releases. But this is a bug, 
unless you propose a different definition of a bug.


What can we do about this?

1. Change the documentation of PyModule_AddObject(). I think this is not 
questionable, and Berker provided a patch in 
http://bugs.python.org/issue26868 .

2. Update the examples in the documentation to correctly handle errors of 
PyModule_AddObject(). This is more questionable, due to case (3c) below 
and because correct error handling code distracts attention from the main 
purpose of the examples.


3. One of the following alternatives:

3a) Fix almost all usages of PyModule_AddObject() in stdlib extension 
modules. This is hundreds of occurrences in over fifty files.

3b) Allow changing the behavior of PyModule_AddObject() to match most 
authors' expectations. This needs only one added line to switch on the new 
behavior in most files.

3c) Ignore the issue. In this case we cannot check the result of 
PyModule_AddObject() at all. But I'm afraid that correctly fixing issues 
with subinterpreters will require us to return to this issue.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Return type of alternative constructors

2016-05-07 Thread Serhiy Storchaka
Some types have alternative constructors -- class methods used to create 
an instance of the class. For example: int.from_bytes(), 
float.fromhex(), dict.fromkeys(), Decimal.from_float().


But what should these methods return for subclasses? Should they return 
an instance of the base class or an instance of the subclass? Almost all 
alternative constructors return an instance of the subclass (the 
exceptions, new in 3.6, are bytes.fromhex() and bytearray.fromhex(), which 
return bare bytes and bytearray). But there is a problem, because this 
allows breaking invariants provided by the main constructor.


For example, there are only two instances of the bool class: False and 
True. But with the from_bytes() method inherited from int you can create 
new boolean values!


   >>> Confusion = bool.from_bytes(b'\2', 'big')
   >>> isinstance(Confusion, bool)
   True
   >>> Confusion == True
   False
   >>> bool(Confusion)
   True
   >>> Confusion
   False
   >>> not Confusion
   False

bool is just the most impressive example; the same problem exists with 
IntEnum and other enums derived from float, Decimal or datetime. [1]


The simplest solution is to return an instance of the base class. But this 
can break code, and in that case we should use a static method 
(like str.maketrans), not a class method.


Should an alternative constructor call the __new__ and __init__ methods? 
They can change their signatures in a derived class. Should it complain if 
__new__ or __init__ were overridden?
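
To illustrate the two patterns (a made-up class, not stdlib code):

class Color:
    def __init__(self, value):
        self.value = value

    @classmethod
    def from_name(cls, name):
        # Returns an instance of the subclass; any invariants that the
        # subclass establishes in its own constructor can be bypassed.
        return cls({'red': 1, 'green': 2}[name])

    @staticmethod
    def from_name_strict(name):
        # The str.maketrans style: always returns the base class.
        return Color({'red': 1, 'green': 2}[name])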


[1] http://bugs.python.org/issue23640

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] file system path protocol PEP

2016-05-11 Thread Serhiy Storchaka

On 11.05.16 19:43, Brett Cannon wrote:

os.path
'''

The various path-manipulation functions of ``os.path`` [#os-path]_
will be updated to accept path objects. For polymorphic functions that
accept both bytes and strings, they will be updated to simply use
code very much similar to
``path.__fspath__() if  hasattr(path, '__fspath__') else path``. This
will allow for their pre-existing type-checking code to continue to
function.


I'm afraid that this will hurt performance. Some os.path functions are 
used in tight loops, they are heavily optimized, and adding support for 
the path protocol can have a visible negative effect.


I suggest first implementing the other changes and then looking at whether 
it is worth adding support for the path protocol in the os.path functions.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] file system path protocol PEP

2016-05-11 Thread Serhiy Storchaka

On 11.05.16 23:51, Ethan Furman wrote:

On 05/11/2016 01:44 PM, Serhiy Storchaka wrote:

I'm afraid that this will hurt performance. Some os.path functions are
used in tight loops, they are heavily optimized, and adding support for
the path protocol can have a visible negative effect.


Do you have an example of os.path functions being used in a tight loop?


posixpath.realpath(), os.walk() and glob.glob() call split() and join() 
for every path component. dirname() and basename() are also often 
called. I don't count functions like islink() and isfile(), since they 
just pass the argument to the underlying stat function and don't need 
conversion.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] file system path protocol PEP

2016-05-11 Thread Serhiy Storchaka

On 12.05.16 01:13, Brett Cannon wrote:



On Wed, 11 May 2016 at 13:45 Serhiy Storchaka <storch...@gmail.com> wrote:

On 11.05.16 19:43, Brett Cannon wrote:
 > os.path
 > '''''''
 >
 > The various path-manipulation functions of ``os.path`` [#os-path]_
 > will be updated to accept path objects. For polymorphic functions
that
 > accept both bytes and strings, they will be updated to simply use
 > code very much similar to
 > ``path.__fspath__() if  hasattr(path, '__fspath__') else path``. This
 > will allow for their pre-existing type-checking code to continue to
 > function.

I'm afraid that this will hurt performance. Some os.path functions are
used in tight loops, they are heavily optimized, and adding support for
the path protocol can have a visible negative effect.


As others have asked, what specific examples do you have that os.path is
used in a tight loop w/o any I/O that would overwhelm the performance?


Most examples do some I/O (like os.lstat()): posixpath.realpath(), 
os.walk(), glob.glob(). But, for example, os.walk() was significantly 
boosted by using os.scandir(); it would be sad to make it slower 
again. os.path is used in a number of files, sometimes in loops, sometimes 
indirectly. It is hard to find all the examples.


Functions such as glob.glob() call split() and join() for every 
component, but they also use string or bytes operations on paths. So 
they need to convert the argument to str or bytes before starting the 
iteration, and always call os.path functions only with str or bytes. An 
additional conversion in every os.path function is redundant. I suppose 
most other high-level functions that manipulate paths in a loop should 
also convert their arguments once at the start and don't need support for 
the path protocol in os.path functions.



I see this whole discussion breaking down into a few groups which
changes what gets done upfront and what might be done farther down the line:

 1. Maximum acceptance: do whatever we can to make all representation of
paths just work, which means making all places working with a path
in the stdlib accept path objects, str, and bytes.
 2. Safely use path objects: __fspath__() is there to signal an object
is a file system path and to get back a lower-level representation
so people stop calling str() on everything, providing some interface
signaling that someone doesn't misuse an object as a path and only
changing path consumptions APIs -- e.g. open() -- and not path
manipulation APIs -- e.g. os.path -- in the stdlib.
 3. It ain't worth it: those that would rather just skip all of this and
drop pathlib from the stdlib.

Ethan and Koos are in group #1 and I'm personally in group #2 but I
tried to compromise somewhat and find a middle ground in the PEP with
the level of changes in the stdlib but being more restrictive with
os.fspath(). If I were doing a pure group #2 PEP I would drop os.path
changes and make os.fspath() do what Ethan and Koos have suggested and
simply pass through without checks whatever path.__fspath__() returned
if the argument wasn't str or bytes.


I'm for adding conversions in C-implemented path-consuming APIs, and maybe 
in high-level path manipulation functions like os.walk(), but leaving the 
low-level API of os.path, fnmatch and glob unchanged.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] file system path protocol PEP

2016-05-12 Thread Serhiy Storchaka

On 12.05.16 10:54, Ethan Furman wrote:

Currently, any of these functions that already take a string have to do
a couple pointer comparisons to make sure they have a string; any of
these functions that take both a string and a bytes have to do a couple
pointer comparisons to make sure they have a string or a bytes;  the
only difference if this PEP is accepted is the fall-back path when those
first checks fail.


This is cheap in C, but os.path functions are implemented in Python. 
They have to make at least one function call (os.fspath(), hasattr() or 
isinstance()), not counting the bytecode for retrieving arguments, 
resolving attributes, comparisons and jumps. Currently os.path functions 
use tricks to avoid such overhead.
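
One such trick is the cheap type dispatch posixpath does internally; a 
minimal sketch modeled on posixpath._get_sep:

def _get_sep(path):
    # One isinstance() check is all it costs to support both str and
    # bytes paths; no per-argument protocol lookup.
    if isinstance(path, bytes):
        return b'/'
    else:
        return '/'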


One more problem is that currently many os.path functions work with 
duck-typed strings (e.g. UserString). Using os.fspath() would likely limit 
the supported types to str, bytes and types that support the path protocol.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Improving the bytecode

2016-06-04 Thread Serhiy Storchaka
Following the conversion of the 8-bit bytecode to 16-bit bytecode 
(wordcode), there are other issues for improving the bytecode.


1. http://bugs.python.org/issue27129
Make the bytecode more 16-bit oriented.

2. http://bugs.python.org/issue27140
Add a new opcode BUILD_CONST_KEY_MAP for building a dict with constant 
keys. This optimizes the common case and is especially helpful for the two 
following issues (creating and calling functions).


3. http://bugs.python.org/issue27095
Simplify MAKE_FUNCTION/MAKE_CLOSURE. Instead of packing three numbers into 
the oparg, the new MAKE_FUNCTION takes already built tuples and dicts from 
the stack. MAKE_FUNCTION and MAKE_CLOSURE are merged into a single opcode.


4. http://bugs.python.org/issue27213
Rework CALL_FUNCTION* opcodes. Replace four existing opcodes with three 
simpler and more efficient opcodes.


5. http://bugs.python.org/issue27127
Rework the for loop implementation.

6. http://bugs.python.org/issue17611
Move the unwinding of the stack for "pseudo exceptions" from the 
interpreter to the compiler.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Improving the bytecode

2016-06-05 Thread Serhiy Storchaka

On 05.06.16 21:24, Raymond Hettinger wrote:

On Jun 4, 2016, at 1:08 AM, Serhiy Storchaka  wrote:
1. http://bugs.python.org/issue27129
Make the bytecode more 16-bit oriented.


I don't think this should be done.  Adding the /2 and *2 just complicates the 
code and messes with my ability to reason about jumps.

With VM opcodes, there is always a tension between being close to 
implementation (what byte address are we jumping to) and being high level (what 
is the word offset).  In this case, I think we should stay with the former 
because they are primarily used in ceval.c and peephole.c which are close to 
the implementation.  At the higher level, there isn't any real benefit either 
(because dis.py already does a nice job of translating the jump targets).

Here is one example of the parts of the diff that cause concern that future 
maintenance will be made more difficult by the change:

-j = blocks[j + i + 2] - blocks[i] - 2;
+j = (blocks[j * 2 + i + 2] - blocks[i] - 2) / 2;

Reviewing the original line only gives me a mild headache while the second one 
really makes me want to avert my eyes ;-)


The /2 and *2 are added just because Victor wants to keep f_lineno 
counting bytes. Please look at my first patch. It doesn't contain /2 and 
*2. It even contains far fewer +2 and -2. For example, the above change 
looks like:


-j = blocks[j + i + 2] - blocks[i] - 2;
+j = blocks[j + i + 1] - blocks[i] - 1;

Doesn't this give you less of a headache?


2. http://bugs.python.org/issue27140
Add new opcode BUILD_CONST_KEY_MAP for building a dict with constant keys. This 
optimize the common case and especially helpful for two following issues 
(creating and calling functions).


This shows promise.

The proposed name BUILD_CONST_KEY_MAP is much more clear than BUILD_MAP_EX.


If you accept this patch, I'll commit it. At least two other issues are 
waiting for it.



5. http://bugs.python.org/issue27127
Rework the for loop implementation.


I'm unclear what problem is being solved by requiring that GET_ITER always 
followed immediately by FOR_ITER.


As I understand it, the purpose was to decrease the number of executed 
opcodes. It looks to me like the existing patch is not acceptable, because 
there is a reason for using two opcodes at the start of a for loop. But I 
think that we can use another optimization here. I'll try to write a patch.
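
For reference, a quick way to see the opcode pair in question (the exact 
output varies between CPython versions):

import dis

# At the loop head, GET_ITER is immediately followed by FOR_ITER,
# which is what the patch tried to exploit:
dis.dis(compile("for x in seq:\n    pass", "<demo>", "exec"))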



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Daily reference leaks (30f33e6a04c1): sum=975756

2013-10-18 Thread Serhiy Storchaka

18.10.13 21:04, Brett Cannon wrote:

That was it, so Antoine and Zach were right about the location. Should
be fixed now.


Thank you Brett.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [Python-checkins] cpython: Issue #19330: Handle the no-docstrings case in tests

2013-10-26 Thread Serhiy Storchaka

26.10.13 15:50, Stefan Krah wrote:

nick.coghlan  wrote:

http://hg.python.org/cpython/rev/a9bbc2d0c1dc
-HAVE_DOCSTRINGS = (check_impl_detail(cpython=False) or
-   sys.platform == 'win32' or
-   sysconfig.get_config_var('WITH_DOC_STRINGS'))
+# Rather than trying to enumerate all the cases where docstrings may be
+# disabled, we just check for that directly
+
+def _check_docstrings():
+"""Just used to check if docstrings are enabled"""
+
+HAVE_DOCSTRINGS = (_check_docstrings.__doc__ is not None)

  requires_docstrings = unittest.skipUnless(HAVE_DOCSTRINGS,


I think that does not detect --without-doc-strings (i.e. the C docstrings are
empty).


Indeed. HAVE_DOCSTRINGS was introduced to skip tests for the C 
docstrings. Python docstring tests are skipped if sys.flags.optimize >= 2.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [Python-checkins] cpython: Issue #19330: Handle the no-docstrings case in tests

2013-10-26 Thread Serhiy Storchaka

26.10.13 20:32, Nick Coghlan wrote:

On 27 October 2013 01:10, Serhiy Storchaka  wrote:

26.10.13 15:50, Stefan Krah wrote:


nick.coghlan  wrote:


http://hg.python.org/cpython/rev/a9bbc2d0c1dc
-HAVE_DOCSTRINGS = (check_impl_detail(cpython=False) or
-   sys.platform == 'win32' or
-   sysconfig.get_config_var('WITH_DOC_STRINGS'))
+# Rather than trying to enumerate all the cases where docstrings may be
+# disabled, we just check for that directly
+
+def _check_docstrings():
+"""Just used to check if docstrings are enabled"""
+
+HAVE_DOCSTRINGS = (_check_docstrings.__doc__ is not None)

   requires_docstrings = unittest.skipUnless(HAVE_DOCSTRINGS,



I think that does not detect --without-doc-strings (i.e. the C docstrings
are
empty).



Indeed. HAVE_DOCSTRINGS was introduced to skip tests for the C docstrings.
Python docstring tests are skipped if sys.flags.optimize >= 2.


That's *extraordinarily* confusing, especially when Serhiy suggested I
use the flag when testing a pure Python module.


I'm sorry for misleading you.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Compiler security

2013-10-31 Thread Serhiy Storchaka

31.10.13 16:56, Benjamin Peterson wrote:

I believe the 5 problems they found in Python were dealt with here
http://bugs.python.org/issue17016


Ah, now I remember the author's name.

http://bugs.python.org/issue18684 contains some other fixes of this kind.




Re: [Python-Dev] Issue 19332: Guard against changing dict during iteration

2013-11-06 Thread Serhiy Storchaka

06.11.13 07:41, Nick Coghlan wrote:

If the benchmark suite indicates there's no measurable speed penalty
then such a patch may be worth reconsidering.


I don't see any significant speed difference even in an artificial,
presumably worst case (a lot of item assignments in a tight loop).

If you have tests which demonstrate a difference, please show them.


I'd be astonished if that
was actually the case, though - the lowest impact approach I can think
of is to check for live iterators when setting a dict entry, and that
still has non-trivial size and speed implications.


Actually we should guard not against changing a dict during iteration,
but against iterating over a dict whose keys were modified after the
iterator's creation (only modifications which change the dict's keys
matter). For this we need only a key-modification counter in the dict
and a copy of it in the iterator (this doesn't increase memory
requirements, however). I suppose Java uses the same technique in HashMap.
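
To make the idea concrete, here is a minimal pure-Python sketch of the
counter technique (the class name and error message are illustrative,
not the proposed C implementation; only __setitem__ and __delitem__ are
shown, a real version would also cover pop(), popitem(), clear(),
update() and the views):

    class GuardedDict(dict):
        """dict whose iterators detect key additions and deletions."""

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self._version = 0  # bumped whenever the set of keys changes

        def __setitem__(self, key, value):
            if key not in self:
                self._version += 1  # replacing a value keeps the version
            super().__setitem__(key, value)

        def __delitem__(self, key):
            super().__delitem__(key)
            self._version += 1

        def __iter__(self):
            version = self._version  # the iterator keeps its own copy
            for key in super().__iter__():
                yield key
                if self._version != version:
                    raise RuntimeError(
                        "dict keys changed during iteration")

For example, an add-plus-delete pair keeps the size unchanged, so the
plain dict iterator does not notice it, but the version check does:

    d = GuardedDict(a=1, b=2, c=3)
    for key in d:
        d[key * 2] = 0   # add one key...
        del d[key]       # ...delete another: RuntimeError on next step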




Re: [Python-Dev] Issue 19332: Guard against changing dict during iteration

2013-11-06 Thread Serhiy Storchaka

06.11.13 21:12, Eric Snow wrote:

Just to clarify, do you mean we should only guard against
modifications to the dict's keys during its iterator's __next__()?
Changes to the values during __next__() would be okay, as would
changes to the keys if they happen outside __next__()?


Dict iteration order can change after adding or deleting a key, so
continuing iteration after that is not well defined. __next__() is
invalid if the dict's keys were modified after the iterator's creation.
This is a common assertion for hash tables in many programming
languages. Raising an exception in such a situation just helps a
programmer to find his mistake. Java raises an exception; in C++ this is
undefined behavior.



Presumably the above restriction also applies to the iterators of the
dict's views.


Yes. The proposed patch has tests for these cases.


OrderedDict would also need to be changed, meaning this counter would
have to be accessible to and incrementable by subclasses.  OrderedDict
makes use of the MappingView classes in collections.abc, so those
would also have be adjusted to check this counter.


Perhaps issue19414 is related. OrderedDict does not need such behavior
because its iteration order is well defined, and we can implement
another reasonable behavior (for example, deleted keys are skipped and
newly added keys are always iterated, because they are added at the end
of the iteration order).



Would MutableMapping also need to be adjusted to accommodate the
counter?  Given that the MappingView classes would rely on the
counter, I'd expect MutableMapping to need some method or variable
that the views can rely on.  How would we make sure custom methods,
particularly __setitem__() and __delitem__(), increment the counter?


MutableMapping implements neither __iter__() nor __setitem__(). The
concrete implementation is responsible for this: either provide a
reliable __iter__() which has predictable behavior after __setitem__(),
or detect such modifications and raise an exception, or just ignore the
problem.


The MappingView classes would get this for free because their iterators
iterate over the mapping itself.



A strictly monotonic counter, right?  Every mutation method of dict
would have to increment this counter.  So would that put a limit
(albeit a high one) on the number of mutations that can be made to a
dict?  Would there be conditions under which we'd reset the counter?
If so, how would existing iterators cope?


It is very unlikely that a program unintentionally makes exactly 2**32
or 2**64 mutations between dict iterations. If that happens, the
programmer should use other methods to find his mistakes.



Because the iterators (and views) already have a pointer to the dict, right?


Currently a PyDictObject contains 3 words, and adding one more word is
unlikely to change the actually allocated size. Dict iterators already
contain a field for the dict's size; it would be replaced by a field for
the dict's counter.




Re: [Python-Dev] Avoid formatting an error message on attribute error

2013-11-07 Thread Serhiy Storchaka

07.11.13 00:32, Victor Stinner wrote:

I'm trying to avoid unnecessary temporary Unicode strings when
possible (see issue #19512). While working on this, I saw that Python
likes preparing an user friendly message to explain why getting an
attribute failed. The problem is that in most cases, the caller
doesn't care about the message: the exception is simply deleted. For
example, hasattr() immediately deletes the AttributeError.

It would be nice to only format the message on demand. The
AttributeError would keep a reference to the type. Keeping a strong
reference to the type might change the behaviour of some applications
in some corner cases. (Holding a reference to the object would be
worse, and the type looks to be preferred over the object to format the
error message.)


See also:

http://bugs.python.org/issue18156 : Add an 'attr' attribute to 
AttributeError

http://bugs.python.org/issue18162 : Add index attribute to IndexError
http://bugs.python.org/issue18163 : Add a 'key' attribute to KeyError
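
A rough sketch of the lazy formatting idea (the class name is
hypothetical; the 'attr' attribute mirrors the issue18156 proposal):

    class LazyAttributeError(AttributeError):
        # Store the ingredients; the message is built only if shown.
        def __init__(self, type_name, attr):
            super().__init__()
            self.type_name = type_name
            self.attr = attr

        def __str__(self):
            return "%r object has no attribute %r" % (self.type_name,
                                                      self.attr)

Code like hasattr() that catches and discards the exception never calls
__str__(), so no message is formatted in that case.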




[Python-Dev] "*zip-bomb" via codecs

2013-11-14 Thread Serhiy Storchaka

It is possible to mount a DoS attack using the fact that the codecs
registry provides access to the gzip and bzip2 decompressors. Someone
can send an HTTP request or email message which specifies "gzip_codec"
or "bzip2_codec" as the content encoding, with a highly compressed gzip
or bzip2 file as the content. A naive server will use the
bytes.decode() method to decompress the content. It is possible to
create small compressed files which require very much time and memory
to decompress. Of course bytes.decode() will fail because the decoder
returns bytes instead of str, but the time and memory are already
wasted.


I have no working example, but I'm sure it would be easy to create one.
I suspect many services are vulnerable to this attack.


A simple solution for this problem is to check that any foreign
encoding is contained in a special set of safe encodings. But every
program should check this explicitly. For a more general solution,
bytes.decode() should reject the encoding *before* starting to decode.
I.e. either all bytes->str decoders should be registered in a separate
registry, or all codecs should have additional attributes which
determine the input and output types.
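
As an illustration of the first, explicit approach, a minimal
whitelist-based guard might look like this (the helper name and the set
of safe encodings are hypothetical):

    import codecs

    # Application-specific whitelist of known text encodings.
    SAFE_ENCODINGS = {"ascii", "utf-8", "iso8859-1", "cp1252"}

    def safe_decode(data, encoding):
        # Normalize first, so "UTF8", "latin-1" etc. match the whitelist.
        name = codecs.lookup(encoding).name
        if name not in SAFE_ENCODINGS:
            raise ValueError("unsafe encoding: %r" % encoding)
        return data.decode(name)

A request declaring "bz2_codec" as its content encoding is then
rejected before any decompression work is done.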




Re: [Python-Dev] Add transform() and untranform() methods

2013-11-14 Thread Serhiy Storchaka

15.11.13 01:03, Nick Coghlan wrote:

We already do this check in the existing convenience methods - it raises
TypeError.


The problem with this check is that it happens *after*
encoding/decoding. This opens the door to DoS attacks (see my last
message).





Re: [Python-Dev] Add transform() and untranform() methods

2013-11-14 Thread Serhiy Storchaka

15.11.13 00:32, Victor Stinner wrote:

And add transform() and untransform() methods to bytes and str types.
In practice, it might be the same codecs registry for all codecs, just with
a new attribute.


If a transform() method is added, I would prefer to have only one
transformation method and to specify the direction in the
transformation name ("bzip2"/"unbzip2").




Re: [Python-Dev] Add transform() and untranform() methods

2013-11-15 Thread Serhiy Storchaka

15.11.13 12:02, Steven D'Aprano wrote:

It would be really good to be able to query the available codecs. For
example, many applications offer an "Encoding" menu, where you can
specify the codec used for text. That's hard in Python, since you
can't retrieve a list of known codecs.


And you can't determine which codecs are binary<->text encodings.
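
Short of a new introspection API, the best one can do today is probe
(a crude heuristic sketch; decoding empty input is cheap, so it avoids
the DoS problem discussed earlier):

    def is_text_encoding(name):
        # bytes.decode() enforces a str result, so bytes->bytes codecs
        # ("hex_codec", "bz2_codec", ...) and str->str codecs ("rot_13")
        # all raise here; only real text encodings survive.
        try:
            b"".decode(name)
        except Exception:
            return False
        return True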



