Re: [Python-Dev] New calling convention to avoid temporarily tuples when calling functions

2016-08-22 Thread Victor Stinner
Hi,

I pushed the most basic implementation of _PyObject_FastCall(); it
doesn't support keyword parameters yet:
https://hg.python.org/cpython/rev/a1a29d20f52d
https://bugs.python.org/issue27128

Then I patched a lot of call sites calling PyObject_Call(),
PyObject_CallObject(), PyEval_CallObject(), etc. with a temporary
tuple. Just one example:

-args = PyTuple_Pack(1, match);
-if (!args) {
-    Py_DECREF(match);
-    goto error;
-}
-item = PyObject_CallObject(filter, args);
-Py_DECREF(args);
+item = _PyObject_FastCall(filter, &match, 1, NULL);

The next step is to support keyword parameters. In fact, they are
already supported in all cases except for Python functions:
https://bugs.python.org/issue27809

Supporting keyword parameters will allow patching much more code to
avoid temporary tuples, but it is also required for a much more
interesting change:
https://bugs.python.org/issue27810
"Add METH_FASTCALL: new calling convention for C functions"

I propose to add a new METH_FASTCALL calling convention. For example,
a function using METH_VARARGS | METH_KEYWORDS:
   PyObject* func(DirEntry *self, PyObject *args, PyObject *kwargs)
becomes:
   PyObject* func(DirEntry *self, PyObject **args, int nargs, PyObject *kwargs)

Later, Argument Clinic will be modified to *generate* code using the
new METH_FASTCALL calling convention. Code written with Argument
Clinic will only need to be regenerated by Argument Clinic to get the
new, faster calling convention (which avoids creating a temporary
tuple for positional arguments).

Victor
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] File system path encoding on Windows

2016-08-22 Thread Stephen J. Turnbull
Nick Coghlan writes:
 > On 21 August 2016 at 06:31, Steve Dower  wrote:

 > > My biggest concern is that it then falls onto users to know how
 > > to start Python with that flag.

The users I'm most worried about belong to organizations where
concerted effort has been made to "purify" the environment so that
they *can* use bytes-oriented code.  That is, getfilesystemencoding()
== getpreferredencoding() == what is actually used throughout the
system.  Such organizations will be able to choose the flag correctly,
and implement it organization-wide, I'm pretty sure.  I doubt that all
will choose UTF-8 at this point in time, though I wish they would.
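
A minimal sketch of how such an organization might check the
"purified" condition on one machine.  The helper name normalized() is
hypothetical; it only canonicalizes codec aliases so that, say, 'UTF8'
and 'utf-8' compare equal.  This checks what Python reports, not what
is actually stored on disk.

```python
import codecs
import locale
import sys


def normalized(name):
    """Return the canonical codec name so that aliases compare equal."""
    return codecs.lookup(name).name


# Encoding Python uses to convert file names between str and bytes.
fs_encoding = normalized(sys.getfilesystemencoding())

# Encoding the locale suggests for text data (False: do not call setlocale).
preferred = normalized(locale.getpreferredencoding(False))

consistent = fs_encoding == preferred
print(f"filesystem: {fs_encoding}, preferred: {preferred}, "
      f"consistent: {consistent}")
```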

 > Not necessarily, as this is one of the areas where commercial
 > redistributors can earn their revenue stream - by deciding that
 > flipping the default behaviour is the right thing to do for *their*
 > user base (which is inevitably only a subset of the overall Python
 > user base).

This assumes that the Python applications are the mission-critical
ones for their clients.  What if they're not?  I think the commercial
redistributors will have to make their decisions on a client-by-client
basis, too.  They may be in a better position to do so, but why buy
trouble?  They'll be quite conservative (unless they're basically the
monopoly IT supplier to a whole organization, but even then they'll
still have to face potential problems from existing files on users'
storage, and perhaps applications that they supply but don't "own").

I have real trouble seeing trying to force UTF-8 as a good idea until
the organization mandates UTF-8. :-(  This really is an organizational
decision, to be implemented with client resources.  We can't do it for
them, redistributors can't do it for them.  It needs to be an option
in Python.

Python itself is already ready for UTF-8, except that on Windows
getfilesystemencoding and getpreferredencoding can't honestly return
'utf-8', AIUI.  I understand that that is exactly what Steve wants to
change, but "honestly" is the rub.  What happens if Python 3.6 is only
part of a bytes-oriented system, receives a filename forced to UTF-8-
encoded bytes, and passes that over a pipe or in shared memory or in a
file to a non-Python-3.6 application that trusts the system defaults?
"Boom!", no?  Is there any experience anywhere in any implementation
language with systems used on Windows that use this approach of
pretending the Windows world is UTF-8?  If not, why is it a good idea
for Python to go first?
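
To make the "Boom!" concrete, here is a small sketch of what the
receiving application would see.  The file name and the choice of
cp1252 as the system default are assumptions made for illustration;
any ANSI code page other than UTF-8 gives the same kind of result.

```python
# A file name as Python would hold it internally.
name = "r\u00e9sum\u00e9.txt"

# What a bytes API would hand out if the file system encoding is forced to UTF-8.
utf8_bytes = name.encode("utf-8")

# What a non-Python application sees if it trusts the (assumed) cp1252 default.
seen_by_legacy_app = utf8_bytes.decode("cp1252")

print(seen_by_legacy_app)  # rÃ©sumÃ©.txt
```

No exception is raised on either side; the name is silently wrong, so
a later lookup of that file by the other application fails.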

 > Making that possible doesn't mean redistributors will actually follow
 > through, but it's an option worth keeping in mind, as while it does
 > increase the ecosystem complexity in the near term (since default
 > behaviour may vary based on how you obtained your Python runtime), in
 > the longer term it can allow for better informed design decisions at
 > the reference interpreter level. (For business process wonks, it's
 > essentially like running through a deliberate divergence/convergence
 > cycle at the level of the entire language ecosystem:
 > http://theagilepirate.net/archives/1392 )

It's worse than "the entire language ecosystem" -- it's your whole
business.[1]  If the proposed change to getfilesystemencoding and file
system APIs creates issues at all, it matters because files on disk,
or other applications that receive bytes from Python, refer to
filenames encoded in the preferred encoding != UTF-8.  It's unlikely
in the extreme that all such files are exclusively used by Python,
which at best means individual users will need to manage encodings
file by file.  At worst, some of the filenames so encoded will be
shared with applications that expect the preferred encoding, and then
you've got a war on your hands.

 > > On the other hand, having code opt-in or out of the new handling
 > > requires changing code (which is presumably not going to happen,
 > > or we wouldn't consider keeping the old behaviour and/or letting
 > > the user control it),

I don't understand why this argument doesn't cut both ways equally.
If you believe that, you should also believe that the same people who
won't change code to opt in also won't use a Python containing fix #1,
and may not install it at all.  Doesn't that matter?

 > I think you'll want to escalate this to a PEP as well

+1 for the reasons Nick gives.  The conclusions of this discussion
need a canonical URL.


Footnotes: 
[1]  I'm assuming that readers are going to associate "language" <-->
"Python".  The blog post Nick refers to is about the whole business.




Re: [Python-Dev] Anyone know Brendan Scott, author of 'Python for Kids'?

2016-08-22 Thread Nick Coghlan
On 22 August 2016 at 07:22, Terry Reedy  wrote:
> So, if you agree with me, please either write Brendan personally if you know
> him, or just leave your own comment on the blog.

Brendan spoke at the inaugural PyCon Australia Education Seminar last
year, so I've contacted him (cc you) to suggest making the fix.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] File system path encoding on Windows

2016-08-22 Thread Steve Dower

On 22Aug2016 0247, Stephen J. Turnbull wrote:

> Nick Coghlan writes:
>  > On 21 August 2016 at 06:31, Steve Dower  wrote:
>
>  > > My biggest concern is that it then falls onto users to know how
>  > > to start Python with that flag.
>
> The users I'm most worried about belong to organizations where
> concerted effort has been made to "purify" the environment so that
> they *can* use bytes-oriented code.  That is, getfilesystemencoding()
> == getpreferredencoding() == what is actually used throughout the
> system.  Such organizations will be able to choose the flag correctly,
> and implement it organization-wide, I'm pretty sure.  I doubt that all
> will choose UTF-8 at this point in time, though I wish they would.


I think that these are also the people who are likely to read a PEP and 
enable an environment variable to preserve the current behaviour. I'm 
more concerned about uncontrolled environments where a library breaks on 
a random user's machine because random user downloaded a file from a 
foreign website.


I don't recall whether I mentioned an environment variable (i.e. 
PYTHONUSELEGACYENCODING or similar) to switch back to mbcs:ignore by 
default, but it was always my intent to have one.



> Python itself is already ready for UTF-8, except that on Windows
> getfilesystemencoding and getpreferredencoding can't honestly return
> 'utf-8', AIUI.  I understand that that is exactly what Steve wants to
> change, but "honestly" is the rub.  What happens if Python 3.6 is only
> part of a bytes-oriented system, receives a filename forced to UTF-8-
> encoded bytes, and passes that over a pipe or in shared memory or in a
> file to a non-Python-3.6 application that trusts the system defaults?
> "Boom!", no?  Is there any experience anywhere in any implementation
> language with systems used on Windows that use this approach of
> pretending the Windows world is UTF-8?  If not, why is it a good idea
> for Python to go first?


The Windows world is Unicode. Mostly represented in UTF-16, but UTF-8 is 
entirely equivalent.


All MSVC users have been pushed towards Unicode for many years. The .NET 
Framework has defaulted to UTF-8 its entire existence. The use of code 
pages has been discouraged for decades. We're not going first :)



>  > > On the other hand, having code opt-in or out of the new handling
>  > > requires changing code (which is presumably not going to happen,
>  > > or we wouldn't consider keeping the old behaviour and/or letting
>  > > the user control it),
>
> I don't understand why this argument doesn't cut both ways equally.
> If you believe that, you should also believe that the same people who
> won't change code to opt in also won't use a Python containing fix #1,
> and may not install it at all.  Doesn't that matter?


People already do this (e.g. Python 2.7). I don't think it should matter 
enough to prevent us from making changes in new versions of Python. 
Otherwise, why would we ever release new versions?


So I guess the question here is: for organisations who have already 
(incorrectly) assumed that the file system encoding and the active code 
page are always the same, have built solid infrastructure around this 
using bytes (including ensuring that their systems never encounter 
external paths in glob/listdir/etc.), are currently using 3.5 and want 
to migrate to 3.6 - is an environment variable to change back to mbcs 
sufficient to meet their needs?


Cheers,
Steve


Re: [Python-Dev] File system path encoding on Windows

2016-08-22 Thread eryk sun
On Mon, Aug 22, 2016 at 3:58 PM, Steve Dower  wrote:
> All MSVC users have been pushed towards Unicode for many years. The .NET
> Framework has defaulted to UTF-8 its entire existence. The use of code pages
> has been discouraged for decades. We're not going first :)

I just wrote a simple function to enumerate the 822 system locales on
my Windows box (using EnumSystemLocalesEx and GetLocaleInfoEx, which
are Unicode-only functions), and 36.7% of them lack an ANSI codepage.
They're Unicode-only locales. UTF-8 is the only way to support these
locales with a bytes API.
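
As an illustration of that last point (not the enumeration code
itself), here is a short sketch.  The Devanagari file name is made up,
and cp1252 stands in for "whatever ANSI code page the process happens
to have"; both are assumptions.

```python
# Hypothetical file name in a script that has no ANSI code page.
text = "\u0939\u093f\u0928\u094d\u0926\u0940.txt"

# A legacy code page cannot represent the name at all.
try:
    text.encode("cp1252")
    ansi_ok = True
except UnicodeEncodeError:
    ansi_ok = False

# UTF-8 can carry the same name through a bytes API without loss.
round_trip = text.encode("utf-8").decode("utf-8")

print(ansi_ok, round_trip == text)  # False True
```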


[Python-Dev] Deprecating invalid escape sequences: review request

2016-08-22 Thread Emanuel Barry
Hello Python-dev,
some time ago I went ahead and implemented a patch to deprecate the invalid
escape sequences (e.g. \c, \g, \h, etc.) in str and bytes literals. The
change itself is pretty straightforward, and shouldn't be hard to review.
The change was split in two patches; one which does the actual deprecation
and adds relevant tests, and the other fixes all invalid escape sequences in
the entire CPython distribution (this resulted in a substantial patch file
of over 2000 lines). I'd like to get this reviewed and merged in time, so
I'm asking here. Thanks in advance!
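
For reviewers who want a quick feel for the behaviour in question,
here is a small sketch.  It compiles a literal containing \c with
warnings recorded.  The backslash is kept in the resulting string;
which warning category is recorded (if any) depends on the Python
version, so the sketch only prints it and does not rely on it.

```python
import warnings

# Raw string, so the compiled source text really is:  s = "\c"
source = r's = "\c"'

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)

value = namespace["s"]

# The unrecognized escape is left in place: a backslash followed by 'c'.
print(len(value), [w.category.__name__ for w in caught])

# Spelling the literal as r"\c" or "\\c" gives the same string with no warning.
same_as_raw = value == r"\c"
```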

http://bugs.python.org/issue27364

-Emanuel


Re: [Python-Dev] File system path encoding on Windows

2016-08-22 Thread Stephen J. Turnbull
Steve Dower writes:

 > The Windows world is Unicode. Mostly represented in UTF-16, but UTF-8 is 
 > entirely equivalent.

Sort of, yes, and not for present purposes.

AFAICS, the Windows world is mostly application/* media that require
substantial developer effort to extract text from; character encoding
is a minor annoyance.  These are not Unicode, even if the embedded
text uses the Unicode coded character set.  When it comes to text/*
media (including file system names), my personal experience is that
non-Unicode encodings are used often, even where they're forbidden
(and, ironically enough, where forbidden only by Windows users[1]).

As far as the UTF in use, I concede your expertise.

UTF-8 is absolutely not equivalent to UTF-16 from the point of view of
developers. Passing it to Windows APIs requires decoding to UTF-16 (or
from a Python developer's point of view, decoding to str and use of
str APIs).  That fact is what got you started on this whole proposal!
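
A sketch of that round trip from the Python side, with a made-up file
name.  The explicit encode to UTF-16-LE only shows the size and shape
of what a *W API receives; in real code the str is passed and the
conversion happens inside Python.

```python
# Bytes as a bytes-oriented program would hold them (hypothetical name).
utf8_path = "na\u00efve.txt".encode("utf-8")

# Step 1: decode to str, which is what Python's own APIs want.
text_path = utf8_path.decode("utf-8")

# Step 2: the wide-character form that a *W API ultimately receives.
wide_path = text_path.encode("utf-16-le")

print(len(text_path), len(utf8_path), len(wide_path))  # 9 10 18
```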

 > All MSVC users have been pushed towards Unicode for many years.

But that "push" is due to the use of UTF-16-based *W APIs and
deprecation of ACP-based *A APIs, right?  The input to *W APIs must be
decoded from all text/* content "out there", including UTF-8 content.
I don't see evidence that users have been pushed toward *UTF-8* in that
statement; they may be decoding from something else.  Unicode != UTF-8
for our purposes!

In any case, I suspect a lot of people use Python to avoid C, and so
existing Python users may not be affected by MSVC "pressure".

 > The .NET Framework has defaulted to UTF-8

Default != enforce, though.  Do you know that almost nobody changes
the default, and that behavior is fairly uniform across different
classes of organization (specifically by language)?  Or did you mean
"enforce"?

 > its entire existence. The use of code pages has been discouraged
 > for decades. We're not going first :)

The fact that a framework, which by definition provides a world-
within-a-world, can insist on UTF-8 from the start is very different
from a generic programming language, which has deliberately provided
multiscript capability for decades.  People who buy in to .NET do so
because the disadvantages (which may include character encoding
conversion at the boundary, or "purification" of the environment to
use only UTF-8) are outweighed by both the individual features of the
framework and their packaging into a consistent whole.  This is
closely related to my idea about "effective monopoly IT providers".

On the contrary, people who use Python may very well have done so to
*avoid* the Unicode strictures of .NET (or at least consider it a
convenience compared to changing user behavior to conform to .NET),
perhaps "localized" to a particular department or use case.  I believe
I've mentioned that my employers' various downloadable database
queries (course catalog, student rosters) are mostly structured as CSV
files, with the option to encode as UTF-8 or Shift-JIS.  I suspect
that is very common in Japanese universities because of the popularity
of Macs among educators, professionals, and students.  I don't know
about business and government, which is very Windows-oriented.  There,
I suspect Shift-JIS is the rule for text/* media, but Excel for data
tables and Word, Powerpoint, and PDF for "rich text" may be used almost
exclusively, so text/* may not be relevant in information interchange.

 > > I don't understand why this argument doesn't cut both ways
 > > equally.  If you believe that, you should also believe that the
 > > same people who won't change code to opt in also won't use a
 > > Python containing fix #1, and may not install it at all.  Doesn't
 > > that matter?
 > 
 > People already do this (e.g. Python 2.7). I don't think it should
 > matter enough to prevent us from making changes in new versions of
 > Python.

Of course it shouldn't, for the generic idea of change.  But the
argument you made is that "if we don't *force* UTF-8, users who won't
change code won't get the benefit of UTF-8".  My rebuttal is that "if
we *do* force UTF-8, those same users lose the benefit of both Python
3.6 and UTF-8."  It matters how many are in that situation, but
unfortunately we'll just have to guess about that.

 > So I guess the question here is: for organisations who have already
 > (incorrectly) assumed that the file system encoding and the active
 > code page are always the same,

Stop bashing the users, please!  This "users are stupid, we know
better" is the attitude that scares me about this proposal.  In the
enterprises I'm talking about, that is an organizational decision, not
an assumption.  (It is likely to be "close enough" to true in some
cases that lack such a policy, too.)  Or are you telling me that
Windows will change the active code page behind the users' backs even
if it's told not to do so?

Now, you can argue that few organizations actually have such policies,
and you may be right.  I don't know, and you don't know.  The damag

Re: [Python-Dev] socket.setsockopt() with optval=NULL

2016-08-22 Thread Benjamin Peterson
Another option would be to add a setalg method with whatever (nice,
pythonic) API we want. Emulating the crummy C-level API needn't be a
goal I think.

On Sun, Aug 21, 2016, at 05:37, Christian Heimes wrote:
> Hi,
> 
> the socket.setsockopt(level, optname, value) method has two calling
> variants. When it is called with a buffer-like object as value, it calls
> the C API function setsockopt() with optval=buffer.buf and
> optlen=buffer.len. When value is an integer, setsockopt() packs it as
> int32 and sends it with optlen=4.
> 
> ---
> # example.py
> import socket
> sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR,
> b'\x00\x00\x00\x00')
> sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
> ---
> 
> $ strace -e setsockopt ./python example.py
> setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [0], 4) = 0
> setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> 
> 
> For AF_ALG (Linux Kernel crypto) I need a way to call the C API function
> setsockopt() with optval=NULL and optlen as any arbitrary number. I have
> been playing with multiple ideas. So far I liked the idea of
> value=(None, int) most.
> 
> setsockopt(socket.SOL_ALG, socket.ALG_SET_AEAD_AUTHSIZE, (None, taglen))
> 
> What do you think?
> 
> Christian
> ___
> Python-Dev mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/benjamin%40python.org
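
The two existing calling variants described in the quoted message can
be checked without strace.  This is a minimal sketch: it assumes a
platform where a C int is 4 bytes, and it reads the option back with
getsockopt() rather than tracing the system call.

```python
import socket
import struct

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    # Integer variant: Python packs the value as a C int for us.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    after_int = sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)

    # Buffer variant: we pack the C int ourselves and pass the bytes.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR,
                    struct.pack("i", 0))
    after_bytes = sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)

# Non-zero after enabling, zero after disabling.
print(after_int != 0, after_bytes == 0)
```

Neither variant can express optval=NULL with an arbitrary optlen,
which is the gap the proposal is about.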