Re: [Python-Dev] Python 2.7 patch levels turning two digit

2014-06-23 Thread Nick Coghlan
On 24 Jun 2014 07:29, "Donald Stufft"  wrote:
>
>
> On Jun 23, 2014, at 5:22 PM, Barry Warsaw  wrote:
>
> > On Jun 23, 2014, at 05:15 PM, Donald Stufft wrote:
> >
> >> Normally when I see someone suggest that switching compilers
> >> in 2.7.x is likely to be less work than releasing a 2.8, it
> >> appears to me they haven’t looked at the impact on the packaging
> >> tooling.
> >
> > Just to be clear, releasing a Python 2.8 has enormous impact outside of
> > just the amount of work to do it.  It's an exceedingly bad idea.
>
> Can you clarify?
>
> Also FWIW I’m not really married to the 2.8 thing, it’s mostly that, on
> Windows, the X.Y release prior to the ABI thing in 3.x _was_ the ABI so all
> the tooling builds on that. So you need to either
>
> 1) Stick with the old Compiler

This is what we're going with. Steve is working on making that more
manageable from the Visual Studio side, and there are some folks in the
numeric/scientific community looking at improving the usability of the
MinGW toolchain for the purpose of building Python 2.7 C extensions.

> 2) Release 2.8

Impractical for the various reasons Barry listed.

> 3) Do all the work to fix all the tooling to cope with the fact that X.Y
> isn’t the ABI on 2.x anymore

Impractical for the various reasons you listed.

> I don’t think a reasonable option is:
>
> 4) Just switch compilers and leave it on someone else’s doorstep to fix
> the entire packaging tool chain to cope.

Agreed. We discussed this option in detail when the Stackless folks asked
about it a while ago, and the conclusion was that the risk of obscure
breakage was just too high.

Cheers,
Nick.

>
> -
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Fix Unicode-disabled build of Python 2.7

2014-06-24 Thread Nick Coghlan
On 25 Jun 2014 07:05, "Ethan Furman"  wrote:
>
> On 06/24/2014 12:54 PM, Ned Deily wrote:
>>
>>
>> Yes, we are committed to maintaining
>> Python 2.7 for multiple years but that doesn't mean we have to fix every
>> open issue or even most open issues.  Any or all of the above costs may
>> apply to any changes we make.  For many of our users, the best
>> maintenance policy for Python 2.7 would be the least change possible.
>
>
> +1
>
> We need to keep 2.7 running, but we don't need to kill ourselves doing
> it.  If a bug has been there for a while, the affected users are probably
> working around it by now.  ;)

Aye, in this case, I'm in the "officially deprecate the feature" camp.
Don't actively try to break it further, just slap a warning in the docs to
say it is no longer a supported configuration.

In my own personal case, I not only wasn't aware that there was still an
option to turn off the Unicode support, but I also wouldn't really class a
build with it turned off as still being Python. As Jim noted, there are
quite a lot of APIs that don't make sense if there's no Unicode type
available.

Cheers,
Nick.

>
> --
> ~Ethan~
>


Re: [Python-Dev] Fix Unicode-disabled build of Python 2.7

2014-06-25 Thread Nick Coghlan
On 26 Jun 2014 01:13, "Serhiy Storchaka"  wrote:
>
> 25.06.14 16:29, Victor Stinner wrote:
>>
>> 2014-06-25 14:58 GMT+02:00 Serhiy Storchaka :
>>>
>>> Other benefit: patches exposed several bugs in code (mainly errors in
>>> backporting from 3.x).
>>
>>
>> Oh, interesting. Do you have examples of such bugs?
>
>
> In posixpath, the branches for unicode and str should be reversed.
> In multiprocessing, .encode('utf-8') is applied to a UTF-8 encoded str (this
> is a unicode string in Python 3). And there is a similar error in at least one
> other place. Tests for bytearray actually test bytes, not bytearray. That
> is what I remember.

OK, *that* sounds like an excellent reason to keep the Unicode disabled
builds functional, and make sure they stay that way with a buildbot: to
help make sure we're not accidentally running afoul of the implicit
interoperability between str and unicode when backporting fixes from Python
3.

Helping to ensure correct handling of str values makes this capability
something of benefit to *all* Python 2 users, not just those that turn off
the Unicode support. It also makes it a potentially useful testing tool
when assessing str/unicode handling in general.
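
To illustrate the multiprocessing bug mentioned above: in Python 2, calling
.encode('utf-8') on a byte string first decodes it implicitly with the ASCII
codec, so the mistake only shows up once non-ASCII data arrives. The sketch
below spells out that implicit step so that it runs on Python 3. The helper
name py2_style_encode is hypothetical and is not taken from any of the patches
under discussion.

```python
def py2_style_encode(data: bytes) -> bytes:
    """Emulate Python 2's str.encode('utf-8') on an already-encoded str.

    Python 2 implicitly decodes the bytes as ASCII before re-encoding them.
    """
    return data.decode("ascii").encode("utf-8")


# Pure ASCII input round-trips silently, which hides the bug in most tests.
assert py2_style_encode(b"spam") == b"spam"

# Already-encoded non-ASCII input fails in the implicit ASCII decode step.
already_encoded = "caf\u00e9".encode("utf-8")
try:
    py2_style_encode(already_encoded)
except UnicodeDecodeError as exc:
    print("double encoding failed:", exc.reason)
```

A build without the unicode type cannot take the implicit decode path at all,
which is presumably why it flushes out this class of error.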

Regards,
Nick.

>
>


Re: [Python-Dev] Binary CPython distribution for Linux

2014-06-27 Thread Nick Coghlan
On 27 Jun 2014 17:33, "Bohuslav Kabrda"  wrote:
>
> It's not true that 2.7 wasn't released until a few weeks ago. It was
> released a few weeks ago as part of RHEL 7, but Red Hat has been shipping Red
> Hat Software Collections (RHSCL) 1.0, which contains Python 2.7 and Python
> 3.3, for almost a year now [1] - RHSCL is installable on RHEL 6; RHSCL 1.1
> (also with 2.7 and 3.3) was released a few weeks ago and is supported on
> RHEL 6 and 7. Also, these collections now have their community rebuilds at
> [2], so you can just download them without needing to talk to Red Hat at
> all. But yeah, these are all RPMs, so you have to be root to install them.

Indeed, while there are still some rough edges, software collections look
like the best approach to doing maintainable system installs of Python
runtimes other than the system Python into Fedora/RHEL/CentOS et al (and I
say that while wearing both my upstream and downstream hats).

Collections solve this problem in a general (rather than CPython specific)
way, since they can be used to get upgraded versions of language runtimes,
databases, web servers, etc, all without risking the stability of the OS
itself. I hope to see someone put together collections for PyPy and PyPy3
as well.

The approaches used for runtime isolation of software collections should
also be applicable to Debian systems, but (as far as I am aware) the
tooling to build them as debs rather than RPMs doesn't exist yet.

> Please don't take this as a criticism of your ideas, I see what you're
> trying to solve. I just think the way you're trying to solve it is
> unachievable or would consume so many community resources that it would
> end up unmaintained and buggy most of the time.

For prebuilt userland installs on Linux, I think "miniconda" is the current
best available approach. It has its challenges (especially around its
handling of security concerns), but it's designed to offer a full cross
platform package management system that makes it well suited to the task of
managing prebuilt language runtimes in user space.

Cheers,
Nick.

>
> --
> Regards,
> Bohuslav "Slavek" Kabrda.
>
> [1] http://developerblog.redhat.com/2013/09/12/rhscl1-ga/
> [2] https://www.softwarecollections.org/en/scls/


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-27 Thread Nick Coghlan
On 28 Jun 2014 01:27, "Jonas Wielicki"  wrote:
>
> On 27.06.2014 00:59, Ben Hoyt wrote:
> > Specifics of proposal
> > =====================
> > [snip] Each ``DirEntry`` object has the following
> > attributes and methods:
> > [snip]
> > Notes on caching
> > ----------------
> >
> > The ``DirEntry`` objects are relatively dumb -- the ``name`` attribute
> > is obviously always cached, and the ``is_X`` and ``lstat`` methods
> > cache their values (immediately on Windows via ``FindNextFile``, and
> > on first use on Linux / OS X via a ``stat`` call) and never refetch
> > from the system.
>
> I find this behaviour a bit misleading: using methods and having them
> return cached results. How much (implementation and/or performance
> and/or memory) overhead would be incurred by using property-like access here?
> I think this would underline the static nature of the data.
>
> This would break the semantics with respect to pathlib, but they’re only
> marginally equal anyways -- and as far as I understand it, pathlib won’t
> cache, so I think this has a fair point here.

Indeed - using properties rather than methods may help emphasise the
deliberate *difference* from pathlib in this case (i.e. value when the
result was retrieved from the OS, rather than the value right now). The
main benefit is that switching from using the DirEntry object to a pathlib
Path will require touching all the places where the performance
characteristics switch from "memory access" to "system call". This benefit
is also the main downside, so I'd actually be OK with either decision on
this one.
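
The difference between "value when retrieved" and "value right now" can be
sketched with os.scandir() as it eventually shipped in Python 3.5, where the
method is named stat() rather than the lstat() discussed here. The file name
is illustrative only, and the with-statement form of scandir() assumes
Python 3.6 or later.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")  # illustrative file name
    with open(path, "w") as f:
        f.write("hello")

    with os.scandir(tmp) as it:
        entry = next(it)

    size_at_first_call = entry.stat().st_size  # result is cached on the entry

    with open(path, "a") as f:
        f.write(" world")

    cached_size = entry.stat().st_size    # still the cached answer
    current_size = os.stat(path).st_size  # fresh system call
    print(size_at_first_call, cached_size, current_size)
```

The second entry.stat() call never touches the filesystem, so it keeps
reporting the old size even though the file has grown.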

Other comments:

* +1 on the general idea
* +1 on scandir() over iterdir, since it *isn't* just an iterator version
of listdir
* -1 on including Windows specific globbing support in the API
* -0 on including cross platform globbing support in the initial iteration
of the API (that could be done later as a separate RFE instead)
* +1 on a new section in the PEP covering rejected design options (calling
it iterdir, returning a 2-tuple instead of a dedicated DirEntry type)
* regarding "why not a 2-tuple", we know from experience that operating
systems evolve and we end up wanting to add additional info to this kind of
API. A dedicated DirEntry type lets us adjust the information returned over
time, without breaking backwards compatibility and without resorting to
ugly hacks like those in some of the time and stat APIs (or even our own
codec info APIs)
* it would be nice to see some relative performance numbers for NFS and
CIFS network shares - the additional network round trips can make excessive
stat calls absolutely brutal from a speed perspective when using a network
drive (that's why the stat caching added to the import system in 3.3
dramatically sped up the case of having network drives on sys.path, and why
I thought AJ had a point when he was complaining about the fact we didn't
expose the dirent data from os.listdir)

Regards,
Nick.

>
> regards,
> jwi


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-28 Thread Nick Coghlan
On 28 June 2014 19:17, Nick Coghlan  wrote:
> Agreed, but walking even a moderately large tree over the network can
> really hammer home the point that this offers a significant
> performance enhancement as the latency of access increases. I've found
> that kind of comparison can be eye-opening for folks that are used to
> only operating on local disks (even spinning disks, let alone SSDs)
> and/or relatively small trees (distro build trees aren't *that* big,
> but they're big enough for this kind of difference in access overhead
> to start getting annoying).

Oops, forgot to add - I agree this isn't a blocking issue for the PEP,
it's definitely only in "nice to have" territory.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-28 Thread Nick Coghlan
On 28 June 2014 16:17, Gregory P. Smith  wrote:
> On Fri, Jun 27, 2014 at 2:58 PM, Nick Coghlan  wrote:
>> * it would be nice to see some relative performance numbers for NFS and
>> CIFS network shares - the additional network round trips can make excessive
>> stat calls absolutely brutal from a speed perspective when using a network
>> drive (that's why the stat caching added to the import system in 3.3
>> dramatically sped up the case of having network drives on sys.path, and why
>> I thought AJ had a point when he was complaining about the fact we didn't
>> expose the dirent data from os.listdir)
>
> fwiw, I wouldn't wait for benchmark numbers.
>
> A needless stat call when you've got the information from an earlier API
> call is already brutal. It is easy to compute from existing ballparks remote
> file server / cloud access: ~100ms, local spinning disk seek+read: ~10ms.
> fetch of stat info cached in memory on file server on the local network:
> ~500us.  You can go down further to local system call overhead which can
> vary wildly but should likely be assumed to be at least 10us.
>
> You don't need a benchmark to tell you that adding needless >= 500us-100ms
> blocking operations to your program is bad. :)

Agreed, but walking even a moderately large tree over the network can
really hammer home the point that this offers a significant
performance enhancement as the latency of access increases. I've found
that kind of comparison can be eye-opening for folks that are used to
only operating on local disks (even spinning disks, let alone SSDs)
and/or relatively small trees (distro build trees aren't *that* big,
but they're big enough for this kind of difference in access overhead
to start getting annoying).
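
As a rough worked example of the ballpark figures quoted above, the following
sketch multiplies each per-call latency by one needless stat call per entry.
The tree size of 10,000 entries is an assumption chosen purely for
illustration, not a measurement.

```python
# Ballpark per-call latencies quoted in the thread, in microseconds.
LATENCY_US = {
    "local system call": 10,
    "stat cached on LAN file server": 500,
    "local spinning disk seek+read": 10_000,
    "remote file server / cloud": 100_000,
}


def extra_seconds(entries: int, latency_us: int) -> float:
    """Total time added by one needless stat call per directory entry."""
    return entries * latency_us / 1_000_000


ENTRIES = 10_000  # assumed tree size, purely illustrative

for label, latency in LATENCY_US.items():
    print(f"{label:32s} {extra_seconds(ENTRIES, latency):8.1f} s")
```

Under these assumptions the same walk costs a fraction of a second in local
system call overhead but several seconds against a file server on the local
network, and far more against a remote one.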

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-28 Thread Nick Coghlan
On 29 June 2014 05:48, Ben Hoyt  wrote:
>>> But the underlying system calls -- ``FindFirstFile`` /
>>> ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X --
>>
>> What about FreeBSD, OpenBSD, NetBSD, Solaris, etc. They don't provide 
>> readdir?
>
> I guess it'd be better to say "Windows" and "Unix-based OSs"
> throughout the PEP? Because all of these (including Mac OS X) are
> Unix-based.

*nix and POSIX-based are the two conventions I use.


>> Crazy idea: would it be possible to "convert" a DirEntry object to a
>> pathlib.Path object without losing the cache? I guess that
>> pathlib.Path expects a full  stat_result object.
>
> The main problem is that pathlib.Path objects explicitly don't cache
> stat info (and Guido doesn't want them to, for good reason I think).
> There's a thread on python-dev about this earlier. I'll add it to a
> "Rejected ideas" section.

The key problem with caches on pathlib.Path objects is that you could
end up with two separate path objects that referred to the same
filesystem location but returned different answers about the
filesystem state because their caches might be stale. DirEntry is
different, as the content is generally *assumed* to be stale
(referring to when the directory was scanned, rather than the current
filesystem state). DirEntry.lstat() on POSIX systems will be an
exception to that general rule (referring to the time of first lookup,
rather than when the directory was scanned, so the answer from lstat()
may be inconsistent with other data stored directly on the DirEntry
object), but one we can probably live with.

More generally, as part of the pathlib PEP review, we figured out that
a *per-object* cache of filesystem state would be an inherently bad
idea, but a string based *process global* cache might make sense for
modules like walkdir (not part of the stdlib - it's an iterator
pipeline based approach to file tree scanning I wrote a while back,
that currently suffers badly from the performance impact of repeated
stat calls at different stages of the pipeline). We realised this was
getting into a space where application and library specific concerns
are likely to start affecting the caching design, though, so the
current status of standard library level stat caching is "it's not
clear if there's an available approach that would be sufficiently
general purpose to be appropriate for inclusion in the standard
library".

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-28 Thread Nick Coghlan
On 29 June 2014 05:55, Ben Hoyt  wrote:
> Re is_dir etc being properties rather than methods:
>
>>> I find this behaviour a bit misleading: using methods and having them
>>> return cached results. How much (implementation and/or performance
>>> and/or memory) overhead would be incurred by using property-like access here?
>>> I think this would underline the static nature of the data.
>>>
>>> This would break the semantics with respect to pathlib, but they're only
>>> marginally equal anyways -- and as far as I understand it, pathlib won't
>>> cache, so I think this has a fair point here.
>>
>> Indeed - using properties rather than methods may help emphasise the
>> deliberate *difference* from pathlib in this case (i.e. value when the
>> result was retrieved from the OS, rather than the value right now). The main
>> benefit is that switching from using the DirEntry object to a pathlib Path
>> will require touching all the places where the performance characteristics
>> switch from "memory access" to "system call". This benefit is also the main
>> downside, so I'd actually be OK with either decision on this one.
>
> The problem with this is that properties "look free", they look just
> like attribute access, so you wouldn't normally handle exceptions when
> accessing them. But .lstat() and .is_dir() etc may do an OS call, so
> if you're needing to be careful with error handling, you may want to
> handle errors on them. Hence I think it's best practice to make them
> functions().
>
> Some of us discussed this on python-dev or python-ideas a while back,
> and I think there was general agreement with what I've stated above
> and therefore they should be methods. But I'll dig up the links and
> add to a Rejected ideas section.

Yes, only the stuff that *never* needs a system call (regardless of
OS) would be a candidate for handling as a property rather than a
method call. Consistency of access would likely trump that idea
anyway, but it would still be worth ensuring that the PEP is clear on
which values are guaranteed to reflect the state at the time of the
directory scanning and which may imply an additional stat call.
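
The error-handling concern can be sketched with os.scandir() as later
released in Python 3.5: the name attribute never needs a system call, while
stat() may need one on POSIX and can therefore fail. The helper safe_size and
the file name are hypothetical.

```python
import os
import tempfile


def safe_size(entry):
    """Return the entry's size in bytes, or None if the stat call fails."""
    try:
        return entry.stat(follow_symlinks=False).st_size
    except OSError:
        return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gone.txt")  # illustrative file name
    open(path, "w").close()

    with os.scandir(tmp) as it:
        entries = list(it)

    os.remove(path)  # the file disappears after the scan

    names = [e.name for e in entries]        # plain attribute, cannot fail
    sizes = [safe_size(e) for e in entries]  # may hit the OS on POSIX
```

On POSIX the stat call happens after the file is gone, so safe_size() returns
None. On Windows the size comes from the directory scan itself, so no error
is raised there.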

>> * it would be nice to see some relative performance numbers for NFS and CIFS
>> network shares - the additional network round trips can make excessive stat
>> calls absolutely brutal from a speed perspective when using a network drive
>> (that's why the stat caching added to the import system in 3.3 dramatically
>> sped up the case of having network drives on sys.path, and why I thought AJ
>> had a point when he was complaining about the fact we didn't expose the
>> dirent data from os.listdir)
>
> Don't know if you saw, but there are actually some benchmarks,
> including one over NFS, on the scandir GitHub page:
>
> https://github.com/benhoyt/scandir#benchmarks

No, I hadn't seen those - may be worth referencing explicitly from the
PEP (and if there's already a reference... oops!)

> os.walk() was 23 times faster with scandir() than the current
> listdir() + stat() implementation on the Windows NFS file system I
> tried. Pretty good speedup!

Ah, nice!

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-29 Thread Nick Coghlan
On 29 June 2014 20:52, Steven D'Aprano  wrote:
> Speaking of caching, is there a way to freshen the cached values?

Switch to a full Path object instead of relying on the cached DirEntry data.

This is what makes me wary of including lstat, even though Windows
offers it without the extra stat call. Caching behaviour is *really*
hard to make intuitive, especially when it *sometimes* returns data
that looks fresh (as it does on first call on POSIX systems).

Regards,
Nick.


-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-29 Thread Nick Coghlan
On 29 June 2014 21:45, Paul Moore  wrote:
> On 29 June 2014 12:08, Nick Coghlan  wrote:
>> This is what makes me wary of including lstat, even though Windows
>> offers it without the extra stat call. Caching behaviour is *really*
>> hard to make intuitive, especially when it *sometimes* returns data
>> that looks fresh (as it does on first call on POSIX systems).
>
> If it matters that much we *could* simply call it cached_lstat(). It's
> ugly, but I really don't like the idea of throwing the information
> away - after all, the fact that we currently throw data away is why
> there's even a need for scandir. Let's not make the same mistake
> again...

Future-proofing is the reason DirEntry is a full fledged class in the
first place, though.

Effectively communicating the behavioural difference between DirEntry
and pathlib.Path is the main thing that makes me nervous about
adhering too closely to the Path API.

To restate the problem and the alternative proposal, these are the
DirEntry methods under discussion:

    is_dir(): like os.path.isdir(), but requires no system calls on at
    least POSIX and Windows
    is_file(): like os.path.isfile(), but requires no system calls on
    at least POSIX and Windows
    is_symlink(): like os.path.islink(), but requires no system calls
    on at least POSIX and Windows
    lstat(): like os.lstat(), but requires no system calls on Windows

For the almost-certain-to-be-cached items, the suggestion is to make
them properties (or just ordinary attributes):

    is_dir
    is_file
    is_symlink

What to do with lstat() is currently less clear, since POSIX directory
scanning doesn't provide that level of detail by default.

The PEP also doesn't currently state whether the is_dir(), is_file()
and is_symlink() results would be updated if a call to lstat()
produced different answers than the original directory scanning
process, which further suggests to me that allowing the stat call to
be delayed on POSIX systems is a potentially problematic and
inherently confusing design. We would have two options:

- update them, meaning calling lstat() may change those results from
being a snapshot of the setting at the time the directory was scanned
- leave them alone, meaning the DirEntry object and the
DirEntry.lstat() result may give different answers

Those both sound ugly to me.

So, here's my alternative proposal: add an "ensure_lstat" flag to
scandir() itself, and don't have *any* methods on DirEntry, only
attributes.

That would make the DirEntry attributes:

    is_dir: boolean, always populated
    is_file: boolean, always populated
    is_symlink: boolean, always populated
    lstat_result: stat result, may be None on POSIX systems if
    ensure_lstat is False

(I'm not particularly sold on "lstat_result" as the name, but "lstat"
reads as a verb to me, so doesn't sound right as an attribute name)

What this would allow:

- by default, scanning is efficient everywhere, but lstat_result may
be None on POSIX systems
- if you always need the lstat result, setting "ensure_lstat" will
trigger the extra system call implicitly
- if you only sometimes need the stat result, you can call os.lstat()
explicitly when the DirEntry lstat_result attribute is None

Most importantly, *regardless of platform*, the cached stat result (if
not None) would reflect the state of the entry at the time the
directory was scanned, rather than at some arbitrary later point in
time when lstat() was first called on the DirEntry object.

There'd still be a slight window of discrepancy (since the filesystem
state may change between reading the directory entry and making the
lstat() call), but this could be effectively eliminated from the
perspective of the Python code by making the result of the lstat()
call authoritative for the whole DirEntry object.
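
A rough sketch of this alternative, built on top of os.scandir() as it later
shipped in Python 3.5, might look like the following. SnapshotEntry and
scan_snapshot are hypothetical names, and as a simplification lstat_result is
None on every platform unless ensure_lstat is set.

```python
import os
import stat


class SnapshotEntry:
    """Hypothetical attribute-only entry (not the DirEntry that later shipped)."""

    __slots__ = ("name", "path", "is_dir", "is_file", "is_symlink", "lstat_result")

    def __init__(self, name, path, is_dir, is_file, is_symlink, lstat_result=None):
        self.name = name
        self.path = path
        self.is_dir = is_dir
        self.is_file = is_file
        self.is_symlink = is_symlink
        self.lstat_result = lstat_result


def scan_snapshot(path, ensure_lstat=False):
    """Yield SnapshotEntry objects; with ensure_lstat, lstat() is authoritative."""
    with os.scandir(path) as it:
        for entry in it:
            if ensure_lstat:
                # One extra system call per entry; its answer overrides the
                # type information from the directory scan.
                st = os.lstat(entry.path)
                mode = st.st_mode
                yield SnapshotEntry(
                    entry.name, entry.path,
                    stat.S_ISDIR(mode), stat.S_ISREG(mode), stat.S_ISLNK(mode),
                    st,
                )
            else:
                yield SnapshotEntry(
                    entry.name, entry.path,
                    entry.is_dir(follow_symlinks=False),
                    entry.is_file(follow_symlinks=False),
                    entry.is_symlink(),
                )
```

In this sketch an os.lstat() failure simply propagates out of the iterator,
so it leaves the error-handling question open.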

Regards,
Nick.

P.S. We'd be generating quite a few of these, so we can use __slots__
to keep the memory overhead to a minimum (that's just a general
comment - it's really irrelevant to the methods-or-attributes
question).


-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-30 Thread Nick Coghlan
On 30 Jun 2014 19:13, "Glenn Linderman"  wrote:
>
>
> If it is, use ensure_lstat=False, and use the proposed (by me) .refresh()
> API to update the data for those that need it.

I'm -1 on a refresh API for DirEntry - just use pathlib in that case.

Cheers,
Nick.

>


Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Nick Coghlan
On 1 Jul 2014 07:31, "Victor Stinner"  wrote:
>
> 2014-07-01 15:00 GMT+02:00 Ben Hoyt :

> > 2) Nick Coghlan's proposal on the previous thread
> > (https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
> > suggesting an ensure_lstat keyword param to scandir if you need the
> > lstat_result value
>
> I don't like this idea because it makes error handling more complex.
> The syntax to catch exceptions on an iterator is verbose (while: try:
> next() except ...).

Actually, we may need to copy the os.walk API and accept an "onerror"
callback as a scandir argument. Regardless of whether or not we have
"ensure_lstat", the iteration step could fail, so I don't believe we can
just transfer the existing approach of catching exceptions from the listdir
call.

> Whereas calling os.lstat(entry.fullname()) is explicit and it's easy
> to surround it with try/except.
>
>
> > .lstat_result being None sometimes (on POSIX),
>
> Don't do that, it's not how Python handles portability. We use hasattr().

That's not true in general - we do either, depending on context.

With the addition of an os.walk style onerror callback, I'm still in favour
of a "get_lstat" flag (tweaked as Ben suggests to always be None unless
requested, so Windows code is less likely to be inadvertently non-portable)

> > would it ever really happen that readdir() would succeed but an
> > os.stat() immediately after would fail?
>
> Yes, it can happen. The filesystem is system-wide and shared by all
> users. The file can be deleted.

We need per-iteration error handling for the readdir call anyway, so I
think an onerror callback is a better option than dropping the ability to
easily obtain full stat information as part of the iteration.
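
One way to picture an os.walk style onerror callback is a wrapper around
os.scandir() as it later shipped in Python 3.5. The wrapper name
scandir_with_onerror is hypothetical, and the sketch only covers failures of
the scandir() call and of the iteration step, not of any later stat call.

```python
import os


def scandir_with_onerror(path, onerror=None):
    """Yield entries from os.scandir(path), reporting OSError to onerror.

    Hypothetical wrapper modelled on the onerror argument of os.walk().
    """
    try:
        it = os.scandir(path)
    except OSError as exc:
        if onerror is not None:
            onerror(exc)
        return
    with it:
        while True:
            try:
                entry = next(it)
            except StopIteration:
                return
            except OSError as exc:
                # The iteration step itself failed (for example in readdir).
                if onerror is not None:
                    onerror(exc)
                return
            yield entry
```

The verbose while/try/next() pattern then lives in one place, and callers can
keep using a plain for loop.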

Cheers,
Nick.


Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Nick Coghlan
On 1 July 2014 08:42, Ben Hoyt  wrote:
>> We need per-iteration error handling for the readdir call anyway, so I think
>> an onerror callback is a better option than dropping the ability to easily
>> obtain full stat information as part of the iteration.
>
> I don't mind the idea of an "onerror" callback, but it's adding
> complexity. Putting aside the question of caching/timing for a second
> and assuming .lstat() as per the current PEP 471, do we really need
> per-iteration error handling for readdir()? When would that actually
> fail in practice?

An NFS mount dropping the connection or a USB key being removed are
the first that come to mind, but I expect there are others. I find
it's generally better to just assume that any system call may fail for
obscure reasons and put the infrastructure in place to deal with it
rather than getting ugly, hard to track down bugs later.

Cheers,
Nick.



-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Network Security Backport Status

2014-07-01 Thread Nick Coghlan
On 1 Jul 2014 11:28, "Alex Gaynor"  wrote:
>
> I've come up with a new approach, which I believe is most likely to be
> successful, but I'll need help to implement it.
>
> The idea is to find the most recent commit which is a parent of both the
> ``2.7`` and ``default`` branches. Then take every single change to an
> ``ssl`` related file on the ``default`` branch, and attempt to replay it on
> the ``2.7`` branch. Require manual review on each commit to make sure it
> compiles, and to ensure it doesn't make any backwards incompatible changes.
>
> I think this provides the most iterative and guided approach to getting
> this done.

Sounds promising, although it may still have some challenges if the SSL
code depends on earlier changes to other code.

> I can do all the work of reviewing each commit, but I need some help from
> a mercurial expert to automate the cherry-picking/rebasing of every single
> commit.
>
> What do folks think? Does this approach make sense? Anyone willing to
> help with the mercurial scripting?

For the Mercurial part, it's probably worth posing that as a Stack Overflow
question:

Given two named branches in http://hg.python.org  (default and 2.7) and 4
files (Python module, C module, tests, docs):
- find the common ancestor
- find all the commits affecting those files on default & graft them to 2.7
(with a chance to test and edit each one first)
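
As a starting point for that question, here is an untested sketch that drives
the hg command line from Python. The four file paths are assumptions about
where the ssl sources live in the CPython tree, the helper names are
hypothetical, and the revset has not been checked against the real
repository.

```python
import subprocess

# Assumed locations of the four ssl-related files in the CPython tree.
SSL_FILES = (
    "Lib/ssl.py",
    "Modules/_ssl.c",
    "Lib/test/test_ssl.py",
    "Doc/library/ssl.rst",
)


def ssl_revset(source="default", target="2.7", files=SSL_FILES):
    """Revset for non-merge changesets since the common ancestor touching files."""
    touched = " or ".join(f"file('{name}')" for name in files)
    return (
        f"ancestor('{source}', '{target}')::'{source}'"
        f" and not merge() and ({touched})"
    )


def candidate_revisions(repo="."):
    """List matching changeset hashes (runs hg, so it needs a local clone)."""
    result = subprocess.run(
        ["hg", "log", "-R", repo, "-r", ssl_revset(), "--template", "{node}\n"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def graft_one(node, repo="."):
    """Graft one changeset onto the checked-out 2.7 head for manual review."""
    subprocess.run(["hg", "graft", "-R", repo, "-r", node], check=True)
```

Calling graft_one() for each hash in turn leaves room to build, test and edit
after every step, and a conflicting graft can be resumed with
hg graft --continue once it has been resolved by hand.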

It's just a better environment for asking & answering that kind of question
:)

Cheers,
Nick.

>
> Cheers,
> Alex
>


Re: [Python-Dev] My summary of the scandir (PEP 471)

2014-07-01 Thread Nick Coghlan
On 1 July 2014 14:20, Paul Moore  wrote:
> On 1 July 2014 14:00, Ben Hoyt  wrote:
>> 2) Nick Coghlan's proposal on the previous thread
>> (https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
>> suggesting an ensure_lstat keyword param to scandir if you need the
>> lstat_result value
>>
>> I would make one small tweak to Nick Coghlan's proposal to make
>> writing cross-platform code easier. Instead of .lstat_result being
>> None sometimes (on POSIX), have it None always unless you specify
>> ensure_lstat=True. (Actually, call it get_lstat=True to kind of make
>> this more obvious.) Per (b) above, this means Windows developers
>> wouldn't accidentally write code which failed on POSIX systems -- it'd
>> fail fast on Windows too if you accessed .lstat_result without
>> specifying get_lstat=True.
>
> This is getting very complicated (at least to me, as a Windows user,
> where the basic idea seems straightforward).
>
> It seems to me that the right model is the standard "thin wrapper
> round the OS feature" that acts as a building block - it's typical of
> the rest of the os module. I think that thin wrapper is needed - even
> if the various bells and whistles are useful, they can be built on top
> of a low-level version (whereas the converse is not the case).
> Typically, such thin wrappers expose POSIX semantics by default, and
> Windows behaviour follows as closely as possible (see for example
> stat, where st_ino makes no sense on Windows, but is present). In this
> case, we're exposing Windows semantics, and POSIX is the one needing
> to fit the model, but the principle is the same.
>
> On that basis, optional attributes (as used in stat results) seem
> entirely sensible.
>
> The documentation for DirEntry could easily be written to parallel
> that of a stat result:
>
> """
> The return value is an object whose attributes correspond to the data
> the OS returns about a directory entry:
>
>   * name - the object's name
>   * full_name - the object's full name (including path)
>   * is_dir - whether the object is a directory
>   * is_file - whether the object is a plain file
>   * is_symlink - whether the object is a symbolic link
>
> On Windows, the following attributes are also available
>
>   * st_size - the size, in bytes, of the object (only meaningful for files)
>   * st_atime - time of last access
>   * st_mtime - time of last write
>   * st_ctime - time of creation
>   * st_file_attributes - Windows file attribute bits (see the
> FILE_ATTRIBUTE_* constants in the stat module)
> """
>
> That's no harder to understand (or to work with) than the equivalent
> stat result. The only difference is that the unavailable attributes
> can be queried on POSIX, there's just a separate system call involved
> (with implications in terms of performance, error handling and
> potential race conditions).
>
> The version of scandir with the ensure_lstat argument is easy to write
> based on one with optional arguments (I'm playing fast and loose with
> adding attributes to DirEntry values here, just for the sake of an
> example - the details are left as an exercise)
>
> def scandir_ensure(path='.', ensure_lstat=False):
>     for entry in os.scandir(path):
>         if ensure_lstat and not hasattr(entry, 'st_size'):
>             stat_data = os.lstat(entry.full_name)
>             entry.st_size = stat_data.st_size
>             entry.st_atime = stat_data.st_atime
>             entry.st_mtime = stat_data.st_mtime
>             entry.st_ctime = stat_data.st_ctime
>             # Ignore file_attributes, as we'll never get here on Windows
>         yield entry
>
> Variations on how you handle errors in the lstat call, etc, can be
> added to taste.
>
> Please, let's stick to a low-level wrapper round the OS API for the
> first iteration of this feature. Enhancements can be added later, when
> real-world usage has proved their value.

+1 from me - especially if this recipe goes in at least the PEP, and
potentially even the docs.

I'm also OK with postponing onerror support for the time being - that
should be straightforward to add later if we decide we need it.
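
For reference, a minimal sketch of the same recipe written against the
os.scandir() that eventually shipped in Python 3.5. Assumptions relative to
this thread: entries there expose .path rather than .full_name, stat data
comes from the cached entry.stat() method rather than optional st_*
attributes, and scandir_with_lstat is just a hypothetical name for the
example:

```python
import os


def scandir_with_lstat(path="."):
    """Yield (entry, lstat_result) pairs for every entry in path.

    entry.stat(follow_symlinks=False) is cached on the entry. On POSIX it
    costs one lstat() call per entry; on Windows the data normally comes
    from the directory listing itself.
    """
    # Using the scandir iterator as a context manager needs Python 3.6+.
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry, entry.stat(follow_symlinks=False)
```

Callers that don't need the stat data can keep using plain os.scandir() and
skip the extra call on POSIX entirely.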

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] buildbot.python.org down again?

2014-07-07 Thread Nick Coghlan
On 7 Jul 2014 10:47, "Guido van Rossum"  wrote:
>
> It would still be nice to know who "the appropriate persons" are. Too much
> of our infrastructure seems to be maintained by house elves or the ITA.

I volunteered to be the board's liaison to the infrastructure team, and
getting more visibility around what the infrastructure *is* and how it's
monitored and supported is going to be part of that. That will serve a
couple of key purposes:

- making the points of escalation clearer if anything breaks or needs
improvement (although "infrastruct...@python.org" is a good default choice)
- making the current "todo" list of the infrastructure team more visible
(both to calibrate resolution time expectations and to provide potential
contributors an idea of what's involved)

Noah has already set up http://status.python.org/ to track service status,
I can see about getting buildbot.python.org added to the list.

Cheers,
Nick.

>
>
> On Sun, Jul 6, 2014 at 11:33 PM, Terry Reedy  wrote:
>>
>> On 7/6/2014 7:54 PM, Ned Deily wrote:
>>>
>>> As of the moment, buildbot.python.org seems to be down again.
>>
>>
>> Several hours later, back up.
>>
>>
>> > Where is the best place to report problems like this?
>>
>> We should have, if not already, an automatic system to detect down
>> servers and report (email) to appropriate persons.
>>
>> --
>> Terry Jan Reedy
>>
>>
>
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>


Re: [Python-Dev] == on object tests identity in 3.x - summary

2014-07-07 Thread Nick Coghlan
On 7 Jul 2014 19:22, "Andreas Maier"  wrote:
>
> Thanks to all who responded.
>
> In absence of class-specific equality test methods, the default
> implementations revert to use the identity (=address) of the object as a
> basis for the test, in both Python 2 and Python 3.
>
> In absence of specific ordering test methods, the default implementations
> revert to use the identity (=address) of the object as a basis for the
> test, in Python 2. In Python 3, an exception is raised in that case.

In Python 2, it orders by type, and only then by id (which happens to be
the address in CPython).

>
> The bottom line of the discussion seems to be that this behavior is
> intentional, and a lot of code depends on it.
>
> We still need to figure out how to document this. Options could be:
>
> 1. We define that the default for the value of an object is its identity.
> That allows us to describe the behavior of the equality test without
> special casing such objects, but it does not work for ordering. Also, I
> have difficulties stating what constitutes that default case, because it
> can really only be explained by referring to the presence or absence of
> the class-specific equality test and ordering test methods.
>
> 2. We don't say anything about the default value of an object, and
> describe the behavior of the equality test and ordering test, which both
> need to cover the case that the object does not have the respective test
> methods.

The behaviour of Python 3's type system is fully covered by equality
defaulting to comparing by identity, and ordering comparisons having to be
defined explicitly. The docs at
https://docs.python.org/3/reference/expressions.html#not-in could likely be
clarified, but they do cover this (they just cover a lot about the builtins
at the same time).

> It seems to me that only option 2 really works.

Indeed, and that's the version already documented.
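
A minimal illustration of those two defaults, assuming Python 3 (the class
and variable names are only for the example):

```python
class Plain:
    """Defines neither __eq__ nor any ordering method."""


a = Plain()
b = Plain()

same_object_equal = (a == a)        # default equality compares by identity
distinct_objects_equal = (a == b)   # two separate instances are never equal

try:
    a < b
    ordering_supported = True
except TypeError:
    # Python 3 has no default ordering; it must be defined explicitly.
    ordering_supported = False
```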

Regards,
Nick.

>
>
> Comments and further options welcome.
>
> Andy
>


Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-09 Thread Nick Coghlan
On 9 Jul 2014 17:14, "Ethan Furman"  wrote:
>
> On 07/09/2014 02:42 PM, Ben Hoyt wrote:
>>>
>>> Okay, so using that [no platform specific] logic we should head over to
>>> the os module and remove:
>>>
>>> ctermid, getenv, getegid...
>>>
>>> Okay, I'm tired of typing, but that list is not even half-way through
>>> the os page, and those are all methods or attributes that are not
>>> available on either Windows or Unix or some flavors of Unix.
>>
>> True, is this really the precedent we want to *aim for*? listdir() is
>> cross-platform,
>
> and listdir has serious performance issues, which is why you developed
> scandir.
>
>>> Oh, and all those [snipped] upper-case attributes?  Yup, documented.
>>> And when we don't document it ourselves we often refer readers to their
>>> system documentation because Python does not, in fact, return exactly
>>> the same results on all platforms -- particularly when calling into the
>>> OS.
>>
>> But again, why a worse, less cross-platform API when a simple,
>> cross-platform one is a method call away?
>
> For the same reason we don't use code that makes threaded behavior
> better, but kills the single thread application.
>
> If the programmer would rather have consistency on all platforms rather
> than performance on the one being used, `info='lstat'` is the option to
> use.
>
> I like the 'onerror' API better primarily because it gives a single point
> to deal with the errors.  This has at least a couple advantages:
>
>   - less duplication of code: in the tree_size example, the error
>     handling is duplicated twice
>
>   - readability: with the error handling in a separate routine, one
>     does not have to jump around the try/except blocks looking for
>     what happens if there are no errors

The "onerror" approach can also deal with readdir failing, which the PEP
currently glosses over.

I'm somewhat inclined towards the current approach in the PEP, but I'd like
to see an explanation of two aspects:

1. How a scandir variant with an 'onerror' option could be implemented
given the version in the PEP

2. How the existing scandir module handles the 'onerror' parameter to its
directory walking function

Regards,
Nick.

>
> --
> ~Ethan~
>


Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-10 Thread Nick Coghlan
On 10 Jul 2014 03:39, "Victor Stinner"  wrote:
>
> 2014-07-10 9:04 GMT+02:00 Paul Moore :
> > As someone (Tim?) pointed out later in the thread,
> > FindFirstFile/FindNextFile doesn't follow symlinks by default (and nor
> > do the dirent entries on Unix). So whether or not it's "natural", the
> > "free" functionality provided by the OS is that of lstat, not that of
> > stat. Presumably because it's possible to build symlink-following code
> > on top of non-following code, but not the other way around.
>
> DirEntry methods will remain free (no syscall) for directories and
> regular files. One extra syscall will be needed only for symlinks,
> which are more rare than other file types (for example, you wrote
> "Windows typically makes little use of symlinks").

The info we want for scandir is that of the *link itself*. That makes it
easy to implement things like the "followlinks" flag of os.walk. The *far
end* of the link isn't relevant at this level.

The docs just need to be clear that DirEntry objects always match lstat(),
never stat().

Cheers,
Nick.

>
> See my pseudo-code:
> https://mail.python.org/pipermail/python-dev/2014-July/135439.html
>
> On Windows, _lstat and _stat attributes will be filled directly in the
> constructor for regular files and directories.
>
> Victor


Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-12 Thread Nick Coghlan
On 11 Jul 2014 12:46, "Ben Hoyt"  wrote:
>
> [replying to python-dev this time]
>
> >> The "onerror" approach can also deal with readdir failing, which the
> >>  PEP currently glosses over.
> >
> >
> > Do we want this, though?  I can see an error handler for individual
> > entries, but if one of the *dir commands fails that would seem to be
> > fairly catastrophic.
>
> Very much agreed that this isn't necessary for just readdir/FindNext
> errors. We've never had this level of detail before -- if listdir()
> fails half way through (very unlikely) it just bombs with OSError and
> you get no entries at all.
>
> If you really really want this (again very unlikely), you can always
> call next() directly and catch OSError around that call.

Agreed - I think the PEP should point this out explicitly, and show that
the approach it takes offers a lot of flexibility in error handling from
"just let it fail", to a single try/except around the whole loop, to
try/except just around the operations that might call lstat(), to
try/except around the individual iteration steps.
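
A rough sketch of the finest-grained end of that range, written against the
os.scandir() API as it later shipped in Python 3.5/3.6. The tree_size name
and the onerror callback are assumptions made for this example, not
something the PEP defines:

```python
import os


def tree_size(path, onerror=None):
    """Sum the lstat() sizes of all non-directory entries below path.

    If onerror is given, it is called with each OSError; otherwise the
    error propagates to the caller.
    """
    def report(exc):
        if onerror is None:
            raise exc
        onerror(exc)

    total = 0
    try:
        entries = os.scandir(path)      # opening the directory may fail
    except OSError as exc:
        report(exc)
        return total
    with entries:
        while True:
            try:
                entry = next(entries)   # an individual iteration step may fail
            except StopIteration:
                break
            except OSError as exc:
                report(exc)
                break
            try:
                # These calls may need an lstat() on some platforms.
                if entry.is_dir(follow_symlinks=False):
                    total += tree_size(entry.path, onerror)
                else:
                    total += entry.stat(follow_symlinks=False).st_size
            except OSError as exc:
                report(exc)
    return total
```

Dropping the inner try blocks and wrapping the whole call in one try/except
gives the coarser variants with no change to scandir itself.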

os.walk remains the higher level API that most code should be using, and
that has to retain the current listdir based behaviour (any error = ignore
all entries in that directory) for backwards compatibility reasons.

Cheers,
Nick.

>
> -Ben


Re: [Python-Dev] PEP 3121, 384 Refactoring Issues

2014-07-12 Thread Nick Coghlan
On 10 Jul 2014 19:59, "Alexander Belopolsky" wrote:
>
>
> On Thu, Jul 10, 2014 at 2:59 PM, Mark Lawrence wrote:
>>
>> I'm just curious as to why there are 54 open issues after both of these
>> PEPs have been accepted and 384 is listed as finished.  Did we hit some
>> unforeseen technical problem which stalled development?
>
>
> I tried to bring some sanity to that effort by opening a "meta issue":
>
> http://bugs.python.org/issue15787
>
> My enthusiasm, however, vanished after I reviewed the refactoring for the
> datetime module:
>
> http://bugs.python.org/issue15390
>
> My main objections are to following PEP 384 (Stable ABI) within stdlib
> modules.  I see little benefit for the stdlib (which is shipped fresh with
> every new version of Python) from following those guidelines.

The main downside of "do as we say, not as we do" in this case is that we
miss out on the feedback loop of what the stable ABI is like to *use*. For
example, the docs problem, where it's hard to tell whether an API is part
of the stable ABI or not, or the performance problem Stefan mentions.

Using the stable ABI for standard library extensions also serves to
decouple them further from the internal details of the CPython runtime,
making it more likely they will be able to run correctly on alternative
interpreters (since emulating or otherwise supporting the limited API is
easier than supporting the whole thing).

Cheers,
Nick.

>


Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?

2014-07-13 Thread Nick Coghlan
On 13 July 2014 11:34, Chris Angelico  wrote:
> On Mon, Jul 14, 2014 at 2:23 AM, Steven D'Aprano  wrote:
>>> We will see
>>> later that that happens. Further, when comparing float NaNs of the same
>>> identity, the list implementation forgot to special-case NaNs. Which
>>> would be a bug, IMHO.
>>
>> "Forgot"? I don't think the behaviour of list comparisons is an
>> accident.
>
> Well, "forgot" is on the basis that the identity check is intended to
> be a mere optimization. If that were the case ("don't actually call
> __eq__ when you reckon it'll return True"), then yes, failing to
> special-case NaN would be a bug. But since it's intended behaviour, as
> explained further down, it's not a bug and not the result of
> forgetfulness.

Right, it's not a mere optimisation - it's the only way to get
containers to behave sensibly. Otherwise we'd end up with nonsense
like:

>>> x = float("nan")
>>> x in [x]
False

That currently returns True because of the identity check - it would
return False if we delegated the check to float.__eq__ because the
defined IEEE754 behaviour for NaNs breaks the mathematical definition
of an equivalence relation as a transitive, reflexive and symmetric
relation. (It breaks it for *good reasons*, but we still need to
figure out a way of dealing with the impedance mismatch between the
definition of floats and the definition of container invariants like
"assert x in [x]")

The current approach means that the lack of reflexivity of NaN's stays
confined to floats and similar types - it doesn't leak out and infect
the behaviour of the container types.
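
A small sketch of how that confinement looks in practice (CPython, Python 3;
the variable names are only for the example):

```python
x = float("nan")

equal_to_itself = (x == x)                # False: IEEE 754 NaN is not reflexive
found_by_identity = (x in [x])            # True: same object, identity check wins
other_nan_found = (float("nan") in [x])   # False: different object, and == fails
occurrences = [x, x].count(x)             # 2: count() applies the same identity-then-== rule
```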

What we've never figured out is a good place to *document* it. I
thought there was an open bug for that, but I can't find it right now.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?

2014-07-13 Thread Nick Coghlan
On 13 July 2014 13:16, Chris Angelico  wrote:
> On Mon, Jul 14, 2014 at 4:11 AM, Nick Coghlan  wrote:
>> What we've never figured out is a good place to *document* it. I
>> thought there was an open bug for that, but I can't find it right now.
>
> Yeah. The Py3 docs explain why "x in [x]" is True, but I haven't found
> a parallel explanation of sequence equality.

We might need to expand the tables of sequence operations to cover
equality and inequality checks - those are currently missing.
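
As an illustration of what such a table entry would have to describe,
sequence comparison applies the same identity-then-equality rule to each
pair of items (CPython, Python 3; names are only for the example):

```python
x = float("nan")
y = float("nan")   # a second, distinct NaN object

same_objects_equal = ([x] == [x])       # True: items are the same object
tuples_too = ((x,) == (x,))             # True: tuples follow the same rule
distinct_nans_equal = ([x] == [y])      # False: different objects, and x == y is False
lists_unequal = ([x] != [x])            # False: consistent with == above
```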

Cheers,
Nick.

>
> ChrisA



-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Another case for frozendict

2014-07-13 Thread Nick Coghlan
On 13 July 2014 13:43,   wrote:
> In its previous form, the PEP seemed more focused on some false
> optimization capabilities of a read-only type, rather than as here, the
> far more interesting hashability properties. It might warrant a fresh
> PEP to more thoroughly investigate this angle.

Right, the use case would be "frozendict as a simple alternative to a
full class definition", but even less structured than namedtuple in
that the keys may vary as well. That difference means that frozendict
applies more cleanly to semi-structured data manipulated as
dictionaries (think stuff deserialised from JSON) than namedtuple
does.
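
To make the hashability angle concrete, here is a minimal sketch of such a
type. FrozenDict is a hypothetical class written for this example, not an
existing stdlib API, and it assumes every value is itself hashable:

```python
from collections.abc import Mapping


class FrozenDict(Mapping):
    """Read-only mapping that can be hashed (hypothetical sketch)."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        self._hash = None

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        # Order-independent hash over the items, computed once and cached.
        if self._hash is None:
            self._hash = hash(frozenset(self._data.items()))
        return self._hash

    def __repr__(self):
        return "FrozenDict({!r})".format(self._data)


# Records with varying keys, as might come out of deserialised JSON,
# can then be de-duplicated in a set or used as dictionary keys.
records = {FrozenDict(id=1, tag="a"), FrozenDict(tag="a", id=1), FrozenDict(id=2)}
```

The Mapping base class supplies equality and the read-only accessors; the
private _data dict is still reachable, so this is immutable by convention
only.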

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-13 Thread Nick Coghlan
On 13 Jul 2014 20:54, "Tim Delaney"  wrote:
>
> On 14 July 2014 10:33, Ben Hoyt  wrote:
>>
>>
>>
>> If we go with Victor's link-following .is_dir() and .is_file(), then
>> we probably need to add his suggestion of a follow_symlinks=False
>> parameter (defaults to True). Either that or you have to say
>> "stat.S_ISDIR(entry.lstat().st_mode)" instead, which is a little bit
>> less nice.
>
>
> Absolutely agreed that follow_symlinks is the way to go, disagree on the
> default value.
>
>>
>> Given the above arguments for symlink-following is_dir()/is_file()
>> methods (have I missed any, Victor?), what do others think?
>
>
> I would say whichever way you go, someone will assume the opposite. IMO
> not following symlinks by default is safer. If you follow symlinks by
> default then everyone has the following issues:
>
> 1. Crossing filesystems (including onto network filesystems);
>
> 2. Recursive directory structures (symlink to a parent directory);
>
> 3. Symlinks to non-existent files/directories;
>
> 4. Symlink to an absolutely huge directory somewhere else (very annoying
>    if you just wanted to do a directory sizer ...).
>
> If follow_symlinks=False by default, only those who opt-in have to deal
> with the above.

Or the ever popular symlink to "." (or a directory higher in the tree).

I think os.walk() is a good source of inspiration here: call the flag
"followlinks" and default it to False.
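
A small sketch of what that default buys in os.walk() today. The
visited_dirs helper and the directory names are made up for the example,
and creating the symlink may need extra privileges on Windows:

```python
import os
import tempfile


def visited_dirs(top, followlinks=False):
    """Return the directories os.walk() descends into, relative to top."""
    return sorted(os.path.relpath(root, top)
                  for root, dirs, files in os.walk(top, followlinks=followlinks))


with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    # "loop" points back at the top directory, creating a cycle.
    os.symlink(top, os.path.join(top, "sub", "loop"), target_is_directory=True)
    # With the default followlinks=False the link is listed but not entered.
    visited = visited_dirs(top)
```

Passing followlinks=True on the same tree would keep descending through
"loop" until the OS refuses to resolve the path, which is the failure mode
an opt-in flag avoids.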

Cheers,
Nick.

>
> Tim Delaney
>


Re: [Python-Dev] PEP 3121, 384 Refactoring Issues

2014-07-15 Thread Nick Coghlan
On 14 Jul 2014 11:41, "Brett Cannon"  wrote:
>
>
> I agree for PEP 3121 which is the initialization/finalization work. The
> stable ABI is not necessary. So maybe we should re-examine the patches and
> accept the bits that clean up init/finalization and leave out any
> ABI-related changes.

Martin's right about improving the subinterpreter support - every type
declaration we move from a static struct to the dynamic type creation API
is one that isn't shared between subinterpreters any more.

That argument is potentially valid even for *builtin* modules and types,
not just those in extension modules.

Cheers,
Nick.


Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-15 Thread Nick Coghlan
On 14 Jul 2014 22:50, "Ben Hoyt"  wrote:
>
> In light of that, I propose I update the PEP to basically follow
> Victor's model of is_X() and stat() following symlinks by default, and
> allowing you to specify follow_symlinks=False if you want something
> other than that.
>
> Victor had one other question:
>
> > What happens to name and full_name with followlinks=True?
> > Do they contain the name in the directory (name of the symlink)
> > or name of the linked file?
>
> I would say they should contain the name and full path of the entry --
> the symlink, NOT the linked file. They kind of have to, right,
> otherwise they'd have to be method calls that potentially call the
> system.

It would be worth explicitly pointing out "os.readlink(entry.full_name)" in
the docs as the way to get the target of a symlink entry.

Alternatively, it may be worth including a readlink() method directly on
the entry objects. (That can easily be added later though, so no need for
it in the initial proposal).
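
Something along these lines, sketched against the API as it later shipped
(where the attribute ended up being called entry.path rather than
entry.full_name; symlink_targets is a hypothetical helper name):

```python
import os


def symlink_targets(path="."):
    """Map the name of each symlink in path to the target stored in the link."""
    targets = {}
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_symlink():
                # os.readlink() returns the link's contents without resolving them.
                targets[entry.name] = os.readlink(entry.path)
    return targets
```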

>
> In any case, here's the modified proposal:
>
> scandir(path='.') -> generator of DirEntry objects, which have:
>
> * name: name as per listdir()
> * full_name: full path name (not necessarily absolute), equivalent of
> os.path.join(path, entry.name)
> * is_dir(follow_symlinks=True): like os.path.isdir(entry.full_name),
> but free in most cases; cached per entry
> * is_file(follow_symlinks=True): like os.path.isfile(entry.full_name),
> but free in most cases; cached per entry
> * is_symlink(): like os.path.islink(), but free in most cases; cached
> per entry
> * stat(follow_symlinks=True): like os.stat(entry.full_name,
> follow_symlinks=follow_symlinks); cached per entry
>
> The above may not be quite perfect, but it's good, and I think there's
> been enough bike-shedding on the API. :-)

+1, sounds good to me (and I like having the caching guarantees listed -
helps make it clear how DirEntry differs from pathlib.Path)

Cheers,
Nick.

>
> So please speak now or forever hold your peace. :-) I intend to update
> the PEP to reflect this and make a few other clarifications in the
> next few days.
>
> -Ben


Re: [Python-Dev] cStringIO vs io.BytesIO

2014-07-16 Thread Nick Coghlan
On 16 Jul 2014 20:00,  wrote:
> On Thu, Jul 17, 2014 at 03:44:23AM +0600, Mikhail Korobov wrote:
> > I believe this problem affects tornado (
> > https://github.com/tornadoweb/tornado/
> > Do you know if there is a workaround? Maybe there is some stdlib part
> > that I'm missing, or a module on PyPI? It is not that hard to write your
> > own wrapper that won't do copies (or to port [c]StringIO to 3.x), but I
> > wonder if there is an existing solution or plans to fix it in Python
> > itself - this BytesIO use case looks quite important.
>
> Regarding a fix, the problem seems mostly that the StringI/StringO
> specializations were removed, and the new implementation is basically
> just a StringO.

Right, I don't think there's a major philosophy change here, just a missing
optimisation that could be restored in 3.5.

Cheers,
Nick.


Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-21 Thread Nick Coghlan
On 22 Jul 2014 02:46, "Steve Dower"  wrote:
>
> Personally I'd make it a string subclass and put one-shot properties on
> it (i.e. call/cache stat() on first access where we don't already know the
> answer), which I think is close enough to where it's landed that I'm happy.
> (As far as bikeshedding goes, I prefer "_DirEntry" and no docs :) )

+1 for "_DirEntry" as the name in the implementation, and documenting its
behaviour under "scandir" rather than as a standalone object.

Only -0 for full documentation as a standalone class, though.

Cheers,
Nick.

>
> Cheers,
> Steve


Re: [Python-Dev] PEP 471 "scandir" accepted

2014-07-22 Thread Nick Coghlan
On 23 Jul 2014 02:18, "Victor Stinner"  wrote:
>
> 2014-07-22 17:52 GMT+02:00 Ben Hoyt :
> > However, given that we have to support this for listdir() anyway, I
> > think it's worth reconsidering whether scandir()'s directory argument
> > can be an integer FD. Given that listdir() already supports it, it
> > will almost certainly be asked for later anyway for someone who's
> > porting some listdir code that uses an FD. Thoughts, Victor?
>
> Please focus on what was accepted in the PEP. We should first test
> os.scandir(). In a few months, with better feedback, we can consider
> extending os.scandir() to support a file descriptor. There are
> different issues which should be discussed and decided to implement it
> (ex: handle the lifetime of the directory file descriptor).

As Victor suggests, getting the core version working and incorporated first
is a good way to go. Future enhancements (like accepting a file descriptor)
and refactorings (like eliminating the code duplication with listdir) don't
need to (and hence shouldn't) go into the initial patch.

Cheers,
Nick.

>
> Victor


Re: [Python-Dev] [PEP466] SSLSockets, and sockets, _socketobjects oh my!

2014-07-22 Thread Nick Coghlan
On 23 Jul 2014 07:28, "Antoine Pitrou"  wrote:
>
> Le 22/07/2014 17:03, Alex Gaynor a écrit :
>
>>
>> The question is:
>>
>> a) Should we backport weak referencing _socket.sockets (changing the
>> structure of the module seems overly invasive, albeit completely
>> backwards compatible)?
>> b) Does anyone know why weak references are used in the first place? The
>> commit message just alludes to fixing a leak with no reference to an
>> issue.
>
>
> Because:
> - the SSLSocket has a strong reference to the ssl object (self._sslobj)
> - self._sslobj having a strong reference to the SSLSocket would mean both
>   would only get destroyed on a GC collection
>
> I assume that's what "leak" means here :-)
>
> As for 2.x, I don't see why you couldn't just continue using a strong
> reference.

As Antoine says, if the cycle already exists in Python 2 (and it sounds
like it does), we can just skip backporting the weak reference change.
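
For anyone following along, a toy sketch of the difference Antoine
describes. Outer and Inner are hypothetical stand-ins for SSLSocket and its
_sslobj, not the real classes, and the behaviour shown relies on CPython's
reference counting:

```python
import gc
import weakref


class Outer:
    """Stand-in for the wrapper that owns the inner object."""


class Inner:
    """Stand-in for the inner object that points back at its owner."""


def freed_without_gc(use_weakref):
    """Return True if dropping the last external references frees the pair
    immediately, i.e. without needing a cyclic GC pass."""
    gc_was_enabled = gc.isenabled()
    gc.disable()                       # rule out an automatic collection
    try:
        outer = Outer()
        inner = Inner()
        outer.inner = inner            # strong reference, wrapper -> inner
        # Back-reference: weak (no cycle) or strong (reference cycle).
        inner.owner = weakref.ref(outer) if use_weakref else outer
        probe = weakref.ref(inner)     # lets us observe when inner dies
        del outer, inner
        freed = probe() is None
        gc.collect()                   # clean up the cycle in the strong case
        return freed
    finally:
        if gc_was_enabled:
            gc.enable()
```

With the weak back-reference the pair goes away as soon as the last outside
reference is dropped; with the strong one it lingers until the collector
runs, which is the situation that already exists in the 2.x code.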

I'll also give the Fedora Python list a heads up about your repo to see if
anyone there can help you with the backport.

Cheers,
Nick.

>
> Regards
>
> Antoine.
>
>
>


Re: [Python-Dev] [PEP466] SSLSockets, and sockets, _socketobjects oh my!

2014-07-23 Thread Nick Coghlan
On 24 Jul 2014 05:37, "Alex Gaynor"  wrote:
>
> Possible solutions are:
>
> * Pass the SSLObject *in addition* to the _socket.socket object to the C
>   code. This generates some additional divergence from the Python3 code,
>   but is probably basically straightforward.
> * Try to refactor the socket code in the same way as Python3 did, so we
>   can pass *only* the SSLObject here. This is some nasty scope creep for
>   PEP466, but would make the overall _ssl.c diff smaller.
> * Some super sweet and simple thing I haven't thought of yet.
>
> Thoughts?

Wearing my "risk management" hat, option 1 sounds significantly more
appealing than option 2 :)

Cheers,
Nick.


Re: [Python-Dev] Does Zip Importer have to be Special?

2014-07-24 Thread Nick Coghlan
On 25 Jul 2014 03:51, "Brett Cannon"  wrote:

> The problem with all of this is you are essentially asking for a hook to
> let your code access the interpreter state before it is fully
> initialized. Zipimport and the various bits of code that get loaded during
> startup are special since they are coded to avoid touching anything that
> isn't ready to be used. So if we expose something that allows access prior
> to full initialization it would have to be documented as having no
> guarantees of interpreter state, etc. so we are not held to some API that
> makes future improvements difficult.

Note that this is *exactly* the problem PEP 432 is designed to handle:
separating the configuration of the core interpreter from the configuration
of the operating system interfaces, so the latter can run relatively
normally (at least compared to today).

As you say, though, it's a niche problem compared to something like
packaging, which is why it got bumped down my personal priority list. I
haven't even got back to the first preparatory step I identified which is
to separate out our main functions to a separate "Programs" directory so
it's easier to distinguish "embeds Python" sections of the code from the
more typical "is part of Python" and "extends Python" code.

> IOW allowing for easy patching of Python is probably the best option I
> can think of.

Yeah, that sounds reasonable - IIRC, Christian ended up going with a
similar "make it patch friendly" approach for the hashing changes, rather
than going overboard with configuration options.

Cheers,
Nick.


Re: [Python-Dev] Does Zip Importer have to be Special?

2014-07-25 Thread Nick Coghlan
On 25 July 2014 19:33, Phil Thompson  wrote:
> On 24/07/2014 9:42 pm, Nick Coghlan wrote:
>> As you say, though it's a niche problem compared to something like
>> packaging, which is why it got bumped down my personal priority list. I
>> haven't even got back to the first preparatory step I identified which is
>> to separate out our main functions to a separate "Programs" directory so
>> it's easier to distinguish "embeds Python" sections of the code from the
>> more typical "is part of Python" and "extends Python" code.
>
>
> Is there any way for somebody you don't trust :) to be able to help move it
> forward?

This thread prompted me to finally commit one of the smaller pieces of
preparatory refactoring, moving the 3 applications we have that embed
the CPython runtime out to a separate directory:
http://bugs.python.org/issue18093 (that seems like a trivial change,
but I found it made a surprisingly big difference when trying to keep
the various moving parts of the initialisation sequence straight in my
head)

The other preparatory refactoring would be to split the monster
pythonrun.c file in 2, by creating a separate "lifecycle.c" file. In
my original PEP 432 branch I split it into 3 (pythonrun.c,
bootstrap.c, shutdown.c) but that's actually quite an intrusive change
- you end up having to expose a lot of otherwise static variables to the
linker so the startup and shutdown code can both see them. Splitting
in two should achieve most of the same benefits (i.e. separating the
lifecycle management of the interpreter itself from the normal runtime
operation code) without having to expose so much additional
information to the linker (and hence change the names to include the
_Py prefix).

The origin of those refactorings is the fact that attempting to merge
the default branch into my PEP 432 development branch
(https://bitbucket.org/ncoghlan/cpython_sandbox/branch/pep432_modular_bootstrap)
was generally a pain due to the merge conflicts around the structural
changes. Doing the structural refactorings *first* makes it more
feasible to work on the patch and do regular merges in from default.
Since these are areas that aren't likely to change in a maintenance
release, the risk of merge conflicts when merging forward from 3.4 to
default is low even with code moved around on default. By contrast, I
regularly hit significant problems when trying to merge from default
to the feature branch.

The existing feature branch is dated enough now (more than 18 months
since the last commit!) that I wouldn't try to use it directly.
Instead, I'd recommend starting a new clone based on the GitHub or
BitBucket mirror (according to version control system and hosting
service preference), and then use the current PEP draft and my old
feature branch as a point of reference for starting another
implementation attempt. (You may also be able to find some interested
collaborators on http://bugs.python.org/issue13533, as I suspect PEP
432 is a prerequisite to resolving their issues as well)

Cheers,
Nick.

P.S. I'm also starting to think that PEP 432 may pave the way for a
locale independent startup sequence, which would let us offer a "-X
utf8" option to tell the interpreter to ignore the OS locale settings
entirely when deciding which encodings to use for various things. That
would be a possible future enhancement rather than something to pursue
in the initial implementation, though.
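
For reference, an option of this shape did eventually land: CPython 3.7 added `-X utf8` (PEP 540). Below is a small sketch for checking it from a child process, assuming Python 3.7 or later; the helper name `child_utf8_mode` is made up for illustration:

```python
import subprocess
import sys


def child_utf8_mode(*interpreter_args):
    """Run a child interpreter and report its sys.flags.utf8_mode value."""
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c",
         "import sys; print(sys.flags.utf8_mode)"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())
```

Calling `child_utf8_mode("-X", "utf8")` reports whether the child ignored the locale when choosing its default encodings.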

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Contribute to Python.org

2014-07-30 Thread Nick Coghlan
On 30 July 2014 01:40, Victor Stinner  wrote:
> Hi,
>
> You should read the Python Developer Guide:
>
> https://docs.python.org/devguide/
>
> You can also join the core mentorship mailing list:
>
> http://pythonmentors.com/

For python.org *itself* (as in, the Django application now powering
the site), the contribution process is not yet as clear, but the code
and issue tracker are at https://github.com/python/pythondotorg and
https://mail.python.org/mailman/listinfo/pydotorg-www is the relevant
mailing list.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Exposing the Android platform existence to Python modules

2014-08-03 Thread Nick Coghlan
On 4 Aug 2014 03:18, "Phil Thompson"  wrote:
>
> On 03/08/2014 4:58 pm, Guido van Rossum wrote:
>>
>> But *are* we going to support Android officially? What's the point? Do
>> you have a plan for getting Python apps to first-class status in the App
>> Store (um, Google Play)?
>
>
> I do...
>
> http://pyqt.sourceforge.net/Docs/pyqtdeploy/introduction.html

Nice!

I've only been skimming this thread, but +1 for Android mostly reading as
Linux, but with an extra method in the platform module that gives more
details.

For those interested in mobile app development, Russell Keith-Magee also
announced the release of "toga" [1] here at PyCon AU. That's a Python
specific GUI library that maps directly to native widgets (rather than
using theming as Kivy does). I mention it as one of the things Russell is
specifically looking for is more participation from folks that know the
Android side of things :)

[1] http://pybee.org/toga/

Cheers,
Nick.

>
> Phil
>


Re: [Python-Dev] Surely "nullable" is a reasonable name?

2014-08-04 Thread Nick Coghlan
On 4 Aug 2014 18:16, "Oleg Broytman"  wrote:
>
> Hi!
>
> On Mon, Aug 04, 2014 at 05:12:47PM +1000, Larry Hastings <la...@hastings.org> wrote:
> > "nullable=True", which means "also accept None
> > for this parameter".  This was originally intended for use with
> > strings (compare the "s" and "z" format units for PyArg_ParseTuple),
> > however it looks like we'll have a use for "nullable ints" in the
> > ongoing Argument Clinic conversion work.
> >
> > Several people have said they found the name "nullable" surprising,
> > suggesting I use another name like "allow_none" or "noneable".  I,
> > in turn, find their surprise surprising; "nullable" is a term long
> > associated with exactly this concept.  It's used in C# and SQL, and
> > the term even has its own Wikipedia page:
> >
> >     http://en.wikipedia.org/wiki/Nullable_type
>
>    In my very humble opinion, "nullable" is ok, but "allow_none" is
> better.

Yup, this is where I stand as well. The main concern I have with nullable
is that we *are* writing C code when dealing with Argument Clinic, and
"nullable" may make me think of a C NULL rather than Python's None.

Cheers,
Nick.

>
> Oleg.
> --
>  Oleg Broytman    http://phdru.name/    p...@phdru.name
>Programmers don't die, they just GOSUB without RETURN.


Re: [Python-Dev] os.walk() is going to be *fast* with scandir

2014-08-09 Thread Nick Coghlan
On 10 August 2014 13:20, Antoine Pitrou  wrote:
> Le 09/08/2014 12:43, Ben Hoyt a écrit :
>
>> Just thought I'd share some of my excitement about how fast the all-C
>> version [1] of os.scandir() is turning out to be.
>>
>> Below are the results of my scandir / walk benchmark run with three
>> different versions. I'm using an SSD, which seems to make it
>> especially faster than listdir / walk. Note that benchmark results can
>> vary a lot, depending on operating system, file system, hard drive
>> type, and the OS's caching state.
>>
>> Anyway, os.walk() can be FIFTY times as fast using os.scandir().
>
>
> Very nice results, thank you :-)

Indeed!

This may actually motivate me to start working on a redesign of
walkdir at some point, with scandir and DirEntry objects as the basis.
My original approach was just too slow to be useful in practice (at
least when working with trees on the scale of a full Fedora or RHEL
build hosted on an NFS share).
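
As a rough sketch of what a scandir-based traversal looks like (using os.scandir() as it later shipped in Python 3.5, with the context manager form from 3.6; `iter_files` is a made-up helper name, not part of walkdir):

```python
import os


def iter_files(top):
    """Yield the paths of regular files under top, depth first."""
    with os.scandir(top) as entries:
        for entry in entries:
            # DirEntry usually answers these from data the directory scan
            # already returned, avoiding an extra stat() call per entry.
            if entry.is_dir(follow_symlinks=False):
                yield from iter_files(entry.path)
            elif entry.is_file(follow_symlinks=False):
                yield entry.path
```

The saving comes from not calling stat() on every name just to learn whether it is a directory, which is where a listdir-based walk spends most of its time.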

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Nick Coghlan
On 12 Aug 2014 03:03, "Chris Barker - NOAA Federal" wrote:
>
> My confusion is still this:
>
> Repeated summation of strings has been optimized in cpython even
> though it's not the recommended way to solve that problem.

The quadratic behaviour of repeated str summation is a subtle, silent
error. It *is* controversial that CPython silently optimises some cases of
it away, since it can cause problems when porting affected code to other
interpreters that don't use refcounting and thus have a harder time
implementing such a trick.

It's considered worth the cost, since it dramatically improves the
performance of common naive code in a way that doesn't alter the semantics.

> So why not special case optimize sum() for strings? We are already
> special-casing strings to raise an exception.
>
> It seems pretty pedantic to say: we could make this work well, but we'd
> rather chide you for not knowing the "proper" way to do it.

Yes, that's exactly what this is - a nudge towards the right way to
concatenate strings without incurring quadratic behaviour. We *want* people
to learn that distinction, not sweep it under the rug. That's the other
reason the implicit optimisation is controversial - it hides an important
difference in algorithmic complexity from users.

> Practicality beats purity?

Teaching users the difference between linear time operations and quadratic
ones isn't about purity, it's about passing along a fundamental principle
of algorithm scalability.

We do it specifically for strings because they *do* have an optimised
algorithm available that we can point users towards, and concatenating
multiple strings is common.

Other containers don't tend to be concatenated like that in the first
place, so there's no such check pushing other iterables towards
itertools.chain.
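
A short sketch of the behaviour under discussion, using only builtins and itertools (the sample data is made up for illustration):

```python
from itertools import chain

words = ["spam", "eggs", "ham"]

# sum() refuses a str start value and points at str.join instead.
error = None
try:
    sum(words, "")
except TypeError as exc:
    error = str(exc)

# The linear-time spelling for strings.
joined = "".join(words)

# Lists get no such check: sum() works, but copies the accumulated
# result on every step, so it is quadratic in the total length.
nested = [[1, 2], [3], [4, 5]]
quadratic_flat = sum(nested, [])

# The linear-time spelling for arbitrary iterables.
flat = list(chain.from_iterable(nested))
```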

Regards,
Nick.

>
> -Chris
>
>
>
>
> > Although that's not the whole story: in
> > practice even numerical sums get split into multiple functions because
> > floating point addition isn't associative, and so needs careful
> > treatment to preserve accuracy.  At that point I'm strongly +1 on
> > abandoning attempts to "rationalize" summation.
> >
> > I'm not sure how I'd feel about raising an exception if you try to sum
> > any iterable containing misbehaved types like float.  But not only
> > would that be a Python 4 effort due to backward incompatibility, but
> > it sorta contradicts the main argument of proponents ("any type
> > implementing __add__ should be sum()-able").
> >


Re: [Python-Dev] Multiline with statement line continuation

2014-08-11 Thread Nick Coghlan
On 12 Aug 2014 09:09, "Allen Li"  wrote:
>
> This is a problem I sometimes run into when working with a lot of files
> simultaneously, where I need three or more `with` statements:
>
>     with open('foo') as foo:
>         with open('bar') as bar:
>             with open('baz') as baz:
>                 pass
>
> Thankfully, support for multiple items was added in 3.1:
>
>     with open('foo') as foo, open('bar') as bar, open('baz') as baz:
>         pass
>
> However, this begs the need for a multiline form, especially when
> working with three or more items:
>
>     with open('foo') as foo, \
>          open('bar') as bar, \
>          open('baz') as baz, \
>          open('spam') as spam, \
>          open('eggs') as eggs:
>         pass

I generally see this kind of construct as a sign that refactoring is
needed. For example, contextlib.ExitStack offers a number of ways to manage
multiple context managers dynamically rather than statically.
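
One such way, as a minimal sketch: the number of files is only known at runtime, and ExitStack closes all of them when the block exits. The `concatenate` helper is hypothetical, written for this illustration:

```python
from contextlib import ExitStack


def concatenate(paths):
    """Read every file in paths while all of them are open at once."""
    with ExitStack() as stack:
        # Each file is registered with the stack as soon as it is opened,
        # so a failure part-way through still closes the earlier ones.
        files = [stack.enter_context(open(path)) for path in paths]
        return "".join(f.read() for f in files)
```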

Regards,
Nick.


Re: [Python-Dev] sum(...) limitation

2014-08-12 Thread Nick Coghlan
On 12 Aug 2014 11:21, "Chris Barker - NOAA Federal" wrote:
>
> Sorry for the bike shedding here, but:
>
>> The quadratic behaviour of repeated str summation is a subtle, silent
>> error.
>
> OK, fair enough. I suppose it would be hard and ugly to catch those
> instances and raise an exception pointing users to "".join.
>>
>> It *is* controversial that CPython silently optimises some cases of it
>> away, since it can cause problems when porting affected code to other
>> interpreters that don't use refcounting and thus have a harder time
>> implementing such a trick.
>
> Is there anything in the language spec that says string concatenation is
> O(n^2)? Or for that matter any of the performance characteristics of
> built-in types? Those strike me as implementation details that SHOULD be
> particular to the implementation.

If you implement strings so they have multiple data segments internally (as
is the case for StringIO these days), yes, you can avoid quadratic time
concatenation behaviour. Doing so makes it harder to meet other complexity
expectations (like O(1) access to arbitrary code points), and isn't going
to happen in CPython regardless due to C API backwards compatibility
constraints.

For the explicit loop with repeated concatenation, we can't say "this is
slow, don't do it". People do it anyway, so we've opted for the "fine, make
it as fast as we can" option as being preferable to an obscure and
relatively hard to debug performance problem.
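
For code that wants linear behaviour without relying on that CPython-specific optimisation, two portable spellings are sketched below. The function names are made up for illustration:

```python
import io


def render_lines(lines):
    """Accumulate output in a StringIO buffer instead of using +=."""
    buffer = io.StringIO()
    for line in lines:
        buffer.write(line)
        buffer.write("\n")
    return buffer.getvalue()


def render_lines_join(lines):
    """Equivalent result via a single str.join call."""
    return "".join(line + "\n" for line in lines)
```

Neither version depends on the interpreter being able to resize a string in place, so both should behave the same way on implementations without refcounting.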

For sum(), we have the option of being more direct and just telling people
Python's answer to the string concatenation problem (i.e. str.join). That
is decidedly *not* the series of operations described in sum's
documentation as "Sums start and the items of an iterable from left to
right and returns the total."

Regards,
Nick.


Re: [Python-Dev] Multiline with statement line continuation

2014-08-13 Thread Nick Coghlan
On 12 August 2014 22:15, Steven D'Aprano  wrote:
> Compare the natural way of writing this:
>
>     with open("spam") as spam, open("eggs", "w") as eggs, frobulate("cheese") as cheese:
>         # do stuff with spam, eggs, cheese
>
> versus the dynamic way:
>
>     with ExitStack() as stack:
>         spam, eggs = [stack.enter_context(open(fname, mode)) for fname, mode in
>                       zip(("spam", "eggs"), ("r", "w"))]
>         cheese = stack.enter_context(frobulate("cheese"))
>         # do stuff with spam, eggs, cheese

You wouldn't necessarily switch at three. At only three, you have lots
of options, including multiple nested with statements:

    with open("spam") as spam:
        with open("eggs", "w") as eggs:
            with frobulate("cheese") as cheese:
                # do stuff with spam, eggs, cheese

The "multiple context managers in one with statement" form is there
*solely* to save indentation levels, and overuse can often be a sign
that you may have a custom context manager trying to get out:

    @contextlib.contextmanager
    def dish(spam_file, egg_file, topping):
        with open(spam_file) as spam, open(egg_file, 'w') as eggs, \
             frobulate(topping) as cheese:
            yield spam, eggs, cheese

    with dish("spam", "eggs", "cheese") as (spam, eggs, cheese):
        # do stuff with spam, eggs & cheese

ExitStack is mostly useful as a tool for writing flexible custom
context managers, and for dealing with context managers in cases where
lexical scoping doesn't necessarily work, rather than being something
you'd regularly use for inline code.
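
As an illustration of the "lexical scoping doesn't necessarily work" case, the sketch below opens a batch of files in one function and hands responsibility for closing them to the caller via `ExitStack.pop_all()`. The `open_all` helper is hypothetical:

```python
from contextlib import ExitStack


def open_all(paths, mode="r"):
    """Open every path, or none: returns (files, closer)."""
    with ExitStack() as stack:
        files = [stack.enter_context(open(path, mode)) for path in paths]
        # Everything opened successfully, so move the registered cleanups
        # to a fresh stack that outlives this with block.
        return files, stack.pop_all()
```

The caller later invokes `closer.close()` (or uses `closer` in its own with statement). If any open() call fails, the original stack is still in charge and closes whatever was already opened.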

"Why do I have so many contexts open at once in this function?" is a
question developers should ask themselves in the same way it's worth
asking "why do I have so many local variables in this function?"

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Reviving restricted mode?

2014-08-14 Thread Nick Coghlan
On 14 August 2014 07:25, Victor Stinner  wrote:
> Hi,
>
> I heard that PyPy sandbox cannot be used out of the box. You have to write a
> policy to allow syscalls. The complexity is moved to this policy which is
> very hard to write, especially if you only use whitelists.
>
> Correct me if I'm wrong. To be honest, I have never looked at this sandbox.

By default, the PyPy sandbox requires all system access to be proxied
through the host application (which is running in a separate process).
Similarly, using "sandbox" on Fedora (et al) will get you a default
deny OS level sandbox, where you have to provide selective access to
things outside the box.

The effective decision taken when rexec and Bastion were removed from
the standard library was "sandboxing is hard enough for operating
systems to get right, we're not going to try to tackle the even harder
problem of an in-process sandbox".

"Deny all" sandboxes are relatively easy, but also relatively useless.
It's "allow these activities, but no others" that's difficult, since
any kind of access can often be leveraged into greater access than was
intended.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Documenting enum types

2014-08-14 Thread Nick Coghlan
On 14 August 2014 19:25, Victor Stinner  wrote:
> Hi,
>
> IMO we should not document enum types because Python implementations other
> than CPython may want to implement them differently (ex: not all Python
> implementations have an enum module currently). From experience, exposing too
> many things in the public API becomes a problem later when you want to
> modify the code.

Implementations claiming conformance with Python 3.4 will have to have
an enum module - there just aren't any of those other than CPython at
this point (I expect PyPy3 will catch up before too long, since the
changes between 3.2 and 3.4 shouldn't be too dramatic from an
implementation perspective).

In this particular case, though, I think the relevant question is "Why
are they enums?" and the answer is "for the better representations".
I'm not clear on the use case for exposing and documenting the enum
types themselves (although I don't have any real objection either).
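
A small sketch of what "better representations" means in practice. The `Family` enum is a hypothetical stand-in for an integer constant that has been converted to an IntEnum member:

```python
import enum


class Family(enum.IntEnum):
    """Hypothetical stand-in for a converted integer constant."""
    INET = 2


# Existing code that treats the constant as a plain int keeps working...
still_an_int = (Family.INET == 2) and isinstance(Family.INET, int)

# ...but the repr now names the constant instead of showing a bare number.
plain_repr = repr(2)
enum_repr = repr(Family.INET)
```

Code that only ever passes the constant around never needs to name the enum type, which is why documenting the type itself is a separate question from using it.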

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


[Python-Dev] PEP 467: Minor API improvements for bytes & bytearray

2014-08-14 Thread Nick Coghlan
I just posted an updated version of PEP 467 after recently finishing
the updates to the Python 3.4+ binary sequence docs to decouple them
from the str docs.

Key points in the proposal:

* deprecate passing integers to bytes() and bytearray()
* add bytes.zeros() and bytearray.zeros() as a replacement
* add bytes.byte() and bytearray.byte() as counterparts to ord() for binary data
* add bytes.iterbytes(), bytearray.iterbytes() and memoryview.iterbytes()

As far as I am aware, that last item poses the only open question,
with the alternative being to add an "iterbytes" builtin with a
definition along the lines of the following:

    def iterbytes(data):
        try:
            getiter = type(data).__iterbytes__
        except AttributeError:
            iter = map(bytes.byte, data)
        else:
            iter = getiter(data)
        return iter
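
Since `bytes.byte` and `__iterbytes__` are only proposed names and do not exist yet, the fallback branch above can be approximated in current Python as follows. This is a sketch with hypothetical module-level helpers, not the PEP's implementation:

```python
def byte(value):
    """Stand-in for the proposed bytes.byte(): int -> length 1 bytes."""
    # bytes([x]) already rejects values outside range(0, 256).
    return bytes([value])


def iterbytes(data):
    """Stand-in for the proposed iterbytes(): yield length 1 bytes objects."""
    return map(byte, data)
```

For example, `list(iterbytes(b"abc"))` produces three length 1 bytes objects, whereas `list(b"abc")` produces three integers.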

Regards,
Nick.

PEP URL: http://www.python.org/dev/peps/pep-0467/

Full PEP text:
=
PEP: 467
Title: Minor API improvements for bytes and bytearray
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-03-30
Python-Version: 3.5
Post-History: 2014-03-30 2014-08-15


Abstract
========

During the initial development of the Python 3 language specification, the
core ``bytes`` type for arbitrary binary data started as the mutable type
that is now referred to as ``bytearray``. Other aspects of operating in
the binary domain in Python have also evolved over the course of the Python
3 series.

This PEP proposes a number of small adjustments to the APIs of the ``bytes``
and ``bytearray`` types to make it easier to operate entirely in the binary
domain.


Background
==========

To simplify the task of writing the Python 3 documentation, the ``bytes``
and ``bytearray`` types were documented primarily in terms of the way they
differed from the Unicode based Python 3 ``str`` type. Even when I
`heavily revised the sequence documentation
<http://hg.python.org/cpython/rev/463f52d20314>`__ in 2012, I retained that
simplifying shortcut.

However, it turns out that this approach to the documentation of these types
had a problem: it doesn't adequately introduce users to their hybrid nature,
where they can be manipulated *either* as a "sequence of integers" type,
*or* as ``str``-like types that assume ASCII compatible data.

That oversight has now been corrected, with the binary sequence types now
being documented entirely independently of the ``str`` documentation in
`Python 3.4+ 
<https://docs.python.org/3/library/stdtypes.html#binary-sequence-types-bytes-bytearray-memoryview>`__

The confusion isn't just a documentation issue, however, as there are also
some lingering design quirks from an earlier pre-release design where there
was *no* separate ``bytearray`` type, and instead the core ``bytes`` type
was mutable (with no immutable counterpart).

Finally, additional experience with using the existing Python 3 binary
sequence types in real world applications has suggested it would be
beneficial to make it easier to convert integers to length 1 bytes objects.


Proposals
=========

As a "consistency improvement" proposal, this PEP is actually about a few
smaller micro-proposals, each aimed at improving the usability of the binary
data model in Python 3. Proposals are motivated by one of two main factors:

* removing remnants of the original design of ``bytes`` as a mutable type
* allowing users to easily convert integer values to a length 1 ``bytes``
  object


Alternate Constructors
----------------------

The ``bytes`` and ``bytearray`` constructors currently accept an integer
argument, but interpret it to mean a zero-filled object of the given length.
This is a legacy of the original design of ``bytes`` as a mutable type,
rather than a particularly intuitive behaviour for users. It has become
especially confusing now that some other ``bytes`` interfaces treat integers
and the corresponding length 1 bytes instances as equivalent input.
Compare::

    >>> b"\x03" in bytes([1, 2, 3])
    True
    >>> 3 in bytes([1, 2, 3])
    True

    >>> bytes(b"\x03")
    b'\x03'
    >>> bytes(3)
    b'\x00\x00\x00'

This PEP proposes that the current handling of integers in the bytes and
bytearray constructors be deprecated in Python 3.5 and targeted for
removal in Python 3.7, being replaced by two more explicit alternate
constructors provided as class methods. The initial python-ideas thread
[ideas-thread1]_ that spawned this PEP was specifically aimed at deprecating
this constructor behaviour.

Firstly, a ``byte`` constructor is proposed that converts integers
in the range 0 to 255 (inclusive) to a ``bytes`` object::

    >>> bytes.byte(3)
    b'\x03'
    >>> bytearray.byte(3)
    bytearray(b'\x03')
>>

Re: [Python-Dev] PEP 467: Minor API improvements for bytes & bytearray

2014-08-15 Thread Nick Coghlan
On 16 August 2014 03:48, Guido van Rossum  wrote:
> This feels chatty. I'd like the PEP to call out the specific proposals and
> put the more verbose motivation later.

I realised that some of that history was actually completely
irrelevant now, so I culled a fair bit of it entirely.

> It took me a long time to realize
> that you don't want to deprecate bytes([1, 2, 3]), but only bytes(3).

I've split out the four subproposals into their own sections, so
hopefully this is clearer now.

> Also
> your mention of bytes.byte() as the counterpart to ord() confused me -- I
> think it's more similar to chr().

This was just a case of me using the wrong word - I meant "inverse"
rather than "counterpart".

> I don't like iterbytes as a builtin, let's
> keep it as a method on affected types.

Done. I also added an explanation of the benefits it offers over the
more generic "map(bytes.byte, data)", as well as more precise
semantics for how it will work with memoryview objects.

New draft is live at http://www.python.org/dev/peps/pep-0467/, as well
as being included inline below.

Regards,
Nick.

===

PEP: 467
Title: Minor API improvements for bytes and bytearray
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-03-30
Python-Version: 3.5
Post-History: 2014-03-30 2014-08-15 2014-08-16


Abstract
========

During the initial development of the Python 3 language specification, the
core ``bytes`` type for arbitrary binary data started as the mutable type
that is now referred to as ``bytearray``. Other aspects of operating in
the binary domain in Python have also evolved over the course of the Python
3 series.

This PEP proposes four small adjustments to the APIs of the ``bytes``,
``bytearray`` and ``memoryview`` types to make it easier to operate entirely
in the binary domain:

* Deprecate passing single integer values to ``bytes`` and ``bytearray``
* Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors
* Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors
* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
  ``memoryview.iterbytes`` alternative iterators


Proposals
=========

Deprecation of current "zero-initialised sequence" behaviour
------------------------------------------------------------

Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
argument and interpret it as meaning to create a zero-initialised sequence
of the given size::

    >>> bytes(3)
    b'\x00\x00\x00'
    >>> bytearray(3)
    bytearray(b'\x00\x00\x00')

This PEP proposes to deprecate that behaviour in Python 3.5, and remove it
entirely in Python 3.6.

No other changes are proposed to the existing constructors.


Addition of explicit "zero-initialised sequence" constructors
-------------------------------------------------------------

To replace the deprecated behaviour, this PEP proposes the addition of an
explicit ``zeros`` alternative constructor as a class method on both
``bytes`` and ``bytearray``::

    >>> bytes.zeros(3)
    b'\x00\x00\x00'
    >>> bytearray.zeros(3)
    bytearray(b'\x00\x00\x00')

It will behave just as the current constructors behave when passed a single
integer.

The specific choice of ``zeros`` as the alternative constructor name is taken
from the corresponding initialisation function in NumPy (although, as these
are 1-dimensional sequence types rather than N-dimensional matrices, the
constructors take a length as input rather than a shape tuple).


Addition of explicit "single byte" constructors
-----------------------------------------------

As binary counterparts to the text ``chr`` function, this PEP proposes the
addition of an explicit ``byte`` alternative constructor as a class method
on both ``bytes`` and ``bytearray``::

    >>> bytes.byte(3)
    b'\x03'
    >>> bytearray.byte(3)
    bytearray(b'\x03')

These methods will only accept integers in the range 0 to 255 (inclusive)::

    >>> bytes.byte(512)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: bytes must be in range(0, 256)

    >>> bytes.byte(1.0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'float' object cannot be interpreted as an integer

The documentation of the ``ord`` builtin will be updated to explicitly note
that ``bytes.byte`` is the inverse operation for binary data, while ``chr``
is the inverse operation for text data.
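
As a rough illustration of the relationship described above, the following sketch uses only spellings that already work today. The ``byte`` function is a hypothetical stand-in for the proposed ``bytes.byte`` and is not an existing API:

```python
def byte(value):
    """Hypothetical stand-in for the proposed bytes.byte(value)."""
    # bytes([value]) already enforces the integer-only, 0..255 restrictions
    return bytes([value])

# ord() is inverted by chr() for text data...
assert chr(ord("a")) == "a"

# ...and by the single byte constructor for binary data
assert byte(ord(b"a")) == b"a"
assert byte(3) == b"\x03"
```

Out-of-range integers and non-integers are rejected by the underlying ``bytes([x])`` call, which matches the errors shown in the examples above.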

Behaviourally, ``bytes.byte(x)`` will be equivalent to the current
``bytes([x])`` (and similarly for ``bytearray``). The new spelling is
expected to be easier to discover and 

Re: [Python-Dev] Multiline with statement line continuation

2014-08-16 Thread Nick Coghlan
On 17 August 2014 07:42, Chris Angelico  wrote:
> On Sat, Aug 16, 2014 at 10:47 PM, Marko Rauhamaa  wrote:
>>
>> You might be able to have it both ways. You could have:
>>
>>with (open(name) for name in os.listdir("config")) as files:
>
> But that's not a tuple, it's a generator. Should generators be context
> managers? Is anyone seriously suggesting this? I don't think so. Is
> this a solution looking for a problem?

Yes. We have a whole programming language to play with, so when "X is
hard to read" becomes a problem, it may be time to reach for a better
tool. If the context manager line is getting unwieldy, it's often a
sign it's time to factor it out to a dedicated helper, or break it up
into multiple with statements :)
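
A minimal sketch of the "dedicated helper" approach, assuming the goal is to open every file in a directory. The name open_all is hypothetical; contextlib.ExitStack (available since 3.3) does the actual work:

```python
import os
from contextlib import ExitStack, contextmanager

@contextmanager
def open_all(directory):
    """Open every regular file in `directory`, closing them all on exit."""
    with ExitStack() as stack:
        paths = (os.path.join(directory, name)
                 for name in sorted(os.listdir(directory)))
        # enter_context registers each file so the stack closes it later,
        # even if opening a subsequent file fails
        files = [stack.enter_context(open(path))
                 for path in paths if os.path.isfile(path)]
        yield files
```

The call site then stays on one short line, e.g. "with open_all("config") as files:", without any new syntax.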

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEP 4000 to explicitly declare we won't be doing a Py3k style compatibility break again?

2014-08-16 Thread Nick Coghlan
I've seen a few people on python-ideas express the assumption that
there will be another Py3k style compatibility break for Python 4.0.

I've also had people express the concern that "you broke compatibility
in a major way once, how do we know you won't do it again?".

Both of those contrast strongly with Guido's stated position that he
never wants to go through a transition like the 2->3 one again.

Barry wrote PEP 404 to make it completely explicit that python-dev had
no plans to create a Python 2.8 release. Would it be worth writing a
similarly explicit "not an option" PEP explaining that the regular
deprecation and removal process (roughly documented in PEP 387) is the
*only* deprecation and removal process? It could also point to the
fact that we now have PEP 411 (provisional APIs) to help reduce our
chances of being locked indefinitely into design decisions we aren't
happy with.

If folks (most significantly, Guido) are amenable to the idea, it
shouldn't take long to put such a PEP together, and I think it could
help reduce some of the confusions around the expectations for Python
4.0 and the evolution of 3.x in general.

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 4000 to explicitly declare we won't be doing a Py3k style compatibility break again?

2014-08-16 Thread Nick Coghlan
On 17 August 2014 12:43, Guido van Rossum  wrote:
> On Sat, Aug 16, 2014 at 6:28 PM, Nick Coghlan  wrote:
>> I've also had people express the concern that "you broke compatibility
>> in a major way once, how do we know you won't do it again?".
>
>
> Well, they won't, really. You can't predict the future. But really, that's a
> pretty poor way to say "please don't do it again."
>
> I'm not sure why, but I hate when someone starts a suggestion or a question
> with "why doesn't Python ..." and I have to fight the urge to reply in a
> flippant way without answering the real question. (And just now I did it
> again.)
>
> I suppose this phrasing may actually be meant as a form of politeness, but
> to me it often sounds passive-aggressive, pretend-polite. (Could it be a
> matter of cultural difference? The internet is full of broken English, my
> own often included.)

I don't mind it if the typical answers are accepted as valid:

*  "because it has these downsides, and those are considered to
outweigh the benefits"
*  "because it's difficult, and it never bothered anyone enough for
them to put in the work to do something about it"

Those aren't always obvious, especially to folks that don't have a lot
of experience with long lived software projects (I had only just
started high school when Python was first released!), so I don't mind
explaining them when I have time.

>> Both of those contrast strongly with Guido's stated position that he
>> never wants to go through a transition like the 2->3 one again.
>
> Right. What's more, when I say that, I don't mean that you should wait until
> I retire -- I think it's genuinely a bad idea.

Absolutely agreed - I think the Unicode change was worthwhile (even
with the impact proving to be higher than expected), but there isn't
any such fundamental change to the data model lurking for Python 3.

> I also don't expect that it'll be necessary -- in fact, I am counting on
> tools (e.g. static analysis!) to improve to the point where there won't be a
> reason for such a transition.

The fact that things like Hylang and MacroPy can already run on the
CPython VM also shows that other features (like import hooks and the
AST compiler) have evolved to the point where the Python data model
and runtime semantics can be more effectively decoupled from syntactic
details.

> (Don't understand this to mean that we should never deprecate things.
> Deprecations will happen, they are necessary for the evolution of any
> programming language. But they won't ever hurt in the way that Python 3
> hurt.)

Right. I think Python 2 has been stable for so long that I sometimes
wonder if folks forget (or never knew?) we used to deprecate things
within the Python 2 series as well, such that code that ran on Python
2.x wasn't necessarily guaranteed to run on Python 2.(x+2). "Never
deprecate anything" is a recipe for unbounded growth in complexity.

Benjamin has made a decent start on documenting that normal
deprecation process in PEP 387, so I'd also suggest refining that a
bit and getting it to "Accepted" as part of any explicit "Python 4.x
won't be as disruptive as 3.x" clarification.

>> no plans to create a Python 2.8 release. Would it be worth writing a
>> similarly explicit "not an option" PEP explaining that the regular
>> deprecation and removal process (roughly documented in PEP 387) is the
>> *only* deprecation and removal process? It could also point to the
>> fact that we now have PEP 411 (provisional APIs) to help reduce our
>> chances of being locked indefinitely into design decisions we aren't
>> happy with.
>>
>> If folks (most significantly, Guido) are amenable to the idea, it
>> shouldn't take long to put such a PEP together, and I think it could
>> help reduce some of the confusions around the expectations for Python
>> 4.0 and the evolution of 3.x in general.
>
> But what should it say?

The specific things I was thinking we could point out were:

- PEP 387, documenting the normal deprecation process that existed
even in Python 2
- highlighting the increased preference for "documented deprecation
only" in cases where maintaining something isn't actively causing
problems, there are just better alternatives now available
- PEP 411, the (still relatively new) provisional API concept
- PEP 405, adding pyvenv as a standard part of Python
- PEP 453, better integrating PyPI into the recommended way of working
with the language

Those all help change the way the language evolves, as they reduce the
pressure to rush things into the standard library before they'

Re: [Python-Dev] PEP 4000 to explicitly declare we won't be doing a Py3k style compatibility break again?

2014-08-16 Thread Nick Coghlan
On 17 August 2014 15:08, Guido van Rossum  wrote:
> I think this would be a great topic for a blog post. Once you've written it
> I can even bless it by Tweeting about it. :-)

Sounds like a plan - I'll try to put together something coherent this week :)

> PS. Why isn't PEP 387 accepted yet?

Not sure - it mostly looks correct to me. I suspect it just fell off
the radar since it's a "describe what we're already doing anyway" kind
of document.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 4000 to explicitly declare we won't be doing a Py3k style compatibility break again?

2014-08-17 Thread Nick Coghlan
On 17 August 2014 15:34, Nick Coghlan  wrote:
> On 17 August 2014 15:08, Guido van Rossum  wrote:
>> I think this would be a great topic for a blog post. Once you've written it
>> I can even bless it by Tweeting about it. :-)
>
> Sounds like a plan - I'll try to put together something coherent this week :)

OK, make that "this afternoon":
http://www.curiousefficiency.org/posts/2014/08/python-4000.html :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 467: Minor API improvements for bytes & bytearray

2014-08-17 Thread Nick Coghlan
On 17 August 2014 18:13, Raymond Hettinger  wrote:
>
> On Aug 14, 2014, at 10:50 PM, Nick Coghlan  wrote:
>
> Key points in the proposal:
>
> * deprecate passing integers to bytes() and bytearray()
>
>
> I'm opposed to removing this part of the API.  It has proven useful
> and the alternative isn't very nice.   Declaring the size of fixed length
> arrays is not a new concept and is widely adopted in other languages.
> One principal use case for the bytearray is creating and manipulating
> binary data.  Initializing to zero is common operation and should remain
> part of the core API (consider why we now have list.copy() even though
> copying with a slice remains possible and efficient).

That's why the PEP proposes adding a "zeros" method, based on the name
of the corresponding NumPy construct.

The status quo has some very ugly failure modes when an integer is
passed unexpectedly: the constructor tries to create a large buffer,
rather than raising a type error.
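
To make that failure mode concrete, here is a small sketch. The encode_field helper is hypothetical and only stands in for any code that passes its argument straight to bytes():

```python
def encode_field(value):
    """Hypothetical helper expecting an iterable of ints or a bytes-like object."""
    return bytes(value)

# Intended use: the argument is already a sequence of byte values
assert encode_field([1, 0]) == b"\x01\x00"

# Accidental use: a bare integer is silently treated as a length
assert encode_field(10) == b"\x00" * 10

# By contrast, other unexpected types fail loudly
try:
    encode_field(10.0)
except TypeError as exc:
    print("TypeError:", exc)
```

With a large integer the accidental case allocates a correspondingly large zero-filled buffer instead of reporting the mistake.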

> I and my clients have taken advantage of this feature and it reads nicely.

If I see "bytearray(10)" there is nothing there that suggests "this
creates an array of length 10 and initialises it to zero" to me. I'd
be more inclined to guess it would be equivalent to "bytearray([10])".

"bytearray.zeros(10)", on the other hand, is relatively clear,
independently of user expectations.

> The proposed deprecation would break our code and not actually make
> anything better.
>
> Another thought is that the core devs should be very reluctant to deprecate
> anything we don't have to while the 2 to 3 transition is still in progress.
> Every new deprecation of APIs that existed in Python 2.7 just adds another
> obstacle to converting code.  Individually, the differences are trivial.
> Collectively, they present a good reason to never migrate code to Python 3.

This is actually one of the inconsistencies between the Python 2 and 3
binary APIs:

Python 2.7.5 (default, Jun 25 2014, 10:19:55)
[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> bytes(10)
'10'
>>> bytearray(10)
bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')

Users wanting well-behaved binary sequences in Python 2.7 would be
well advised to use the "future" module to get a full backport of the
actual Python 3 bytes type, rather than the approximation that is the
8-bit str in Python 2. And once they do that, they'll be able to track
the evolution of the Python 3 binary sequence behaviour without any
further trouble.

That said, I don't really mind how long the deprecation cycle is. I'd
be fine with fully supporting both in 3.5 (2015), deprecating the main
constructor in favour of the explicit zeros() method in 3.6 (2017) and
dropping the legacy behaviour in 3.7 (2018).

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Fwd: PEP 467: Minor API improvements for bytes & bytearray

2014-08-17 Thread Nick Coghlan
On 18 Aug 2014 08:04, "Markus Unterwaditzer" wrote:
>
> On Sun, Aug 17, 2014 at 05:41:10PM -0400, Barry Warsaw wrote:
> > I think the biggest API "problem" is that default iteration returns integers
> > instead of bytes.  That's a real pain.
>
> I agree, this behavior required some helper functions while porting Werkzeug to
> Python 3 AFAIK.
>
> >
> > I'm not sure .iterbytes() is the best name for spelling iteration over bytes
> > instead of integers though.  Given that we can't change __iter__(), I
> > personally would perhaps prefer a simple .bytes property over which if you
> > iterated you would receive bytes, e.g.
>
> I'd rather be for a .bytes() method, to match the .values(), and .keys()
> methods on dictionaries.

Calling it bytes is too confusing:

for x in bytes(data):
   ...

for x in bytes(data).bytes():
   ...

When referring to bytes, which bytes do you mean, the builtin or the method?

iterbytes() isn't especially attractive as a method name, but it's far more
explicit about its purpose.
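
For reference, a sketch of the behaviour under discussion. The free function iterbytes below is a hypothetical stand-in for the proposed method, built from the existing bytes([x]) spelling:

```python
def iterbytes(data):
    """Hypothetical stand-in for the proposed data.iterbytes() method."""
    for value in data:
        # Default iteration yields integers; wrap each one back up
        # as a length 1 bytes object
        yield bytes([value])

data = b"abc"
assert list(data) == [97, 98, 99]
assert list(iterbytes(data)) == [b"a", b"b", b"c"]
```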

Cheers,
Nick.

>
> -- Markus


Re: [Python-Dev] PEP 467: Minor API improvements for bytes & bytearray

2014-08-17 Thread Nick Coghlan
On 18 Aug 2014 03:07, "Raymond Hettinger" wrote:
>
>
> On Aug 17, 2014, at 1:41 AM, Nick Coghlan  wrote:
>
>> If I see "bytearray(10)" there is nothing there that suggests "this
>> creates an array of length 10 and initialises it to zero" to me. I'd
>> be more inclined to guess it would be equivalent to "bytearray([10])".
>>
>> "bytearray.zeros(10)", on the other hand, is relatively clear,
>> independently of user expectations.
>
>
> Zeros would have been great but that should have been done originally.
> The time to get API design right is at inception.
> Now, you're just breaking code and invalidating any published examples.

I'm fine with postponing the deprecation elements indefinitely (or just
deprecating bytes(int) and leaving bytearray(int) alone).

>
>>>
>>> Another thought is that the core devs should be very reluctant to deprecate
>>> anything we don't have to while the 2 to 3 transition is still in progress.
>>> Every new deprecation of APIs that existed in Python 2.7 just adds another
>>> obstacle to converting code.  Individually, the differences are trivial.
>>> Collectively, they present a good reason to never migrate code to Python 3.
>>
>>
>> This is actually one of the inconsistencies between the Python 2 and 3
>> binary APIs:
>
>
> However, bytearray(n) is the same in both Python 2 and Python 3.
> Changing it in Python 3 increases the gulf between the two.
>
> The further we let Python 3 diverge from Python 2, the less likely that
> people will convert their code and the harder you make it to write code
> that runs under both.
>
> FWIW, I've been teaching Python full time for three years.  I cover the
> use of bytearray(n) in my classes and not a single person out of 3000+
> engineers has had a problem with it.   I seriously question the PEP's
> assertion that there is a real problem to be solved (i.e. that people
> are baffled by bytearray(bufsiz)) and that the problem is sufficiently
> painful to warrant the headaches that go along with API changes.

Yes, I'd expect engineers and networking folks to be fine with it. It isn't
how this mode of the constructor *works* that worries me, it's how it
*fails* (i.e. silently producing unexpected data rather than a type error).

Purely deprecating the bytes case and leaving bytearray alone would likely
address my concerns.

>
> The other proposal to add bytearray.byte(3) should probably be named
> bytearray.from_byte(3) for clarity.  That said, I question whether there is
> actually a use case for this.   I have never seen code that has a
> need to create a byte array of length one from a single integer.
> For the most part, the API will be easiest to learn if it matches what
> we do for lists and for array.array.

This part of the proposal came from a few things:

* many of the bytes and bytearray methods only accept bytes-like objects,
but iteration and indexing produce integers
* to mitigate the impact of the above, some (but not all) bytes and
bytearray methods now accept integers in addition to bytes-like objects
* ord() in Python 3 is only documented as accepting length 1 strings, but
also accepts length 1 bytes-like objects

Adding bytes.byte() makes it practical to document the binary half of ord's
behaviour, and eliminates any temptation to expand the "also accepts
integers" behaviour out to more types.

bytes.byte() thus becomes the binary equivalent of chr(), just as Python 2
had both chr() and unichr().

I don't recall ever needing chr() in a real program either, but I still
consider it an important part of clearly articulating the data model.
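
The asymmetry described in the list above can be seen in a short sketch. It uses only existing behaviour, with bytes([x]) standing in for the proposed bytes.byte(x):

```python
data = b"abc"

# Indexing and iteration produce integers...
assert data[0] == 97
assert list(data) == [97, 98, 99]

# ...which some methods accept directly (since Python 3.3)...
assert data.find(98) == 1
assert data.count(99) == 1
assert 97 in data

# ...while others still require a bytes-like object
try:
    data.startswith(97)
except TypeError:
    accepts_int = False
else:
    accepts_int = True
assert not accepts_int

# ord() accepts a length 1 bytes object; bytes([x]) is the current inverse
assert ord(b"a") == 97
assert bytes([ord(b"a")]) == b"a"
```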

> Sorry Nick, but I think you're making the API worse instead of better.
> This API isn't perfect but it isn't flat-out broken either.   There is some
> unfortunate asymmetry between bytes() and bytearray() in Python 2,
> but that ship has sailed.  The current API for Python 3 is pretty good
> (though there is still a tension between wanting to be like lists and like
> strings both at the same time).

Yes. It didn't help that the docs previously expected readers to infer the
behaviour of the binary sequence methods from the string documentation -
while the new docs could still use some refinement, I've at least addressed
that part of the problem.

Cheers,
Nick.


Re: [Python-Dev] Fwd: PEP 467: Minor API improvements for bytes & bytearray

2014-08-17 Thread Nick Coghlan
On 18 Aug 2014 08:55, "Barry Warsaw"  wrote:
>
> On Aug 18, 2014, at 08:48 AM, Nick Coghlan wrote:
>
> >Calling it bytes is too confusing:
> >
> >for x in bytes(data):
> >   ...
> >
> >for x in bytes(data).bytes():
> >   ...
> >
> >When referring to bytes, which bytes do you mean, the builtin or the method?
> >
> >iterbytes() isn't especially attractive as a method name, but it's far more
> >explicit about its purpose.
>
> I don't know.  How often do you really instantiate the bytes object there in
> the for loop?

I'm talking more generally - do you *really* want to be explaining that
"bytes" behaves like a tuple of integers, while "bytes.bytes" behaves like
a tuple of bytes?

Namespaces are great and all, but using the same name for two different
concepts is still inherently confusing.

Cheers,
Nick.

>
> -Barry
>
>


Re: [Python-Dev] PEP 467: Minor API improvements for bytes & bytearray

2014-08-17 Thread Nick Coghlan
On 18 Aug 2014 09:41, "Raymond Hettinger" wrote:
>
>
> I encourage restraint against adding an unneeded class method that has no
> parallel elsewhere.  Right now, the learning curve is mitigated because bytes
> is very str-like and because bytearray is list-like (i.e. the method names
> have been used elsewhere and likely already learned before encountering
> bytes() or bytearray()).  Putting in a new, rarely used funky method adds to
> the learning burden.
>
> If you do press forward with adding it (and I don't see why), then as an
> alternate constructor, the name should be from_int() or some such to avoid
> ambiguity and to make clear that it is a class method.

If I remember the sequence of events correctly, I thought of
map(bytes.byte, data) first, and then Guido suggested a dedicated
iterbytes() method later.

The step I hadn't taken (until now) was realising that the new
memoryview(data).iterbytes() capability actually combines with the existing
(bytes([b]) for b in data) to make the original bytes.byte idea unnecessary.

Cheers,
Nick.


Re: [Python-Dev] Fwd: PEP 467: Minor API improvements for bytes & bytearray

2014-08-17 Thread Nick Coghlan
On 18 Aug 2014 09:57, "Barry Warsaw"  wrote:
>
> On Aug 18, 2014, at 09:12 AM, Nick Coghlan wrote:
>
> >I'm talking more generally - do you *really* want to be explaining that
> >"bytes" behaves like a tuple of integers, while "bytes.bytes" behaves like
> >a tuple of bytes?
>
> I would explain it differently though, using concrete examples.
>
> data = bytes(...)
> for i in data: # iterate over data as integers
> for i in data.bytes: # iterate over data as bytes
>
> But whatever.  I just wish there was something better than iterbytes.

There's actually another aspect to your idea, independent of the naming:
exposing a view rather than just an iterator. I'm going to have to look at
the implications for memoryview, but it may be a good way to go (and would
align with the iterator -> view changes in dict).

Cheers,
Nick.

>
> -Barry
>


Re: [Python-Dev] PEP 4000 to explicitly declare we won't be doing a Py3k style compatibility break again?

2014-08-17 Thread Nick Coghlan
On 18 August 2014 11:14, Donald Stufft  wrote:
> On Sun, Aug 17, 2014, at 09:02 PM, Guido van Rossum wrote:
>> I'm unsure about what's the single biggest pain moving to Python 3. In the 
>> past I would have said that it's for sure the bytes/str split (which is both
>> the biggest pain and the biggest payoff).
>>
>> But if I look carefully into the soul of teams that are still on 2.7 (I know 
>> a few... :-), I think the real reason is that Python 3 changes so many 
>> different things, you have to actually understand your code to port it 
>> (unlike with minor version transitions, where the changes usually spike in 
>> one specific area, and you can leave the rest to normal attrition and 
>> periodic maintenance).
>>
>
> In my experience bytes/str is the single biggest change that causes the
> most problems. Most of the other changes can be mechanically transformed
> and/or papered over using helpers like six. The bytes/str change is the
> main one that requires understanding code and where it requires a
> serious untangling of things in code bases where str/bytes are freely
> used interchangeably. Often times this requires making a decision about
> what *should* be bytes or str as well which requires having some deep
> knowledge about the APIs in question too.

It's certainly the one that has caused the most churn in CPython and
the standard library - the ripples still haven't entirely settled on
that front :)

I think Guido's right that there's also a "death of a thousand cuts"
aspect for large existing code bases, though, especially those that
are lacking comprehensive test suites. By definition, existing large
Python 2 applications are OK with the restrictions imposed by Python
2, and we're deliberately not forcing the issue by halting Python 2
maintenance. That's where Steve Dower's idea of being able to
progressively declare a code base "Python 3 compatible" on a file by
file basis and have some means of programmatically enforcing that is
interesting - it opens the door to "opportunistic and incremental"
porting, where modules are progressively updated to run on both, until
an application reaches a point where it can switch to Python 3 and
leave Python 2 behind.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Fwd: PEP 467: Minor API improvements for bytes & bytearray

2014-08-19 Thread Nick Coghlan
On 18 August 2014 10:45, Guido van Rossum  wrote:
> On Sun, Aug 17, 2014 at 5:22 PM, Barry Warsaw  wrote:
>>
>> On Aug 18, 2014, at 10:08 AM, Nick Coghlan wrote:
>>
>> >There's actually another aspect to your idea, independent of the naming:
>> >exposing a view rather than just an iterator. I'm going to have to look at
>> >the implications for memoryview, but it may be a good way to go (and would
>> >align with the iterator -> view changes in dict).
>>
>> Yep!  Maybe that will inspire a better spelling. :)
>
>
> +1. It's just as much about b[i] as it is about "for c in b", so a view
> sounds right. (The view would have to be mutable for bytearrays and for
> writable memoryviews.)
>
> On the rest, it's sounding more and more as if we will just need to live
> with both bytes(1000) and bytearray(1000). A warning sounds worse than a
> deprecation to me.

I'm fine with keeping bytearray(1000), since that works the same way
in both Python 2 & 3, and doesn't seem likely to be invoked
inadvertently.

I'd still like to deprecate "bytes(1000)", since that does different
things in Python 2 & 3, while "b'\x00' * 1000" does the same thing in
both.

$ python -c 'print("{!r}\n{!r}".format(bytes(10), b"\x00" * 10))'
'10'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
$ python3 -c 'print("{!r}\n{!r}".format(bytes(10), b"\x00" * 10))'
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

Hitting the deprecation warning in single-source code would seem to be
a strong hint that you have a bug in one version or the other rather
than being intended behaviour.

> bytes.zeros(n) sounds fine to me; I value similar interfaces for bytes and
> bytearray pretty highly.

With "bytearray(1000)" sticking around indefinitely, I'm less
concerned about adding a "zeros" constructor.

> I'm lukewarm on bytes.byte(c); but bytes([c]) does bother me because a size
> one list is (or at least feels) more expensive to allocate than a size one
> bytes object. So, okay.

So, here's an interesting thing I hadn't previously registered: we
actually already have a fairly capable "bytesview" option, and have
done since Stefan implemented "memoryview.cast" in 3.3. The trick lies
in the 'c' format character for the struct module, which is parsed as
a length 1 bytes object rather than as an integer:

>>> data = bytearray(b"Hello world")
>>> bytesview = memoryview(data).cast('c')
>>> list(bytesview)
[b'H', b'e', b'l', b'l', b'o', b' ', b'w', b'o', b'r', b'l', b'd']
>>> b''.join(bytesview)
b'Hello world'
>>> bytesview[0:5] = memoryview(b"olleH").cast('c')
>>> list(bytesview)
[b'o', b'l', b'l', b'e', b'H', b' ', b'w', b'o', b'r', b'l', b'd']
>>> b''.join(bytesview)
b'olleH world'

For the read-only case, it covers everything (iteration, indexing,
slicing). For the writable view case, it doesn't cover changing the
shape of the target array, and it doesn't cover assigning arbitrary
buffer objects (you need to wrap them in a similar cast for memoryview
to allow the assignment).
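
A short sketch of those limits for the writable case, continuing from the session above. The exact exception types are an assumption on my part, so the sketch catches both TypeError and ValueError rather than relying on one:

```python
data = bytearray(b"Hello world")
bytesview = memoryview(data).cast('c')

# Indexing yields length 1 bytes objects, and item assignment accepts them
assert bytesview[0] == b'H'
bytesview[0] = b'J'
assert data == bytearray(b"Jello world")

# Assigning a plain bytes object to a slice is rejected, because its
# format ('B') differs from the view's format ('c')
try:
    bytesview[0:5] = b"olleH"
except (TypeError, ValueError) as exc:
    print("rejected:", exc)

# Changing the length of the target region is rejected as well
try:
    bytesview[0:5] = memoryview(b"Hi").cast('c')
except (TypeError, ValueError) as exc:
    print("rejected:", exc)

# Neither failed assignment modified the underlying data
assert data == bytearray(b"Jello world")
```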

It's hardly the most *intuitive* spelling though - I was one of the
reviewers for Stefan's memoryview rewrite back in 3.3, and I only made
the connection today when looking to see how a view object like the
one we were discussing elsewhere in the thread might be implemented as
a facade over arbitrary memory buffers, rather than being specific to
bytes and bytearray.

If we went down the "bytesview" path, then a single new facade would
cover not only the 3 builtins (bytes, bytearray, memoryview) but also
any *other* buffer exporting type. If we so chose (at some point in
the future, not as part of this PEP), such a type could allow
additional bytes operations (like "count", "startswith" or "index") to
be applied to arbitrary regions of memory without making a copy. We
can't add those other operations to memoryview, since they don't make
sense for an n-dimensional array.

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Bytes path support

2014-08-20 Thread Nick Coghlan
On 20 Aug 2014 04:18, "Marko Rauhamaa"  wrote:
>
> Tres Seaver :
>
> > On 08/19/2014 01:43 PM, Ben Hoyt wrote:
> >> Fair enough. I don't quite understand, though -- why is the "official
> >> policy" to kill something that's "essential" on *nix?
> >
> > ISTM that the policy is based on a fantasy that "it looks like text to
> > me in my use cases, so therefore it must be text for everyone."
>
> What I like about Python is that it allows me to write native linux code
> without having to make portability compromises that plague, say, Java. I
> have select.epoll(). I have os.fork(). I have socket.TCP_CORK. The
> "textualization" of Python3 seems part of a conscious effort to make
> Python more Java-esque.

It's not just the JVM that says text and binary APIs should be separate -
it's every widely used operating system services layer except POSIX. The
POSIX way works well *if* everyone reliably encodes things as UTF-8 or
always uses encoding detection, but its failure mode is unfortunately
silent data corruption.
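
A minimal sketch of that failure mode, assuming a file name written on a UTF-8 system and read back by code that assumes Latin-1 (the file name itself is made up):

```python
name = "caf\u00e9.txt"          # "café.txt"
raw = name.encode("utf-8")      # the bytes as stored by a UTF-8 system

# A reader that assumes a different encoding gets no error, just wrong text
misread = raw.decode("latin-1")
assert misread != name
assert misread == "caf\u00c3\u00a9.txt"   # mojibake: two characters for "é"
```

Nothing raises an exception here, so the corrupted name can propagate a long way before anyone notices.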

That said, there's a lot of Python software that is POSIX specific, where
bytes paths would be the least of the barriers to porting to Windows or
Jython. I'm personally +1 on consistently allowing binary paths in lower
level APIs, but disallowing them in higher level explicitly cross platform
abstractions like pathlib.

Regards,
Nick.

>
>
> Marko


Re: [Python-Dev] Bytes path support

2014-08-20 Thread Nick Coghlan
On 21 Aug 2014 08:19, "Greg Ewing"  wrote:
>
> Antoine Pitrou wrote:
>>
>> I think if you want low-level features (such as unconverted bytes paths
>> under POSIX), it is reasonable to point you to low-level APIs.
>
>
> The problem with scandir() in particular is that there is
> currently *no* low-level API exposed that gives the same
> functionality.
>
> If scandir() is not to support bytes paths, I'd suggest
> exposing the opendir() and readdir() system calls with
> bytes path support.

scandir is low level (the entire os module is low level). In fact, aside
from pathlib, I'd consider pretty much every API we have that deals with
paths to be low level - that's a large part of the reason we needed pathlib!

Cheers,
Nick.

>
> --
> Greg
>


Re: [Python-Dev] Bytes path support

2014-08-20 Thread Nick Coghlan
On 21 Aug 2014 09:06, "Chris Barker"  wrote:

>
> As I understand it, the whole problem with some posix systems is that
> there is NO filesystem encoding -- i.e. you can't know for sure what
> encoding a filename is in. So you need to be able to pass the bytes through
> as they are.
>
> (At least as I read Armin Ronacher's blog)

Armin lets his astonishment at the idea we'd expect Linux vendors to fix
their broken OS get the better of him at times - he thinks the
responsibility lies entirely with us to work around its quirks and
limitations :)

The "surrogateescape" error handler is our main answer to the unreliability
of the POSIX encoding model - fsdecode will squirrel away arbitrary bytes as
lone surrogates (U+DC80 to U+DCFF), and then fsencode will restore them again later. That
works for the simple round tripping case, but we currently lack good
default tools for "cleaning" strings that may contain surrogates (or even
scanning a string to see if surrogates are present).
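
As a rough illustration of both halves of that paragraph, the sketch below round trips an undecodable byte and then scans for the escapes by hand. The helper names are hypothetical (nothing like has_surrogate_escapes exists in the standard library), and UTF-8 is passed explicitly rather than relying on whatever os.fsdecode would pick up from the environment:

```python
def decode_name(raw, encoding="utf-8"):
    """Decode OS bytes, smuggling undecodable bytes through as surrogates."""
    return raw.decode(encoding, errors="surrogateescape")


def encode_name(name, encoding="utf-8"):
    """Reverse decode_name, restoring the original bytes exactly."""
    return name.encode(encoding, errors="surrogateescape")


def has_surrogate_escapes(text):
    """Hypothetical helper: True if text contains any smuggled bytes.

    Per PEP 383, byte 0xNN (0x80 <= NN <= 0xFF) becomes U+DC00 + 0xNN.
    """
    return any("\udc80" <= ch <= "\udcff" for ch in text)


raw = b"caf\xe9.txt"      # Latin-1 encoded name, not valid UTF-8
name = decode_name(raw)   # 'caf\udce9.txt'
```

The round trip is lossless, but note that "name" can't be encoded with the default strict error handler any more, which is exactly why some kind of scanning or cleaning tool would be handy.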

One idea I had along those lines is a surrogatereplace error handler (
http://bugs.python.org/issue22016) that emitted an ASCII question mark for
each smuggled byte, rather than propagating the encoding problem.
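
For the curious, something along these lines can already be approximated with codecs.register_error. This is only a sketch under my own assumptions, not the patch attached to the issue, and the registered name "surrogatereplace_sketch" is made up for the example:

```python
import codecs


def surrogatereplace_sketch(exc):
    """Encode-only handler: one ASCII '?' per smuggled byte."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Only handle code points that surrogateescape could have produced;
    # anything else is a genuine encoding error and is re-raised.
    if not all("\udc80" <= ch <= "\udcff" for ch in bad):
        raise exc
    return ("?" * len(bad), exc.end)


codecs.register_error("surrogatereplace_sketch", surrogatereplace_sketch)

name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")
encoded = name.encode("utf-8", "surrogatereplace_sketch")
```

Here "encoded" no longer contains the original 0xE9 byte, so the result is safe to hand to strict consumers, at the cost of losing the information needed to find the original file again.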

Cheers,
Nick.

>
> -Chris
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>


Re: [Python-Dev] Bytes path support

2014-08-20 Thread Nick Coghlan
On 21 August 2014 09:33, Ethan Furman  wrote:
> On 08/20/2014 03:31 PM, Nick Coghlan wrote:
>> On 21 Aug 2014 08:19, "Greg Ewing" <greg.ew...@canterbury.ac.nz> wrote:
>>>
>>>
>>> Antoine Pitrou wrote:
>>>>
>>>>
>>>> I think if you want low-level features (such as unconverted bytes paths
>>>> under POSIX), it is reasonable to point you to low-level APIs.
>>>
>>>
>>>
>>> The problem with scandir() in particular is that there is
>>> currently *no* low-level API exposed that gives the same
>>> functionality.
>>>
>>> If scandir() is not to support bytes paths, I'd suggest
>>> exposing the opendir() and readdir() system calls with
>>> bytes path support.
>>
>>
>> scandir is low level (the entire os module is low level). In fact, aside
>> from pathlib, I'd consider pretty much every
>> API we have that deals with paths to be low level - that's a large part of
>> the reason we needed pathlib!
>
>
> If scandir is low-level, and the low-level APIs are the ones that should
> support bytes paths, then scandir should support bytes paths.
>
> Is that what you meant to say?

Yes. The discussions around PEP 471 *deferred* discussions of bytes
and file descriptor support to their own RFEs (not needing a PEP);
they didn't decide definitively not to support them. So Serhiy's
thread is entirely pertinent to that question.

Note that adding bytes support still *should not* hold up the initial
PEP 471 implementation - it should be done as a follow on RFE.

Cheers,
Nick.


-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 21 August 2014 12:16, Stephen J. Turnbull  wrote:
> Nick Coghlan writes:
>
>  > One idea I had along those lines is a surrogatereplace error handler (
>  > http://bugs.python.org/issue22016) that emitted an ASCII question mark for
>  > each smuggled byte, rather than propagating the encoding problem.
>
> Please, don't.
>
> "Smuggled bytes" are not independent events.  They tend to be
> correlated *within* file names, and this handler would generate names
> whose human semantics get lost (and there *are* human semantics,
> otherwise the name would be str(some_counter)).  They tend to be
> correlated across file names, and this handler will generate multiple
> files with the same munged name (and again, the differentiating human
> semantics get lost).
>
> If you don't know the semantics of the intended file names, you can't
> generate good replacement names.  This has to be an application-level
> function, and often requires user intervention to get good names.
>
> If you want to provide helper functions that applications can use to
> clean names explicitly, that might be OK.

Yeah, I was thinking in the context of reproducing sys.stdout's
behaviour in Python 2, but that reproduces the bytes faithfully, so
'surrogateescape' already offers exactly the behaviour we want
(sys.stdout will have surrogateescape enabled by default in 3.5).

I'll keep pondering the question of possible helper functions in the
"string" module.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 21 August 2014 14:52, Cameron Simpson  wrote:
>
> Oh, and I reject Nick's characterisation of POSIX as "broken". It's
> perfectly internally consistent. It just doesn't match what he wants.
> (Indeed, what I want, and I'm a long time UNIX fanboy.)

The part that is broken is the idea that locale encodings are a viable
solution to conveying the appropriate encoding to use to talk to the
operating system. We've tried trusting them with Python 3, and they're
reliably wrong in certain situations. systemd is apparently better
than upstart at setting them correctly (e.g. for cron jobs), but even
it can't defend against an erroneous (or deliberate!) "LANG=C", or ssh
environment forwarding pushing a client's locale to the server. It's
worth looking through some of Armin Ronacher's complaints about Python
3 being broken on Linux, and seeing how many of them boil down to
"trusting the locale is wrong, Python 3 should just assume UTF-8 on
every POSIX system, the same way it does on Mac OS X". (I suspect
ShiftJIS, ISO-2022, et al users might object to that approach, but
it's at least a more viable choice now than it was back in 2008)

I still think we made the right call at least *trying* the idea of
trusting the locale encoding (since that's the officially supported
way of getting this information from the OS), and in many, many
situations it works fine. But I suspect we may eventually need to
resolve the technical issues currently preventing us from deciding to
ignore the environmental locale during interpreter startup and try
something different (such as always assuming UTF-8, or trying to force
C.UTF-8 if we detect the C locale, or looking for the systemd config
files and using those to set the OS encoding, rather than the
environmental locale).
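
To give a feel for the "detect the C locale and assume UTF-8 instead" option, here's a minimal sketch. The function names and the fallback policy are assumptions for illustration only; this is not how the interpreter startup code currently works:

```python
import codecs
import locale


def is_ascii_only(encoding_name):
    """True when the named encoding is just an alias for plain ASCII.

    The C locale typically reports itself as "ANSI_X3.4-1968" on Linux.
    """
    return codecs.lookup(encoding_name).name == "ascii"


def choose_os_encoding(reported=None):
    """Hypothetical policy: distrust a claim of ASCII, otherwise believe it."""
    if reported is None:
        reported = locale.getpreferredencoding(False)
    return "utf-8" if is_ascii_only(reported) else reported
```

With this policy choose_os_encoding("ANSI_X3.4-1968") gives "utf-8", while a locale that names a real non-ASCII encoding is passed through untouched. The hard part isn't the policy, it's applying it early enough during startup.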

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 21 August 2014 23:58, Marko Rauhamaa  wrote:
>
> My point is that the poor programmer cannot ignore the possibility of
> "funny" character sets. If Python tried to protect the programmer from
> that possibility, the result might be even more intractable: how to act
> on a file with a non-UTF-8 filename if you are unable to express it as
> a text string?

That's what the "surrogateescape" codec is for - we use it by default
on most OS interfaces, and it's implicit in the use of "os.fsencode"
and "os.fsdecode". Starting with Python 3.5, it's also enabled on
sys.stdout by default, so that "print(os.listdir(dirname))" will pass
the original raw bytes through to the terminal the same way Python 2
does.

The docs could use additional details as to which interfaces do and
don't have surrogateescape enabled by default, but for the time being,
the description of the codec error handler just links out to the
original definition in PEP 383.

It may also be useful to have some tools for detecting and cleaning
strings containing surrogate escaped data, but there hasn't been a
concrete proposal along those lines as yet. Personally, I'm currently
waiting to see if the Fedora or OpenStack folks indicate a need for
such tools before proposing any additions.

Regards,
Nick.

>
>
> Marko



-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 22 August 2014 00:12, Nick Coghlan  wrote:
> On 21 August 2014 23:58, Marko Rauhamaa  wrote:
>>
>> My point is that the poor programmer cannot ignore the possibility of
>> "funny" character sets. If Python tried to protect the programmer from
>> that possibility, the result might be even more intractable: how to act
>> on a file with a non-UTF-8 filename if you are unable to express it as
>> a text string?
>
> That's what the "surrogateescape" codec is for

Oops, that should say "codec error handler" (I got it right later in the post).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)

2014-08-21 Thread Nick Coghlan
On 22 August 2014 00:41, Armin Rigo  wrote:
> Hi,
>
> On 18 August 2014 22:30, Oleg Broytman  wrote:
>>Aha, I see now -- the signing certificate is CAcert, which I've
>> installed manually.
>
> I don't suppose anyone is particularly annoyed by this fact?  I know
> for sure two classes of people that will never click "Ignore".  The
> first one is people that, for lack of a less negative term, I'll call
> "security freaks".  The second is "serious business people" to which
> the shiny new look of python.org appeals; they are likely to heed the
> warning "Legitimate banks, stores, etc. will never ask you to do this"
> and would regard an official hint to ignore it as highly
> unprofessional.

I've now raised this issue with the infrastructure team. The current
hosting arrangements for bugs.python.org were put in place when the
PSF didn't have any on-call system administrators of its own, but now
that we do, it may be time to migrate that service to a location where
we can switch to a more appropriate SSL certificate.

Anyone interested in following the discussion further may wish to join
infrastruct...@python.org

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)

2014-08-21 Thread Nick Coghlan
On 22 Aug 2014 04:45, "Benjamin Peterson"  wrote:
>
> Perhaps some board members could comment, but I hope the PSF could just
> pay a few hundred a year for a proper certificate.

That's exactly what we're doing - MAL reminded me we reached the same
conclusion last time this came up, we'll just track it better this time to
make sure it doesn't slip through the cracks again.

(And yes, switching to forced HTTPS once this is addressed would also be a
good idea - we'll add it to the list)

Regards,
Nick.


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 22 Aug 2014 09:24, "Isaac Morland"  wrote:
> I think the real tension here is between the POSIX level where filenames
> are byte strings (except for \x00, which is reserved for string
> termination) where \x2F has special interpretation, and absolutely every
> application ever written, in every language, which wants filenames to be
> character strings.

That's one of the best summaries of the situation I've ever seen :)

Most languages (including Python 2) throw up their hands and say this is
the developer's problem to deal with. Python 3 says it's *our* problem to
deal with on behalf of our developers. The "surrogateescape" error handler
allows recalcitrant bytes to be dealt with relatively gracefully in most
situations. We don't quite cover *everything* yet (hence the complaints
from some of the folks that are experts at dealing with Python 2 Unicode
handling on POSIX systems), but the remaining problems are a lot more
tractable than the "teach every native English speaker everywhere how to
handle Unicode properly" problem.

Regards,
Nick.


Re: [Python-Dev] Bytes path support

2014-08-23 Thread Nick Coghlan
[...] Python 3 is that applications
should require additional complexity solely to deal with *incorrectly*
configured systems and improperly encoded data and metadata (and,
ideally, the detection of the need for such handling should be "Python
3 threw an exception" rather than "something further down the line
detected corrupted data").

This is software rather than magic, though - these improvements only
happen through people actually knuckling down and solving the related
problems. When folks complain about Python 3's operating system
interface handling causing problems in some situations? They're almost
always referring to areas where we're still relying on the locale
system on POSIX or the code page system on Windows. Both of those
approaches are irredeemably broken - the answer is to stop relying on
them, but appropriately updating the affected subsystems generally
isn't a trivial task. A lot of the affected code runs before the
interpreter is fully initialised, which makes it really hard to test,
and a lot of it is incredibly convoluted due to various configuration
options and platform specific details, which makes it incredibly hard
to modify without breaking anything.

One of those areas is the fact that we still use the old 8-bit APIs to
interact with the Windows console. Those are just as broken in a
multilingual world as the other Windows 8-bit APIs, so Drekin came up
with a project to expose the Windows console as a UTF-16-LE stream
that uses the 16-bit APIs instead:
https://pypi.python.org/pypi/win_unicode_console

I personally hope we'll be able to get the issues Drekin references
there resolved for Python 3.5 - if other folks hope for the same
thing, then one of the best ways to help that happen is to try out the
win_unicode_console module and provide feedback on what does and
doesn't work.

Another was getting exceptions attempting to write OS data to
sys.stdout when the locale settings had been scrubbed from the
environment. For Python 3.5, we better tolerate that situation by
setting "errors=surrogateescape" on sys.stdout when the environment
claims "ascii" as a suitable encoding for talking to the operating
system (this is our way of saying "we don't actually believe you, but
also don't have the data we need to overrule you completely").
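
The effect of that change can be imitated with an in-memory stream. This is only a sketch: the io.BytesIO object stands in for the real stdout file descriptor, and the variable names are illustrative:

```python
import io

raw_name = b"caf\xe9.txt"  # bytes that are not valid ASCII

# What the OS interfaces hand back when the environment claims ASCII.
name = raw_name.decode("ascii", errors="surrogateescape")

# Stand-in for sys.stdout: a text layer over a binary buffer.
binary_sink = io.BytesIO()
stream = io.TextIOWrapper(binary_sink, encoding="ascii",
                          errors="surrogateescape", newline="\n")
stream.write(name + "\n")
stream.flush()
written = binary_sink.getvalue()  # original bytes pass straight through

# With errors="strict" the same write fails instead.
strict_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii",
                                 errors="strict", newline="\n")
try:
    strict_stream.write(name)
    strict_stream.flush()
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

So the terminal receives the same raw bytes the OS originally supplied, rather than the program dying with an exception part way through its output.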

While I was going to wait for more feedback from Fedora folks before
pushing the idea again, this thread also makes me think it would be
worth our while to add more tools for dealing with surrogate escapes
and latin-1 binary data smuggling just to help make those techniques
more discoverable and accessible:
http://bugs.python.org/issue18814#msg225791

These various discussions are also giving me plenty of motivation to
get back to working on PEP 432 (the rewrite of the interpreter startup
sequence) for Python 3.5. A lot of these things are just plain hard to
change because of the complexity of the current startup code.
Redesigning that to use a cleaner, multiphase startup sequence that
gets the core interpreter running *before* configuring the operating
system integration should give us several more options when it comes
to dealing with some of these challenges.

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


[Python-Dev] Bytes path related questions for Guido

2014-08-23 Thread Nick Coghlan
At Guido's request, splitting out two specific questions from Serhiy's
thread where I believe we could do with an explicit "yes or no" from
him.

1. Should we accept patches adding support for the direct use of bytes
paths in lower level filesystem manipulation APIs? (i.e. everything
that isn't pathlib)

This was Serhiy's original question (due to some open issues [1,2]). I
think the answer is yes, as we already do in some cases, and the
"pathlib doesn't support binary paths" design decision is a high level
platform independent API vs low level potentially platform dependent
API one rather than being about disallowing the use of bytes paths in
general.

[1] http://bugs.python.org/issue19997
[2] http://bugs.python.org/issue20797

2. Should we add some additional helpers to the string module for
dealing with surrogate escaped bytes and other techniques for
smuggling arbitrary binary data as text?

My proposal [3] is to add:

* string.escaped_surrogates (constant with the 128 escaped code points)
* string.clean(s): replaces surrogates with '\ufffd' or another
specified code point
* string.redecode(s, encoding): encodes a string back to bytes and
then decodes it again using the specified encoding (the old encoding
defaults to 'latin-1' to match the assumptions in WSGI)

"s != string.clean(s)" would then serve as a check for "does this
string contain any surrogate escaped bytes?"
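
A rough sketch of what those three helpers might look like follows. The names, signatures and defaults are only my reading of the proposal above (nothing here exists in the string module), and this version of clean() only touches the 128 code points that surrogateescape can produce:

```python
# Hypothetical stand-in for string.escaped_surrogates: U+DC80 to U+DCFF.
ESCAPED_SURROGATES = "".join(chr(cp) for cp in range(0xDC80, 0xDD00))


def clean(text, replacement="\ufffd"):
    """Replace each surrogate escaped byte with the given code point."""
    table = dict.fromkeys(map(ord, ESCAPED_SURROGATES), replacement)
    return text.translate(table)


def redecode(text, encoding, old_encoding="latin-1", errors="strict"):
    """Encode back to bytes with old_encoding, then decode with encoding."""
    return text.encode(old_encoding, errors).decode(encoding, errors)


def has_escaped_bytes(text):
    """The "s != string.clean(s)" check spelled out as a function."""
    return text != clean(text)
```

For example, a WSGI style header value that was really UTF-8 but arrived decoded as latin-1 can be repaired with redecode(value, "utf-8"), while clean() is for the cases where the smuggled bytes should not be passed any further.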

[3] http://bugs.python.org/issue18814#msg225791

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Bytes path related questions for Guido

2014-08-24 Thread Nick Coghlan
On 24 August 2014 14:44, Nick Coghlan  wrote:
> 2. Should we add some additional helpers to the string module for
> dealing with surrogate escaped bytes and other techniques for
> smuggling arbitrary binary data as text?
>
> My proposal [3] is to add:
>
> * string.escaped_surrogates (constant with the 128 escaped code points)
> * string.clean(s): replaces surrogates with '\ufffd' or another
> specified code point
> * string.redecode(s, encoding): encodes a string back to bytes and
> then decodes it again using the specified encoding (the old encoding
> defaults to 'latin-1' to match the assumptions in WSGI)


Serhiy & Ezio convinced me to scale this one back to a proposal for
"codecs.clean_surrogate_escapes(s)", which replaces surrogates that
may be produced by surrogateescape (that's what string.clean() above
was supposed to be, but my description was not correct, and the name
was too vague for that error to be obvious to the reader)

"s != codecs.clean_surrogate_escapes(s)" would then become the check
for "does this string contain any surrogate escaped bytes?"

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Bytes path related questions for Guido

2014-08-24 Thread Nick Coghlan
On 25 August 2014 00:23, Antoine Pitrou  wrote:
> Le 24/08/2014 09:04, Nick Coghlan a écrit :
>> Serhiy & Ezio convinced me to scale this one back to a proposal for
>> "codecs.clean_surrogate_escapes(s)", which replaces surrogates that
>> may be produced by surrogateescape (that's what string.clean() above
>> was supposed to be, but my description was not correct, and the name
>> was too vague for that error to be obvious to the reader)
>
>
> "clean" conveys the wrong meaning. It should use a scary word such as
> "trap". "Cleaning" surrogates is unlikely to be the right procedure when
> dealing with surrogates produced by undecodable byte sequences.

"purge_surrogate_escapes" was the other term that occurred to me.

Either way, my use case is to filter them out when I *don't* want to
pass them along to other software, but would prefer the Unicode
replacement character to the ASCII question mark created by using the
"replace" error handler when encoding.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Bytes path related questions for Guido

2014-08-24 Thread Nick Coghlan
On 25 Aug 2014 03:55, "Guido van Rossum"  wrote:
>
> Yes on #1 -- making the low-level functions more usable for edge cases by
> supporting bytes seems fine (as long as the support for strings, where it
> exists, is not compromised).

Thanks!

> The status of pathlib is a little unclear to me -- is there a plan to
> eventually support bytes or not?

It's text only and Antoine plans to keep it that way - the concatenation
operations, etc, are really only safe if you decode first.

>
> For #2 I think you should probably just work with the others you have
> mentioned.

Yes, that sounds like a good idea. There's been some good progress on the
issue tracker, so I think we can thrash out some workable (and
comprehensible!) utilities that will be useful in their own right while
also serving as aids to understanding for the underlying mechanisms.

Cheers,
Nick.

>
>
> On Sat, Aug 23, 2014 at 9:44 PM, Nick Coghlan  wrote:
>>
>> At Guido's request, splitting out two specific questions from Serhiy's
>> thread where I believe we could do with an explicit "yes or no" from
>> him.
>>
>> 1. Should we accept patches adding support for the direct use of bytes
>> paths in lower level filesystem manipulation APIs? (i.e. everything
>> that isn't pathlib)
>>
>> This was Serhiy's original question (due to some open issues [1,2]). I
>> think the answer is yes, as we already do in some cases, and the
>> "pathlib doesn't support binary paths" design decision is a high level
>> platform independent API vs low level potentially platform dependent
>> API one rather than being about disallowing the use of bytes paths in
>> general.
>>
>> [1] http://bugs.python.org/issue19997
>> [2] http://bugs.python.org/issue20797
>>
>> 2. Should we add some additional helpers to the string module for
>> dealing with surrogate escaped bytes and other techniques for
>> smuggling arbitrary binary data as text?
>>
>> My proposal [3] is to add:
>>
>> * string.escaped_surrogates (constant with the 128 escaped code points)
>> * string.clean(s): replaces surrogates with '\ufffd' or another
>> specified code point
>> * string.redecode(s, encoding): encodes a string back to bytes and
>> then decodes it again using the specified encoding (the old encoding
>> defaults to 'latin-1' to match the assumptions in WSGI)
>>
>> "s != string.clean(s)" would then serve as a check for "does this
>> string contain any surrogate escaped bytes?"
>>
>> [3] http://bugs.python.org/issue18814#msg225791
>>
>> Regards,
>> Nick.
>>
>> --
>> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>
>
>
>
> --
> --Guido van Rossum (python.org/~guido)


[Python-Dev] Fwd: Accepting PEP 440: Version Identification and Dependency Specification

2014-08-26 Thread Nick Coghlan
Antoine pointed out that it would still be a good idea to forward
packaging PEP acceptance announcements to python-dev, even when the
actual acceptance happens on distutils-sig.

That makes sense to me, so here's last week's notice of the acceptance
of PEP 440, the implementation independent versioning standard derived
from pkg_resources, PEP 386, and ideas from both Linux distributions
and other open source language communities.

Regards,
Nick.

---------- Forwarded message ----------
From: Nick Coghlan 
Date: 22 August 2014 22:34
Subject: Accepting PEP 440: Version Identification and Dependency Specification
To: DistUtils mailing list 


I just pushed Donald's final round of edits in response to the
feedback on the last PEP 440 thread, and as such I'm happy to announce
that I am accepting PEP 440 as the recommended approach to identifying
versions and specifying dependencies when distributing Python
software.

The PEP is available in the usual place at
http://www.python.org/dev/peps/pep-0440/

It's been a long road to get to an implementation independent
versioning standard that has a feasible migration path from the
current pkg_resources defined de facto standard, and I'd like to thank
a few folks:

* Donald Stufft for his extensive work on PEP 440 itself, especially
the proof of concept integration into pip
* Vinay Sajip for his efforts in validating earlier versions of the PEP
* Tarek Ziadé for starting us down the road to an implementation
independent versioning standard with the initial creation of PEP 386
back in June 2009, more than five years ago!

Regards,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Bytes path support

2014-08-26 Thread Nick Coghlan
On 27 Aug 2014 02:52, "Terry Reedy"  wrote:
>
> On 8/26/2014 9:11 AM, R. David Murray wrote:
>>
>> On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan wrote:
>>>
>>> As some examples of where bilingual computing breaks down:
>>>
>>> * My NFS client and server may have different locale settings
>>> * My FTP client and server may have different locale settings
>>> * My SSH client and server may have different locale settings
>>> * I save a file locally and send it to someone with a different locale
>>> setting
>>> * I attempt to access a Windows share from a Linux client (or
>>> vice-versa)
>>> * I clone my POSIX hosted git or Mercurial repository on a Windows
>>> client
>>> * I have to connect my Linux client to a Windows Active Directory
>>> domain (or vice-versa)
>>> * I have to interoperate between native code and JVM code
>>>
>>> The entire computing industry is currently struggling with this
>>> monolingual (ASCII/Extended ASCII/EBCDIC/etc) -> bilingual (locale
>>> encoding/code pages) -> multilingual (Unicode) transition. It's been
>>> going on for decades, and it's still going to be quite some time
>>> before we're done.
>>>
>>> The POSIX world is slowly clawing its way towards a multilingual model
>>> that actually works: UTF-8
>>> Windows (including the CLR) and the JVM adopted a different
>>> multilingual model, but still one that actually works: UTF-16-LE
>
>
> Nick, I think the first half of your post is one of the clearest
> expositions yet of 'why Python 3' (in particular, the str to unicode
> change).  It is worthy of wider distribution and without much change, it
> would be a great blog post.

Indeed, I had the same idea - I had been assuming users already understood
this context, which is almost certainly an invalid assumption.

The blog post version is already mostly written, but I ran out of weekend.
Will hopefully finish it up and post it some time in the next few days :)

>> This kind of puts the "length" of the python2->python3 transition
>> period in perspective, doesn't it?

I realised in writing the post that ASCII is over 50 years old at this
point, while Unicode as an official standard is more than 20. By the time
this is done, we'll likely be talking 30+ years for Unicode to displace the
confusing mess that is code pages and locale encodings :)

Cheers,
Nick.

>
>
> --
> Terry Jan Reedy
>
>


Re: [Python-Dev] Windows Unicode console support [Was: Bytes path support]

2014-08-27 Thread Nick Coghlan
On 27 August 2014 01:23, Paul Moore  wrote:
> On 24 August 2014 04:27, Nick Coghlan  wrote:
>> One of those areas is the fact that we still use the old 8-bit APIs to
>> interact with the Windows console. Those are just as broken in a
>> multilingual world as the other Windows 8-bit APIs, so Drekin came up
>> with a project to expose the Windows console as a UTF-16-LE stream
>> that uses the 16-bit APIs instead:
>> https://pypi.python.org/pypi/win_unicode_console
>>
>> I personally hope we'll be able to get the issues Drekin references
>> there resolved for Python 3.5 - if other folks hope for the same
>> thing, then one of the best ways to help that happen is to try out the
>> win_unicode_console module and provide feedback on what does and
>> doesn't work.
>
> This looks very cool, and I plan on giving it a try. But I don't see
> any issues mentioned there (unless you mean the fact that it's not
> possible to hook into Python's interactive interpreter directly, but I
> don't see how that could be fixed in an external module). There's no
> open issues on the project's github tracker.

There are two links to CPython issues from the project description:

http://bugs.python.org/issue1602
http://bugs.python.org/issue17620

Part of the feedback on those was that as much as possible should be
made available as a third party module before returning to the
question of how to update CPython.

If we can get additional confirmation that the module addresses the
CLI integration issues, then we can take a closer look at switching
CPython itself over.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Bytes path support

2014-08-27 Thread Nick Coghlan
On 27 August 2014 08:52, Nick Coghlan  wrote:
> On 27 Aug 2014 02:52, "Terry Reedy"  wrote:
>> Nick, I think the first half of your post is one of the clearest
>> expositions yet of 'why Python 3' (in particular, the str to unicode
>> change).  It is worthy of wider distribution and without much change, it
>> would be a great blog post.
>
> Indeed, I had the same idea - I had been assuming users already understood
> this context, which is almost certainly an invalid assumption.
>
> The blog post version is already mostly written, but I ran out of weekend.
> Will hopefully finish it up and post it some time in the next few days :)

Aaand, it's up:
http://www.curiousefficiency.org/posts/2014/08/multilingual-programming.html

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Bytes path support

2014-08-27 Thread Nick Coghlan
On 28 Aug 2014 04:20, "Glenn Linderman"  wrote:
>
> On 8/27/2014 5:16 AM, Nick Coghlan wrote:
>>
>> On 27 August 2014 08:52, Nick Coghlan  wrote:
>>>
>>> On 27 Aug 2014 02:52, "Terry Reedy"  wrote:
>>>>
>>>> Nick, I think the first half of your post is one of the clearest
>>>> expositions yet of 'why Python 3' (in particular, the str to unicode
>>>> change).  It is worthy of wider distribution and without much change,
>>>> it would be a great blog post.
>>>
>>> Indeed, I had the same idea - I had been assuming users already
>>> understood this context, which is almost certainly an invalid assumption.
>>>
>>> The blog post version is already mostly written, but I ran out of
>>> weekend. Will hopefully finish it up and post it some time in the next
>>> few days :)
>>
>> Aaand, it's up:
>> http://www.curiousefficiency.org/posts/2014/08/multilingual-programming.html
>>
>> Cheers,
>> Nick.
>>
>
> Indeed, I also enjoyed and found enlightening your response to this
> issue, including the broader historical context. I remember when Unicode
> was first published back in 1991, and it sounded interesting, but far
> removed from the reality of implementations of the day. I was intrigued by
> UTF-8 at the time, and even wrote an encoder and decoder for it for a
> software package that eventually never reached any real customers.
>
> Your blog post says:
>>
>> Choosing UTF-8 aims to treat formatting text for communication with the
>> user as "just a display issue". It's a low impact design that will "just
>> work" for a lot of software, but it comes at a price:
>>
>> because encoding consistency checks are mostly avoided, data in
>> different encodings may be freely concatenated and passed on to other
>> applications. Such data is typically not usable by the receiving
>> application.
>
>
> I don't believe this is a necessary result of using UTF-8. It is a
> possible result, and I guess some implementations are using it this way,
> but a proper language could still provide and/or require proper usage of
> UTF-8 data through its type system just as Python3 is doing with PEP 393.

Yes, Go works that way, for example. I doubt it actually checks for valid
UTF-8 at OS boundaries though - that would be a potentially expensive
check, and as a network service centric language, Go can afford to place
more constraints on the operating environment than we can.

> In fact, if it were not for the requirement to support passing character
> strings in other formats (UTF-16, UTF-32) to historical APIs (in CPython
> add-on packages) and the resulting practical performance considerations of
> converting to/from UTF-8 repeatedly when calling those APIs, Python3 could
> have evolved to using UTF-8 as its underlying data format, and obtained
> equal encoding consistency as it has today.

We already have string processing algorithms that work for fixed width
encodings (and are known not to work for variable width encodings, hence
the bugs in Unicode handling on the old narrow builds).

It isn't that variable width encodings aren't a viable choice for
programming language text modelling, it's that the assumption of a fixed
width model is more deeply entrenched in CPython (and especially the C API)
than the exact number of bits used per code point.
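
As a rough illustration of the fixed width versus variable width point (a
sketch only, assuming Python 3.3 or later; the variable names are made up
for this example): a code point outside the BMP is a single item in the
fixed width model, but needs two code units in UTF-16, which is what the
old narrow builds indexed by.

```python
# One code point outside the Basic Multilingual Plane
text = "\U0001F40D"

# Fixed width model (Python 3.3+): one code point, one index position
code_points = len(text)

# UTF-16 is variable width: this code point needs a surrogate pair,
# i.e. two 16-bit code units. Narrow builds indexed by code unit.
utf16_units = len(text.encode("utf-16-le")) // 2

# UTF-8 is also variable width: four bytes for this code point
utf8_bytes = len(text.encode("utf-8"))

print(code_points, utf16_units, utf8_bytes)
```

Any algorithm that assumes "one index position == one code point" gives
wrong answers when handed the second or third representation directly.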

> Of course, nothing can be "required" if the user chooses to continue
> operating in the encoded domain, and manipulate data using the necessary
> byte-oriented features of whatever language is in use.
>
> One of the choices of Python3 was to retain character indexing as an
> underlying arithmetic implementation citing algorithmic speed, but that is
> a seldom needed operation, and of limited general applicability when
> considering grapheme clusters.

The choice that was made was to say no to the question "Do we rewrite a
Unicode type that we already know works from scratch?". The decisions about
how to handle *text* were made way back before the PEP process even
existed, and later captured as PEP 100.

What changed in Python 3 was dropping the hybrid 8-bit str type with its
locale dependent behaviour, and parcelling its responsibilities out to
either the existing unicode type (renamed as str, as it was the default
choice), or the new locale independent bytes type.
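
A small sketch of what that split looks like in practice (assuming any
Python 3 version; the variable names are only for illustration): text and
binary data are separate types, and moving between them always names an
encoding explicitly.

```python
text = "caf\u00e9"            # str: a sequence of Unicode code points
data = text.encode("utf-8")   # bytes: a sequence of 8-bit values

# Four code points, but five bytes once encoded as UTF-8
lengths = (len(text), len(data))

# Mixing the two types is an error, rather than triggering an
# implicit conversion between text and binary data
try:
    text + data
except TypeError:
    mixing_allowed = False
else:
    mixing_allowed = True

# Conversion is always explicit and names the encoding
round_tripped = data.decode("utf-8")
```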

> An iterator based approach can solve both problems, but would have been
> best introduced as part of Python3.0, although it may have made 2to3
> harder, and may have made it less practical to implement six and other "run
> on both Py2 and Py3" type solutions, without introducing those same
> iterative solutions into Python 2.6 or 2.7.

The option of fundamentally changing the text handling design was never on
the ta

[Python-Dev] Cleaning up surrogate escaped strings (was Bytes path related questions for Guido)

2014-08-28 Thread Nick Coghlan
On 26 Aug 2014 21:34, "MRAB"  wrote:
>
> On 2014-08-26 03:11, Stephen J. Turnbull wrote:
>>
>> Nick Coghlan writes:
>>
>>   > "purge_surrogate_escapes" was the other term that occurred to me.
>>
>> "purge" suggests removal, not replacement.  That may be useful too.
>>
>> neutralize_surrogate_escapes(s, remove=False, replacement='\uFFFD')
>>
> How about:
>
> replace_surrogate_escapes(s, replacement='\uFFFD')
>
> If you want them removed, just pass an empty string as the replacement.

The current proposal on the issue tracker is to instead take advantage of
the existing error handlers:

def convert_surrogateescape(data, errors='replace'):
    return data.encode('utf-8', 'surrogateescape').decode('utf-8', errors)

That code is short, but semantically dense - it took a few iterations to
come up with that version. (Added bonus: once you're alerted to the
possibility, it's trivial to write your own version for existing Python 3
versions. The standard name just makes it easier to look up when you come
across it in a piece of code, and provides the option of optimising it
later if it ever seems worth the extra work)
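
For anyone wanting to see the round trip in action, here is a minimal
sketch (the sample file name is invented for the example, and the function
is just the proposal from the tracker, not an existing standard library
API):

```python
def convert_surrogateescape(data, errors='replace'):
    # Encoding with surrogateescape turns each escaped surrogate back
    # into its original byte; decoding again with a different error
    # handler then decides what to do with the undecodable bytes.
    return data.encode('utf-8', 'surrogateescape').decode('utf-8', errors)


# Latin-1 encoded bytes that are not valid UTF-8
raw = b'caf\xe9.txt'

# What an OS interface would hand back as str: the bad byte is
# smuggled through as the lone surrogate U+DCE9
escaped = raw.decode('utf-8', 'surrogateescape')

replaced = convert_surrogateescape(escaped)            # uses U+FFFD
removed = convert_surrogateescape(escaped, 'ignore')   # drops the byte
```

Passing 'strict' as the error handler instead raises UnicodeDecodeError,
which is a cheap way of detecting that a string contains escaped bytes.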

I also filed a separate RFE to make backslashreplace usable on input, since
that allows the option of separating the replacement operation from the
encoding operation.

Cheers,
Nick.


Re: [Python-Dev] Cleaning up surrogate escaped strings (was Bytes path related questions for Guido)

2014-08-28 Thread Nick Coghlan
On 29 August 2014 10:32, Stephen J. Turnbull  wrote:
> Nick Coghlan writes:
>
>  > The current proposal on the issue tracker is to instead take advantage of
>  > the existing error handlers:
>  >
>  > def convert_surrogateescape(data, errors='replace'):
>  >     return data.encode('utf-8', 'surrogateescape').decode('utf-8', errors)
>  >
>  > That code is short, but semantically dense
>
> And it doesn't implement your original suggestion of replacement with
> '?' (and another possibility for history buffs is 0x1A, ASCII SUB).  At
> least, AFAICT from the docs there's no way to specify the replacement
> character; decoding always uses U+FFFD.  (If I knew how to do that, I
> would have suggested this.)

If that actually matters in a given context, I can do an ordinary
string replacement later. I couldn't think of a case where it actually
mattered though - if "must be ASCII" was a requirement, then
backslashreplace was a suitable alternative that lost less information
(hence the RFE to make that also usable on input).
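
A sketch of both options, under a couple of assumptions: the sample bytes
are invented, and the second option relies on backslashreplace being
accepted when decoding, which as far as I know holds for Python 3.5 and
later.

```python
def convert_surrogateescape(data, errors='replace'):
    return data.encode('utf-8', 'surrogateescape').decode('utf-8', errors)


escaped = b'caf\xe9.txt'.decode('utf-8', 'surrogateescape')

# Option 1: ordinary string replacement after the conversion.
# Caveat: this also rewrites any U+FFFD already present in the text.
question_marks = convert_surrogateescape(escaped).replace('\ufffd', '?')

# Option 2: backslashreplace keeps the original byte value visible
# and produces pure ASCII for the replaced portion.
backslashed = convert_surrogateescape(escaped, 'backslashreplace')
```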

>  > (Added bonus: once you're alerted to the possibility, it's trivial
>  > to write your own version for existing Python 3 versions.
>
> I'm not sure that's true.  At least, to me that code was obvious -- I
> got the exact definition (except for the function name) on the first
> try -- but I ruled it out because it didn't implement your suggestion
> of replacement with '?', even as an option.

Yeah, part of the tracker discussion involved me realising that part
wasn't a necessary requirement - the key is being able to get rid of
the surrogates, or replace them with something readily identifiable,
and less about being able to control exactly what they get replaced
by.

> OTOH, I think a lot of the resistance to codec-based solutions is the
> misconception that en/decoding streams is expensive, or the
> misconception that Python's internal representation of text as an
> array of code points (rather than an array of "characters" or
> "grapheme clusters") is somehow insufficient for text processing.

We don't actually have any technical deep dives into how Python 3's
text handling works readily available online, so there's a lot of
speculation and misinformation floating around. My recent article
gives the high level context, but it really needs to be paired up with
a piece (or pieces) that go deep into the details of codec
optimisation, the UTF-8 caching, how it integrates with the UTF-16-LE
Windows APIs, how the internal storage structure is determined at
allocation time, how it maintains compatibility with the legacy C
extension APIs, etc. The only current widely distributed articles on
those topics are written from a perspective that assumes we don't know
anything about Unicode, and are just making things unnecessarily
complicated (rather than solving hard cross platform compatibility and
text processing performance problems). That perspective is incorrect,
but "trust me, they're wrong" doesn't work very well with people that
are already angry.

Text manipulation is one of the most sophisticated subsystems in the
interpreter, though, so it's hard to know where to start on such a
series (and easy to get intimidated by the sheer magnitude of the work
involved in doing it right).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-08-30 Thread Nick Coghlan
On 30 Aug 2014 06:08, "Ethan Furman"  wrote:
>
> On 08/29/2014 01:00 PM, M.-A. Lemburg wrote:
>>
>> On 29.08.2014 21:47, Alex Gaynor wrote:
>>>
>>>
>>> I've just submitted PEP 476, on enabling certificate validation by
>>> default for HTTPS clients in Python. Please have a look and let me know
>>> what you think.
>>
>>
>> Thanks for the PEP. I think this is generally a good idea,
>> but some important parts are missing from the PEP:
>>
>>   * transition plan:
>>
>> I think starting with warnings in Python 3.5 and going
>> for exceptions in 3.6 would make a good transition
>>
>> Going straight for exceptions in 3.5 is not in line with
>> our normal procedures for backwards incompatible changes.
>>
>>   * configuration:
>>
>> It would be good to be able to switch this on or off
>> without having to change the code, e.g. via a command
>> line switch and environment variable; perhaps even
>> controlling whether or not to raise an exception or
>> warning.
>>
>>   * choice of trusted certificate:
>>
>> Instead of hard wiring using the system CA roots into
>> Python it would be good to just make this default and
>> permit the user to point Python to a different set of
>> CA roots.
>>
>> This would enable using self signed certs more easily.
>> Since these are often used for tests, demos and education,
>> I think it's important to allow having more control of
>> the trusted certs.
>
>
> +1 for PEP with above changes.

Ditto from me.

In relation to changing the Python CLI API to offer some of the wget/curl
style command line options, I like the idea of providing recipes in the
docs for implementing them at the application layer, but postponing making
the *default* behaviour configurable that way.
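
As a sketch of what such an application layer recipe might look like (the
flag name is borrowed from wget and the helper names are hypothetical;
nothing here is part of the PEP):

```python
import argparse
import ssl
from urllib.request import urlopen


def build_parser():
    parser = argparse.ArgumentParser(description="Fetch a URL over HTTPS")
    parser.add_argument("url")
    # Hypothetical flag name, modelled on wget's --no-check-certificate
    parser.add_argument("--no-check-certificate", action="store_true",
                        help="skip certificate validation (unsafe)")
    return parser


def make_context(insecure=False):
    # Validating context by default: CERT_REQUIRED plus hostname checks
    context = ssl.create_default_context()
    if insecure:
        # check_hostname must be disabled before verify_mode can be
        # set to CERT_NONE
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def main(argv):
    args = build_parser().parse_args(argv)
    context = make_context(args.no_check_certificate)
    with urlopen(args.url, context=context) as response:
        print(response.status)
```

The point of doing it this way is that the insecure mode is an explicit,
per-invocation choice made by the user of one application, rather than a
switch that silently changes the behaviour of every Python program on the
machine. A script would call main(sys.argv[1:]).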

Longer term, I'd like to actually have a per-runtime configuration file for
some of these things that also integrated with the pyvenv support, but that
requires untangling the current startup code first (and there are only so
many hours in the day).

Regards,
Nick.


>
> --
> ~Ethan~
>


Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-08-30 Thread Nick Coghlan
On 31 August 2014 12:21, R. David Murray  wrote:
> On Sun, 31 Aug 2014 03:25:25 +0200, Antoine Pitrou  
> wrote:
>> On Sun, 31 Aug 2014 09:26:30 +1000
>> Nick Coghlan  wrote:
>> > In relation to changing the Python CLI API to offer some of the wget/curl
>> > style command line options, I like the idea of providing recipes in the
>> > docs for implementing them at the application layer, but postponing making
>> > the *default* behaviour configurable that way.
>>
>> I'm against any additional environment variables and command-line
>> options. It will only complicate and obscure the security parameters of
>> certificate validation.

As Antoine says here, I'm also opposed to adding more Python specific
configuration options. However, I think there may be something
worthwhile we can do that's closer to the way browsers work, and has
the significant benefit of being implementable as a PyPI module first
(more on that in a separate reply).

>> The existing knobs have already been mentioned in this thread, I won't
>> mention them here again.
>
> Do those knobs allow one to instruct urllib to accept an invalid
> certificate without changing the program code?

Only if you add the specific certificate concerned to the certificate
store that Python is using (which PEP 476 currently suggests will be
the platform wide certificate store). Whether or not that is an
adequate solution is the point currently in dispute.

My view is that the core problem/concern we need to address here is
how we manage the migration away from a network communication model
that trusts the network by default. That transition will happen
regardless of whether or not we adapt Python as a platform - the
challenge for us is how we can address it in a way that minimises the
impact on existing users, while still ensuring future users are
protected by default.

This would be relatively easy if we only had to worry about the public
internet (since we're followers rather than leaders in that
environment), but we don't. Python made the leap into enterprise
environments long ago, so we not only need to cope with corporate
intranets, we need to cope with corporate intranets that aren't
necessarily being well managed. That's what makes this a harder
problem for us than it is for a new language like Go that was created
by a public internet utility, specifically for use over the public
internet - they didn't *have* an installed base to manage, they could
just build a language specifically tailored for the task of running
network services on Linux, without needing to account for any other
use cases.

The reason our existing installed base creates a problem is because
corporate network security has historically focused on "perimeter
defence": carving out a trusted island behind the corporate firewall
where users and other computer systems could be "safely" assumed not
to be malicious.

As an industry, we have learned through harsh experience that *this
model doesn't work*. You can't trust the network, period. A corporate
intranet is *less* dangerous than the public internet, but you still
can't trust it. This "don't trust the network" ethos is also
reinforced by the broad shift to "utility computing" where more and
more companies are running distributed networks, where some of their
systems are actually running on vendor provided servers. The "network
perimeter" is evaporating, as corporate "intranets" start to look a
lot more like recreations of the internet in miniature, with the only
difference being the existence of more formal contractual
relationships than typically exist between internet peers.

Unfortunately, far too many organisations (especially those outside
the tech industry) still trust in perimeter defence for their internal
network security, and hence tolerate the use of unsecured connections,
or skipping certificate validation internally. This is actually a
really terrible idea, but it's still incredibly common due to the
general failure of the technology industry to take usability issues
seriously when we design security systems - doing the wrong "unsafe"
thing is genuinely easier than doing things right.

We have enough evidence now to be able to say (as Alex does in PEP
476) that it has been comprehensively demonstrated that "opt-in
security" really just means "security failures are common and silent
by default". We've seen it with C buffer overflow vulnerabilities,
we've seen it with plain text communication links, we've seen it with
SSL certificate validation - the vast majority of users and developers
will just run with the default behaviour of the platform or
application they're using, even if those defaults have serious
problems. As the saying goes, "you 

Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-08-30 Thread Nick Coghlan
On 31 August 2014 12:21, R. David Murray  wrote:
> Do those knobs allow one to instruct urllib to accept an invalid
> certificate without changing the program code?

My first reply ended up being a context dump of the challenges created
by legacy corporate intranets that may not be immediately obvious to
folks that spend most of their time working on or with the public
internet. I decided to split these more technical details out to a new
reply for the benefit of folks that already know all that history :)

To answer David's specific question, the existing knobs at the OpenSSL
level (SSL_CERT_DIR and SSL_CERT_FILE) let people add an internal CA,
opt out of the default CA system, and trust *specific* self-signed
certs.
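
To see which knobs are in play on a given machine, something like the
following should work (a sketch, assuming Python 3.4+ linked against a
stock OpenSSL; the variable names are only for this example):

```python
import os
import ssl

paths = ssl.get_default_verify_paths()

# Names of the environment variables OpenSSL consults
cafile_env = paths.openssl_cafile_env
capath_env = paths.openssl_capath_env

# Current overrides, if the user has set any
cafile_override = os.environ.get(cafile_env)
capath_override = os.environ.get(capath_env)

print(cafile_env, "=", cafile_override)
print(capath_env, "=", capath_override)
print("effective cafile:", paths.cafile)
print("effective capath:", paths.capath)
```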

What they don't allow is a global "trust any cert" setting -
exceptions need to be added at the individual cert level or at the CA
level, or the application needs to offer an option to not do cert
validation at all. That "trust anything" option at the platform level
is the setting that is a really bad idea - if an organisation thinks
it needs that (because they have a lot of self-signed certs, but
aren't verifying their HTTPS connections to those servers), then what
they really need is an internal CA, where their systems just need to
be set up to trust the internal CA in addition to the platform CA
certs.
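
At the application level, trusting an internal CA in addition to the
platform certs can be sketched like this (the helper name and the example
path in the comment are hypothetical):

```python
import ssl


def context_with_internal_ca(ca_file=None):
    """Return a validating context that also trusts ca_file, if given."""
    # Starts from the platform CA certs, with validation switched on
    context = ssl.create_default_context()
    if ca_file is not None:
        # Adds to the trusted set rather than replacing it, e.g.
        # ca_file="/etc/pki/internal-ca.pem" (hypothetical path)
        context.load_verify_locations(cafile=ca_file)
    return context
```

Validation stays enabled throughout, so certs signed by neither the
platform CAs nor the internal CA are still rejected.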

With Alex's proposal, organisations that are already running an
internal CA should be just fine - Python 3.5 will see the CA cert in
the platform cert store and accept certs signed by it as valid. (Note:
the Python 3.4 warning should take this into account, which could be a
problem since we don't currently do validity checks against the
platform store by default. The PEP needs to cover the mechanics of
that in more detail, as I think it means we'll need to make *some*
changes to the default configuration even in Python 3.4 to get
accurate validity data back from OpenSSL)

However, we also need to accept that there's a reason browser vendors
still offer "click through insecurity" for sites with self-signed
certificates, and tools like wget/curl offer the option to say "don't
check the certificate": these are necessary compromises to make SSL
based network connections actually work on many current corporate
intranets.

It is corporate environments that also make it desirable to be able to
address this potential problem at a *user* level, since many Python
users in large organisations are actually running Python entirely
out of their home directories, rather than as a system installation
(they may not even have admin access to their own systems).

My suggestion at this point is that we take a leaf from both browser
vendors and the design of SSH: make it easy to *add* a specific
self-signed cert to the set a *particular user* trusts by default
(preferably *only* for a particular host, to limit the power of such
certs). "python -m ssl" doesn't currently do anything interesting, so
it could be used to provide an API for managing that user level
certificate store.
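
To make the idea a bit more concrete, here is a very rough sketch of the
data model such a store might use. Everything in it is hypothetical (the
function names, the JSON file format, pinning by SHA-256 fingerprint); it
only shows the SSH known_hosts style "this cert, for this host, for this
user" bookkeeping, not how it would hook into the ssl module.

```python
import hashlib
import json
import os


def fingerprint(der_cert):
    """SHA-256 fingerprint of a DER encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def load_store(path):
    """Load the per-user store, or start empty if it does not exist."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_store(path, store):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(store, f, indent=2, sort_keys=True)


def trust_cert(store, host, der_cert):
    """Record that this user trusts der_cert, but only for host."""
    fingerprints = store.setdefault(host, [])
    digest = fingerprint(der_cert)
    if digest not in fingerprints:
        fingerprints.append(digest)


def is_trusted(store, host, der_cert):
    return fingerprint(der_cert) in store.get(host, [])
```

The DER bytes could come from SSLSocket.getpeercert(binary_form=True) on a
connection made without validation. Keying on the host name is what keeps a
self-signed cert accepted for one internal server from being usable to
impersonate any other.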

A Python-specific user level cert store is something that could be
developed as a PyPI library for Python 2.7.9+ and 3.4+ (Is cert
management considered in scope for cryptography.io? If so, that could
be a good home).

So while I agree with the intent of PEP 476, and like the suggested
end state, I'm back to thinking that the transition plan for existing
corporate users needs more work before it can be accepted. This is
especially true since it becomes another barrier to migrating from
Python 2.7 to Python 3.5+ (a warning in Python 3.4 doesn't help with
that aspect, although a new -3 warning might).

A third party module that offers a user level certificate store, and a
gevent.monkey style way of opting in to this behaviour for existing
Python versions would be one way to provide a more compelling
transition plan.

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-08-30 Thread Nick Coghlan
On 31 August 2014 16:16, Donald Stufft  wrote:
>
> On Aug 31, 2014, at 2:09 AM, Nick Coghlan  wrote:
>
> At the same time, we need to account for the fact that most existing
> organisations still trust in perimeter defence for their internal
> network security, and hence tolerate (or even actively encourage) the
> use of unsecured connections, or skipping certificate validation,
> internally. This is actually a really terrible idea, but it's still
> incredibly common due to the general failure of the technology
> industry to take usability issues seriously when we design security
> systems (at least until recently) - doing the wrong "unsafe" thing is
> genuinely easier than doing things right.
>
>
> Just a quick clarification in order to be a little clearer, this change will
> (obviously) only affect those who trust perimeter security *and* decided to
> install an invalid certificate instead of just using HTTP. I'm not saying
> that this doesn't happen, just being specific (I'm not actually sure why
> they would install a TLS certificate at all if they are trusting perimeter
> security, but I'm sure folks do).

It's the end result when a company wide edict to use HTTPS isn't
backed up by the necessary documentation and training on how to get a
properly signed cert from your internal CA (or, even better, when such
an edict comes down without setting up an internal CA first). Folks
hit the internet instead, find instructions on creating a self-signed
cert, install that, and tell their users to ignore the security
warning and accept the cert. Historically, Python clients have "just
worked" in environments that required a click-through on the browser
side, since you had to opt in to checking the certificates properly.

Self-signed certificates can also be really handy for doing local
testing - you're not really aiming to authenticate the connection in
that case, you're just aiming to test that the secure connection
machinery is all working properly.

(As far as the "what about requests?" question goes - that's in a
similar situation to Go, where being new allows it to choose different
defaults, and folks for whom those defaults don't work just won't use
it. There's also the fact that most corporate Python users are
unlikely to know that PyPI exists, let alone that it contains a module
called "requests" that does SSL certificate validation by default.
Those of us in the corporate world that interact directly with
upstream are still the exception rather than the rule)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-08-31 Thread Nick Coghlan
On 1 Sep 2014 06:32, "Paul Moore"  wrote:
>
> On 31 August 2014 21:15, Antoine Pitrou  wrote:
> > What do you call your local cert store?
>
> I was referring to Christian's comment
> > It's very simple to trust a self-signed certificate: just download it
> > and stuff it into the trust store.
>
> From his recent response, I guess he meant the system store, and he
> agrees that this is a bad option.
>
> OK, that's fair, but:
>
> a) Is there really no OS-level personal trust store? I'm thinking of
> Windows here for my own personal use, but the same question applies
> elsewhere.
> b) I doubt my confusion over Christian's response is atypical. Based
> on what he said, if we hadn't had the subsequent discussion, I would
> probably have found a way to add a cert to "the store" without
> understanding the implications. While it's not Python's job to educate
> users, it would be a shame if its default behaviour led people to make
> ill-informed decisions.

Right, this is why I came to the conclusion we need to follow the browser
vendors' lead here and support a per-user Python specific supplementary
certificate cache before we can start validating certs by default at the
*Python* level. There are still too many failure modes for cert management
on private networks for us to safely ignore the use case of needing to
force connections to services with invalid certs.

We don't need to *solve* that problem here today - we can push it back to
Alex (and anyone else interested) as a building block to investigate
providing as part of cryptography.io or certi.fi, with a view to making a
standard library version of that (along with any SSL module updates) part
of PEP 476.

In the meantime, we can update the security considerations for the ssl
module to make it clearer that the defaults are set up for trusted networks
and that using it safely on the public internet may mean you're better off
with a third party library like requests or Twisted. (I'll start another
thread shortly that is highly relevant to that topic)

Regards,
Nick.

>
> Maybe an SSL HOWTO would be a useful addition to the docs, if anyone
> feels motivated to write one.
>
> Regardless, thanks for the education!
>
> Paul


[Python-Dev] PEP 477: selected ensurepip backports for Python 2.7

2014-08-31 Thread Nick Coghlan
Earlier versions of PEP 453 proposed bootstrapping pip into a Python 2.7
maintenance release in addition to including it with Python 3.4.

That part of the proposal proved to be controversial, so we dropped it from
the original PEP in order to focus on meeting the Python 3.4 specific
release deadlines. This also had the benefit of working out the kinks in
the bootstrapping process as part of the Python 3.4 release cycle.

However, we still think we should start providing pip by default to Python
2.7 users as well, at least as part of the Windows and Mac OS X installers.

One notable difference from PEP 453 is that because there is no venv module
in 2.7, and hence no integration between venv and ensurepip, we can give
redistributors the option of just disabling ensurepip entirely and
redirecting users to platform specific installation tools.
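
Since a redistributor may remove ensurepip entirely, code that wants to
use it needs to cope with the module being absent. A minimal sketch (the
helper name is made up; ensurepip.version() is the existing Python 3.4
API, which the backport is assumed to mirror):

```python
def bundled_pip_version():
    """Return the pip version bundled with ensurepip, or None.

    None means this Python was built or packaged without ensurepip,
    in which case pip should come from the platform's own tools.
    """
    try:
        import ensurepip
    except ImportError:
        return None
    return ensurepip.version()


version = bundled_pip_version()
if version is None:
    print("ensurepip is not available; use the platform package manager")
else:
    print("ensurepip can bootstrap pip", version)
```

On installations that do ship the module, the bootstrap itself is just
"python -m ensurepip" from the command line.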

Regards,
Nick.

==

PEP: 477
Title: Backport ensurepip (PEP 453) to Python 2.7
Version: $Revision$
Last-Modified: $Date$
Author: Donald Stufft,
        Nick Coghlan
Status: Active
Type: Process
Content-Type: text/x-rst
Created: 26-Aug-2014
Post-History: 1-Sep-2014

Abstract
========

This PEP proposes that the ``ensurepip`` module, added to Python 3.4 by PEP
453, be backported to Python 2.7. It also proposes that automatic invocation
of ``ensurepip`` be added to the Python 2.7 Windows and OSX installers.
However, it does **not** propose that automatic invocation be added to the
``Makefile``.

It also proposes that the package distribution and installation guides be
updated to match those in 3.4, which reference using the ``ensurepip``
module to bootstrap the installer.

Rationale
=========

Python 2.7 is effectively an LTS release of Python which represents the end
of the 2.x series and there is still a very large contingent of users who
are still using Python 2.7 as their primary version. These users, in order
to participate in the wider Python ecosystem, must manually attempt to go
out and find the correct way to bootstrap the packaging tools.

It is the opinion of this PEP that making it as easy as possible for end
users to participate in the wider Python ecosystem is important for 4
primary reasons:

1. The Python 2.x to 3.x migration has a number of painpoints that are eased
   by a number of third party modules such as six [#six]_, modernize
   [#modernize]_, or future [#future]_. However relying on these tools
   requires that everyone who uses the project have a tool to install these
   packages.
2. In addition to tooling to aid in migration from Python 2.x to 3.x, there
   are also a number of modules that are *new* in Python 3 for which there
   are backports available on PyPI. This can also aid in the ability for
   people to write 2.x and 3.x compatible software as well as enable them to
   use some of the newer features of Python 3 on Python 2.
3. Users also will need a number of tools in order to create python packages
   that conform to the newer standards that are being proposed. Things like
   setuptools [#setuptools]_, Wheel [#wheel]_, and twine [#twine]_ are
   enabling a safer, faster, and more reliable packaging tool chain. These
   tools can be difficult for people to use if first they must be told how to
   go out and install the package manager.
4. One of Python's biggest strengths is in the huge ecosystem of libraries
   and projects that have been built on top of it, most of which are
   distributed through PyPI. However, benefiting meaningfully from this wide
   ecosystem requires end users, some of whom are going to be new, to first
   decide which package manager they should get, work out how to get it, and
   then finally actually install it.

Furthermore, alternative implementations of Python are recognizing the
benefits of PEP 453 and both PyPy and Jython have plans to backport
ensurepip to their 2.7 runtimes.

Automatic Invocation
====================

PEP 453 has ``ensurepip`` automatically invoked by default in the
``Makefile`` and the Windows and OSX Installers. This allowed it to ensure
that, by default, all users would get Python with pip already installed.
This PEP however believes that while this is fine for the Python 2.7 Windows
and Mac OS X installers it is *not* ok for the Python 2.7 ``Makefile`` in
general.

The primary consumers of the ``Makefile`` are downstream package managers
which distribute Python themselves. These downstream distributors typically
do not want pip to be installed via ``ensurepip`` and would prefer that end
users install it with their own package manager. Not invoking ``ensurepip``
automatically from the ``Makefile`` would allow these distributors to simply
ignore the fact that ``ensurepip`` has been backported and still not end up
with pip installed via it.

The primary consumers of the OSX and Windows installers are end users who
are attempting to install Python on their own machine. There is not a
package manager available where these

Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-08-31 Thread Nick Coghlan
On 1 Sep 2014 07:43, "Christian Heimes"  wrote:
>
> On 31.08.2014 08:09, Nick Coghlan wrote:
> > As Antoine says here, I'm also opposed to adding more Python specific
> > configuration options. However, I think there may be something
> > worthwhile we can do that's closer to the way browsers work, and has
> > the significant benefit of being implementable as a PyPI module first
> > (more on that in a separate reply).
>
> I'm on your and Antoine's side and strictly against any additional
> environment variables or command line arguments. That would make the
> whole validation process even more complex and harder to understand.
>
> There might be a better option to give people and companies the option
> to tune the SSL module to their needs. Python already has a
> customization hook for the site module called sitecustomize. How about
> another module named sslcustomize? Such a module could be used to tune
> the ssl module to the needs of users, e.g. configure a different default
> context, add certificates to a default context etc.
>
> Companies could install them in a system global directory on their
> servers. Users could put them in their own user site directory and even
> each virtual env can have one sslcustomize of its own. It's fully
> backward compatible, doesn't add any flags and developers have the full
> power of Python for configuration and customization.

And means a user specific store (if one became available) could be
configured there.

Yes, I think this would address my concerns, especially if combined with a
clear recipe in the documentation on how to optionally disable cert
validation at the application layer.

Assuming sslcustomize was in site-packages rather than the standard library
directories, you would also be able to use virtual environments with an
appropriate sslcustomize module to disable cert checking even if the
application you were running didn't support direct configuration.

Cheers,
Nick.

>
> Christian
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 477: selected ensurepip backports for Python 2.7

2014-08-31 Thread Nick Coghlan
On 1 Sep 2014 09:23, "Benjamin Peterson"  wrote:
>
> On Sun, Aug 31, 2014, at 16:17, Antoine Pitrou wrote:
> > On Mon, 1 Sep 2014 08:00:14 +1000
> > Nick Coghlan  wrote:
> > >
> > > That part of the proposal proved to be controversial, so we dropped
> > > it from the original PEP in order to focus on meeting the Python 3.4
> > > specific release deadlines. This also had the benefit of working out
> > > the kinks in the bootstrapping process as part of the Python 3.4
> > > release cycle.
> > >
> > > However, we still think we should start providing pip by default to
> > > Python 2.7 users as well, at least as part of the Windows and Mac OS X
> > > installers.
> >
> > I don't agree with this. pip is simply not part of the 2.7 feature set.
> > If you add pip to a bugfix version, then you have bugfix versions which
> > are more featureful than others, which makes things more complicated to
> > explain.
>
> 2.7.x has been and will be alive for so long that we will already have to
> explain that sort of thing; i.e. PEP 466 and why different bugfix releases
> support different versions of dependency libraries.

Exactly. LTS is genuinely different from stopping maintenance after the
next feature release - it requires considering the "stability risk" and
"user experience improvement" questions separately.

In this case, the problem is that the Python 2 platform *is* still
evolving, but the centre of that evolution has moved to PyPI. For "standard
library only" users, Python 2 stopped evolving back in 2010. For PyPI
users, by contrast, it's still evolving at a rapid pace.

For our Python 3 transition story to be coherent, we need to ensure tools
like six, modernize and future are readily available, while still remaining
free to evolve independently of the standard library. That means providing
a package management utility as easily and as painlessly as possible.

Embracing pip upstream for Python 2 as well as Python 3 also sends a
powerful signal to redistributors that says "your users are going to need
this" and makes them do additional work to *avoid* providing it. Some of
them *will* choose that path, but that becomes a matter for discussion
between them and their user base, rather than a problem we need to worry
about upstream.

Cheers,
Nick.




Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-08-31 Thread Nick Coghlan
On 1 Sep 2014 08:15, "Donald Stufft"  wrote:
>
>
>> On Aug 31, 2014, at 5:43 PM, Christian Heimes wrote:
>>
>> Companies could install them in a system global directory on their
>> servers. Users could put them in their own user site directory and even
>> each virtual env can have one sslcustomize of its own. It's fully
>> backward compatible, doesn't add any flags and developers have the full
>> power of Python for configuration and customization.
>
> This may be a dumb question, but why can’t sitecustomize do this already?

It can. The advantage of a separate file is that it won't conflict with
existing sitecustomize modules, so (for example) redistributors can add a
default sslcustomize, and you can add one to your virtual environments that
are integrated with the system Python environment without needing to worry
about whether or not there's a global sitecustomize (you'd only have
trouble if there was a global sslcustomize).

Cheers,
Nick.

>
> ---
> Donald Stufft
> PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>


Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-08-31 Thread Nick Coghlan
On 1 September 2014 11:10, R. David Murray  wrote:
>
> It sounds like this would address my concerns as well (I don't really
> care *how* it is implemented as long as I don't have to touch the
> code of a third party application when I upgrade my python version to
> 3.5...remember, the context here is backward compatibility concerns).
>
> Does it address the issue of accepting an invalid cert, though?

That's actually an interesting question, as the PEP doesn't currently
propose adding any new global configuration knobs to the ssl or
httplib modules - it just proposes switching httplib from the legacy
(but fully backwards compatible) ssl._create_stdlib_context() API to
the newer (but potentially backwards incompatible in some
environments) ssl.create_default_context() API.

Having the ssl module import an sslcustomize module at the end
wouldn't be enough unless the appropriate APIs were put in place to
allow things to be configured at a process global level.

One possible way to do that would be to provide a central context
factory mapping that provides a module specific SSL context creator.
We'd seed it appropriately for the stdlib modules where we wanted to
use the legacy context definition, but it would default to using
ssl.create_default_context.

Under that kind of model, the first change we would actually make is
to make ssl._create_stdlib_context() public under a suitable name,
let's say ssl.create_legacy_context()

Independent of any other changes, exposing
ssl.create_legacy_context() like that would also make it
straightforward for folks to opt in to the old behaviour as an interim
hack in a way that is easy to grep for and fix later (it's also
something a linter can easily disallow).

The second change would be to provide a mapping from arbitrary names
to context factories in the ssl module that defaults to
ssl.create_default_context:

    # defaultdict calls its factory with no arguments
    named_contexts = defaultdict(lambda: create_default_context)

(A more accurate name would be "named_context_factory", but I think
"named_contexts" reads better. Folks will learn quickly enough that it
actually stores context factories rather than prebuilt context
objects)

The third change would be to replace all calls to
"ssl._create_stdlib_context()" with calls to
"ssl.named_contexts[__name__]()" instead.

The final change would be to seed the context factory map
appropriately for the standard library modules where we wanted to keep
the *old* default:

    for modname in ("nntplib", "poplib", "imaplib", "ftplib",
                    "smtplib", "asyncio.selector_events", "urllib.request",
                    "http.client"):
        named_contexts[modname] = create_legacy_context

The list I have above is for *all* current uses of
"ssl._create_stdlib_context". The backwards incompatible part of PEP
476 would then just be about removing names from that list (currently
just "http.client", but I'd suggest "asyncio.selector_events" as
another candidate, taking advantage of asyncio's provisional API
status).
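
The four changes above can be tried outside the ssl module with a minimal sketch like the one below. It is only an approximation of the idea: ``create_legacy_context`` is a hypothetical stand-in written for this example (the proposed public name does not exist in the ssl module), and ``named_contexts`` is kept as a plain module-level variable rather than an ssl attribute.

```python
import ssl
from collections import defaultdict


def create_legacy_context():
    """Stand-in for the proposed (hypothetical) ssl.create_legacy_context().

    Builds a client context that skips certificate and hostname checks,
    which approximates the pre-PEP 476 default behaviour.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False      # must be cleared before CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    return context


# Unknown names fall back to the verifying factory.
named_contexts = defaultdict(lambda: ssl.create_default_context)

# Seed the modules that are meant to keep the old default for now.
for modname in ("nntplib", "poplib", "imaplib", "ftplib", "smtplib"):
    named_contexts[modname] = create_legacy_context

legacy = named_contexts["smtplib"]()
strict = named_contexts["some.third.party.module"]()
print(legacy.verify_mode, strict.verify_mode)
```

A module that is not in the seeded list gets a verifying context without any further configuration, which is the property the mapping is meant to provide.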

The "revert to 3.4 behaviour" content for sslcustomize.py would then just be:

    import ssl
    ssl.named_contexts["http.client"] = ssl.create_legacy_context

However, someone that wanted to also enforce SSL properly for other
standard library modules could go the other way:

    import ssl
    for modname in ("nntplib", "poplib", "imaplib", "ftplib",
                    "smtplib", "urllib.request"):
        ssl.named_contexts[modname] = ssl.create_default_context

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-08-31 Thread Nick Coghlan
On 1 September 2014 16:07, Paul Moore  wrote:
> On 31 August 2014 23:10, Nick Coghlan  wrote:
>> Assuming sslcustomize was in site-packages rather than the standard library
>> directories, you would also be able to use virtual environments with an
>> appropriate sslcustomize module to disable cert checking even if the
>> application you were running didn't support direct configuration.
>
> Would this mean that a malicious package could install a custom
> sslcustomize.py and so add unwanted certs to the system? I guess we
> have to assume that installed packages are trusted, but I just wanted
> to be explicit.

Yes, it would have exactly the same security failure modes as
sitecustomize, except it would only fire if the application imported
the ssl module.

The "-S" and "-I" switches would need to disable the implied
"sslcustomize", just as they disable "import site".

Cheers,
Nick.



-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-09-01 Thread Nick Coghlan
On 1 September 2014 17:13, Christian Heimes  wrote:
> On 01.09.2014 08:44, Nick Coghlan wrote:
>> Yes, it would have exactly the same security failure modes as
>> sitecustomize, except it would only fire if the application
>> imported the ssl module.
>>
>> The "-S" and "-I" switches would need to disable the implied
>> "sslcustomize", just as they disable "import site".
>
> A malicious package can already play havoc with your installation with
> a custom ssl module. If somebody is able to sneak in a ssl.py then you
> are screwed anyway. sslcustomize is not going to make the situation worse.

That's not quite true - we're fairly careful about putting the
standard library before userspace directories, so aside from the
"current directory" problem, shadowing "ssl" itself can be tricky to
arrange. "sslcustomize" would be more like "sitecustomize" - since it
wouldn't normally be in the standard library, it can appear anywhere
on sys.path, rather than having to be injected ahead of the standard
library.
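
For anyone who wants to check the ordering on their own installation, here is a small sketch that reports where the standard library and site-packages directories sit on sys.path. The helper name ``path_position`` is made up for this example, and the result depends entirely on how the interpreter was built and started, so no particular output is implied.

```python
import os
import sys
import sysconfig


def _norm(path):
    # Normalise so symlinks and case differences do not hide a match.
    return os.path.normcase(os.path.realpath(path))


def path_position(directory):
    """Return the index of directory on sys.path, or None if absent."""
    target = _norm(directory)
    for index, entry in enumerate(sys.path):
        if _norm(entry) == target:
            return index
    return None


paths = sysconfig.get_paths()
positions = {key: path_position(paths[key]) for key in ("stdlib", "purelib")}
print(positions)
```

A lower index means the directory is searched earlier, so a module can only shadow "ssl" from a directory that is searched before the "stdlib" entry.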

I think that's OK though - compared to the security nightmare that is
downloading modules from PyPI and running "./setup.py install" (or,
even worse, "sudo ./setup.py install"), this would be a rather
esoteric attack vector, and the existing -S and -I mechanisms could be
used to defend against it :)
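
To make that concrete, a hook of this kind could look roughly like the sketch below. It follows the pattern the site module uses for sitecustomize (import it, and ignore only the "not installed" case); the function name and the decision to check ``sys.flags`` here are assumptions for illustration, not part of any agreed design.

```python
import importlib
import sys


def import_customize_hook(modname="sslcustomize"):
    """Import an optional customization module, mirroring sitecustomize.

    Returns the module, or None when the hook is absent or disabled.
    """
    # -S sets sys.flags.no_site and -I sets sys.flags.isolated.
    if sys.flags.no_site or sys.flags.isolated:
        return None
    try:
        return importlib.import_module(modname)
    except ImportError as exc:
        # Only swallow "hook not installed"; a broken hook should be loud.
        if exc.name != modname:
            raise
        return None


hook = import_customize_hook("sslcustomize_demo_that_is_not_installed")
print("hook loaded:", hook is not None)
```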

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 477: selected ensurepip backports for Python 2.7

2014-09-01 Thread Nick Coghlan
On 1 Sep 2014 17:31, "Donald Stufft"  wrote:
>
>
>> On Sep 1, 2014, at 2:22 AM, Ned Deily  wrote:
>>
>>
>> And that is a minor complication compared with the confusion and
>> difficulty of trying to explain to users (stuck with 2.7 for the time
>> being) of how to install third-party packages on each platform
>> (especially Windows) versus the simplicity of the 3.4.x story, thanks to
>> ensurepip.  Having pip available as a documented, batteries-included
>> tool for all current releases would be a huge usability improvement.
>
>
> Yes this is a major driver. I mean I think I probably have an above
> average knowledge of how to bootstrap pip, and if you dump me on a
> Windows box I struggle to actually do it the first time around without
> stumbling around and doing things in the wrong order and the like.
> (Getting a compiler toolchain is worse, but yay for Wheels).

Yeah. I've mentioned it before, but I think it bears repeating that trying
to install pip on Windows with both Python 2 & 3 installed was one of the
key things that convinced me to write PEP 453 in the first place. The
default settings in both Internet and Windows explorer make it tricky
regardless, but parallel installs make it even worse.

>> FTR, I'm willing to backport the pieces I did for 3.4 and I could do the
>> ensurepip backport, as well.  I'll leave the Windows installer changes
>> for someone else, though.
>
>
> Awesome, I’m of course willing to backport ensurepip itself as well.
> Truthfully it shouldn’t be a very difficult backport. It’s only ~200 SLOC
> or so and the only real things would be removing a Python3ism here or
> there.

Backporting meaningful tests will actually be the annoying part: the
current unit tests use unittest.mock, while the current functional tests
use pyvenv :)

Both of those can be dealt with, but the tests will be a bit of an ugly
hack by comparison with their Py3 counterparts :)

Cheers,
Nick.


Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-09-01 Thread Nick Coghlan
On 2 Sep 2014 00:08, "Antoine Pitrou"  wrote:
>
> On Mon, 1 Sep 2014 23:42:10 +1000
> Chris Angelico  wrote:
> > >>
> > >> That has to be done inside the same process. But imagine this
> > >> scenario: You have a program that gets invoked as root (or some other
> > >> user than yourself), and you're trying to fiddle with what it sees.
> > >> You don't have root access, but you can manipulate the file system,
> > >> to the extent that your userid has access. What can you do to affect
> > >> this other program?
> > >
> > > If you're root you shouldn't run untrusted code. See
> > > https://docs.python.org/3/using/cmdline.html#cmdoption-I
> >
> > Right, which is why sslcustomize has to be controlled by that, but the
> > possibility of patching (or monkeypatching) ssl.py isn't as big a
> > deal.
>
> To be frank I don't understand what you're arguing about.

When I said "shadowing ssl can be tricky to arrange", Chris correctly
interpreted it as referring to the filesystem based privilege escalation
scenario that isolated mode handles, not to normal in-process
monkeypatching or module injection. I don't consider the latter cases to be
interesting attack scenarios, as they imply the attacker is *already*
running arbitrary Python code inside your CPython process, so you've
already lost.

Cheers,
Nick.

>
>


Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-09-01 Thread Nick Coghlan
On 2 Sep 2014 00:59, "Antoine Pitrou"  wrote:
>
> On Tue, 2 Sep 2014 00:53:11 +1000
> Nick Coghlan  wrote:
> > >
> > > To be frank I don't understand what you're arguing about.
> >
> > When I said "shadowing ssl can be tricky to arrange", Chris correctly
> > interpreted it as referring to the filesystem based privilege escalation
> > scenario that isolated mode handles, not to normal in-process
> > monkeypatching or module injection.
>
> There's no actual difference. You can have a sitecustomize.py that does
> the monkeypatching or the shadowing. There doesn't seem to be anything
> "tricky" about that.

Oh, now I get what you mean - yes, sitecustomize already poses the same
kind of problem as the proposed sslcustomize (hence the existence of the
related command line options).

I missed that you had switched to talking about using that attack vector,
rather than trying to shadow stdlib modules directly through the filesystem
(which is the only tricky thing I was referring to).

Cheers,
Nick.


Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-09-01 Thread Nick Coghlan
On 2 Sep 2014 03:08, "Donald Stufft"  wrote:
>
>
>> On Sep 1, 2014, at 1:01 PM, Christian Heimes wrote:
>>
>> On 01.09.2014 17:35, Nick Coghlan wrote:
>>>
>>> Oh, now I get what you mean - yes, sitecustomize already poses the same
>>> kind of problem as the proposed sslcustomize (hence the existence of the
>>> related command line options).
>>
>>
>> If an attacker is able to place a module like sitecustomize.py in an
>> import directory or any .pth file in a site-packages directory then this
>> Python installation is compromised. .pth files are insidious because
>> they are always loaded and their code is always executed. I don't see
>> how sslcustomize is going to make a difference here.
>>
>
> Right, this is the point I was trying to make. If you’ve installed a
> malicious package it’s game over. There’s nothing Python can do to help
> you.

Yes, that's what I said originally when pointing out that isolated mode and
the switch to disable site module processing would need to disable
sslcustomize processing as well.

Antoine was replying to a side comment about it being tricky to shadow
stdlib modules. I left out the qualifier "directly" in my original comment,
and he left out "indirectly through sitecustomize" in his initial reply, so
we were talking past each other for a while.

Cheers,
Nick.

>
> ---
> Donald Stufft
> PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>


Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR

2014-09-02 Thread Nick Coghlan
On 2 September 2014 07:17, Matthew Woodcraft  wrote:
>
> (The program handles SIGTERM so that it can do a bit of cleanup before
> exiting, and it uses the signal-handler-sets-a-flag technique. The call
> that might be interrupted is sleep(), so the program doesn't strictly
> _rely_ on the existing behaviour; it would just become very slow to
> exit.)

Making an exception for sleep() (i.e. still letting it throw EINTR)
sounds like a reasonable idea to me.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

