Re: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?)

2016-04-09 Thread Nick Coghlan
On 9 April 2016 at 02:02, Koos Zevenhoven  wrote:
> I'm still thinking a little bit about 'pathname', which to me sounds
> more like a string than fspath does [1]. It would be nice to have the
> string/path distinction especially when pathlib adoption grows larger.
> But who knows, maybe somewhere in the far future, no-one will care
> much about fspath, fsencode, fsdecode or os.path.

Ah, I like it - adding the "name" suffix nicely distinguishes the
protocol from the rich path objects in pathlib.

I'll catch up on Ethan's dedicated naming thread before commenting
further, though :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?)

2016-04-09 Thread Victor Stinner
os.DirEntry doesn't support bytes: os.scandir() only accepts str. It's a
deliberate choice.

I strongly suggest supporting only Unicode for filenames in Python 3. So
__fspath__ must only return str, or a TypeError must be raised.

Victor


Re: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?)

2016-04-09 Thread Ethan Furman

On 04/09/2016 12:07 AM, Victor Stinner wrote:
> os.DirEntry doesn't support bytes: os.scandir() only accepts str. It's a
> deliberate choice.


3.5.0 scandir supports bytes:

--> huh = list(scandir(b'.'))
--> huh
[<DirEntry b'minicourse-ajax-project'>, <DirEntry ...>, <DirEntry b'__MACOSX'>, <DirEntry ...>, <DirEntry ...>, <DirEntry b'index.html'>]


--> huh[0].path
b'./minicourse-ajax-project'
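A minimal sketch of the behaviour shown above, assuming Python 3.6+ (for the `with` form of os.scandir()) and a POSIX system: scanning a bytes path yields os.DirEntry objects whose name and path are bytes too. The temporary directory and the two file names are made up for the example.

```python
import os
import tempfile


def list_entries_bytes(directory: bytes) -> list:
    """Return sorted (name, path) pairs from os.scandir() called with a bytes path."""
    with os.scandir(directory) as it:  # context-manager form needs Python 3.6+
        return sorted((entry.name, entry.path) for entry in it)


# Build a small throwaway directory to scan (file names are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("index.html", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    entries = list_entries_bytes(os.fsencode(tmp))
    for entry_name, entry_path in entries:
        print(type(entry_name).__name__, entry_name, entry_path)
```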

--
~Ethan~


Re: [Python-Dev] Question about the current implementation of str

2016-04-09 Thread Nick Coghlan
On 9 April 2016 at 10:56, Larry Hastings  wrote:
>
>
> I have a straightforward question about the str object, specifically the
> PyUnicodeObject.  I've tried reading the source to answer the question
> myself but it's nearly impenetrable.  So I was hoping someone here who
> understands the current implementation could answer it for me.
>
> Although the str object is immutable from Python's perspective, the C object
> itself is mutable.  For example, for dynamically-created strings the hash
> field may be lazy-computed and cached inside the object.  I was wondering if
> there were other fields like this.  For example, are there similar
> lazy-computed cached objects for the different encoded versions (utf8 utf16)
> of the str?  What would really help is an exhaustive list of the fields of a
> str object that may ever change after the object's initial creation.

https://www.python.org/dev/peps/pep-0393/#specification should have
most of the relevant details.

Aside from the hash and the interned-or-not flag in the state, most
things should be locked once the string is ready, except that
generating the utf-8 and wchar_t forms is deferred until they're
needed if they're not the same as the canonical form.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancments - method name only

2016-04-09 Thread Nick Coghlan
On 9 April 2016 at 04:25, Brett Cannon  wrote:
> On Fri, 8 Apr 2016 at 11:13 Ethan Furman  wrote:
>> On 04/08/2016 10:46 AM, Koos Zevenhoven wrote:
>>  > On Fri, Apr 8, 2016 at 7:42 PM, Chris Barker  wrote:
>>  >> On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote:
>>
>>  >>> I'm still thinking a little bit about 'pathname', which to me sounds
>>  >>> more like a string than fspath does.
>>  >>
>>  >>
>>  >> I like that a lot - or even "__pathstr__" or "__pathstring__"
>>  >> after all, we're making a big deal out of the fact that a path is
>>  >> *not a string*, but rather a string is a *representation* (or
>>  >> serialization) of a path.
>>
>> That's a decent point.
>>
>> So the plausible choices are, I think:
>>
>> - __fspath__  # File System Path -- possible confusion with Path
>
> +1

I like __fspath__, but I'm also sympathetic to Koos' point that we're
really dealing with path *names* being produced via this protocol,
rather than the paths themselves.

That would bring the completely explicit "__fspathname__" into the
mix, which would be comparable in length to "__getattribute__" as a
magic method name (both in terms of number of syllables and number of
characters).

Considering the helper function usage, here are some examples in
combination with os.fsencode and os.fsdecode:

  # Status quo for binary/text path conversions
  text_path = os.fsdecode(bytes_path)
  bytes_path = os.fsencode(text_path)

  # Getting a text path from an arbitrary object
  text_path = os.fspath(obj) # This doesn't scream "returns text!" to me
  text_path = os.fspathname(obj) # This does

  # Getting a binary path from an arbitrary object
  bytes_path = os.fsencode(os.fspath(obj))
  bytes_path = os.fsencode(os.fspathname(obj))

I'm starting to think the semantic nudge from the "name" suffix when
reading the code is worth the extra four characters when writing it
(keeping in mind that the whole point of this exercise is that most
folks *won't* be writing explicit conversions - the stdlib will handle
it on their behalf).

I also think the more explicit name helps answer some of the type
signature questions that have arisen:

1. Does os.fspathname return rich Path objects? No, it returns names
as str objects
2. Will file descriptors pass through os.fspathname? No, as they're
not names, they're numeric descriptors.
3. Will bytes-like objects pass through os.fspathname? No, as they're
not names, they're encodings of names
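To make the three answers concrete, here is a minimal sketch of a str-only helper. `fspathname` is the hypothetical name under discussion, not an existing os function, and the sketch assumes Python 3.6+ so that pathlib objects already define __fspath__.

```python
import pathlib


def fspathname(obj) -> str:
    """Hypothetical str-only helper from this thread (not an existing os function).

    * str is returned unchanged
    * objects defining __fspath__ must hand back a str path name
    * anything else (bytes, int file descriptors, ...) raises TypeError
    """
    if isinstance(obj, str):
        return obj
    method = getattr(type(obj), "__fspath__", None)
    if method is None:
        raise TypeError(f"expected str or path-like object, not {type(obj).__name__}")
    name = method(obj)
    if not isinstance(name, str):
        raise TypeError(f"__fspath__ returned {type(name).__name__}, not str")
    return name


# 1. A rich path object comes back as a plain str path name.
print(repr(fspathname(pathlib.PurePosixPath("/tmp/example"))))

# 2. and 3. File descriptors and encoded (bytes) names are rejected.
for not_a_name in (3, b"/tmp/example"):
    try:
        fspathname(not_a_name)
    except TypeError as exc:
        print("rejected:", exc)
```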

When the name is instead "os.fspath", the appropriate answers to those
three questions are far more debatable.

> I personally still like __ospath__ as well.

That one fails the "Is it ambiguous when spoken aloud?" test for me:
if someone mentions "oh-ess-path", are they talking about os.path or
__ospath__? With "eff-ess-path" or "eff-ess-path-name", that problem
doesn't arise.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Question about the current implementation of str

2016-04-09 Thread Victor Stinner
On 9 Apr 2016 03:04, "Larry Hastings" wrote:
> Although the str object is immutable from Python's perspective, the C
> object itself is mutable.  For example, for dynamically-created strings the
> hash field may be lazy-computed and cached inside the object.

Yes, the hash is computed once on demand. It doesn't matter how you build
the string.

> I was wondering if there were other fields like this.  For example, are
> there similar lazy-computed cached objects for the different encoded
> versions (utf8 utf16) of the str?

The UTF-8 form is only cached when you call the C functions that fill this
cache. The Python-level str.encode('utf8') doesn't fill the cache, but it
uses it.

On Windows, there is also a cache for the wchar_t* form, which is UTF-16.
This format is used by all C functions of the Windows API (Python should
only use the Unicode flavor of the Windows API).

I don't recall other caches.

> What would really help is an exhaustive list of the fields of a str object
> that may ever change after the object's initial creation.

I don't recall exactly what happens if a cache is created and then the
string is modified. If I recall correctly, the cache is invalidated.

But the hash is used as a heuristic to decide whether a string is
"immutable" or not; the refcount is also used by the heuristic. If the
string is immutable, an operation like resize must create a new string.

You can document the PEP 393 in Include/unicodeobject.h.

Victor


Re: [Python-Dev] Incomplete Internationalization in Argparse Module

2016-04-09 Thread Grady Martin

I agree.  However, an incorrect choice for an argument with a choices parameter 
results in this string.
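For reference, a small sketch that reproduces the case described, assuming a parser with a positional argument restricted by `choices` (the program name and choices are invented). argparse prints the usage line and an error message built from the 'argument %(argument_name)s: %(message)s' template, then exits, so the sketch captures stderr and swallows SystemExit. The exact wording after the prefix differs slightly between Python versions.

```python
import argparse
import contextlib
import io


def parse_error_message(argv):
    """Run a small parser on argv and return whatever argparse writes to stderr."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("mode", choices=["fast", "slow"])
    stderr = io.StringIO()
    with contextlib.redirect_stderr(stderr):
        try:
            parser.parse_args(argv)
        except SystemExit:  # parser.error() prints usage and the message, then exits
            pass
    return stderr.getvalue()


print(parse_error_message(["medium"]))
```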

On 2016-04-08 18:12, Guido van Rossum wrote:
> That string looks like it is aimed at the developer, not the user of
> the program, so it makes sense not to translate it.
>
> On Fri, Apr 8, 2016 at 2:07 PM, Brett Cannon wrote:
>> On Fri, 8 Apr 2016 at 14:05 Grady Martin wrote:
>>> Hello, all.  I was wondering if the following string was left untouched by
>>> gettext for a purpose (from line 720 of argparse.py, in class
>>> ArgumentError):
>>>
>>> 'argument %(argument_name)s: %(message)s'
>>>
>>> There may be other untranslatable strings in the argparse module, but I
>>> have yet to encounter them in the wild.
>>
>> Probably so that anyone introspecting on the error message can count on
>> somewhat of a consistent format (comes into play with doctest typically).
>
> --
> --Guido van Rossum (python.org/~guido)



Re: [Python-Dev] Question about the current implementation of str

2016-04-09 Thread Serhiy Storchaka

On 09.04.16 10:52, Victor Stinner wrote:
> On 9 Apr 2016 03:04, "Larry Hastings" wrote:
>> Although the str object is immutable from Python's perspective, the C
>> object itself is mutable.  For example, for dynamically-created strings
>> the hash field may be lazy-computed and cached inside the object.
>
> Yes, the hash is computed once on demand. It doesn't matter how you
> build the string.
>
>> I was wondering if there were other fields like this.  For example,
>> are there similar lazy-computed cached objects for the different encoded
>> versions (utf8 utf16) of the str?
>
> The UTF-8 form is only cached when you call the C functions that fill this
> cache. The Python-level str.encode('utf8') doesn't fill the cache, but it
> uses it.
>
> On Windows, there is also a cache for the wchar_t* form, which is UTF-16.
> This format is used by all C functions of the Windows API (Python should
> only use the Unicode flavor of the Windows API).
>
> I don't recall other caches.
>
>> What would really help is an exhaustive list of the fields of a str
>> object that may ever change after the object's initial creation.
>
> I don't recall exactly what happens if a cache is created and then the
> string is modified. If I recall correctly, the cache is invalidated.


You must remember: some bugs with desynchronized utf8 and wchar_t*
caches were fixed just a few months ago.



> But the hash is used as a heuristic to decide whether a string is
> "immutable" or not; the refcount is also used by the heuristic. If the
> string is immutable, an operation like resize must create a new string.
>
> You can document the PEP 393 in Include/unicodeobject.h.


In the normal case, a string object can be mutated only at creation time.
But CPython uses some tricks that modify already-created strings if they
have no external references and are not interned. For example, "a += b"
or "a = a + b" can resize the "a" string.
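The resize trick is invisible from Python code. What the refcount condition mentioned above protects is the guarantee shown here: any other reference keeps seeing the old value. A minimal sketch (the function and variable names are illustrative):

```python
def build_by_appending(parts):
    """Concatenate with +=.  While `result` is the only reference to its string,
    CPython may grow it instead of copying (an implementation detail)."""
    result = ""
    for part in parts:
        result += part
    return result


text = "abc" * 3
alias = text        # a second reference: the refcount is no longer 1
text += "!"         # so CPython must create a new string for `text`
print(alias)        # abcabcabc
print(text)         # abcabcabc!
print(build_by_appending(["a", "b", "c"]))  # abc
```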





Re: [Python-Dev] Question about the current implementation of str

2016-04-09 Thread Victor Stinner
2016-04-09 9:52 GMT+02:00 Victor Stinner :
> But the hash is used as a heuristic to decide whether a string is "immutable" or
> not; the refcount is also used by the heuristic. If the string is immutable,
> an operation like resize must create a new string.

I'm talking about this private function:

static int
unicode_modifiable(PyObject *unicode)
{
    assert(_PyUnicode_CHECK(unicode));
    if (Py_REFCNT(unicode) != 1)
        return 0;
    if (_PyUnicode_HASH(unicode) != -1)
        return 0;
    if (PyUnicode_CHECK_INTERNED(unicode))
        return 0;
    if (!PyUnicode_CheckExact(unicode))
        return 0;
#ifdef Py_DEBUG
    /* singleton refcount is greater than 1 */
    assert(!unicode_is_singleton(unicode));
#endif
    return 1;
}

Victor


Re: [Python-Dev] Pathlib enhancments - method name only

2016-04-09 Thread Giampaolo Rodola'
On Fri, Apr 8, 2016 at 9:09 PM, Chris Angelico  wrote:

> On Sat, Apr 9, 2016 at 5:03 AM, Chris Barker wrote:
> > On Fri, Apr 8, 2016 at 11:34 AM, Koos Zevenhoven wrote:
> >>
> >> >
> >> > __pathstr__ # pathstring
> >> >
> >>
> >> Or perhaps __pathstring__ in case it may be or return byte strings.
> >
> >
> > I'm fine with __pathstring__, but I thought it was already decided that it
> > would NOT return a bytestring!
>
> I sincerely hope that's been settled on. There's no reason to have
> this ever return anything other than a str. (Famous last words, I
> know.)
>
> ChrisA

I'm kind of scared about this: scared to state, and be 100% sure, that bytes
will *never ever* be returned.
As such I would call this __fspath__ or something, but I would definitely
avoid using "str".

-- 
Giampaolo - http://grodola.blogspot.com


Re: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?)

2016-04-09 Thread Koos Zevenhoven
On Sat, Apr 9, 2016 at 10:16 AM, Ethan Furman  wrote:
> On 04/09/2016 12:07 AM, Victor Stinner wrote:
>>
>> os.DirEntry doesn't support bytes: os.scandir() only accepts str. It's a
>> deliberate choice.
>
>
> 3.5.0 scandir supports bytes:
>
> --> huh = list(scandir(b'.'))
> --> huh
> [<DirEntry b'minicourse-ajax-project'>, <DirEntry ...>, <DirEntry b'__MACOSX'>, <DirEntry ...>, <DirEntry ...>, <DirEntry b'index.html'>]
>
> --> huh[0].path
> b'./minicourse-ajax-project'
>
>

Maybe it's the bytes support in scandir that should be deprecated?
(And not bytes support in general, which cannot be done on posix, as I
hear Stephen T. will tell me).

-Koos


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-09 Thread Victor Stinner
Please don't lose time trying yet another sandbox inside CPython. It's
just a waste of time. It's broken by design.

Please read my email about my attempt (pysandbox):
https://lwn.net/Articles/574323/

And the LWN article:
https://lwn.net/Articles/574215/

There are a lot of safe ways to run CPython inside a sandbox (and not the
opposite).

I started like you, adding more and more things to a blacklist, but it
doesn't work.

See the pysandbox test suite for a lot of ways to escape a sandbox. CPython
has a list of known code that crashes CPython (I don't recall the directory
in the sources), even with the latest version of CPython.

Victor


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-09 Thread Maciej Fijalkowski
I'm with Victor here. In fact I tried (and failed) to convince Victor
that the approach is entirely unworkable when he was starting, don't
be the next one :-)

On Sat, Apr 9, 2016 at 3:43 PM, Victor Stinner  wrote:
> Please don't lose time trying yet another sandbox inside CPython. It's just
> a waste of time. It's broken by design.


Re: [Python-Dev] Pathlib enhancments - method name only

2016-04-09 Thread R. David Murray
On Sat, 09 Apr 2016 17:48:38 +1000, Nick Coghlan  wrote:
> On 9 April 2016 at 04:25, Brett Cannon  wrote:
> > On Fri, 8 Apr 2016 at 11:13 Ethan Furman  wrote:
> >> On 04/08/2016 10:46 AM, Koos Zevenhoven wrote:
> >>  > On Fri, Apr 8, 2016 at 7:42 PM, Chris Barker  wrote:
> >>  >> On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote:
> >>
> >>  >>> I'm still thinking a little bit about 'pathname', which to me sounds
> >>  >>> more like a string than fspath does.
> >>  >>
> >>  >>
> >>  >> I like that a lot - or even "__pathstr__" or "__pathstring__"
> >>  >> after all, we're making a big deal out of the fact that a path is
> >>  >> *not a string*, but rather a string is a *representation* (or
> >>  >> serialization) of a path.
> >>
> >> That's a decent point.
> >>
> >> So the plausible choices are, I think:
> >>
> >> - __fspath__  # File System Path -- possible confusion with Path
> >
> > +1
> 
> I like __fspath__, but I'm also sympathetic to Koos' point that we're
> really dealing with path *names* being produced via this protocol,
> rather than the paths themselves.
> 
> That would bring the completely explicit "__fspathname__" into the
> mix, which would be comparable in length to "__getattribute__" as a
> magic method name (both in terms of number of syllables and number of
> characters).

I'm not going to vote -1, but for the record I have no real intuition
as to what a "path name" would be.  An arbitrary identifier that we're
using to refer to an os path?

That is, a 'filename' is the identifier we've assigned to this thing
pointed to by an inode in Linux, but an os path is a text representation
of the path from the root filename to a specified filename.  That is,
the path *is* the name, so to say "path name" sounds redundant and
confusing to me.

--David


Re: [Python-Dev] Defining a path protocol

2016-04-09 Thread Nikolaus Rath
On Apr 07 2016, Donald Stufft  wrote:
>> On Apr 7, 2016, at 6:48 AM, Nikolaus Rath  wrote:
>> 
>> Does anyone anticipate any classes other than those from pathlib to come
>> with such a method?
>
>
> It seems like it would be reasonable for pathlib.Path to call fspath on the
> path passed to pathlib.Path.__init__, which would mean that if other libraries
> implemented __fspath__ then you could pass their path objects to pathlib and
> it would just work (and similarly, if they also called fspath it would enable
> interoperation between all of the various path libraries).

Indeed, but my question is: is this actually going to happen? Are there
going to be other libraries that will implement __fspath__, and will
there be demand for pathlib to support them?
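Donald's interoperability idea is roughly what later landed in Python 3.6 via PEP 519: pathlib and os.fspath() accept any object that defines __fspath__. A minimal sketch assuming Python 3.6+; `ThirdPartyPath` is a hypothetical stand-in for another library's path class.

```python
import os
import pathlib


class ThirdPartyPath:
    """Hypothetical path class from some other library; it only knows __fspath__."""

    def __init__(self, *parts: str) -> None:
        self._parts = parts

    def __fspath__(self) -> str:
        # Serialise the rich object to a str path name.
        return "/".join(self._parts)


legacy = ThirdPartyPath("data", "raw", "input.csv")
as_pathlib = pathlib.PurePosixPath(legacy)  # pathlib calls __fspath__ for us
print(as_pathlib.name, as_pathlib.suffix)   # input.csv .csv
print(os.fspath(legacy))                    # data/raw/input.csv
```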


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«




Re: [Python-Dev] Incomplete Internationalization in Argparse Module

2016-04-09 Thread Guido van Rossum
OK, so this should be taken to the bug tracker.

On Saturday, April 9, 2016, Grady Martin  wrote:

> I agree.  However, an incorrect choice for an argument with a choices
> parameter results in this string.

-- 
--Guido (mobile)


Re: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?)

2016-04-09 Thread Ethan Furman

On 04/09/2016 03:51 AM, Koos Zevenhoven wrote:
> On Sat, Apr 9, 2016 at 10:16 AM, Ethan Furman wrote:
>> 3.5.0 scandir supports bytes:
>
> Maybe it's the bytes support in scandir that should be deprecated?
> (And not bytes support in general, which cannot be done on posix, as I
> hear Stephen T. will tell me).


No, scandir is a low-level function -- it needs to support bytes.
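One reason low-level functions keep bytes support: on POSIX a file name is an arbitrary byte string and need not decode cleanly. The str APIs represent such names with the surrogateescape error handler, as this sketch shows. It assumes a POSIX system with a UTF-8 (or ASCII) file system encoding; os.fsdecode() handles errors differently on Windows, and the file name is made up.

```python
import os

# On POSIX a file name is just bytes; this one is not valid UTF-8 (made-up name).
raw_name = b"report-\xff.txt"

text_name = os.fsdecode(raw_name)    # with UTF-8: the bad byte becomes a lone surrogate
round_trip = os.fsencode(text_name)  # surrogateescape turns it back into the byte

print(repr(text_name))
print(round_trip == raw_name)        # True
```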

--
~Ethan~



Re: [Python-Dev] Defining a path protocol

2016-04-09 Thread Ethan Furman

On 04/09/2016 07:32 AM, Nikolaus Rath wrote:
> On Apr 07 2016, Donald Stufft wrote:
>>> On Apr 7, 2016, at 6:48 AM, Nikolaus Rath wrote:
>>>
>>> Does anyone anticipate any classes other than those from pathlib to come
>>> with such a method?
>>
>> It seems like it would be reasonable for pathlib.Path to call fspath on the
>> path passed to pathlib.Path.__init__, which would mean that if other libraries
>> implemented __fspath__ then you could pass their path objects to pathlib and
>> it would just work (and similarly, if they also called fspath it would enable
>> interoperation between all of the various path libraries).
>
> Indeed, but my question is: is this actually going to happen? Are there
> going to be other libraries that will implement __fspath__, and will
> there be demand for pathlib to support them?


There will be at least one.  :)

--
~Ethan~


[Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-09 Thread Ethan Furman

On 04/09/2016 12:48 AM, Nick Coghlan wrote:

> Considering the helper function usage, here are some examples in
> combination with os.fsencode and os.fsdecode:
>
>   # Status quo for binary/text path conversions
>   text_path = os.fsdecode(bytes_path)
>   bytes_path = os.fsencode(text_path)
>
>   # Getting a text path from an arbitrary object
>   text_path = os.fspath(obj) # This doesn't scream "returns text!"
>   text_path = os.fspathname(obj) # This does
>
>   # Getting a binary path from an arbitrary object
>   bytes_path = os.fsencode(os.fspath(obj))
>   bytes_path = os.fsencode(os.fspathname(obj))
>
> I'm starting to think the semantic nudge from the "name" suffix when
> reading the code is worth the extra four characters when writing it
> (keeping in mind that the whole point of this exercise is that most
> folks *won't* be writing explicit conversions - the stdlib will handle
> it on their behalf).
>
> I also think the more explicit name helps answer some of the type
> signature questions that have arisen:
>
> 1. Does os.fspathname return rich Path objects? No, it returns names
> as str objects
> 2. Will file descriptors pass through os.fspathname? No, as they're
> not names, they're numeric descriptors.
> 3. Will bytes-like objects pass through os.fspathname? No, as they're
> not names, they're encodings of names

This worries me.

I know the primary purpose of this change is to enable pathlib and os 
and the rest of the stdlib to work together, but consider . . .


If adding a new attribute/method was as far as we went, new code (stdlib 
or otherwise) would look like:


  if isinstance(a_path_thingy, bytes):
      # because os can accept bytes
      pass
  elif isinstance(a_path_thingy, str):
      # but it's usually text
      pass
  elif hasattr(a_path_thingy, '__fspath__'):
      a_path_thingy = a_path_thingy.__fspath__()
  else:
      raise TypeError('not a valid path')
  # do something with the path

If we add os.fspath(), but don't allow bytes to be returned from it, our 
above example looks more like:


  if isinstance(a_path_thingy, bytes):
      # because os can accept bytes
      pass
  else:
      a_path_thingy = os.fspath(a_path_thingy)
  # do something with the path

Yes, it's better -- but it still requires a pre-check before calling 
os.fspath().


It is my contention that this is better:

  a_path_thingy = os.fspath(a_path_thingy)

This raises two issues:

1) Part of the stdlib is the new os.scandir() function, which can work
   with, and return, both bytes and text -- if __fspath__ can only
   hold text, DirEntry will not get the __fspath__ method added,
   and the pre-check, boiler-plate code will flourish;

2) pathlib.Path accepts bytes -- so what happens when a byte-derived
   Path is passed to os.fspath()?  Is a TypeError raised?  Do we
   guess and auto-convert with fsdecode()?

I think the best answer is to

- let __fspath__ hold bytes as well as text
- let fspath() return bytes as well as text
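A minimal pure-Python sketch of those two bullet points, using the hypothetical name `fspath_sketch` so it isn't mistaken for a stdlib function: str and bytes pass through, and __fspath__ may return either. For comparison, this is essentially how os.fspath() behaves as eventually added in Python 3.6, which the sketch assumes for the pathlib object and the final check.

```python
import os
import pathlib


def fspath_sketch(path):
    """Hypothetical helper following the proposal: str and bytes pass
    through unchanged, and __fspath__ may return either str or bytes."""
    if isinstance(path, (str, bytes)):
        return path
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        raise TypeError(
            f"expected str, bytes or path-like object, not {type(path).__name__}"
        )
    result = method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError(
            f"{type(path).__name__}.__fspath__() returned {type(result).__name__}"
        )
    return result


# Callers need no pre-check: one call covers text, bytes and rich path objects.
for thing in ("notes.txt", b"notes.txt", pathlib.PurePosixPath("notes.txt")):
    converted = fspath_sketch(thing)
    # Compare with the stdlib function added later (Python 3.6+).
    print(repr(converted), converted == os.fspath(thing))
```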

--
~Ethan~


Re: [Python-Dev] Incomplete Internationalization in Argparse Module

2016-04-09 Thread Grady Martin

Excellent.  Issue/patch here:

http://bugs.python.org/issue26726

On 2016-04-09 08:16, Guido van Rossum wrote:

> OK, so this should be taken to the bug tracker.




Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-09 Thread Nick Coghlan
On 9 April 2016 at 22:43, Victor Stinner  wrote:
> Please don't lose time trying yet another sandbox inside CPython. It's just
> a waste of time. It's broken by design.
>
> Please read my email about my attempt (pysandbox):
> https://lwn.net/Articles/574323/
>
> And the LWN article:
> https://lwn.net/Articles/574215/
>
> There are a lot of safe ways to run CPython inside a sandbox (and not the
> opposite).
>
> I started like you, adding more and more things to a blacklist, but it
> doesn't work.
>
> See the pysandbox test suite for a lot of ways to escape a sandbox. CPython
> has a list of known code that crashes CPython (I don't recall the directory
> in the sources), even with the latest version of CPython.

They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers

There's also https://hg.python.org/cpython/file/tip/Lib/test/test_crashers.py
which was designed to run them regularly to catch when they were
resolved, but it was too fragile and tended to hang the buildbots.

Even without those considerations though, there are system level
denial of service attacks that untrusted code can perform without even
trying to break out of the sandbox - the most naive is "while 1:
pass", but there are more interesting ones like "from itertools import
count; sum(count())", or even "sum(iter(int, 1))" and "list(iter(int,
1))".

Operating system level security sandboxes still aren't particularly
easy to use correctly, but they're a lot more reliable than language
runtime level sandboxes, can be used to defend against many more
attack vectors, and even offer increased flexibility (e.g. "can write
to these directories, but no others", "can read these files, but no
others", "can contact these IP addresses, but no others").
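As a small taste of the operating-system-level approach, this sketch caps the CPU time of a child interpreter with resource.setrlimit(RLIMIT_CPU) and runs the `while 1: pass` example in it. It assumes a POSIX system, and a single resource limit is nowhere near a full sandbox (there are no filesystem or network restrictions here).

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(code: str, cpu_seconds: int = 1) -> int:
    """Run `code` in a child interpreter whose CPU time is capped by the kernel.

    POSIX only. This is one resource limit, not a complete sandbox.
    """
    def apply_limits() -> None:
        # Runs in the child just before exec.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # no core file on kill

    completed = subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=apply_limits,
        timeout=60,  # wall-clock safety net for the parent
    )
    return completed.returncode


# The kernel signals the child once its CPU budget is spent.
status = run_with_cpu_limit("while 1: pass")
print(status)  # negative: the child was terminated by a signal
```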

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancments - method name only

2016-04-09 Thread Nick Coghlan
On 9 April 2016 at 23:02, R. David Murray  wrote:
> That is, a 'filename' is the identifier we've assigned to this thing
> pointed to by an inode in linux, but an os path is a text representation
> of the path from the root filename to a specified filename.  That is,
> the path *is* the name, so to say "path name" sounds redundant and
> confusing to me.

"The path is the name" is a true statement in the context of:

1. The way *nix APIs work
2. Existing filesystem interfaces in the standard library
3. Path abstractions that inherit from str/unicode

It's no longer true in the context of pathlib - there, the path name
is a serialised representation of a rich path object.

It's also not really true in the context of Python 3 in general -
bytes-like objects are an encoding of the path name, rather than the
name itself.

This means that "path" has become ambiguous due to the changing
context - do we mean the path name representation, the binary encoding
of that name, or a higher level rich path object?

We're never going to be able to eliminate that ambiguity (Python's
*nix & C roots run too deep for that), but we *can* potentially
standardise the terms used when disambiguation is needed: path name
(str), encoded path name (bytes-like object), rich path object (object
implementing the new protocol).
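
As a rough illustration, assuming Python 3.6 or later (where os.fspath() exists; it did not yet when this was written), the three proposed terms line up with these objects. PurePosixPath is used so the output is the same on any host:

```python
import os
import pathlib

# Rich path object: a pathlib instance.
rich = pathlib.PurePosixPath("docs") / "index.txt"

# Path name (str): obtained from the rich object via the path protocol.
name = os.fspath(rich)

# Encoded path name (bytes): the path name in the filesystem encoding.
encoded = os.fsencode(name)

print(name)     # docs/index.txt
print(encoded)  # b'docs/index.txt'
```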

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 506 secrets module

2016-04-09 Thread Steven D'Aprano
I've just spotted this email from Guido; sorry about the delay in
responding.

Further comments below.


On Thu, Jan 14, 2016 at 10:47:09AM -0800, Guido van Rossum wrote:

> I think the discussion petered out and nobody asked me to approve it yet
> (or I lost track of it). I'm almost happy to approve it in the current
> state. My only quibble is with some naming -- I'm not sure that a
> super-generic name like 'equal' is better than the original
> ('compare_digest'), 

Changed.


> and I would have picked a different name for token_url
> -- probably token_urlsafe. But maybe Steven can convince me that the names
> currently in the PEP are better.

Changed.


> (I also don't like the wishy-washy
> position of the PEP on the actual specs of the proposed functions. But I'm
> fine with the actual implementation shown as the spec.)

I'm not really sure what you want me to do to improve that. Can you be 
more concrete about what you would like the PEP to say?


I haven't updated the PEP yet, but the newest version of the secrets 
module with the changes requested is here:

https://bitbucket.org/sdaprano/secrets
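
For reference, both names Guido preferred are the ones found in the secrets module as it shipped in Python 3.6. A quick sketch of their use (the 16-byte token size is an arbitrary choice for the example):

```python
import secrets

# 16 random bytes, Base64-encoded with the URL-safe alphabet and the
# trailing '=' padding stripped, which gives a 22-character string.
token = secrets.token_urlsafe(16)

# Constant-time comparison, e.g. when checking a token sent back by a client.
submitted = token
matches = secrets.compare_digest(token, submitted)

print(len(token))  # 22
print(matches)     # True
```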



-- 
Steve


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-09 Thread Nick Coghlan
On 10 April 2016 at 02:41, Ethan Furman  wrote:
> If we add os.fspath(), but don't allow bytes to be returned from it, our
> above example looks more like:
>
>   if isinstance(a_path_thingy, bytes):
>       # because os can accept bytes
>       pass
>   else:
>       a_path_thingy = os.fspath(a_path_thingy)
>   # do something with the path
>
> Yes, it's better -- but it still requires a pre-check before calling
> os.fspath().
>
> It is my contention that this is better:
>
>   a_path_thingy = os.fspath(a_path_thingy)
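
For comparison, Ethan's one-liner matches how os.fspath() eventually behaved when it landed in Python 3.6 (PEP 519): str and bytes pass through unchanged, objects with __fspath__ are converted, and anything else raises TypeError. A short check, assuming Python 3.6+:

```python
import os
import pathlib

# str and bytes are returned unchanged.
as_text = os.fspath("spam/eggs")
as_bytes = os.fspath(b"spam/eggs")

# Objects implementing __fspath__ (here a pathlib path) are converted.
from_path = os.fspath(pathlib.PurePosixPath("spam/eggs"))

# Anything else is rejected.
try:
    os.fspath(42)
except TypeError:
    rejected = True
else:
    rejected = False
```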

That approach often doesn't work, though - by design, there are
situations where you can't transparently handle bytes and str with the
same code path in Python 3 the way you could in Python 2.

When somebody hands you bytes rather than text you need to worry about
the encoding, and you need to worry about returning bytes rather than
text yourself. https://hg.python.org/cpython/rev/e44410e5928e#l4.1
provides an illustration of how fiddly that can get, and that's in the
URL context - cross-platform filesystem path handling is worse, since
you need to worry about the significant differences between the way
Windows and *nix handle binary paths, and you can't use os.sep
directly any more (since that's always text).
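
The os.sep point can be made concrete. In this sketch, join_name is a hypothetical helper (not a stdlib function) that has to pick a separator of the same type as its arguments, because concatenating str and bytes raises TypeError; os.path.join already does this kind of dispatch internally.

```python
import os

def join_name(directory, name):
    """Hypothetical helper: join two path components of the same type."""
    if isinstance(directory, bytes):
        # os.sep is always str, so it has to be encoded for bytes paths.
        sep = os.fsencode(os.sep)
    else:
        sep = os.sep
    return directory + sep + name

text_path = join_name("logs", "today.txt")
binary_path = join_name(b"logs", b"today.txt")

# Using os.sep directly with bytes fails.
try:
    b"logs" + os.sep + b"today.txt"
except TypeError:
    mixing_fails = True
else:
    mixing_fails = False
```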

> This raises two issues:
>
> 1) Part of the stdlib is the new scandir module, which can work
>    with, and return, both bytes and text -- if __fspath__ can only
>    hold text, DirEntry will not get the __fspath__ method added,
>    and the pre-check, boiler-plate code will flourish;

DirEntry can still get the method; it can just throw TypeError when it
represents a binary path (that's one of the advantages of using a
method-based protocol - exceptions on method calls are more acceptable
than exceptions on property access).
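
A minimal sketch of that idea, using a hypothetical stand-in class rather than the real os.DirEntry. Note that Python 3.6 ultimately allowed __fspath__ to return bytes (PEP 519), and the real DirEntry.__fspath__ returns its path unchanged, so this only models the str-only proposal discussed here. It assumes Python 3.6+ for os.fspath():

```python
import os

class StrOnlyEntry:
    """Hypothetical DirEntry-like object under a str-only __fspath__ rule."""

    def __init__(self, path):
        self.path = path  # may be str or bytes, like DirEntry.path

    def __fspath__(self):
        # The method is always present; it just refuses binary paths.
        if isinstance(self.path, bytes):
            raise TypeError("binary path %r has no str path name" % (self.path,))
        return self.path

print(os.fspath(StrOnlyEntry("./index.html")))  # ./index.html

try:
    os.fspath(StrOnlyEntry(b"./index.html"))
except TypeError as exc:
    print("TypeError:", exc)
```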

> 2) pathlib.Path accepts bytes -- so what happens when a byte-derived
>    Path is passed to os.fspath()?  Is a TypeError raised?  Do we
>    guess and auto-convert with fsdecode()?

pathlib is str-only (which makes sense, since it's a cross-platform
API and binary paths basically don't work on Windows):

>>> pathlib.Path(b".")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.4/pathlib.py", line 907, in __new__
    self = cls._from_parts(args, init=False)
  File "/usr/lib64/python3.4/pathlib.py", line 589, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/usr/lib64/python3.4/pathlib.py", line 581, in _parse_args
    % type(a))
TypeError: argument should be a path or str object, not <class 'bytes'>

The only specific mention of binary support in the pathlib docs is to
state that "bytes(p)" uses os.fsencode() to convert to the binary
representation.
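
A quick check of both points on a current Python (3.6+ assumed). PurePosixPath is used so the result does not depend on the host OS:

```python
import os
import pathlib

p = pathlib.PurePosixPath("data") / "report.csv"

# bytes(p) is the os.fsencode() encoding of the str path name.
encoded = bytes(p)

# Constructing a path from bytes is still rejected.
try:
    pathlib.PurePosixPath(b"data")
except TypeError:
    bytes_rejected = True
else:
    bytes_rejected = False

print(encoded)         # b'data/report.csv'
print(bytes_rejected)  # True
```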

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancments - method name only

2016-04-09 Thread Greg Ewing

Brett Cannon wrote:
> Depends if you use `/` or `\` as your path separator

Or whether your pathnames look entirely different, e.g. VMS:

  device:[topdir.subdir.subsubdir]filename.ext;version

Pathnames are very much OS-dependent in both syntax *and* semantics.

Even the main two in use today (unix and windows) can't be
mapped directly onto each other, because windows has drive
letters and unix doesn't.
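
pathlib's pure path classes show the drive-letter mismatch directly, since they parse either flavour on any host (the example paths are made up):

```python
from pathlib import PurePosixPath, PureWindowsPath

win = PureWindowsPath("C:/Users/user/notes.txt")
nix = PurePosixPath("/home/user/notes.txt")

# Windows paths carry a drive as part of their anchor...
print(repr(win.drive), repr(win.anchor))  # 'C:' 'C:\\'
# ...while Unix paths only have a root.
print(repr(nix.drive), repr(nix.anchor))  # '' '/'
```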

--
Greg


Re: [Python-Dev] pathlib (was: Defining a path protocol)

2016-04-09 Thread Greg Ewing

Nick Coghlan wrote:
> We want to be able to readily use the protocol helper in builtin
> modules like os and low level Python modules like os.path, which means
> we want it to be much lower down in the import hierarchy than pathlib.

Also, it's more general than that. It works on any
object that wants to behave as a path, not just
pathlib ones, so it should be in a neutral place.
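
To illustrate, here is a hypothetical class with no connection to pathlib that opts into the protocol just by defining __fspath__ (the names are made up; this assumes Python 3.6+, where os.path functions accept such objects):

```python
import os
import os.path

class ProjectFile:
    """Hypothetical non-pathlib object that behaves as a path."""

    def __init__(self, root, name):
        self.root = root
        self.name = name

    def __fspath__(self):
        return os.path.join(self.root, self.name)

f = ProjectFile("src", "main.py")

print(isinstance(f, os.PathLike))  # True: os.PathLike checks for __fspath__
print(os.path.basename(f))         # main.py
```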

--
Greg


Re: [Python-Dev] Pathlib enhancments - method name only

2016-04-09 Thread Greg Ewing

Eric Snow wrote:
> All this matters because it impacts the value returned from
> __ospath__().  Should it return the string representation of the path
> for the current OS or some standardized representation?

What standardized representation? I'm not aware of such
a thing.

> I'd expect
> the former.  However, if that is the expectation then something like
> pathlib.PureWindowsPath will give you the wrong thing if your current
> OS is linux.

No, you should get the representation corresponding to
the kind of path object you started with. If you're
working with Windows path objects on a Unix system,
they must be representing something on some Windows
system somewhere, not the one you're running the code
on. The only reason to ask for a string representation
of such a path is for use by that other system.
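
pathlib already behaves this way for str(): the rendering follows the path object's flavour, not the host OS. For example:

```python
from pathlib import PurePosixPath, PureWindowsPath

rel = "reports/2016/april.txt"

# Same input, rendered according to each object's own flavour.
win_text = str(PureWindowsPath(rel))  # 'reports\\2016\\april.txt' on any host
nix_text = str(PurePosixPath(rel))    # 'reports/2016/april.txt' on any host

# as_posix() is an explicit request for forward slashes.
win_as_posix = PureWindowsPath(rel).as_posix()  # 'reports/2016/april.txt'
```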

I don't think it even makes sense to ask for a Unix
representation of a Windows path or vice versa, because
the semantics are different. How do you translate a
Windows drive letter into Unix? What drive letter do
you use for an absolute Unix path?

--
Greg


Re: [Python-Dev] Pathlib enhancments - method name only

2016-04-09 Thread Nick Coghlan
On 10 April 2016 at 15:58, Greg Ewing  wrote:
> Brett Cannon wrote:
>
>> Depends if you use `/` or `\` as your path separator
>
>
> Or whether your pathnames look entirely different, e.g VMS:
>
>   device:[topdir.subdir.subsubdir]filename.ext;version
>
> Pathnames are very much OS-dependent in both syntax *and* semantics.
>
> Even the main two in use today (unix and windows) can't be
> mapped directly onto each other, because windows has drive
> letters and unix doesn't.

This does raise a concrete API design question: how should
PurePath.__fspath__ behave when called on a mismatched OS?

For PurePath vs Path, the latter raises NotImplementedError if you try
to create a concrete path that doesn't match the running system:

   >>> pathlib.PureWindowsPath(".")
   PureWindowsPath('.')
   >>> pathlib.WindowsPath(".")
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/usr/lib64/python3.4/pathlib.py", line 910, in __new__
       % (cls.__name__,))
   NotImplementedError: cannot instantiate 'WindowsPath' on your system

The question we need to address is what happens if you do:

   >>> os.fspath(pathlib.PureWindowsPath("."))

on a *nix system?

Similar to my proposal for dealing with DirEntry.path being a
bytes-like object, I'd like to suggest raising TypeError in __fspath__
if the request is nonsensical for the currently running system - *nix
systems can *manipulate* Windows paths (and vice-versa), but actually
trying to *use* them with the local filesystem isn't going to work
properly, since the syntax and semantics are different.
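
The proposed rule can be sketched as a standalone function. checked_fspath is hypothetical and only models the suggestion; as far as I know, the PurePath.__fspath__ that shipped in Python 3.6 simply returns str(self) whatever the host OS. type(pathlib.PurePath()) is used to find the host's native pure flavour:

```python
import os
import pathlib

def checked_fspath(path):
    """Hypothetical: render a pure path only if its flavour matches this OS."""
    # PurePath() instantiates PurePosixPath or PureWindowsPath for the host.
    native = type(pathlib.PurePath())
    if not isinstance(path, native):
        raise TypeError(
            "cannot render %r as filesystem path on %r system"
            % (type(path).__name__, os.name)
        )
    return str(path)

# A path of the "other" flavour for whichever OS this runs on.
foreign = (pathlib.PurePosixPath(".") if os.name == "nt"
           else pathlib.PureWindowsPath("."))

print(checked_fspath(pathlib.PurePath("a")))  # a
try:
    checked_fspath(foreign)
except TypeError as exc:
    print(exc)
```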

   >>> os.fspath(pathlib.PureWindowsPath("."))
   Traceback (most recent call last):
   ...
   TypeError: cannot render 'PureWindowsPath' as filesystem path on 'posix' system

(I'm also suggesting replacing "your" with the value of os.name)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Pathlib enhancments - method name only

2016-04-09 Thread Greg Ewing

On 9 April 2016 at 23:02, R. David Murray  wrote:
> That is, a 'filename' is the identifier we've assigned to this thing
> pointed to by an inode in linux, but an os path is a text representation
> of the path from the root filename to a specified filename.  That is,
> the path *is* the name, so to say "path name" sounds redundant and
> confusing to me.

The term "pathname" is what is conventionally used to refer
to a textual string passed to the OS to identify an object
in the file system.

It's often abbreviated to just "path", but that's ambiguous
for our purposes, because "path" can also refer to one of
our higher-level objects.

--
Greg