Re: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?)
On 9 April 2016 at 02:02, Koos Zevenhoven wrote: > I'm still thinking a little bit about 'pathname', which to me sounds > more like a string than fspath does [1]. It would be nice to have the > string/path distinction especially when pathlib adoption grows larger. > But who knows, maybe somewhere in the far future, no-one will care > much about fspath, fsencode, fsdecode or os.path. Ah, I like it - adding the "name" suffix nicely distinguishes the protocol from the rich path objects in pathlib. I'll catch up on Ethan's dedicated naming thread before commenting further, though :) Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?)
os.DirEntry doesn't support bytes: os.scandir() only accepts str. It's a deliberate choice. I strongly suggest supporting only Unicode for filenames in Python 3. So __fspath__ must only return str, or a TypeError must be raised. Victor
Re: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?)
On 04/09/2016 12:07 AM, Victor Stinner wrote:
> os.DirEntry doesn't support bytes: os.scandir() only accepts str. It's a deliberate choice.

3.5.0 scandir supports bytes:

    --> huh = list(scandir(b'.'))
    --> huh
    [<DirEntry ...>, <DirEntry ...>, <DirEntry b'__MACOSX'>, <DirEntry ...>, <DirEntry ...>, <DirEntry b'index.html'>]
    --> huh[0].path
    b'./minicourse-ajax-project'

-- ~Ethan~
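The behaviour Ethan reports can be checked with a small sketch. It assumes Python 3.6 or later (for the `with os.scandir(...)` form); `scan_names` is a helper name invented for this illustration, not something from the thread.

```python
import os
import tempfile


def scan_names(directory):
    """Return the sorted entry names that os.scandir() yields for directory."""
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries)


with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the directory has a known entry.
    open(os.path.join(tmp, "index.html"), "w").close()
    text_names = scan_names(tmp)                # str path in, str names out
    bytes_names = scan_names(os.fsencode(tmp))  # bytes path in, bytes names out
```

The type of the argument decides the type of the results, which is why a str-only protocol and a bytes-capable scandir pull in different directions.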
Re: [Python-Dev] Question about the current implementation of str
On 9 April 2016 at 10:56, Larry Hastings wrote: > > > I have a straightforward question about the str object, specifically the > PyUnicodeObject. I've tried reading the source to answer the question > myself but it's nearly impenetrable. So I was hoping someone here who > understands the current implementation could answer it for me. > > Although the str object is immutable from Python's perspective, the C object > itself is mutable. For example, for dynamically-created strings the hash > field may be lazy-computed and cached inside the object. I was wondering if > there were other fields like this. For example, are there similar > lazy-computed cached objects for the different encoded versions (utf8 utf16) > of the str? What would really help an exhaustive list of the fields of a > str object that may ever change after the object's initial creation. https://www.python.org/dev/peps/pep-0393/#specification should have most of the relevant details. Aside from the hash and the interned-or-not flag in the state, most things should be locked once the string is ready, except that generating the utf-8 and wchar_t forms is deferred until they're needed if they're not the same as the canonical form. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Pathlib enhancements - method name only
On 9 April 2016 at 04:25, Brett Cannon wrote:
> On Fri, 8 Apr 2016 at 11:13 Ethan Furman wrote:
>> On 04/08/2016 10:46 AM, Koos Zevenhoven wrote:
>> > On Fri, Apr 8, 2016 at 7:42 PM, Chris Barker wrote:
>> >> On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote:
>> >>> I'm still thinking a little bit about 'pathname', which to me sounds
>> >>> more like a string than fspath does.
>> >>
>> >> I like that a lot - or even "__pathstr__" or "__pathstring__"
>> >> after all, we're making a big deal out of the fact that a path is
>> >> *not a string*, but rather a string is a *representation* (or
>> >> serialization) of a path.
>>
>> That's a decent point.
>>
>> So the plausible choices are, I think:
>>
>> - __fspath__ # File System Path -- possible confusion with Path
>
> +1

I like __fspath__, but I'm also sympathetic to Koos' point that we're really dealing with path *names* being produced via this protocol, rather than the paths themselves.

That would bring the completely explicit "__fspathname__" into the mix, which would be comparable in length to "__getattribute__" as a magic method name (both in terms of number of syllables and number of characters).

Considering the helper function usage, here's some examples in combination with os.fsencode and os.fsdecode:

    # Status quo for binary/text path conversions
    text_path = os.fsdecode(bytes_path)
    bytes_path = os.fsencode(text_path)

    # Getting a text path from an arbitrary object
    text_path = os.fspath(obj) # This doesn't scream "returns text!" to me
    text_path = os.fspathname(obj) # This does

    # Getting a binary path from an arbitrary object
    bytes_path = os.fsencode(os.fspath(obj))
    bytes_path = os.fsencode(os.fspathname(obj))

I'm starting to think the semantic nudge from the "name" suffix when reading the code is worth the extra four characters when writing it (keeping in mind that the whole point of this exercise is that most folks *won't* be writing explicit conversions - the stdlib will handle it on their behalf).

I also think the more explicit name helps answer some of the type signature questions that have arisen:

1. Does os.fspathname return rich Path objects? No, it returns names as str objects
2. Will file descriptors pass through os.fspathname? No, as they're not names, they're numeric descriptors.
3. Will bytes-like objects pass through os.fspathname? No, as they're not names, they're encodings of names

When the name is instead "os.fspath", the appropriate answers to those three questions are far more debatable.

> I personally still like __ospath__ as well.

That one fails the "Is it ambiguous when spoken aloud?" test for me: if someone mentions "oh-ess-path", are they talking about os.path or __ospath__? With "eff-ess-path" or "eff-ess-path-name", that problem doesn't arise.

Cheers, Nick.

-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
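The conversions Nick lists can be tried with a minimal sketch. It uses `os.fspath`, the spelling that ended up in the standard library as of Python 3.6; `os.fspathname` was only a naming candidate in this thread. The `Report` class is hypothetical and exists only to supply a `__fspath__` method.

```python
import os


class Report:
    """Hypothetical rich path object; only __fspath__ matters here."""

    def __init__(self, name):
        self._name = name

    def __fspath__(self):
        # Hand back the path name as text.
        return self._name


obj = Report("reports/2016-04.txt")

text_path = os.fspath(obj)             # path name as str
bytes_path = os.fsencode(text_path)    # encoded path name as bytes
round_trip = os.fsdecode(bytes_path)   # back to str
```

Reading the three assignments side by side shows the point under discussion: nothing in the name `fspath` itself says whether the result is text or bytes.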
Re: [Python-Dev] Question about the current implementation of str
On 9 Apr 2016 03:04, "Larry Hastings" wrote:
> Although the str object is immutable from Python's perspective, the C object itself is mutable. For example, for dynamically-created strings the hash field may be lazy-computed and cached inside the object.

Yes, the hash is computed once on demand. It doesn't matter how you build the string.

> I was wondering if there were other fields like this. For example, are there similar lazy-computed cached objects for the different encoded versions (utf8 utf16) of the str?

The utf8 form is only cached when you call the C functions that fill this cache. The Python str.encode('utf8') doesn't fill the cache, but it uses it. On Windows, there is a cache for wchar_t*, which is utf16. This format is used by all C functions of the Windows API (Python should only use the Unicode flavor of the Windows API). I don't recall other caches.

> What would really help an exhaustive list of the fields of a str object that may ever change after the object's initial creation.

I don't recall exactly what happens if a cache is created and then the string is modified. If I recall correctly, the cache is invalidated. But the hash is used as a heuristic to decide whether a string is "immutable" or not; the refcount is also used by the heuristic. If the string is immutable, an operation like resize must create a new string. You can document the PEP 393 in Include/unicodeobject.h.

Victor
Re: [Python-Dev] Incomplete Internationalization in Argparse Module
I agree. However, an incorrect choice for an argument with a choices parameter results in this string.

On 2016-04-08 18:12, Guido van Rossum wrote:
> That string looks like it is aimed at the developer, not the user of the program, so it makes sense not to translate it.
>
> On Fri, Apr 8, 2016 at 2:07 PM, Brett Cannon wrote:
>> On Fri, 8 Apr 2016 at 14:05 Grady Martin wrote:
>>> Hello, all. I was wondering if the following string was left untouched by gettext for a purpose (from line 720 of argparse.py, in class ArgumentError):
>>>
>>>     'argument %(argument_name)s: %(message)s'
>>>
>>> There may be other untranslatable strings in the argparse module, but I have yet to encounter them in the wild.
>>
>> Probably so that anyone introspecting on the error message can count on somewhat of a consistent format (comes into play with doctest typically).
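A short sketch of how that string reaches the end user. `CapturingParser` is a name invented for this illustration; it overrides `ArgumentParser.error()` so the message can be inspected instead of being printed before exit. The exact wording after the "argument --color:" prefix varies between Python versions, so only the prefix is relied on here.

```python
import argparse


class CapturingParser(argparse.ArgumentParser):
    """Hypothetical subclass that raises instead of printing and exiting."""

    def error(self, message):
        raise ValueError(message)


parser = CapturingParser(prog="demo")
parser.add_argument("--color", choices=["red", "green"])

try:
    parser.parse_args(["--color", "blue"])
    captured = None
except ValueError as exc:
    # The text built from 'argument %(argument_name)s: %(message)s'
    captured = str(exc)
```

With the default parser this same text is written to stderr as part of the usage error, which is what a person running the program sees.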
Re: [Python-Dev] Question about the current implementation of str
On 09.04.16 10:52, Victor Stinner wrote:
> On 9 Apr 2016 03:04, "Larry Hastings" wrote:
>> Although the str object is immutable from Python's perspective, the C object itself is mutable. For example, for dynamically-created strings the hash field may be lazy-computed and cached inside the object.
>
> Yes, the hash is computed once on demand. It doesn't matter how you build the string.
>
>> I was wondering if there were other fields like this. For example, are there similar lazy-computed cached objects for the different encoded versions (utf8 utf16) of the str?
>
> Cached utf8 is only cached when you call the C functions filling this cache. The Python str.encode('utf8') doesn't fill the cache, but it uses it. On Windows, there is a cache for wchar_t* which is utf16. This format is used by all C functions of the Windows API (Python should only use the Unicode flavor of the Windows API). I don't recall other caches.
>
>> What would really help an exhaustive list of the fields of a str object that may ever change after the object's initial creation.
>
> I don't recall exactly what happens if a cache is created and then the string is modified. If I recall correctly, the cache is invalidated.

You must remember, some bugs with desynchronized utf8 and wchar_t* caches were fixed just a few months ago.

> But the hash is used as a heuristic to decide if a string is "immutable" or not, the refcount is also used by the heuristic. If the string is immutable, an operation like resize must create a new string. You can document the PEP 393 in Include/unicodeobject.h.

In the normal case the string object can be mutated only at creation time. But CPython uses some tricks that modify already created strings if they have no external references and are not interned. For example "a += b" or "a = a + b" can resize the "a" string.
Re: [Python-Dev] Question about the current implementation of str
2016-04-09 9:52 GMT+02:00 Victor Stinner:
> But the hash is used as an heuristic to decide if a string is "immutable" or
> not, the refcount is also used by the heuristic. If the string is immutable,
> an operation like resize must create a new string.

I'm talking about this private function:

    static int
    unicode_modifiable(PyObject *unicode)
    {
        assert(_PyUnicode_CHECK(unicode));
        if (Py_REFCNT(unicode) != 1)
            return 0;
        if (_PyUnicode_HASH(unicode) != -1)
            return 0;
        if (PyUnicode_CHECK_INTERNED(unicode))
            return 0;
        if (!PyUnicode_CheckExact(unicode))
            return 0;
    #ifdef Py_DEBUG
        /* singleton refcount is greater than 1 */
        assert(!unicode_is_singleton(unicode));
    #endif
        return 1;
    }

Victor
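For readers who do not follow C, the four checks in that function can be modelled in Python. This is only a toy model: `StrState` and `is_modifiable` are names invented here, and the fields stand in for the C-level refcount, cached hash, interned flag and exact-type check. It does not inspect real str objects.

```python
from dataclasses import dataclass


@dataclass
class StrState:
    """Hypothetical stand-in for the C-level state of one str object."""
    refcount: int
    cached_hash: int   # -1 means the hash has not been computed yet
    interned: bool
    exact_str: bool    # False for instances of str subclasses


def is_modifiable(state):
    """Mirror the order of checks in unicode_modifiable()."""
    if state.refcount != 1:
        return False   # someone else holds a reference
    if state.cached_hash != -1:
        return False   # hash already computed, so the value may be in use as a key
    if state.interned:
        return False   # interned strings are shared
    if not state.exact_str:
        return False   # subclasses are never resized in place
    return True


fresh = StrState(refcount=1, cached_hash=-1, interned=False, exact_str=True)
hashed = StrState(refcount=1, cached_hash=12345, interned=False, exact_str=True)
```

Under this model `fresh` may be resized in place, while `hashed` may not, which matches Victor's remark that the hash acts as part of the "immutable or not" heuristic.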
Re: [Python-Dev] Pathlib enhancements - method name only
On Fri, Apr 8, 2016 at 9:09 PM, Chris Angelico wrote:
> On Sat, Apr 9, 2016 at 5:03 AM, Chris Barker wrote:
> > On Fri, Apr 8, 2016 at 11:34 AM, Koos Zevenhoven wrote:
> >> > __pathstr__ # pathstring
> >>
> >> Or perhaps __pathstring__ in case it may be or return byte strings.
> >
> > I'm fine with __pathstring__ , but I thought it was already decided that it
> > would NOT return a bytestring!
>
> I sincerely hope that's been settled on. There's no reason to have
> this ever return anything other than a str. (Famous last words, I
> know.)
>
> ChrisA

I'm kind of scared about this: scared to state with 100% certainty that bytes will *never ever* be returned. As such I would call this __fspath__ or something, but I would definitely avoid using "str".

-- Giampaolo - http://grodola.blogspot.com
Re: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?)
On Sat, Apr 9, 2016 at 10:16 AM, Ethan Furman wrote: > On 04/09/2016 12:07 AM, Victor Stinner wrote: >> >> os.DirEntry doesn't support bytes: os.scandir() only accept str. It's a >> deliberate choice. > > > 3.5.0 scandir supports bytes: > > --> huh = list(scandir(b'.')) > --> huh > [, , b'__MACOSX'>, , , b'index.html'>] > > --> huh[0].path > b'./minicourse-ajax-project' > > Maybe it's the bytes support in scandir that should be deprecated? (And not bytes support in general, which cannot be done on posix, as I hear Stephen T. will tell me). -Koos ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)
Please don't lose time trying yet another sandbox inside CPython. It's just a waste of time. It's broken by design. Please read my email about my attempt (pysandbox): https://lwn.net/Articles/574323/ And the LWN article: https://lwn.net/Articles/574215/ There are a lot of safe ways to run CPython inside a sandbox (and not the opposite). I started as you did, adding more and more things to a blacklist, but it doesn't work. See the pysandbox test suite for a lot of ways to escape a sandbox. CPython has a list of known code that crashes CPython (I don't recall the directory in the sources), even with the latest version of CPython. Victor
Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)
I'm with Victor here. In fact I tried (and failed) to convince Victor that the approach was entirely unworkable when he was starting; don't be the next one :-)

On Sat, Apr 9, 2016 at 3:43 PM, Victor Stinner wrote:
> Please don't lose time trying yet another sandbox inside CPython. It's just
> a waste of time. It's broken by design.
>
> Please read my email about my attempt (pysandbox):
> https://lwn.net/Articles/574323/
>
> And the LWN article:
> https://lwn.net/Articles/574215/
>
> There are a lot of safe ways to run CPython inside a sandbox (and not the
> opposite).
>
> I started as you did, adding more and more things to a blacklist, but it
> doesn't work.
>
> See the pysandbox test suite for a lot of ways to escape a sandbox. CPython
> has a list of known code that crashes CPython (I don't recall the directory
> in the sources), even with the latest version of CPython.
>
> Victor
Re: [Python-Dev] Pathlib enhancements - method name only
On Sat, 09 Apr 2016 17:48:38 +1000, Nick Coghlan wrote: > On 9 April 2016 at 04:25, Brett Cannon wrote: > > On Fri, 8 Apr 2016 at 11:13 Ethan Furman wrote: > >> On 04/08/2016 10:46 AM, Koos Zevenhoven wrote: > >> > On Fri, Apr 8, 2016 at 7:42 PM, Chris Barker wrote: > >> >> On Fri, Apr 8, 2016 at 9:02 AM, Koos Zevenhoven wrote: > >> > >> >>> I'm still thinking a little bit about 'pathname', which to me sounds > >> >>> more like a string than fspath does. > >> >> > >> >> > >> >> I like that a lot - or even "__pathstr__" or "__pathstring__" > >> >> after all, we're making a big deal out of the fact that a path is > >> >> *not a string*, but rather a string is a *representation* (or > >> >> serialization) of a path. > >> > >> That's a decent point. > >> > >> So the plausible choices are, I think: > >> > >> - __fspath__ # File System Path -- possible confusion with Path > > > > +1 > > I like __fspath__, but I'm also sympathetic to Koos' point that we're > really dealing with path *names* being produced via this protocol, > rather than the paths themselves. > > That would bring the completely explicit "__fspathname__" into the > mix, which would be comparable in length to "__getattribute__" as a > magic method name (both in terms of number of syllable and number of > characters). I'm not going to vote -1, but for the record I have no real intuition as to what a "path name" would be. An arbitrary identifier that we're using to refer to an os path? That is, a 'filename' is the identifier we've assigned to this thing pointed to by an inode in linux, but an os path is a text representation of the path from the root filename to a specified filename. That is, the path *is* the name, so to say "path name" sounds redundant and confusing to me. --David ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Defining a path protocol
On Apr 07 2016, Donald Stufft wrote: >> On Apr 7, 2016, at 6:48 AM, Nikolaus Rath wrote: >> >> Does anyone anticipate any classes other than those from pathlib to come >> with such a method? > > > It seems like it would be reasonable for pathlib.Path to call fspath on the > path passed to pathlib.Path.__init__, which would mean that if other libraries > implemented __fspath__ then you could pass their path objects to pathlib and > it would just work (and similarly, if they also called fspath it would enable > interoperation between all of the various path libraries). Indeed, but my question is: is this actually going to happen? Are there going to be other libraries that will implement __fspath__, and will there be demand for pathlib to support them? Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.« signature.asc Description: PGP signature ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Incomplete Internationalization in Argparse Module
OK, so this should be taken to the bug tracker. On Saturday, April 9, 2016, Grady Martin wrote: > I agree. However, an incorrect choice for an argument with a choices > parameter results in this string. > > On 2016年04月08日 18時12分, Guido van Rossum wrote: > >> >> That string looks like it is aimed at the developer, not the user of >> the program, so it makes sense not to translate it. >> >> On Fri, Apr 8, 2016 at 2:07 PM, Brett Cannon wrote: >> >>> >>> >>> On Fri, 8 Apr 2016 at 14:05 Grady Martin >>> wrote: >>> Hello, all. I was wondering if the following string was left untouched by gettext for a purpose (from line 720 of argparse.py, in class ArgumentError): 'argument %(argument_name)s: %(message)s' There may be other untranslatable strings in the argparse module, but I have yet to encounter them in the wild. >>> >>> >>> Probably so that anyone introspecting on the error message can count on >>> somewhat of a consistent format (comes into play with doctest typically). >>> >>> ___ >>> Python-Dev mailing list >>> Python-Dev@python.org >>> https://mail.python.org/mailman/listinfo/python-dev >>> Unsubscribe: >>> https://mail.python.org/mailman/options/python-dev/guido%40python.org >>> >>> >> >> >> -- >> --Guido van Rossum (python.org/~guido) >> > -- --Guido (mobile) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Defining a path protocol (was: When should pathlib stop being provisional?)
On 04/09/2016 03:51 AM, Koos Zevenhoven wrote: On Sat, Apr 9, 2016 at 10:16 AM, Ethan Furman wrote: 3.5.0 scandir supports bytes: Maybe it's the bytes support in scandir that should be deprecated? (And not bytes support in general, which cannot be done on posix, as I hear Stephen T. will tell me). No, scandir is a low-level function -- it needs to support bytes. -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Defining a path protocol
On 04/09/2016 07:32 AM, Nikolaus Rath wrote: On Apr 07 2016, Donald Stufft wrote: On Apr 7, 2016, at 6:48 AM, Nikolaus Rath wrote: Does anyone anticipate any classes other than those from pathlib to come with such a method? It seems like it would be reasonable for pathlib.Path to call fspath on the path passed to pathlib.Path.__init__, which would mean that if other libraries implemented __fspath__ then you could pass their path objects to pathlib and it would just work (and similarly, if they also called fspath it would enable interoperation between all of the various path libraries). Indeed, but my question is: is this actually going to happen? Are there going to be other libraries that will implement __fspath__, and will there be demand for pathlib to support them? There will be at least one. :) -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
On 04/09/2016 12:48 AM, Nick Coghlan wrote: > Considering the helper function usage, here's some examples in > combination with os.fsencode and os.fsdecode: > > # Status quo for binary/text path conversions > text_path = os.fsdecode(bytes_path) > bytes_path = os.fsencode(text_path) > > # Getting a text path from an arbitrary object > text_path = os.fspath(obj) # This doesn't scream "returns text!" > text_path = os.fspathname(obj) # This does > > # Getting a binary path from an arbitrary object > bytes_path = os.fsencode(os.fspath(obj)) > bytes_path = os.fsencode(os.fspathname(obj)) > > I'm starting to think the semantic nudge from the "name" suffix when > reading the code is worth the extra four characters when writing it > (keeping in mind that the whole point of this exercise is that most > folks *won't* be writing explicit conversions - the stdlib will handle > it on their behalf). > > I also think the more explicit name helps answer some of the type > signature questions that have arisen: > > 1. Does os.fspathname return rich Path objects? No, it returns names > as str objects > 2. Will file descriptors pass through os.fspathname? No, as they're > not names, they're numeric descriptors. > 3. Will bytes-like objects pass through os.fspathname? No, as they're > not names, they're encodings of names This worries me. I know the primary purpose of this change is to enable pathlib and os and the rest of the stdlib to work together, but consider . . . 
If adding a new attribute/method was as far as we went, new code (stdlib or otherwise) would look like:

    if isinstance(a_path_thingy, bytes):
        # because os can accept bytes
        pass
    elif isinstance(a_path_thingy, str):
        # but it's usually text
        pass
    elif hasattr(a_path_thingy, '__fspath__'):
        a_path_thingy = a_path_thingy.__fspath__()
    else:
        raise TypeError('not a valid path')
    # do something with the path

If we add os.fspath(), but don't allow bytes to be returned from it, our above example looks more like:

    if isinstance(a_path_thingy, bytes):
        # because os can accept bytes
        pass
    else:
        a_path_thingy = os.fspath(a_path_thingy)
    # do something with the path

Yes, it's better -- but it still requires a pre-check before calling os.fspath().

It is my contention that this is better:

    a_path_thingy = os.fspath(a_path_thingy)

This raises two issues:

1) Part of the stdlib is the new scandir module, which can work with, and return, both bytes and text -- if __fspath__ can only hold text, DirEntry will not get the __fspath__ method added, and the pre-check, boiler-plate code will flourish;

2) pathlib.Path accepts bytes -- so what happens when a byte-derived Path is passed to os.fspath()? Is a TypeError raised? Do we guess and auto-convert with fsdecode()?

I think the best answer is to

- let __fspath__ hold bytes as well as text
- let fspath() return bytes as well as text

-- ~Ethan~
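Ethan's preferred behaviour, where one call replaces the pre-check, can be sketched in pure Python. `fspath_sketch` and `BytesEntry` are hypothetical names used only for this illustration; the sketch assumes the permissive rule he proposes (str and bytes both pass through, and `__fspath__` may return either).

```python
def fspath_sketch(path):
    """Hypothetical helper: pass str/bytes through, otherwise ask __fspath__."""
    if isinstance(path, (str, bytes)):
        return path
    try:
        # Look the method up on the type, as is usual for protocol methods.
        method = type(path).__fspath__
    except AttributeError:
        raise TypeError("not a valid path: " + type(path).__name__) from None
    result = method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ returned " + type(result).__name__)
    return result


class BytesEntry:
    """Hypothetical DirEntry-like object that was created from a bytes path."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


a_path_thingy = fspath_sketch(BytesEntry(b"./minicourse-ajax-project"))
```

The caller no longer needs an isinstance check before the call; the cost is that the result type depends on the input, which is the concern raised on the other side of this thread.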
Re: [Python-Dev] Incomplete Internationalization in Argparse Module
Excellent. Issue/patch here: http://bugs.python.org/issue26726 On 2016年04月09日 08時16分, Guido van Rossum wrote: OK, so this should be taken to the bug tracker. On Saturday, April 9, 2016, Grady Martin wrote: I agree. However, an incorrect choice for an argument with a choices parameter results in this string. On 2016年04月08日 18時12分, Guido van Rossum wrote: That string looks like it is aimed at the developer, not the user of the program, so it makes sense not to translate it. On Fri, Apr 8, 2016 at 2:07 PM, Brett Cannon wrote: On Fri, 8 Apr 2016 at 14:05 Grady Martin wrote: Hello, all. I was wondering if the following string was left untouched by gettext for a purpose (from line 720 of argparse.py, in class ArgumentError): 'argument %(argument_name)s: %(message)s' There may be other untranslatable strings in the argparse module, but I have yet to encounter them in the wild. Probably so that anyone introspecting on the error message can count on somewhat of a consistent format (comes into play with doctest typically). ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) -- --Guido (mobile) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)
On 9 April 2016 at 22:43, Victor Stinner wrote: > Please don't loose time trying yet another sandbox inside CPython. It's just > a waste of time. It's broken by design. > > Please read my email about my attempt (pysandbox): > https://lwn.net/Articles/574323/ > > And the LWN article: > https://lwn.net/Articles/574215/ > > There are a lot of safe ways to run CPython inside a sandbox (and not rhe > opposite). > > I started as you, add more and more things to a blacklist, but it doesn't > work. > > See pysandbox test suite for a lot of ways to escape a sandbox. CPython has > a list of know code to crash CPython (I don't recall the dieectory in > sources), even with the latest version of CPython. They're at https://hg.python.org/cpython/file/tip/Lib/test/crashers There's also https://hg.python.org/cpython/file/tip/Lib/test/test_crashers.py which was designed to run them regularly to catch when they were resolved, but it was too fragile and tended to hang the buildbots. Even without those considerations though, there are system level denial of service attacks that untrusted code can perform without even trying to break out of the sandbox - the most naive is "while 1: pass", but there are more interesting ones like "from itertools import count; sum(count())", or even "sum(iter(int, 1))" and "list(iter(int, 1))". Operating system level security sandboxes still aren't particularly easy to use correctly, but they're a lot more reliable than language runtime level sandboxes, can be used to defend against many more attack vectors, and even offer increased flexibility (e.g. "can write to these directories, but no others", "can read these files, but no others", "can contact these IP addresses, but no others"). Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
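To make the contrast concrete, here is a small sketch of a limit applied from outside the interpreter. It is not a security sandbox and offers no protection against escapes; it only shows that a parent process can stop a child that runs "while 1: pass", something an in-interpreter blacklist cannot do. `run_with_timeout` is a name invented for this illustration.

```python
import subprocess
import sys


def run_with_timeout(source, seconds):
    """Run source in a child interpreter; return True if it finished in time."""
    try:
        subprocess.run([sys.executable, "-c", source],
                       timeout=seconds, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this exception.
        return False
    return True


finished = run_with_timeout("pass", 30)
spin_finished = run_with_timeout("while 1: pass", 1)
```

Real deployments would combine this kind of process boundary with operating system facilities for memory, filesystem and network restrictions, as Nick describes.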
Re: [Python-Dev] Pathlib enhancements - method name only
On 9 April 2016 at 23:02, R. David Murray wrote: > That is, a 'filename' is the identifier we've assigned to this thing > pointed to by an inode in linux, but an os path is a text representation > of the path from the root filename to a specified filename. That is, > the path *is* the name, so to say "path name" sounds redundant and > confusing to me. "The path is the name" is a true statement in the context of: 1. The way *nix APIs work 2. Existing filesystem interfaces in the standard library 3. Path abstractions that inherit from str/unicode It's no longer true in the context of pathlib - there, the path name is a serialised representation of a rich path object. It's also not really true in the context of Python 3 in general - bytes-like objects are an encoding of the path name, rather than the name itself. This means that "path" has become ambiguous due to the changing context - do we mean the path name representation, the binary encoding of that name, or a higher level rich path object? We're never going to be able to eliminate that ambiguity (Python's *nix & C roots run too deep for that), but we *can* potentially standardise the terms used when disambiguation is needed: path name (str), encoded path name (bytes-like object), rich path object (object implementing the new protocol) Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 506 secrets module
I've just spotted this email from Guido, sorry about the delay in responding. Further comments below.

On Thu, Jan 14, 2016 at 10:47:09AM -0800, Guido van Rossum wrote:
> I think the discussion petered out and nobody asked me to approve it yet
> (or I lost track of it). I'm almost happy to approve it in the current
> state. My only quibble is with some naming -- I'm not sure that a
> super-generic name like 'equal' is better than the original
> ('compare_digest'),

Changed.

> and I would have picked a different name for token_url
> -- probably token_urlsafe. But maybe Steven can convince me that the names
> currently in the PEP are better.

Changed.

> (I also don't like the wishy-washy
> position of the PEP on the actual specs of the proposed functions. But I'm
> fine with the actual implementation shown as the spec.)

I'm not really sure what you want me to do to improve that. Can you be more concrete about what you would like the PEP to say?

I haven't updated the PEP yet, but the newest version of the secrets module with the changes requested is here:

https://bitbucket.org/sdaprano/secrets

--
Steve
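For readers following the naming discussion, a short usage sketch of the two renamed functions. It assumes the secrets module as it later shipped in the standard library (Python 3.6 and newer), not necessarily the Bitbucket snapshot linked above.

```python
import secrets

# URL-safe text token built from 16 random bytes.
token = secrets.token_urlsafe(16)

# compare_digest() compares in a way designed to resist timing attacks.
same = secrets.compare_digest(token, token)
different = secrets.compare_digest(token, token + "x")
```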
Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
On 10 April 2016 at 02:41, Ethan Furman wrote:
> If we add os.fspath(), but don't allow bytes to be returned from it, our
> above example looks more like:
>
>     if isinstance(a_path_thingy, bytes):
>         # because os can accept bytes
>         pass
>     else:
>         a_path_thingy = os.fspath(a_path_thingy)
>     # do something with the path
>
> Yes, it's better -- but it still requires a pre-check before calling
> os.fspath().
>
> It is my contention that this is better:
>
>     a_path_thingy = os.fspath(a_path_thingy)

That approach often doesn't work, though - by design, there are situations where you can't transparently handle bytes and str with the same code path in Python 3 the way you could in Python 2.

When somebody hands you bytes rather than text you need to worry about the encoding, and you need to worry about returning bytes rather than text yourself. https://hg.python.org/cpython/rev/e44410e5928e#l4.1 provides an illustration of how fiddly that can get, and that's in the URL context - cross-platform filesystem path handling is worse, since you need to worry about the significant differences between the way Windows and *nix handle binary paths, and you can't use os.sep directly any more (since that's always text).

> This raises two issues:
>
>   1) Part of the stdlib is the new scandir module, which can work
>      with, and return, both bytes and text -- if __fspath__ can only
>      hold text, DirEntry will not get the __fspath__ method added,
>      and the pre-check, boiler-plate code will flourish;

DirEntry can still get the check, it can just throw TypeError when it represents a binary path (that's one of the advantages of using a method-based protocol - exceptions on method calls are more acceptable than exceptions on property access).

>   2) pathlib.Path accepts bytes -- so what happens when a byte-derived
>      Path is passed to os.fspath()? Is a TypeError raised? Do we
>      guess and auto-convert with fsdecode()?

pathlib is str-only (which makes sense, since it's a cross-platform API and binary paths basically don't work on Windows):

    >>> pathlib.Path(b".")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.4/pathlib.py", line 907, in __new__
        self = cls._from_parts(args, init=False)
      File "/usr/lib64/python3.4/pathlib.py", line 589, in _from_parts
        drv, root, parts = self._parse_args(args)
      File "/usr/lib64/python3.4/pathlib.py", line 581, in _parse_args
        % type(a))
    TypeError: argument should be a path or str object, not <class 'bytes'>

The only specific mention of binary support in the pathlib docs is to state that "bytes(p)" uses os.fsencode() to convert to the binary representation.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
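A minimal sketch of the str-only variant argued for here, under stated assumptions: fspath_str_only and Entry are hypothetical names invented for this example, Entry is a stand-in and not the real os.DirEntry, and the behaviour shown is the proposal in this message rather than what any released os.fspath() does.

```python
def fspath_str_only(path):
    """Hypothetical str-only helper: return a str path name or raise TypeError."""
    if isinstance(path, str):
        return path
    # Look the method up on the type, as is usual for protocol methods.
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        raise TypeError(
            "expected str or path object, not " + type(path).__name__)
    name = method(path)
    if not isinstance(name, str):
        raise TypeError(
            "__fspath__ returned " + type(name).__name__ + ", expected str")
    return name


class Entry:
    """Stand-in for a DirEntry-like object that may hold a binary path."""

    def __init__(self, path):
        self.path = path

    def __fspath__(self):
        if isinstance(self.path, bytes):
            # The method exists, but refuses to render a binary path.
            raise TypeError("binary path cannot be rendered as a str path name")
        return self.path
```

With this shape, callers that only want text call the helper unconditionally and let TypeError surface for bytes input, rather than pre-checking the type themselves.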
Re: [Python-Dev] Pathlib enhancments - method name only
Brett Cannon wrote:
> Depends if you use `/` or `\` as your path separator

Or whether your pathnames look entirely different, e.g VMS:

    device:[topdir.subdir.subsubdir]filename.ext;version

Pathnames are very much OS-dependent in both syntax *and* semantics.

Even the main two in use today (unix and windows) can't be mapped directly onto each other, because windows has drive letters and unix doesn't.

--
Greg
Re: [Python-Dev] pathlib (was: Defining a path protocol)
Nick Coghlan wrote:
> We want to be able to readily use the protocol helper in builtin
> modules like os and low level Python modules like os.path, which means
> we want it to be much lower down in the import hierarchy than pathlib.

Also, it's more general than that. It works on any object that wants to behave as a path, not just pathlib ones, so it should be in a neutral place.

--
Greg
Re: [Python-Dev] Pathlib enhancments - method name only
Eric Snow wrote:
> All this matters because it impacts the value returned from
> __ospath__(). Should it return the string representation of the path
> for the current OS or some standardized representation?

What standardized representation? I'm not aware of such a thing.

> I'd expect the former. However, if that is the expectation then
> something like pathlib.PureWindowsPath will give you the wrong thing
> if your current OS is linux.

No, you should get the representation corresponding to the kind of path object you started with. If you're working with Windows path objects on a Unix system, they must be representing something on some Windows system somewhere, not the one you're running the code on. The only reason to ask for a string representation of such a path is for use by that other system.

I don't think it even makes sense to ask for a Unix representation of a Windows path or vice versa, because the semantics are different. How do you translate a Windows drive letter into Unix? What drive letter do you use for an absolute Unix path?

--
Greg
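The point that the representation follows the kind of path object, not the host OS, can be seen with the pure path classes pathlib already provides. The example paths are made up.

```python
import pathlib

win = pathlib.PureWindowsPath("C:/Users/example/notes.txt")
posix = pathlib.PurePosixPath("/home/example/notes.txt")

# Each flavour renders with its own syntax, whatever system runs this code.
win_name = str(win)
posix_name = str(posix)

# Only the Windows flavour has a drive component.
win_drive = win.drive
posix_drive = posix.drive
```

Here win_name uses backslashes and keeps the "C:" drive, while posix_drive is the empty string, which is the mismatch in semantics raised above.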
Re: [Python-Dev] Pathlib enhancments - method name only
On 10 April 2016 at 15:58, Greg Ewing wrote:
> Brett Cannon wrote:
>
>> Depends if you use `/` or `\` as your path separator
>
> Or whether your pathnames look entirely different, e.g VMS:
>
>     device:[topdir.subdir.subsubdir]filename.ext;version
>
> Pathnames are very much OS-dependent in both syntax *and* semantics.
>
> Even the main two in use today (unix and windows) can't be
> mapped directly onto each other, because windows has drive
> letters and unix doesn't.

This does raise a concrete API design question: how should PurePath.__fspath__ behave when called on a mismatched OS?

For PurePath vs Path, the latter raises NotImplementedError if you try to create a concrete path that doesn't match the running system:

    >>> pathlib.PureWindowsPath(".")
    PureWindowsPath('.')
    >>> pathlib.WindowsPath(".")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.4/pathlib.py", line 910, in __new__
        % (cls.__name__,))
    NotImplementedError: cannot instantiate 'WindowsPath' on your system

The question we need to address is what happens if you do:

    >>> os.fspath(pathlib.PureWindowsPath("."))

on a *nix system?

Similar to my proposal for dealing with DirEntry.path being a bytes-like object, I'd like to suggest raising TypeError in __fspath__ if the request is nonsensical for the currently running system - *nix systems can *manipulate* Windows paths (and vice-versa), but actually trying to *use* them with the local filesystem isn't going to work properly, since the syntax and semantics are different.

    >>> os.fspath(pathlib.PureWindowsPath("."))
    Traceback (most recent call last):
      ...
    TypeError: cannot render 'PureWindowsPath' as filesystem path on 'posix' system

(I'm also suggesting replacing "your" with the value of os.name)

Cheers,
Nick.
--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
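A rough sketch of the check proposed in this message, written as a free function so it does not touch pathlib itself. The name render_for_local_system is hypothetical, and this shows the suggestion under discussion, not behaviour that pathlib is known to implement.

```python
import os
import pathlib


def render_for_local_system(path):
    """Hypothetical helper: return str(path) only if its flavour matches os.name."""
    if isinstance(path, pathlib.PureWindowsPath):
        expected = "nt"
    elif isinstance(path, pathlib.PurePosixPath):
        expected = "posix"
    else:
        raise TypeError("expected a pathlib path, not " + type(path).__name__)
    if os.name != expected:
        # Same wording as the suggested error, with os.name substituted in.
        raise TypeError(
            "cannot render %r as filesystem path on %r system"
            % (type(path).__name__, os.name))
    return str(path)
```

Pure paths of the foreign flavour can still be built and manipulated; only the final step of turning them into a name for the local filesystem is refused.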
Re: [Python-Dev] Pathlib enhancments - method name only
On 9 April 2016 at 23:02, R. David Murray wrote:
> That is, a 'filename' is the identifier we've assigned to this thing
> pointed to by an inode in linux, but an os path is a text representation
> of the path from the root filename to a specified filename. That is,
> the path *is* the name, so to say "path name" sounds redundant and
> confusing to me.

The term "pathname" is what is conventionally used to refer to a textual string passed to the OS to identify an object in the file system.

It's often abbreviated to just "path", but that's ambiguous for our purposes, because "path" can also refer to one of our higher-level objects.

--
Greg