Re: [Python-Dev] PEP 4000 to explicitly declare we won't be doing a Py3k style compatibility break again?

2014-08-21 Thread Terry Reedy

On 8/20/2014 8:27 PM, Joseph Martinot-Lagarde wrote:


> The pain was even bigger because in addition to the change in underlying
> types, the names of the types were not compatible between the python
> versions. I often try to write compatible code between python2 and 3,
> and I can't use "str" because it has not the same meaning in both
> versions, I can not use "unicode" because it disappeared in python3,

Any bridge library should have the equivalent of
if 'py3': unicode = str

> I can't use "byte" because it doesn't exist in python2.


2.7 (and 2.6?) already has
if 'py2': bytes = str
and I presume bridge libraries targeted before that was added include it 
also.
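
Spelled out, such bridge-library aliases might look like the sketch below. The names PY2, text_type, binary_type and ensure_text are illustrative assumptions, not taken from any particular library.

```python
import sys

# Hypothetical compatibility shim; the names below are illustrative only.
PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 -- this name only exists on Python 2
    binary_type = str
else:
    text_type = str
    binary_type = bytes


def ensure_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return value
```

Code written against text_type and binary_type then means the same thing on both major versions, which is the point of the two one-line aliases above.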


--
Terry Jan Reedy

___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Oleg Broytman
Hi!

On Thu, Aug 21, 2014 at 02:52:19PM +1000, Cameron Simpson  
wrote:
> Oh, and I reject Nick's characterisation of POSIX as "broken". It's
> perfectly internally consistent. It just doesn't match what he
> wants. (Indeed, what I want, and I'm a long time UNIX fanboy.)
> 
> Cheers,
> Cameron Simpson 

   +1 from another Unix fanboy. Like an old wine, Unix becomes better
with years! ;-)

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            [email protected]
   Programmers don't die, they just GOSUB without RETURN.


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 21 August 2014 12:16, Stephen J. Turnbull  wrote:
> Nick Coghlan writes:
>
>  > One idea I had along those lines is a surrogatereplace error handler (
>  > http://bugs.python.org/issue22016) that emitted an ASCII question mark for
>  > each smuggled byte, rather than propagating the encoding problem.
>
> Please, don't.
>
> "Smuggled bytes" are not independent events.  They tend to be
> correlated *within* file names, and this handler would generate names
> whose human semantics get lost (and there *are* human semantics,
> otherwise the name would be str(some_counter)).  They tend to be
> correlated across file names, and this handler will generate multiple
> files with the same munged name (and again, the differentiating human
> semantics get lost).
>
> If you don't know the semantics of the intended file names, you can't
> generate good replacement names.  This has to be an application-level
> function, and often requires user intervention to get good names.
>
> If you want to provide helper functions that applications can use to
> clean names explicitly, that might be OK.

Yeah, I was thinking in the context of reproducing sys.stdout's
behaviour in Python 2, but that reproduces the bytes faithfully, so
'surrogateescape' already offers exactly the behaviour we want
(sys.stdout will have surrogateescape enabled by default in 3.5).
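
A small sketch of the difference under discussion, using only the existing "surrogateescape" and "replace" error handlers (the file names are made up for illustration). The "replace" step only approximates what a question-mark-emitting handler would do:

```python
# Two distinct file names, as raw bytes that are not valid UTF-8.
raw_names = [b"caf\xe9.txt", b"caf\xe8.txt"]

# surrogateescape smuggles each undecodable byte into the str as a lone
# surrogate, so encoding again restores the original bytes exactly.
escaped = [raw.decode("utf-8", "surrogateescape") for raw in raw_names]
round_tripped = [name.encode("utf-8", "surrogateescape") for name in escaped]

# Replacing each smuggled byte with "?" collapses both names into the
# same string, which is the collision Stephen describes.
munged = [name.encode("ascii", "replace").decode("ascii") for name in escaped]

print(round_tripped == raw_names)
print(munged)
```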

I'll keep pondering the question of possible helper functions in the
"string" module.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] PEP 4000 to explicitly declare we won't be doing a Py3k style compatibility break again?

2014-08-21 Thread Martin v. Löwis
Am 18.08.14 08:45, schrieb Nick Coghlan:
> It's certainly the one that has caused the most churn in CPython and
> the standard library - the ripples still haven't entirely settled on
> that front :)

For people porting their libraries and applications, the challenge is
often even bigger: they need to learn a new programming concept. For
many developers, it is a novel idea that character strings are not
just bytes. A similar split is in the number types (integers vs.
floats), but most developers have learned the distinction when they
learned programming. That a text file is not a file that contains text
(but bytes interpreted as text) is surprising. In addition, you also
have to learn a lot of facts (what is the ASCII encoding, what is
the iso-8859-1 encoding, what is UTF-8 (and how does it differ from
Unicode)).

When you have all that understood, you *then* run into the design
choices to be made for your software.

> 
> I think Guido's right that there's also a "death of a thousand cuts"
> aspect for large existing code bases, though, especially those that
> are lacking comprehensive test suites.

I think the second big challenge is "my dependencies are not ported
to Python 3". There is little you can do about it, short of porting
the dependencies yourself (fortunately, Python and most of its libraries
are free software).

Regards,
Martin




Re: [Python-Dev] Bytes path support

2014-08-21 Thread Martin v. Löwis
Am 19.08.14 19:43, schrieb Ben Hoyt:
>>>> The official policy is that we want them [support for bytes paths in
>>>> stdlib functions] to go away, but reality so far has not budged. We will
>>>> continue to hold our breath though. :-)
>>>
>>> Does that mean that new APIs should explicitly not support bytes? I'm
>>> thinking of os.scandir() (PEP 471), which I'm implementing at the
>>> moment. I was originally going to make it support bytes so it was
>>> compatible with listdir, but maybe that's a bad idea. Bytes paths are
>>> essentially broken on Windows.
>>
>> Bytes paths are "essential" on Unix, though, so I don't think we should
>> create new low-level APIs that don't support bytes.
> 
> Fair enough. I don't quite understand, though -- why is the "official
> policy" to kill something that's "essential" on *nix?

I think the people defending the "Unix file names are just bytes" side
often miss an important detail: displaying file names to the user, and
allowing the user to enter file names.

A script that just needs to traverse a directory tree and look at files
by certain criteria can easily do so without worrying about a text
interpretation of the file names.

When it comes to user interaction, it becomes apparent that, even on
Unix, file names are not just bytes. If you do "ls -l" in your shell,
the "system" (not just the kernel - but ultimately the terminal program,
which might be the console driver, or an X11 application) will interpret
the file name as having an encoding, and render it with a font.

So for Python, the question is: which of the use cases (processing
all files, vs. showing them to the user) should be better supported?
Python 3 took the latter as an answer, under the assumption that this
is the more common case.
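
The two use cases can be seen side by side in os.listdir, which returns names of the same type as the path it is given. A minimal sketch, using a temporary directory and a made-up file name:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one ordinary file so the directory is not empty.
    with open(os.path.join(tmp, "example.txt"), "w") as handle:
        handle.write("hello\n")

    # "Process all files" style: a bytes path in gives bytes names out,
    # with no text interpretation applied.
    names_as_bytes = os.listdir(os.fsencode(tmp))

    # "Show them to the user" style: a str path in gives str names out,
    # decoded with the filesystem encoding.
    names_as_text = os.listdir(tmp)

print(names_as_bytes, names_as_text)
```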

Regards,
Martin




Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 21 August 2014 14:52, Cameron Simpson  wrote:
>
> Oh, and I reject Nick's characterisation of POSIX as "broken". It's
> perfectly internally consistent. It just doesn't match what he wants.
> (Indeed, what I want, and I'm a long time UNIX fanboy.)

The part that is broken is the idea that locale encodings are a viable
solution to conveying the appropriate encoding to use to talk to the
operating system. We've tried trusting them with Python 3, and they're
reliably wrong in certain situations. systemd is apparently better
than upstart at setting them correctly (e.g. for cron jobs), but even
it can't defend against an erroneous (or deliberate!) "LANG=C", or ssh
environment forwarding pushing a client's locale to the server. It's
worth looking through some of Armin Ronacher's complaints about Python
3 being broken on Linux, and seeing how many of them boil down to
"trusting the locale is wrong, Python 3 should just assume UTF-8 on
every POSIX system, the same way it does on Mac OS X". (I suspect
ShiftJIS, ISO-2022, et al users might object to that approach, but
it's at least a more viable choice now than it was back in 2008)

I still think we made the right call at least *trying* the idea of
trusting the locale encoding (since that's the officially supported
way of getting this information from the OS), and in many, many
situations it works fine. But I suspect we may eventually need to
resolve the technical issues currently preventing us from deciding to
ignore the environmental locale during interpreter startup and try
something different (such as always assuming UTF-8, or trying to force
C.UTF-8 if we detect the C locale, or looking for the systemd config
files and using those to set the OS encoding, rather than the
environmental locale).
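
To see what the interpreter actually concluded from the environment, something like the following sketch can be run under different LANG/LC_ALL settings. The helper name describe_encodings is an assumption for illustration, and sys.getfilesystemencodeerrors requires Python 3.6 or later:

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings the interpreter settled on at startup."""
    return {
        "filesystem": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        "locale_preferred": locale.getpreferredencoding(False),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for key, value in sorted(describe_encodings().items()):
    print(f"{key}: {value}")
```

Comparing the output of a normal login shell with that of a cron job or an ssh session is one way to spot the mismatches described above. The exact values depend on the platform and the Python version.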

Regards,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Antoine Pitrou


Le 21/08/2014 00:52, Cameron Simpson a écrit :


> The "bytes in some arbitrary encoding where at least the slash character
> (and maybe a couple others) is ascii compatible" notion is completely bogus.
> There's only one special byte, the slash (code 47). There's no OS-level
> need that it or anything else be ASCII compatible.


Of course there is. Try to split an UTF-16-encoded file path on the byte 
47 and you'll get a lot of garbage. So, yes, POSIX implicitly mandates 
an ASCII-compatible encoding for file paths.
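
A short sketch of the splitting problem, with a path chosen (as an illustrative assumption) so that one character contains the byte 0x2F in its UTF-16-LE encoding:

```python
path = "dir/\u4e2f"  # U+4E2F encodes as bytes 0x2F 0x4E in UTF-16-LE

# Splitting on byte 47 cuts through the middle of a character and leaves
# NUL bytes scattered through the pieces.
utf16_parts = path.encode("utf-16-le").split(b"/")

# In UTF-8 every byte of a multi-byte character is >= 0x80, so byte 47
# can only ever be a real slash.
utf8_parts = path.encode("utf-8").split(b"/")

print(utf16_parts)
print(utf8_parts)
```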


Regards

Antoine.




Re: [Python-Dev] Bytes path support

2014-08-21 Thread Marko Rauhamaa
"Martin v. Löwis" :

> I think the people defending the "Unix file names are just bytes" side
> often miss an important detail: displaying file names to the user, and
> allowing the user to enter file names.

The user interface is a real issue and needs to be addressed. It is
separate from the OS interface, though.

> A script that just needs to traverse a directory tree and look at
> files by certain criteria can easily do so without worrying about a
> text interpretation of the file names.

A single system often has file names that have been encoded with
different schemes. Only today, I have had to deal with the JIS character
table (<http://i.msdn.microsoft.com/cc305152.932%28en-us,MSDN.10%29.gif>) -- you
will notice that it doesn't have a backslash character. A coworker uses
ISO-8859-1.

I use UTF-8. UTF-8, of course, will refuse to deal with some byte
sequences.

My point is that the poor programmer cannot ignore the possibility of
"funny" character sets. If Python tried to protect the programmer from
that possibility, the result might be even more intractable: how to act
on a file with a non-UTF-8 filename if you are unable to express it as
a text string?


Marko


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 21 August 2014 23:58, Marko Rauhamaa  wrote:
>
> My point is that the poor programmer cannot ignore the possibility of
> "funny" character sets. If Python tried to protect the programmer from
> that possibility, the result might be even more intractable: how to act
> on a file with a non-UTF-8 filename if you are unable to express it as
> a text string?

That's what the "surrogateescape" codec is for - we use it by default
on most OS interfaces, and it's implicit in the use of "os.fsencode"
and "os.fsdecode". Starting with Python 3.5, it's also enabled on
sys.stdout by default, so that "print(os.listdir(dirname))" will pass
the original raw bytes through to the terminal the same way Python 2
does.
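
A minimal sketch of that round trip through os.fsdecode and os.fsencode, assuming a POSIX system (the file name is made up, and the exact str produced depends on the filesystem encoding in effect):

```python
import os

raw = b"report-\xff.txt"  # \xff can never appear in valid UTF-8

name = os.fsdecode(raw)       # str, possibly containing lone surrogates
restored = os.fsencode(name)  # back to the original bytes

print(ascii(name), restored == raw)
```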

The docs could use additional details as to which interfaces do and
don't have surrogateescape enabled by default, but for the time being,
the description of the codec error handler just links out to the
original definition in PEP 383.

It may also be useful to have some tools for detecting and cleaning
strings containing surrogate escaped data, but there hasn't been a
concrete proposal along those lines as yet. Personally, I'm currently
waiting to see if the Fedora or OpenStack folks indicate a need for
such tools before proposing any additions.
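
Purely as an illustration of what such tools could look like (these helpers are hypothetical and not part of the standard library), detection only needs to look for code points in the range U+DC80 to U+DCFF that PEP 383 uses for escaped bytes:

```python
def has_escaped_bytes(text):
    """Return True if text contains surrogates produced by surrogateescape."""
    return any("\udc80" <= ch <= "\udcff" for ch in text)


def printable(text):
    """Render escaped bytes as visible \\xNN sequences for display."""
    # Assumes the only lone surrogates present came from surrogateescape;
    # other lone surrogates would make the encode step raise.
    raw = text.encode("utf-8", "surrogateescape")
    return raw.decode("utf-8", "backslashreplace")


print(has_escaped_bytes("caf\udce9.txt"), printable("caf\udce9.txt"))
```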

Regards,
Nick.




-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 22 August 2014 00:12, Nick Coghlan  wrote:
> On 21 August 2014 23:58, Marko Rauhamaa  wrote:
>>
>> My point is that the poor programmer cannot ignore the possibility of
>> "funny" character sets. If Python tried to protect the programmer from
>> that possibility, the result might be even more intractable: how to act
>> on a file with a non-UTF-8 filename if you are unable to express it as
>> a text string?
>
> That's what the "surrogateescape" codec is for

Oops, that should say "codec error handler" (I got it right later in the post).

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)

2014-08-21 Thread Armin Rigo
Hi,

On 18 August 2014 22:30, Oleg Broytman  wrote:
>Aha, I see now -- the signing certificate is CAcert, which I've
> installed manually.

I don't suppose anyone is particularly annoyed by this fact?  I know
for sure two classes of people that will never click "Ignore".  The
first one is people that, for lack of a less negative term, I'll call
"security freaks".  The second is "serious business people" to which
the shiny new look of python.org appeals; they are likely to heed the
warning "Legitimate banks, stores, etc. will never ask you to do this"
and would regard an official hint to ignore it as highly
unprofessional.

(The bug tracker of PyPy used to have the same problem.  We fixed the
situation recently, but previously, we used to argue that we didn't
have a lot of connections with either class of people...)


A bientôt,

Armin.


Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)

2014-08-21 Thread Nick Coghlan
On 22 August 2014 00:41, Armin Rigo  wrote:
> Hi,
>
> On 18 August 2014 22:30, Oleg Broytman  wrote:
>>Aha, I see now -- the signing certificate is CAcert, which I've
>> installed manually.
>
> I don't suppose anyone is particularly annoyed by this fact?  I know
> for sure two classes of people that will never click "Ignore".  The
> first one is people that, for lack of a less negative term, I'll call
> "security freaks".  The second is "serious business people" to which
> the shiny new look of python.org appeals; they are likely to heed the
> warning "Legitimate banks, stores, etc. will never ask you to do this"
> and would regard an official hint to ignore it as highly
> unprofessional.

I've now raised this issue with the infrastructure team. The current
hosting arrangements for bugs.python.org were put in place when the
PSF didn't have any on-call system administrators of its own, but now
that we do, it may be time to migrate that service to a location where
we can switch to a more appropriate SSL certificate.

Anyone interested in following the discussion further may wish to join
[email protected]

Regards,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)

2014-08-21 Thread Martin v. Löwis
Am 21.08.14 17:44, schrieb Nick Coghlan:
> I've now raised this issue with the infrastructure team. The current
> hosting arrangements for bugs.python.org were put in place when the
> PSF didn't have any on-call system administrators of its own, but now
> that we do, it may be time to migrate that service to a location where
> we can switch to a more appropriate SSL certificate.

Just to relay Noah's response: it's actually not the hosting that
prevents installation of a proper certificate, it's the limitation
that the certificate we could deploy would include "python.org" as
a server name, which is considered risky regardless of where the
service is hosted. There are solutions to that as well, of course.

Regards,
Martin




Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)

2014-08-21 Thread Ryan Hiebert

> On Aug 21, 2014, at 11:29 AM, Martin v. Löwis  wrote:
> 
> Am 21.08.14 17:44, schrieb Nick Coghlan:
>> I've now raised this issue with the infrastructure team. The current
>> hosting arrangements for bugs.python.org were put in place when the
>> PSF didn't have any on-call system administrators of its own, but now
>> that we do, it may be time to migrate that service to a location where
>> we can switch to a more appropriate SSL certificate.
> 
> Just to relay Noah's response: it's actually not the hosting that
> prevents installation of a proper certificate, it's the limitation
> that the certificate we could deploy would include "python.org" as
> a server name, which is considered risky regardless of where the
> service is hosted. There are solutions to that as well, of course.

That sounds like a limitation I’ve seen with StartSSL. Perhaps there’s a 
certificate authority that would be willing to sponsor a certificate for Python 
without this annoying limitation?


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Stephen J. Turnbull
Marko Rauhamaa writes:

 > My point is that the poor programmer cannot ignore the possibility of
 > "funny" character sets.

*Poor* programmers do it all the time.  That's why Python codecs raise
when they encounter bytes they can't handle.

 > If Python tried to protect the programmer from that possibility,

I don't understand your point.  The existing interfaces aren't going
anywhere, and they're enough to do anything you need to do.  Although
there are a few radicals (like me in a past life :-) who might like to
see them go away in favor of opt-in to binary encoding via
surrogateescape error handling, nobody in their right mind supports
us.

The question here is not about going backward, it's about whether to
add new bytes APIs, and which ones.


Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)

2014-08-21 Thread Benjamin Peterson
On Thu, Aug 21, 2014, at 09:48, Ryan Hiebert wrote:
> 
> > On Aug 21, 2014, at 11:29 AM, Martin v. Löwis  wrote:
> > 
> > Am 21.08.14 17:44, schrieb Nick Coghlan:
> >> I've now raised this issue with the infrastructure team. The current
> >> hosting arrangements for bugs.python.org were put in place when the
> >> PSF didn't have any on-call system administrators of its own, but now
> >> that we do, it may be time to migrate that service to a location where
> >> we can switch to a more appropriate SSL certificate.
> > 
> > Just to relay Noah's response: it's actually not the hosting that
> > prevents installation of a proper certificate, it's the limitation
> > that the certificate we could deploy would include "python.org" as
> > a server name, which is considered risky regardless of where the
> > service is hosted. There are solutions to that as well, of course.
> 
> That sounds like a limitation I’ve seen with StartSSL. Perhaps there’s a
> certificate authority that would be willing to sponsor a certificate for
> Python without this annoying limitation?

Perhaps some board members could comment, but I hope the PSF could just
pay a few hundred a year for a proper certificate.


Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)

2014-08-21 Thread Terry Reedy

On 8/21/2014 10:41 AM, Armin Rigo wrote:

> Hi,
>
> On 18 August 2014 22:30, Oleg Broytman  wrote:
>> Aha, I see now -- the signing certificate is CAcert, which I've
>> installed manually.
>
> I don't suppose anyone is particularly annoyed by this fact?


I noticed the issue, and started this thread, because someone posted an 
https://bugs.python.org link. I ordinarily just go to bugs.python.org 
and get the http connection.  I have HTTPS Everywhere installed, but it 
must notice the dodgy certificate and silently not switch. So I never 
knew before that there was an https connection available, and never 
thought to try it.


Given that we are shipping both login credentials and files over the 
connection, making https routine, with a proper certificate, might be a 
good idea.


--
Terry Jan Reedy



Re: [Python-Dev] Bytes path support

2014-08-21 Thread Cameron Simpson

On 21Aug2014 09:20, Antoine Pitrou  wrote:

> Le 21/08/2014 00:52, Cameron Simpson a écrit :
>
>> The "bytes in some arbitrary encoding where at least the slash character
>> (and maybe a couple others) is ascii compatible" notion is completely bogus.
>> There's only one special byte, the slash (code 47). There's no OS-level
>> need that it or anything else be ASCII compatible.
>
> Of course there is. Try to split an UTF-16-encoded file path on the
> byte 47 and you'll get a lot of garbage. So, yes, POSIX implicitly
> mandates an ASCII-compatible encoding for file paths.


[Rolls eyes.] Looking at the UTF-16 encoding, it looks like it also embeds NUL 
bytes for various codes below 32768. How are they handled? As remarked, codes 0 
(NUL) and 47 (ASCII slash code) _are_ special to UNIX filename bytes strings.


If you imagine you can embed bare UTF-16 freely even excluding code 47, I think 
one of us is missing something.


That's not "ASCII compatible". That's "not all byte codes can be freely used 
without thought", and any multibyte coding will have to consider such things 
when embedding itself in another coding scheme.


Cheers,
Cameron Simpson 

Microsoft:  Committed to putting the "backward" into "backward compatibility."


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Chris Barker
On Wed, Aug 20, 2014 at 9:52 PM, Cameron Simpson  wrote:

> On 20Aug2014 16:04, Chris Barker - NOAA Federal wrote:
>
>> So really, people treat them as
>> "bytes-in-some-arbitrary-encoding-where-at-least the-slash-character-(and
>> maybe a couple others)-is-ascii-compatible"
>
> As someone who fought long and hard in the surrogate-escape listdir()
> wars, and was won over once the scheme was thoroughly explained to me, I
> take issue with these assertions: they are bogus or misleading.
>
> Firstly, POSIX filenames _are_ just byte strings. The only forbidden
> character is the NUL byte, which terminates a C string, and the only
> special character is the slash, which separates pathname components.
>

so they are "just byte strings", oh, except that you can't have a  null,
and the "slash" had better be code 47 (and vice versa). How is that
different than "bytes-in-some-arbitrary-encoding-where-at-least
the-slash-character-is-ascii-compatible"?

(sorry about the "maybe a couple others", I was too lazy to do my research
and be sure).

But my point is that python users want to be able to work with paths, and
paths on posix are not strictly strings with a clearly defined encoding,
but they are also not quite "just arbitrary bytes". So it would be nice if
we could have a pathlib that would work with these odd beasts. I've lost
track a bit as to whether the surrogate-escape solution allows this to all
work now. If it does, then great, sorry for the noise.

> Second, a bare low level program cannot do _much_ more than pass them
> around.  It certainly can do things like compute their basename, or other
> path related operations.
>

only if you assume that pesky slash == 47 thing -- it's not much, but it's
not raw bytes either.
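
A sketch of the kind of path operation being argued about, done purely on bytes with posixpath and never decoding the name (the path is made up for illustration). The only assumption baked in is that byte 47 is the separator:

```python
import posixpath

raw_path = b"/srv/data/caf\xe9.txt"  # not valid UTF-8, still a valid path

# Both operations look only for the separator byte; the other bytes are
# passed through untouched.
directory, filename = posixpath.split(raw_path)
components = raw_path.split(b"/")

print(directory, filename, components)
```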

> The "bytes in some arbitrary encoding where at least the slash character
> (and maybe a couple others) is ascii compatible" notion is completely bogus.
> There's only one special byte, the slash (code 47). There's no OS-level
> need that it or anything else be ASCII compatible. I think
> characterizations such as the one quoted are actively misleading.
>

code 47 == "slash" is ascii compatible -- where else did the 47 value come
from?


> I think we'd all agree it is nice to have a system where filenames are all
> Unicode, but since POSIX/UNIX predates it by decades it is a bit late to
> ignore the reality for such systems.


well, the community could have gone to "if you want anything other than
ascii, make it utf-8" -- but always, we're all a bunch of independent
thinkers.

But none of this is relevant -- systems in the wild do what they do --
clearly we all want Python to work with them as best it can.


> There's no _external_ "filesystem encoding" in the sense of something
> recorded in the filesystem that anyone can inspect. But there is the
> expressed locale settings, available at runtime to any program that cares
> to pay attention. It is a workable situation.
>

I haven't run into it, but it seems the folks that have don't think relying
on the locale setting is the least bit workable. If it were, we wouldn't be
having this discussion -- use the locale setting to decide how to decode
filenames -- done.

> Oh, and I reject Nick's characterisation of POSIX as "broken". It's
> perfectly internally consistent. It just doesn't match what he wants.
> (Indeed, what I want, and I'm a long time UNIX fanboy.)
>

bug or feature? you decide. Internal consistency is a good start, but it
punts the whole encoding issue to the client software, without giving it
the tools to do it right. I call that "really hard to work with" if not
broken.

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[email protected]


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Paul Moore
On 21 August 2014 23:27, Cameron Simpson  wrote:
> That's not "ASCII compatible". That's "not all byte codes can be freely used
> without thought", and any multibyte coding will have to consider such things
> when embedding itself in another coding scheme.

I wonder how badly a Unix system would break if you specified UTF-16 as
the system encoding...?
Paul


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Antoine Pitrou


Le 21/08/2014 18:27, Cameron Simpson a écrit :

> As remarked, codes 0 (NUL) and 47 (ASCII slash code) _are_ special to UNIX
> filename bytes strings.


So you admit that POSIX mandates that file paths are expressed in an 
ASCII-compatible encoding after all? Good. I've nothing to add to your rant.


Antoine.




Re: [Python-Dev] Bytes path support

2014-08-21 Thread Isaac Morland

On Thu, 21 Aug 2014, Chris Barker wrote:


> so they are "just byte strings", oh, except that you can't have a null, and
> the "slash" had better be code 47 (and vice versa). How is that different
> than "bytes-in-some-arbitrary-encoding-where-at-least
> the-slash-character-is-ascii-compatible"?


Actually, slash doesn't need to be code 47.  But no matter what code 47 
means outside of the context of a filename, it is the path arc separator 
byte (not character).


In fact, this isn't even entirely academic.  On a Mac OS X machine, go 
into Finder and try to create a directory called ":".  You'll get an error 
saying 'The name “:” can’t be used.'.  Now create a directory called "/". 
No problem, raising the question of what is going on at the filesystem 
level?


Answer:

$ ls -al
total 0
drwxr-xr-x   3 ijmorlan  staff   102 21 Aug 18:57 ./
drwxr-xr-x+ 80 ijmorlan  staff  2720 21 Aug 18:57 ../
drwxr-xr-x   2 ijmorlan  staff    68 21 Aug 18:57 :/

And of course in shell one would remove the directory with this:

rm -rf :

not:

rm -rf /

So in effect the file system path arc encoding on Mac OS X is UTF-8 
*except* that : is outlawed and / is encoded as \x3A rather than the usual 
\x2F.  Of course, the path arc separator byte (not character) remains \x2F 
as always.
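
As a toy model of the swap described above (a hypothetical helper for illustration, not an Apple API and not a claim about how Finder is implemented), the mapping between the name shown in the GUI and the name stored at the POSIX level can be sketched as:

```python
# Exchange "/" and ":" in a single pass over the name.
_SWAP = str.maketrans({"/": ":", ":": "/"})


def finder_to_posix(display_name):
    """Map a name as typed in the GUI to the name seen by POSIX calls."""
    return display_name.translate(_SWAP)


def posix_to_finder(stored_name):
    """Map a POSIX-level name back to the name the GUI would display."""
    return stored_name.translate(_SWAP)


print(finder_to_posix("/"))  # the directory that shows up as ":" in ls
```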


Just for fun, there are contexts in which one can give a full path at the 
GUI level, where : is used as the path separator.  This is for historical 
reasons and presumably is the reason for the above-noted behaviour.


I think the real tension here is between the POSIX level where filenames 
are byte strings (except for \x00, which is reserved for string 
termination) where \x2F has special interpretation, and absolutely every 
application ever written, in every language, which wants filenames to be 
character strings.


Isaac Morland   CSCF Web Guru
DC 2554C, x36650            WWW Software Specialist


Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)

2014-08-21 Thread Nick Coghlan
On 22 Aug 2014 04:45, "Benjamin Peterson"  wrote:
>
> Perhaps some board members could comment, but I hope the PSF could just
> pay a few hundred a year for a proper certificate.

That's exactly what we're doing - MAL reminded me we reached the same
conclusion last time this came up, we'll just track it better this time to
make sure it doesn't slip through the cracks again.

(And yes, switching to forced HTTPS once this is addressed would also be a
good idea - we'll add it to the list)

Regards,
Nick.


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Nick Coghlan
On 22 Aug 2014 09:24, "Isaac Morland" wrote:
> I think the real tension here is between the POSIX level where filenames
> are byte strings (except for \x00, which is reserved for string
> termination) where \x2F has special interpretation, and absolutely every
> application ever written, in every language, which wants filenames to be
> character strings.

That's one of the best summaries of the situation I've ever seen :)

Most languages (including Python 2) throw up their hands and say this is
the developer's problem to deal with. Python 3 says it's *our* problem to
deal with on behalf of our developers. The "surrogateescape" error handler
allows recalcitrant bytes to be dealt with relatively gracefully in most
situations. We don't quite cover *everything* yet (hence the complaints
from some of the folks that are experts at dealing with Python 2 Unicode
handling on POSIX systems), but the remaining problems are a lot more
tractable than the "teach every native English speaker everywhere how to
handle Unicode properly" problem.
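
For readers who have not met the "surrogateescape" error handler, here is a small sketch of the round trip it allows. The file name is an invented example, and UTF-8 is assumed as the decoding in use; on POSIX systems os.fsdecode() and os.fsencode() apply the filesystem encoding with this same handler.

```python
raw = b"caf\xe9.txt"  # Latin-1 encoded name; not valid UTF-8

# Undecodable bytes become lone surrogates instead of raising an error.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = name.encode("utf-8", errors="surrogateescape")

# A strict encode refuses the lone surrogate, which is where the
# remaining rough edges tend to show up.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(repr(name), round_tripped == raw, strict_ok)
```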

Regards,
Nick.


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Glenn Linderman

On 8/21/2014 3:42 PM, Paul Moore wrote:
> I wonder how badly a Unix system would break if you specified UTF16 as
> the system encoding...?
> Paul


Does Unix even support UTF-16 as an encoding? I suppose, these days, it 
probably does, for reading contents of files created on Windows, etc. 
(Unicode was just gaining traction when I last used Unix in a 
significant manner; yes, my web host runs Linux, and I know enough to do 
what can be done there... but haven't experimented with encodings other 
than ASCII & UTF-8 on the web host, and don't intend to).


If it allows configuration of UTF-16 or UTF-32 as system encodings, I 
would consider that a bug, though, as too much of Unix predates Unicode, 
and would be likely to fail.


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Glenn Linderman

On 8/21/2014 3:54 PM, Antoine Pitrou wrote:
> On 21/08/2014 18:27, Cameron Simpson wrote:
>> As remarked, codes 0 (NUL) and 47 (ASCII slash code) _are_ special to
>> UNIX filename bytes strings.
>
> So you admit that POSIX mandates that file paths are expressed in an
> ASCII-compatible encoding after all? Good. I've nothing to add to your
> rant.
>
> Antoine.


0 and 47 are certainly originally derived from ASCII.  However, there 
could be encodings that are not ASCII compatible that fit those 
constraints (in practice probably very few, since most encodings _are_ 
ASCII compatible).


So as a technical matter, Cameron is correct that Unix only treats 0 & 47 
as special, and that alone is insufficient to declare that encodings must 
be ASCII compatible.  As a practical matter, since most encodings are 
ASCII compatible anyway, it would be hard to find many encodings that are 
not ASCII compatible, comply with the 0 & 47 requirements, and could be 
used successfully with Unix file names.
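
To make the 0 & 47 requirement concrete, here is a rough sketch that checks whether encoding a name produces either special byte by accident. The function name is hypothetical and the sample strings are arbitrary; it illustrates why UTF-16 does not work for file names, and is not a general test of whether an encoding is ASCII compatible.

```python
def special_bytes(text, encoding):
    """Return which of the byte values 0 and 47 appear when text is encoded.

    text must contain neither NUL nor "/", so any hit is an accident of
    the encoding rather than something the caller asked for.
    """
    if "\x00" in text or "/" in text:
        raise ValueError("text must not contain NUL or '/'")
    return {b for b in text.encode(encoding) if b in (0, 47)}


print(special_bytes("report", "utf-8"))      # no special bytes
print(special_bytes("report", "utf-16-le"))  # every other byte is 0
print(special_bytes("\u012f", "utf-16-le"))  # U+012F encodes as 2F 01
```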


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Oleg Broytman
On Thu, Aug 21, 2014 at 05:00:02PM -0700, Glenn Linderman 
 wrote:
> On 8/21/2014 3:42 PM, Paul Moore wrote:
> >I wonder how badly a Unix system would break if you specified UTF16 as
> >the system encoding...?
> 
> Does Unix even support UTF-16 as an encoding?

   As an encoding of a file's content? Certainly yes. As a locale
encoding? Definitely no.

Oleg.
-- 
 Oleg Broytman            http://phdru.name/            [email protected]
   Programmers don't die, they just GOSUB without RETURN.


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Chris Barker - NOAA Federal
> Does Unix even support UTF-16 as an encoding? I suppose, these days, it 
> probably does, for reading contents of files created on Windows, etc.

I don't think Unix supports any encodings at all for the _contents_ of
files -- that's up to applications. Of course the command line text
processing tools need to know -- I'm guessing those are never going to
work w/UTF-16!

"System encoding" is a nice idea, but pretty much worthless. It only
helps for files created and processed on the same system, and it is not
rare for that not to be the case.

This brings up the other key problem. If file names are (almost)
arbitrary bytes, how do you write one to/read one from a text file
with a particular encoding? ( or for that matter display it on a
terminal)

And people still want to say posix isn't broken in this regard?

Sigh.

-Chris


Re: [Python-Dev] https:bugs.python.org -- Untrusted Connection (Firefox)

2014-08-21 Thread Terry Reedy

On 8/21/2014 7:25 PM, Nick Coghlan wrote:
> On 22 Aug 2014 04:45, "Benjamin Peterson" wrote:
>> Perhaps some board members could comment, but I hope the PSF could just
>> pay a few hundred a year for a proper certificate.
>
> That's exactly what we're doing - MAL reminded me we reached the same
> conclusion last time this came up, we'll just track it better this time
> to make sure it doesn't slip through the cracks again.
>
> (And yes, switching to forced HTTPS once this is addressed would also be
> a good idea - we'll add it to the list)


I just switched from a 'low variety' short password of the sort almost 
crackable with brute force (today, though not several years ago) to a 
higher variety longer password. People with admin privileges on the 
tracker might be reminded to recheck.  What was adequate 10 years ago is 
not so now.


--
Terry Jan Reedy



Re: [Python-Dev] Bytes path support

2014-08-21 Thread Oleg Broytman
On Thu, Aug 21, 2014 at 05:30:14PM -0700, Chris Barker - NOAA Federal 
 wrote:
> This brings up the other key problem. If file names are (almost)
> arbitrary bytes, how do you write one to/read one from a text file
> with a particular encoding? ( or for that matter display it on a
> terminal)

   There is no such thing as an encoding of text files. So we just
write those bytes to the file or output them to the terminal. I often do
that. My filesystems are full of files with names and content in
at least 3 different encodings - koi8-r, utf-8 and cp1251. So I open a
terminal with koi8 or utf-8 locale and fonts and some files always look
weird. But however weird they are, it's possible to work with them.

   The bigger problem is line feeds. A filename with linefeeds can be
put to a text file, but cannot be read back. So one has to transform
such names. Usually s/\\//g and s/\n/\\n/g is enough. (-:
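
A sketch of that kind of transform in Python, for names that must live one per line in a text file. It assumes the intent is to double backslashes and write line feeds as the two characters backslash-n, so the mapping can be reversed; the function names are invented for this example.

```python
def escape_name(name):
    """Double backslashes first, then write line feeds as backslash-n."""
    return name.replace("\\", "\\\\").replace("\n", "\\n")


def unescape_name(line):
    """Reverse escape_name by scanning left to right."""
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            nxt = line[i + 1]
            # "\\n" stands for a line feed; any other escaped character
            # (in practice a backslash) stands for itself.
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


tricky = "a\nb\\nc"  # a real line feed, then a literal backslash and "n"
line = escape_name(tricky)
print(line, unescape_name(line) == tricky)
```

The order matters: escaping backslashes before line feeds is what keeps a literal backslash followed by "n" distinct from an escaped line feed. Names held as bytes could be decoded with the "surrogateescape" handler first and run through the same functions.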

> And people still want to say posix isn't broken in this regard?

   Not at all! And broken or not broken, it's what I (for many different
reasons) prefer to use for my desktops, servers, notebooks, routers and
smartphones, so if Python were to stand in my way I'd rather switch to
different tools.

Oleg.
-- 
 Oleg Broytman            http://phdru.name/            [email protected]
   Programmers don't die, they just GOSUB without RETURN.


Re: [Python-Dev] Bytes path support

2014-08-21 Thread Stephen J. Turnbull
Chris Barker - NOAA Federal writes:

 > This brings up the other key problem. If file names are (almost)
 > arbitrary bytes, how do you write one to/read one from a text file
 > with a particular encoding? ( or for that matter display it on a
 > terminal)

"Very carefully."

But this is strictly from need.  *Nobody* (with the exception of the
crackers who like to name their programs things like "\u0007") *wants*
to do this.  Real people want to name their files in some human
language they understand, and spell it in the usual way, and encode
those characters as bytes in the usual way.

Decoding those characters in the usual way and getting nonsense is the
exceptional case, and it must be the application's or user's problem
to decide what to do.  They know where they got the file from and
usually have some idea of what its name should look like.  Python
doesn't, so Python cannot solve it for them.

For that reason, I believe that Python's "normal"/high-level approach
to file handling should treat file names as (human-oriented) text.  Of
course Python should be able to handle bytes straight from the disk,
but most programmers shouldn't have to.
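
As a small illustration of the two levels, os.listdir() already works this way: a str argument gives str names, a bytes argument gives the names as bytes. The sketch below uses a scratch directory and an invented file name; the wrapper function is only there to keep the example self-contained.

```python
import os
import tempfile


def list_both_ways():
    """Create one file in a scratch directory and list it as str and bytes."""
    with tempfile.TemporaryDirectory() as scratch:
        with open(os.path.join(scratch, "notes.txt"), "w"):
            pass
        as_text = os.listdir(scratch)                # str in, str out
        as_bytes = os.listdir(os.fsencode(scratch))  # bytes in, bytes out
    return as_text, as_bytes


as_text, as_bytes = list_both_ways()
print(as_text, as_bytes)
```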

 > And people still want to say posix isn't broken in this regard?

Deal with it, bro'.







Re: [Python-Dev] Bytes path support

2014-08-21 Thread Marko Rauhamaa
Nick Coghlan :

> Python 3 says it's *our* problem to deal with on behalf of our
> developers.

<http://www.imdb.com/title/tt0120623/quotes?item=qt0353406>

Flik: I was just trying to help.

Mr. Soil: Then help us; *don't* help us.


Marko