[Python-Dev] Fwd: Accepting PEP 440: Version Identification and Dependency Specification

2014-08-26 Thread Nick Coghlan
Antoine pointed out that it would still be a good idea to forward
packaging PEP acceptance announcements to python-dev, even when the
actual acceptance happens on distutils-sig.

That makes sense to me, so here's last week's notice of the acceptance
of PEP 440, the implementation independent versioning standard derived
from pkg_resources, PEP 386, and ideas from both Linux distributions
and other open source language communities.

Regards,
Nick.

-- Forwarded message --
From: Nick Coghlan ncogh...@gmail.com
Date: 22 August 2014 22:34
Subject: Accepting PEP 440: Version Identification and Dependency Specification
To: DistUtils mailing list distutils-...@python.org


I just pushed Donald's final round of edits in response to the
feedback on the last PEP 440 thread, and as such I'm happy to announce
that I am accepting PEP 440 as the recommended approach to identifying
versions and specifying dependencies when distributing Python
software.

The PEP is available in the usual place at
http://www.python.org/dev/peps/pep-0440/

It's been a long road to get to an implementation independent
versioning standard that has a feasible migration path from the
current pkg_resources defined de facto standard, and I'd like to thank
a few folks:

* Donald Stufft for his extensive work on PEP 440 itself, especially
the proof of concept integration into pip
* Vinay Sajip for his efforts in validating earlier versions of the PEP
* Tarek Ziadé for starting us down the road to an implementation
independent versioning standard with the initial creation of PEP 386
back in June 2009, more than five years ago!

Regards,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bytes path support

2014-08-26 Thread Martin v. Löwis
On 24.08.14 03:11, Greg Ewing wrote:
> Isaac Morland wrote:
>> In HTML 5 it allows non-ASCII-compatible encodings as long as U+FEFF
>> (byte order mark) is used:
>>
>> http://www.w3.org/TR/html-markup/syntax.html#encoding-declaration
>>
>> Not sure about XML.
>
> According to Appendix F here:
>
> http://www.w3.org/TR/xml/#sec-guessing
>
> an XML parser needs to be prepared to try all the encodings it
> supports until it finds one that works well enough to decode
> the XML declaration, then it can find out the exact encoding
> used.

That's not what this section says. Instead, it says that
you need to auto-detect UCS-4, UTF-16, UTF-8 from the BOM,
or guess them or EBCDIC from the encoding of '<?'. This should
be enough to actually parse the encoding declaration. Other
non-ASCII-compatible encodings can only be used if declared
in an upper-level protocol (such as HTTP).

The parser is not expected to try out all encodings it supports.

Regards,
Martin
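A minimal Python sketch of the Appendix F procedure described above. It checks for a byte order mark first and otherwise compares the first four bytes against the encoding of '<?xm'. The function name and the family labels it returns are hypothetical and chosen for illustration. The sketch omits the unusual UCS-4 octet orders (2143 and 3412) that Appendix F also lists. It only narrows the candidates, so the exact encoding still comes from the XML declaration.

```python
import codecs

# Byte order marks. The 4-byte UCS-4 marks are checked before the 2-byte
# UTF-16 marks because FF FE 00 00 also starts with the UTF-16-LE mark.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Without a BOM: the first four bytes of '<?xml' in each encoding family.
DECLARATION_STARTS = [
    (b"\x00\x00\x00\x3c", "utf-32-be"),
    (b"\x3c\x00\x00\x00", "utf-32-le"),
    (b"\x00\x3c\x00\x3f", "utf-16-be"),
    (b"\x3c\x00\x3f\x00", "utf-16-le"),
    (b"\x3c\x3f\x78\x6d", "ascii-compatible"),  # UTF-8, ISO 8859-x, Shift-JIS, ...
    (b"\x4c\x6f\xa7\x94", "ebcdic"),
]


def guess_xml_encoding_family(data: bytes) -> str:
    """Return a coarse encoding family for an XML document's leading bytes."""
    for bom, family in BOMS:
        if data.startswith(bom):
            return family
    head = data[:4]
    for pattern, family in DECLARATION_STARTS:
        if head == pattern:
            return family
    # Appendix F treats anything else as UTF-8 without an encoding
    # declaration (or as a mislabeled or corrupt stream).
    return "utf-8"
```

Once the family is known, a parser only needs to decode far enough to read the `encoding="..."` pseudo-attribute. It never has to try every codec it supports.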



Re: [Python-Dev] Bytes path related questions for Guido

2014-08-26 Thread MRAB

On 2014-08-26 03:11, Stephen J. Turnbull wrote:
> Nick Coghlan writes:
>
>> purge_surrogate_escapes was the other term that occurred to me.
>
> "purge" suggests removal, not replacement.  That may be useful too.
>
> neutralize_surrogate_escapes(s, remove=False, replacement='\uFFFD')
>

How about:

replace_surrogate_escapes(s, replacement='\uFFFD')

If you want them removed, just pass an empty string as the replacement.

> maybe?  (Of course the remove argument is feature creep, so I'm only
> about +0.5 myself.  And the name is long, but I can't think of any
> better synonyms for "make safe" in English right now).
>
>> Either way, my use case is to filter them out when I *don't* want to
>> pass them along to other software, but would prefer the Unicode
>> replacement character to the ASCII question mark created by using the
>> "replace" filter when encoding.
>
> I think it would be preferable to be "unicodely correct" here by
> default, since this is a str -> str function.
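One possible sketch of the helper MRAB proposes. It assumes that "surrogate escapes" means the lone surrogates U+DC80 to U+DCFF, which the `surrogateescape` error handler (PEP 383) uses to carry undecodable bytes inside a `str`. `replace_surrogate_escapes` is the name suggested in this thread, not an existing standard library function.

```python
# Code points the 'surrogateescape' error handler uses for undecodable
# bytes 0x80-0xFF (byte b becomes U+DC00 + b).
_ESCAPE_RANGE = range(0xDC80, 0xDD00)


def replace_surrogate_escapes(s: str, replacement: str = "\uFFFD") -> str:
    """Return s with every surrogate escape replaced by `replacement`.

    Passing an empty string removes the escapes instead, so no separate
    `remove` flag is needed.
    """
    table = {cp: replacement for cp in _ESCAPE_RANGE}
    return s.translate(table)


# A Latin-1 encoded filename decoded as UTF-8 with surrogateescape.
name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")
print(ascii(name))                             # 'caf\udce9.txt'
print(ascii(replace_surrogate_escapes(name)))  # 'caf\ufffd.txt'
print(replace_surrogate_escapes(name, ""))     # caf.txt
# For comparison, the 'replace' error handler on encoding yields '?'.
print(name.encode("utf-8", "replace"))         # b'caf?.txt'
```

Using `str.translate` instead of `re.sub` means the replacement string is inserted literally. No backslash-escape processing is applied to it.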





Re: [Python-Dev] Bytes path support

2014-08-26 Thread R. David Murray
On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan ncogh...@gmail.com wrote:
> As some examples of where bilingual computing breaks down:
>
> * My NFS client and server may have different locale settings
> * My FTP client and server may have different locale settings
> * My SSH client and server may have different locale settings
> * I save a file locally and send it to someone with a different locale setting
> * I attempt to access a Windows share from a Linux client (or vice-versa)
> * I clone my POSIX hosted git or Mercurial repository on a Windows client
> * I have to connect my Linux client to a Windows Active Directory
> domain (or vice-versa)
> * I have to interoperate between native code and JVM code
>
> The entire computing industry is currently struggling with this
> monolingual (ASCII/Extended ASCII/EBCDIC/etc) -> bilingual (locale
> encoding/code pages) -> multilingual (Unicode) transition. It's been
> going on for decades, and it's still going to be quite some time
> before we're done.
>
> The POSIX world is slowly clawing its way towards a multilingual model
> that actually works: UTF-8
> Windows (including the CLR) and the JVM adopted a different
> multilingual model, but still one that actually works: UTF-16-LE

This kind of puts the length of the python2-python3 transition
period in perspective, doesn't it?

--David


[Python-Dev] Windows Unicode console support [Was: Bytes path support]

2014-08-26 Thread Paul Moore
On 24 August 2014 04:27, Nick Coghlan ncogh...@gmail.com wrote:
> One of those areas is the fact that we still use the old 8-bit APIs to
> interact with the Windows console. Those are just as broken in a
> multilingual world as the other Windows 8-bit APIs, so Drekin came up
> with a project to expose the Windows console as a UTF-16-LE stream
> that uses the 16-bit APIs instead:
> https://pypi.python.org/pypi/win_unicode_console
>
> I personally hope we'll be able to get the issues Drekin references
> there resolved for Python 3.5 - if other folks hope for the same
> thing, then one of the best ways to help that happen is to try out the
> win_unicode_console module and provide feedback on what does and
> doesn't work.

This looks very cool, and I plan on giving it a try. But I don't see
any issues mentioned there (unless you mean the fact that it's not
possible to hook into Python's interactive interpreter directly, but I
don't see how that could be fixed in an external module). There are no
open issues on the project's GitHub tracker.

I'd love to see this go into 3.5, so any more specific suggestions as
to what would be needed to move it forwards would be great.

Paul


Re: [Python-Dev] Bytes path support

2014-08-26 Thread Terry Reedy

On 8/26/2014 9:11 AM, R. David Murray wrote:

> On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan ncogh...@gmail.com wrote:
>
>> As some examples of where bilingual computing breaks down:
>>
>> * My NFS client and server may have different locale settings
>> * My FTP client and server may have different locale settings
>> * My SSH client and server may have different locale settings
>> * I save a file locally and send it to someone with a different locale setting
>> * I attempt to access a Windows share from a Linux client (or vice-versa)
>> * I clone my POSIX hosted git or Mercurial repository on a Windows client
>> * I have to connect my Linux client to a Windows Active Directory
>> domain (or vice-versa)
>> * I have to interoperate between native code and JVM code
>>
>> The entire computing industry is currently struggling with this
>> monolingual (ASCII/Extended ASCII/EBCDIC/etc) -> bilingual (locale
>> encoding/code pages) -> multilingual (Unicode) transition. It's been
>> going on for decades, and it's still going to be quite some time
>> before we're done.
>>
>> The POSIX world is slowly clawing its way towards a multilingual model
>> that actually works: UTF-8
>> Windows (including the CLR) and the JVM adopted a different
>> multilingual model, but still one that actually works: UTF-16-LE

Nick, I think the first half of your post is one of the clearest
expositions yet of 'why Python 3' (in particular, the str to unicode
change).  It is worthy of wider distribution and without much change, it
would be a great blog post.

> This kind of puts the length of the python2-python3 transition
> period in perspective, doesn't it?


--
Terry Jan Reedy



Re: [Python-Dev] Bytes path support

2014-08-26 Thread Nick Coghlan
On 27 Aug 2014 02:52, Terry Reedy tjre...@udel.edu wrote:
>
> On 8/26/2014 9:11 AM, R. David Murray wrote:
>>
>> On Sun, 24 Aug 2014 13:27:55 +1000, Nick Coghlan ncogh...@gmail.com wrote:
>>>
>>> As some examples of where bilingual computing breaks down:
>>>
>>> * My NFS client and server may have different locale settings
>>> * My FTP client and server may have different locale settings
>>> * My SSH client and server may have different locale settings
>>> * I save a file locally and send it to someone with a different locale setting
>>> * I attempt to access a Windows share from a Linux client (or vice-versa)
>>> * I clone my POSIX hosted git or Mercurial repository on a Windows client
>>> * I have to connect my Linux client to a Windows Active Directory
>>> domain (or vice-versa)
>>> * I have to interoperate between native code and JVM code
>>>
>>> The entire computing industry is currently struggling with this
>>> monolingual (ASCII/Extended ASCII/EBCDIC/etc) -> bilingual (locale
>>> encoding/code pages) -> multilingual (Unicode) transition. It's been
>>> going on for decades, and it's still going to be quite some time
>>> before we're done.
>>>
>>> The POSIX world is slowly clawing its way towards a multilingual model
>>> that actually works: UTF-8
>>> Windows (including the CLR) and the JVM adopted a different
>>> multilingual model, but still one that actually works: UTF-16-LE
>
> Nick, I think the first half of your post is one of the clearest
> expositions yet of 'why Python 3' (in particular, the str to unicode
> change).  It is worthy of wider distribution and without much change, it
> would be a great blog post.

Indeed, I had the same idea - I had been assuming users already understood
this context, which is almost certainly an invalid assumption.

The blog post version is already mostly written, but I ran out of weekend.
Will hopefully finish it up and post it some time in the next few days :)

>> This kind of puts the length of the python2-python3 transition
>> period in perspective, doesn't it?

I realised in writing the post that ASCII is over 50 years old at this
point, while Unicode as an official standard is more than 20. By the time
this is done, we'll likely be talking 30+ years for Unicode to displace the
confusing mess that is code pages and locale encodings :)

Cheers,
Nick.





Re: [Python-Dev] Bytes path support

2014-08-26 Thread Nikolaus Rath
Nick Coghlan ncogh...@gmail.com writes:
>>>> As some examples of where bilingual computing breaks down:
>>>>
>>>> * My NFS client and server may have different locale settings
>>>> * My FTP client and server may have different locale settings
>>>> * My SSH client and server may have different locale settings
>>>> * I save a file locally and send it to someone with a different locale setting
>>>> * I attempt to access a Windows share from a Linux client (or vice-versa)
>>>> * I clone my POSIX hosted git or Mercurial repository on a Windows client
>>>> * I have to connect my Linux client to a Windows Active Directory
>>>> domain (or vice-versa)
>>>> * I have to interoperate between native code and JVM code
>>>>
>>>> The entire computing industry is currently struggling with this
>>>> monolingual (ASCII/Extended ASCII/EBCDIC/etc) -> bilingual (locale
>>>> encoding/code pages) -> multilingual (Unicode) transition. It's been
>>>> going on for decades, and it's still going to be quite some time
>>>> before we're done.
>>>>
>>>> The POSIX world is slowly clawing its way towards a multilingual model
>>>> that actually works: UTF-8
>>>> Windows (including the CLR) and the JVM adopted a different
>>>> multilingual model, but still one that actually works: UTF-16-LE
>>
>> Nick, I think the first half of your post is one of the clearest
>> expositions yet of 'why Python 3' (in particular, the str to unicode
>> change).  It is worthy of wider distribution and without much change, it
>> would be a great blog post.
>
> Indeed, I had the same idea - I had been assuming users already understood
> this context, which is almost certainly an invalid assumption.
>
> The blog post version is already mostly written, but I ran out of weekend.
> Will hopefully finish it up and post it some time in the next few days :)

In that case, maybe it'd be nice to also explain why you use the term
"bilingual" for codepage-based encoding. At least to me, a
codepage/locale is pretty monolingual, or alternatively covers a whole
region (e.g. Western Europe). I figure that with "bilingual" you mean
ASCII + something, but that's mostly a guess on my part.


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«


Re: [Python-Dev] Bytes path support

2014-08-26 Thread Stephen J. Turnbull
Nikolaus Rath writes:

> In that case, maybe it'd be nice to also explain why you use the
> term "bilingual" for codepage based encoding.

Modern computing systems are written in languages which are invariably
based on syntax expressed using ASCII, and provide by default
functionality for expressing dates etc. suitable for rendering American
English.  Thus ASCII (i.e., American English) is always an available
language.  Code pages provide facilities for rendering one or more
languages sharing a common coded character set, but are
unsuitable for rendering most of the rest of the world's dozens of
language groups (grouping languages by common character set).

"Multilingual" has come to mean "able to express (almost) any set of
languages in a single text" (see, for example, Emacs's HELLO file),
not just "more than two".  So code pages are closer in spirit to
"bilingual" (two of many) than to "multilingual" (all of many).

It's messy, analogical terminology.  But then, natural language is
messy and analogical. <wink/>
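Stephen's distinction can be checked directly with Python's codecs. The sample strings are arbitrary illustrations. A single code page such as cp1251 covers ASCII plus the Cyrillic group but not Greek, and cp1253 covers ASCII plus Greek but not Cyrillic. UTF-8 can hold all of them in one text, much like Emacs's HELLO file. `fits_code_page` is a hypothetical helper, not a standard library function.

```python
def fits_code_page(text: str, encoding: str) -> bool:
    """Return True if every character of `text` is encodable in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


english_russian = "Hello / Привет"
english_greek = "Hello / Γεια σου"
mixed = "Hello / Привет / Γεια σου / こんにちは"

# Cyrillic code page: ASCII plus one language group.
print(fits_code_page(english_russian, "cp1251"))  # True
print(fits_code_page(english_greek, "cp1251"))    # False
# Greek code page: ASCII plus a different language group.
print(fits_code_page(english_greek, "cp1253"))    # True
print(fits_code_page(english_russian, "cp1253"))  # False
# A multilingual encoding handles all of them in a single text.
print(fits_code_page(mixed, "utf-8"))             # True
```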

