Re: Everything you did not want to know about Unicode in Python 3

2014-05-17 Thread Robert Kern


On 2014-05-17 13:07, Steven D'Aprano wrote:

On Sat, 17 May 2014 09:57:06 +0100, Robert Kern wrote:


On 2014-05-17 02:07, Steven D'Aprano wrote:

On Fri, 16 May 2014 14:46:23 +, Grant Edwards wrote:


At least in the US, there doesn't seem to be such a thing as "placing
a work into the public domain".  The copyright holder can transfer
ownership to somebody else, but there is no "public domain" to which
ownership can be transferred.


That's factually incorrect. In the US, sufficiently old works, or works
of a certain age that were not explicitly registered for copyright, are
in the public domain. Under a wide range of circumstances, works
created by the federal government go immediately into the public
domain.


There is such a thing as the public domain in the US, and there are
works in it, but there isn't really such a thing as "placing a work"
there voluntarily, as Grant says. A work either is or isn't in the
public domain. The author has no choice in the matter.


That's incorrect.

http://cr.yp.to/publicdomain.html


Thanks for the link. While it has not really changed my opinion (as discussed at 
length in my other reply), I did not know that the 9th Circuit had formalized 
the "overt act" test in their civil procedure rules, so there is at least one 
jurisdiction in the US that does currently work like this. None of the others 
do, to my knowledge, and this is the product of judicial common law, not 
statutory law, so it's still pretty shaky.


--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-17 Thread Robert Kern


On 2014-05-17 15:15, Steven D'Aprano wrote:

On Sat, 17 May 2014 10:29:00 +0100, Robert Kern wrote:


One can state many things, but that doesn't mean they have legal effect.
The US Code has provisions for how works become copyrighted
automatically, how they leave copyright automatically at the end of
specific time periods, how some works automatically enter the public
domain on their creation (i.e. works of the US federal government), but
has nothing at all for how a private creator can voluntarily place their
work into the public domain when it would otherwise not be. It used to,
but does not any more.


The case for abandonment was stated as "well settled" in 1998 (Micro-Star
v. Formgen Inc). Unless there has been a major legal change in the years
since then, I don't think it is true that authors cannot abandon
copyright.


Good old Micro-Star v. Formgen Inc. A perennial favorite. No, that case did not 
settle this question. There is a statement in the opinion that would suggest 
this, but (and this seems to be a recurring theme) its inclusion in the 
opinion did not create precedent to that effect. The statement that you refer to 
is, as far as my NAL eyes can tell, what the lawyers call "dictum": a statement 
made in a judicial opinion that is unnecessary to decide the case and therefore 
not precedential. FormGen explicitly registered the copyright to the works in 
question, and the case was decided on whether or not the 
Micro-Star-redistributed works counted as derivative works (yes). Now, if the 
case were about an author that affirmatively dedicated his work to the public 
domain and then sued someone who redistributed it, then such a statement would 
have a precedential effect (because then the judge would decide in favor of the 
defendant on the basis of that statement). The quote that you refer to 
references a previous case, which follows similar lines, and also predates the 
"automatic copyright" regime post-Berne Convention, so it's not even clear to me 
that it should have been precedential to Micro-Star.


Even if this case did so decide (which, I will grant it more or less did later 
by codifying such a rule in their jury instructions for such cases), it would 
only have effect in the 9th Circuit of the US and not even in the rest of the 
US, much less worldwide. Why bother when the CC0 gives you the desired effect 
with more assurance to your audience?



For a private individual to say about a work they just created that
"this work is in the Public Domain" is, under US law, merely an
erroneous statement of fact, not a speech act that effects a change in
the legal status of the work. For another example of this distinction,
saying "I am married" when I have not applied for, received, and
solemnified a valid marriage license is just an erroneous statement of
fact and does not make me legally married.


There may be something to what you say, although I think we're now
arguing fine semantic details.


Sure, it's the law. Fine semantic details are important. However, the difference 
between speech acts and statements of fact is a pretty gross semantic 
distinction and not just splitting semantic hairs. The act of making some 
statements (e.g. declaring that a work you own the copyright to is available 
under a given license) actually makes a change in the legal status of something. 
Most statements don't. Which ones do and don't are defined by statute and (in 
common law countries like the US) court decisions. Deciding which is which is 
often hairy, but that's an epistemological problem, not a semantic one. :-)



See:

https://en.wikipedia.org/wiki/Wikipedia:Granting_work_into_the_public_domain

To play Devil's Advocate in favour of your assertion, it may be that
abandoning copyright does not literally put the work in the public
domain, but merely makes it "quack like the public domain". That is to
say, the author still, in some abstract but legally meaningless sense,
has copyright in the work *but* has given unlimited usage rights. (I
don't actually think that is the case, at least not in the US.)

It's this tiny bit of residual uncertainty that leads some authorities to
say that it is "hard" to release a work into the public domain,
particularly in a world-wide context, and that merely stating "this is in
the public domain" is not sufficient to remove all legal doubt over the
status, and that a more overt and explicit release *may* be required.
Hence the CC0 licence which you refer to. The human readable summary says
in part:

  The person who associated a work with this deed has dedicated
  the work to the public domain by waiving all of his or her
  rights to the work worldwide under copyright law, including
  all related and neighboring rights, to the extent allowed by
  law.

  You can copy, modify, distribute and perform the work, even
  for commercial purposes, all without asking permission.

http://creativecommons.org/publicdomain/zero/1.0/

while

Re: Everything you did not want to know about Unicode in Python 3

2014-05-17 Thread Steven D'Aprano

On Sat, 17 May 2014 10:29:00 +0100, Robert Kern wrote:

> One can state many things, but that doesn't mean they have legal effect.
> The US Code has provisions for how works become copyrighted
> automatically, how they leave copyright automatically at the end of
> specific time periods, how some works automatically enter the public
> domain on their creation (i.e. works of the US federal government), but
> has nothing at all for how a private creator can voluntarily place their
> work into the public domain when it would otherwise not be. It used to,
> but does not any more.

The case for abandonment was stated as "well settled" in 1998 (Micro-Star 
v. Formgen Inc). Unless there has been a major legal change in the years 
since then, I don't think it is true that authors cannot abandon 
copyright.

> For a private individual to say about a work they just created that
> "this work is in the Public Domain" is, under US law, merely an
> erroneous statement of fact, not a speech act that effects a change in
> the legal status of the work. For another example of this distinction,
> saying "I am married" when I have not applied for, received, and
> solemnified a valid marriage license is just an erroneous statement of
> fact and does not make me legally married.

There may be something to what you say, although I think we're now 
arguing fine semantic details. See:

https://en.wikipedia.org/wiki/Wikipedia:Granting_work_into_the_public_domain

To play Devil's Advocate in favour of your assertion, it may be that 
abandoning copyright does not literally put the work in the public 
domain, but merely makes it "quack like the public domain". That is to 
say, the author still, in some abstract but legally meaningless sense, 
has copyright in the work *but* has given unlimited usage rights. (I 
don't actually think that is the case, at least not in the US.)

It's this tiny bit of residual uncertainty that leads some authorities to 
say that it is "hard" to release a work into the public domain, 
particularly in a world-wide context, and that merely stating "this is in 
the public domain" is not sufficient to remove all legal doubt over the 
status, and that a more overt and explicit release *may* be required. 
Hence the CC0 licence which you refer to. The human readable summary says 
in part:

 The person who associated a work with this deed has dedicated
 the work to the public domain by waiving all of his or her
 rights to the work worldwide under copyright law, including
 all related and neighboring rights, to the extent allowed by
 law.

 You can copy, modify, distribute and perform the work, even
 for commercial purposes, all without asking permission.

http://creativecommons.org/publicdomain/zero/1.0/

while the actual legal licence comes in at almost 800 words. This is 
basically the same as "I release this to the public domain" only longer.

(The CC0 licence is longer than you might expect, because it is assumed 
that it may have to apply in countries where you *really cannot* 
relinquish copyright. But we're specifically talking about the US, where 
the 9th Circuit says you can.)

> Relinquishing your rights can have some effect, but not all rights can
> be relinquished, 

Outside of the US, so-called "moral rights" or "reputation rights" cannot 
generally be relinquished, except perhaps in work-for-hire and perhaps 
not even then. (E.g. if you're a ghost writer.) The situation in the US 
is a bit murky -- there are no official moral rights per se, and 
copyright only controls usage rights such as copying, distribution and so 
forth. But this doesn't mean that you can (for example) claim authorship 
of a public domain work unless you actually wrote it.

In any case, we're discussing copyright, not other rights.

> and this is not the same as putting your work into the
> public domain. 

One might "not be the same" while still being "effectively the same". For 
example, the U.S. Copyright Office states that "one may not grant their 
work into the public domain. However, a copyright owner may release all 
of their rights to their work by stating the work may be freely 
reproduced, distributed, etc." as if it were in the public domain.

But note that the Copyright Office does not make the final decision 
whether you can relinquish copyright or not. That's up to the courts.

> Among other things, your heirs can sometimes reclaim
> those rights in some circumstances if you are not careful (and if they
> are valuable enough to bother reclaiming).

That's a good point. A simplistic "I release this to the public domain" 
statement *may* (I emphasise the uncertainty) leave some doubt that it is 
*sufficiently overt* to prevent your heirs from disagreeing and coming 
after your users for infringement. Then the courts have to get involved, 
and it's all ugliness and only the lawyers win.

Hence the advice to be as explicit and overt as possible.

> If you wish to do something like thi

Re: Everything you did not want to know about Unicode in Python 3

2014-05-17 Thread Steven D'Aprano

On Sat, 17 May 2014 09:57:06 +0100, Robert Kern wrote:

> On 2014-05-17 02:07, Steven D'Aprano wrote:
>> On Fri, 16 May 2014 14:46:23 +, Grant Edwards wrote:
>>
>>> At least in the US, there doesn't seem to be such a thing as "placing
>>> a work into the public domain".  The copyright holder can transfer
>>> ownership to somebody else, but there is no "public domain" to which
>>> ownership can be transferred.
>>
>> That's factually incorrect. In the US, sufficiently old works, or works
>> of a certain age that were not explicitly registered for copyright, are
>> in the public domain. Under a wide range of circumstances, works
>> created by the federal government go immediately into the public
>> domain.
> 
> There is such a thing as the public domain in the US, and there are
> works in it, but there isn't really such a thing as "placing a work"
> there voluntarily, as Grant says. A work either is or isn't in the
> public domain. The author has no choice in the matter.

That's incorrect.

http://cr.yp.to/publicdomain.html

Here's the money quote, from the 9th Circuit Court:

It is well settled that rights gained under the Copyright Act 
may be abandoned. But abandonment of a right must be manifested
by some overt act indicating an intention to abandon that right.


There's also this:

http://creativecommons.org/publicdomain/zero/1.0/

which counts as an overt act.


By the way, there's more info on US copyright terms here:

http://copyright.cornell.edu/resources/publicdomain.cfm

although it doesn't specifically mention voluntary abandonment of 
copyright.



-- 
Steven D'Aprano
http://import-that.dreamwidth.org/
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-17 Thread Ben Finney

Chris Angelico  writes:

> On Sat, May 17, 2014 at 6:57 PM, Robert Kern  wrote:
> > There is such a thing as the public domain in the US, and there are works in
> > it, but there isn't really such a thing as "placing a work" there
> > voluntarily, as Grant says. A work either is or isn't in the public domain.
> > The author has no choice in the matter.
>
> Then what's copyright status on PEPs?

My guess: They are in the default copyright status, with all rights
reserved (i.e. everything that copyright law restricts is forbidden to
the recipient).

But, if any of those copyright holders were ever to assert their
copyright had been infringed by some recipient, the “this work is in the
public domain” or equivalent would be taken as a clear indication of the
*intent* of the copyright holder.

Ultimately, what matters is the determination of whatever judge you find
yourself facing. To that end, clarifying in the copyright statement and
license terms exactly what is permitted can be immensely helpful in
foreshortening and, ideally, avoiding a future copyright suit.

Copyright is a ridiculous burden on everyone — to the extent that even
those copyright holders who don't *want* those rights which the law
reserves to the copyright holder, and want to divest themselves of the
role of copyright holder, find it frustratingly difficult to do so
effectively across jurisdictions.

-- 
 \  “Computer perspective on Moore's Law: Human effort becomes |
  `\   twice as expensive roughly every two years.” —anonymous |
_o__)  |
Ben Finney

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-17 Thread Robert Kern


On 2014-05-17 05:19, Marko Rauhamaa wrote:

Steven D'Aprano :


On Fri, 16 May 2014 14:46:23 +, Grant Edwards wrote:


At least in the US, there doesn't seem to be such a thing as "placing
a work into the public domain". The copyright holder can transfer
ownership to somebody else, but there is no "public domain" to which
ownership can be transferred.


That's factually incorrect. In the US, sufficiently old works, or works
of a certain age that were not explicitly registered for copyright, are
in the public domain. Under a wide range of circumstances, works created
by the federal government go immediately into the public domain.


Steven, you're not disputing Grant. I am. The sole copyright holder can
simply state: "this work is in the Public Domain," or: "all rights
relinquished," or some such. Ultimately, everything is decided by the
courts, of course.


One can state many things, but that doesn't mean they have legal effect. The US 
Code has provisions for how works become copyrighted automatically, how they 
leave copyright automatically at the end of specific time periods, how some 
works automatically enter the public domain on their creation (i.e. works of the 
US federal government), but has nothing at all for how a private creator can 
voluntarily place their work into the public domain when it would otherwise not 
be. It used to, but does not any more.


For a private individual to say about a work they just created that "this work 
is in the Public Domain" is, under US law, merely an erroneous statement of 
fact, not a speech act that effects a change in the legal status of the work. 
For another example of this distinction, saying "I am married" when I have not 
applied for, received, and solemnified a valid marriage license is just an 
erroneous statement of fact and does not make me legally married.


Relinquishing your rights can have some effect, but not all rights can be 
relinquished, and this is not the same as putting your work into the public 
domain. Among other things, your heirs can sometimes reclaim those rights in 
some circumstances if you are not careful (and if they are valuable enough to 
bother reclaiming).


If you wish to do something like this, I highly recommend (though IANAL and 
TINLA) using the CC0 Waiver from Creative Commons. It has thorough legalese for 
relinquishing all the rights that one can relinquish for the maximum terms that 
one can do so in as many jurisdictions as possible and acts as a license to 
use/distribute/etc. without restriction even if some rights cannot be 
relinquished. Even if US law were to change to provide for dedicating works to 
the public domain, I would probably still use the CC0 anyways to account for the 
high variability in how different jurisdictions around the world treat their own 
public domains.


  http://creativecommons.org/about/cc0
  http://wiki.creativecommons.org/CC0_FAQ

Note how they distinguish the CC0 Waiver from their Public Domain Mark: the 
Public Domain Mark is just a label for things that are known to be free of 
copyright worldwide but does not make a work so. The CC0 *does* have an 
operative effect that is substantially similar to the work being in the public 
domain.


--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-17 Thread Chris Angelico

On Sat, May 17, 2014 at 6:57 PM, Robert Kern  wrote:
> There is such a thing as the public domain in the US, and there are works in
> it, but there isn't really such a thing as "placing a work" there
> voluntarily, as Grant says. A work either is or isn't in the public domain.
> The author has no choice in the matter.

Then what's copyright status on PEPs?

The nearest thing to "assigning to public domain" that works across
jurisdictions is probably CC0:

http://creativecommons.org/about/cc0

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-17 Thread Robert Kern


On 2014-05-17 02:07, Steven D'Aprano wrote:

On Fri, 16 May 2014 14:46:23 +, Grant Edwards wrote:


At least in the US, there doesn't seem to be such a thing as "placing a
work into the public domain".  The copyright holder can transfer
ownership to somebody else, but there is no "public domain" to which
ownership can be transferred.


That's factually incorrect. In the US, sufficiently old works, or works
of a certain age that were not explicitly registered for copyright, are
in the public domain. Under a wide range of circumstances, works created
by the federal government go immediately into the public domain.


There is such a thing as the public domain in the US, and there are works in it, 
but there isn't really such a thing as "placing a work" there voluntarily, as 
Grant says. A work either is or isn't in the public domain. The author has no 
choice in the matter.


--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-17 Thread Mark Lawrence


On 17/05/2014 05:19, Marko Rauhamaa wrote:


The sole copyright holder can
simply state: "this work is in the Public Domain," or: "all rights
relinquished," or some such. Ultimately, everything is decided by the
courts, of course.



For examples see all the Python PEPs.

--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence


--
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-16 Thread Marko Rauhamaa

Steven D'Aprano :

> On Fri, 16 May 2014 14:46:23 +, Grant Edwards wrote:
>
>> At least in the US, there doesn't seem to be such a thing as "placing
>> a work into the public domain". The copyright holder can transfer
>> ownership to somebody else, but there is no "public domain" to which
>> ownership can be transferred.
>
> That's factually incorrect. In the US, sufficiently old works, or works 
> of a certain age that were not explicitly registered for copyright, are 
> in the public domain. Under a wide range of circumstances, works created 
> by the federal government go immediately into the public domain.

Steven, you're not disputing Grant. I am. The sole copyright holder can
simply state: "this work is in the Public Domain," or: "all rights
relinquished," or some such. Ultimately, everything is decided by the
courts, of course.


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-16 Thread Steven D'Aprano

On Fri, 16 May 2014 14:46:23 +, Grant Edwards wrote:

> At least in the US, there doesn't seem to be such a thing as "placing a
> work into the public domain".  The copyright holder can transfer
> ownership to somebody else, but there is no "public domain" to which
> ownership can be transferred.

That's factually incorrect. In the US, sufficiently old works, or works 
of a certain age that were not explicitly registered for copyright, are 
in the public domain. Under a wide range of circumstances, works created 
by the federal government go immediately into the public domain.

It is true that under the Mickey Mouse Copyright Grab Act[1] of , every time Mickey Mouse is about to reach the end of 
copyright, Congress retroactively extends copyright terms for another few 
decades, but that's another story.

[1] Not the real name of the act.

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-16 Thread Grant Edwards

On 2014-05-14, alister  wrote:
> On Wed, 14 May 2014 10:08:57 +1000, Chris Angelico wrote:
>
>> On Wed, May 14, 2014 at 9:53 AM, Steven D'Aprano
>>  wrote:
>>> With the current system, all of us here are technically violating
>>> copyright every time we reply to an email and quote more than a small
>>> percentage of it.
>> 
>> Oh wow... so when someone quotes heaps of text without trimming, and
>> adding blank lines, we can complain that it's a copyright violation -
>> reproducing our work with unauthorized modifications and without
>> permission...
>> 
>> I never thought of it like that.

> I think I could make a very strong case that anything sent to a public 
> forum with the intention of being broadcast has been placed into the 
> public domain by this action.

At least in the US, there doesn't seem to be such a thing as "placing
a work into the public domain".  The copyright holder can transfer
ownership to somebody else, but there is no "public domain" to which
ownership can be transferred.  IIRC, there is a way under German
copyright law to release certain rights.  The mere act of widely
distributing something does not in any way relinquish
copyrights.

-- 
Grant Edwards               grant.b.edwards        Yow! Am I elected yet?
                                  at
                              gmail.com
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-16 Thread wxjmfauth

On Friday, 16 May 2014 13:50:47 UTC+2, Antoine Pitrou wrote:
> Terry Reedy  udel.edu> writes:
> > 
> > On 5/13/2014 8:53 PM, Ethan Furman wrote:
> > > On 05/13/2014 05:10 PM, Steven D'Aprano wrote:
> > >> On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:
> > >>
> > >>> Because Python 3 presents stdin and stdout as text streams however, it
> > >>> makes them more difficult to use with binary data, which is why Armin
> > >>> sets up all that extra code to make sure his file objects are binary.
> > >>
> > >> What surprises me is how hard that is. Surely there's a simpler way to
> > >> open stdin and stdout in binary mode? If not, there ought to be.
> > >
> > > Somebody already posted this:
> > >
> > > https://docs.python.org/3/library/sys.html#sys.stdin
> > >
> > > which talks about .detach().
> > 
> > I sent a message to Armin about this.
> 
> And the documentation has now been fixed:
> http://bugs.python.org/issue21364
> 
> So something *can* come out of a python-list rantfest, it seems.
> 
> Regards
> 
> Antoine.

==

http://www.unicode.org/

With my best regards.

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-16 Thread Antoine Pitrou

Terry Reedy  udel.edu> writes:
> 
> On 5/13/2014 8:53 PM, Ethan Furman wrote:
> > On 05/13/2014 05:10 PM, Steven D'Aprano wrote:
> >> On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:
> >>
> >>> Because Python 3 presents stdin and stdout as text streams however, it
> >>> makes them more difficult to use with binary data, which is why Armin
> >>> sets up all that extra code to make sure his file objects are binary.
> >>
> >> What surprises me is how hard that is. Surely there's a simpler way to
> >> open stdin and stdout in binary mode? If not, there ought to be.
> >
> > Somebody already posted this:
> >
> > https://docs.python.org/3/library/sys.html#sys.stdin
> >
> > which talks about .detach().
> 
> I sent a message to Armin about this.

And the documentation has now been fixed:
http://bugs.python.org/issue21364

So something *can* come out of a python-list rantfest, it seems.

Regards

Antoine.


-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-14 Thread Terry Reedy


On 5/13/2014 8:53 PM, Ethan Furman wrote:

On 05/13/2014 05:10 PM, Steven D'Aprano wrote:

On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:


Because Python 3 presents stdin and stdout as text streams however, it
makes them more difficult to use with binary data, which is why Armin
sets up all that extra code to make sure his file objects are binary.


What surprises me is how hard that is. Surely there's a simpler way to
open stdin and stdout in binary mode? If not, there ought to be.


Somebody already posted this:

https://docs.python.org/3/library/sys.html#sys.stdin

which talks about .detach().


I sent a message to Armin about this.
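
For readers following along, here is a minimal sketch of the binary-mode
approach that the linked sys.stdin documentation describes. It assumes
the standard text wrappers are in place, so that sys.stdin.buffer and
sys.stdout.buffer expose the underlying binary streams. The helper names
copy_binary and main, and the chunk size, are my own choices for
illustration and do not come from Armin's code.

```python
import io
import sys


def copy_binary(src, dst, chunk_size=64 * 1024):
    """Copy bytes from src to dst with no decoding step; return the byte count."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    dst.flush()
    return total


def main():
    # sys.stdin.buffer and sys.stdout.buffer are the binary streams underneath
    # the text wrappers. sys.stdin.detach() returns the same underlying buffer
    # but leaves sys.stdin itself unusable afterwards.
    copy_binary(sys.stdin.buffer, sys.stdout.buffer)
```

Calling main() from a script gives a cat-like filter that passes arbitrary
bytes through untouched, whatever the locale encoding happens to be.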

--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-14 Thread Ian Kelly

On Wed, May 14, 2014 at 9:30 AM, Robin Becker  wrote:
> Doesn't this issue also come up wherever bytes are being read, i.e. in sockets,
> pipe file handles, etc.? Some sources may have well-defined encodings and so
> allow use of unicode strings, but surely not all. I imagine all of the
> problems associated with a broken encoding promise for stdin can also occur
> with sockets and other sources, e.g. error messages failing to be printable.
> Since bytes in Python 3 are not equivalent to the old str (Python 3
> bytes != Python 2 str), using bytes everywhere has its own problems.

Sockets send and receive bytes, and pipes created by the subprocess
module are opened in binary mode.  Pipes inherited as stdin are still
assumed to be unicode, though.
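
A small sketch of the subprocess half of that claim, assuming a Python 3
interpreter is available as sys.executable. The child command and the
name read_child_output are hypothetical, made up for this illustration.

```python
import subprocess
import sys

# Hypothetical child process: writes three raw bytes, two of which are not
# valid UTF-8, straight to its binary stdout.
CHILD_CODE = "import sys; sys.stdout.buffer.write(bytes([0xff, 0xfe, 0x41]))"


def read_child_output():
    """Return the child's stdout exactly as the pipe delivered it."""
    # check_output() reads the pipe in binary mode by default, so the result
    # is a bytes object and no decoding is attempted.
    return subprocess.check_output([sys.executable, "-c", CHILD_CODE])


raw = read_child_output()
```

Because nothing is decoded, the invalid bytes arrive intact; it is only
when text mode is requested that an encoding has to be assumed.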
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-14 Thread Robin Becker


On 13/05/2014 17:08, Ian Kelly wrote:
.


And since it's so simple, it shouldn't be hard to see that the use of
the shutil module has nothing to do with the Unicode woes here.  The
crux of the issue is that a general-purpose command like cat typically
can't know the encoding of its input and can't assume anything about
it. In fact, there may not even be an encoding; cat can be used with
binary data.  The only non-destructive approach then is to copy the
binary data straight from the source to the destination with no
decoding steps at all, and trust the user to ensure that the
destination will be able to accommodate the source encoding.  Because
Python 3 presents stdin and stdout as text streams however, it makes
them more difficult to use with binary data, which is why Armin sets
up all that extra code to make sure his file objects are binary.

Doesn't this issue also come up wherever bytes are being read, i.e. in sockets, 
pipe file handles, etc.? Some sources may have well-defined encodings and so allow 
use of unicode strings, but surely not all. I imagine all of the problems 
associated with a broken encoding promise for stdin can also occur with sockets 
and other sources, e.g. error messages failing to be printable. Since bytes 
in Python 3 are not equivalent to the old str (Python 3 bytes != Python 2 str), 
using bytes everywhere has its own problems.
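
To make the concern concrete, here is a rough sketch of what happens when
the encoding promise is broken, along with two standard error handlers
that can soften it. The sample bytes are invented for the example, and
whether these handlers suit a given program is a judgment call, not a
recommendation.

```python
# Invented sample: Latin-1-style bytes that are not valid UTF-8.
raw = b"caf\xe9 \xff"

# A strict decode fails as soon as it meets the first invalid byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles undecodable bytes through as lone surrogates,
# so encoding with the same handler restores the original bytes.
text = raw.decode("utf-8", errors="surrogateescape")
round_tripped = text.encode("utf-8", errors="surrogateescape")

# backslashreplace (for decoding, Python 3.5+) yields a string that is
# always safe to print, at the cost of not being reversible.
printable = raw.decode("utf-8", errors="backslashreplace")

# Python 3 bytes are not the old str: indexing gives an int, not a
# one-character string.
first_item = raw[0]
```

The same handlers apply to data read from sockets or pipes, since the
decoding step is identical once the bytes are in hand.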

--
Robin Becker

--
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-14 Thread Ian Kelly

On May 13, 2014 6:10 PM, "Chris Angelico"  wrote:
>
> On Wed, May 14, 2014 at 9:53 AM, Steven D'Aprano
>  wrote:
> > With the current system, all of us here are technically violating
> > copyright every time we reply to an email and quote more than a small
> > percentage of it.
>
> Oh wow... so when someone quotes heaps of text without trimming, and
> adding blank lines, we can complain that it's a copyright violation -
> reproducing our work with unauthorized modifications and without
> permission...
>
> I never thought of it like that.

I'd be surprised if this doesn't fall under fair use.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-14 Thread Dave Angel


On 05/13/2014 09:39 AM, Steven D'Aprano wrote:

On Tue, 13 May 2014 07:20:34 -0400, Roy Smith wrote:


ASCII *is* all I need.


You've never needed to copyright something? Copyright © Roy Smith 2014...
I know some people use (c) instead, but that actually has no legal
standing. (Not that any reasonable judge would invalidate a copyright
based on a technicality like that, not these days.)



(c) has no standing whatsoever, as it's properly spelled (copr)


--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-14 Thread Chris Angelico

On Wed, May 14, 2014 at 10:42 PM, alister
 wrote:
> On Wed, 14 May 2014 10:08:57 +1000, Chris Angelico wrote:
>
>> On Wed, May 14, 2014 at 9:53 AM, Steven D'Aprano
>>  wrote:
>>> With the current system, all of us here are technically violating
>>> copyright every time we reply to an email and quote more than a small
>>> percentage of it.
>>
>> Oh wow... so when someone quotes heaps of text without trimming, and
>> adding blank lines, we can complain that it's a copyright violation -
>> reproducing our work with unauthorized modifications and without
>> permission...
>>
>> I never thought of it like that.
>>
>> ChrisA
>
> I think I could make a very strong case that anything sent to a public
> forum with the intention of being broadcast has been placed into the
> public domain by this action.

I don't think so. One can reasonably assume that anything sent to a
public forum is permissible to read, and to copy verbatim (although
there may be "presumed limits" on the copying, but probably not with
python-list). But if I quote your text and edit it, then you would
rightly complain, which is not the case with public domain text. The
question is whether or not it's fair to try to scare people with that
when they repeatedly use buggy software that inserts blank lines
everywhere :)

In case it's not obvious, I am NOT seriously contemplating pursuing
anything like this legally. It's just funny to contemplate.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-14 Thread alister

On Wed, 14 May 2014 10:08:57 +1000, Chris Angelico wrote:

> On Wed, May 14, 2014 at 9:53 AM, Steven D'Aprano
>  wrote:
>> With the current system, all of us here are technically violating
>> copyright every time we reply to an email and quote more than a small
>> percentage of it.
> 
> Oh wow... so when someone quotes heaps of text without trimming, and
> adding blank lines, we can complain that it's a copyright violation -
> reproducing our work with unauthorized modifications and without
> permission...
> 
> I never thought of it like that.
> 
> ChrisA

I think I could make a very strong case that anything sent to a public 
forum with the intention of being broadcast has been placed into the 
public domain by this action.
  



-- 
Work expands to fill the time available.
-- Cyril Northcote Parkinson, "The Economist", 1955
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-14 Thread alister

On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:

> On Tue, May 13, 2014 at 5:19 AM, alister
>  wrote:
>> I am only an amateur python coder which is why I asked if I am missing
>> something
>>
>> I could not see any reason to be using the shutil module if all that
>> the program is doing is opening a file, reading it & then printing it.
>>
>> is it python that causes the issue, the shutil module or just the OS
>> not liking the data it is being sent?
>>
>> an explanation of why this approach is taken would be much appreciated.
> 
> No, that part is perfectly fine.  This is exactly what the shutil module
> is meant for: providing shell-like operations.  Although in this case
> the copyfileobj function is quite simple (have yourself a look at the
> source -- it just reads from one file and writes to the other in a
> loop), in general the Pythonic thing is to avoid reinventing the wheel.
> 
> And since it's so simple, it shouldn't be hard to see that the use of
> the shutil module has nothing to do with the Unicode woes here.  The
> crux of the issue is that a general-purpose command like cat typically
> can't know the encoding of its input and can't assume anything about it.
> In fact, there may not even be an encoding; cat can be used with binary
> data.  The only non-destructive approach then is to copy the binary data
> straight from the source to the destination with no decoding steps at
> all, and trust the user to ensure that the destination will be able to
> accommodate the source encoding.  Because Python 3 presents stdin and
> stdout as text streams however, it makes them more difficult to use with
> binary data, which is why Armin sets up all that extra code to make sure
> his file objects are binary.

I think I understand that,
in which case I owe Armin an apology; this certainly sounds like a
failing in Python's handling of stdout.



-- 
Get it up, keep it up... LINUX: Viagra for the PC.
   
   -- Chris Abbey
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-14 Thread wxjmfauth

On Tuesday, 13 May 2014 10:08:45 UTC+2, Johannes Bauer wrote:
> On 13.05.2014 03:18, Steven D'Aprano wrote:
>
> > Armin Ronacher is an extremely experienced and knowledgeable Python
> > developer, and a Python core developer. He might be wrong, but he's not
> > *obviously* wrong.
>
> He's correct about file name encodings. Which can be fixed really easily
> without messing everything up (sys.argv binary variant, open accepting
> binary filenames). But that he suggests that Go would be superior:
>
> > Which uses an even simpler model than Python 2: everything is a byte
> > string. The assumed encoding is UTF-8. End of the story.
>
> Is just a horrible idea. An obviously horrible idea, too.
>
> Having dealt with the UTF-8 problems on Python2 I can safely say that I
> never, never ever want to go back to that freaky hell. If I deal with
> strings, I want to be able to sanely manipulate them and I want to be
> sure that after manipulation they're still valid strings. Manipulating
> the bytes representation of unicode data just doesn't work.
>
> And I'm very very glad that some people felt the same way and
> implemented a sane, consistent way of dealing with Unicode in Python3.
> It's one of the reasons why I switched to Py3 very early and I love it.
>
> Cheers,
> Johannes
>
> --
> >> Where EXACTLY did you predict the quake again?
> > At least not publicly!
> Ah, the latest and, to this day, most ingenious stunt of our great
> cosmologists: the secret prediction.
>  - Karl Kaos on Rüdiger Thomas in dsa

===

A Rob 'Commander' Pike will never put utf16 and
ebcdic in the same basket, when discussing coding
of characters.

jmf

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Ethan Furman


On 05/13/2014 05:10 PM, Steven D'Aprano wrote:

On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:


Because Python 3 presents stdin and stdout as text streams however, it
makes them more difficult to use with binary data, which is why Armin
sets up all that extra code to make sure his file objects are binary.


What surprises me is how hard that is. Surely there's a simpler way to
open stdin and stdout in binary mode? If not, there ought to be.


Somebody already posted this:

https://docs.python.org/3/library/sys.html#sys.stdin

which talks about .detach().
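
For what it's worth, here is a rough sketch of the two routes that page
describes. It uses an in-memory stream as a stand-in for stdout so it is safe
to run anywhere; the names raw, text and binary are made up for illustration.

```python
import io

# Stand-in for sys.stdout: a text wrapper layered over a binary buffer.
# newline="\n" turns off newline translation so the bytes are predictable.
raw = io.BytesIO()
text = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

# Route 1: reach through to the underlying binary layer via .buffer.
text.write("caf\u00e9\n")
text.flush()                      # push pending text down before writing bytes
text.buffer.write(b"\xff\xfe raw bytes\n")

# Route 2: detach() hands back the binary layer; the wrapper is unusable after.
binary = text.detach()
binary.write(b"\x00\x01\x02")

result = raw.getvalue()
```

With the real streams, the first route would be sys.stdout.buffer.write(...),
and the second would be sys.stdout.detach(), after which print() can no longer
use that stream.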

--
~Ethan~
--
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Steven D'Aprano

On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:

> Because Python 3 presents stdin and stdout as text streams however, it
> makes them more difficult to use with binary data, which is why Armin
> sets up all that extra code to make sure his file objects are binary.

What surprises me is how hard that is. Surely there's a simpler way to 
open stdin and stdout in binary mode? If not, there ought to be.




-- 
Steven D'Aprano
http://import-that.dreamwidth.org/
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Chris Angelico

On Wed, May 14, 2014 at 9:53 AM, Steven D'Aprano
 wrote:
> With the current system, all of us here are technically violating
> copyright every time we reply to an email and quote more than a small
> percentage of it.

Oh wow... so when someone quotes heaps of text without trimming, and
adding blank lines, we can complain that it's a copyright violation -
reproducing our work with unauthorized modifications and without
permission...

I never thought of it like that.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Steven D'Aprano

On Tue, 13 May 2014 14:42:51 +, alister wrote:

> On Tue, 13 May 2014 13:51:20 +, Grant Edwards wrote:
> 
>> On 2014-05-13, Steven D'Aprano 
>> wrote:
>>> On Tue, 13 May 2014 07:20:34 -0400, Roy Smith wrote:
>>>
>>>> ASCII *is* all I need.
>>>
>>> You've never needed to copyright something? Copyright © Roy Smith
>>> 2014...
>> 
>> Bah.  You don't need the little copyright symbol at all.  The statement
>> without the symbol has the exact same legal weight.
> 
> 
> You do not need any statements at all, copyright is automatically assigned
> to anything you create (at least that is the case in UK Law) although
> proving the creation date may be difficult.

(1) In my lifetime, that wasn't always the case. Up until the 1970s or 
thereabouts, you had to explicitly register anything you wanted 
copyrighted, a much more sensible system which weeded out the meaningless 
copyrights on economically worthless content. If we still had that 
system, orphan works would be a lesser problem.

With the current system, all of us here are technically violating 
copyright every time we reply to an email and quote more than a small 
percentage of it. Not to mention all the mirror sites that violate 
copyright by mirroring our posts in their entirety without permission.

(Author's moral rights not to be misquoted or plagiarised are a different 
kettle of fish separate from their ownership rights over the work. That 
should be automatic.)

(2) You don't have to just prove copyright. You also have to *identify* 
who the work is copyrighted by, and it needs to be an identifiable legal 
person (actual person or corporation), not necessarily the author. In the 
absence of a statement otherwise, copyright is assumed to be held by the 
author, but that's not always the case -- it might be a work for hire, or 
copyright might have been transferred to another person or entity. Or the 
author is unidentifiable. Hence the orphan work problem: it's presumed to 
be copyrighted, but since nobody knows who owns the copyright, there's no 
way to get permission to copy that work. It might as well be lost, even 
when the original is sitting right there in front of you mouldering away.

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Ian Kelly

On Tue, May 13, 2014 at 5:19 AM, alister
 wrote:
> I am only an amateur python coder which is why I asked if I am missing
> something
>
> I could not see any reason to be using the shutil module if all that the
> program is doing is opening a file, reading it & then printing it.
>
> is it python that causes the issue, the shutil module or just the OS not
> liking the data it is being sent?
>
> an explanation of why this approach is taken would be much appreciated.

No, that part is perfectly fine.  This is exactly what the shutil
module is meant for: providing shell-like operations.  Although in
this case the copyfileobj function is quite simple (have yourself a
look at the source -- it just reads from one file and writes to the
other in a loop), in general the Pythonic thing is to avoid
reinventing the wheel.

And since it's so simple, it shouldn't be hard to see that the use of
the shutil module has nothing to do with the Unicode woes here.  The
crux of the issue is that a general-purpose command like cat typically
can't know the encoding of its input and can't assume anything about
it. In fact, there may not even be an encoding; cat can be used with
binary data.  The only non-destructive approach then is to copy the
binary data straight from the source to the destination with no
decoding steps at all, and trust the user to ensure that the
destination will be able to accommodate the source encoding.  Because
Python 3 presents stdin and stdout as text streams however, it makes
them more difficult to use with binary data, which is why Armin sets
up all that extra code to make sure his file objects are binary.
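
A minimal sketch of that byte-for-byte copy, assuming nothing beyond the
standard library. This is not Armin's code; the cat() helper and the sample
payload are invented for illustration.

```python
import io
import shutil


def cat(src, dst):
    """Copy bytes from src to dst with no decoding step at all."""
    shutil.copyfileobj(src, dst)


# Bytes that are not valid UTF-8; a text-mode copy would choke on or mangle them.
payload = b"caf\xe9 \xff\xfe\x00 binary\n"
source = io.BytesIO(payload)
sink = io.BytesIO()
cat(source, sink)
copied = sink.getvalue()

# For a real filter, the binary layers of the standard streams would be used:
#     cat(sys.stdin.buffer, sys.stdout.buffer)
```

Because nothing is ever decoded, the copy cannot fail on "bad" input; whether
the destination can display the result is left to the user.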
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Grant Edwards

On 2014-05-13, alister  wrote:
> On Tue, 13 May 2014 13:51:20 +, Grant Edwards wrote:
>
>> On 2014-05-13, Steven D'Aprano 
>> wrote:
>>> On Tue, 13 May 2014 07:20:34 -0400, Roy Smith wrote:
>>>
>>>> ASCII *is* all I need.
>>>
>>> You've never needed to copyright something? Copyright © Roy Smith
>>> 2014...
>> 
>> Bah.  You don't need the little copyright symbol at all.  The statement
>> without the symbol has the exact same legal weight.
>
> You do not need any statements at all, copyright is automatically assigned
> to anything you create (at least that is the case in UK Law)
> although proving the creation date may be difficult.

Yep, it's the same in the US.

-- 
Grant Edwards               grant.b.edwards at gmail.com
Yow! Hello.  Just walk along and try NOT to think about your
INTESTINES being almost FORTY YARDS LONG!!
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread alister

On Tue, 13 May 2014 13:51:20 +, Grant Edwards wrote:

> On 2014-05-13, Steven D'Aprano 
> wrote:
>> On Tue, 13 May 2014 07:20:34 -0400, Roy Smith wrote:
>>
>>> ASCII *is* all I need.
>>
>> You've never needed to copyright something? Copyright © Roy Smith
>> 2014...
> 
> Bah.  You don't need the little copyright symbol at all.  The statement
> without the symbol has the exact same legal weight.


You do not need any statements at all, copyright is automatically assigned
to anything you create (at least that is the case in UK Law) although
proving the creation date may be difficult.



-- 
Depends on how you define "always".  :-)
 -- Larry Wall in <199710211647.jaa17...@wall.org>
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Chris Angelico

On Wed, May 14, 2014 at 12:30 AM, Rustom Mody  wrote:
> Come to think of it why have anything other than zeros and ones?

Obligatory: http://xkcd.com/257/

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Rustom Mody

On Tuesday, May 13, 2014 7:13:47 PM UTC+5:30, Chris Angelico wrote:
> On Tue, May 13, 2014 at 11:39 PM, Steven D'Aprano
> > Or price something in cents? I suppose the days of the 25¢ steak dinner
> > are long gone, but you might need to sell something for 99¢ a pound...
> 
> 
> $0.99/lb? :)

Dollars Zeros Slashes Question marks Smileys...
Just alphabets is enough I think...

Come to think of it why have anything other than zeros and ones?
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Skip Montanaro

On Tue, May 13, 2014 at 3:38 AM, Chris Angelico  wrote:
>> Python 2's ambiguity allows me not to answer the tough philosophical
>> questions. I'm not saying it's necessarily a good thing, but it has its
>> benefits.
>
> It's not a good thing. It means that you have the convenience of
> pretending there's no problem, which means you don't notice trouble
> until something happens... and then, in all probability, your app is
> in production and you have no idea why stuff went wrong.

BITD, when I still maintained and developed Musi-Cal (an early online
concert calendar, long since gone), I faced a challenge when I first
started encountering non-ASCII band names and cities. I resisted UTF-8.
After all, if I printed a string containing an "é", it came out looking like

What kind of mess was that???

I tried to ignore it, or assume Latin-1 would cover all the bases (my first
non-ASCII inputs tended to come from Western Europe). If nothing else, at
least "é" was legible.

Needless to say, those approaches didn't work well. After perhaps six
months or a year, I broke down and started converting everything coming in
or going out to UTF-8 at the boundaries of my system (making educated
guesses at input encodings if necessary). My life got a whole lot easier
after that. The distinction between bytes and text didn't really matter
much, certainly not compared to the mess I had before where strings of
unknown data leaked into my system and its database.
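
In today's Python 3 terms, the "convert at the boundaries" idea might look
roughly like the sketch below. This is not the Musi-Cal code; to_text(),
to_wire() and the list of guessed encodings are all hypothetical.

```python
def to_text(data, guesses=("utf-8", "latin-1")):
    """Decode incoming bytes, trying each guessed encoding in order."""
    for encoding in guesses:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin-1 accepts every byte value, so this is reached only if the
    # caller supplies a guess list without such a catch-all.
    return data.decode("utf-8", errors="replace")


def to_wire(text):
    """Encode outgoing text as UTF-8."""
    return text.encode("utf-8")


# The same city name arriving in two different encodings.
from_utf8 = to_text("Orl\u00e9ans".encode("utf-8"))
from_latin1 = to_text("Orl\u00e9ans".encode("latin-1"))
stored = to_wire(from_latin1)
```

Both inputs end up as the same text inside the system, and everything going
out has a single known encoding.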

Skip

P.S. My apologies for the mess this message probably is. Amazing as it may
seem, Gmail in Chrome does a crappy job editing anything other than plain
text. Also, I'm surprised in this day and age that common tools like Gnome
Terminal have little or no encoding support. I wound up having to pop up
urxvt to get an encodings-flexible terminal emulator...
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Grant Edwards

On 2014-05-13, Steven D'Aprano  wrote:
> On Tue, 13 May 2014 07:20:34 -0400, Roy Smith wrote:
>
>> ASCII *is* all I need.
>
> You've never needed to copyright something? Copyright © Roy Smith 2014...

Bah.  You don't need the little copyright symbol at all.  The
statement without the symbol has the exact same legal weight.

-- 
Grant Edwards               grant.b.edwards at gmail.com
Yow! World War Three can be averted by adherence to a strictly
enforced dress code!
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Grant Edwards

On 2014-05-13, Chris Angelico  wrote:
> On Tue, May 13, 2014 at 4:03 PM, Ben Finney  wrote:
>> (It's always a good day to remind people that the rest of the world
>> exists.)
>
> Ironic that this should come up in a discussion on Unicode, given that
> Unicode's fundamental purpose is to welcome that whole rest of the
> world instead of yelling "LALALALALA America is everything" and
> pretending that ASCII, or Latin-1, or something, is all you need.

Well, strictly speaking, ASCII or Latin-1 _is_ all I need.

I will however admit to the existence of other people who might need
something else...

-- 
Grant Edwards               grant.b.edwards at gmail.com
Yow! How many retired bricklayers from FLORIDA are out purchasing
PENCIL SHARPENERS right NOW??
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Chris Angelico

On Tue, May 13, 2014 at 11:39 PM, Steven D'Aprano
 wrote:
> You've never needed to copyright something? Copyright © Roy Smith 2014...
> I know some people use (c) instead, but that actually has no legal
> standing. (Not that any reasonable judge would invalidate a copyright
> based on a technicality like that, not these days.)

Copyright Chris Angelico 2014. The full word "copyright" has legal
standing. I tend to stick with that in my README files; staying ASCII
makes it that bit safer for random text editors
(*cough*Notepad*cough*) that might otherwise misinterpret it (only a
bit, though [1]).

> Or price something in cents? I suppose the days of the 25¢ steak dinner
> are long gone, but you might need to sell something for 99¢ a pound...

$0.99/lb? :)

ChrisA

[1] https://en.wikipedia.org/wiki/Bush_hid_the_facts
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Steven D'Aprano

On Tue, 13 May 2014 07:20:34 -0400, Roy Smith wrote:

> ASCII *is* all I need.

You've never needed to copyright something? Copyright © Roy Smith 2014... 
I know some people use (c) instead, but that actually has no legal 
standing. (Not that any reasonable judge would invalidate a copyright 
based on a technicality like that, not these days.)

Or price something in cents? I suppose the days of the 25¢ steak dinner 
are long gone, but you might need to sell something for 99¢ a pound... 

> The problem is, it's not all that other people
> need, and I need to interact with those other people.

True, true.

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Chris Angelico

On Tue, May 13, 2014 at 11:30 PM, Mark Lawrence  wrote:
> On 13/05/2014 09:38, Chris Angelico wrote:
>>
>>
>> It's not a good thing. It means that you have the convenience of
>> pretending there's no problem, which means you don't notice trouble
>> until something happens... and then, in all probability, your app is
>> in production and you have no idea why stuff went wrong.
>>
>
> Unless you're (un)lucky enough to be working on IIRC the 1/3 of major IT
> projects that deliver nothing :)

Been there, done that. At least, most likely so... there is a chance,
albeit slim, that the boss/owner will either discover someone who'll
finish the project for him, or find the time to finish it himself. I
gather he's looking at ripping all my code out and replacing it with
PHP of his own design, which should be fun. On the plus side, that
does mean he can get any idiot straight out of a uni course to do the
work; much easier than finding someone who knows Python, Pike, bash,
and C++. The White King told Alice that cynicism is a disease that can
be cured... but it can also be inflicted, and a promising-looking
N-year project that collapses because the boss starts getting stupid
with code formatting rules and then ends up firing his last remaining
competent employee is a pretty effective means of instilling cynicism.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Mark Lawrence


On 13/05/2014 09:38, Chris Angelico wrote:


It's not a good thing. It means that you have the convenience of
pretending there's no problem, which means you don't notice trouble
until something happens... and then, in all probability, your app is
in production and you have no idea why stuff went wrong.



Unless you're (un)lucky enough to be working on IIRC the 1/3 of major IT 
projects that deliver nothing :)


--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence



--
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Roy Smith

Chris Angelico wrote:

> On Tue, May 13, 2014 at 4:03 PM, Ben Finney  wrote:
> > (It's always a good day to remind people that the rest of the world
> > exists.)
> 
> Ironic that this should come up in a discussion on Unicode, given that
> Unicode's fundamental purpose is to welcome that whole rest of the
> world instead of yelling "LALALALALA America is everything" and
> pretending that ASCII, or Latin-1, or something, is all you need.

ASCII *is* all I need.  The problem is, it's not all that other people 
need, and I need to interact with those other people.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread alister

On Tue, 13 May 2014 01:18:35 +, Steven D'Aprano wrote:

> On Mon, 12 May 2014 17:47:48 +, alister wrote:
> 
>> On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:
>> 
>>> This was *NOT* written by our resident unicode expert
>>> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
>>> 
>>> Posted as I thought it would make a rather pleasant change from
>>> interminable threads about names vs values vs variables vs objects.
>> 
>> Surely those example programs are not the Pythonic way to do things or
>> am I missing something?
> 
> Armin Ronacher is an extremely experienced and knowledgeable Python
> developer, and a Python core developer. He might be wrong, but he's not
> *obviously* wrong.
> 
I am only an amateur Python coder, which is why I asked if I am missing
something.

I could not see any reason to be using the shutil module if all that the
program is doing is opening a file, reading it & then printing it.

Is it Python that causes the issue, the shutil module or just the OS not
liking the data it is being sent?

An explanation of why this approach is taken would be much appreciated.



-- 
Revenge is a form of nostalgia.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Marko Rauhamaa

Johannes Bauer :

> The only people who are angered by this now are people who always
> treated encodings sloppily and it "just worked". Well, there's a good
> chance it has worked by pure chance so far. It's a good thing that
> Python does this now more strictly as it gives developers *guarantees*
> about what they can and cannot do with text datatypes without having
> to deal with encoding issues in many places. Just one place: The
> interface where text is read or written, just as it should be.

I'm not angered by text. I'm just wondering if it has any practical use
that is not misuse...

For example, Py3 should not make any pretense that there is a "default"
encoding for strings. Locales are an abhorrent invention from the early
8-bit days. IOW, you should never input or output text without explicit
serialization.

I get the feeling that Py3 would like to present a world where strings
are first-class I/O objects that can exist in files, in filenames,
inside pipes. You say, "text is read or written." I'm saying text is
never read or written. It only exists as an abstraction (not even
unicode) inside the virtual machine.


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Johannes Bauer

On 13.05.2014 10:25, Marko Rauhamaa wrote:

> Based on my background (network and system programming), I'm a bit
> suspicious of strings, that is, text. For example, is the stuff that
> goes to syslog bytes or text? Does an XML file contain bytes or
> (encoded) text? The answers are not obvious to me. Modern computing is
> full of ASCII-esque binary communication standards and formats.

Traditional Unix programs (syslog for example) are notorious for being
unclear, ambiguous and/or ignorant of character encodings altogether. And
this works, unfortunately, most of the time because many encodings
share a common subset. If they didn't, the problems would be VERY
apparent and people would be forced to handle the issues less sloppily.

Which is the route that Py3 chose. Don't be sloppy, make a clear
distinction between "text" (which is handled naturally as strings) and its
respective encoding.

The only people who are angered by this now are people who always treated
encodings sloppily and it "just worked". Well, there's a good chance it
has worked by pure chance so far. It's a good thing that Python does
this now more strictly as it gives developers *guarantees* about what
they can and cannot do with text datatypes without having to deal with
encoding issues in many places. Just one place: The interface where text
is read or written, just as it should be.

Regards,
Johannes

-- 
>> Where EXACTLY did you predict the quake again?
> At least not publicly!
Ah, the latest and, to this day, most ingenious stunt of our great
cosmologists: the secret prediction.
 - Karl Kaos on Rüdiger Thomas in dsa
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Steven D'Aprano

On Tue, 13 May 2014 12:06:50 +0300, Marko Rauhamaa wrote:

> Chris Angelico :
> 
>> These are problems that Unicode can't solve.
> 
> I actually think the problem has little to do with Unicode. Text is an
> abstract data type just like any class. If I have an object (say, a
> subprocess or a dictionary) in memory, I don't expect the object to have
> any existence independently of the Python virtual machine. I have the
> same feeling about Py3 strings: they only exist inside the Python
> virtual machine.

And you would be correct. When you write them to a device (say, push them 
over a network, or write them to a file) they need to be serialized. If 
you're lucky, you have an API that takes a string and serializes it for 
you, and then all you have to deal with is:

- am I happy with the default encoding?

- if not, what encoding do I want?

Otherwise you ought to have an API that requires bytes, not strings, and 
you have to perform your own serialization by encoding it.

But abstractions leak, and this abstraction leaks because *right now* 
there isn't a single serialization for text strings. There are HUNDREDS, 
and sometimes you don't know which one is being used.
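
To make that concrete, a small sketch showing one string and three of its
possible serializations (the variable names are just for illustration):

```python
text = "caf\u00e9"          # four code points, ending in U+00E9

as_utf8 = text.encode("utf-8")          # b'caf\xc3\xa9'
as_latin1 = text.encode("latin-1")      # b'caf\xe9'
as_utf16 = text.encode("utf-16-le")     # b'c\x00a\x00f\x00\xe9\x00'

# Decoding with the wrong assumption silently yields different text.
misread = as_utf8.decode("latin-1")     # 'caf\xc3\xa9' as text
```

None of the three byte strings is "the" string; each is only meaningful
together with the name of the encoding that produced it.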

[...]
> What I'm saying is that strings definitely have an important application
> in the human interface. However, I feel strings might be overused in the
> Py3 API. Case in point: are pathnames bytes objects or strings?

Yes. On POSIX systems, file names are sequences of bytes, with a very few 
restrictions. On recent Windows file systems (NTFS I believe?), file 
names are Unicode strings encoded to UTF-16, but with a whole lot of 
other restrictions imposed by the OS.

> The
> linux position is that they are bytes objects. Py3 supports both
> interpretations seemingly throughout:
> 
>    open(b"/bin/ls")            vs    open("/bin/ls")
>    os.path.join(b"a", b"b")    vs    os.path.join("a", "b")

Because it has to, otherwise there will be files that are unreachable on 
one platform or another.
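
A short sketch of how the two flavours coexist in the os module. The paths
"a" and "b" are placeholders; nothing here touches the file system.

```python
import os

# Both flavours are accepted, as long as they are not mixed in one call.
str_path = os.path.join("a", "b")
bytes_path = os.path.join(b"a", b"b")

# os.fsencode/os.fsdecode convert between the two using the filesystem
# encoding, with an error handler chosen so undecodable bytes survive.
round_trip = os.fsdecode(os.fsencode(str_path))

try:
    os.path.join("a", b"b")
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Mixing str and bytes components in one call is rejected with a TypeError, so
a program has to pick one representation per path.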

-- 
Steven
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Johannes Bauer

On 13.05.2014 10:38, Chris Angelico wrote:

>> Python 2's ambiguity allows me not to answer the tough philosophical
>> questions. I'm not saying it's necessarily a good thing, but it has its
>> benefits.
> 
> It's not a good thing. It means that you have the convenience of
> pretending there's no problem, which means you don't notice trouble
> until something happens... and then, in all probability, your app is
> in production and you have no idea why stuff went wrong.

Exactly. With Py2 "strings" you never know what encoding they are in, or
whether they have already been converted. And it's entirely possible to
mix already-converted strings with other, not-yet-encoded strings.
What a mess!

All these issues are avoided by Py3. There is a very clear distinction
between strings and string representation (data bytes), which is
beautiful. Accidental mixing is not possible. And you have some things
*guaranteed* for the string type which aren't guaranteed for the bytes
type (for example when doing string manipulation).
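
A small sketch of what that looks like in practice (the sample word and
variable names are arbitrary):

```python
text = "na\u00efve"
data = text.encode("utf-8")

try:
    text + data                 # str + bytes is refused outright
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# Guarantees that hold for str but not for the encoded bytes:
text_len = len(text)            # counts code points
data_len = len(data)            # counts bytes
reversed_text = text[::-1]      # still a valid str

try:
    data[::-1].decode("utf-8")  # reversed bytes are no longer valid UTF-8
    reversed_bytes_valid = True
except UnicodeDecodeError:
    reversed_bytes_valid = False
```

Slicing or reversing the str always gives another valid str, while the same
operation on the UTF-8 bytes can split a multi-byte sequence.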

Regards,
Johannes

-- 
>> Where EXACTLY did you predict the quake again?
> At least not publicly!
Ah, the latest and, to this day, most ingenious stunt of our great
cosmologists: the secret prediction.
 - Karl Kaos on Rüdiger Thomas in dsa
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Chris Angelico

On Tue, May 13, 2014 at 7:06 PM, Marko Rauhamaa  wrote:
> Chris Angelico :
>
>> These are problems that Unicode can't solve.
>
> I actually think the problem has little to do with Unicode. Text is an
> abstract data type just like any class. If I have an object (say, a
> subprocess or a dictionary) in memory, I don't expect the object to have
> any existence independently of the Python virtual machine. I have the
> same feeling about Py3 strings: they only exist inside the Python
> virtual machine.

That's true; the only difference is that text is extremely prevalent.
You can share a dict with another program, or store it in a file, or
whatever, simply by agreeing on an encoding - for instance, JSON. As
long as you and the other program know that this file is JSON encoded,
you can write it and he can read it, and you'll get the right data at
the far end. It's no different; there are encodings that are easy to
handle and have limitations, and there are encodings that are
elaborate and have lots of features (XML comes to mind, although
technically you can't encode a dict in XML).
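As a rough sketch of that agreement, with a made-up record: both sides settle on "JSON, encoded as UTF-8", and the dict survives the trip as bytes.

```python
import json

# Hypothetical record to share with another program.
record = {"name": "Zo\u00eb", "count": 3}

# Sender: dict -> JSON text -> UTF-8 bytes (what actually hits the wire or disk).
wire = json.dumps(record, ensure_ascii=False).encode("utf-8")

# Receiver: UTF-8 bytes -> JSON text -> dict.
received = json.loads(wire.decode("utf-8"))
```

The same shape applies to text itself: the str only exists inside the program, and the agreed encoding is what exists outside it.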

> Case in point: are pathnames bytes objects or strings? The
> linux position is that they are bytes objects. Py3 supports both
> interpretations seemingly throughout:
>
>    open(b"/bin/ls")            vs    open("/bin/ls")
>    os.path.join(b"a", b"b")    vs    os.path.join("a", "b")

That's a problem that comes from the underlying file systems. If every
FS in the world worked with Unicode file names, it would be easy.
(Most would encode them onto the platters in UTF-8 or maybe UTF-16;
some might choose to use a PEP 393 or Pike string structure, with the
size_shift being a file mode just like the 'directory' bit; others
might use a limited encoding for legacy reasons, storing uppercased
CP437 on the disk, and returning an error if the desired name didn't
fit.) But since they don't, we have to cope with that. What happens if
you're running on Linux, and you have a mounted drive from an OS/2
share, and inside that, you access an aliased drive that represents a
Windows share, on which you've mounted a remote-backup share? A single
path name could have components parsed by each of those systems, so
what's its encoding? How do you handle that? There's no solution.
(Well, okay. There is a solution: don't do something so stupidly
convoluted. But there's no law against cackling admins making circular
mounts. In fact, I just mounted my own home directory as a
subdirectory under my home directory, via sshfs. I can now encrypt my
own file reads and writes exactly as many times as I choose to. I also
cackled.)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Marko Rauhamaa

Chris Angelico :

> These are problems that Unicode can't solve.

I actually think the problem has little to do with Unicode. Text is an
abstract data type just like any class. If I have an object (say, a
subprocess or a dictionary) in memory, I don't expect the object to have
any existence independently of the Python virtual machine. I have the
same feeling about Py3 strings: they only exist inside the Python
virtual machine.

An abstract object like a subprocess or dictionary justifies its
existence through its behaviour (its quacking). Now, do strings quack or
are they silent? I guess if you are writing a word processor they might
quack to you. Otherwise, they are just an esoteric storage format.

What I'm saying is that strings definitely have an important application
in the human interface. However, I feel strings might be overused in the
Py3 API. Case in point: are pathnames bytes objects or strings? The
linux position is that they are bytes objects. Py3 supports both
interpretations seemingly throughout:

   open(b"/bin/ls")            vs    open("/bin/ls")
   os.path.join(b"a", b"b")    vs    os.path.join("a", "b")


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Chris Angelico

On Tue, May 13, 2014 at 6:25 PM, Marko Rauhamaa  wrote:
> Johannes Bauer :
>
>> Having dealt with the UTF-8 problems on Python2 I can safely say that
>> I never, never ever want to go back to that freaky hell. If I deal
>> with strings, I want to be able to sanely manipulate them and I want
>> to be sure that after manipulation they're still valid strings.
>> Manipulating the bytes representation of unicode data just doesn't
>> work.
>
> Based on my background (network and system programming), I'm a bit
> suspicious of strings, that is, text. For example, is the stuff that
> goes to syslog bytes or text? Does an XML file contain bytes or
> (encoded) text? The answers are not obvious to me. Modern computing is
> full of ASCII-esque binary communication standards and formats.

These are problems that Unicode can't solve. In theory, XML should
contain text in a known encoding (defaulting to UTF-8). With syslog,
it's problematic - I don't remember what it's meant to be, but I know
there are issues. Same with other log files.

> Python 2's ambiguity allows me not to answer the tough philosophical
> questions. I'm not saying it's necessarily a good thing, but it has its
> benefits.

It's not a good thing. It means that you have the convenience of
pretending there's no problem, which means you don't notice trouble
until something happens... and then, in all probability, your app is
in production and you have no idea why stuff went wrong.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Marko Rauhamaa

Johannes Bauer :

> Having dealt with the UTF-8 problems on Python2 I can safely say that
> I never, never ever want to go back to that freaky hell. If I deal
> with strings, I want to be able to sanely manipulate them and I want
> to be sure that after manipulation they're still valid strings.
> Manipulating the bytes representation of unicode data just doesn't
> work.

Based on my background (network and system programming), I'm a bit
suspicious of strings, that is, text. For example, is the stuff that
goes to syslog bytes or text? Does an XML file contain bytes or
(encoded) text? The answers are not obvious to me. Modern computing is
full of ASCII-esque binary communication standards and formats.

Python 2's ambiguity allows me not to answer the tough philosophical
questions. I'm not saying it's necessarily a good thing, but it has its
benefits.


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread Johannes Bauer

On 13.05.2014 03:18, Steven D'Aprano wrote:

> Armin Ronacher is an extremely experienced and knowledgeable Python 
> developer, and a Python core developer. He might be wrong, but he's not 
> *obviously* wrong.

He's correct about file name encodings. Which can be fixed really easily
without messing everything up (sys.argv binary variant, open accepting
binary filenames). But that he suggests that Go would be superior:

> Which uses an even simpler model than Python 2: everything is a byte string. 
> The assumed encoding is UTF-8. End of the story.

Is just a horrible idea. An obviously horrible idea, too.

Having dealt with the UTF-8 problems on Python2 I can safely say that I
never, never ever want to go back to that freaky hell. If I deal with
strings, I want to be able to sanely manipulate them and I want to be
sure that after manipulation they're still valid strings. Manipulating
the bytes representation of unicode data just doesn't work.

And I'm very very glad that some people felt the same way and
implemented a sane, consistent way of dealing with Unicode in Python3.
It's one of the reasons why I switched to Py3 very early and I love it.

Cheers,
Johannes

-- 
>> Wo hattest Du das Beben nochmal GENAU vorhergesagt?
> Zumindest nicht öffentlich!
Ah, der neueste und bis heute genialste Streich unsere großen
Kosmologen: Die Geheim-Vorhersage.
 - Karl Kaos über Rüdiger Thomas in dsa 
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-13 Thread gregor

Am 13 May 2014 01:18:35 GMT
schrieb Steven D'Aprano :

> 
> - have a simple way to write bytes to stdout and stderr.

there is the underlying binary buffer:

https://docs.python.org/3/library/sys.html#sys.stdin
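A minimal sketch of using that buffer. The helper name write_raw is made up for illustration, and it assumes sys.stdout is the usual text wrapper that exposes a .buffer attribute (a replaced stdout, such as a StringIO, will not have one).

```python
import io
import sys

def write_raw(data, stream=None):
    """Write bytes to a binary stream, defaulting to stdout's underlying buffer."""
    if stream is None:
        # Flush pending text first so text and bytes output stay in order.
        sys.stdout.flush()
        stream = sys.stdout.buffer
    stream.write(data)
    stream.flush()

# Demonstrated against an in-memory binary stream; calling write_raw(data)
# with no stream would send the same bytes to the real stdout.
sink = io.BytesIO()
write_raw(b"\xff\xfe not valid UTF-8\n", sink)
```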

greg

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread Mark H Harris


On 5/13/14 1:18 AM, Chris Angelico wrote:

instead of yelling "LALALALALA America is everything" and
pretending that ASCII, or Latin-1, or something, is all you need.



... it isn't?



LALALALALALALALALA   :))

--
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread Chris Angelico

On Tue, May 13, 2014 at 4:25 PM, alex23  wrote:
> On 13/05/2014 11:39 AM, Chris Angelico wrote:
>>
>> On Tue, May 13, 2014 at 11:18 AM, Steven D'Aprano
>>  wrote:
>>>
>>> - have a bytes version of sys.argv (bargv? argvb?) and read
>>>the file names from that;
>>
>>
>> argb? :)
>
>
> I tried and failed to come up with an "argy bargy" joke here so decided to
> go for a meta-reference instead.

I'm just waiting for someone to have need for arguments in both
network byte order and host byte order. The latter, of course, would
be "argh".

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread alex23


On 13/05/2014 11:39 AM, Chris Angelico wrote:

On Tue, May 13, 2014 at 11:18 AM, Steven D'Aprano
 wrote:

- have a bytes version of sys.argv (bargv? argvb?) and read
   the file names from that;


argb? :)


I tried and failed to come up with an "argy bargy" joke here so decided 
to go for a meta-reference instead.


--
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread Chris Angelico

On Tue, May 13, 2014 at 4:03 PM, Ben Finney  wrote:
> (It's always a good day to remind people that the rest of the world
> exists.)

Ironic that this should come up in a discussion on Unicode, given that
Unicode's fundamental purpose is to welcome that whole rest of the
world instead of yelling "LALALALALA America is everything" and
pretending that ASCII, or Latin-1, or something, is all you need.

ChrisA
Currently enjoying "Monday Night Flagging" on Threshold RPG... at 4pm
on Tuesday.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread Rustom Mody

On Tuesday, May 13, 2014 11:09:06 AM UTC+5:30, Mark H. Harris wrote:
> On 5/13/14 12:10 AM, Rustom Mody wrote:
> 
> > I think the most helpful way forward is to accept two things:
> > a. Unicode is a headache
> > b. No-unicode is a non-option
> 
> 
> QOTW(so far...)

I said that getting unicode right straight off is unrealistic.

I should have added this:
Armin makes a (sarcastic?) dig about the fact that Python 3 goofs because
it's mismatched with the assumptions of Unix.

| UNIX is bytes, has been defined that way and will always be that way. To 

| Unicode on UNIX is only madness if you force it on everything. But that's not 
| how Unicode on UNIX works. UNIX does not have a distinction between unicode 
| and byte APIs. They are one and the same which makes them easy to deal with.

| Python 3 takes a very different stance on Unicode than UNIX does. Python 3 
| says: everything is Unicode ...

This may be right...
Or it may be the other way round as I claim at 
http://blog.languager.org/2014/04/unicode-and-unix-assumption.html

At this point I don't believe that anyone is very clear about what is the
right way and what is the wrong way.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread Ben Finney

Gene Heskett  writes:

> On Tuesday 13 May 2014 01:39:06 Mark H Harris did opine

> > QOTW(so far...)
>
> But it's early yet, only Tuesday & it's just barely started... :)

Says who? For some of us, Tuesday is approaching sunset.

(It's always a good day to remind people that the rest of the world
exists.)

-- 
 \ “Reality must take precedence over public relations, for nature |
  `\   cannot be fooled.” —Richard P. Feynman, _Rogers' Commission |
_o__)   Report into the Challenger Crash_, 1986-06 |
Ben Finney

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread Gene Heskett

On Tuesday 13 May 2014 01:39:06 Mark H Harris did opine
And Gene did reply:
> On 5/13/14 12:10 AM, Rustom Mody wrote:
> > I think the most helpful way forward is to accept two things:
> > a. Unicode is a headache
> > b. No-unicode is a non-option
> 
> QOTW(so far...)

But it's early yet, only Tuesday & it's just barely started... :)

Cheers, Gene
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page 
US V Castleman, SCOTUS, Mar 2014 is grounds for Impeaching SCOTUS
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread Mark H Harris


On 5/13/14 12:10 AM, Rustom Mody wrote:

I think the most helpful way forward is to accept two things:
a. Unicode is a headache
b. No-unicode is a non-option


QOTW(so far...)

--
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread Rustom Mody

On Tuesday, May 13, 2014 6:48:35 AM UTC+5:30, Steven D'Aprano wrote:
> On Mon, 12 May 2014 17:47:48 +, alister wrote:
> 
> > Surely those example programs are not the Pythonic way to do things, or
> > am I missing something?
> 
> 
> 
> Feel free to show us your version of "cat" for Python then. Feel free to 
> target any version you like. Don't forget to test it against files with 
> names and content that:
> 
> 
> - aren't valid UTF-8;
> 
> 
> - are valid UTF-8, but not valid in the local encoding.

Thanks for a non-defensive appraisal!

> 
> 
> > if those code samples are anything to go by this guy makes JMF look
> > sensible.
> 
> 
> 
> Armin Ronacher is an extremely experienced and knowledgeable Python 
> developer, and a Python core developer. He might be wrong, but he's not 
> *obviously* wrong.
> 
> 
> 
> Unicode is hard, not because Unicode is hard, but because of legacy 
> problems. I can create a file on a machine that uses ISO-8859-7 for the 
> file name, put Shift-JIS encoded text inside it, transfer it to a 
> machine that uses Windows-1251 as the file system encoding, then SSH into 
> that machine from a system using Big5, and try to make sense of it. If 
> everybody used UTF-8 any time data touched a disk or network, we'd be 
> laughing. It would all be so simple.

I think the most helpful way forward is to accept two things:
a. Unicode is a headache
b. No-unicode is a non-option

> 
> 
> 
> Reading Armin's post, I think that all that is needed to simplify his 
> Python 3 version is:
> 
> 
> 
> - have a bytes version of sys.argv (bargv? argvb?) and read 
>   the file names from that;
> 
> - have a simple way to write bytes to stdout and stderr.
> 
> 
> Most programs won't need either of those, but file system utilities will.

About the technical merits of Armin's post and your suggestions, I've 
nothing to say, since I am an ignoramus on (the mechanics of) Unicode.

[Consider me an eager, early, ignorant adopter :-) ]

It's however good to note that Unicode is rather unique in the history
not just of IT/CS but of humanity, in the sense that no one (to the best
of my knowledge) has ever tried to come up with an all-encompassing umbrella
for all humanity's scripts/writing systems etc.

So hiccups and mistakes are only to be expected.  The absence of these would
be much more surprising!
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread Mark Lawrence


On 13/05/2014 02:18, Steven D'Aprano wrote:

On Mon, 12 May 2014 17:47:48 +, alister wrote:


On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:


This was *NOT* written by our resident unicode expert
http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/

Posted as I thought it would make a rather pleasant change from
interminable threads about names vs values vs variables vs objects.


Surely those example programs are not the Pythonic way to do things, or
am I missing something?


Feel free to show us your version of "cat" for Python then. Feel free to
target any version you like. Don't forget to test it against files with
names and content that:

- aren't valid UTF-8;

- are valid UTF-8, but not valid in the local encoding.




if those code samples are anything to go by this guy makes JMF look
sensible.


Armin Ronacher is an extremely experienced and knowledgeable Python
developer, and a Python core developer. He might be wrong, but he's not
*obviously* wrong.

Unicode is hard, not because Unicode is hard, but because of legacy
problems. I can create a file on a machine that uses ISO-8859-7 for the
file name, put Shift-JIS encoded text inside it, transfer it to a
machine that uses Windows-1251 as the file system encoding, then SSH into
that machine from a system using Big5, and try to make sense of it. If
everybody used UTF-8 any time data touched a disk or network, we'd be
laughing. It would all be so simple.

Reading Armin's post, I think that all that is needed to simplify his
Python 3 version is:

- have a bytes version of sys.argv (bargv? argvb?) and read
   the file names from that;

- have a simple way to write bytes to stdout and stderr.

Most programs won't need either of those, but file system utilities will.



I think http://bugs.python.org/issue8776 and 
http://bugs.python.org/issue8775 are relevant but both were placed in 
the small round filing cabinet.


--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence



--
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread Mark H Harris


On 5/12/14 8:18 PM, Steven D'Aprano wrote:

Unicode is hard, not because Unicode is hard, but because of legacy
problems.


Yes.  To put a finer point on that, Unicode (which is only a 
specification constantly being improved upon) is harder to implement 
when it hasn't been on the design board from the ground up; Python in 
this case.


Julia has Unicode support from the ground up, and it was easier for 
those guys to implement (in beta release) than for the Python crew when 
they undertook the Unicode work that had to be done for Python3.x (just 
an observation).


Anytime there are legacy code issues, regression testing problems, and a 
host of domain issues that weren't thought through from the get-go there 
are going to be more problematic hurdles; not to mention bugs.


Having said that, I still think Unicode is somewhat harder than you're 
admitting.


marcus

--
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread Chris Angelico

On Tue, May 13, 2014 at 11:18 AM, Steven D'Aprano
 wrote:
> Reading Armin's post, I think that all that is needed to simplify his
> Python 3 version is:
>
> - have a bytes version of sys.argv (bargv? argvb?) and read
>   the file names from that;

argb? :)

> - have a simple way to write bytes to stdout and stderr.

I'm not sure how that goes with I/O redirection, but sure.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread Steven D'Aprano

On Mon, 12 May 2014 17:47:48 +, alister wrote:

> On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:
> 
>> This was *NOT* written by our resident unicode expert
>> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
>> 
>> Posted as I thought it would make a rather pleasant change from
>> interminable threads about names vs values vs variables vs objects.
> 
> Surely those example programs are not the Pythonic way to do things, or
> am I missing something?

Feel free to show us your version of "cat" for Python then. Feel free to 
target any version you like. Don't forget to test it against files with 
names and content that:

- aren't valid UTF-8;

- are valid UTF-8, but not valid in the local encoding.

> if those code samples are anything to go by this guy makes JMF look
> sensible.

Armin Ronacher is an extremely experienced and knowledgeable Python 
developer, and a Python core developer. He might be wrong, but he's not 
*obviously* wrong.

Unicode is hard, not because Unicode is hard, but because of legacy 
problems. I can create a file on a machine that uses ISO-8859-7 for the 
file name, put Shift-JIS encoded text inside it, transfer it to a 
machine that uses Windows-1251 as the file system encoding, then SSH into 
that machine from a system using Big5, and try to make sense of it. If 
everybody used UTF-8 any time data touched a disk or network, we'd be 
laughing. It would all be so simple.
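A small sketch of that kind of legacy mix-up, using two of the encodings named above and an arbitrary Greek sample word. The bytes are written with one encoding and read back with another, and nothing in the data itself says which one is right.

```python
# Text stored on a machine that uses ISO-8859-7 (Greek).
text = "Ελληνικά"
stored = text.encode("iso-8859-7")

# Read on a machine that assumes Windows-1251 (Cyrillic): no error is raised,
# the result is simply the wrong characters.
misread = stored.decode("cp1251")

# Read with the encoding that was actually used: the text comes back intact.
recovered = stored.decode("iso-8859-7")

# A strict UTF-8 reader at least notices that something is off.
try:
    stored.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

The silent case is the dangerous one: the Windows-1251 reader gets a string of the right length that happens to be nonsense.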

Reading Armin's post, I think that all that is needed to simplify his 
Python 3 version is:

- have a bytes version of sys.argv (bargv? argvb?) and read 
  the file names from that;

- have a simple way to write bytes to stdout and stderr.

Most programs won't need either of those, but file system utilities will.
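For the first point, a minimal sketch of recovering bytes arguments with what Python 3 already provides. The helper name bytes_argv is made up; the approach assumes a POSIX system, where sys.argv is decoded with the file system encoding and the surrogateescape handler, so os.fsencode() can reverse it.

```python
import os
import sys

def bytes_argv(argv=None):
    """Return command-line arguments as bytes, re-encoding each str argument."""
    if argv is None:
        argv = sys.argv
    # os.fsencode() applies the file system encoding (with surrogateescape
    # on POSIX), which restores bytes that could not be decoded as text.
    return [os.fsencode(arg) for arg in argv]

# Demonstrated with an explicit list; bytes_argv() with no argument
# would convert the real sys.argv the same way.
example = bytes_argv(["cat", "notes.txt"])
```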

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread Chris Angelico

On Tue, May 13, 2014 at 4:31 AM, Ian Kelly  wrote:
> Just because his code sucks doesn't mean he's
> wrong about the state of Unicode and UNIX in Python 3.

Uhm... I think wrongness of code is generally fairly indicative of
wrongness of thinking :) If I write a rant about how Python's list
type sucks and it turns out my code is using it like a cons cell and
never putting more than two elements into a list, then you would
accurately conclude that I'm wrong about the state of data type
support in Python.

I don't have a problem with someone coming to the list here with
misconceptions. That's what discussions are for. But rants like that,
on blogs, I quickly get weary of reading. The tone is always "Look
what's so wrong", not inviting dialogue, and I can't be bothered
digging into the details to compose a full response. Chances are the
author's (a) not looking at 3.4 and what's happened to improve
things (and certainly not 3.5 and what's going to happen), and (b) not
listening to responses anyway.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread Ian Kelly

On Mon, May 12, 2014 at 1:42 PM, MRAB  wrote:
> How about checking sys.stdin.mode and sys.stdout.mode?

Seems to work, but I notice that the docs only define the mode
attribute for the FileIO class, which sys.stdin and sys.stdout are not
instances of.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread MRAB


On 2014-05-12 19:31, Ian Kelly wrote:

On Mon, May 12, 2014 at 11:47 AM, alister
 wrote:

On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:


This was *NOT* written by our resident unicode expert
http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/

Posted as I thought it would make a rather pleasant change from
interminable threads about names vs values vs variables vs objects.


Surely those example programs are not the Pythonic way to do things, or
am I missing something?


The _is_binary_reader and _is_binary_writer functions look like they
could be simplified by calling isinstance on the io object itself
against io.TextIOBase, io.BufferedIOBase or io.RawIOBase, rather than
doing those odd 0-length reads and writes.  And then perhaps those
exception-swallowing try-excepts wouldn't be necessary.  But perhaps
there's a non-obvious reason why it's written the way it is.


How about checking sys.stdin.mode and sys.stdout.mode?


And there appears to be a bug where everything *except* the filename
'-' is treated as stdin, so the script probably hasn't been tested at
all.


if those code samples are anything to go by this guy makes JMF look
sensible.


This is an ad hominem.  Just because his code sucks doesn't mean he's
wrong about the state of Unicode and UNIX in Python 3.



--
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread Ian Kelly

On Mon, May 12, 2014 at 11:47 AM, alister
 wrote:
> On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:
>
>> This was *NOT* written by our resident unicode expert
>> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
>>
>> Posted as I thought it would make a rather pleasant change from
>> interminable threads about names vs values vs variables vs objects.
>
> Surely those example programs are not the Pythonic way to do things, or
> am I missing something?

The _is_binary_reader and _is_binary_writer functions look like they
could be simplified by calling isinstance on the io object itself
against io.TextIOBase, io.BufferedIOBase or io.RawIOBase, rather than
doing those odd 0-length reads and writes.  And then perhaps those
exception-swallowing try-excepts wouldn't be necessary.  But perhaps
there's a non-obvious reason why it's written the way it is.
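A sketch of the isinstance approach, under the assumption that the streams involved derive from the io base classes. The function names here are illustrative and are not the ones from the blog post.

```python
import io

def is_binary_stream(stream):
    """True for raw or buffered binary streams (they read and write bytes)."""
    return isinstance(stream, (io.RawIOBase, io.BufferedIOBase))

def is_text_stream(stream):
    """True for text streams (they read and write str)."""
    return isinstance(stream, io.TextIOBase)

# In-memory stand-ins for a binary and a text stream.
binary_example = io.BytesIO()
text_example = io.StringIO()
```

One possible reason for the original probing approach is that a file-like object which does not inherit from these base classes would fail both checks.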

And there appears to be a bug where everything *except* the filename
'-' is treated as stdin, so the script probably hasn't been tested at
all.

> if those code samples are anything to go by this guy makes JMF look
> sensible.

This is an ad hominem.  Just because his code sucks doesn't mean he's
wrong about the state of Unicode and UNIX in Python 3.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Everything you did not want to know about Unicode in Python 3

2014-05-12 Thread alister

On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:

> This was *NOT* written by our resident unicode expert
> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
> 
> Posted as I thought it would make a rather pleasant change from
> interminable threads about names vs values vs variables vs objects.

Surely those example programs are not the Pythonic way to do things, or 
am I missing something?

if those code samples are anything to go by this guy makes JMF look 
sensible.



-- 
The Heineken Uncertainty Principle:
You can never be sure how many beers you had last night.
-- 
https://mail.python.org/mailman/listinfo/python-list
