Re: Everything you did not want to know about Unicode in Python 3
On 17/05/2014 05:19, Marko Rauhamaa wrote: The sole copyright holder can simply state: this work is in the Public Domain, or: all rights relinquished, or some such. Ultimately, everything is decided by the courts, of course. For examples see all the Python PEPs. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On 2014-05-17 02:07, Steven D'Aprano wrote: On Fri, 16 May 2014 14:46:23 +, Grant Edwards wrote: At least in the US, there doesn't seem to be such a thing as placing a work into the public domain. The copyright holder can transfer ownership to somebody else, but there is no public domain to which ownership can be transferred. That's factually incorrect. In the US, sufficiently old works, or works of a certain age that were not explicitly registered for copyright, are in the public domain. Under a wide range of circumstances, works created by the federal government go immediately into the public domain. There is such a thing as the public domain in the US, and there are works in it, but there isn't really such a thing as placing a work there voluntarily, as Grant says. A work either is or isn't in the public domain. The author has no choice in the matter. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Sat, May 17, 2014 at 6:57 PM, Robert Kern robert.k...@gmail.com wrote: There is such a thing as the public domain in the US, and there are works in it, but there isn't really such a thing as placing a work there voluntarily, as Grant says. A work either is or isn't in the public domain. The author has no choice in the matter. Then what's copyright status on PEPs? The nearest thing to assigning to public domain that works across legislatures is probably CC0: http://creativecommons.org/about/cc0 ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On 2014-05-17 05:19, Marko Rauhamaa wrote: Steven D'Aprano steve+comp.lang.pyt...@pearwood.info: On Fri, 16 May 2014 14:46:23 +, Grant Edwards wrote: At least in the US, there doesn't seem to be such a thing as placing a work into the public domain. The copyright holder can transfer ownership to somebody else, but there is no public domain to which ownership can be transferred. That's factually incorrect. In the US, sufficiently old works, or works of a certain age that were not explicitly registered for copyright, are in the public domain. Under a wide range of circumstances, works created by the federal government go immediately into the public domain. Steven, you're not disputing Grant. I am. The sole copyright holder can simply state: this work is in the Public Domain, or: all rights relinquished, or some such. Ultimately, everything is decided by the courts, of course. One can state many things, but that doesn't mean they have legal effect. The US Code has provisions for how works become copyrighted automatically, how they leave copyright automatically at the end of specific time periods, how some works automatically enter the public domain on their creation (i.e. works of the US federal government), but has nothing at all for how a private creator can voluntarily place their work into the public domain when it would otherwise not be. It used to, but does not any more. For a private individual to say about a work they just created that "this work is in the Public Domain" is, under US law, merely an erroneous statement of fact, not a speech act that effects a change in the legal status of the work. For another example of this distinction, saying "I am married" when I have not applied for, received, and solemnized a valid marriage license is just an erroneous statement of fact and does not make me legally married. 
Relinquishing your rights can have some effect, but not all rights can be relinquished, and this is not the same as putting your work into the public domain. Among other things, your heirs can sometimes reclaim those rights in some circumstances if you are not careful (and if they are valuable enough to bother reclaiming). If you wish to do something like this, I highly recommend (though IANAL and TINLA) using the CC0 Waiver from Creative Commons. It has thorough legalese for relinquishing all the rights that one can relinquish for the maximum terms that one can do so in as many jurisdictions as possible and acts as a license to use/distribute/etc. without restriction even if some rights cannot be relinquished. Even if US law were to change to provide for dedicating works to the public domain, I would probably still use the CC0 anyways to account for the high variability in how different jurisdictions around the world treat their own public domains. http://creativecommons.org/about/cc0 http://wiki.creativecommons.org/CC0_FAQ Note how they distinguish the CC0 Waiver from their Public Domain Mark: the Public Domain Mark is just a label for things that are known to be free of copyright worldwide but does not make a work so. The CC0 *does* have an operative effect that is substantially similar to the work being in the public domain. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
Chris Angelico ros...@gmail.com writes: On Sat, May 17, 2014 at 6:57 PM, Robert Kern robert.k...@gmail.com wrote: There is such a thing as the public domain in the US, and there are works in it, but there isn't really such a thing as placing a work there voluntarily, as Grant says. A work either is or isn't in the public domain. The author has no choice in the matter. Then what's copyright status on PEPs? My guess: They are in the default copyright status, with all rights reserved (i.e. everything that copyright law restricts, is forbidden to the recipient). But, if any of those copyright holders were ever to assert their copyright had been infringed by some recipient, the “this work is in the public domain” or equivalent would be taken as a clear indication of the *intent* of the copyright holder. Ultimately, what matters is the determination of whatever judge you find yourself facing. To that end, clarifying in the copyright statement and license terms exactly what is permitted can be immensely helpful in foreshortening and, ideally, avoiding a future copyright suit. Copyright is a ridiculous burden on everyone — to the extent that even those copyright holders who don't *want* those rights which the law reserves to the copyright holder, and want to divest themselves of the role of copyright holder, find it frustratingly difficult to do so effectively across jurisdictions. -- \ “Computer perspective on Moore's Law: Human effort becomes | `\ twice as expensive roughly every two years.” —anonymous | _o__) | Ben Finney -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Sat, 17 May 2014 09:57:06 +0100, Robert Kern wrote: On 2014-05-17 02:07, Steven D'Aprano wrote: On Fri, 16 May 2014 14:46:23 +, Grant Edwards wrote: At least in the US, there doesn't seem to be such a thing as placing a work into the public domain. The copyright holder can transfer ownership to somebody else, but there is no public domain to which ownership can be transferred. That's factually incorrect. In the US, sufficiently old works, or works of a certain age that were not explicitly registered for copyright, are in the public domain. Under a wide range of circumstances, works created by the federal government go immediately into the public domain. There is such a thing as the public domain in the US, and there are works in it, but there isn't really such a thing as placing a work there voluntarily, as Grant says. A work either is or isn't in the public domain. The author has no choice in the matter. That's incorrect. http://cr.yp.to/publicdomain.html Here's the money quote, from the 9th Circuit Court: "It is well settled that rights gained under the Copyright Act may be abandoned. But abandonment of a right must be manifested by some overt act indicating an intention to abandon that right." There's also this: http://creativecommons.org/publicdomain/zero/1.0/ which counts as an overt act. By the way, there's more info on US copyright terms here: http://copyright.cornell.edu/resources/publicdomain.cfm although it doesn't specifically mention voluntary abandonment of copyright. -- Steven D'Aprano http://import-that.dreamwidth.org/ -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Sat, 17 May 2014 10:29:00 +0100, Robert Kern wrote: One can state many things, but that doesn't mean they have legal effect. The US Code has provisions for how works become copyrighted automatically, how they leave copyright automatically at the end of specific time periods, how some works automatically enter the public domain on their creation (i.e. works of the US federal government), but has nothing at all for how a private creator can voluntarily place their work into the public domain when it would otherwise not be. It used to, but does not any more. The case for abandonment was stated as well settled in 1998 (Micro-Star v. Formgen Inc). Unless there has been a major legal change in the years since then, I don't think it is true that authors cannot abandon copyright. For a private individual to say about a work they just created that this work is in the Public Domain is, under US law, merely an erroneous statement of fact, not a speech act that effects a change in the legal status of the work. For another example of this distinction, saying I am married when I have not applied for, received, and solemnified a valid marriage license is just an erroneous statement of fact and does not make me legally married. There may be something to what you say, although I think we're now arguing fine semantic details. See: https://en.wikipedia.org/wiki/Wikipedia:Granting_work_into_the_public_domain To play Devil's Advocate in favour of your assertion, it may be that abandoning copyright does not literally put the work in the public domain, but merely makes it quack like the public domain. That is to say, the author still, in some abstract but legally meaningless sense, has copyright in the work *but* has given unlimited usage rights. (I don't actually think that is the case, at least not in the US.) 
It's this tiny bit of residual uncertainty that leads some authorities to say that it is hard to release a work into the public domain, particularly in a world-wide context, and that merely stating this is in the public domain is not sufficient to remove all legal doubt over the status, and that a more overt and explicit release *may* be required. Hence the CC0 licence which you refer to. The human readable summary says in part: The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. http://creativecommons.org/publicdomain/zero/1.0/ while the actual legal licence comes in at almost 800 words. This is basically the same as I release this to the public domain only longer. (The CC0 licence is longer than you might expect, because it is assumed that it may have to apply in countries where you *really cannot* relinquish copyright. But we're specifically talking about the US, where the 9th Circuit says you can.) Relinquishing your rights can have some effect, but not all rights can be relinquished, Outside of the US, so-called moral rights or reputation rights cannot generally be relinquished, except perhaps in work-for-hire and perhaps not even then. (E.g. if you're a ghost writer.) The situation in the US is a bit murky -- there are no official moral rights per se, and copyright only controls usage rights such as copying, distribution and so forth. But this doesn't mean that you can (for example) claim authorship of a public domain work unless you actually wrote it. In any case, we're discussing copyright, not other rights. and this is not the same as putting your work into the public domain. One might not be the same while still being effectively the same. 
For example, the U.S. Copyright Office states that one may not grant their work into the public domain. However, a copyright owner may release all of their rights to their work by stating the work may be freely reproduced, distributed, etc. as if it were in in the public domain. But note that the Copyright Office does not make the final decision whether you can relinquish copyright or not. That's up to the courts. Among other things, your heirs can sometimes reclaim those rights in some circumstances if you are not careful (and if they are valuable enough to bother reclaiming). That's a good point. A simplistic I release this to the public domain statement *may* (I emphasise the uncertainty) leave some doubt that it is *sufficiently overt* to prevent your heirs from disagreeing and coming after your users for infringement. Then the courts have to get involved, and it's all ugliness and only the lawyers win. Hence the advice to be as explicit and overt as possible. If you wish to do something like this, I highly recommend (though IANAL and TINLA)
Re: Everything you did not want to know about Unicode in Python 3
On 2014-05-17 15:15, Steven D'Aprano wrote: On Sat, 17 May 2014 10:29:00 +0100, Robert Kern wrote: One can state many things, but that doesn't mean they have legal effect. The US Code has provisions for how works become copyrighted automatically, how they leave copyright automatically at the end of specific time periods, how some works automatically enter the public domain on their creation (i.e. works of the US federal government), but has nothing at all for how a private creator can voluntarily place their work into the public domain when it would otherwise not be. It used to, but does not any more. The case for abandonment was stated as well settled in 1998 (Micro-Star v. Formgen Inc). Unless there has been a major legal change in the years since then, I don't think it is true that authors cannot abandon copyright. Good old Micro-Star v. Formgen Inc. A perennial favorite. No, that case did not settle this question. There is a statement in the opinion that would suggest this, but (and this seems to be a recurring theme) its inclusion in the opinion did not create precedent to that effect. The statement that you refer to is, as far as my NAL eyes can tell, what the lawyers call dictum: a statement made in a judicial opinion that is unnecessary to decide the case and is therefore not precedential. FormGen explicitly registered the copyright to the works in question, and the case was decided on whether or not the Micro-Star-redistributed works counted as derivative works (yes). Now, if the case were about an author that affirmatively dedicated his work to the public domain and then sued someone who redistributed it, then such a statement would have a precedential effect (because then the judge would decide in favor of the defendant on the basis of that statement). 
The quote that you refer to references a previous case, which follows similar lines, and also predates the automatic copyright regime post-Berne Convention, so it's not even clear to me that it should have been precedential to Micro-Star. Even if this case did so decide (which, I will grant it more or less did later by codifying such a rule in their jury instructions for such cases), it would only have effect in the 9th Circuit of the US and not even in the rest of the US, much less worldwide. Why bother when the CC0 gives you the desired effect with more assurance to your audience? For a private individual to say about a work they just created that this work is in the Public Domain is, under US law, merely an erroneous statement of fact, not a speech act that effects a change in the legal status of the work. For another example of this distinction, saying I am married when I have not applied for, received, and solemnified a valid marriage license is just an erroneous statement of fact and does not make me legally married. There may be something to what you say, although I think we're now arguing fine semantic details. Sure, it's the law. Fine semantic details are important. However, the difference between speech acts and statements of fact is a pretty gross semantic distinction and not just splitting semantic hairs. The act of making some statements (e.g. declaring that a work you own the copyright to is available under a given license) actually makes a change in the legal status of something. Most statements don't. Which ones do and don't are defined by statute and (in common law countries like the US) court decisions. Deciding which is which is often hairy, but that's an epistemological problem, not a semantic one. 
:-) See: https://en.wikipedia.org/wiki/Wikipedia:Granting_work_into_the_public_domain To play Devil's Advocate in favour of your assertion, it may be that abandoning copyright does not literally put the work in the public domain, but merely makes it quack like the public domain. That is to say, the author still, in some abstract but legally meaningless sense, has copyright in the work *but* has given unlimited usage rights. (I don't actually think that is the case, at least not in the US.) It's this tiny bit of residual uncertainty that leads some authorities to say that it is hard to release a work into the public domain, particularly in a world-wide context, and that merely stating this is in the public domain is not sufficient to remove all legal doubt over the status, and that a more overt and explicit release *may* be required. Hence the CC0 licence which you refer to. The human readable summary says in part: The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. http://creativecommons.org/publicdomain/zero/1.0/ while the actual
Re: Everything you did not want to know about Unicode in Python 3
On 2014-05-17 13:07, Steven D'Aprano wrote: On Sat, 17 May 2014 09:57:06 +0100, Robert Kern wrote: On 2014-05-17 02:07, Steven D'Aprano wrote: On Fri, 16 May 2014 14:46:23 +, Grant Edwards wrote: At least in the US, there doesn't seem to be such a thing as placing a work into the public domain. The copyright holder can transfer ownership to somebody else, but there is no public domain to which ownership can be transferred. That's factually incorrect. In the US, sufficiently old works, or works of a certain age that were not explicitly registered for copyright, are in the public domain. Under a wide range of circumstances, works created by the federal government go immediately into the public domain. There is such a thing as the public domain in the US, and there are works in it, but there isn't really such a thing as placing a work there voluntarily, as Grant says. A work either is or isn't in the public domain. The author has no choice in the matter. That's incorrect. http://cr.yp.to/publicdomain.html Thanks for the link. While it has not really changed my opinion (as discussed at length in my other reply), I did not know that the 9th Circuit had formalized the overt act test in their civil procedure rules, so there is at least one jurisdiction in the US that does currently work like this. None of the others do, to my knowledge, and this is the product of judicial common law, not statutory law, so it's still pretty shaky. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
Terry Reedy tjreedy at udel.edu writes: On 5/13/2014 8:53 PM, Ethan Furman wrote: On 05/13/2014 05:10 PM, Steven D'Aprano wrote: On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote: Because Python 3 presents stdin and stdout as text streams however, it makes them more difficult to use with binary data, which is why Armin sets up all that extra code to make sure his file objects are binary. What surprises me is how hard that is. Surely there's a simpler way to open stdin and stdout in binary mode? If not, there ought to be. Somebody already posted this: https://docs.python.org/3/library/sys.html#sys.stdin which talks about .detach(). I sent a message to Armin about this. And the documentation has now been fixed: http://bugs.python.org/issue21364 So something *can* come out of a python-list rantfest, it seems. Regards Antoine. -- https://mail.python.org/mailman/listinfo/python-list
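For readers following along, here is a minimal sketch of the binary-stream approach the messages above refer to. It assumes the standard streams are ordinary `io.TextIOWrapper` objects, which expose their underlying byte stream as `.buffer`; the names `binary_view` and `fake_stdout` are hypothetical and used only for illustration, and an in-memory stream stands in for the real stdout so the snippet has no side effects.

```python
import io


def binary_view(text_stream):
    """Return the byte stream underneath a text stream.

    Hypothetical helper: real code would usually just write
    sys.stdout.buffer directly.
    """
    try:
        return text_stream.buffer
    except AttributeError:
        # Replacement streams such as io.StringIO have no .buffer.
        raise ValueError("stream has no underlying binary buffer") from None


# In-memory stand-in for sys.stdout (assumption: same wrapper type).
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="utf-8")

# Bytes that are not valid UTF-8 pass through untouched.
binary_view(fake_stdout).write(b"\xff\x00abc")

# A stream without a buffer is rejected instead of failing later.
try:
    binary_view(io.StringIO())
    string_io_rejected = False
except ValueError:
    string_io_rejected = True
```

In a real script the same call would be `sys.stdout.buffer.write(...)` or `sys.stdin.buffer.read()`. Calling `.detach()` instead also returns the byte stream, but it leaves the original text wrapper unusable afterwards, so `.buffer` is usually the less surprising choice.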
Re: Everything you did not want to know about Unicode in Python 3
On Friday 16 May 2014 13:50:47 UTC+2, Antoine Pitrou wrote: Terry Reedy tjreedy at udel.edu writes: On 5/13/2014 8:53 PM, Ethan Furman wrote: On 05/13/2014 05:10 PM, Steven D'Aprano wrote: On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote: Because Python 3 presents stdin and stdout as text streams however, it makes them more difficult to use with binary data, which is why Armin sets up all that extra code to make sure his file objects are binary. What surprises me is how hard that is. Surely there's a simpler way to open stdin and stdout in binary mode? If not, there ought to be. Somebody already posted this: https://docs.python.org/3/library/sys.html#sys.stdin which talks about .detach(). I sent a message to Armin about this. And the documentation has now been fixed: http://bugs.python.org/issue21364 So something *can* come out of a python-list rantfest, it seems. Regards Antoine. == http://www.unicode.org/ With my best regards. -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On 2014-05-14, alister alister.nospam.w...@ntlworld.com wrote: On Wed, 14 May 2014 10:08:57 +1000, Chris Angelico wrote: On Wed, May 14, 2014 at 9:53 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: With the current system, all of us here are technically violating copyright every time we reply to an email and quote more than a small percentage of it. Oh wow... so when someone quotes heaps of text without trimming, and adding blank lines, we can complain that it's a copyright violation - reproducing our work with unauthorized modifications and without permission... I never thought of it like that. I think I could make a very strong case that anything sent to a public forum with the intention of being broadcast has been placed into the public domain by this action. At least in the US, there doesn't seem to be such a thing as placing a work into the public domain. The copyright holder can transfer ownership to somebody else, but there is no public domain to which ownership can be transferred. IIRC, there is a way under German copyright law to release certain rights. The mere act of widely distributing something does not in any way relinquish copyrights. -- Grant Edwards (grant.b.edwards at gmail.com) Yow! Am I elected yet? -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Fri, 16 May 2014 14:46:23 +, Grant Edwards wrote: At least in the US, there doesn't seem to be such a thing as placing a work into the public domain. The copyright holder can transfer ownership to somebody else, but there is no public domain to which ownership can be transferred. That's factually incorrect. In the US, sufficiently old works, or works of a certain age that were not explicitly registered for copyright, are in the public domain. Under a wide range of circumstances, works created by the federal government go immediately into the public domain. It is true that under the Mickey Mouse Copyright Grab Act[1] of (insert years here), every time Mickey Mouse is about to reach the end of copyright, Congress retroactively extends copyright terms for another few decades, but that's another story. [1] Not the real name of the act. -- Steven D'Aprano http://import-that.dreamwidth.org/ -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info: On Fri, 16 May 2014 14:46:23 +, Grant Edwards wrote: At least in the US, there doesn't seem to be such a thing as placing a work into the public domain. The copyright holder can transfer ownership to somebody else, but there is no public domain to which ownership can be transferred. That's factually incorrect. In the US, sufficiently old works, or works of a certain age that were not explicitly registered for copyright, are in the public domain. Under a wide range of circumstances, works created by the federal government go immediately into the public domain. Steven, you're not disputing Grant. I am. The sole copyright holder can simply state: this work is in the Public Domain, or: all rights relinquished, or some such. Ultimately, everything is decided by the courts, of course. Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Tuesday 13 May 2014 10:08:45 UTC+2, Johannes Bauer wrote: On 13.05.2014 03:18, Steven D'Aprano wrote: Armin Ronacher is an extremely experienced and knowledgeable Python developer, and a Python core developer. He might be wrong, but he's not *obviously* wrong. He's correct about file name encodings. Which can be fixed really easily without messing everything up (sys.argv binary variant, open accepting binary filenames). But that he suggests that Go would be superior: "Which uses an even simpler model than Python 2: everything is a byte string. The assumed encoding is UTF-8. End of the story." Is just a horrible idea. An obviously horrible idea, too. Having dealt with the UTF-8 problems on Python 2 I can safely say that I never, never ever want to go back to that freaky hell. If I deal with strings, I want to be able to sanely manipulate them and I want to be sure that after manipulation they're still valid strings. Manipulating the bytes representation of Unicode data just doesn't work. And I'm very very glad that some people felt the same way and implemented a sane, consistent way of dealing with Unicode in Python 3. It's one of the reasons why I switched to Py3 very early and I love it. Cheers, Johannes -- Where EXACTLY did you predict the quake again? At least not publicly! Ah, the latest and to this day most brilliant stroke of our great cosmologists: the secret prediction. - Karl Kaos on Rüdiger Thomas in dsa hidbv3$om2$1...@speranza.aioe.org === A Rob 'Commander' Pike will never put utf16 and ebcdic in the same basket, when discussing coding of characters. jmf -- https://mail.python.org/mailman/listinfo/python-list
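To make Johannes's point about manipulating the byte representation concrete, here is a small sketch. The sample word is an arbitrary choice, not something from the thread; it only needs one non-ASCII character.

```python
# "Zürich", with the u-umlaut written as an escape so the example does
# not depend on how this message itself was encoded.
text = "Z\u00fcrich"
encoded = text.encode("utf-8")

# A str counts code points; the UTF-8 form counts bytes.
n_chars = len(text)      # 6
n_bytes = len(encoded)   # 7, because U+00FC takes two bytes in UTF-8

# Slicing the str keeps whole characters.
first_two_chars = text[:2]

# Slicing the bytes at the same index cuts the two-byte sequence in half,
# so the result is no longer valid UTF-8.
first_two_bytes = encoded[:2]
try:
    first_two_bytes.decode("utf-8")
    slice_still_valid = True
except UnicodeDecodeError:
    slice_still_valid = False
```

The same mismatch shows up with reversing, truncating to a width, or indexing: operations that are safe on `str` can leave invalid data when applied to the encoded bytes.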
Re: Everything you did not want to know about Unicode in Python 3
On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote: On Tue, May 13, 2014 at 5:19 AM, alister alister.nospam.w...@ntlworld.com wrote: I am only an amateur Python coder, which is why I asked if I am missing something. I could not see any reason to be using the shutil module if all that the program is doing is opening a file, reading it, then printing it. Is it Python that causes the issue, the shutil module, or just the OS not liking the data it is being sent? An explanation of why this approach is taken would be much appreciated. No, that part is perfectly fine. This is exactly what the shutil module is meant for: providing shell-like operations. Although in this case the copyfileobj function is quite simple (have yourself a look at the source -- it just reads from one file and writes to the other in a loop), in general the Pythonic thing is to avoid reinventing the wheel. And since it's so simple, it shouldn't be hard to see that the use of the shutil module has nothing to do with the Unicode woes here. The crux of the issue is that a general-purpose command like cat typically can't know the encoding of its input and can't assume anything about it. In fact, there may not even be an encoding; cat can be used with binary data. The only non-destructive approach then is to copy the binary data straight from the source to the destination with no decoding steps at all, and trust the user to ensure that the destination will be able to accommodate the source encoding. Because Python 3 presents stdin and stdout as text streams however, it makes them more difficult to use with binary data, which is why Armin sets up all that extra code to make sure his file objects are binary. I think I understand that, in which case I owe Armin an apology. This certainly sounds like a failing in Python's handling of stdout. -- Get it up, keep it up... LINUX: Viagra for the PC. -- Chris Abbey -- https://mail.python.org/mailman/listinfo/python-list
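A rough sketch of the cat-style copy Ian describes, using `shutil.copyfileobj` on byte streams so that no decoding step ever runs. This is not Armin's code; the function name `cat` and the sample data are illustrative assumptions, and in-memory streams stand in for the real input and output.

```python
import io
import shutil


def cat(src, dst):
    """Copy raw bytes from src to dst without decoding anything."""
    shutil.copyfileobj(src, dst)


# Latin-1 text plus arbitrary binary: deliberately not valid UTF-8.
data = "na\u00efve".encode("latin-1") + b"\x00\xff"

src = io.BytesIO(data)
dst = io.BytesIO()
cat(src, dst)

# The same data cannot be pushed through a UTF-8 text layer.
try:
    data.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
```

In an actual command-line tool the call would be along the lines of `cat(sys.stdin.buffer, sys.stdout.buffer)`, with files opened in `"rb"` mode for named inputs.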
Re: Everything you did not want to know about Unicode in Python 3
On Wed, 14 May 2014 10:08:57 +1000, Chris Angelico wrote: On Wed, May 14, 2014 at 9:53 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: With the current system, all of us here are technically violating copyright every time we reply to an email and quote more than a small percentage of it. Oh wow... so when someone quotes heaps of text without trimming, and adding blank lines, we can complain that it's a copyright violation - reproducing our work with unauthorized modifications and without permission... I never thought of it like that. ChrisA I think I could make a very strong case that anything sent to a public forum with the intention of being broadcast has been placed into the public domain by this action. -- Work expands to fill the time available. -- Cyril Northcote Parkinson, The Economist, 1955 -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Wed, May 14, 2014 at 10:42 PM, alister alister.nospam.w...@ntlworld.com wrote: On Wed, 14 May 2014 10:08:57 +1000, Chris Angelico wrote: On Wed, May 14, 2014 at 9:53 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: With the current system, all of us here are technically violating copyright every time we reply to an email and quote more than a small percentage of it. Oh wow... so when someone quotes heaps of text without trimming, and adding blank lines, we can complain that it's a copyright violation - reproducing our work with unauthorized modifications and without permission... I never thought of it like that. ChrisA I think I could make a very strong case that anything sent to a public forum with the intention of being broadcast has been placed into the public domain by this action. I don't think so. One can reasonably assume that anything sent to a public forum is permissible to read, and to copy verbatim (although there may be presumed limits on the copying, but probably not with python-list). But if I quote your text and edit it, then you would rightly complain, which is not the case with public domain text. The question is whether or not it's fair to try to scare people with that when they repeatedly use buggy software that inserts blank lines everywhere :) In case it's not obvious, I am NOT seriously contemplating pursuing anything like this legally. It's just funny to contemplate. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On 05/13/2014 09:39 AM, Steven D'Aprano wrote: On Tue, 13 May 2014 07:20:34 -0400, Roy Smith wrote: ASCII *is* all I need. You've never needed to copyright something? Copyright © Roy Smith 2014... I know some people use (c) instead, but that actually has no legal standing. (Not that any reasonable judge would invalidate a copyright based on a technicality like that, not these days.) (c) has no standing whatsoever, as it's properly spelled (copr) -- DaveA -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On May 13, 2014 6:10 PM, Chris Angelico ros...@gmail.com wrote: On Wed, May 14, 2014 at 9:53 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: With the current system, all of us here are technically violating copyright every time we reply to an email and quote more than a small percentage of it. Oh wow... so when someone quotes heaps of text without trimming, and adding blank lines, we can complain that it's a copyright violation - reproducing our work with unauthorized modifications and without permission... I never thought of it like that. I'd be surprised if this doesn't fall under fair use. -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On 13/05/2014 17:08, Ian Kelly wrote: . And since it's so simple, it shouldn't be hard to see that the use of the shutil module has nothing to do with the Unicode woes here. The crux of the issue is that a general-purpose command like cat typically can't know the encoding of its input and can't assume anything about it. In fact, there may not even be an encoding; cat can be used with binary data. The only non-destructive approach then is to copy the binary data straight from the source to the destination with no decoding steps at all, and trust the user to ensure that the destination will be able to accommodate the source encoding. Because Python 3 presents stdin and stdout as text streams however, it makes them more difficult to use with binary data, which is why Armin sets up all that extra code to make sure his file objects are binary. Doesn't this issue also come up wherever bytes are being read ie in sockets, pipe file handles etc? Some sources may have well defined encodings and so allow use of unicode strings but surely not all. I imagine all of the problems associated with a broken encoding promise for stdin can also occur with sockets other sources ie error messages failing to be printable etc etc. Since bytes in Python 3 are not equivalent to the old str (Python 3 bytes != Python 2 str) using bytes everywhere has its own problems. -- Robin Becker -- https://mail.python.org/mailman/listinfo/python-list
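A minimal sketch of the "no decoding steps at all" approach described above. The helper name copy_bytes and the sample payload are made up for illustration; in-memory streams stand in for the real stdin and stdout so the sketch runs anywhere.

```python
import io
import shutil


def copy_bytes(src, dst, chunk_size=64 * 1024):
    """Copy raw bytes from src to dst without any decoding step."""
    shutil.copyfileobj(src, dst, chunk_size)


# The payload is deliberately not valid UTF-8, so a text-mode copy
# would either fail or alter it.
payload = b"caf\xe9 \xff\xfe\x00 binary"
source = io.BytesIO(payload)
sink = io.BytesIO()
copy_bytes(source, sink)
result = sink.getvalue()
```

In a real cat-like script one would presumably pass sys.stdin.buffer and sys.stdout.buffer instead of the BytesIO objects.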
Re: Everything you did not want to know about Unicode in Python 3
On Wed, May 14, 2014 at 9:30 AM, Robin Becker ro...@reportlab.com wrote: Doesn't this issue also come up wherever bytes are being read ie in sockets, pipe file handles etc? Some sources may have well defined encodings and so allow use of unicode strings but surely not all. I imagine all of the problems associated with a broken encoding promise for stdin can also occur with sockets other sources ie error messages failing to be printable etc etc. Since bytes in Python 3 are not equivalent to the old str (Python 3 bytes != Python 2 str) using bytes everywhere has its own problems. Sockets send and receive bytes, and pipes created by the subprocess module are opened in binary mode. Pipes inherited as stdin are still assumed to be unicode, though. -- https://mail.python.org/mailman/listinfo/python-list
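A small sketch of the point about subprocess pipes, assuming a Python 3 interpreter is available as sys.executable. The child command is invented for the demonstration; it writes bytes that are not valid UTF-8, and the parent receives them unchanged as a bytes object because no text mode was requested.

```python
import subprocess
import sys

# The child writes raw bytes to its own stdout via the binary buffer.
child_code = "import sys; sys.stdout.buffer.write(b'\\xff\\x00abc')"

completed = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,   # binary pipe by default
    check=True,
)
output = completed.stdout     # bytes, not str
```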
Re: Everything you did not want to know about Unicode in Python 3
On 5/13/2014 8:53 PM, Ethan Furman wrote: On 05/13/2014 05:10 PM, Steven D'Aprano wrote: On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote: Because Python 3 presents stdin and stdout as text streams however, it makes them more difficult to use with binary data, which is why Armin sets up all that extra code to make sure his file objects are binary. What surprises me is how hard that is. Surely there's a simpler way to open stdin and stdout in binary mode? If not, there ought to be. Somebody already posted this: https://docs.python.org/3/library/sys.html#sys.stdin which talks about .detach(). I sent a message to Armin about this. -- Terry Jan Reedy -- https://mail.python.org/mailman/listinfo/python-list
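A sketch of what .detach() does, shown on an in-memory text stream rather than the real sys.stdin so that it is safe to run. The variable names are illustrative only. After detach() the text wrapper is no longer usable, which is presumably why this is a one-way switch for a standard stream.

```python
import io

raw = io.BytesIO()
text_stream = io.TextIOWrapper(raw, encoding="utf-8")

# detach() separates the text layer and hands back the binary stream
# underneath it.
binary_stream = text_stream.detach()
binary_stream.write(b"\xff\xfe raw bytes")
written = raw.getvalue()
```

Applied to the standard streams, the same call would look like sys.stdin = sys.stdin.detach().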
Re: Everything you did not want to know about Unicode in Python 3
Gene Heskett ghesk...@wdtv.com writes: On Tuesday 13 May 2014 01:39:06 Mark H Harris did opine: QOTW (so far...) But it's early yet, only Tuesday, it's just barely started... :) Says who? For some of us, Tuesday is approaching sunset. (It's always a good day to remind people that the rest of the world exists.) -- “Reality must take precedence over public relations, for nature cannot be fooled.” (Richard P. Feynman, _Rogers' Commission Report into the Challenger Crash_, 1986-06) Ben Finney
Re: Everything you did not want to know about Unicode in Python 3
On Tuesday, May 13, 2014 11:09:06 AM UTC+5:30, Mark H. Harris wrote: On 5/13/14 12:10 AM, Rustom Mody wrote: I think the most helpful way forward is to accept two things: a. Unicode is a headache b. No-unicode is a non-option QOTW(so far...) I said that getting unicode right straight off is unrealistic. I should have added this: Armin makes a (sarcastic?) dig about the fact that Python 3 goofs because it's mismatched with the assumptions of unix. Armin writes: "UNIX is bytes, has been defined that way and will always be that way. To Unicode on UNIX is only madness if you force it on everything. But that's not how Unicode on UNIX works. UNIX does not have a distinction between unicode and byte APIs. They are one and the same which makes them easy to deal with. Python 3 takes a very difference stance on Unicode than UNIX does. Python 3 says: everything is Unicode ..." This may be right... Or it may be the other way round as I claim at http://blog.languager.org/2014/04/unicode-and-unix-assumption.html At this point I don't believe that anyone is very clear what is the right way and the wrong way.
Re: Everything you did not want to know about Unicode in Python 3
On Tue, May 13, 2014 at 4:03 PM, Ben Finney b...@benfinney.id.au wrote: (It's always a good day to remind people that the rest of the world exists.) Ironic that this should come up in a discussion on Unicode, given that Unicode's fundamental purpose is to welcome that whole rest of the world instead of yelling LALALALALA America is everything and pretending that ASCII, or Latin-1, or something, is all you need. ChrisA Currently enjoying Monday Night Flagging on Threshold RPG... at 4pm on Tuesday. -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On 13/05/2014 11:39 AM, Chris Angelico wrote: On Tue, May 13, 2014 at 11:18 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: - have a bytes version of sys.argv (bargv? argvb?) and read the file names from that; argb? :) I tried and failed to come up with an argy bargy joke here so decided to go for a meta-reference instead. -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Tue, May 13, 2014 at 4:25 PM, alex23 wuwe...@gmail.com wrote: On 13/05/2014 11:39 AM, Chris Angelico wrote: On Tue, May 13, 2014 at 11:18 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: - have a bytes version of sys.argv (bargv? argvb?) and read the file names from that; argb? :) I tried and failed to come up with an argy bargy joke here so decided to go for a meta-reference instead. I'm just waiting for someone to have need for arguments in both network byte order and host byte order. The latter, of course, would be argh. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On 5/13/14 1:18 AM, Chris Angelico wrote: instead of yelling LALALALALA America is everything and pretending that ASCII, or Latin-1, or something, is all you need. ... it isn't? LALALALALALALALALA :)) -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On 13 May 2014 01:18:35 GMT, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: - have a simple way to write bytes to stdout and stderr. There is the underlying binary buffer: https://docs.python.org/3/library/sys.html#sys.stdin greg
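A sketch of writing bytes through the underlying binary buffer mentioned above. The helper write_bytes is a made-up name, and an in-memory wrapper stands in for sys.stdout so the result can be inspected; it assumes the stream exposes a .buffer attribute, as the standard text streams normally do.

```python
import io
import sys


def write_bytes(data, stream=None):
    """Write bytes through a text stream's underlying binary buffer."""
    if stream is None:
        stream = sys.stdout
    stream.flush()               # keep ordering with earlier text writes
    stream.buffer.write(data)
    stream.buffer.flush()


raw = io.BytesIO()
demo = io.TextIOWrapper(raw, encoding="utf-8")
demo.write("text first, ")
write_bytes(b"\xff bytes second", demo)
captured = raw.getvalue()
```

Flushing the text layer first matters: otherwise buffered text could land after the bytes that were written later.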
Re: Everything you did not want to know about Unicode in Python 3
On 13.05.2014 03:18, Steven D'Aprano wrote: Armin Ronacher is an extremely experienced and knowledgeable Python developer, and a Python core developer. He might be wrong, but he's not *obviously* wrong. He's correct about file name encodings. Which can be fixed really easily without messing everything up (sys.argv binary variant, open accepting binary filenames). But that he suggests that Go would be superior: Which uses an even simpler model than Python 2: everything is a byte string. The assumed encoding is UTF-8. End of the story. Is just a horrible idea. An obviously horrible idea, too. Having dealt with the UTF-8 problems on Python2 I can safely say that I never, never ever want to go back to that freaky hell. If I deal with strings, I want to be able to sanely manipulate them and I want to be sure that after manipulation they're still valid strings. Manipulating the bytes representation of unicode data just doesn't work. And I'm very very glad that some people felt the same way and implemented a sane, consistent way of dealing with Unicode in Python3. It's one of the reasons why I switched to Py3 very early and I love it. Cheers, Johannes -- Where exactly did you predict the quake again? At least not publicly! Ah, the latest and, to this day, most brilliant stunt of our great cosmologists: the secret prediction. - Karl Kaos on Rüdiger Thomas in dsa hidbv3$om2$1...@speranza.aioe.org
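A small illustration of why manipulating the bytes representation goes wrong, using a made-up sample string. Slicing the str keeps whole code points, while slicing the UTF-8 bytes at the same index cuts a two-byte character in half and leaves something that no longer decodes.

```python
text = "na\u00efve caf\u00e9"        # "naïve café"
encoded = text.encode("utf-8")

text_prefix = text[:3]               # three code points: n, a, ï
byte_prefix = encoded[:3]            # b"na\xc3": only half of ï

try:
    byte_prefix.decode("utf-8")
    byte_slice_valid = True
except UnicodeDecodeError:
    byte_slice_valid = False
```

Note that str slicing works on code points, so it is still possible to split a base letter from a combining mark; the sketch only shows that the result is always a valid string.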
Re: Everything you did not want to know about Unicode in Python 3
Johannes Bauer dfnsonfsdu...@gmx.de: Having dealt with the UTF-8 problems on Python2 I can safely say that I never, never ever want to go back to that freaky hell. If I deal with strings, I want to be able to sanely manipulate them and I want to be sure that after manipulation they're still valid strings. Manipulating the bytes representation of unicode data just doesn't work. Based on my background (network and system programming), I'm a bit suspicious of strings, that is, text. For example, is the stuff that goes to syslog bytes or text? Does an XML file contain bytes or (encoded) text? The answers are not obvious to me. Modern computing is full of ASCII-esque binary communication standards and formats. Python 2's ambiguity allows me not to answer the tough philosophical questions. I'm not saying it's necessarily a good thing, but it has its benefits. Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Tue, May 13, 2014 at 6:25 PM, Marko Rauhamaa ma...@pacujo.net wrote: Johannes Bauer dfnsonfsdu...@gmx.de: Having dealt with the UTF-8 problems on Python2 I can safely say that I never, never ever want to go back to that freaky hell. If I deal with strings, I want to be able to sanely manipulate them and I want to be sure that after manipulation they're still valid strings. Manipulating the bytes representation of unicode data just doesn't work. Based on my background (network and system programming), I'm a bit suspicious of strings, that is, text. For example, is the stuff that goes to syslog bytes or text? Does an XML file contain bytes or (encoded) text? The answers are not obvious to me. Modern computing is full of ASCII-esque binary communication standards and formats. These are problems that Unicode can't solve. In theory, XML should contain text in a known encoding (defaulting to UTF-8). With syslog, it's problematic - I don't remember what it's meant to be, but I know there are issues. Same with other log files. Python 2's ambiguity allows me not to answer the tough philosophical questions. I'm not saying it's necessarily a good thing, but it has its benefits. It's not a good thing. It means that you have the convenience of pretending there's no problem, which means you don't notice trouble until something happens... and then, in all probability, your app is in production and you have no idea why stuff went wrong. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
Chris Angelico ros...@gmail.com: These are problems that Unicode can't solve. I actually think the problem has little to do with Unicode. Text is an abstract data type just like any class. If I have an object (say, a subprocess or a dictionary) in memory, I don't expect the object to have any existence independently of the Python virtual machine. I have the same feeling about Py3 strings: they only exist inside the Python virtual machine. An abstract object like a subprocess or dictionary justifies its existence through its behaviour (its quacking). Now, do strings quack or are they silent? I guess if you are writing a word processor they might quack to you. Otherwise, they are just an esoteric storage format. What I'm saying is that strings definitely have an important application in the human interface. However, I feel strings might be overused in the Py3 API. Case in point: are pathnames bytes objects or strings? The Linux position is that they are bytes objects. Py3 supports both interpretations seemingly throughout: open(b"/bin/ls") vs open("/bin/ls"); os.path.join(b"a", b"b") vs os.path.join("a", "b"). Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Tue, May 13, 2014 at 7:06 PM, Marko Rauhamaa ma...@pacujo.net wrote: Chris Angelico ros...@gmail.com: These are problems that Unicode can't solve. I actually think the problem has little to do with Unicode. Text is an abstract data type just like any class. If I have an object (say, a subprocess or a dictionary) in memory, I don't expect the object to have any existence independently of the Python virtual machine. I have the same feeling about Py3 strings: they only exist inside the Python virtual machine. That's true; the only difference is that text is extremely prevalent. You can share a dict with another program, or store it in a file, or whatever, simply by agreeing on an encoding - for instance, JSON. As long as you and the other program know that this file is JSON encoded, you can write it and he can read it, and you'll get the right data at the far end. It's no different; there are encodings that are easy to handle and have limitations, and there are encodings that are elaborate and have lots of features (XML comes to mind, although technically you can't encode a dict in XML). Case in point: are pathnames bytes objects or strings? The Linux position is that they are bytes objects. Py3 supports both interpretations seemingly throughout: open(b"/bin/ls") vs open("/bin/ls"); os.path.join(b"a", b"b") vs os.path.join("a", "b"). That's a problem that comes from the underlying file systems. If every FS in the world worked with Unicode file names, it would be easy. (Most would encode them onto the platters in UTF-8 or maybe UTF-16; some might choose to use a PEP 393 or Pike string structure, with the size_shift being a file mode just like the 'directory' bit; others might use a limited encoding for legacy reasons, storing uppercased CP437 on the disk, and returning an error if the desired name didn't fit.) But since they don't, we have to cope with that.
What happens if you're running on Linux, and you have a mounted drive from an OS/2 share, and inside that, you access an aliased drive that represents a Windows share, on which you've mounted a remote-backup share? A single path name could have components parsed by each of those systems, so what's its encoding? How do you handle that? There's no solution. (Well, okay. There is a solution: don't do something so stupidly convoluted. But there's no law against cackling admins making circular mounts. In fact, I just mounted my own home directory as a subdirectory under my home directory, via sshfs. I can now encrypt my own file reads and writes exactly as many times as I choose to. I also cackled.) ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On 13.05.2014 10:38, Chris Angelico wrote: Python 2's ambiguity allows me not to answer the tough philosophical questions. I'm not saying it's necessarily a good thing, but it has its benefits. It's not a good thing. It means that you have the convenience of pretending there's no problem, which means you don't notice trouble until something happens... and then, in all probability, your app is in production and you have no idea why stuff went wrong. Exactly. With Py2 strings you never know what encoding they are in, or whether they have already been converted. And it's very well possible to mix already converted strings with other, not yet encoded strings. What a mess! All these issues are avoided by Py3. There is a very clear distinction between strings and string representation (data bytes), which is beautiful. Accidental mixing is not possible. And you have some things *guaranteed* for the string type which aren't guaranteed for the bytes type (for example when doing string manipulation). Regards, Johannes
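A short sketch of the "accidental mixing is not possible" claim, with made-up sample values. In Python 3, concatenating str and bytes raises TypeError instead of silently guessing an encoding; the combination only works once the conversion is spelled out.

```python
text = "caf\u00e9"                   # a str (text)
data = text.encode("utf-8")          # its UTF-8 representation (bytes)

try:
    combined = text + data           # str + bytes is refused in Python 3
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# The explicit version states which encoding is meant.
explicit = text + data.decode("utf-8")
```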
Re: Everything you did not want to know about Unicode in Python 3
On Tue, 13 May 2014 12:06:50 +0300, Marko Rauhamaa wrote: Chris Angelico ros...@gmail.com: These are problems that Unicode can't solve. I actually think the problem has little to do with Unicode. Text is an abstract data type just like any class. If I have an object (say, a subprocess or a dictionary) in memory, I don't expect the object to have any existence independently of the Python virtual machine. I have the same feeling about Py3 strings: they only exist inside the Python virtual machine. And you would be correct. When you write them to a device (say, push them over a network, or write them to a file) they need to be serialized. If you're lucky, you have an API that takes a string and serializes it for you, and then all you have to deal with is: - am I happy with the default encoding? - if not, what encoding do I want? Otherwise you ought to have an API that requires bytes, not strings, and you have to perform your own serialization by encoding it. But abstractions leak, and this abstraction leaks because *right now* there isn't a single serialization for text strings. There are HUNDREDS, and sometimes you don't know which one is being used. [...] What I'm saying is that strings definitely have an important application in the human interface. However, I feel strings might be overused in the Py3 API. Case in point: are pathnames bytes objects or strings? Yes. On POSIX systems, file names are sequences of bytes, with a very few restrictions. On recent Windows file systems (NTFS I believe?), file names are Unicode strings encoded to UTF-16, but with a whole lot of other restrictions imposed by the OS. The Linux position is that they are bytes objects. Py3 supports both interpretations seemingly throughout: open(b"/bin/ls") vs open("/bin/ls"); os.path.join(b"a", b"b") vs os.path.join("a", "b"). Because it has to, otherwise there will be files that are unreachable on one platform or another. -- Steven
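A sketch of the two path interpretations side by side, assuming a POSIX system (on Windows the undecodable byte below would be handled differently). The file name is invented and never touches the disk. os.fsdecode and os.fsencode are the standard-library helpers for moving between the two views; on POSIX they use the surrogateescape error handler, so even a byte that is not valid in the file system encoding survives the round trip.

```python
import os

# Both spellings are accepted, as long as they are not mixed in one call.
str_path = os.path.join("a", "b")
bytes_path = os.path.join(b"a", b"b")

# A file name containing a byte that is not valid UTF-8.
raw_name = b"report-\xff.txt"
as_text = os.fsdecode(raw_name)       # str carrying an escaped surrogate
round_tripped = os.fsencode(as_text)  # back to the original bytes
```

This is roughly how a str-based API can still reach every file on a bytes-based file system.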
Re: Everything you did not want to know about Unicode in Python 3
On 13.05.2014 10:25, Marko Rauhamaa wrote: Based on my background (network and system programming), I'm a bit suspicious of strings, that is, text. For example, is the stuff that goes to syslog bytes or text? Does an XML file contain bytes or (encoded) text? The answers are not obvious to me. Modern computing is full of ASCII-esque binary communication standards and formats. Traditional Unix programs (syslog for example) are notorious for being unclear, ambiguous and/or ignorant of character encodings altogether. And this works, unfortunately, most of the time because many encodings share a common subset. If they didn't, the problems would be VERY apparent and people would be forced to handle the issues less sloppily. Which is the route that Py3 chose. Don't be sloppy, make a great distinction between text (which handles naturally as strings) and its respective encoding. The only people who are angered by this now are people who always treated encodings sloppily and it just worked. Well, there's a good chance it has worked by pure chance so far. It's a good thing that Python does this now more strictly as it gives developers *guarantees* about what they can and cannot do with text datatypes without having to deal with encoding issues in many places. Just one place: The interface where text is read or written, just as it should be. Regards, Johannes
Re: Everything you did not want to know about Unicode in Python 3
Johannes Bauer dfnsonfsdu...@gmx.de: The only people who are angered by this now is people who always treated encodings sloppily and it just worked. Well, there's a good chance it has worked by pure chance so far. It's a good thing that Python does this now more strictly as it gives developers *guarantees* about what they can and cannot do with text datatypes without having to deal with encoding issues in many places. Just one place: The interface where text is read or written, just as it should be. I'm not angered by text. I'm just wondering if it has any practical use that is not misuse... For example, Py3 should not make any pretense that there is a default encoding for strings. Locales are an abhorrent invention from the early 8-bit days. IOW, you should never input or output text without explicit serialization. I get the feeling that Py3 would like to present a world where strings are first-class I/O objects that can exist in files, in filenames, inside pipes. You say, text is read or written. I'm saying text is never read or written. It only exists as an abstraction (not even unicode) inside the virtual machine. Marko
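A sketch of what "explicit serialization" can look like in practice: naming the encoding on every open() call instead of relying on the locale default. The file name and sample text are made up, and a temporary directory keeps the example self-contained.

```python
import os
import tempfile

text = "Copyright \u00a9 2014"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notice.txt")

    # Writing: the str is serialized as UTF-8 regardless of the locale.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # What actually exists on disk is bytes.
    with open(path, "rb") as f:
        on_disk = f.read()

    # Reading: the same encoding is named again to get text back.
    with open(path, "r", encoding="utf-8") as f:
        read_back = f.read()
```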
Re: Everything you did not want to know about Unicode in Python 3
On Tue, 13 May 2014 01:18:35 +, Steven D'Aprano wrote: On Mon, 12 May 2014 17:47:48 +, alister wrote: On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote: This was *NOT* written by our resident unicode expert http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/ Posted as I thought it would make a rather pleasant change from interminable threads about names vs values vs variables vs objects. Surely those example programs are not the pythonic way to do things or am I missing something? Armin Ronacher is an extremely experienced and knowledgeable Python developer, and a Python core developer. He might be wrong, but he's not *obviously* wrong. I am only an amateur Python coder, which is why I asked if I am missing something. I could not see any reason to be using the shutil module if all that the program is doing is opening a file, reading it, then printing it. Is it Python that causes the issue, the shutil module, or just the OS not liking the data it is being sent? An explanation of why this approach is taken would be much appreciated. -- Revenge is a form of nostalgia.
Re: Everything you did not want to know about Unicode in Python 3
In article mailman.9939.1399961928.18130.python-l...@python.org, Chris Angelico ros...@gmail.com wrote: On Tue, May 13, 2014 at 4:03 PM, Ben Finney b...@benfinney.id.au wrote: (It's always a good day to remind people that the rest of the world exists.) Ironic that this should come up in a discussion on Unicode, given that Unicode's fundamental purpose is to welcome that whole rest of the world instead of yelling LALALALALA America is everything and pretending that ASCII, or Latin-1, or something, is all you need. ASCII *is* all I need. The problem is, it's not all that other people need, and I need to interact with those other people. -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On 13/05/2014 09:38, Chris Angelico wrote: It's not a good thing. It means that you have the convenience of pretending there's no problem, which means you don't notice trouble until something happens... and then, in all probability, your app is in production and you have no idea why stuff went wrong. Unless you're (un)lucky enough to be working on IIRC the 1/3 of major IT projects that deliver nothing :) -- Mark Lawrence
Re: Everything you did not want to know about Unicode in Python 3
On Tue, May 13, 2014 at 11:30 PM, Mark Lawrence breamore...@yahoo.co.uk wrote: On 13/05/2014 09:38, Chris Angelico wrote: It's not a good thing. It means that you have the convenience of pretending there's no problem, which means you don't notice trouble until something happens... and then, in all probability, your app is in production and you have no idea why stuff went wrong. Unless you're (un)lucky enough to be working on IIRC the 1/3 of major IT projects that deliver nothing :) Been there, done that. At least, most likely so... there is a chance, albeit slim, that the boss/owner will either discover someone who'll finish the project for him, or find the time to finish it himself. I gather he's looking at ripping all my code out and replacing it with PHP of his own design, which should be fun. On the plus side, that does mean he can get any idiot straight out of a uni course to do the work; much easier than finding someone who knows Python, Pike, bash, and C++. The White King told Alice that cynicism is a disease that can be cured... but it can also be inflicted, and a promising-looking N-year project that collapses because the boss starts getting stupid with code formatting rules and then ends up firing his last remaining competent employee is a pretty effective means of instilling cynicism. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Tue, 13 May 2014 07:20:34 -0400, Roy Smith wrote: ASCII *is* all I need. You've never needed to copyright something? Copyright © Roy Smith 2014... I know some people use (c) instead, but that actually has no legal standing. (Not that any reasonable judge would invalidate a copyright based on a technicality like that, not these days.) Or price something in cents? I suppose the days of the 25¢ steak dinner are long gone, but you might need to sell something for 99¢ a pound... The problem is, it's not all that other people need, and I need to interact with those other people. True, true. -- Steven D'Aprano http://import-that.dreamwidth.org/ -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Tue, May 13, 2014 at 11:39 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: You've never needed to copyright something? Copyright © Roy Smith 2014... I know some people use (c) instead, but that actually has no legal standing. (Not that any reasonable judge would invalidate a copyright based on a technicality like that, not these days.) Copyright Chris Angelico 2014. The full word copyright has legal standing. I tend to stick with that in my README files; staying ASCII makes it that bit safer for random text editors (*cough*Notepad*cough*) that might otherwise misinterpret it (only a bit, though [1]). Or price something in cents? I suppose the days of the 25¢ steak dinner are long gone, but you might need to sell something for 99¢ a pound... $0.99/lb? :) ChrisA [1] https://en.wikipedia.org/wiki/Bush_hid_the_facts -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On 2014-05-13, Chris Angelico ros...@gmail.com wrote: On Tue, May 13, 2014 at 4:03 PM, Ben Finney b...@benfinney.id.au wrote: (It's always a good day to remind people that the rest of the world exists.) Ironic that this should come up in a discussion on Unicode, given that Unicode's fundamental purpose is to welcome that whole rest of the world instead of yelling LALALALALA America is everything and pretending that ASCII, or Latin-1, or something, is all you need. Well, strictly speaking, ASCII or Latin-1 _is_ all I need. I will however admit to the existence of other people who might need something else... -- Grant Edwards
Re: Everything you did not want to know about Unicode in Python 3
On 2014-05-13, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Tue, 13 May 2014 07:20:34 -0400, Roy Smith wrote: ASCII *is* all I need. You've never needed to copyright something? Copyright © Roy Smith 2014... Bah. You don't need the little copyright symbol at all. The statement without the symbol has the exact same legal weight. -- Grant Edwards
Re: Everything you did not want to know about Unicode in Python 3
On Tue, May 13, 2014 at 3:38 AM, Chris Angelico ros...@gmail.com wrote: Python 2's ambiguity allows me not to answer the tough philosophical questions. I'm not saying it's necessarily a good thing, but it has its benefits. It's not a good thing. It means that you have the convenience of pretending there's no problem, which means you don't notice trouble until something happens... and then, in all probability, your app is in production and you have no idea why stuff went wrong. BITD, when I still maintained and developed Musi-Cal (an early online concert calendar, long since gone), I faced a challenge when I first started encountering non-ASCII band names and cities. I resisted UTF-8. After all, if I printed a string containing an é, it came out looking like What kind of mess was that??? I tried to ignore it, or assume Latin-1 would cover all the bases (my first non-ASCII inputs tended to come from Western Europe). If nothing else, at least é was legible. Needless to say, those approaches didn't work well. After perhaps six months or a year, I broke down and started converting everything coming in or going out to UTF-8 at the boundaries of my system (making educated guesses at input encodings if necessary). My life got a whole lot easier after that. The distinction between bytes and text didn't really matter much, certainly not compared to the mess I had before where strings of unknown data leaked into my system and its database. Skip P.S. My apologies for the mess this message probably is. Amazing as it may seem, Gmail in Chrome does a crappy job editing anything other than plain text. Also, I'm surprised in this day and age that common tools like Gnome Terminal have little or no encoding support. I wound up having to pop up urxvt to get an encodings-flexible terminal emulator... -- https://mail.python.org/mailman/listinfo/python-list
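A sketch of the "convert at the boundaries" approach described above, not Skip's actual code. The helper names to_text and to_bytes, the guess order, and the sample city name are all assumptions made for illustration. Incoming bytes are decoded once on the way in (trying UTF-8 first, then falling back to Latin-1 as an educated guess), everything inside works on str, and output is encoded once on the way out. The last lines reproduce the classic symptom of reading UTF-8 as Latin-1.

```python
def to_text(raw, guesses=("utf-8", "latin-1")):
    """Decode incoming bytes, trying each guessed encoding in order."""
    for encoding in guesses:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the guessed encodings fit")


def to_bytes(text):
    """Encode outgoing text as UTF-8 at the edge of the system."""
    return text.encode("utf-8")


city = "Z\u00fcrich"
from_utf8_client = to_text(city.encode("utf-8"))
from_latin1_client = to_text(city.encode("latin-1"))
outgoing = to_bytes(from_latin1_client)

# UTF-8 bytes misread as Latin-1: one character turns into two.
mojibake = "\u00e9".encode("utf-8").decode("latin-1")
```

Because Latin-1 can decode any byte sequence, it only makes sense as the last guess; a wrong guess still yields text, just not the intended text.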
Re: Everything you did not want to know about Unicode in Python 3
On Tuesday, May 13, 2014 7:13:47 PM UTC+5:30, Chris Angelico wrote: On Tue, May 13, 2014 at 11:39 PM, Steven D'Aprano Or price something in cents? I suppose the days of the 25¢ steak dinner are long gone, but you might need to sell something for 99¢ a pound... $0.99/lb? :) Dollars Zeros Slashes Question marks Smileys... Just alphabets is enough I think... Come to think of it why have anything other than zeros and ones? -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Wed, May 14, 2014 at 12:30 AM, Rustom Mody rustompm...@gmail.com wrote: Come to think of it why have anything other than zeros and ones? Obligatory: http://xkcd.com/257/ ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Tue, 13 May 2014 13:51:20 +, Grant Edwards wrote: On 2014-05-13, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Tue, 13 May 2014 07:20:34 -0400, Roy Smith wrote: ASCII *is* all I need. You've never needed to copyright something? Copyright © Roy Smith 2014... Bah. You don't need the little copyright symbol at all. The statement without the symbol has the exact same legal weight. You do not need any statement at all; copyright is automatically assigned to anything you create (at least that is the case in UK law), although proving the creation date may be difficult. -- Depends on how you define always. :-) -- Larry Wall in 199710211647.jaa17...@wall.org -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On 2014-05-13, alister alister.nospam.w...@ntlworld.com wrote: On Tue, 13 May 2014 13:51:20 +, Grant Edwards wrote: On 2014-05-13, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Tue, 13 May 2014 07:20:34 -0400, Roy Smith wrote: ASCII *is* all I need. You've never needed to copyright something? Copyright © Roy Smith 2014... Bah. You don't need the little copyright symbol at all. The statement without the symbol has the exact same legal weight. You do not need any statement at all; copyright is automatically assigned to anything you create (at least that is the case in UK law), although proving the creation date may be difficult. Yep, it's the same in the US. -- Grant Edwards grant.b.edwards at gmail.com Yow! Hello. Just walk along and try NOT to think about your INTESTINES being almost FORTY YARDS LONG!! -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Tue, May 13, 2014 at 5:19 AM, alister alister.nospam.w...@ntlworld.com wrote: I am only an amateur Python coder, which is why I asked if I am missing something. I could not see any reason to be using the shutil module if all that the program is doing is opening a file, reading it, then printing it. Is it Python that causes the issue, the shutil module, or just the OS not liking the data it is being sent? An explanation of why this approach is taken would be much appreciated. No, that part is perfectly fine. This is exactly what the shutil module is meant for: providing shell-like operations. Although in this case the copyfileobj function is quite simple (have yourself a look at the source -- it just reads from one file and writes to the other in a loop), in general the Pythonic thing is to avoid reinventing the wheel. And since it's so simple, it shouldn't be hard to see that the use of the shutil module has nothing to do with the Unicode woes here. The crux of the issue is that a general-purpose command like cat typically can't know the encoding of its input and can't assume anything about it. In fact, there may not even be an encoding; cat can be used with binary data. The only non-destructive approach then is to copy the binary data straight from the source to the destination with no decoding steps at all, and trust the user to ensure that the destination will be able to accommodate the source encoding. Because Python 3 presents stdin and stdout as text streams however, it makes them more difficult to use with binary data, which is why Armin sets up all that extra code to make sure his file objects are binary. -- https://mail.python.org/mailman/listinfo/python-list
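The "copy the bytes straight through, no decoding" approach Ian describes can be sketched in a few lines. This is only an illustration under stated assumptions, not Armin's script: the function names `cat` and `main` are made up for the example, and it relies on `sys.stdin.buffer` and `sys.stdout.buffer` being available, which holds for the normal interpreter streams but not necessarily when they have been replaced.

```python
import shutil
import sys


def cat(paths, out):
    """Copy each named file to `out` as raw bytes; "-" means standard input."""
    for path in paths:
        if path == "-":
            shutil.copyfileobj(sys.stdin.buffer, out)
        else:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def main(argv):
    # sys.stdout.buffer is the binary stream underneath the text-mode stdout.
    cat(argv[1:] or ["-"], sys.stdout.buffer)
    sys.stdout.buffer.flush()
```

Calling `main(sys.argv)` would make this behave like a bare-bones cat. Because nothing is ever decoded, the content may be in any encoding or none at all.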
Re: Everything you did not want to know about Unicode in Python 3
On Tue, 13 May 2014 14:42:51 +, alister wrote: On Tue, 13 May 2014 13:51:20 +, Grant Edwards wrote: On 2014-05-13, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Tue, 13 May 2014 07:20:34 -0400, Roy Smith wrote: ASCII *is* all I need. You've never needed to copyright something? Copyright © Roy Smith 2014... Bah. You don't need the little copyright symbol at all. The statement without the symbol has the exact same legal weight. You do not need any statement at all; copyright is automatically assigned to anything you create (at least that is the case in UK law), although proving the creation date may be difficult. (1) In my lifetime, that wasn't always the case. Up until the 1970s or thereabouts, you had to explicitly register anything you wanted copyrighted, a much more sensible system which weeded out the meaningless copyrights on economically worthless content. If we still had that system, orphan works would be a lesser problem. With the current system, all of us here are technically violating copyright every time we reply to an email and quote more than a small percentage of it. Not to mention all the mirror sites that violate copyright by mirroring our posts in their entirety without permission. (Author's moral rights not to be misquoted or plagiarised are a different kettle of fish separate from their ownership rights over the work. That should be automatic.) (2) You don't have to just prove copyright. You also have to *identify* who the work is copyrighted by, and it needs to be an identifiable legal person (actual person or corporation), not necessarily the author. In the absence of a statement otherwise, copyright is assumed to be held by the author, but that's not always the case -- it might be a work for hire, or copyright might have been transferred to another person or entity. Or the author is unidentifiable.
Hence the orphan work problem: it's presumed to be copyrighted, but since nobody knows who owns the copyright, there's no way to get permission to copy that work. It might as well be lost, even when the original is sitting right there in front of you mouldering away. -- Steven D'Aprano http://import-that.dreamwidth.org/ -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Wed, May 14, 2014 at 9:53 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: With the current system, all of us here are technically violating copyright every time we reply to an email and quote more than a small percentage of it. Oh wow... so when someone quotes heaps of text without trimming, and adding blank lines, we can complain that it's a copyright violation - reproducing our work with unauthorized modifications and without permission... I never thought of it like that. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote: Because Python 3 presents stdin and stdout as text streams however, it makes them more difficult to use with binary data, which is why Armin sets up all that extra code to make sure his file objects are binary. What surprises me is how hard that is. Surely there's a simpler way to open stdin and stdout in binary mode? If not, there ought to be. -- Steven D'Aprano http://import-that.dreamwidth.org/ -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On 05/13/2014 05:10 PM, Steven D'Aprano wrote: On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote: Because Python 3 presents stdin and stdout as text streams however, it makes them more difficult to use with binary data, which is why Armin sets up all that extra code to make sure his file objects are binary. What surprises me is how hard that is. Surely there's a simpler way to open stdin and stdout in binary mode? If not, there ought to be. Somebody already posted this: https://docs.python.org/3/library/sys.html#sys.stdin which talks about .detach(). -- ~Ethan~ -- https://mail.python.org/mailman/listinfo/python-list
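What `.detach()` does can be seen on a stand-in stream. The sketch below uses an `io.TextIOWrapper` over an in-memory buffer in place of the real `sys.stdin`, purely so that it can be run without touching the interpreter's own streams; the variable names are illustrative.

```python
import io

# Stand-in for sys.stdin: a text wrapper over a binary buffer.
text_stream = io.TextIOWrapper(io.BytesIO(b"caf\xe9\n"), encoding="utf-8")

# detach() hands back the underlying binary buffer ...
binary_stream = text_stream.detach()
data = binary_stream.read()  # raw bytes, no decoding attempted

# ... and leaves the text wrapper unusable afterwards.
try:
    text_stream.read()
    wrapper_still_usable = True
except ValueError:
    wrapper_still_usable = False
```

For the real streams, `sys.stdin.buffer` exposes the same binary object without disabling the text wrapper, which may be the gentler option when other code still expects text on `sys.stdin`.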
Re: Everything you did not want to know about Unicode in Python 3
On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote: This was *NOT* written by our resident unicode expert http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/ Posted as I thought it would make a rather pleasant change from interminable threads about names vs values vs variables vs objects. Surely those example programs are not the Pythonic way to do things, or am I missing something? If those code samples are anything to go by, this guy makes JMF look sensible. -- The Heineken Uncertainty Principle: You can never be sure how many beers you had last night. -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Mon, May 12, 2014 at 11:47 AM, alister alister.nospam.w...@ntlworld.com wrote: On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote: This was *NOT* written by our resident unicode expert http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/ Posted as I thought it would make a rather pleasant change from interminable threads about names vs values vs variables vs objects. Surely those example programs are not the Pythonic way to do things, or am I missing something? The _is_binary_reader and _is_binary_writer functions look like they could be simplified by calling isinstance on the io object itself against io.TextIOBase, io.BufferedIOBase or io.RawIOBase, rather than doing those odd 0-length reads and writes. And then perhaps those exception-swallowing try-excepts wouldn't be necessary. But perhaps there's a non-obvious reason why it's written the way it is. And there appears to be a bug where everything *except* the filename '-' is treated as stdin, so the script probably hasn't been tested at all. If those code samples are anything to go by, this guy makes JMF look sensible. This is an ad hominem. Just because his code sucks doesn't mean he's wrong about the state of Unicode and UNIX in Python 3. -- https://mail.python.org/mailman/listinfo/python-list
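The isinstance-based check Ian suggests might look like the sketch below. The helper names `is_binary` and `is_text` are made up for the example and are not the functions from Armin's post.

```python
import io


def is_binary(stream):
    """True if the stream is one of the io module's binary stream types."""
    return isinstance(stream, (io.RawIOBase, io.BufferedIOBase))


def is_text(stream):
    """True if the stream is one of the io module's text stream types."""
    return isinstance(stream, io.TextIOBase)
```

One caveat, which may or may not be the non-obvious reason Ian wonders about: these checks only recognise objects that inherit from (or are registered with) the `io` base classes. A duck-typed file-like object that merely provides `read()` or `write()` would be classified as neither.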
Re: Everything you did not want to know about Unicode in Python 3
On 2014-05-12 19:31, Ian Kelly wrote: On Mon, May 12, 2014 at 11:47 AM, alister alister.nospam.w...@ntlworld.com wrote: On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote: This was *NOT* written by our resident unicode expert http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/ Posted as I thought it would make a rather pleasant change from interminable threads about names vs values vs variables vs objects. Surely those example programs are not the Pythonic way to do things, or am I missing something? The _is_binary_reader and _is_binary_writer functions look like they could be simplified by calling isinstance on the io object itself against io.TextIOBase, io.BufferedIOBase or io.RawIOBase, rather than doing those odd 0-length reads and writes. And then perhaps those exception-swallowing try-excepts wouldn't be necessary. But perhaps there's a non-obvious reason why it's written the way it is. How about checking sys.stdin.mode and sys.stdout.mode? And there appears to be a bug where everything *except* the filename '-' is treated as stdin, so the script probably hasn't been tested at all. If those code samples are anything to go by, this guy makes JMF look sensible. This is an ad hominem. Just because his code sucks doesn't mean he's wrong about the state of Unicode and UNIX in Python 3. -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Mon, May 12, 2014 at 1:42 PM, MRAB pyt...@mrabarnett.plus.com wrote: How about checking sys.stdin.mode and sys.stdout.mode? Seems to work, but I notice that the docs only define the mode attribute for the FileIO class, which sys.stdin and sys.stdout are not instances of. -- https://mail.python.org/mailman/listinfo/python-list
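MRAB's mode check, together with Ian's reservation about it, can be illustrated like this. The helper name `mode_says_binary` is hypothetical; it returns None when a stream has no usable `mode`, so that "text" and "cannot tell" stay distinguishable.

```python
def mode_says_binary(stream):
    """Guess binary-ness from a `mode` attribute, if the stream has one.

    Returns None when there is no string `mode` to inspect, so callers
    can tell "text" apart from "cannot say".
    """
    mode = getattr(stream, "mode", None)
    if not isinstance(mode, str):
        return None
    return "b" in mode
```

Files returned by `open()` carry a `mode`, but in-memory streams such as `io.BytesIO` do not, which is consistent with Ian's point that the attribute is not something every stream type promises.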
Re: Everything you did not want to know about Unicode in Python 3
On Tue, May 13, 2014 at 4:31 AM, Ian Kelly ian.g.ke...@gmail.com wrote: Just because his code sucks doesn't mean he's wrong about the state of Unicode and UNIX in Python 3. Uhm... I think wrongness of code is generally fairly indicative of wrongness of thinking :) If I write a rant about how Python's list type sucks and it turns out my code is using it like a cons cell and never putting more than two elements into a list, then you would accurately conclude that I'm wrong about the state of data type support in Python. I don't have a problem with someone coming to the list here with misconceptions. That's what discussions are for. But rants like that, on blogs, I quickly get weary of reading. The tone is always "Look what's so wrong", not inviting dialogue, and I can't be bothered digging into the details to compose a full response. Chances are the author's (a) not looking at 3.4 and what's happened to improve things (and certainly not 3.5 and what's going to happen), and (b) not listening to responses anyway. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Mon, 12 May 2014 17:47:48 +, alister wrote: On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote: This was *NOT* written by our resident unicode expert http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/ Posted as I thought it would make a rather pleasant change from interminable threads about names vs values vs variables vs objects. Surely those example programs are not the Pythonic way to do things, or am I missing something? Feel free to show us your version of cat for Python then. Feel free to target any version you like. Don't forget to test it against files with names and content that: - aren't valid UTF-8; - are valid UTF-8, but not valid in the local encoding. If those code samples are anything to go by, this guy makes JMF look sensible. Armin Ronacher is an extremely experienced and knowledgeable Python developer, and a Python core developer. He might be wrong, but he's not *obviously* wrong. Unicode is hard, not because Unicode is hard, but because of legacy problems. I can create a file on a machine that uses ISO-8859-7 for the file name, put Shift-JIS encoded text inside it, transfer it to a machine that uses Windows-1251 as the file system encoding, then SSH into that machine from a system using Big5, and try to make sense of it. If everybody used UTF-8 any time data touched a disk or network, we'd be laughing. It would all be so simple. Reading Armin's post, I think that all that is needed to simplify his Python 3 version is: - have a bytes version of sys.argv (bargv? argvb?) and read the file names from that; - have a simple way to write bytes to stdout and stderr. Most programs won't need either of those, but file system utilities will. -- Steven D'Aprano http://import-that.dreamwidth.org/ -- https://mail.python.org/mailman/listinfo/python-list
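Something close to the bytes version of sys.argv that Steven asks for can already be approximated on POSIX systems. There, Python 3 decodes the command line with the filesystem encoding and the surrogateescape error handler, and `os.fsencode` reverses that step. The sketch below assumes a POSIX platform; `argv_bytes` is a hypothetical name, not an existing API.

```python
import os
import sys


def argv_bytes(argv=None):
    """Return command-line arguments as bytes (hypothetical helper).

    Assumes POSIX, where undecodable bytes in sys.argv are carried as
    lone surrogates and os.fsencode() turns them back into the original
    bytes.
    """
    if argv is None:
        argv = sys.argv
    return [os.fsencode(arg) for arg in argv]
```

The second item on Steven's list, writing bytes to stdout and stderr, is what `sys.stdout.buffer.write(...)` and `sys.stderr.buffer.write(...)` provide when the standard streams have not been replaced.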
Re: Everything you did not want to know about Unicode in Python 3
On Tue, May 13, 2014 at 11:18 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Reading Armin's post, I think that all that is needed to simplify his Python 3 version is: - have a bytes version of sys.argv (bargv? argvb?) and read the file names from that; argb? :) - have a simple way to write bytes to stdout and stderr. I'm not sure how that goes with I/O redirection, but sure. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On 5/12/14 8:18 PM, Steven D'Aprano wrote: Unicode is hard, not because Unicode is hard, but because of legacy problems. Yes. To put a finer point on that, Unicode (which is only a specification constantly being improved upon) is harder to implement when it hasn't been on the design board from the ground up; Python in this case. Julia has Unicode support from the ground up, and it was easier for those guys to implement (in beta release) than for the Python crew when they undertook the Unicode work that had to be done for Python3.x (just an observation). Anytime there are legacy code issues, regression testing problems, and a host of domain issues that weren't thought through from the get-go there are going to be more problematic hurdles; not to mention bugs. Having said that, I still think Unicode is somewhat harder than you're admitting. marcus -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On 13/05/2014 02:18, Steven D'Aprano wrote: On Mon, 12 May 2014 17:47:48 +, alister wrote: On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote: This was *NOT* written by our resident unicode expert http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/ Posted as I thought it would make a rather pleasant change from interminable threads about names vs values vs variables vs objects. Surely those example programs are not the Pythonic way to do things, or am I missing something? Feel free to show us your version of cat for Python then. Feel free to target any version you like. Don't forget to test it against files with names and content that: - aren't valid UTF-8; - are valid UTF-8, but not valid in the local encoding. If those code samples are anything to go by, this guy makes JMF look sensible. Armin Ronacher is an extremely experienced and knowledgeable Python developer, and a Python core developer. He might be wrong, but he's not *obviously* wrong. Unicode is hard, not because Unicode is hard, but because of legacy problems. I can create a file on a machine that uses ISO-8859-7 for the file name, put Shift-JIS encoded text inside it, transfer it to a machine that uses Windows-1251 as the file system encoding, then SSH into that machine from a system using Big5, and try to make sense of it. If everybody used UTF-8 any time data touched a disk or network, we'd be laughing. It would all be so simple. Reading Armin's post, I think that all that is needed to simplify his Python 3 version is: - have a bytes version of sys.argv (bargv? argvb?) and read the file names from that; - have a simple way to write bytes to stdout and stderr. Most programs won't need either of those, but file system utilities will. I think http://bugs.python.org/issue8776 and http://bugs.python.org/issue8775 are relevant but both were placed in the small round filing cabinet. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.
Mark Lawrence -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Tuesday, May 13, 2014 6:48:35 AM UTC+5:30, Steven D'Aprano wrote: On Mon, 12 May 2014 17:47:48 +, alister wrote: Surely those example programs are not the Pythonic way to do things, or am I missing something? Feel free to show us your version of cat for Python then. Feel free to target any version you like. Don't forget to test it against files with names and content that: - aren't valid UTF-8; - are valid UTF-8, but not valid in the local encoding. Thanks for a non-defensive appraisal! If those code samples are anything to go by, this guy makes JMF look sensible. Armin Ronacher is an extremely experienced and knowledgeable Python developer, and a Python core developer. He might be wrong, but he's not *obviously* wrong. Unicode is hard, not because Unicode is hard, but because of legacy problems. I can create a file on a machine that uses ISO-8859-7 for the file name, put Shift-JIS encoded text inside it, transfer it to a machine that uses Windows-1251 as the file system encoding, then SSH into that machine from a system using Big5, and try to make sense of it. If everybody used UTF-8 any time data touched a disk or network, we'd be laughing. It would all be so simple. I think the most helpful way forward is to accept two things: a. Unicode is a headache b. No-unicode is a non-option Reading Armin's post, I think that all that is needed to simplify his Python 3 version is: - have a bytes version of sys.argv (bargv? argvb?) and read the file names from that; - have a simple way to write bytes to stdout and stderr. Most programs won't need either of those, but file system utilities will.
About the technical merits of Armin's post and your suggestions, I've nothing to say, since I am an ignoramus on (the mechanics of) Unicode [Consider me an eager, early, ignorant adopter :-) ] It is, however, good to note that Unicode is rather unique in the history not just of IT/CS but of humanity, in the sense that no one (to the best of my knowledge) has ever tried to come up with an all-encompassing umbrella for all humanity's scripts/writing systems etc. So hiccups and mistakes are only to be expected. The absence of these would be much more surprising! -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On 5/13/14 12:10 AM, Rustom Mody wrote: I think the most helpful way forward is to accept two things: a. Unicode is a headache b. No-unicode is a non-option QOTW(so far...) -- https://mail.python.org/mailman/listinfo/python-list
Re: Everything you did not want to know about Unicode in Python 3
On Tuesday 13 May 2014 01:39:06 Mark H Harris did opine And Gene did reply: On 5/13/14 12:10 AM, Rustom Mody wrote: I think the most helpful way forward is to accept two things: a. Unicode is a headache b. No-unicode is a non-option QOTW(so far...) But its early yet, only Tuesday its just barely started... :) Cheers, Gene -- There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. -Ed Howdershelt (Author) Genes Web page http://geneslinuxbox.net:6309/gene US V Castleman, SCOTUS, Mar 2014 is grounds for Impeaching SCOTUS -- https://mail.python.org/mailman/listinfo/python-list