Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-08 Thread Lennart Regebro
It seems to me that when opening a file, the following is the only flow that makes sense for the typical case:

    if encoding is not None: use encoding
    elif file has BOM: use BOM
    else: use system default

And hence an encoding='BOM' isn't needed there. Although I'm trying to
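A minimal sketch of the flow described in this message, assuming a hypothetical helper named pick_encoding (it is not part of open() or of any patch in this thread). The BOM constants come from the standard codecs module; the order of the table is an illustrative choice.

```python
import codecs
import locale

# UTF-32 entries come first because the UTF-32 LE BOM begins with the
# same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def bom_encoding(head):
    """Return a codec name if `head` starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


def pick_encoding(path, encoding=None):
    """Hypothetical helper: explicit encoding, then BOM, then system default."""
    if encoding is not None:
        return encoding                      # caller's choice always wins
    with open(path, "rb") as f:
        head = f.read(4)                     # longest BOM is 4 bytes
    return bom_encoding(head) or locale.getpreferredencoding(False)
```

The codec names returned here ("utf-8-sig", "utf-16", "utf-32") consume the BOM themselves when decoding. A UTF-16 LE file whose first character is U+0000 would be mistaken for UTF-32 by this table, so treat it as a heuristic only.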

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Nick Coghlan
MRAB wrote: > Maybe there should also be a way of determining what encoding it decided > it was, so that you can then write a new file in that same encoding. I thought of that question as well - the f.encoding attribute on the opened file should be sufficient. Cheers, Nick.
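As a rough illustration of the f.encoding idea, the sketch below reads a file and writes a copy using whatever encoding the reader reports. The function name copy_same_encoding is an assumption made for this example. Note that with an explicit encoding, as here, f.encoding simply reflects what was passed in; the suggestion in the thread is that under BOM detection the same attribute would report the detected encoding.

```python
def copy_same_encoding(src, dst, encoding):
    """Hypothetical helper: copy text from src to dst, reusing the
    encoding reported by the opened source file."""
    with open(src, encoding=encoding) as fin:
        used = fin.encoding          # attribute on the text file object
        text = fin.read()
    with open(dst, "w", encoding=used) as fout:
        fout.write(text)
    return used
```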

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-08 Thread Glenn Linderman
On approximately 1/8/2010 5:12 PM, came the following characters from the keyboard of MRAB: Glenn Linderman wrote: On approximately 1/8/2010 3:59 PM, came the following characters from the keyboard of Victor Stinner: Hi, Thanks for all the answers! I will try to sum up all ideas here. One co

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-08 Thread Martin v. Löwis
>>> Antoine would like to check BOM by default, because both options >>> (system locale vs checking for BOM) are the same thing. >>> >> To be clear, I am not saying it is the same thing. What I think is >> that it would be a mistake to use a mildly unreliable heuristic by >> default (the locale +

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-08 Thread MRAB
Glenn Linderman wrote: On approximately 1/8/2010 3:59 PM, came the following characters from the keyboard of Victor Stinner: Hi, Thanks for all the answers! I will try to sum up all ideas here. One concern I have with this implementation encoding="BOM" is that if there is no BOM it assumes U

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-08 Thread Glenn Linderman
On approximately 1/8/2010 3:59 PM, came the following characters from the keyboard of Victor Stinner: Hi, Thanks for all the answers! I will try to sum up all ideas here. One concern I have with this implementation encoding="BOM" is that if there is no BOM it assumes UTF-8. That is probably

Re: [Python-Dev] --enabled-shared broken on freebsd5?

2010-01-08 Thread Floris Bruynooghe
On Fri, Jan 08, 2010 at 10:11:51AM +0100, "Martin v. Löwis" wrote: > Nicholas Bastin wrote: > > I think this problem probably needs to move over to distutils-sig, as > > it doesn't seem to be specific to the way that Python itself uses > > distutils. > > I'm fairly skeptical that anybody on distut

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-08 Thread Michael Foord
On 09/01/2010 00:09, Antoine Pitrou wrote: Hello Victor, Victor Stinner writes: (1) Change default open() behaviour or make it optional? [...] Antoine would like to check BOM by default, because both options (system locale vs checking for BOM) are the same thing

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-08 Thread Antoine Pitrou
Hello Victor, Victor Stinner writes: > > (1) Change default open() behaviour or make it optional? > [...] > > Antoine would like to check BOM by default, because both options (system > locale vs checking for BOM) are the same thing. To be clear, I am not saying it is the same t

[Python-Dev] Quick sum up about open() + BOM

2010-01-08 Thread Victor Stinner
Hi, Thanks for all the answers! I will try to sum up all ideas here. (1) Change default open() behaviour or make it optional? Guido would like to add an option and keep open() unchanged. He wrote that checking for BOM and using system locale are too different to be the same option (encod

Re: [Python-Dev] relation between Python.asdl and Tools/compiler/ast.txt

2010-01-08 Thread Martin v. Löwis
> I see. So if people want to analyze python code they have to use > other tools (like rope?) ? or use reflection ? Correct. One such tool might be the true Python compiler, along with the _ast module. Regards, Martin

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Georg Brandl
On 08.01.2010 22:14, Tres Seaver wrote: >> FWIW, I'm personally in favor of using the UTF-8 signature. If people >> consider them crazy talk, that may be because UTF-8 can't possibly have >> a byte order - hence I call it a signature, not the BOM. As a signature, >> I don't consider it crazy at

Re: [Python-Dev] relation between Python.asdl and Tools/compiler/ast.txt

2010-01-08 Thread Yoann Padioleau
On Jan 7, 2010, at 1:16 PM, Martin v. Löwis wrote: >>> astgen.py is not used to process asdl files; ast.txt lives right >>> next to astgen.py. Instead, the asdl file is processed by >>> Parser/asdl_c.py. >> >> Yes, I know that. That's why I asked about the relation between >> ast.txt and Python.

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Victor Stinner
On Friday 08 January 2010 22:40:47, Eric Smith wrote: > >> Shouldn't this encoding guessing be a separate function that you call > >> on either a file or a seekable stream ? > >> > >> After all, detecting encodings is just as useful to have for non-file > >> streams. > > > > Other stream sourc

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Tres Seaver
Eric Smith wrote: >>> Shouldn't this encoding guessing be a separate function that you call >>> on either a file or a seekable stream ? >>> >>> After all, detecting encodings is just as useful to have for non-file >>> streams. >> Other stream sources t

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread M.-A. Lemburg
Tres Seaver wrote: > M.-A. Lemburg wrote: > >> Shouldn't this encoding guessing be a separate function that you call >> on either a file or a seekable stream ? > >> After all, detecting encodings is just as useful to have for non-file >> streams. > > Other stream sources typically have out-of-ba

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread James Y Knight
On Jan 8, 2010, at 4:14 PM, Tres Seaver wrote: I understood this proposal as a general processing guideline, not something the io library should do (but, say, a text editor). FWIW, I'm personally in favor of using the UTF-8 signature. If people consider them crazy talk, that may be because UTF-8

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Eric Smith
>> Shouldn't this encoding guessing be a separate function that you call >> on either a file or a seekable stream ? >> >> After all, detecting encodings is just as useful to have for non-file >> streams. > > Other stream sources typically have out-of-band ways to signal the > encoding: only when r

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Tres Seaver
Martin v. Löwis wrote: >>> It *is* crazy, but unfortunately rather common. Wikipedia has a good >>> description of the issues: >>> . Basically, some >>> Windows text APIs will emit a UTF-8 "BOM" in

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Tres Seaver
M.-A. Lemburg wrote: > Shouldn't this encoding guessing be a separate function that you call > on either a file or a seekable stream ? > > After all, detecting encodings is just as useful to have for non-file > streams. Other stream sources typicall

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Tres Seaver
Guido van Rossum wrote: > On Thu, Jan 7, 2010 at 10:12 PM, Tres Seaver wrote: >> The BOM should not be seekable if the file is opened with the proposed >> "guess encoding from BOM" mode: it isn't properly part of the stream at >> all in that case. >

[Python-Dev] Summary of Python tracker Issues

2010-01-08 Thread Python tracker
ACTIVITY SUMMARY (01/01/10 - 01/08/10) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 2544 open (+27) / 16937 closed (+15) / 19481 total (+42) Open issues with patches: 1017 Average

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread MRAB
Victor Stinner wrote: On Friday 08 January 2010 05:21:04, Guido van Rossum wrote: (...) (And yes, I know this happens. Doesn't mean we need to auto-guess by default; there are lots of issues e.g. what should happen after seeking to offset 0?) I wrote a new version of my patch (version 3):

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Antoine Pitrou
Guido van Rossum writes: > > On Thu, Jan 7, 2010 at 10:12 PM, Tres Seaver wrote: > > The BOM should not be seekable if the file is opened with the proposed > > "guess encoding from BOM" mode: it isn't properly part of the stream at > > all in that case. > > This fee

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread M.-A. Lemburg
Guido van Rossum wrote: > On Fri, Jan 8, 2010 at 6:34 AM, Antoine Pitrou wrote: >> Victor Stinner writes: >>> >>> I wrote a new version of my patch (version 3): >>> >>> * don't change the default behaviour: use open(filename, encoding="BOM") to >>> check the BOM if there is any >>

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Antoine Pitrou
Guido van Rossum writes: > > > Well, I think if we implement this the default behaviour *should* be > > changed. > > It looks a bit senseless to have two different "auto-choose" options, one with > > encoding=None and one with encoding="BOM". > > Well there *are* two different auto

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Guido van Rossum
On Thu, Jan 7, 2010 at 10:12 PM, Tres Seaver wrote: > The BOM should not be seekable if the file is opened with the proposed > "guess encoding from BOM" mode: it isn't properly part of the stream at > all in that case. This feels about right to me. There are still questions though: immediately

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Guido van Rossum
On Fri, Jan 8, 2010 at 1:05 AM, "Martin v. Löwis" wrote: >>> It *is* crazy, but unfortunately rather common.  Wikipedia has a good >>> description of the issues: >>> .  Basically, some >>> Windows text APIs will emit a UTF-8 "BOM" in order to ide

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Guido van Rossum
On Fri, Jan 8, 2010 at 6:34 AM, Antoine Pitrou wrote: > Victor Stinner writes: >> >> I wrote a new version of my patch (version 3): >> >> * don't change the default behaviour: use open(filename, encoding="BOM") to >> check the BOM if there is any > > Well, I think if we implement

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Guido van Rossum
On Thu, Jan 7, 2010 at 11:55 PM, Glyph Lefkowitz wrote: > I'm saying that the BOM itself isn't enough to detect that the file is > actually UTF-8. And I'm saying that it is, with as much certainty as we can ever guess the encoding of a file. > If (for whatever reason: explicitly specified, gues

Re: [Python-Dev] GIL required for _all_ Python calls?

2010-01-08 Thread Guido van Rossum
On Fri, Jan 8, 2010 at 6:27 AM, Antoine Pitrou wrote: > On Thu, 07 Jan 2010 22:11:36 +0100, Martin v. Löwis wrote: >> >> Even if we do use the new API, and correctly, it still might be >> confusing if the contents of the buffer change underneath. > > Well, no more confusing than when you compu

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Antoine Pitrou
Victor Stinner writes: > > I wrote a new version of my patch (version 3): > > * don't change the default behaviour: use open(filename, encoding="BOM") to > check the BOM if there is any Well, I think if we implement this the default behaviour *should* be changed. It looks a bit

Re: [Python-Dev] GIL required for _all_ Python calls?

2010-01-08 Thread Antoine Pitrou
On Thu, 07 Jan 2010 22:11:36 +0100, Martin v. Löwis wrote: > > Even if we do use the new API, and correctly, it still might be > confusing if the contents of the buffer change underneath. Well, no more confusing than when you compute a SHA1 hash or zlib-compress the buffer, is it? Regards

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Victor Stinner
On Friday 08 January 2010 10:10:23, Martin v. Löwis wrote: > > The builtin open() function is unable to open a UTF-16/32 file starting with > > a BOM if the encoding is not specified (it raises a Unicode error). For a > > UTF-8 file starting with a BOM, read()/readline() also returns the BOM > > wh
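The UTF-8 behaviour described here can be reproduced with the standard codecs. The short sketch below, built on an in-memory stream rather than a real file, shows that the plain utf-8 codec hands the BOM back as a leading U+FEFF character, while the utf-8-sig codec mentioned elsewhere in this thread drops it.

```python
import codecs
import io

data = codecs.BOM_UTF8 + "hello\n".encode("utf-8")

# Plain utf-8 keeps the signature as a leading U+FEFF character.
with io.TextIOWrapper(io.BytesIO(data), encoding="utf-8") as f:
    with_bom = f.read()

# utf-8-sig skips the signature if it is present.
with io.TextIOWrapper(io.BytesIO(data), encoding="utf-8-sig") as f:
    without_bom = f.read()

print(repr(with_bom))      # '\ufeffhello\n'
print(repr(without_bom))   # 'hello\n'
```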

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Victor Stinner
On Friday 08 January 2010 01:52:20, Guido van Rossum wrote: > And for the other two, perhaps it would make more sense to have > a separate encoding-guessing function that takes a binary stream and > returns a text stream wrapping it with the proper encoding? I chose to modify open()+TextIOW
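For comparison with the open()-based patch, here is one way the separate function suggested in the quoted text might look. It is only a sketch: the name wrap_guessing_bom and the utf-8 fallback are assumptions for illustration, and it requires a seekable binary stream.

```python
import codecs
import io


def wrap_guessing_bom(binary, fallback="utf-8"):
    """Hypothetical helper: wrap a seekable binary stream in a text
    stream, choosing the codec from a leading BOM if there is one."""
    start = binary.tell()
    head = binary.read(4)
    binary.seek(start)      # rewind; the chosen codec consumes the BOM
    # UTF-32 is tested first: its LE BOM starts with the UTF-16 LE BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        encoding = "utf-32"
    elif head.startswith(codecs.BOM_UTF8):
        encoding = "utf-8-sig"
    elif head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        encoding = "utf-16"
    else:
        encoding = fallback
    return io.TextIOWrapper(binary, encoding=encoding)
```

Something like wrap_guessing_bom(open(path, "rb")) would then return a text stream whose encoding attribute reports the codec that was picked. Whether this belongs in a helper or inside open() itself is exactly what the thread is debating.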

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Victor Stinner
On Friday 08 January 2010 05:21:04, Guido van Rossum wrote: (...) > (And yes, I know this happens. Doesn't mean we need to auto-guess by > default; there are lots of issues e.g. what should happen after > seeking to offset 0?) I wrote a new version of my patch (version 3): * don't change th

Re: [Python-Dev] --enabled-shared broken on freebsd5?

2010-01-08 Thread Martin v. Löwis
Nicholas Bastin wrote: > I think this problem probably needs to move over to distutils-sig, as > it doesn't seem to be specific to the way that Python itself uses > distutils. I'm fairly skeptical that anybody on distutils SIG is interested in details of the Python build process... Regards, Marti

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Martin v. Löwis
> The builtin open() function is unable to open a UTF-16/32 file starting with a > BOM if the encoding is not specified (it raises a Unicode error). For a UTF-8 > file starting with a BOM, read()/readline() also returns the BOM whereas the > BOM should be "ignored". It depends. If you use the utf-8-

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Victor Stinner
On Friday 08 January 2010 03:23:08, MRAB wrote: > Guido van Rossum wrote: > > I'm a little hesitant about this. First of all, UTF-8 + BOM is crazy > > talk. And for the other two, perhaps it would make more sense to have > > a separate encoding-guessing function that takes a binary stream and

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Martin v. Löwis
> But it should do something sane when reading such files. I can't > really see any harm in throwing it away, especially since use of > ZERO-WIDTH NO-BREAK SPACE as a joining character has been deprecated > IIRC. And indeed it does, when you open the file in the utf-8-sig encoding. Regards, Mart

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-08 Thread Martin v. Löwis
>> It *is* crazy, but unfortunately rather common. Wikipedia has a good >> description of the issues: >> . Basically, some >> Windows text APIs will emit a UTF-8 "BOM" in order to identify the file as >> being UTF-8, so it's become a convention