Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-09 Thread Martin v. Löwis
>> How does the requirement that it be implemented as a codec miss the >> point? > > If we want it to be the default, it must be able to fall back on the current > locale-based algorithm if no BOM is found. I don't think it would be easy for a codec to do that. Yes - however, Victor currently

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-09 Thread Michael Foord
On 09/01/2010 22:14, Lennart Regebro wrote: On Sat, Jan 9, 2010 at 21:28, Antoine Pitrou wrote: If we want it to be the default, it must be able to fall back on the current locale-based algorithm if no BOM is found. I don't think it would be easy for a codec to do that. Right. It seem

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-09 Thread Lennart Regebro
On Sat, Jan 9, 2010 at 21:28, Antoine Pitrou wrote: > If we want it to be the default, it must be able to fall back on the current > locale-based algorithm if no BOM is found. I don't think it would be easy for a codec to do that. Right. It seems like encoding=None is the right way to go ther

Re: [Python-Dev] Unladen cPickle speedups in 2.7 & 3.1

2010-01-09 Thread skip
> "Antoine" == Antoine Pitrou writes: Antoine> s...@pobox.com writes: >> >> If a patch to merge this to 2.7 is already under >> consideration I won't look at it, Antoine> Why won't you look at it? :) I meant I wouldn't look at developing one. Skip

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-09 Thread Antoine Pitrou
Martin v. Löwis writes: > > > Sorry but this is missing the point. The point here is to improve the open() > > function. I'm sure people who know about encodings are able to install the > > chardet library or even whip up their own BOM detection routine... > > How does the requireme

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-09 Thread Martin v. Löwis
Antoine Pitrou wrote: > Walter Dörwald writes: >> On the surface this looks like there's an encoding named "BOM", but >> looking at your patch I found that the check is still done in >> TextIOWrapper. IMHO the best approach would be to implement a *real* >> codec named "BOM" (o

Re: [Python-Dev] Unladen cPickle speedups in 2.7 & 3.1

2010-01-09 Thread Philip Jenvey
On Jan 9, 2010, at 12:00 PM, s...@pobox.com wrote: > How much of the Unladen Swallow cPickle speedups have been incorporated into > 2.7 & 3.1? I'm working on trying to develop patches for 2.4 and 2.6 (the > two versions I currently care about at work - we will skip 2.5 entirely). > It appears so

Re: [Python-Dev] Unladen cPickle speedups in 2.7 & 3.1

2010-01-09 Thread Antoine Pitrou
s...@pobox.com writes: > > If a patch to merge this to 2.7 is already under > consideration I won't look at it, Why won't you look at it? :) Actually, if these patches are to be merged someone should certainly look at them, and do the (possibly) remaining work. http://bugs.python.org/issue5683 http

Re: [Python-Dev] Unladen cPickle speedups in 2.7 & 3.1

2010-01-09 Thread skip
Philip> They've documented their upstream patches here: Philip> http://code.google.com/p/unladen-swallow/wiki/UpstreamPatches Thanks. That will help immensely. Skip ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mai

[Python-Dev] Unladen cPickle speedups in 2.7 & 3.1

2010-01-09 Thread skip
How much of the Unladen Swallow cPickle speedups have been incorporated into 2.7 & 3.1? I'm working on trying to develop patches for 2.4 and 2.6 (the two versions I currently care about at work - we will skip 2.5 entirely). It appears some of their speedups may have already been merged to trunk, b

Re: [Python-Dev] [RELEASED] Python 2.7 alpha 2

2010-01-09 Thread Benjamin Peterson
2010/1/9 Karen Tracey: > On Sat, Jan 9, 2010 at 12:29 PM, Benjamin Peterson > wrote: >> >> On behalf of the Python development team, I'm gleeful to announce the >> second >> alpha release of Python 2.7. >> > > Well yay. Django's test suite (1242 tests) runs with just one failure on > the 2.7 alp

Re: [Python-Dev] [RELEASED] Python 2.7 alpha 2

2010-01-09 Thread Karen Tracey
On Sat, Jan 9, 2010 at 12:29 PM, Benjamin Peterson wrote: > On behalf of the Python development team, I'm gleeful to announce the > second > alpha release of Python 2.7. > > Well yay. Django's test suite (1242 tests) runs with just one failure on the 2.7 alpha 2 level, and that looks to be likely

[Python-Dev] [RELEASED] Python 2.7 alpha 2

2010-01-09 Thread Benjamin Peterson
On behalf of the Python development team, I'm gleeful to announce the second alpha release of Python 2.7. Python 2.7 is scheduled to be the last major version in the 2.x series. It includes many features that were first released in Python 3.1. The faster io module, the new nested with statement

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-09 Thread Antoine Pitrou
Walter Dörwald writes: > > On the surface this looks like there's an encoding named "BOM", but > looking at your patch I found that the check is still done in > TextIOWrapper. IMHO the best approach would be to implement a *real* > codec named "BOM" (or "sniff"). This doesn't
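Walter's "real codec" suggestion can be sketched as an incremental decoder that holds back the first few bytes until it can tell whether a BOM is present, then hands everything to the matching codec. This is a minimal sketch, not code from the thread or from Victor's patch: the class name SniffingDecoder and its fallback parameter are hypothetical. It also illustrates Antoine's objection: the codec used when no BOM is found is fixed when the decoder is created, instead of being the locale-based choice open() makes.

```python
import codecs

# (BOM, codec) pairs. UTF-32 is tested before UTF-16 because the UTF-32-LE
# BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE). The chosen codecs
# ("utf-32", "utf-16", "utf-8-sig") consume the BOM themselves.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


class SniffingDecoder(codecs.IncrementalDecoder):
    """Hypothetical "sniff" decoder: picks a codec from a leading BOM."""

    def __init__(self, errors="strict", fallback="utf-8"):
        super().__init__(errors)
        self.fallback = fallback  # used when no BOM is present
        self._pending = b""       # bytes held back while the BOM is undecided
        self._decoder = None      # delegate incremental decoder, once chosen

    def _choose(self, data, final):
        # Wait if more input could still turn *data* into a longer BOM.
        if not final and any(len(bom) > len(data) and bom.startswith(data)
                             for bom, _ in _BOMS):
            return None
        for bom, name in _BOMS:
            if data.startswith(bom):
                return name
        return self.fallback

    def decode(self, input, final=False):
        if self._decoder is None:
            self._pending += input
            name = self._choose(self._pending, final)
            if name is None:
                return ""
            self._decoder = codecs.getincrementaldecoder(name)(self.errors)
            input, self._pending = self._pending, b""
        return self._decoder.decode(input, final)

    def reset(self):
        self._pending = b""
        self._decoder = None
```

Fed b"\xff", then b"\xfe", then b"A\x00" with final=True, the sketch returns "" twice while the BOM is still undecided and then "A".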

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread Victor Stinner
Hi, On Saturday 09 January 2010 13:45:58, you wrote: > > Note: I implemented the BOM check in TextIOWrapper, so it's already > > usable for any file-like object. > > Yes, but the implementation is limited to just BOM checking > and thus only supports UTF-8-SIG, UTF-16 and UTF-32. Sure, but
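The BOM check described here amounts to comparing the first bytes of the stream against the BOMs of the three encoding families. A minimal sketch of that table lookup, using the BOM constants from the standard codecs module; the function name detect_bom is hypothetical and this is not the code in Victor's patch.

```python
import codecs

# Longest BOMs first: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so testing UTF-16 first would misdetect UTF-32.
_BOM_TABLE = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def detect_bom(data: bytes):
    """Return a codec name for a leading BOM in *data*, or None if there is none."""
    for bom, encoding in _BOM_TABLE:
        if data.startswith(bom):
            return encoding
    return None
```

Returning "utf-16" and "utf-32" rather than endian-specific names means the chosen codec reads the BOM itself and drops it from the decoded text.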

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-09 Thread Victor Stinner
On Saturday 09 January 2010 12:18:33, Walter Dörwald wrote: > > Good idea, I chose open(filename, encoding="BOM"). > > On the surface this looks like there's an encoding named "BOM", but > looking at your patch I found that the check is still done in > TextIOWrapper. IMHO the best approach woul

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread Victor Stinner
On Saturday 09 January 2010 02:12:28, MRAB wrote: > What about listing the possible encodings? It would try each in turn > until it found one where the BOM matched or had no BOM: > > my_file = open(filename, 'r', encoding='UTF-8-sig|UTF-16|UTF-8') > > or is that taking it too far? Yes, you'
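MRAB's list syntax could work roughly as follows: a BOM-bearing codec is accepted only if the data starts with one of its BOMs, and the first listed codec that has no BOM at all acts as the catch-all. This is a hypothetical sketch: the '|' syntax is only a proposal in this thread, not something open() accepts, and the helper name pick_encoding is invented for illustration.

```python
import codecs

# BOMs recognised for each BOM-bearing codec, keyed by normalized codec name.
_BOMS_FOR = {
    "utf-8-sig": (codecs.BOM_UTF8,),
    "utf-16": (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE),
    "utf-32": (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE),
}


def pick_encoding(data: bytes, spec: str) -> str:
    """Try each '|'-separated encoding in *spec* in order (hypothetical syntax)."""
    for name in spec.split("|"):
        key = codecs.lookup(name).name  # e.g. "UTF-8-sig" -> "utf-8-sig"
        boms = _BOMS_FOR.get(key)
        # A codec without a BOM always matches; a BOM codec needs its BOM.
        if boms is None or data.startswith(boms):
            return name
    raise LookupError("no listed encoding matched: " + spec)
```

The order of the list matters: a codec listed earlier wins, so a BOM-less codec placed first would shadow everything after it.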

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread Victor Stinner
On Saturday 09 January 2010 01:47:38, you wrote: > One concern I have with this implementation encoding="BOM" is that if > there is no BOM it assumes UTF-8. If no BOM is found, it falls back to the current heuristic: os.device_encoding() or the system locale. > (...) Hence, it might be that someon
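The fallback order given here (BOM first, then os.device_encoding(), then the locale) can be written out as below. A sketch only: choose_encoding and its fd parameter are hypothetical, and locale.getpreferredencoding(False) is assumed to stand in for "the system locale". With this order a BOM-less file is not assumed to be UTF-8, which is the concern raised in the quoted message.

```python
import codecs
import locale
import os


def choose_encoding(head: bytes, fd=None) -> str:
    """Hypothetical helper: BOM, then os.device_encoding(fd), then the locale."""
    # 1. A leading BOM wins. UTF-32 is checked before UTF-16 because the
    #    UTF-32-LE BOM starts with the UTF-16-LE BOM.
    for bom, name in ((codecs.BOM_UTF32_LE, "utf-32"),
                      (codecs.BOM_UTF32_BE, "utf-32"),
                      (codecs.BOM_UTF8, "utf-8-sig"),
                      (codecs.BOM_UTF16_LE, "utf-16"),
                      (codecs.BOM_UTF16_BE, "utf-16")):
        if head.startswith(bom):
            return name
    # 2. For a terminal, os.device_encoding() reports its encoding
    #    (it returns None when fd is not connected to a terminal).
    if fd is not None:
        device = os.device_encoding(fd)
        if device:
            return device
    # 3. Otherwise use the locale's preferred encoding.
    return locale.getpreferredencoding(False)
```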

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread M.-A. Lemburg
Victor Stinner wrote: > (2) Check for a BOM while reading or detect it before? > > Everybody agrees that checking for a BOM is an interesting option and should not be > limited to open(). > > Marc-Andre proposed a codecs.guess_file_encoding() function accepting a file > name or a binary file-like obje
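codecs.guess_file_encoding() is Marc-Andre's proposal; the codecs module has no such function. A rough sketch of what it might look like, limited to BOM sniffing: it accepts either a path or a seekable binary file object, peeks at the first four bytes, and restores the file position so the caller can still read from the start. The default parameter, returned when no BOM is found, is an assumption.

```python
import codecs
import io
import os

_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),  # must precede the UTF-16-LE entry
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_file_encoding(file, default=None):
    """Sketch of the proposed codecs.guess_file_encoding() (not a real API)."""
    if isinstance(file, (str, bytes, os.PathLike)):
        with open(file, "rb") as f:
            return guess_file_encoding(f, default)
    start = file.tell()
    try:
        head = file.read(4)
    finally:
        file.seek(start)  # leave the file where we found it
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


# Example with an in-memory binary file.
buf = io.BytesIO(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le"))
print(guess_file_encoding(buf), buf.tell())
```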

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread Victor Stinner
On Saturday 09 January 2010 02:23:07, Martin v. Löwis wrote: > While I would support combining BOM detection in the case where a file > is opened for reading and no encoding is specified, I see two problems: > a) if a seek operation is performed before having looked at the BOM, > no determinat
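One way around point (a) is to settle the encoding before the caller ever gets a text object it could seek: peek at the raw bytes, rewind, then build the TextIOWrapper with the detected codec. This is a sketch under assumptions: wrap_with_bom_detection is a hypothetical name, the raw stream must be seekable, and falling back to locale.getpreferredencoding(False) is assumed to approximate open()'s behavior when no encoding is given.

```python
import codecs
import io
import locale

_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),  # must precede the UTF-16-LE entry
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def wrap_with_bom_detection(raw):
    """Hypothetical helper: detect a BOM on a seekable binary stream, then wrap it."""
    head = raw.read(4)
    raw.seek(0)  # rewind so the chosen codec sees (and drops) the BOM itself
    encoding = next((name for bom, name in _BOMS if head.startswith(bom)),
                    locale.getpreferredencoding(False))
    return io.TextIOWrapper(raw, encoding=encoding)


# Usage with a real file ("data.txt" is a placeholder name):
# with wrap_with_bom_detection(open("data.txt", "rb")) as f:
#     text = f.read()
```

Because detection happens before the wrapper exists, no seek on the text object can precede it.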

Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-09 Thread Walter Dörwald
Victor Stinner wrote: On Friday 08 January 2010 10:10:23, Martin v. Löwis wrote: The builtin open() function is unable to open a UTF-16/32 file starting with a BOM if the encoding is not specified (it raises a Unicode error). For a UTF-8 file starting with a BOM, read()/readline() also returns t
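The behavior reported here can be reproduced with bytes.decode(), which goes through the same codec registry that open() relies on: a UTF-8 BOM survives plain "utf-8" decoding as U+FEFF, the "utf-8-sig" codec strips it, and UTF-16 data with a BOM is not valid UTF-8 at all. A small illustration; the sample strings are arbitrary.

```python
import codecs

data = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Plain UTF-8 keeps the BOM as U+FEFF (ZERO WIDTH NO-BREAK SPACE) in the text...
assert data.decode("utf-8") == "\ufeffhello"
# ...while "utf-8-sig" removes it, and still accepts input without a BOM.
assert data.decode("utf-8-sig") == "hello"
assert b"hello".decode("utf-8-sig") == "hello"

# The "utf-16" encoder writes a BOM; decoding those bytes as UTF-8 fails.
utf16 = "hello".encode("utf-16")
try:
    utf16.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)
assert utf16.decode("utf-16") == "hello"
```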

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread Walter Dörwald
On 09.01.10 01:47, Glenn Linderman wrote: > On approximately 1/8/2010 3:59 PM, came the following characters from > the keyboard of Victor Stinner: >> Hi, >> >> Thanks for all the answers! I will try to sum up all ideas here. > > One concern I have with this implementation encoding="BOM" is that