Re: [Python-Dev] Python 3.0.1 (io-in-c)
[Scott David Daniels] Comparison of three cases (including performance ratios):

>                                               MB/s      MB/s      MB/s
>                                               in C   in py3k    in 2.7    C/3k  2.7/3k
> ** Text append **
> 10M write 1e6 units at a time               261.00   218.000  1540.000    1.20    7.06
> 20K write one unit at a time                 0.983     0.081      1.33   12.08   16.34
> 400K write 20 units at a time               16.000     1.510     22.90   10.60   15.17
> 400K write 4096 units at a time             236.00   118.000  1244.000    2.00   10.54

Do you know why the text-appends fell off so much in the 1st and last cases?

> ** Text input **
> 10M read whole contents at once             89.700    68.700   966.000    1.31   14.06
> 20K read whole contents at once            108.000    70.500  1196.000    1.53   16.96
> ...
> 400K read one line at a time                71.700     3.690    207.00   19.43   56.10
> ...
> 400K read whole contents at once           112.000    81.000   841.000    1.38   10.38
> 400K seek forward 1000 units at a time      87.400    67.300   589.000    1.30    8.75
> 400K seek forward one unit at a time         0.090     0.071     0.873    1.28   12.31

Looks like most of these still have substantial falloffs in performance. Is this part still a work in progress or is this as good as it's going to get?

> ** Text overwrite **
> 20K modify one unit at a time                0.296     0.072     1.320    4.09   18.26
> 400K modify 20 units at a time               5.690     1.360    22.500    4.18   16.54
> 400K modify 4096 units at a time           151.000    88.300   509.000    1.71    5.76

Same question on this batch.

Raymond
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
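The two ratio columns are plain quotients of the throughput columns: C/3k is the C build's MB/s divided by py3k's, and 2.7/3k is 2.7's MB/s divided by py3k's. Below is a small sketch that recomputes them for a few rows. The figures are copied from the table, and because those inputs are already rounded, a recomputed ratio can differ from the printed one in the last digit.

```python
# Throughput in MB/s copied from the table: (C build, py3k, 2.7)
ROWS = {
    "10M write 1e6 units at a time": (261.0, 218.0, 1540.0),
    "20K write one unit at a time": (0.983, 0.081, 1.33),
    "400K read one line at a time": (71.7, 3.69, 207.0),
}


def ratios(c_mb_s, py3k_mb_s, py27_mb_s):
    """Return (C/3k, 2.7/3k): how many times faster each build is than py3k."""
    return c_mb_s / py3k_mb_s, py27_mb_s / py3k_mb_s


for name, figures in ROWS.items():
    c_over_3k, py27_over_3k = ratios(*figures)
    print(f"{name:<32} C/3k={c_over_3k:6.2f}  2.7/3k={py27_over_3k:6.2f}")
```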
Re: [Python-Dev] Python 3.0.1 (io-in-c)
Hello,

Raymond Hettinger <python at rcn.com> writes:
> ** Text append **
> [...]
> Do you know why the text-appends fell off so much in the 1st and last cases?

When writing large chunks of text (4096 or 1e6 units at a time), bookkeeping costs become marginal and encoding costs dominate. 2.x has no encoding costs, which explains why it's so much faster. A quick test tells me UTF-8 encoding runs at 280 MB/s on this dataset (the 400 KB text file). You see that there is not much left to optimize on large writes.

> ** Text input **
> [...]
> Looks like most of these still have substantial falloffs in performance. Is this part still a work in progress or is this as good as it's going to get?

There is nothing obvious left to optimize in the read() department. Decoding and newline translation costs dominate. Decoding has already been optimized for the most popular encodings in py3k:
http://mail.python.org/pipermail/python-checkins/2009-January/077024.html

Newline translation follows a fast path depending on various heuristics. I also took particular care of the "read one line at a time" scenario, because it's the most likely idiom when reading a text file. I think there is hardly anything left to optimize on this one. Your eyes are welcome, though.
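One way to run a comparable "quick test" of UTF-8 encoding speed is sketched below. It is not the iobench code. The helper name `encode_throughput_mb_s` and the sample text are made up for illustration, and the result depends heavily on the machine and on how much non-ASCII the text contains.

```python
import time


def encode_throughput_mb_s(text, encoding="utf-8", repeat=20):
    """Rough MB/s for text.encode(encoding), measured on the encoded size."""
    encoded_mb = len(text.encode(encoding)) / 1e6
    start = time.perf_counter()
    for _ in range(repeat):
        text.encode(encoding)
    elapsed = time.perf_counter() - start
    # Guard against a zero timer delta on very small inputs.
    return encoded_mb * repeat / max(elapsed, 1e-9)


# Hypothetical stand-in for the 400 KB benchmark file: mostly ASCII text.
sample = "Mostly ASCII text with the odd \u00e9 and \u00a3 sign.\n" * 8000
print(f"utf-8 encode: {encode_throughput_mb_s(sample):.0f} MB/s")
```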
Note that the benchmark is run with the following default settings for text I/O: UTF-8 encoding, universal newlines enabled, and text containing only "\n" newlines. You can play with the settings here:
http://svn.python.org/view/sandbox/trunk/iobench/

Text seek() and tell(), on the other hand, are known to be slow, and could perhaps be improved. It is assumed, however, that they won't be used much on text files.

> ** Text overwrite **
> [...]
> Same question on this batch.

There seems to be some additional overhead in this case. Perhaps it could be improved; I'll have to take a look... But I doubt overwriting chunks of text is a common scenario.

Regards

Antoine.
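The benchmark's default text settings (UTF-8, universal newlines) and the nature of text-mode tell() can be seen in a small sketch using a throwaway file. With `newline=None`, "\r\n" on disk is read back as "\n". On a text file, tell() returns an opaque position cookie rather than a character count, so seek() should only be given a value that tell() returned earlier (or 0).

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")  # throwaway file

# Store Windows-style line endings on disk.
with open(path, "wb") as f:
    f.write(b"first\r\nsecond\r\nthird\r\n")

# newline=None (the default) enables universal newlines: "\r\n" reads as "\n".
with open(path, "r", encoding="utf-8", newline=None) as f:
    first = f.readline()
    cookie = f.tell()   # opaque position cookie, not a character count
    rest = f.read()
    f.seek(cookie)      # only tell() results (or 0) are valid seek targets here
    again = f.readline()

print(repr(first), repr(rest), repr(again))
```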
Re: [Python-Dev] Python 3.0.1 (io-in-c)
2009/1/28 Antoine Pitrou <solip...@pitrou.net>:
> When writing large chunks of text (4096, 1e6), bookkeeping costs become marginal and encoding costs dominate. 2.x has no encoding costs, which explains why it's so much faster.

Interesting. However, it's still slower in terms of perception. In 2.x, I regularly do the equivalent of

    f = open(filename, "r")
    ... read strings from f ...

Yes, I know this is byte I/O in reality, but for everything I do (Latin-1 on input and output, and for most practical purposes ASCII-only) it simply isn't relevant to me. If Python 3.x makes this substantially slower (working in a naive mode where I ignore encoding issues), claiming it's encoding costs doesn't make any difference - in a practical sense, I don't get any benefits and yet I pay the cost. (You can say my approach is wrong, but so what? I'll just say that 2.x is faster for me, and not migrate. Ultimately, this is about marketing 3.x...)

It would be helpful to limit this cost as much as possible - maybe that's simply ensuring that the default encoding for open is (in the majority of cases) a highly optimised one whose costs *don't* dominate in the way you describe (although if you're using UTF-8, I'd guess that would be the usual default on Linux, so it looks like there's some work needed there).

Hmm, I just checked and on Windows, it appears that sys.getdefaultencoding() is UTF-8. That seems odd - I would have thought the majority of Windows systems were NOT set to use UTF-8 by default...

Paul.
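For a Latin-1 or mostly-ASCII workflow like the one described here, one option in 3.x is to name the encoding explicitly rather than rely on the platform default. This is a minimal sketch on a throwaway file. Latin-1 maps each byte 0-255 to the code point with the same value, so decoding never fails. With `newline=""` no newline translation happens, so the text round-trips to the exact original bytes.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bytes.txt")  # throwaway file
payload = bytes(range(256))  # every possible byte value

with open(path, "wb") as f:
    f.write(payload)

# latin-1 decodes byte N to code point N, so any file is readable as text.
# newline="" disables newline translation so b"\r" and b"\n" survive unchanged.
with open(path, "r", encoding="latin-1", newline="") as f:
    text = f.read()

print(len(text), text.encode("latin-1") == payload)
```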
Re: [Python-Dev] Python 3.0.1 (io-in-c)
On Wednesday 28 January 2009 11:55:16, Antoine Pitrou wrote:
> 2.x has no encoding costs, which explains why it's so much faster.

Why not test io.open() or codecs.open(), which create unicode strings?

--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/
Re: [Python-Dev] Python 3.0.1 (io-in-c)
Paul Moore <p.f.moore at gmail.com> writes:
> It would be helpful to limit this cost as much as possible - maybe that's simply ensuring that the default encoding for open is (in the majority of cases) a highly optimised one whose costs *don't* dominate in the way you describe

As I pointed out, the utf-8, utf-16 and latin1 decoders have already been optimized in py3k. For *pure ASCII* input, utf-8 decoding is blazingly fast (1 GB/s here). The dataset for iobench isn't pure ASCII, though, and that's why it's not as fast.

People are invited to test their own workloads with the io-c branch and report performance figures (and possible bugs). There are so many possibilities that the benchmark figures given by a generic tool can only be indicative.

Regards

Antoine.
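The gap between pure-ASCII and mostly-ASCII UTF-8 input can be measured on your own workload with a sketch like the following. The helper name and sample data are invented for illustration. Absolute numbers will vary by machine and Python version, so only the relative difference is of interest.

```python
import timeit


def decode_mb_s(data, encoding="utf-8", number=50):
    """Rough decode throughput in MB/s for a bytes object."""
    seconds = timeit.timeit(lambda: data.decode(encoding), number=number)
    return len(data) * number / 1e6 / max(seconds, 1e-9)


pure_ascii = b"plain ASCII line of text\n" * 40000
# Same shape of data, but every line carries one two-byte character.
mostly_ascii = ("plain ASCII line with \u00e9\n" * 40000).encode("utf-8")

print(f"pure ASCII:   {decode_mb_s(pure_ascii):8.0f} MB/s")
print(f"mostly ASCII: {decode_mb_s(mostly_ascii):8.0f} MB/s")
```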
Re: [Python-Dev] Python 3.0.1 (io-in-c)
Victor Stinner <victor.stinner at haypocalc.com> writes:
> On Wednesday 28 January 2009 11:55:16, Antoine Pitrou wrote:
>> 2.x has no encoding costs, which explains why it's so much faster.
>
> Why not test io.open() or codecs.open(), which create unicode strings?

The goal is to test the idiomatic way of opening text files (the "one obvious way to do it", if you want). There is no doubt that io.open() and codecs.open() in 2.x are much slower than the io-c branch. However, nobody is expecting very good performance from io.open() and codecs.open() in 2.x either.

Regards

Antoine.
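In Python 3 the built-in open() is the same function as io.open(). codecs.open() is a separate wrapper that always opens the underlying file in binary mode and does not translate newlines. The sketch below compares the two on a throwaway file containing only "\n" newlines. Later Python 3 documentation steers people towards open() rather than codecs.open().

```python
import codecs
import io
import os
import tempfile

# Python 3's built-in open() is io.open().
assert open is io.open

path = os.path.join(tempfile.mkdtemp(), "greeting.txt")  # throwaway file
with open(path, "wb") as f:
    f.write("caf\u00e9\n".encode("utf-8"))

with io.open(path, encoding="utf-8") as f:
    via_io = f.read()
with codecs.open(path, encoding="utf-8") as f:  # opens the file in binary mode
    via_codecs = f.read()

# Both give str objects with the same contents for this "\n"-only file.
print(type(via_io).__name__, type(via_codecs).__name__, via_io == via_codecs)
```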
Re: [Python-Dev] Python 3.0.1 (io-in-c)
2009/1/28 Antoine Pitrou <solip...@pitrou.net>:
> Paul Moore <p.f.moore at gmail.com> writes:
>> It would be helpful to limit this cost as much as possible [...]
>
> As I pointed out, the utf-8, utf-16 and latin1 decoders have already been optimized in py3k. For *pure ASCII* input, utf-8 decoding is blazingly fast (1 GB/s here). The dataset for iobench isn't pure ASCII, though, and that's why it's not as fast.

Ah, thanks. Although you said your data was 95% ASCII, and you're getting decode speeds of 250 MB/s. That's a 75% slowdown for 5% of the data! Surely that's not right???

> People are invited to test their own workloads with the io-c branch and report performance figures (and possible bugs). There are so many possibilities that the benchmark figures given by a generic tool can only be indicative.

At the moment, I don't have the time to download and build the branch, and in any case, as I only have Visual Studio Express, I don't get the PGO optimisations, making any tests I do highly suspect.

Paul.

PS Can anyone comment on why Python defaults to utf-8 on Windows? That seems like a highly suspect default...
Re: [Python-Dev] Python 3.0.1 (io-in-c)
On Wednesday 28 January 2009 12:41:07, Antoine Pitrou wrote:
>> Why not test io.open() or codecs.open(), which create unicode strings?
>
> There is no doubt that io.open() and codecs.open() in 2.x are much slower than the io-c branch. However, nobody is expecting very good performance from io.open() and codecs.open() in 2.x either.

I use codecs.open() in my programs, so I'm interested in the benchmark for this function ;-)

But if I understand correctly, Python (3.1?) will be faster (or much faster) at reading and writing files in unicode, and that's great news ;-)

--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/
Re: [Python-Dev] Python 3.0.1
On Wed, Jan 28, 2009 at 4:32 AM, Steve Holden <st...@holdenweb.com> wrote:
> I think that both 3.0 and 2.6 were rushed releases. 2.6 showed it in the inclusion (later recognizable as somewhat ill-advised so late in the day) of multiprocessing; 3.0 shows it in the very fact that this discussion has become necessary.

What about some kind of mechanism to triage 3rd-party modules? Something like:

- the module gains popularity
- the core team decides it's worthy
- the module is included in the library in some kind of contrib/ext package (like the __future__ mechanism) and stays in that package for one major release (so developers don't have to rush to fix _all_ the bugs they can while making a major release)
- after (at least) one major release, the module moves up one level and is considered stable and rock solid

Meanwhile, the documentation must say that the 3rd-party contributed module is not considered production-ready, though usable, until release current + 1.

I don't know if it's feasible, if it's insane or what; it's just an idea I had.

--
Lawrence, http://oluyede.org - http://twitter.com/lawrenceoluyede
"It is difficult to get a man to understand something when his salary depends on not understanding it" - Upton Sinclair
Re: [Python-Dev] Python 3.0.1 (io-in-c)
Paul Moore <p.f.moore at gmail.com> writes:
>> As I pointed out, the utf-8, utf-16 and latin1 decoders have already been optimized in py3k. For *pure ASCII* input, utf-8 decoding is blazingly fast (1 GB/s here). The dataset for iobench isn't pure ASCII, though, and that's why it's not as fast.
>
> Ah, thanks. Although you said your data was 95% ASCII, and you're getting decode speeds of 250 MB/s. That's a 75% slowdown for 5% of the data! Surely that's not right???

If you look at how utf-8 decoding is implemented (in unicodeobject.c), it's quite obvious why it is so :-) There is a (very) fast path for chunks of pure ASCII data, and a (fast but not blazingly fast) fallback for non-ASCII data.

Please don't think of it as a slowdown... It's still much faster than 2.x, which manages 130 MB/s on the same data.

Regards

Antoine.
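The fast-path idea can be illustrated in pure Python. This is only a sketch of the principle, not CPython's C implementation in unicodeobject.c, and the function name is hypothetical. ASCII bytes map one-to-one onto code points. Every byte of a multi-byte UTF-8 sequence is >= 0x80, so splitting the input at runs of ASCII bytes never cuts a character in half, and only the non-ASCII runs need the full decoder.

```python
import re

# One or more bytes in the ASCII range 0x00-0x7F.
_ASCII_RUN = re.compile(rb"[\x00-\x7f]+")


def decode_utf8_sketch(data: bytes) -> str:
    """Decode UTF-8, handling ASCII runs via a cheap path (illustration only)."""
    pieces = []
    pos = 0
    for match in _ASCII_RUN.finditer(data):
        if match.start() > pos:
            # Slow path: non-ASCII bytes before this run go through the full decoder.
            pieces.append(data[pos:match.start()].decode("utf-8"))
        # Fast path: ASCII bytes are already valid code points.
        pieces.append(match.group().decode("ascii"))
        pos = match.end()
    if pos < len(data):
        # Trailing non-ASCII bytes after the last ASCII run.
        pieces.append(data[pos:].decode("utf-8"))
    return "".join(pieces)


text = "a\u00a3b \u0153uvre, plain ASCII tail"
print(decode_utf8_sketch(text.encode("utf-8")) == text)
```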
Re: [Python-Dev] Python 3.0.1 (io-in-c)
2009/1/28 Antoine Pitrou <solip...@pitrou.net>:
> If you look at how utf-8 decoding is implemented (in unicodeobject.c), it's quite obvious why it is so :-) There is a (very) fast path for chunks of pure ASCII data, and a (fast but not blazingly fast) fallback for non-ASCII data.

Thanks for the explanation.

> Please don't think of it as a slowdown... It's still much faster than 2.x, which manages 130 MB/s on the same data.

Don't get me wrong - I'm hugely grateful for this work. And personally, I don't expect that I/O speed is ever likely to be a real bottleneck in the type of program I write. But I'm concerned that (much as with the "Python 3.0 is incompatible, and it will be hard to port to" meme) people will pick up on raw benchmark figures - no matter how much they aren't comparing like with like - and start making it sound like "Python 3.0 I/O is slower than 2.x", which is a great disservice to the good work that's been done.

I do think it's worth taking care over the default encoding, though. Quite apart from performance, getting correct behaviour is important. I can't speak for Unix, but on Windows, the following behaviour feels like a bug to me:

    echo a£b > a1

    python
    Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print open("a1").read()
    a£b
    >>> ^Z

    \Apps\Python30\python.exe
    Python 3.0 (r30:67507, Dec 3 2008, 20:14:27) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print(open("a1").read())
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "D:\Apps\Python30\lib\io.py", line 1491, in write
        b = encoder.encode(s)
      File "D:\Apps\Python30\lib\encodings\cp850.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u0153' in position 1: character maps to <undefined>
    >>> ^Z

    chcp
    Active code page: 850

Paul.
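The failure can be reproduced on any platform by spelling out the encodings involved. This sketch assumes that the console wrote the file in cp850 (chcp reports code page 850) and that open() decoded it with cp1252, the ANSI code page of a typical Western-European Windows install. In cp850 the pound sign is byte 0x9C, which cp1252 reads as U+0153, and cp850 has no encoding for U+0153.

```python
raw = "a\u00a3b".encode("cp850")  # what the cp850 console wrote: b'a\x9cb'
text = raw.decode("cp1252")       # what open() produced when reading with cp1252
print(ascii(text))                # the pound sign has become U+0153 (LATIN SMALL LIGATURE OE)

try:
    text.encode("cp850")          # what print() attempts on the cp850 console
except UnicodeEncodeError as exc:
    print("cannot print:", exc.reason)
```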
Re: [Python-Dev] Python 3.0.1 (io-in-c)
On Wednesday 28 January 2009 at 16:54 +, Paul Moore wrote:
> I do think it's worth taking care over the default encoding, though. Quite apart from performance, getting correct behaviour is important. I can't speak for Unix, but on Windows, the following behaviour feels like a bug to me:
> [...]

Please open a bug :)

cheers

Antoine.
Re: [Python-Dev] Python 3.0.1
On 2009-01-27 22:19, Raymond Hettinger wrote:
> From: Martin v. Löwis <mar...@v.loewis.de>
>> Releasing 3.1 6 months after 3.0 sounds reasonable; I don't think it should be released earlier (else 3.0 looks fairly ridiculous).
>
> I think it should be released earlier and completely supplant 3.0 before more third-party developers spend time migrating code. We needed 3.0 to get released so we could get the feedback necessary to shake it out. Now, it is time for it to fade into history and take advantage of the lessons learned.
>
> The principles for the 2.x series don't really apply here. In 2.x, there was always a useful, stable, clean release already fielded and there were tons of third-party apps that needed a slow rate of change.
>
> In contrast, 3.0 has a near zero installed user base (at least in terms of being used in production). It has very few migrated apps. It is not particularly clean and some of the work for it was incomplete when it was released.
>
> My preference is to drop 3.0 entirely (no incompatible bugfix release) and in early February release 3.1 as the real 3.x that migrators ought to aim for and that won't have incompatible bugfix releases. Then at PyCon, we can have a real bug day and fix up any chips in the paint.
>
> If 3.1 goes out right away, then it doesn't matter if 3.0 looks ridiculous. All eyes go to the latest release. Better to get this done before more people download 3.0 to kick the tires.

Why don't we just mark 3.0.x as an experimental branch and keep updating/fixing things that were not sorted out for the 3.0.0 release?! I think that's a fair approach, given that the only way to get field testing for new open-source software is to release early and often.

A 3.1 release should then be the first stable release of the 3.x series and mark the start of the usual deprecation mechanisms we have in the 2.x series. Needless to say, rushing 3.1 out now would only cause yet another experimental release... major releases do take time to stabilize.
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jan 28 2009)
Python/Zope Consulting and Support ...        http://www.egenix.com/
mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free !

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
Re: [Python-Dev] Python 3.0.1 (io-in-c)
> PS Can anyone comment on why Python defaults to utf-8 on Windows?

Don't panic. It doesn't, and you are misinterpreting what you are seeing.

Regards,
Martin
Re: [Python-Dev] Python 3.0.1 (io-in-c)
Paul Moore wrote:
> Hmm, I just checked and on Windows, it appears that sys.getdefaultencoding() is UTF-8. That seems odd - I would have thought the majority of Windows systems were NOT set to use UTF-8 by default...

In Python 3, sys.getdefaultencoding() is "utf-8" on all platforms, just as it was "ascii" in 2.x, on all platforms. The default encoding isn't used for I/O; check f.encoding to find out what encoding is used to read the file you are reading.

Regards,
Martin
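The distinction can be checked directly. sys.getdefaultencoding() reports the interpreter-wide default, while each text file object carries its own `encoding` attribute. This sketch opens a throwaway file with an explicit encoding so that the result doesn't depend on the platform.

```python
import os
import sys
import tempfile

# Interpreter-wide default: "utf-8" on every Python 3 platform.
print(sys.getdefaultencoding())

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # throwaway file
with open(path, "w", encoding="cp1252") as f:
    # The encoding actually used for this file's I/O.
    file_encoding = f.encoding
print(file_encoding)
```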
Re: [Python-Dev] Python 3.0.1 (io-in-c)
> >>> print(open("a1").read())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "D:\Apps\Python30\lib\io.py", line 1491, in write
>     b = encoder.encode(s)
>   File "D:\Apps\Python30\lib\encodings\cp850.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u0153' in position 1: character maps to <undefined>

Looks right to me.

Martin
Re: [Python-Dev] Python 3.0.1 (io-in-c)
2009/1/28 Martin v. Löwis <mar...@v.loewis.de>:
> Paul Moore wrote:
>> Hmm, I just checked and on Windows, it appears that sys.getdefaultencoding() is UTF-8. That seems odd - I would have thought the majority of Windows systems were NOT set to use UTF-8 by default...
>
> In Python 3, sys.getdefaultencoding() is "utf-8" on all platforms, just as it was "ascii" in 2.x, on all platforms. The default encoding isn't used for I/O; check f.encoding to find out what encoding is used to read the file you are reading.

Thanks for the explanation. It might be clearer to document this a little more explicitly in the docs for open() (on the basis that people using open() are the most likely to be naive about encodings). I'll see if I can come up with an appropriate doc patch.

Paul.
Re: [Python-Dev] Python 3.0.1 (io-in-c)
2009/1/28 Martin v. Löwis <mar...@v.loewis.de>:
>> >>> print(open("a1").read())
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "D:\Apps\Python30\lib\io.py", line 1491, in write
>>     b = encoder.encode(s)
>>   File "D:\Apps\Python30\lib\encodings\cp850.py", line 19, in encode
>>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
>> UnicodeEncodeError: 'charmap' codec can't encode character '\u0153' in position 1: character maps to <undefined>
>
> Looks right to me.

I don't see why. I wrote the file from the console (cp850), read it in Python using the default encoding (which I would expect to match the console encoding), and wrote it to sys.stdout (which I would expect to use the console encoding). How did the character end up not being encodable, when I've only used one encoding throughout? (And if my assumptions about the encodings used are wrong at some point, that's what I'm suggesting is the error.)

Paul.
Re: [Python-Dev] Python 3.0.1 (io-in-c)
On Wed, Jan 28, 2009 at 10:29 AM, Martin v. Löwis <mar...@v.loewis.de> wrote:
> Notice that the determination of the specific encoding used is fairly elaborate:
> - if IO is to a terminal, Python tries to determine the encoding of the terminal. This is mostly relevant for Windows (which uses, by default, the OEM code page in the terminal).
> - if IO is to a file, Python tries to guess the common encoding for the system. On Unix, it queries the locale, and falls back to ascii if no locale is set. On Windows, it uses the ANSI code page. On OSX, it uses the system encoding.
> - if IO is binary, (clearly) no encoding is used. Network IO is always binary.
> - for file names, yet different algorithms apply. On Windows, it uses the Unicode API, so no need for an encoding. On Unix, it (again) uses the locale encoding. On OSX, it uses UTF-8 (just to be clear: this applies to the first argument of open(), not to the resulting file object).

This is a very helpful explanation. Is it in the docs somewhere, or if it isn't, could it be?

Steve
--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy
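The roles in that list can be inspected from a running interpreter. In this minimal sketch the helper `encoding_report` is hypothetical. It reports sys.stdout.encoding for the terminal, locale.getpreferredencoding(False) as the usual default for open() without an encoding argument, sys.getfilesystemencoding() for file names, and whether a binary file object has an `encoding` attribute at all. Later Python 3 releases changed several of these defaults (for example, Python 3.6 switched the Windows console and sys.getfilesystemencoding() on Windows to UTF-8), so a current interpreter may not match the 3.0-era description.

```python
import locale
import os
import sys
import tempfile


def encoding_report():
    """Collect the encodings this interpreter uses in different roles."""
    path = os.path.join(tempfile.mkdtemp(), "probe.bin")  # throwaway file
    with open(path, "wb") as f:
        binary_has_encoding = hasattr(f, "encoding")
    return {
        # sys.stdout can be None (e.g. under pythonw), hence getattr.
        "terminal (sys.stdout)": getattr(sys.stdout, "encoding", None),
        "open() without encoding=": locale.getpreferredencoding(False),
        "file names": sys.getfilesystemencoding(),
        "binary file has .encoding": binary_has_encoding,
    }


for role, value in encoding_report().items():
    print(f"{role:28} {value}")
```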
Re: [Python-Dev] Python 3.0.1 (io-in-c)
Paul Moore wrote:
> 2009/1/28 Martin v. Löwis <mar...@v.loewis.de>:
>>> >>> print(open("a1").read())
>>> [...]
>>> UnicodeEncodeError: 'charmap' codec can't encode character '\u0153' in position 1: character maps to <undefined>
>> Looks right to me.
>
> I don't see why. I wrote the file from the console (cp850), read it in Python using the default encoding (which I would expect to match the console encoding), and wrote it to sys.stdout (which I would expect to use the console encoding). How did the character end up not being encodable, when I've only used one encoding throughout? (And if my assumptions about the encodings used are wrong at some point, that's what I'm suggesting is the error.)

Well, first try to understand what the error *is*:

    py> unicodedata.name('\u0153')
    'LATIN SMALL LIGATURE OE'
    py> unicodedata.name('£')
    'POUND SIGN'
    py> ascii('£')
    "'\\xa3'"
    py> ascii('£'.encode('cp850').decode('cp1252'))
    "'\\u0153'"

So when Python reads the file, it uses cp1252. This is sensible - just because the console uses cp850 doesn't change the fact that the common encoding of files on your system is cp1252. It is an unfortunate fact of Windows that the console window uses a different encoding from the rest of the system (namely, the console uses the OEM code page, and everything else uses the ANSI code page).

Furthermore, U+0153 does not exist in cp850 (i.e. the terminal doesn't support œ), hence the exception.

Regards,
Martin
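When the writer's encoding is known, naming it in open() sidesteps the OEM/ANSI mismatch. This sketch simulates the cp850 file by producing the bytes in Python rather than with echo, writes them to a throwaway file, and reads them back with the matching encoding.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "a1")  # throwaway stand-in for the file
with open(path, "wb") as f:
    f.write("a\u00a3b\r\n".encode("cp850"))  # bytes as a cp850 console would produce

# Decode with the encoding that actually wrote the bytes, not the ANSI default.
with open(path, encoding="cp850") as f:
    text = f.read()

print(ascii(text))  # universal newlines turned "\r\n" into "\n"
```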
Re: [Python-Dev] Python 3.0.1 (io-in-c)
> This is a very helpful explanation. Is it in the docs somewhere, or if it isn't, could it be?

I actually don't know.

Regards,
Martin
Re: [Python-Dev] Python 3.0.1 (io-in-c)
2009/1/28 Martin v. Löwis <mar...@v.loewis.de>:
> Well, first try to understand what the error *is*:
> [...]
> So when Python reads the file, it uses cp1252. This is sensible - just because the console uses cp850 doesn't change the fact that the common encoding of files on your system is cp1252. It is an unfortunate fact of Windows that the console window uses a different encoding from the rest of the system (namely, the console uses the OEM code page, and everything else uses the ANSI code page).

Ah, I see. That is entirely obvious. The key bit of information is that the default IO encoding is cp1252, not cp850. I know that in theory, and I see the consequences often enough (:-)), but it isn't instinctive for me. And the simple "default encoding is system dependent" comment is not very helpful in terms of warning me that there could be an issue.

I do think that more wording around encoding defaults would be useful - as I said, I'll think about how best it could be made into a doc patch. I suspect the best approach would be to have a section (maybe in the docs for the codecs module) explaining all the details, and then a cross-reference to that from the various places (open, io) where default encodings are mentioned.

Paul.
Re: [Python-Dev] Python 3.0.1 (io-in-c)
Steven Bethard wrote:
> On Wed, Jan 28, 2009 at 10:29 AM, Martin v. Löwis <mar...@v.loewis.de> wrote:
>> Notice that the determination of the specific encoding used is fairly elaborate:
>> [...]
>
> This is a very helpful explanation. Is it in the docs somewhere, or if it isn't, could it be?

Here is the current entry on encodings in the Library Reference (Built-in Types, File Objects):

    file.encoding
        The encoding that this file uses. When strings are written to a file, they will be converted to byte strings using this encoding. In addition, when the file is connected to a terminal, the attribute gives the encoding that the terminal is likely to use (that information might be incorrect if the user has misconfigured the terminal). The attribute is read-only and may not be present on all file-like objects. It may also be None, in which case the file uses the system default encoding for converting strings.
Re: [Python-Dev] Python 3.0.1 (io-in-c)
On Wed, 28 Jan 2009 18:52:41 +, Paul Moore <p.f.mo...@gmail.com> wrote:
> 2009/1/28 Martin v. Löwis <mar...@v.loewis.de>:
>> Well, first try to understand what the error *is*:
>> [...]
>
> Ah, I see. That is entirely obvious. The key bit of information is that the default IO encoding is cp1252, not cp850. I know that in theory, and I see the consequences often enough (:-)), but it isn't instinctive for me. And the simple "default encoding is system dependent" comment is not very helpful in terms of warning me that there could be an issue.

It probably didn't help that the exception told you the error was in the "charmap" codec. It should have said "cp850" instead. The fact that cp850 is implemented in terms of "charmap" isn't very interesting; the fact that the error happened while encoding some text using cp850 is.

Jean-Paul
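Until the codec reports its own name, user code can re-raise the error under the requested codec name. This is a sketch with a hypothetical helper name. UnicodeEncodeError takes (encoding, object, start, end, reason), so the original position and reason are preserved, and the original exception is chained via `from`.

```python
def encode_reporting_codec(text: str, encoding: str) -> bytes:
    """Encode text; on failure, re-raise naming the codec the caller asked for."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        # Charmap-based codecs such as cp850 report themselves as 'charmap';
        # rebuild the error so its message names the requested encoding instead.
        raise UnicodeEncodeError(
            encoding, exc.object, exc.start, exc.end, exc.reason
        ) from exc


try:
    encode_reporting_codec("a\u0153b", "cp850")
except UnicodeEncodeError as exc:
    print(exc)  # the message now starts with 'cp850' codec
```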
Re: [Python-Dev] Python 3.0.1
Michael Foord wrote:
> M.-A. Lemburg wrote:
>> Why don't we just mark 3.0.x as an experimental branch and keep updating/fixing things that were not sorted out for the 3.0.0 release?! I think that's a fair approach, given that the only way to get field testing for new open-source software is to release early and often.
>>
>> A 3.1 release should then be the first stable release of the 3.x series and mark the start of the usual deprecation mechanisms we have in the 2.x series. Needless to say, rushing 3.1 out now would only cause yet another experimental release... major releases do take time to stabilize.
>
> +1
>
> I don't think we do users any favours by being cautious in removing / fixing things in the 3.0 releases.

I have two main reactions to 3.0.

1. It is great for my purpose -- coding algorithms. The cleaner object and text models are a mental relief for me. So it was a service to me to release it. I look forward to it becoming standard Python and have made my small contribution by helping clean up the 3.0 version of the docs.

2. It is something of a trial run, and it should be fixed as soon as possible. I seem to remember something from Shakespeare(?): "If it were done, 'tis best it were done quickly."

Guido said something over a year ago to the effect that he did not expect 3.0 to be used as a production release, so I do not think it should be treated as one. Label it developmental and people will not try to keep it in use for years and years in the way that, say, 2.4 still is.

tjr
Re: [Python-Dev] Python 3.0.1 (io-in-c)
On Wed, Jan 28, 2009 at 11:52 AM, Paul Moore p.f.mo...@gmail.com wrote:
> Ah, I see. That is entirely obvious. The key bit of information is that the default io encoding is cp1252, not cp850. I know that in theory, I see the consequences often enough (:-)), but it isn't instinctive for me. And the simple "default encoding is system dependent" comment is not very helpful in terms of warning me that there could be an issue.
>
> I do think that more wording around encoding defaults would be useful - as I said, I'll think about how best it could be made into a doc patch. I suspect the best approach would be to have a section (maybe in the docs for the codecs module) explaining all the details, and then a cross-reference to that from the various places (open, io) where default encodings are mentioned.

It'd also help if the file repr gave the encoding:

    >>> f = open('/dev/null')
    >>> f
    <io.TextIOWrapper object at 0x7ff4468d8a90>
    >>> import sys
    >>> sys.stdout
    <io.TextIOWrapper object at 0x7ff4476126d0>

Of course I can check .encoding manually, but it needs to be more intuitive.

-- 
Adam Olsen, aka Rhamphoryncus
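Until the repr itself carries this information, a small helper can surface it. This is a hypothetical sketch (`describe_stream` is not part of the `io` module); it relies only on the `.encoding` attribute Adam mentions and the optional `.name` attribute, which is absent when the underlying buffer is an in-memory `BytesIO`.

```python
import io


def describe_stream(stream):
    """Return a repr-like string that also shows a text stream's encoding.

    Hypothetical helper: 'name' is included only when the stream has one
    (reading .name on a TextIOWrapper over a BytesIO raises AttributeError,
    which getattr's default swallows).
    """
    parts = [type(stream).__name__]
    name = getattr(stream, 'name', None)
    if name is not None:
        parts.append('name=%r' % (name,))
    parts.append('encoding=%r' % (stream.encoding,))
    return '<%s>' % ' '.join(parts)


if __name__ == '__main__':
    # An in-memory stream with an explicitly chosen encoding.
    print(describe_stream(io.TextIOWrapper(io.BytesIO(), encoding='cp1252')))
```

Calling `describe_stream(sys.stdout)` the same way would show whichever encoding the interpreter picked for the console, which is exactly the value that differed from the file default in Paul's case.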
Re: [Python-Dev] Python 3.0.1 (io-in-c)
On Wed, Jan 28, 2009 at 1:42 PM, Adam Olsen rha...@gmail.com wrote:
> It'd also help if the file repr gave the encoding:

+1

-- 
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC
http://stutzbachenterprises.com
Re: [Python-Dev] Python 3.0.1 (io-in-c)
[Adam Olsen]
> It'd also help if the file repr gave the encoding:

+1 from me too. That will be a big help.

Raymond
Re: [Python-Dev] Python 3.0.1
Terry Reedy wrote:
> Michael Foord wrote:
>> M.-A. Lemburg wrote:
>>> Why don't we just mark 3.0.x as experimental branch and keep updating/fixing things that were not sorted out for the 3.0.0 release?!
>>>
>>> I think that's a fair approach, given that the only way to get field testing for new open-source software is to release early and often.
>>>
>>> A 3.1 release should then be the first stable release of the 3.x series and mark the start of the usual deprecation mechanisms we have in the 2.x series. Needless to say, rushing 3.1 out now would only cause yet another experimental release... major releases do take time to stabilize.
>>
>> +1
>>
>> I don't think we do users any favours by being cautious in removing / fixing things in the 3.0 releases.
>
> I have two main reactions to 3.0.
>
> 1. It is great for my purpose -- coding algorithms. The cleaner object and text models are a mental relief for me. So it was a service to me to release it. I look forward to it becoming standard Python and have made my small contribution by helping clean up the 3.0 version of the docs.
>
> 2. It is something of a trial run that should be fixed as soon as possible. I seem to remember something from Shakespeare(?): "If 'twere done, 'tis best 'twere done quickly." Guido said something over a year ago to the effect that he did not expect 3.0 to be used as a production release, so I do not think it should be treated as one. Label it developmental and people will not try to keep it in use for years and years in the way that, say, 2.4 still is.

It might also be a good idea to take the download link off the front page of python.org: until that happens newbies are going to keep coming along and downloading it because it's the newest.

regards
 Steve
-- 
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Re: [Python-Dev] Python 3.0.1
Steve Holden wrote:
> 2.6 showed it in the inclusion (later recognizable as somewhat ill-advised so late in the day) of multiprocessing;

Given the longstanding fork() bugs that were fixed as a result of that inclusion, I think that "ill-advised" is too strong... could it have done with a little more time to bed down multiprocessing in particular? Possibly. Was it worth holding up the whole release just for that? I don't think so - we'd already fixed up the problems that the test suite and python-dev were likely to find, so the cost/benefit ratio on a delay would have been pretty poor.

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Python 3.0.1
> It might also be a good idea to take the download link off the front page of python.org: until that happens newbies are going to keep coming along and downloading it because it's the newest.

It was (and probably still is) Guido's position that 3.0 *is* the version that newbies should be using.

Regards,
Martin
Re: [Python-Dev] Python 3.0.1 (io-in-c)
2009/1/28 Raymond Hettinger pyt...@rcn.com:
> [Adam Olsen]
>> It'd also help if the file repr gave the encoding:
>
> +1 from me too. That will be a big help.

Definitely. People *are* going to get confused by encoding errors - let's give them all the help we can.

Paul
Re: [Python-Dev] Python 3.0.1
Martin v. Löwis writes:
>> It might also be a good idea to take the download link off the front page of python.org: until that happens newbies are going to keep coming along and downloading it because it's the newest.
>
> It was (and probably still is) Guido's position that 3.0 *is* the version that newbies should be using.

Indeed. See Terry Reedy's post. Somebody who is looking for a platform for a production application is not going to download something because it's the newest.

Sure, those advocating other platforms will carp about Python 3.0, but hey, where is Perl 6? The amazing thing about a dancing bear is *not* how well it dances. Let's not get too worried about the PR aspects; just fixing the bugs as we go along will fix that to the extent that people are not totally prejudiced anyway.

I think there is definitely something to the notion that the 3.x vs. 3.0.y distinction should signal something, and I personally like MAL's suggestion that 3.0.x should be marked some sort of beta in perpetuity, or at least until 3.1 is ready to ship as stable and production-ready. (That's AIUI; MAL's intent may be somewhat different.)
Re: [Python-Dev] Python 3.0.1
Stephen J. Turnbull wrote:
> Martin v. Löwis writes:
>>> It might also be a good idea to take the download link off the front page of python.org: until that happens newbies are going to keep coming along and downloading it because it's the newest.

By that logic, I would suggest removing 2.6 ;-) See below.

>> It was (and probably still is) Guido's position that 3.0 *is* the version that newbies should be using.
>
> Indeed. See Terry Reedy's post.

When people ask on c.l.p, I recommend either 3.0 for the relative cleanliness or 2.5 (until now, at least) for the availability of 3rd-party add-ons (which will gradually improve for 2.6 and, more slowly, for 3.x). I expect that some newbies would find 2.6 a somewhat confusing mix of old and new.

tjr
Re: [Python-Dev] Python 3.0.1
Terry Reedy wrote:
> Stephen J. Turnbull wrote:
>> Martin v. Löwis writes:
>>>> It might also be a good idea to take the download link off the front page of python.org: until that happens newbies are going to keep coming along and downloading it because it's the newest.
>
> By that logic, I would suggest removing 2.6 ;-) See below.
>
>>> It was (and probably still is) Guido's position that 3.0 *is* the version that newbies should be using.
>>
>> Indeed. See Terry Reedy's post.
>
> When people ask on c.l.p, I recommend either 3.0 for the relative cleanliness or 2.5 (until now, at least) for the availability of 3rd-party add-ons (which will gradually improve for 2.6 and, more slowly, for 3.x). I expect that some newbies would find 2.6 a somewhat confusing mix of old and new.

Fair point. At least we both agree that the current site doesn't best serve the punters.

regards
 Steve
-- 
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/