Re: A few questions about encoding
UTF-8, Unicode (consortium): 1 to 4 *Unicode Transformation Units*
UTF-8, ISO 10646: 1 to 6 *Unicode Transformation Units*
(still current, unless very recently modified)

jmf
-- http://mail.python.org/mailman/listinfo/python-list
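As a small illustration of the 1-to-4 range quoted for the Unicode consortium definition, here is a minimal sketch that counts the 8-bit units Python's own UTF-8 codec produces per code point. The sample characters are my own choice, not taken from the post.

```python
# Count the UTF-8 code units (8 bits each) produced for single code points.
# The sample characters are illustrative choices, not taken from the post.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]

utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch in samples:
    print("U+{:04X} -> {} unit(s)".format(ord(ch), utf8_lengths[ch]))
```

The four samples were picked so that each one falls in a different length class of the encoding.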
Re: Changing filenames from Greeklish = Greek (subprocess complain)
A coding scheme works with three sets: a *unique* set of CHARACTERS, a *unique* set of CODE POINTS and a *unique* set of ENCODED CODE POINTS, Unicode or not.

The relation between the set of characters and the set of code points is a *human* table, created with a sheet of paper and a pencil: a deliberate choice of characters with integers as labels.

The relation between the set of code points and the set of encoded code points is a mathematical operation. In the case of an 8-bit coding scheme, like iso-XXX, this operation is a no-op; the relation is an identity. In short: set of code points == set of encoded code points.

In the case of Unicode, the Unicode Consortium endorses three such mathematical operations, called UTF-8, UTF-16 and UTF-32, where UTF means Unicode Transformation Format, a confusing wording that means both the process and the result of the process. This Unicode transformation does not produce bytes; it produces words/chunks/tokens of *bits* with lengths 8, 16 and 32, called Unicode Transformation Units (hence the names UTF-8, -16, -32). At this level, only a structure has been defined (there is no computing). Very important: a healthy coding scheme works conceptually only with this *unique* set of encoded code points, not with bytes, characters or code points.

The last step is the machine implementation: it is up to the processor, the compiler and the language to implement all these Unicode Transformation Units, with of course their related specificities: char, wchar_t, int, long, endianness, rune (Go language), ...

This is not too over-simplified, not too over-complicated, and enough to understand one, if not THE, design mistake of the flexible string representation.

jmf
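To make the distinction between code points and transformation units concrete, here is a minimal sketch. The helper name `code_units` is hypothetical, and the big-endian codec variants are used only so that no byte-order mark appears; it is one way to look at the units, not the post's own method.

```python
def code_units(char, encoding, unit_bytes):
    """Return the encoded form of `char` as a list of fixed-width integers.

    `code_units` is a hypothetical helper written for this illustration.
    """
    raw = char.encode(encoding)
    return [
        int.from_bytes(raw[i:i + unit_bytes], "big")
        for i in range(0, len(raw), unit_bytes)
    ]

euro = "\u20ac"        # code point U+20AC
emoji = "\U0001F600"   # a code point outside the BMP

for label, ch in (("U+20AC", euro), ("U+1F600", emoji)):
    print(label,
          [hex(u) for u in code_units(ch, "utf-8", 1)],
          [hex(u) for u in code_units(ch, "utf-16-be", 2)],
          [hex(u) for u in code_units(ch, "utf-32-be", 4)])
```

The same code point yields a different number of units per format, while the choice of byte order ("be" here) belongs to the machine-implementation step described in the last paragraph of the post.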
Re: Changing filenames from Greeklish = Greek (subprocess complain)
On 5 June, 19:43, Νικόλαος Κούρας nikos.gr...@gmail.com wrote:
> On Wednesday, 5 June 2013 8:56:36 AM UTC+3, Steven D'Aprano wrote:
>> Somehow, I don't know how because I didn't see it happen, you have one or more files in that directory where the file name as bytes is invalid when decoded as UTF-8, but your system is set to use UTF-8. So to fix this you need to rename the file using some tool that doesn't care quite so much about encodings. Use the bash command line to rename each file in turn until the problem goes away.
>
> But renaming via shell access like
>
> mv 'Euxi tou Ihsou.mp3' 'Ευχή του Ιησου.mp3'
>
> leads to that unknown encoding of this bytestream: '\305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3'
>
> But please tell me, Steven, which Linux tool do you think can encode the weird filename to a proper 'Ευχή του Ιησου.mp3' in UTF-8? Or we can write a script, as I suggested, to decode the bytestream back using all sorts of available charsets, boiling down to the original Greek letters.

---

See http://bugs.python.org/issue13643, msg149949, author: Antoine Pitrou (pitrou).

Quote:

> So, you're complaining about something which works, kind of:
>
> $ touch héhé
> $ LANG=C python3 -c "import os; print(os.listdir())"
> ['h\udcc3\udca9h\udcc3\udca9']
>
>> This makes robustly working with non-ascii filenames on different platforms needlessly annoying, given no modern nix should have problems just using UTF-8 in these cases.
>
> So why don't these supposedly modern systems at least set the appropriate environment variables for Python to infer the proper character encoding? (since these modern systems don't have a well-defined encoding...)
>
> Answer: because they are not modern at all, they are antiquated, inadapted and obsolete pieces of software designed and written by clueless Anglo-American people. Please report bugs against these systems. The culprit is not Python, it's the Unix crap and the utterly clueless attitude of its maintainers (filesystems are just bytes, yeah, whatever...).

jmf
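A minimal sketch of what happens with the byte stream from the quoted message. It assumes the file name was written in ISO-8859-7 (Greek); that is a guess consistent with the bytes, not something the thread confirms. It also shows the surrogateescape mechanism behind the `'\udcc3\udca9'` output in the quoted bug report.

```python
# Bytes of the file name as shown in the quoted message (octal escapes).
raw = b"\305\365\367\336 \364\357\365 \311\347\363\357\375.mp3"

# Assumption: the name was stored as ISO-8859-7 (Greek), not UTF-8.
name = raw.decode("iso-8859-7")

# The bytes the same name would have on a UTF-8 file system.
utf8_name = name.encode("utf-8")

# What Python 3 does with undecodable bytes when it expects UTF-8:
# each bad byte becomes a lone surrogate, and the round trip is lossless.
escaped = raw.decode("utf-8", "surrogateescape")
restored = escaped.encode("utf-8", "surrogateescape")

print(utf8_name)
print(ascii(escaped))
print(restored == raw)
```

On a POSIX system, `os.rename(raw, utf8_name)` accepts bytes paths, so a rename script could work on the raw names without ever guessing at the locale; whether ISO-8859-7 is the right source charset would still need to be checked by eye.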
Re: PyWart: The problem with print
On 2 June, 20:09, Rick Johnson rantingrickjohn...@gmail.com wrote:
> I never purposely inject ANY superfluous cycles in my code except in the case of testing or development. To me it's about professionalism. Let's consider a thought exercise shall we?

The flexible string representation is the perfect example of this lack of professionalism. Wrong by design: a lack of understanding of the mathematical logic, of the coding of characters, of Unicode and of the usage of characters (everything is tied together).

How is it possible to arrive at such a situation? The answer is far beyond my understanding (although I have my opinion on the subject).

jmf
Re: python b'...' notation
On 31 May, 00:19, alcyon st...@terrafirma.us wrote:
> On Wednesday, May 29, 2013 3:19:42 PM UTC-7, Cameron Simpson wrote:
>> On 29May2013 13:14, Ian Kelly ian.g.ke...@gmail.com wrote:
>> | On Wed, May 29, 2013 at 12:33 PM, alcyon st...@terrafirma.us wrote:
>> | > This notation displays hex values except when they are 'printable', in which case it displays that printable character. How do I get it to force hex for all bytes? Thanks, Steve
>> |
>> | Is this what you want?
>> |
>> | >>> ''.join('%02x' % x for x in b'hello world')
>> | '68656c6c6f20776f726c64'
>>
>> Not to forget binascii.hexlify.
>> --
>> Cameron Simpson c...@zip.com.au
>>
>> Every particle continues in its state of rest or uniform motion in a straight line except insofar as it doesn't. - Sir Arthur Eddington
>
> Thanks for the binascii.hexlify tip. I was able to make it work but I did have to write a function to get it exactly the string I wanted. I wanted, for example, b'\n\x00' to display as 0x0A 0x00 or b'!\xff(\xc0' to display as 0x21 0xFF 0x28 0xC0.

>>> a = b'!\xff(\xc0\n\x00'
>>> z = ['0x{:02X}'.format(c) for c in a]
>>> z
['0x21', '0xFF', '0x28', '0xC0', '0x0A', '0x00']
>>> s = ' '.join(z)
>>> s
'0x21 0xFF 0x28 0xC0 0x0A 0x00'

jmf
Re: How to get an integer from a sequence of bytes
On 30 May, 20:42, Ian Kelly ian.g.ke...@gmail.com wrote:
> On Thu, May 30, 2013 at 12:26 PM, Mok-Kong Shen mok-kong.s...@t-online.de wrote:
>> On 27.05.2013 17:30, Ned Batchelder wrote:
>>> On 5/27/2013 10:45 AM, Mok-Kong Shen wrote:
>>>> From an int one can use to_bytes to get its individual bytes, but how can one reconstruct the int from the sequence of bytes?
>>>
>>> The next thing in the docs after int.to_bytes is int.from_bytes: http://docs.python.org/3.3/library/stdtypes.html#int.from_bytes
>>
>> I am sorry to have overlooked that. But one thing I still wonder is why there is no direct possibility of converting a byte to an int in [0,255], i.e. with a construct int(b), where b is a byte.
>
> The bytes object can be viewed as a sequence of ints. So if b is a bytes object of non-zero length, then b[0] is an int in range(0, 256).

Well, Python now speaks only "integer"; the rest is convenience, and it is all nicely coherent.

>>> bin(255)
'0b11111111'
>>> oct(255)
'0o377'
>>> 255
255
>>> hex(255)
'0xff'
>>> int('0b11111111', 2)
255
>>> int('0o377', 8)
255
>>> int('255')
255
>>> int('0xff', 16)
255
>>> 0b11111111
255
>>> 0o377
255
>>> 255
255
>>> 0xff
255
>>> type(0b11111111)
<class 'int'>
>>> type(0o377)
<class 'int'>
>>> type(255)
<class 'int'>
>>> type(0xff)
<class 'int'>

jmf
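For the question that started this exchange, a short sketch of the round trip between an int and its bytes, using `int.to_bytes` and `int.from_bytes` as named in the quoted reply. The value 1027 and the two-byte, big-endian layout are arbitrary choices for the illustration.

```python
value = 1027                                   # 0x0403, arbitrary example

# int -> bytes: the length and byte order must be stated explicitly.
raw = value.to_bytes(2, byteorder="big")

# bytes -> int: the inverse operation.
back = int.from_bytes(raw, byteorder="big")

# A bytes object is a sequence of ints in range(0, 256).
first = raw[0]
as_list = list(raw)

print(raw, back, first, as_list)
```

Indexing (`raw[0]`) is the "direct" byte-to-int conversion asked about; `int(b)` is not needed because each element of a bytes object already is an int.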
Re: Harmonic distortion of a input signal
On 20 May, 19:56, Christian Gollwitzer aurio...@gmx.de wrote:
> Oops, I thought we were posting to comp.dsp. Nevertheless, I think numpy.fft does mixed-radix (can't check it now)
>
> On 20.05.13 19:50, Christian Gollwitzer wrote:
>> On 20.05.13 19:23, jmfauth wrote:
>>> Nonsense.
>>
>> Ditto.
>>
>>> The discrete fft algorithm is valid only if the number of data points you transform does correspond to a power of 2 (2**n).
>>
>> Where did you get this? The DFT is defined for any integer number of points the same way. Only if you want to get it fast do you need to worry about the length. For powers of two, there is the classic Cooley-Tukey. But there do exist FFT algorithms for any other length. For example, there is the Winograd transform for a set of small numbers, there is mixed-radix to reduce any length which can be factored, and there is finally Bluestein, which works for any size, even for a prime. All of the aforementioned algorithms are O(n log n) and are implemented in typical FFT packages. All of them should result (up to rounding differences) in the same thing as the naive DFT sum. Therefore, today
>>
>>> Keywords to the problem: apodization, zero filling, convolution product, ...
>>
>> Not for a periodic signal of integer length.
>>
>>> eg. http://en.wikipedia.org/wiki/Convolution
>>
>> How long do you read this group?
>>
>> Christian

Forget what I wrote. I understand what I wanted to say; it was badly formulated.

jmf
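The point made in the quoted reply, that the DFT is defined for any number of points, can be checked with the naive sum. This is a minimal pure-Python sketch (quadratic time, so not an FFT); the function names and the length-7 test signal are my own choices.

```python
import cmath

def naive_dft(samples):
    """Direct evaluation of the DFT sum; works for any length, O(n**2)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(samples))
        for k in range(n)
    ]

def naive_idft(spectrum):
    """Inverse of naive_dft, again by direct summation."""
    n = len(spectrum)
    return [
        sum(c * cmath.exp(2j * cmath.pi * k * t / n)
            for k, c in enumerate(spectrum)) / n
        for t in range(n)
    ]

# Length 7 is prime, so it is certainly not a power of two.
signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0]
spectrum = naive_dft(signal)
roundtrip = naive_idft(spectrum)

print(max(abs(a - b) for a, b in zip(signal, roundtrip)))
```

The fast algorithms named in the quote (mixed-radix, Bluestein) should agree with this sum up to rounding; they only change how quickly the result is obtained.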
Re: Harmonic distortion of a input signal
Nonsense. The discrete FFT algorithm is valid only if the number of data points you transform corresponds to a power of 2 (2**n).

Keywords to the problem: apodization, zero filling, convolution product, ...

e.g. http://en.wikipedia.org/wiki/Convolution

jmf
Re: Diacritical insensitive search
The handling of diacritics is an especially nice case study. One can use it to toy with some specific features of Unicode: normalisation, decomposition, ...

... and also to show how Unicode can be badly implemented.

First quick example that came to my mind (Py 3.2.5 and Py 3.3.2):

>>> timeit.repeat("ud.normalize('NFKC', ud.normalize('NFKD', 'ᶑḗḖḕḹ'))", "import unicodedata as ud")
[2.929404406789672, 2.923327801150208, 2.923659417064755]

>>> timeit.repeat("ud.normalize('NFKC', ud.normalize('NFKD', 'ᶑḗḖḕḹ'))", "import unicodedata as ud")
[3.8437222586746884, 3.829490737203514, 3.819266963414293]

jmf
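For the search problem in the thread title, one common approach (an assumption on my part, the post does not spell out a method) is to decompose the text with NFKD and drop the combining marks before comparing. The helper names below are hypothetical.

```python
import unicodedata

def strip_diacritics(text):
    """Decompose with NFKD, then drop combining marks (accents, cedillas...)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def contains_insensitive(needle, haystack):
    """Case- and diacritic-insensitive substring test."""
    return (strip_diacritics(needle).casefold()
            in strip_diacritics(haystack).casefold())

print(strip_diacritics("maçã"))
print(contains_insensitive("maca", "Uma MAÇÃ verde"))
```

This only handles characters that have a canonical or compatibility decomposition; letters such as 'ø' or 'ᶑ' have none and pass through unchanged, so a real search tool would probably need an extra mapping table.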
Re: PDF generator decision
On 14 May, 17:05, Christian Jurk co...@commx.ws wrote:
> Hi folks,
>
> This question may have been asked several times already, but the development of relevant software continues day by day. For some time now I've been using xhtml2pdf [1] to generate PDF documents from HTML templates (which are rendered through my Django-based web application). This has been working for some time now, but I'm constantly adding new templates and they are not looking like I want them to (sometimes bold text is bold, sometimes not, layout issues, etc.). I'd like to use something other than xhtml2pdf.
>
> So I'd like to ask which is (probably) the best way to create PDFs in Python (3)? It is important for me that I am able to specify not only background graphics, paragraphs, tables and so on, but also page headers/footers. The reason is that I have a bunch of documents to be generated (including invoice templates, quotes - stuff like that).
>
> Any advice is welcome. Thanks.
>
> [1] https://github.com/chrisglass/xhtml2pdf

1) Use Python to collect your data (db, pictures, texts, ...) and/or to create the material (text, graphics, ...) that will be the contents (source) of your PDFs.
2) Put this source in a .tex file (a plain text file).
3) Let it compile with a TeX engine.

- I cannot think of anything more versatile and basically simple (writing a text file).
- Do not forget you are the only one who knows the content and the layout of your document(s).

jmf
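The three steps above could be sketched roughly as follows. Everything specific here is an assumption for illustration: the template, the field names, the `build_tex` / `write_and_compile` helpers, and the choice of `xelatex` as the engine (any TeX engine on the PATH would do). Data is inserted unescaped, so real input containing TeX special characters would need escaping first.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical invoice template; the fields are filled with %-formatting.
TEMPLATE = r"""\documentclass{article}
\begin{document}
\section*{Invoice %(number)s}
Customer: %(customer)s\par
Total: %(total).2f EUR
\end{document}
"""

def build_tex(data):
    """Step 1 and 2: turn collected data into TeX source (plain text)."""
    return TEMPLATE % data

def write_and_compile(data, path, engine="xelatex"):
    """Step 2 and 3: write the .tex file, then compile it if an engine exists."""
    path = Path(path)
    path.write_text(build_tex(data), encoding="utf-8")
    if shutil.which(engine) is None:
        return None                      # no TeX engine installed; source only
    subprocess.run(
        [engine, "-interaction=nonstopmode", path.name],
        cwd=str(path.parent),
        check=True,
    )
    return path.with_suffix(".pdf")

source = build_tex({"number": "2013-001", "customer": "ACME", "total": 12.5})
print(source)
```

Calling `write_and_compile({...}, "invoice.tex")` would then produce `invoice.pdf` next to the source when the engine is available. Headers and footers, which the original poster asked about, would live in the template (for example through a LaTeX package of your choice) rather than in the Python code.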
Re: Unicode humor
On 8 May, 15:19, Roy Smith r...@panix.com wrote:
> Apropos to any of the myriad unicode threads that have been going on recently:
>
> http://xkcd.com/1209/

This reflects a lack of understanding of Unicode.

jmf
Re: Why do Perl programmers make more money than Python programmers
On 6 May, 09:49, Fábio Santos fabiosantos...@gmail.com wrote:
> On 6 May 2013 08:34, Chris Angelico ros...@gmail.com wrote:
>> Well you see, it was 70 bytes back in the Python 2 days (I'll defer to Steven for data points earlier than that), but with Python 3, there were two versions: one was 140 bytes representing 70 characters, the other 280 bytes representing 70 characters. In Python 3.3, they were merged, and a trivial amount of overhead added, so now it's 80 bytes representing 70 characters. But you have an absolute guarantee that it's correct now.
>>
>> Of course, the entire code can be represented as a single int now. You used to have to use a long.
>>
>> ChrisA
>
> Thanks. You have made my day.
>
> I may raise the average pay of a Python programmer in Portugal. I asked for a raise back in December, and was told that it wouldn't happen before this year. I have done well. I think I deserve better pay than a supermarket employee now. I am sure that my efforts were appreciated and I will be rewarded.
>
> I am being sarcastic.
>
> The above paragraph wouldn't be true if I programmed in perl, c++ or lisp.

1) The memory gain for many of us (usually non-ASCII users) just becomes irrelevant.

>>> sys.getsizeof('maçã')
41
>>> sys.getsizeof('abcd')
29

2) More critical, Py 3.3 just becomes non Unicode compliant (e.g. for European languages or "ASCII" typographers!).

>>> import timeit
>>> timeit.timeit("'abcd'*1000 + 'a'")
2.186670111428325
>>> timeit.timeit("'abcd'*1000 + '€'")
2.9951699820528432
>>> timeit.timeit("'abcd'*1000 + 'œ'")
3.0036780444886233
>>> timeit.timeit("'abcd'*1000 + 'ẞ'")
3.004992278824048
>>> timeit.timeit("'maçã'*1000 + 'œ'")
3.231025618708202
>>> timeit.timeit("'maçã'*1000 + '€'")
3.215894398100758
>>> timeit.timeit("'maçã'*1000 + 'œ'")
3.224407974255655
>>> timeit.timeit("'maçã'*1000 + '’'")
3.2206342273566406
>>> timeit.timeit("'abcd'*1000 + '’'")
2.991440344906

3) Python is proud to cover the whole Unicode range; unfortunately it "breaks" the BMP range. A small GvR example (ASCII) from the bug list, but with non-ASCII characters:

# Py 3.2, all chars
>>> timeit.repeat("a = 'hundred'; 'x' in a")
[0.09087790617297742, 0.07456871885972305, 0.07449940353376405]
>>> timeit.repeat("a = 'maçãé€ẞ'; 'x' in a")
[0.10088136800095526, 0.07488497003487282, 0.07497594640028638]

# Py 3.3, ascii and non ascii chars
>>> timeit.repeat("a = 'hundred'; 'x' in a")
[0.11426985953005442, 0.10040049292649655, 0.09920834808588097]
>>> timeit.repeat("a = 'maçãé€ẞ'; 'é' in a")
[0.2345595188256766, 0.21637172864154763, 0.2179096624382737]

There are plenty of good reasons to use Python. There are also plenty of good reasons not to use (or now to drop) Python, and to realize that if you wish to process text seriously, you are better served by corporate products or by tools that use Unicode properly.

jmf
Is Unicode support so hard...
In a previous post,

http://groups.google.com/group/comp.lang.python/browse_thread/thread/6aec70817705c226# ,

Chris “Kwpolska” Warrick wrote:

“Is Unicode support so hard, especially in the 21st century?”

--

Unicode is not really complicated and it works very well (more than two decades of development if you take into account iso-14).

But, as usual I can say, people prefer to spend their time making a "better Unicode than Unicode", and it usually fails. Python does not escape this rule.

-

I'm busy with TeX (Unicode engine variant), fonts and typography. This gives me plenty of ideas to test the "flexible string representation" (FSR). I should recognize this FSR is failing particularly well... I can almost say, a delight.

jmf
Unicode lover
Re: While loop help
On 9 April, 15:32, thomasancill...@gmail.com wrote:
> I'm new to learning python and creating a basic program to convert units of measurement which I will eventually expand upon, but I'm trying to figure out how to loop the entire program. When I insert a while loop it only loops the first 2 lines. Can someone provide a detailed beginner friendly explanation? Here is my program.
>
> #!/usr/bin/env python
> restart = "true"
> while restart == "true":
>     #Program starts here
>     print "To start the Unit Converter please type the number next to the conversion you would like to perform"
>     choice = input("\n1:Inches to Meter\n2:Millileters to Pint\n3:Acres to Square-Miles\n")
>
> #If user enters 1:Program converts inches to meters
> if choice == 1:
>     number = int(raw_input("\n\nType the amount in Inches you would like to convert to Meters.\n"))
>     operation = "Inches to Meters"
>     calc = round(number * .0254, 2)
>     print "\n",number,"Inches =",calc,"Meters"
>     restart = raw_input("If you would like to perform another conversion type: true\n")
>
> #If user enters 2:Program converts millimeters to pints
> elif choice == 2:
>     number = int(raw_input("\n\nType the amount in Milliliters you would like to convert to Pints.\n"))
>     operation = "Milliliters to Pints"
>     calc = round(number * 0.0021134,2)
>     print "\n",number,"Milliliters =",calc,"Pints"
>     restart = raw_input("If you would like to perform another conversion type: true\n")
>
> #If user enter 3:Program converts kilometers to miles
> elif choice == 3:
>     number = int(raw_input("\n\nType the amount in Kilometers you would like to convert to Miles.\n"))
>     operation = "Kilometers to Miles"
>     calc = round(number * 0.62137,2)
>     print "\n",number,"Kilometers =",calc,"Miles"
>     restart = raw_input("If you would like to perform another conversion type: true\n")

More (very) important:

meter: lower case "m"
kilometre: lower case "k"
milli: lower case "m"

http://www.bipm.org/en/home/

Less important:

Start with something simple and increase the complexity, e.g.:

>>> # Py 3.2
>>> while True:
...     s = input('km: ')
...     if s == 'q':
...         break
...     a = float(s)
...     print('{} [kilometre] == {} [metre]'.format(a, a * 1000))
...
km: 1
1.0 [kilometre] == 1000.0 [metre]
km: 1.3456
1.3456 [kilometre] == 1345.6 [metre]
km: q

jmf
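As for the original question (how to loop the entire program), one possible restructuring in Python 3 is sketched below: everything that should repeat sits inside the `while` body, and the arithmetic is kept in a separate function. The names `FACTORS`, `convert` and `main` are my own; the factors are the ones from the quoted program.

```python
# Menu key -> (source unit, target unit, multiplier), taken from the
# quoted program. FACTORS, convert and main are hypothetical names.
FACTORS = {
    "1": ("inch", "metre", 0.0254),
    "2": ("millilitre", "pint", 0.0021134),
    "3": ("kilometre", "mile", 0.62137),
}

def convert(choice, amount):
    """Return a one-line description of the conversion result."""
    source, target, factor = FACTORS[choice]
    return "{} {} = {} {}".format(amount, source,
                                  round(amount * factor, 2), target)

def main():
    # Everything indented under `while` is repeated until the user quits.
    while True:
        choice = input("1: inch to metre, 2: millilitre to pint, "
                       "3: kilometre to mile, q: quit\n")
        if choice == "q":
            break
        if choice not in FACTORS:
            print("Unknown choice")
            continue
        amount = float(input("Amount: "))
        print(convert(choice, amount))
```

Calling `main()` starts the interactive loop. In Python, the loop body is defined by indentation alone, so statements that are not indented under the `while` line are not part of the loop.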
Re: In defence of 80-char lines
On 4 April, 03:36, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
> Although PEP 8 is only compulsory for the Python standard library, many users like to stick to PEP 8 for external projects.
>
> http://www.python.org/dev/peps/pep-0008/
>
> With perhaps one glaring exception: many people hate, or ignore, PEP 8's recommendation to limit lines to 80 characters. (Strictly speaking, 79 characters.)
>
> Here is a good defence of 80 char lines:
>
> http://wrongsideofmemphis.com/2013/03/25/80-chars-per-line-is-great/
>
> --
> Steven

With Unicode fonts, where even monospaced fonts have character widths that vary with the Unicode block (for obvious reasons), speaking of a text width in chars does not even make sense.

jmf
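The difference between counting characters and counting display columns can be approximated in Python with `unicodedata`. This is only a rough sketch under the assumption that wide and fullwidth East Asian characters take two cells and combining marks take none; actual fonts and terminals may differ. `display_width` is a hypothetical helper name.

```python
import unicodedata

def display_width(text):
    """Approximate number of monospace cells needed to show `text`."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue                 # combining marks share the previous cell
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2               # wide / fullwidth characters
        else:
            width += 1
    return width

for sample in ("abc", "\u65e5\u672c\u8a9e", "e\u0301"):
    print(len(sample), display_width(sample))
```

For the three samples, `len()` and the estimated width disagree in two cases, which is the kind of mismatch the post is pointing at.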
Re: Performance of int/long in Python 3
This FSR is wrong by design. A naive way to embrace Unicode.

jmf
Re: Performance of int/long in Python 3
On 2 April, 01:43, Neil Hodgson nhodg...@iinet.net.au wrote:
> Mark Lawrence:
>> You've given many examples of the same type of micro benchmark, not many examples of different types of benchmark.
>
> Trying to work out what jmfauth is on about I found what appears to be a performance regression with '<' string comparisons on Windows 64-bit. It's around 30% slower on a 25 character string that differs in the last character and 70-100% on a 100 character string that differs at the end.
>
> Can someone else please try this to see if it's reproducible? Linux doesn't show this problem.
>
> >c:\python32\python -u charwidth.py
> 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)]
> a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']176
> [0.7116295577956576, 0.7055591343157613, 0.7203483026429418]
>
> a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']176
> [0.7664397841378787, 0.7199902325464409, 0.713719289812504]
>
> a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']176
> [0.7341851791817691, 0.6994205901833599, 0.7106807593741005]
>
> a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']180
> [0.7346812372666784, 0.699543377914, 0.7064768417728411]
>
> >c:\python33\python -u charwidth.py
> 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
> a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']108
> [0.9913326076446045, 0.9455845241056282, 0.9459076605341776]
>
> a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']192
> [1.0472289217234318, 1.0362342484091207, 1.0197109728048384]
>
> a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']192
> [1.0439643704533834, 0.9878581050301687, 0.9949265834034335]
>
> a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']312
> [1.0987483965446412, 1.0130257167690004, 1.024832248526499]
>
> Here is the code:
>
> # encoding:utf-8
> import os, sys, timeit
> print(sys.version)
> examples = [
> "a=['$b','$z']",
> "a=['$λ','$η']",
> "a=['$b','$η']",
> "a=['$\U00020000','$\U00020001']"]
> baseDir = "C:/Users/Neil/Documents/"
> #~ baseDir = "C:/Users/Neil/Documents/Visual Studio 2012/Projects/Sigma/QtReimplementation/HLFKBase/Win32/x64/Debug"
> for t in examples:
>     t = t.replace("$", baseDir)
>     # Using os.write as simple way get UTF-8 to stdout
>     os.write(sys.stdout.fileno(), t.encode("utf-8"))
>     print(sys.getsizeof(t))
>     print(timeit.repeat("a[0] < a[1]",t,number=500))
>     print()
>
> For a more significant performance difference try replacing the baseDir setting with (may be wrapped):
>
> baseDir = "C:/Users/Neil/Documents/Visual Studio 2012/Projects/Sigma/QtReimplementation/HLFKBase/Win32/x64/Debug"
>
> Neil

Hi,

>c:\python32\pythonw -u charwidth.py
3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]
a=['D:\jm\jmpy\py3app\stringbenchb','D:\jm\jmpy\py3app\stringbenchz']168
[0.8343414906182101, 0.8336184057396241, 0.8330473419738562]

a=['D:\jm\jmpy\py3app\stringbenchλ','D:\jm\jmpy\py3app\stringbenchη']168
[0.818378092261062, 0.8180854713107406, 0.8192279926793571]

a=['D:\jm\jmpy\py3app\stringbenchb','D:\jm\jmpy\py3app\stringbenchη']168
[0.8131353330542339, 0.8126985677326912, 0.8122744051977042]

a=['D:\jm\jmpy\py3app\stringbenchð €€','D:\jm\jmpy\py3app\stringbenchð €']172
[0.8271094603211102, 0.82704053883214, 0.8265781741004083]

>Exit code: 0

>c:\Python33\pythonw -u charwidth.py
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)]
a=['D:\jm\jmpy\py3app\stringbenchb','D:\jm\jmpy\py3app\stringbenchz']94
[1.3840254166697845, 1.3933888932429768, 1.391664674507438]

a=['D:\jm\jmpy\py3app\stringbenchλ','D:\jm\jmpy\py3app\stringbenchη']176
[1.6217970707185678, 1.6279369907932706, 1.6207041728220117]

a=['D:\jm\jmpy\py3app\stringbenchb','D:\jm\jmpy\py3app\stringbenchη']176
[1.5150522562729396, 1.5130369919353992, 1.5121890607025037]

a=['D:\jm\jmpy\py3app\stringbenchð €€','D:\jm\jmpy\py3app\stringbenchð €']316
[1.6135375194801664, 1.6117739170366434, 1.6134331526540109]

>Exit code: 0

- win7 32-bits
- The file is in utf-8
- Do not be afraid of this output; it is just a copy/paste from your excellent editor, whose output pane is configured to use the locale coding.
- Of course, and as expected, similar behaviour from a console. (Which, btw, shows how good your application is.)

==

Something different. From a previous msg on this thread:

---

>> Sure. And over a different set of samples, it is less compact. If you write a lot of Latin-1, Python will use one byte per character, while UTF-8 will use two bytes per character.
>
> I think you mean writing a lot of Latin-1 characters outside ASCII. However, even people writing texts in, say, French will find that only a small proportion of their text is outside ASCII and so the cost of UTF-8 is correspondingly small.
>
> The counter-problem is that a French document that needs to include one mathematical symbol (or emoji) outside Latin-1 will double in size as a Python string
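The size argument in the quoted exchange (Latin-1 text costs one extra byte in UTF-8 only for each character outside ASCII) can be checked directly. The sample sentence is my own; the counts are computed, not quoted from the thread.

```python
# A short French sample (my own choice) with a few non-ASCII letters.
text = "Le gar\u00e7on a d\u00e9j\u00e0 mang\u00e9"

latin1_size = len(text.encode("latin-1"))   # one byte per character
utf8_size = len(text.encode("utf-8"))       # two bytes for each of ç, é, à
non_ascii = sum(1 for ch in text if ord(ch) > 127)

print(len(text), latin1_size, utf8_size, non_ascii)
```

The UTF-8 overhead equals the number of non-ASCII characters, which for typical French prose is a small fraction of the text.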
Re: Performance of int/long in Python 3
On 2 April, 10:03, Chris Angelico ros...@gmail.com wrote:
> On Tue, Apr 2, 2013 at 6:24 PM, jmfauth wxjmfa...@gmail.com wrote:
>> An editor may reflect very well the example I gave. You enter a thousand ASCII chars, then - boom - as you enter a non-ASCII char, your editor (assuming it uses a mechanism like the FSR) has to internally re-encode everything!
>
> That assumes that the editor stores the entire buffer as a single Python string. Frankly, I think this unlikely; the nature of insertions and deletions makes this impractical. (I've known editors that do function this way. They're utterly unusable on large files.)
>
> ChrisA

No, no, no, no, ... as we say in French (this is a kindly form).

The length of a string may have its importance. This bad behaviour may happen on every char. The most complicated chars are the chars with diacritics and ligatured [1, 2] chars, e.g. chars used in Arabic script [2].

It is somehow funny to see that the FSR "fails" precisely on problems Unicode will solve/handle, e.g. normalization or sorting [3].

Not really a problem for those who are endorsing the good work Unicode does [5].

[1] A point which was not, in my mind, very well understood when I read the PEP 393 discussion.

[2] Take a Unicode-compliant TeX engine and toy with the decomposed form of these chars. A very good way to understand what a char can really be, when you wish to process text "seriously".

[3] I only test and tested these chars blindly with the help of the doc I have. Btw, when I tested complicated Arabic chars, I noticed that Py33 "crashes"; it does not really crash, it gets stuck in some kind of infinite loop (or is it due to timeit?).

[4] Am I the only one who tests this kind of stuff?

[5] Unicode is a fascinating construction.

jmf
Re: Performance of int/long in Python 3
On 2 April, 10:35, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
> On Tue, 02 Apr 2013 19:03:17 +1100, Chris Angelico wrote:
>
> So what? Who cares if it takes 0.2 second to insert a character instead of 0.1 second? That's still a hundred times faster than you can type.

This is not the problem. The interesting point is that there are good and less good Unicode implementations.

jmf
Re: Performance of int/long in Python 3
On 2 April, 16:03, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
> On Tue, 02 Apr 2013 11:58:11 +0100, Steve Simmons wrote:
>
> I'm sure you didn't intend to be insulting, but some of us *have* taken JMF seriously, at least at first. His repeated overblown claims of how Python is destroying Unicode ...

Sorry, I never claimed this; I'm just seeing how Python is becoming less Unicode friendly.

> This feature is a *memory optimization*, not a speed optimization,

I totally agree, and utf-8 is doing that with great art (see Neil Hodgson's comment). (Do not interpret this as if I'm saying "Python should use utf-8", as I have read.)

jmf
Re: Performance of int/long in Python 3
On 2 April, 18:57, rusi rustompm...@gmail.com wrote:
> On Apr 2, 8:17 pm, Ethan Furman et...@stoneleaf.us wrote:
>> Simmons (too many Steves!), I know you're new so don't have all the history with jmf that many of us do, but consider that the original post was about numbers, had nothing to do with characters or unicode *in any way*, and yet jmf still felt the need to bring unicode up.
>
> Just for reference, here is the starting para of Chris' original mail that started this thread.
>
>> The Python 3 merge of int and long has effectively penalized small-number arithmetic by removing an optimization. As we've seen from PEP 393 strings (jmf aside), there can be huge benefits from having a single type with multiple representations internally. Is there value in making the int type have a machine-word optimization in the same way?
>
> ie it mentions numbers, strings, PEP 393 *AND jmf.* So while it is true that jmf has been butting in with trollish behavior into completely unrelated threads with his unicode rants, that cannot be said for this thread.

That's because you did not understand the analogy, int/long - FSR.

One more illustration:

>>> def AddOne(i):
...     if 0 < i <= 100:
...         return i + 10 + 10 + 10 - 10 - 10 - 10 + 1
...     elif 100 < i <= 1000:
...         return i + 100 + 100 + 100 + 100 - 100 - 100 - 100 - 100 + 1
...     else:
...         return i + 1
...

Does it work? Yes. Is it "correct"? That can be discussed.

Now replace i by a char, a representative of each "subset" of the FSR, select a method where this FSR behaves badly, and take a look at what happens.

>>> timeit.repeat("'a' * 1000 + 'z'")
[0.6532032148133153, 0.6407248807756699, 0.6407264561239894]
>>> timeit.repeat("'a' * 1000 + '9'")
[0.6429508479509245, 0.6242782443215589, 0.6240490311410927]

>>> timeit.repeat("'a' * 1000 + '€'")
[1.095694927496563, 1.0696347279235603, 1.0687741939041082]
>>> timeit.repeat("'a' * 1000 + 'ẞ'")
[1.0796421281222877, 1.0348612767961853, 1.035325216876231]
>>> timeit.repeat("'a' * 1000 + '\u2345'")
[1.0855414137412112, 1.0694677410017164, 1.0688096392412945]

>>> timeit.repeat("'œ' * 1000 + '\U00010001'")
[1.237314015362017, 1.2226262553064657, 1.21994619397816]
>>> timeit.repeat("'œ' * 1000 + '\U00010002'")
[1.245773635836997, 1.2303978424029651, 1.2258257877430765]

Where does it come from? Simple: the FSR breaks the simple rules used in all coding schemes (Unicode or not):
1) a unique set of chars
2) the same algorithm for all chars.

And again, that's why utf-8 is working very smoothly.

The "corporates" which understood this very well and wanted to incorporate, let's say, the characters used in the French language had no choice but to create new coding schemes (e.g. mac-roman, cp1252).

In Unicode, the "latin-1" range is a real plague.

After years of experience, I'm still fascinated to see that the "corporates" have solved this issue easily while free software is still relying on latin-1. I have never succeeded in finding an explanation.

Even the TeX folks, when they shifted to the Cork encoding in 199?, were aware of this and consequently provided special package(s).

No offense, this is in my mind why "corporate software" will always be "corporate software" and "hobbyist software" will always stay at the level of "hobbyist software".

A French Windows user, understanding nothing of the coding of characters, assuming he is aware of its existence (!), certainly has no problem.

Fascinating how it is possible to use Python to teach, to illustrate, to explain the coding of characters. No?

jmf
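The remark about latin-1 versus cp1252 can be illustrated with Python's codecs. This is a small sketch; the three characters are my own picks (a letter in latin-1, the French ligature, and the euro sign), and `encodable` is a hypothetical helper.

```python
def encodable(char, encoding):
    """True if `char` can be represented in `encoding`."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

chars = ["\u00e9", "\u0153", "\u20ac"]        # é, œ, €
encodings = ("latin-1", "cp1252", "utf-8")

table = {ch: {enc: encodable(ch, enc) for enc in encodings} for ch in chars}

for ch in chars:
    print("U+{:04X}".format(ord(ch)), table[ch])
```

The ligature 'œ' and the euro sign are missing from latin-1 but present in cp1252, which is the practical reason a French text cannot be composed completely in latin-1.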
Re: Performance of int/long in Python 3
I'm not whining and I'm not complaining (and never did). I always presented facts.

I'm not especially interested in Python; I'm interested in Unicode.

Usually when I post examples, they are confirmed.

What I see is this (standard downloadable Pythons on Windows 7, and other Windows versions/platforms/machines):

Py32
>>> import timeit
>>> timeit.repeat("'a' * 1000 + 'ẞ'")
[0.7005365263669056, 0.6810694766790423, 0.6811978680727229]
>>> timeit.repeat("'a' * 1000 + 'z'")
[0.7105829560031083, 0.6904999426964764, 0.6938637184431968]

Py33
>>> import timeit
>>> timeit.repeat("'a' * 1000 + 'ẞ'")
[1.1484035160337613, 1.1233738895227505, 1.1215708962703874]
>>> timeit.repeat("'a' * 1000 + 'z'")
[0.6640958193635527, 0.6469043692851528, 0.645896142397]

I systematically see such behaviour, in 99.9% of my tests. When something is better, it is usually because something else (3.2/3.3) has been modified.

I have my idea of where this is coming from.

Question: when it is claimed that this has been tested, do you mean stringbench.py, as proposed many times by Terry? (Thanks for an answer.)

jmf
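For anyone who wants to repeat this kind of measurement, here is a minimal sketch. It differs from the statements in the post in two deliberate ways, both my own choices: the operands are built in the `setup` string, so that only the concatenation is timed and the compiler cannot pre-compute a constant expression, and `number` is lowered so the run is quick. `time_concat` is a hypothetical helper name.

```python
import timeit

def time_concat(last_char, repeat=3, number=100000):
    """Time `base + last` where base is 1000 ASCII chars (sketch only)."""
    setup = "base = 'a' * 1000; last = {!r}".format(last_char)
    return timeit.repeat("base + last", setup=setup,
                         repeat=repeat, number=number)

results = {
    "ascii": min(time_concat("z")),
    "non-latin-1": min(time_concat("\u1e9e")),   # U+1E9E, capital sharp s
}

for label, seconds in results.items():
    print("{:12s} {:.4f} s".format(label, seconds))
```

Taking the minimum of the repeats is the usual advice in the `timeit` documentation; absolute numbers depend on the machine and the Python build, so no expected values are given here.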
Re: Performance of int/long in Python 3
On 1 April, 21:28, Chris Angelico ros...@gmail.com wrote:
> On Tue, Apr 2, 2013 at 6:15 AM, jmfauth wxjmfa...@gmail.com wrote:
>> Py32
>> >>> import timeit
>> >>> timeit.repeat("'a' * 1000 + 'ẞ'")
>> [0.7005365263669056, 0.6810694766790423, 0.6811978680727229]
>> >>> timeit.repeat("'a' * 1000 + 'z'")
>> [0.7105829560031083, 0.6904999426964764, 0.6938637184431968]
>>
>> Py33
>> >>> import timeit
>> >>> timeit.repeat("'a' * 1000 + 'ẞ'")
>> [1.1484035160337613, 1.1233738895227505, 1.1215708962703874]
>> >>> timeit.repeat("'a' * 1000 + 'z'")
>> [0.6640958193635527, 0.6469043692851528, 0.645896142397]
>
> This is what's called a microbenchmark. Can you show me any instance in production code where an operation like this is done repeatedly, in a time-critical place? It's a contrived example, and it's usually possible to find regressions in any system if you fiddle enough with the example. Do you have, for instance, a web server that can handle 1000 tps on 3.2 and only 600 tps on 3.3, all other things being equal?
>
> ChrisA

Of course this is an example, like many I gave. Examples you may find in apps.

Can you point to and give at least a bunch of examples showing there is no regression, at least to contradict me? The only one I have managed to see (in months) is the one given by Steven, a status quo.

I will happily accept them.

The only thing I read is "this is faster", "it has been tested", ...

jmf
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
Neil Hodgson:

The counter-problem is that a French document that needs to include one mathematical symbol (or emoji) outside Latin-1 will double in size as a Python string.

Serious developers/typographers/users know that you cannot compose a text in French with latin-1. This is now also the case with German (Germany).

---

Neil's comment is correct:

>>> sys.getsizeof('a' * 1000 + 'z')
1026
>>> sys.getsizeof('a' * 1000 + '€')
2040

This is not really the problem. Serious users may notice, sooner or later, that Python and Unicode are walking in opposite directions (technically and in spirit).

>>> timeit.repeat("'a' * 1000 + 'ẞ'")
[1.1088995672090292, 1.0842266613261913, 1.1010779011941594]
>>> timeit.repeat("'a' * 1000 + 'z'")
[0.6362570846925735, 0.6159128762502917, 0.6200501673623791]

(Just an opinion)

jmf
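The doubling Neil describes can be checked without relying on exact `sys.getsizeof` values, which differ between Python versions and between 32-bit and 64-bit builds. The sketch below is an illustration only and assumes CPython 3.3 or later; the helper name `bytes_per_char` is hypothetical. It compares a string with its doubled copy, so the fixed object header cancels out and only the per-character storage remains.

```python
import sys

def bytes_per_char(text):
    """Estimate CPython's per-character storage for `text` (PEP 393 strings).

    text + text has the same header as text, so the size difference
    divided by len(text) is the width of one stored character.
    """
    return (sys.getsizeof(text + text) - sys.getsizeof(text)) // len(text)

ascii_text = 'a' * 1000 + 'z'
euro_text = 'a' * 1000 + '\u20ac'        # one euro sign, outside Latin-1
emoji_text = 'a' * 1000 + '\U0001F435'   # one character outside the BMP

for label, text in [('ascii', ascii_text), ('euro', euro_text), ('emoji', emoji_text)]:
    print(label, bytes_per_char(text), sys.getsizeof(text))
```

A single wide character decides the width for the whole string, which is why one euro sign roughly doubles the size of the 1001-character example.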
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 Mar, 07:12, Ethan Furman et...@stoneleaf.us wrote:
On 03/27/2013 08:49 PM, rusi wrote:

In particular "You are a liar" is as bad as "You are an idiot". The same statement can be made non-abusively thus: "... is not true because ..."

I don't agree. With all the posts and micro benchmarks and other drivel that jmf has inflicted on us, I find it /very/ hard to believe that he forgot -- which means he was deliberately lying. At some point we have to stop being gentle / polite / politically correct and call a shovel a shovel... er, spade.

-- ~Ethan~

---

The problem is elsewhere. Nobody understands the examples I gave on this list, because nobody understands Unicode. These examples are not random examples, they are well thought out. If you understood the coding of characters, Unicode and what this flexible representation does, it would not be a problem for you to create analogous examples. So, we are going around in circles.

This flexible representation manages to accumulate, in one shot, all the design mistakes it is possible to make when one wishes to implement Unicode.

Example of a good understanding of Unicode: if you wish 1) to preserve memory, 2) to cover the whole range of Unicode, 3) to keep maximum performance while preserving the good work Unicode.org has done (normalization, sorting), there is only one solution: utf-8. For this you have to understand what a Unicode transformation format really is.

Why are all the actors active in the text field, like Microsoft, Apple, Adobe, the Unicode-compliant TeX engines, the foundries, the organisation in charge of the OpenType font specifications, able to handle all this stuff correctly (understanding + implementation), and Python not? I should say this goes beyond my understanding.

Python has certainly and definitely not revolutionized Unicode.

jmf
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 Mar, 11:30, Chris Angelico ros...@gmail.com wrote:
On Thu, Mar 28, 2013 at 8:03 PM, jmfauth wxjmfa...@gmail.com wrote:

You really REALLY need to sort out in your head the difference between correctness and performance. I still haven't seen one single piece of evidence from you that Python 3.3 fails on any point of Unicode correctness.

That's because you do not understand Unicode. Unicode takes you from the character to the Unicode transformation format via the code point, working with a unique set of characters with a contiguous range of code points. Then it is up to the implementors (languages, compilers, ...) to implement this utf.

Covering the whole range of Unicode has never been a problem ... for all those who follow the scheme explained above. And it magically works smoothly. Of course, there are some variations due to the Character Encoding Form, which is later influenced by the Character Encoding Scheme (the serialization of the Character Encoding Form).

Rough explanation, in other words: it does not matter if you are using utf-8, -16, -32, ucs2 or ucs4. All the single characters are handled in the same way with the same algorithm.

---

The flexible string representation takes the problem from the other side: it attempts to work with the characters by using their representations, and it (can only) fail...

PS: I never proposed to use utf-8. I only spoke about utf-8 as an example. If you start to discuss indexing, you are off-topic.

jmf
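The chain described here (character, then code point, then encoding form, then bytes) can be made concrete with a few lines of Python. This is only an illustrative sketch: the choice of the euro sign and the variable names are assumptions, and the big-endian codec variants are used so that no byte order mark appears in the output.

```python
ch = '\u20ac'             # EURO SIGN
code_point = ord(ch)      # the integer label assigned to the character

# Three Unicode encoding forms, serialized big-endian and without a BOM.
encoded = {name: ch.encode(name) for name in ('utf-8', 'utf-16-be', 'utf-32-be')}

for name, data in encoded.items():
    print(f"U+{code_point:04X}  {name:10}  {data.hex()}")

# Decoding each byte sequence with the matching codec gives the character back.
round_trip = all(data.decode(name) == ch for name, data in encoded.items())
print(round_trip)
```

The same code point yields three different byte sequences, one per encoding form, which is the distinction between code points and encoded code points that the thread keeps returning to.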
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 Mar, 14:01, Steven D'Aprano steve +comp.lang.pyt...@pearwood.info wrote:
On Thu, 28 Mar 2013 23:11:55 +1100, Neil Hodgson wrote:

Ian Foote: One benefit of UTF-8 over Python's flexible representation is that it is, on average, more compact over a wide set of samples.

Sure. And over a different set of samples, it is less compact. If you write a lot of Latin-1, Python will use one byte per character, while UTF-8 will use two bytes per character.

This flexible string representation is so absurd that not only does it not know you cannot write Western European languages with latin-1, it penalizes you by just attempting to optimize latin-1. Shown in my multiple examples. (This is a similar case to the long and short int question/discussion Chris Angelico opened.)

PS1: I received plenty of private mails. I'm surprised at how little the devs understand Unicode.

PS2: A question I once received from a registered French Python developer (in another context): what are those French characters you can handle with cp1252 and not with latin-1?

jmf
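The question in PS2 has a short, checkable answer. The sketch below is an illustration, not something from the original post; the helper name `encodable` and the list `french_extras` are hypothetical. It tests the ligatures œ and Œ, the capital Ÿ and the euro sign against both codecs.

```python
def encodable(ch, encoding):
    """Return True if `ch` can be encoded with `encoding`."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# U+0153 (oe ligature), U+0152 (OE ligature), U+0178 (Y with diaeresis), U+20AC (euro)
french_extras = ['\u0153', '\u0152', '\u0178', '\u20ac']

only_cp1252 = [ch for ch in french_extras
               if encodable(ch, 'cp1252') and not encodable(ch, 'latin-1')]

for ch in only_cp1252:
    print(f"U+{ord(ch):04X} {ch} -> cp1252 byte 0x{ch.encode('cp1252').hex()}")
```

All four characters are encodable with cp1252 and rejected by latin-1, and since their code points are above U+00FF, any Python 3.3 string containing one of them is stored with at least two bytes per character.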
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 Mar, 15:38, Chris Angelico ros...@gmail.com wrote:
On Fri, Mar 29, 2013 at 1:12 AM, jmfauth wxjmfa...@gmail.com wrote:

This flexible string representation is so absurd that not only does it not know you cannot write Western European languages with latin-1, it penalizes you by just attempting to optimize latin-1. Shown in my multiple examples.

PEP393 strings have two optimizations, or kinda three:

1a) ASCII-only strings
1b) Latin1-only strings
2) BMP-only strings
3) Everything else

Options 1a and 1b are almost identical - I'm not sure what the detail is, but there's something flagging those strings that fit inside seven bits. (Something to do with optimizing encodings later?) Both are optimized down to a single byte per character. Option 2 is optimized to two bytes per character. Option 3 is stored in UTF-32.

Once again, jmf, you are forgetting that option 2 is a safe and bug-free optimization.

ChrisA

As long as you are attempting to divide a set of characters into chunks and to handle them separately, it will never work. Read my previous post about the Unicode transformation format. I know what PEP 393 does.

jmf
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 Mar, 16:14, jmfauth wxjmfa...@gmail.com wrote:

[snip quoted exchange with Chris Angelico about PEP 393, identical to the previous message]

Addendum.

This is what you correctly perceived in another thread. You described it as a switch. Now you have to understand where this switch comes from.

jmf
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
Chris,

Your problem with int/long, the start of this thread, is very interesting.

This is not a demonstration or a proof, rather an illustration.

Assume you have a set of integers {0...9} and an operator, let us say, addition.

Idea: just divide this set into two chunks, {0...4} and {5...9}, and work hard to optimize the addition of two operands in the set {0...4}.

The problems:
- When optimizing {0...4}, your algorithm will most probably weaken {5...9}.
- When using {5...9}, you do not benefit from your algorithm; you are penalized just by the fact that you have optimized {0...4}.
- And, the first mistake: you are penalized and impacted by the fact that you have to determine which subset your operands are in when working with {0...9}.

Very interestingly, working with the representation (bytes) of these integers will not help. You have to consider {0...9} conceptually as numbers.

Now, replace numbers by characters, bytes by encoded code points, and you have, qualitatively, the flexible string representation.

In Unicode, there is one more level of abstraction: conceptually, one works neither with characters nor with encoded code points, but with Unicode transformation format entities (see my previous post).

That means you can work very hard at the byte level, you will never solve the problem, which is one level higher in the Unicode hierarchy:

character - code point - utf - bytes (implementation)

with the important fact that this construct can only go from left to right.

---

In fact, by proposing a flexible representation of ints, you may just fall into the same trap the flexible string representation presents.

All this stuff is explained in good books about the coding of characters and/or Unicode. The unicode.org documentation explains it too. It is a little bit harder to discover, because the doc always presents this stuff from a technical perspective. You get it when reading a large part of the Unicode doc.

jmf
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 Mar, 17:33, Ian Kelly ian.g.ke...@gmail.com wrote:
On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wxjmfa...@gmail.com wrote:

The flexible string representation takes the problem from the other side, it attempts to work with the characters by using their representations and it (can only) fails...

This is false. As I've pointed out to you before, the FSR does not divide characters up by representation. It divides them up by codepoint -- more specifically, by the *bit-width* of the codepoint. We call the internal format of the string ASCII or Latin-1 or UCS-2 for conciseness and a point of reference, but fundamentally all of the FSR formats are simply byte arrays of *codepoints* -- you know, those things you keep harping on. The major optimization performed by the FSR is to consistently truncate the leading zero bytes from each codepoint when it is possible to do so safely. But regardless of to what extent this truncation is applied, the string is *always* internally just an array of codepoints, and the same algorithms apply for all representations.

You know, we can discuss this ad nauseam. What is important is Unicode. You have transformed Python back into an ASCII-oriented product.

If Python had implemented Unicode correctly, there would be no difference in using an a, é, € or any character, what the narrow builds did.

If I am practically the only one who speaks about and discusses this, I can assure you, this has been noticed.

Now, it's time to prepare the asparagus, the jambon cru and a good bottle of dry white wine.

jmf
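Ian's description (an array of code points with the leading zero bytes truncated) can be sketched in a few lines. This is a toy model written for illustration, not CPython's actual C implementation; the function names `storage_width` and `pack_code_points` are hypothetical, and big-endian order is used only to make the output easy to read, whereas CPython stores code points in the platform's native byte order.

```python
def storage_width(text):
    """Bytes per code point that a PEP 393 style layout would pick for `text`."""
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return 1      # ASCII / Latin-1 range
    if highest <= 0xFFFF:
        return 2      # rest of the Basic Multilingual Plane
    return 4          # supplementary planes

def pack_code_points(text):
    """Store every code point of `text` using the same fixed width."""
    width = storage_width(text)
    return b''.join(ord(ch).to_bytes(width, 'big') for ch in text)

for sample in ['abc', 'caf\u00e9', 'a\u20ac', 'a\U0001F435']:
    print(repr(sample), storage_width(sample), pack_code_points(sample).hex())
```

In every case element `i` of the packed array is the code point of `sample[i]`, so indexing stays a constant-time operation; the cost jmf measures comes from widening a whole string when one character exceeds the current width.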
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 Mar, 18:55, Chris Angelico ros...@gmail.com wrote:
On Fri, Mar 29, 2013 at 4:48 AM, jmfauth wxjmfa...@gmail.com wrote:

If Python had implemented Unicode correctly, there would be no difference in using an a, é, € or any character, what the narrow builds did.

I'm not following your grammar perfectly here, but if Python were implementing Unicode correctly, there would be no difference between any of those characters, which is the way a *wide* build works. With a narrow build, there is a difference between BMP and non-BMP characters.

ChrisA

The wide build (which I never used) is, in my mind, as correct as the narrow build. It just covers a different range of Unicode (the whole range).

Claiming that the narrow build is buggy because it does not cover the whole of Unicode is not correct.

Unicode does not stipulate that one has to cover the whole range. Unicode expects that every character in a range behaves the same way. This is clearly not achieved with the flexible string representation. A user should not be penalized simply because he is not an ASCII user.

If you take fonts into consideration (btw, a problem nobody is speaking about) and you ensure your application, toolkit, ... is MES-X or WGL4 compliant, you are also deliberately (and correctly) working with a restricted Unicode range.

jmf
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 Mar, 21:29, Benjamin Kaplan benjamin.kap...@case.edu wrote:
On Thu, Mar 28, 2013 at 10:48 AM, jmfauth wxjmfa...@gmail.com wrote:
On 28 Mar, 17:33, Ian Kelly ian.g.ke...@gmail.com wrote:
On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wxjmfa...@gmail.com wrote:

The flexible string representation takes the problem from the other side, it attempts to work with the characters by using their representations and it (can only) fails...

This is false. As I've pointed out to you before, the FSR does not divide characters up by representation. It divides them up by codepoint -- more specifically, by the *bit-width* of the codepoint. We call the internal format of the string ASCII or Latin-1 or UCS-2 for conciseness and a point of reference, but fundamentally all of the FSR formats are simply byte arrays of *codepoints* -- you know, those things you keep harping on. The major optimization performed by the FSR is to consistently truncate the leading zero bytes from each codepoint when it is possible to do so safely. But regardless of to what extent this truncation is applied, the string is *always* internally just an array of codepoints, and the same algorithms apply for all representations.

You know, we can discuss this ad nauseam. What is important is Unicode. You have transformed Python back into an ASCII-oriented product.

If Python had implemented Unicode correctly, there would be no difference in using an a, é, € or any character, what the narrow builds did.

If I am practically the only one who speaks about and discusses this, I can assure you, this has been noticed.

Now, it's time to prepare the asparagus, the jambon cru and a good bottle of dry white wine.

jmf

You still have yet to explain how Python's string representation is wrong. Just how it isn't optimal for one specific case. Here's how I understand it:

1) Strings are sequences of stuff. Generally, we talk about strings as either sequences of bytes or sequences of characters.

2) Unicode is a format used to represent characters. Therefore, Unicode strings are character strings, not byte strings.

3) Encodings are functions that map characters to bytes. They typically also define an inverse function that converts from bytes back to characters.

4) UTF-8 IS NOT UNICODE. It is an encoding, one of those functions I mentioned in the previous point. It happens to be one of the five standard encodings that are defined for all characters in the Unicode standard (the others being the little and big endian variants of UTF-16 and UTF-32).

5) The internal representation of a character string DOES NOT MATTER. All that matters is that the API represents it as a string of characters, regardless of the representation. We could implement character strings by putting the Unicode code points in binary-coded decimal and it would be a Unicode character string.

6) The String type that .NET and Java (and the unicode type in Python narrow builds) use is not a character string. It is a string of shorts, each of which corresponds to a UTF-16 code unit. I know this is the case because in all of these, the length of \U0001F435 is 2 even though it only consists of one character.

7) The new string representation in Python 3.3 can successfully represent all characters in the Unicode standard. The actual number of bytes that each character consumes is invisible to the user.

--

I have shown enough examples. As soon as you are using non-latin-1 chars, your optimization becomes irrelevant and, not only that, you are penalized.

I'm sorry, saying that Python now covers the whole Unicode range is not a valid excuse. I prefer a correct version with a narrower range of chars, especially if this range represents the chars in daily use.

I can go a step further: if I wish to write an application for Western European users, I'm better served if I'm using a coding scheme covering all these languages/scripts. What about cp1252 [*]? Does this not ring a bell?

Python can do better, it only succeeds in doing worse!

[*] yes, I know, internally

jmf
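The points about encodings and about the length of a non-BMP character can be checked directly in Python 3.3 or later. The sketch below is illustrative only, and the variable names are assumptions. It counts the same character three ways: as code points, as UTF-16 code units (what a Java or .NET string, or a narrow Python build, would report) and as UTF-8 bytes.

```python
monkey = '\U0001F435'     # MONKEY FACE, a character outside the BMP

code_points = len(monkey)                            # Python 3.3+ counts code points
utf16_units = len(monkey.encode('utf-16-le')) // 2   # 16-bit code units (surrogate pair)
utf8_bytes = len(monkey.encode('utf-8'))             # bytes in the UTF-8 encoding

print(code_points, utf16_units, utf8_bytes)

# An encoding maps characters to bytes; decoding is the inverse function.
restored = monkey.encode('utf-8').decode('utf-8')
print(restored == monkey)
```

The count of two comes from the UTF-16 surrogate pair, which is why APIs built on 16-bit units report a length of 2 for a single character.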
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 Mar, 22:11, jmfauth wxjmfa...@gmail.com wrote:

[snip quoted exchange with Benjamin Kaplan and Ian Kelly, identical to the previous message]

Addendum.

And you know what? Py34 will suffer from the same disease. You are spending your time improving chunks of bytes, when the problem is elsewhere. In fact, you are working for peanuts, e.g. the replace method.

If you are not satisfied with my examples, just pick up the examples of GvR (ascii-string) on the bug tracker, timeit them, and you will see there is already a problem.

Better, timeit them after having replaced his ascii-strings with non-ASCII characters...

jmf
Re: Performance of int/long in Python 3
On 26 Mar, 22:08, Grant Edwards inva...@invalid.invalid wrote:

I think we all agree that jmf is a character.

--

Characters are also intrinsic characteristics of a group in group theory. If you are not a mathematician but, e.g., a scientist in need of these characters, they are available in precalculated tables, which one simply calls ... tables of characters! (My booklet of the tables is titled "Tables for Group Theory".)

Example in chemistry, mainly quantum chemistry: Group Theory and its Application to Chemistry
http://chemwiki.ucdavis.edu/Physical_Chemistry/Symmetry/Group_Theory%3A_Application
(Link copied from Firefox.)

jmf
Re: Performance of int/long in Python 3
On 25 Mar, 22:51, Chris Angelico ros...@gmail.com wrote:

The Python 3 merge of int and long has effectively penalized small-number arithmetic by removing an optimization. As we've seen from PEP 393 strings (jmf aside), there can be huge benefits from having a single type with multiple representations internally ...

--

A character is not an integer (short form).

jmf
Re: Performance of int/long in Python 3
On 26 Mar, 20:03, Chris Angelico ros...@gmail.com wrote:
On Wed, Mar 27, 2013 at 5:50 AM, jmfauth wxjmfa...@gmail.com wrote:
On 25 Mar, 22:51, Chris Angelico ros...@gmail.com wrote:

The Python 3 merge of int and long has effectively penalized small-number arithmetic by removing an optimization. As we've seen from PEP 393 strings (jmf aside), there can be huge benefits from having a single type with multiple representations internally ...

--

A character is not an integer (short form).

So?

ChrisA

A character is not an integer.

jmf
Re: monty python
On 23 Mar, 17:17, Mark Lawrence breamore...@yahoo.co.uk wrote:
On 23/03/2013 09:24, jmfauth wrote:
On 20 Mar, 22:02, Tim Delaney tim.dela...@aptare.com wrote:
On 21 March 2013 06:40, jmfauth wxjmfa...@gmail.com wrote:

[snip usual rant from jmf]

It has been acknowledged as a real regression, but he keeps hijacking every thread where strings are mentioned to harp on about it. He has shown no inclination to attempt to *fix* the regression and is rapidly coming to be regarded as a troll by most participants in this list.

I cannot help to fix it, because it is unfixable. It is unfixable because this flexible string representation is wrong by design.

jmf

Of course it's fixable. All you need do is write a PEP clearly stating what is wrong with the implementation detailed in PEP393 and your own proposed design. I'm looking forward to reading this PEP. Note that going backwards to the buggier unicode implementations that existed in Python prior to version 3.3 is simply not an option.

-- Cheers. Mark Lawrence

--

The problem here is that this PEP 393 should not have been created. The first time I read it, I quickly understood that it cannot work!

This is illustrated by all the examples I give on this list. In all the cases, I can explain why. I have never seen anybody able to argue that these examples are wrong and/or explain why they are wrong, other than by arguing that the flexible string representation exists!

jmf
Re: monty python
On 20 Mar, 22:02, Tim Delaney tim.dela...@aptare.com wrote:
On 21 March 2013 06:40, jmfauth wxjmfa...@gmail.com wrote:

[snip usual rant from jmf]

It has been acknowledged as a real regression, but he keeps hijacking every thread where strings are mentioned to harp on about it. He has shown no inclination to attempt to *fix* the regression and is rapidly coming to be regarded as a troll by most participants in this list.

I cannot help to fix it, because it is unfixable. It is unfixable because this flexible string representation is wrong by design.

jmf
Re: monty python
On 21 Mar, 04:12, rusi rustompm...@gmail.com wrote:
On Mar 21, 12:40 am, jmfauth wxjmfa...@gmail.com wrote:

Courageous people can try to do something with the unicode collation algorithm (see unicode.org). Some time ago, for the fun, I wrote something (not perfect) with a reduced keys table (see unicode.org), only a keys subset for some scripts hold in memory. It works with Py32 and Py33. In an attempt to just see the performance and how it can react, I did an horrible mistake, I forgot Py33 is now optimized for ascii user, it is no more unicode compliant and I stupidely tested/sorted lists of French words...

Now lets take this piece by piece…

"I did an horrible mistake": I am sorry. Did you get bruised? Break some bones? And is 'h' a vowel in french?

"I forgot Py33 is now optimized for ascii user": Ok.

"it is no more unicode compliant": I asked earlier and I ask again -- What do you mean by (non)compliant?

--

One aspect of Unicode (note the capitalized U).

py32
>>> timeit.repeat("'abc需'.find('a')")
[0.27941279564856814, 0.26568106110789813, 0.265546366757917]
>>> timeit.repeat("'abcdef'.find('a')")
[0.2891812867801491, 0.26698153112010914, 0.26738994644529157]

py33
>>> timeit.repeat("'abc需'.find('a')")
[0.5941777382531654, 0.5829193385634426, 0.5519412133990045]
>>> timeit.repeat("'abcdef'.find('a')")
[0.44333188136533863, 0.4232506078969891, 0.4225164843046514]

---

In French, depending on the word, a leading h behaves as a vowel or as a consonant. (Hence this typical mistake.)

jmf
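To give an idea of why sorting French words needs more than code point order, here is a deliberately crude sketch. It is not the Unicode Collation Algorithm and not the reduced-keys-table program mentioned in the quoted post; the function name `crude_key` and the word list are assumptions made for illustration. The key strips combining accents after NFD normalization and uses the original word only as a tie-breaker.

```python
import unicodedata

def crude_key(word):
    """Sort key: accent-stripped, case-folded base letters, then the word itself.

    A rough stand-in for collation; it ignores the real UCA weights and the
    language-specific accent ordering rules.
    """
    decomposed = unicodedata.normalize('NFD', word)
    base = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), word)

words = ['été', 'zèbre', 'eau', 'étage', 'cote', 'côte']

naive_sorted = sorted(words)                   # plain code point order
french_sorted = sorted(words, key=crude_key)   # accents ignored at the first level

print(naive_sorted)
print(french_sorted)
```

With precomposed characters, plain code point order places words starting with é after those starting with z, while the key-based sort keeps them among the other e words.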
Re: Help. HOW TO guide for PyQt installation
On 20 Mar, 11:38, Phil Thompson p...@riverbankcomputing.com wrote:
On Wed, 20 Mar 2013 03:29:35 -0700 (PDT), jmfauth wxjmfa...@gmail.com wrote:
On 20 Mar, 10:30, Phil Thompson p...@riverbankcomputing.com wrote:
On Wed, 20 Mar 2013 02:09:06 -0700 (PDT), jmfauth wxjmfa...@gmail.com wrote:
On 20 Mar, 01:12, D. Xenakis gouzouna...@hotmail.com wrote:

Hi there,

I'm searching for an installation guide for the PyQt toolkit. To be honest, I'm very confused about what steps I should follow for a complete and clean installation. Should I install the 32-bit or the 64-bit Windows version? Or maybe both? Any chance one of them is more or less buggy or crash-prone than the other? I know both are available on the website, but just asking. If I install this package on Windows 8, should I expect any problems? From what I read, PyQt supports only XP and Win7. I was thinking about installing the newer version of PyQt along with Qt5. I have zero experience with PyQt, so either way everything is going to be new to me, and I don't care that much about the learning-curve difference between the new and old PyQt/Qt versions. I did not find any installer, so I guess I have to do everything by hand. Any guide for this, please?

I'd also like to ask: can a commercial licence of PyQt only be bought on Riverbank's website? I think I noticed a cheaper reseller somewhere, or maybe I didn't know what the hell I was reading :). Maybe it was something about Qt and not PyQt.

Please help this noob.
Regards

Short answer without explanation. It does not work.

jmf

Well it works for me. Care to elaborate?

Phil

No problem. Yesterday, I downloaded PyQt4-4.10-gpl-Py3.3-Qt5.0.1-x32-2.exe and installed it on my Windows 7 Pro box after having removed a previous version. No problem with the installation. I quickly tested it with one of my interactive Python interpreters and got an error on "from PyQt4 import QtGui, QtCore" saying that the DLL cannot be found. Something similar to what Detlev Offenbach reported on the PyQt mailing list, although I'm not using Qsci. Strangely, I had no problem (if I recall correctly) with a very basic application (QMainWindow + QLineEdit). I had no problem with the demo (I only launched it). I did not spend too much time investigating further. It's the first time I have seen such an error; usually, no problem.

The only time that I've seen a problem like that is when running from a shell that was started before running the PyQt installer (i.e. one with an out of date PATH).

Phil

--

The PATH could be the cause. I stupidly forgot to check it before removing PyQt...

I repeated the experiment (app == eta26.py), with and without PyQt in the system PATH. (Btw, why is it necessary?)

D:\jm\jmpy\eta\eta26>c:\python32\python eta26.py
PyQt: 4.8.6, Qt: 4.7.4
Python 3.2.3

No problem.

D:\jm\jmpy\eta\eta26>c:\python33\python eta26.py
Traceback (most recent call last):
  File "eta26.py", line 32, in <module>
    from PyQt4 import QtGui, QtCore
ImportError: DLL load failed: Le module spécifié est introuvable.

(Translation: The specified module cannot be found.)

D:\jm\jmpy\eta\eta26>c:\python33\python eta26.py
PyQt: 4.10, Qt: 4.8.4
Python 3.3.0

No problem.

No idea. It is a mystery to me.

eta26 only imports QtGui and QtCore. It does, however, use a sophisticated widget like QPlainTextEdit.

jmf
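Since the suspicion here is an out of date PATH, one quick check is to ask which directories on the current PATH actually contain the Qt DLLs. The sketch below is a generic helper written for illustration; the function name `find_on_path` is hypothetical, and the file name `QtCore4.dll` is an assumption about what a Qt 4 installation on Windows provides, not something stated in the thread.

```python
import os

def find_on_path(filename, path=None):
    """Return the directories listed in PATH (or `path`) that contain `filename`."""
    if path is None:
        path = os.environ.get('PATH', '')
    hits = []
    for directory in path.split(os.pathsep):
        # Skip empty entries produced by leading, trailing or doubled separators.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits

# Assumed DLL name; adjust it to whatever the installed Qt version ships.
print(find_on_path('QtCore4.dll'))
```

Running it in the same shell that shows the `ImportError` and getting an empty list would be consistent with Phil's explanation, since a shell opened before the installer ran still carries the old PATH.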
Re: Help. HOW TO guide for PyQt installation
On 20 Mar, 11:29, jmfauth wxjmfa...@gmail.com wrote:
On 20 Mar, 10:30, Phil Thompson p...@riverbankcomputing.com wrote:

Strangely, I had no problem (if I recall correctly) with a very basic application (QMainWindow + QLineEdit).

ADDENDUM, CORRECTION

It fails too. I forgot to rename PySide -> PyQt4!

I tried to collect other experiences via Google. No luck.

jmf
Re: Help. HOW TO guide for PyQt installation
On 20 Mar, 01:12, D. Xenakis gouzouna...@hotmail.com wrote:

Hi there,

I'm searching for an installation guide for the PyQt toolkit. To be honest, I'm very confused about what steps I should follow for a complete and clean installation. Should I install the 32-bit or the 64-bit Windows version? Or maybe both? Any chance one of them is more or less buggy or crash-prone than the other? I know both are available on the website, but just asking. If I install this package on Windows 8, should I expect any problems? From what I read, PyQt supports only XP and Win7. I was thinking about installing the newer version of PyQt along with Qt5. I have zero experience with PyQt, so either way everything is going to be new to me, and I don't care that much about the learning-curve difference between the new and old PyQt/Qt versions. I did not find any installer, so I guess I have to do everything by hand. Any guide for this, please?

I'd also like to ask: can a commercial licence of PyQt only be bought on Riverbank's website? I think I noticed a cheaper reseller somewhere, or maybe I didn't know what the hell I was reading :). Maybe it was something about Qt and not PyQt.

Please help this noob.
Regards

Short answer without explanation. It does not work.

jmf
Re: Help. HOW TO guide for PyQt installation
On 20 Mar, 10:30, Phil Thompson p...@riverbankcomputing.com wrote: On Wed, 20 Mar 2013 02:09:06 -0700 (PDT), jmfauth wxjmfa...@gmail.com wrote: On 20 Mar, 01:12, D. Xenakis gouzouna...@hotmail.com wrote:

Hi there, I'm searching for an installation guide for the PyQt toolkit. To be honest, I'm very confused about what steps I should follow for a complete and clean installation. Should I install the 32-bit or the 64-bit Windows version? Or maybe both? Any chance one of them is more or less buggy or crash-prone than the other? I know both are available on the website, but just asking. If I install this package on Windows 8, should I expect any problems? From what I read, PyQt supports only XP and Win7. I was thinking about installing the newer version of PyQt along with Qt 5. I have zero experience with PyQt, so either way everything is going to be new to me, and I don't care that much about the learning curve difference between the new and old PyQt/Qt versions. I did not find any installer, so I guess I have to do everything manually. Any guide for this, please?

I'd also like to ask: can a commercial licence of PyQt only be bought on Riverbank's website? I think I noticed another, cheaper reseller somewhere, or maybe I didn't know what the hell I was reading :). Maybe it was something about Qt and not PyQt.

Please help this noob,
Regards

Short answer without explanation: it does not work.

jmf

Well it works for me. Care to elaborate?

Phil

No problem. Yesterday, I downloaded PyQt4-4.10-gpl-Py3.3-Qt5.0.1-x32-2.exe and installed it on my Windows 7 Pro box after having removed a previous version. No problem with the installation. I quickly tested it with one of my interactive Python interpreters and got an error on

from PyQt4 import QtGui, QtCore

saying that the DLL cannot be found. Something similar to what Detlev Offenbach reported on the PyQt mailing list, although I'm not using Qsci. Strangely, I had no problem (if I recall correctly) with a very basic application (QMainWindow + QLineEdit). I had no problem with the demo (I only launched it). I did not spend too much time investigating further. It's the first time I have seen such an error; usually, no problem.

jmf
Re: monty python
Courageous people can try to do something with the Unicode collation algorithm (see unicode.org). Some time ago, for fun, I wrote something (not perfect) with a reduced keys table (see unicode.org), with only a subset of the keys for some scripts held in memory. It works with Py32 and Py33. In an attempt to just see the performance and how it reacts, I made a horrible mistake: I forgot that Py33 is now optimized for ascii users and is no longer unicode compliant, and I stupidly tested/sorted lists of French words...

jmf
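For readers who want to experiment with sorting French words, here is a minimal sketch. It is not the Unicode collation algorithm and not the reduced-keys-table code mentioned above; it is a much cruder stand-in that only expands a few ligatures, strips combining accents with the standard `unicodedata` module and case-folds. The function name `rough_french_key` is hypothetical.

```python
import unicodedata

def rough_french_key(word):
    """Crude sort key (NOT the UCA): expand ligatures, drop accents, ignore case."""
    # Expand the ligatures that French dictionaries sort as two letters.
    for ligature, expansion in (('œ', 'oe'), ('Œ', 'OE'), ('æ', 'ae'), ('Æ', 'AE')):
        word = word.replace(ligature, expansion)
    # Canonical decomposition separates base letters from combining accents.
    decomposed = unicodedata.normalize('NFD', word)
    stripped = ''.join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

words = ['oeuf', 'œuf', 'od', 'of']
print(sorted(words, key=rough_french_key))
```

Because `sorted` is stable, words that produce the same key keep their input order. A real collation would break such ties on accents and ligatures, which this sketch does not attempt.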
Re: String performance regression from python 3.2 to 3.3
-- utf-32 is already here. You are all most probably [*] using it without noticing it. How? By using OpenType fonts, not to mention the text processing applications using them. Why? Because there is no other way to do it.

[*] Depending on the font, the internal table(s), e.g. the cmap table, are in utf-16 or utf-32.

jmf
A reply for rusi (FSR)
As a reply to rusi's comment: http://groups.google.com/group/comp.lang.python/browse_thread/thread/a7689b158fdca29e#

From string creation to itertools usage. A medley. Some timings.

Important: the real/absolute values of these experiments are not important. I do not care and I'm not complaining at all. These values are expected; I expected such values and they only confirm (*FOR ME*) my understanding of the coding of characters (and Unicode).

#~            py323                py330
#~ test  1:   0.015357737412819    0.019290216142579
#~ test  2:   0.015698801667198    0.020386269052436
#~ test  3:   0.015613338684288    0.018769561472500
#~ test  4:   0.023235297708529    0.032253414679390
#~ test  5:   0.023327062109534    0.029621391108935
#~ test  6:   1.119958127076760    1.095467665651482
#~ test  7:   0.420158472788311    0.565518010043673
#~ test  8:   0.649444234615974    1.061556978013171
#~ test  9:   0.712335144072079    1.211614222458175
#~ test 10:   0.704622996001357    1.160909074081441
#~ test 11:   0.614674584923621    1.053985430333688
#~ test 12:   0.660336235792764    1.059443246081010
#~ test 13:   4.821435927771570    5.795325214218677
#~ test 14:   0.494012668213403    0.729330462512273
#~ test 15:   0.504894429585788    0.879966255906103
#~ test 16:   0.693093370081103    1.132884304782264
#~ test 17:   0.749076743789461    3.013804437852462
#~ test 18:   7.467055989281286   13.387841650089342
#~ test 19:   7.581776062566778   13.593412812594643
#~ test 20:   9.477877493343140   15.235388291413805
#~ test 21:   0.022614608026196    0.020984116094176
#~ test 22:   6.685022041178975   12.687538276191944
#~ test 23:   6.946794763994170   12.986701250949636
#~ test 24:   0.097796827314760    0.156285014715777
#~ test 25:   0.024915807146677    0.034190706904894
#~ test 26:   0.024996544066013    0.032191582014335
#~ test 27:   0.000693943667684    0.001315421027272
#~ test 28:   0.000679765476967    0.001305968900141
#~ test 29:   0.001614344548152    0.025543979763000
#~ test 30:   0.000204008410812    0.000286714523313
#~ test 31:   0.000213460537964    0.000301286552656
#~ test 32:   0.000204008410819    0.000291440586878
#~ test 33:   0.249692904327539    0.497374474766957
#~ test 34:   0.248750448483740    0.513947598194790
#~ test 35:   0.099810130396032    0.249129715085319

jmf
Re: Regular expression problem
On 11 Mar, 03:06, Terry Reedy tjre...@udel.edu wrote: ... By teaching 'speed before correctness', this site promotes bad programming habits and thinking (and the use of low-level but faster languages). ...

This is exactly what your flexible string representation does! And technical aspects aside, you even somehow managed to lose unicode compliance.

jmf
Re: Controlling number of zeros of exponent in scientific notation
On 6 Mar, 15:03, Roy Smith r...@panix.com wrote: In article c2184b42-41be-4930-9501-361296df7...@googlegroups.com, fa...@squashclub.org wrote: Instead of: 1.8e-04 I need: 1.8e-004 So two zeros before the 4, instead of the default 1.

Just out of curiosity, what's the use case here?

--

>>> from vecmat6 import *
>>> from svdecomp6 import *
>>> from vmio6 import *
>>> mm = NewMat(3, 2)
>>> mm[0][0] = 1.0; mm[0][1] = 2.0e-178
>>> mm[1][0] = 3.0; mm[1][1] = 4.0e-1428
>>> mm[2][0] = 5.0; mm[2][1] = 6.0
>>> pr(mm, 'mm =')
mm =
( 1.0e+000 2.0e-178 )
( 3.0e+000 0.0e+000 )
( 5.0e+000 6.0e+000 )
>>> aa, vv, bbt = SVDecompFull(mm)
>>> pr(aa, 'aa =')
aa =
( 3.04128e-001 -8.66366e-002 )
( 9.12385e-001 -2.59910e-001 )
( -2.73969e-001 -9.61739e-001 )
>>> pr(bbt, 'bbt =')
bbt =
( 7.12974e-001 -7.01190e-001 )
( -7.01190e-001 -7.12974e-001 )
>>> rr = MatMulMatMulMat(aa, vv, bbt)
>>> pr(rr, 'rr =')
rr =
( 1.0e+000 -1.38778e-015 )
( 3.0e+000 -4.44089e-016 )
( 5.0e+000 6.0e+000 )

jmf
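For the original question (three exponent digits instead of two), one possible approach is to format normally and then pad the exponent. This is only a sketch; the helper name `sci3` and its `precision` parameter are assumptions made for illustration, and it handles finite values only.

```python
def sci3(x, precision=1):
    """Format a finite float in scientific notation with at least 3 exponent digits."""
    # Python's 'e' format always yields '<mantissa>e<sign><digits>' for finite x.
    mantissa, exponent = '{:.{p}e}'.format(x, p=precision).split('e')
    sign, digits = exponent[0], exponent[1:]
    # Pad the exponent digits with leading zeros, e.g. '04' -> '004'.
    return '{}e{}{}'.format(mantissa, sign, digits.zfill(3))

print(sci3(1.8e-04))
print(sci3(2.0e-178))
print(sci3(3.04128e-01, precision=5))
```

Exponents that already have three digits are left unchanged, so columns of mixed magnitudes line up.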
Re: Nuitka now supports Python 3.2
Fascinating software. Some are building, some are destroying.

Py33
>>> timeit.repeat("{1:'abc需'}")
[0.2573893570572636, 0.24261832285651508, 0.24259548003601594]

Py323
>>> timeit.repeat("{1:'abc需'}")
[0.11000708521282831, 0.0994753634273593, 0.09901023634051853]

jmf
Re: Nuitka now supports Python 3.2
On 27 fév, 09:21, jmfauth wxjmfa...@gmail.com wrote: Fascinating software. Some are building, some are destroying. Py33 timeit.repeat({1:'abc需'}) [0.2573893570572636, 0.24261832285651508, 0.24259548003601594] Py323 timeit.repeat({1:'abc需'}) [0.11000708521282831, 0.0994753634273593, 0.09901023634051853] jmf Oops. My bad. (This google). You should read abc需 jmf -- http://mail.python.org/mailman/listinfo/python-list
Re: Python Speed
On 27 Feb, 23:24, Terry Reedy tjre...@udel.edu wrote: On 2/27/2013 3:21 AM, jmfauth hijacked yet another thread: Some are building, some are destroying.

We are still waiting for you to help build a better 3.3+, instead of trying to 'destroy' it with mostly irrelevant cherry-picked benchmarks.

Py33 timeit.repeat("{1:'abc需'}") [0.2573893570572636, 0.24261832285651508, 0.24259548003601594]

On my win system, I get a lower time for this: [0.16579443757208878, 0.1475787649924598, 0.14970205670637426]

Py323 timeit.repeat("{1:'abc需'}") [0.11000708521282831, 0.0994753634273593, 0.09901023634051853]

While I get the same time for 3.2.3. [0.11759353304428544, 0.0948244802968, 0.09532802044164157]

It seems that something about Jim's machine does not like 3.3. *nix will probably see even less of a difference. Times are in microseconds, so few programs will ever notice the difference.

In the meanwhile ... Effort was put into reducing startup time for 3.3 by making sure that every module imported during startup actually needed to be imported, and into speeding up imports. The startup process is getting a deeper inspection for 3.4, http://python.org/dev/peps/pep-0432/ 'Simplifying the CPython startup sequence', with some expectation of further speedup. Also, a real-world benchmark project has been established: http://speed.python.org/ Some work has already been done to port benchmarks to 3.x, but I suspect there is more to do and more volunteers are needed.

-- Terry Jan Reedy

-

Terry,

As long as you are attempting to work with a composite scheme that does not work with a unique set of characters, not only will it not work (properly/efficiently), it cannot work. This is not even a unicode problem. This is true for every coding scheme. That's why we have, today, all these coding schemes, coding scheme: == set of characters; != set of encoded characters.

jmf
Re: Correct handling of case in unicode and regexps
On 23 Feb, 15:26, Devin Jeanpierre jeanpierr...@gmail.com wrote:

Hi folks, I'm pretty unsure of myself when it comes to unicode. As I understand it, you're generally supposed to compare things in a case insensitive manner by case folding, right? So instead of a.lower() == b.lower() (the ASCII way), you do a.casefold() == b.casefold(). However, I'm struggling to figure out how regular expressions should treat case. Python's re module doesn't work properly to my understanding, because:

>>> a = 'ss'
>>> b = 'ß'
>>> a.casefold() == b.casefold()
True
>>> re.match(re.escape(a), b, re.UNICODE | re.IGNORECASE) # oh dear!

In addition, it seems improbable that this ever _could_ work. Because if it did work like that, then what would the value be of re.match('s', 'ß', re.UNICODE | re.IGNORECASE).end() ? 0.5? I'd really like to hear the thoughts of people more experienced with unicode. What is the ideal correct behavior here? Or do I misunderstand things?

-

I'm just wondering if there is a real issue here. After all, this is only a question of conventions. Unicode has some conventions, and re modules may (have to) use some conventions too. It seems to me that the safest way is to preprocess the text that has to be examined.

Proposed case study: how should ss/ß/SS/ẞ be interpreted?

'Richard-Strauss-Straße'
'Richard-Strauss-Strasse'
'RICHARD-STRAUSS-STRASSE'
'RICHARD-STRAUSS-STRAẞE'

There is more or less the same situation with sorting. Unicode cannot do everything and it may be mandatory to preprocess the input. E.g. this function, which I once wrote for fun. It sorts French words (without unicodedata and locale).

>>> import libfrancais
>>> z = ['oeuf', 'œuf', 'od', 'of']
>>> zo = libfrancais.sortedfr(z)
>>> zo
['od', 'oeuf', 'œuf', 'of']

jmf
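The "preprocess the text first" suggestion can be sketched as follows. This is one possible reading of it, not code from the thread: both the pattern and the text are case-folded before being handed to `re`, so match offsets refer to the folded text, not the original. The helper name `folded_search` is hypothetical.

```python
import re

a, b = 'ss', 'ß'
print(a.casefold() == b.casefold())               # full case folding: equal
print(re.match(re.escape(a), b, re.IGNORECASE))   # re alone finds no match

def folded_search(needle, haystack):
    """Search after case-folding both sides.

    Offsets in the returned match refer to the folded haystack, which can be
    longer than the original (for example 'ß' becomes 'ss').
    """
    return re.search(re.escape(needle.casefold()), haystack.casefold())

streets = [
    'Richard-Strauss-Straße',
    'Richard-Strauss-Strasse',
    'RICHARD-STRAUSS-STRASSE',
    'RICHARD-STRAUSS-STRAẞE',
]
for street in streets:
    print(street, bool(folded_search('STRASSE', street)))
```

Folding the text sidesteps the "what would .end() be?" question only by changing what the offsets mean; mapping them back to the original string would need extra bookkeeping that this sketch does not do.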
Re: Python Newbie
On 23 Feb, 16:43, Steve Simmons square.st...@gmail.com wrote: On 22/02/2013 22:37, piterrr.dolin...@gmail.com wrote: So far I am getting the impression ...

My main message to you would be: don't approach Python with a negative attitude, give it a chance and I'm sure you'll come to enjoy it.

Until you realize this:

Py32:
>>> timeit.timeit("'abc需'")
0.032749386495456466
>>> sys.getsizeof('abc需')
42

Py33:
>>> timeit.timeit("'abc需'")
0.04104208536801017
>>> sys.getsizeof('abc需')
50

Very easy to explain: wrong, incorrect, naive unicode handling.

jmf
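To look at the memory side on one's own interpreter, a small sketch like the following can help. It assumes CPython 3.3 or later, where (per PEP 393) the per-character storage width depends on the widest code point in the string. The absolute numbers vary by version and platform, so none are shown here.

```python
import sys

# One sample per storage class used by CPython's flexible representation.
samples = {
    'ASCII': 'a' * 100,
    'Latin-1': 'é' * 100,
    'BMP': '需' * 100,
    'astral': '\U0001F600' * 100,
}
sizes = {label: sys.getsizeof(text) for label, text in samples.items()}
for label, size in sizes.items():
    print('{:8} {:5d} bytes for 100 characters'.format(label, size))
```

With short strings such as 'abc需' the fixed per-object overhead dominates, which is why longer strings are used here.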
Re: Python Newbie
On 23 Feb, 20:08, Ethan Furman et...@stoneleaf.us wrote: On 02/23/2013 10:44 AM, jmfauth wrote: [snip various stupidities] jmf

Peter, jmfauth is one of our resident trolls. Feel free to ignore him.

-- ~Ethan~

Sorry, what can I say? More memory and a slowdown! Where you see progress, I'm seeing a regression. Did you test Devanagari canonical decomposition? Probably not. I did. I have probably written more tests than any core developer, and tests doing precisely what this flexible representation does (unlike the tests I saw).

That's the good side of this whole story. It is not every day that one has two implementations of the same product at hand, if one wishes to explain, to teach, or to illustrate unicode or the coding of characters in general.

Unicode is not different from the other coding schemes and it behaves in exactly the same way. The sole and basic difference lies in the set of *characters*, which is broader. Unicode, the Consortium, uses the term Abstract Character Repertoire.

jmf
Re: string.replace doesn't removes :
On 13 Feb, 21:24, 8 Dihedral dihedral88...@googlemail.com wrote: Rick Johnson wrote on Thursday, February 14, 2013 at 12:34:11 AM UTC+8: On Wednesday, February 13, 2013 1:10:14 AM UTC-6, jmfauth wrote:

>>> d = {ord('a'): 'A', ord('b'): '2', ord('c'): 'C'}
>>> 'abcdefgabc'.translate(d)
'A2CdefgA2C'
>>> def jmTranslate(s, table):
...     table = {ord(k):table[k] for k in table}
...     return s.translate(table)
...
>>> d = {'a': 'A', 'b': '2', 'c': 'C'}
>>> jmTranslate('abcdefgabc', d)
'A2CdefgA2C'
>>> d = {'a': None, 'b': None, 'c': None}
>>> jmTranslate('abcdefgabc', d)
'defg'
>>> d = {'a': '€', 'b': '', 'c': ''}
>>> jmTranslate('abcdefgabc', d)
'€defg€'

In python the variables of value types, and the variables of lists and dictionaries are passed to functions somewhat different. This should be noticed by any serious programmer in python.

-

The purpose of my quick and dirty function was to show that it's possible to create a text replacement function which works exclusively with text / strings via a dict. (Even if, in my example, I'm using, and can use, None as an empty string!)

You are right. It is also arguable that being forced to use a number in order to replace a character may not be such a good idea. This should be noticed by any serious language designer.

More seriously, .translate() is a very nice and underestimated method.

jmf
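As a side note to the jmTranslate helper above, the standard library already offers `str.maketrans`, which accepts a dict keyed by single-character strings and builds the ordinal-keyed table that `str.translate` expects. A short sketch reproducing the three examples:

```python
# Dict form: keys are single characters, values are strings or None (delete).
table = str.maketrans({'a': 'A', 'b': '2', 'c': 'C'})
replaced = 'abcdefgabc'.translate(table)

# Three-argument form: the third argument lists characters to delete.
delete_abc = str.maketrans('', '', 'abc')
deleted = 'abcdefgabc'.translate(delete_abc)

# Mixed replacement and deletion in one table.
mixed = 'abcdefgabc'.translate(str.maketrans({'a': '€', 'b': None, 'c': None}))

print(replaced, deleted, mixed)
```

So the "number as key" requirement applies to the table itself; `str.maketrans` lets the caller stay with text only.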
Re: string.replace doesn't removes :
On 13 Feb, 06:26, Rick Johnson rantingrickjohn...@gmail.com wrote: On Tuesday, February 12, 2013 10:44:09 PM UTC-6, Rick Johnson wrote:

REFERENCES: [1]: Should string.replace handle list, tuple and dict arguments in addition to strings?

py> string.replace(('a', 'b', 'c'), 'abcdefgabc')
'defg'
[...]

And here is a fine example of how a global function architecture can seriously warp your mind! Let me try that again!

Hypothetical Examples:

py> 'abcdefgabc'.replace(('a', 'b', 'c'), "")
'defg'
py> 'abcdefgabc'.replace(['a', 'b', 'c'], "")
'defg'
py> 'abcdefgabc'.replace({'a':'A', 'b':'2', 'c':'C'})
'A2CdefgA2C'

Or, an alternative to passing dict where both old and new arguments accept the sequence:

py> d = {'a':'A', 'b':'2', 'c':'C'}
py> 'abcdefgabc'.replace(d.keys(), d.values())
'A2CdefgA2C'

Nice thing about dict is you can control both sub-string and replacement-string on a case-by-case basis. But there is going to be a need to apply a single replacement string to a sequence of substrings; like the null string example provided by the OP. (hopefully there's no mistakes this time)

>>> d = {ord('a'): 'A', ord('b'): '2', ord('c'): 'C'}
>>> 'abcdefgabc'.translate(d)
'A2CdefgA2C'
>>> def jmTranslate(s, table):
...     table = {ord(k):table[k] for k in table}
...     return s.translate(table)
...
>>> d = {'a': 'A', 'b': '2', 'c': 'C'}
>>> jmTranslate('abcdefgabc', d)
'A2CdefgA2C'
>>> d = {'a': None, 'b': None, 'c': None}
>>> jmTranslate('abcdefgabc', d)
'defg'
>>> d = {'a': '€', 'b': '', 'c': ''}
>>> jmTranslate('abcdefgabc', d)
'€defg€'

jmf
Re: Curious to see alternate approach on a search/replace via regex
On 7 Feb, 04:04, Steven D'Aprano steve +comp.lang.pyt...@pearwood.info wrote: On Wed, 06 Feb 2013 13:55:58 -0800, Demian Brecht wrote: Well, an alternative /could/ be: ...

py> s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
py> assert u2f(s) == mangle(s)
py>
py> from timeit import Timer
py> setup = 'from __main__ import s, u2f, mangle'
py> t1 = Timer('mangle(s)', setup)
py> t2 = Timer('u2f(s)', setup)
py>
py> min(t1.repeat(repeat=7))
7.2962000370025635
py> min(t2.repeat(repeat=7))
10.981598854064941
py>
py> (10.98-7.29)/10.98
0.33606557377049184

(Timings done using Python 2.6 on my laptop -- your speeds may vary.)

[OT] Sorry, but I find all these timeit runs I see here and there more and more ridiculous. Maybe it's the language itself which has become ridiculous.

code:

from timeit import Timer, repeat

r = repeat("('WHERE IN THE WORLD IS CARMEN?'*10).lower()")
print('1:', r)
r = repeat("('WHERE IN THE WORLD IS HÉLÈNE?'*10).lower()")
print('2:', r)
t = Timer("re.sub('CARMEN', 'CARMEN', 'WHERE IN THE WORLD IS CARMEN?'*10)", "import re")
r = t.repeat()
print('3:', r)
t = Timer("re.sub('HÉLÈNE', 'HÉLÈNE', 'WHERE IN THE WORLD IS HÉLÈNE?'*10)", "import re")
r = t.repeat()
print('4:', r)

result:

>c:\python32\pythonw -u vitesse3.py
1: [2.578785478740226, 2.5738459157233833, 2.5739002658825543]
2: [2.57605654937141, 2.5784755252962572, 2.5775366066044896]
3: [11.856728254324088, 11.856321809655501, 11.857456073846905]
4: [12.111787643688231, 12.102743462128771, 12.098514783440208]
>Exit code: 0

>c:\Python33\pythonw -u vitesse3.py
1: [0.6063335264470632, 0.6104798922133946, 0.6078580877959869]
2: [4.080205081267272, 4.079303183698418, 4.0786836706522145]
3: [18.093742209318215, 18.07999618095, 18.07107661757692]
4: [18.852576768615222, 18.841418050790622, 18.840745369110437]
>Exit code: 0

The future is bright for ... ascii users.

jmf
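Anyone wanting to rerun this kind of comparison on a current interpreter could start from a sketch like the one below. It is not the vitesse3.py script from the post; the helper name `best_time` and the small `number` value are assumptions chosen to keep the run short, and the `globals` argument of `timeit.repeat` requires Python 3.5 or later. No timings are quoted because they depend entirely on the machine and version.

```python
import re
import timeit

def best_time(stmt, namespace, repeat=3, number=2000):
    """Return the fastest of `repeat` runs, each executing stmt `number` times."""
    return min(timeit.repeat(stmt, repeat=repeat, number=number, globals=namespace))

texts = {
    'ascii': 'WHERE IN THE WORLD IS CARMEN?' * 10,
    'accented': 'WHERE IN THE WORLD IS HÉLÈNE?' * 10,
}
results = {}
for label, text in texts.items():
    namespace = {'re': re, 'text': text, 'word': text.split()[-1][:6]}
    results[label, 'lower'] = best_time('text.lower()', namespace)
    results[label, 're.sub'] = best_time('re.sub(word, word, text)', namespace)

for key, seconds in sorted(results.items()):
    print(key, '{:.6f}'.format(seconds))
```

Taking the minimum of several repeats, as in the quoted `min(t1.repeat(repeat=7))`, reduces the influence of other processes on the measurement.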
Re: Py3.3 unicode literal and input()
Mea culpa. I did not have my head on my shoulders. Inputting is working fine; it returns text correctly. However, and this is something different, I'm a little bit surprised that input() does not handle escaped characters (\u, \U). Workaround: encode() and decode() as raw-unicode-escape.

jmf
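The workaround mentioned above could look like the following sketch. The function name `expand_unicode_escapes` is hypothetical, and the typed text is simulated with string constants instead of calling input(), so the snippet runs non-interactively.

```python
def expand_unicode_escapes(text):
    """Turn literal \\uXXXX / \\UXXXXXXXX sequences in text into the characters they name."""
    # Encoding leaves typed backslashes alone (and writes non-Latin-1 characters
    # as \u escapes); decoding then interprets every \u and \U escape.
    return text.encode('raw_unicode_escape').decode('raw_unicode_escape')

typed = '\\u00e9l\\u00e9phant'      # what input() returns if the user types \u00e9l\u00e9phant
print(expand_unicode_escapes(typed))
print(expand_unicode_escapes('éléphant'))   # text without escapes is unchanged
```

Only \u and \U escapes are expanded this way; other escapes such as \x or \n are left as typed.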
Re: Py3.3 unicode literal and input()
On Jun 20, 1:21 am, Steven D'Aprano steve +comp.lang.pyt...@pearwood.info wrote: On Mon, 18 Jun 2012 07:00:01 -0700, jmfauth wrote: On 18 June, 12:11, Steven D'Aprano steve +comp.lang.pyt...@pearwood.info wrote: On Mon, 18 Jun 2012 02:30:50 -0700, jmfauth wrote: On 18 June, 10:28, Benjamin Kaplan benjamin.kap...@case.edu wrote:

The u prefix is only there to make it easier to port a codebase from Python 2 to Python 3. It doesn't actually do anything.

It does. I shew it!

Incorrect. You are assuming that Python 3 input eval's the input like Python 2 does. That is wrong. All you show is that the one-character string "a" is not equal to the four-character string "u'a'", which is hardly a surprise. You wouldn't expect the string "3" to equal the string "int('3')" would you?

-- Steven

A string is a string, a piece of text, period. I do not see why a unicode literal and an (well, I do not know how the call it) a normal class str should behave differently in code source or as an answer to an input().

They do not. As you showed earlier, in Python 3.3 the literal strings u'a' and 'a' have the same meaning: both create a one-character string containing the Unicode letter LOWERCASE-A.

Note carefully that the quotation marks are not part of the string. They are delimiters. Python 3.3 allows you to create a string by using delimiters:

' '    " "    u' '    u" "

plus triple-quoted versions of the same. The delimiter is not part of the string. They are only there to mark the start and end of the string in source code so that Python can tell the difference between the string "a" and the variable named a.

Note carefully that quotation marks can exist inside strings:

my_string = "This string has 'quotation marks'."

The " at the start and end of the string literal are delimiters, not part of the string, but the internal ' characters *are* part of the string.

When you read data from a file, or from the keyboard using input(), Python takes the data and returns a string. You don't need to enter delimiters, because there is no confusion between a string (all data you read) and other programming tokens. For example:

py> s = input("Enter a string: ")
Enter a string: 42
py> print(s, type(s))
42 <class 'str'>

Because what I type is automatically a string, I don't need to enclose it in quotation marks to distinguish it from the integer 42.

py> s = input("Enter a string: ")
Enter a string: This string has 'quotation marks'.
py> print(s, type(s))
This string has 'quotation marks'. <class 'str'>

What you type is exactly what you get, no more, no less. If you type 42, you get the two character string "42" and not the int 42. If you type [1, 2, 3], then you get the nine character string "[1, 2, 3]" and not a list containing integers 1, 2 and 3. If you type 3**0.5 then you get the six character string "3**0.5" and not the float 1.7320508075688772. If you type u'a' then you get the four character string "u'a'" and not the single character 'a'.

There is nothing new going on here. The behaviour of input() in Python 3, and raw_input() in Python 2, has not changed.

Should a user write two derived functions? input_for_entering_text() and input_if_you_are_entering_a_text_as_litteral()

If you, the programmer, want to force the user to write input in Python syntax, then yes, you have to write a function to do so. input() is very simple: it just reads strings exactly as typed. It is up to you to process those strings however you wish.

-- Steven

Python 3.3.0a4 (v3.3.0a4:7c51388a3aa7+, May 31 2012, 20:15:21) [MSC v. 1600 32 bit (Intel)] on win32
--- running smidzero.py...
...smidzero has been executed
--- input(':')
:éléphant
'éléphant'
--- input(':')
:u'éléphant'
'éléphant'
--- input(':')
:u'\u00e9l\xe9phant'
'éléphant'
--- input(':')
:u'\U00e9léphant'
'éléphant'
--- input(':')
:\U00e9léphant
'éléphant'
---
--- # this is expected
--- input(':')
:b'éléphant'
b'éléphant'
--- len(input(':'))
:b'éléphant'
11
---

Good news on the ru''/ur'' front: http://bugs.python.org/issue15096

---

Finally, I'm just wondering whether this unicode literal reintroduction is not a bad idea.

b'these_are_bytes'
u'this_is_a_unicode_string'

I have written all my Py2 code in a unicode mode since ... Py2.3 (?).

jmf
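The "write a function to do so" advice can be sketched with `ast.literal_eval` from the standard library, which evaluates Python literals without running arbitrary code. The wrapper name `parse_literal` is hypothetical, and the typed answers are simulated with string constants so the snippet runs without a keyboard.

```python
import ast

def parse_literal(typed):
    """Return the Python literal that `typed` spells, or `typed` itself if it is not one."""
    try:
        return ast.literal_eval(typed)
    except (ValueError, SyntaxError):
        # Not a literal (plain text, an expression, empty input): keep the raw string.
        return typed

for answer in ["u'a'", "a", "42", "[1, 2, 3]", "3**0.5", "éléphant"]:
    value = parse_literal(answer)
    print(repr(answer), '->', repr(value), type(value).__name__)
```

In a program, `parse_literal(input(':'))` would give literal-style answers their Python value while leaving ordinary text untouched, which is a design choice the programmer makes explicitly; input() itself never evaluates anything.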
Re: Py3.3 unicode literal and input()
On Jun 20, 11:22 am, Christian Heimes li...@cheimes.de wrote: On 18.06.2012 20:45, Terry Reedy wrote: The simultaneous reintroduction of 'ur', but with a different meaning than in 2.7, *was* a problem and it should be removed in the next release.

FYI: http://hg.python.org/cpython/rev/8e47e9af826e

Christian

I had seen this, but not the latest version. Anyway, thanks for the info.

jmf
Re: Python equivalent to the A or a output conversions in C
On Jun 19, 9:54 pm, Edward C. Jones edcjo...@comcast.net wrote: On 06/19/2012 12:41 PM, Hemanth H.M wrote: float.hex(x) '0x1.5p+3'

Some days I don't ask the brightest questions. Suppose x was a numpy floating scalar (types numpy.float16, numpy.float32, numpy.float64, or numpy.float128). Is there an easy way to write x in binary or hex?

I'm not aware of a builtin function. Maybe the struct module ("Interpret bytes as packed binary data") can help.

jmf
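A minimal sketch of the struct idea, using only the standard library. The helper name `float_bits_hex` is hypothetical. It is an assumption here that a numpy scalar would first be converted with `float(x)`, which should be exact for float16, float32 and float64 but not for numpy.float128; numpy itself is not used in the snippet.

```python
import struct

def float_bits_hex(x, fmt='>d'):
    """Hex dump of the IEEE 754 bytes of x ('>d' = big-endian double, '>f' = single)."""
    return struct.pack(fmt, x).hex()

x = 10.5
print(float.hex(x))              # hexadecimal significand/exponent notation
print(float_bits_hex(x))         # raw 64-bit pattern
print(float_bits_hex(x, '>f'))   # raw 32-bit pattern
```

`float.hex` shows the value in a human-oriented hexadecimal notation, while the struct route shows the stored bit pattern for a chosen width.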
Py3.3 unicode literal and input()
What is input() supposed to return?

>>> u'a' == 'a'
True
>>>
>>> r1 = input(':')
:a
>>> r2 = input(':')
:u'a'
>>>
>>> r1 == r2
False
>>> type(r1), len(r1)
(<class 'str'>, 1)
>>> type(r2), len(r2)
(<class 'str'>, 4)

---

sys.argv?

jmf
Re: Py3.3 unicode literal and input()
On 18 June, 10:28, Benjamin Kaplan benjamin.kap...@case.edu wrote: On Mon, Jun 18, 2012 at 1:19 AM, jmfauth wxjmfa...@gmail.com wrote:

What is input() supposed to return?

>>> u'a' == 'a'
True
>>>
>>> r1 = input(':')
:a
>>> r2 = input(':')
:u'a'
>>>
>>> r1 == r2
False
>>> type(r1), len(r1)
(<class 'str'>, 1)
>>> type(r2), len(r2)
(<class 'str'>, 4)

---

sys.argv?

jmf

Python 3 made several backwards-incompatible changes over Python 2. First of all, input() in Python 3 is equivalent to raw_input() in Python 2. It always returns a string. If you want the equivalent of Python 2's input(), eval the result. Second, Python 3 is now unicode by default. The str class is a unicode string. There is a separate bytes class, denoted by b"", for byte strings. The u prefix is only there to make it easier to port a codebase from Python 2 to Python 3. It doesn't actually do anything.

It does. I shew it!

Related:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/3aefd602507d2fbe#
http://mail.python.org/pipermail/python-dev/2012-June/120341.html

jmf
Re: Py3.3 unicode literal and input()
On 18 June, 12:11, Steven D'Aprano steve +comp.lang.pyt...@pearwood.info wrote: On Mon, 18 Jun 2012 02:30:50 -0700, jmfauth wrote: On 18 June, 10:28, Benjamin Kaplan benjamin.kap...@case.edu wrote:

The u prefix is only there to make it easier to port a codebase from Python 2 to Python 3. It doesn't actually do anything.

It does. I shew it!

Incorrect. You are assuming that Python 3 input eval's the input like Python 2 does. That is wrong. All you show is that the one-character string "a" is not equal to the four-character string "u'a'", which is hardly a surprise. You wouldn't expect the string "3" to equal the string "int('3')" would you?

-- Steven

A string is a string, a piece of text, period. I do not see why a unicode literal and a (well, I do not know what to call it) normal class str should behave differently in source code or as an answer to an input().

Should a user write two derived functions? input_for_entering_text() and input_if_you_are_entering_a_text_as_litteral()

---

A side effect of the unicode literal reintroduction. I do not mind it, but I expect it to work logically and correctly. And it does not.

PS English is not my native language. I never know how to reply to an (interro)-negative sentence.

jmf
Re: Py3.3 unicode literal and input()
Things are very clear to me. I have written enough interactive interpreters with all the available toolkits for Windows since I got to know Python (v. 1.5.6). I do not see why the semantics should differ between source code and an interactive interpreter, especially if Python allows it! If you have to know in advance what an end user is supposed to type and/or check it ('str' or unicode literal) in order to know whether the answer has to be evaluated or not, then it is better to reintroduce input() and raw_input().

jmf
Re: Py3.3 unicode literal and input()
We are going round in circles. You are somehow legitimating the reintroduction of unicode literals, and I showed, not to say proved, that it may be a source of problems. Typical Python disease: introduce a problem, then discuss how to solve it, but surely and definitely do not remove that problem. As far as I know, Python 3.2 is working very well.

jmf
Re: Py3.3 unicode literal and input()
On Jun 18, 8:45 pm, Terry Reedy tjre...@udel.edu wrote: On 6/18/2012 12:39 PM, jmfauth wrote:

We are turning in circles.

You are, not we. Please stop.

You are somehow legitimating the reintroduction of unicode literals

We are not 'reintroducing' unicode literals. In Python 3, string literals *are* unicode literals. Other developers reintroduced a now meaningless 'u' prefix for the purpose of helping people write 2&3 code that runs on both Python 2 and Python 3. Read about it here: http://python.org/dev/peps/pep-0414/ In Python 3.3, 'u' should *only* be used for that purpose and should be ignored by anyone not writing or editing 2&3 code. If you are not writing such code, ignore it.

and I shew, not to say proofed, it may be a source of problems.

You are the one making it be a problem.

Typical Python desease. Introduce a problem, then discuss how to solve it, but surely and definitivly do not remove that problem.

The simultaneous reintroduction of 'ur', but with a different meaning than in 2.7, *was* a problem and it should be removed in the next release.

As far as I know, Python 3.2 is working very well.

Except that many public libraries that we would like to see ported to Python 3 have not been. The purpose of reintroducing 'u' is to encourage more porting of Python 2 code. Period.

-- Terry Jan Reedy

It's a matter of perspective. I expected to finally have a clean Python; the goal has been missed. I have no objection to raise. It is your (core devs') project, not mine. At least you understood my point of view.

I have been a TeX user for more than two decades. At the release of XeTeX (a pure unicode TeX engine), the devs had, as with Python 2/3, to make everything incompatible. A success. Not a week went by without an updated package or refreshed documentation. Luckily for me, Xe(La)TeX is more important than Python.

As a scientist, I find Python perfect. From an educational point of view, I'm becoming more and more skeptical about this language, a moving target.

Note that I'm not complaining, only disappointed.

jmf
Re: Python 3.3.0a4, please add ru'...'
On 17 June, 13:30, Christian Heimes li...@cheimes.de wrote: On 16.06.2012 19:36, jmfauth wrote: Please consistency.

Python 3.3 supports the ur"" syntax just as Python 2.x:

$ ./python
Python 3.3.0a4+ (default:4c704dc97496, Jun 16 2012, 00:06:09) [GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> ur""
''
[73917 refs]

Neither Python 2 nor Python 3 supports ru"". I'm a bit astonished that rb"" works in Python 3 as it doesn't work in Python 2.7. But br"" works everywhere.

Christian

I noticed this at the 3.3.0a0 release. The main motivation for this came from this: http://bugs.python.org/issue13748

PS I saw the dev-list message.

PS2 Opinion: even if not really useful, consistency never hurts.

jmf
Re: Python 3.3.0a4, please add ru'...'
On 17 June, 15:48, Christian Heimes li...@cheimes.de wrote: On 17.06.2012 14:11, jmfauth wrote: I noticed this at the 3.3.0a0 release. The main motivation for this came from this: http://bugs.python.org/issue13748 PS I saw the dev-list message. PS2 Opinion: even if not really useful, consistency never hurts.

We will most likely drop the ur"" syntax as it's not compatible with Python 2.x's raw unicode notation. http://bugs.python.org/issue15096

Christian

Yes, but on the other hand, you (core developers) have reintroduced the mess of the unicode literal; now *assume* it (logically). If the core developers introduced rb'' alongside br'' (Py2) because they never know whether they have to type rb or br (me neither), what should a beginner think about ur and ru?

Finally, the ultimate argument: what is Python 3 supposed to be? A Python 2 derivative for lazy (ascii) programmers or an appealing, clean and coherent language?

jmf
Python 3.3.0a4, please add ru'...'
Please consistency.

>>> sys.version
'3.3.0a4 (v3.3.0a4:7c51388a3aa7+, May 31 2012, 20:15:21) [MSC v.1600 32 bit (Intel)]'
>>> 'a'
'a'
>>> b'a'
b'a'
>>> br'a'
b'a'
>>> rb'a'
b'a'
>>> u'a'
'a'
>>> ur'a'
'a'
>>> ru'a'
SyntaxError: invalid syntax

jmf
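To check which literal prefixes a given interpreter accepts, a small probe like the one below can be used. The helper name `accepts` is hypothetical. The session above is from the 3.3.0a4 alpha; as far as I know, ur'' was removed again before the final 3.3.0 release, so on 3.3 and later both 'ur' and 'ru' are rejected while 'u', 'rb' and 'br' are accepted.

```python
def accepts(literal_source):
    """Return True if this interpreter can compile the given literal source text."""
    try:
        compile(literal_source, '<prefix-test>', 'eval')
        return True
    except SyntaxError:
        return False

support = {prefix: accepts(prefix + "'a'")
           for prefix in ('', 'b', 'br', 'rb', 'r', 'u', 'ur', 'ru')}
for prefix, ok in support.items():
    print('{!r:6} {}'.format(prefix, 'accepted' if ok else 'SyntaxError'))
```

Using compile() on the source text, instead of writing the literals directly, keeps the probe itself importable on interpreters that reject some of the prefixes.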
Re: python3 raw strings and \u escapes
On 30 May, 13:54, Thomas Rachel nutznetz-0c1b6768-bfa9-48d5- a470-7603bd3aa...@spamschutz.glglgl.de wrote: On 30.05.2012 08:52, ru...@yahoo.com wrote:

This breaks a lot of my code because in python 2

re.split (ur'[\u3000]', u'A\u3000A') == [u'A', u'A']

but in python 3 (the result of running 2to3),

re.split (r'[\u3000]', 'A\u3000A' ) == ['A\u3000A']

I can remove the r prefix from the regex string but then if I have other regex backslash symbols in it, I have to double all the other backslashes -- the very thing that the r-prefix was invented to avoid. Or I can leave the r prefix and replace something like r'[ \u3000]' with r'[ 　]'. But that is confusing because one can't distinguish between the space character and the ideographic space character. It also a problem if a reader of the code doesn't have a font that can display the character. Was there a reason for dropping the lexical processing of \u escapes in strings in python3 (other than to add another annoyance in a long list of python3 annoyances?)

Probably it is more consistent. Alas, it makes the whole stuff incompatible with Py2. But if you think about it: why allow for \u if \r, \n etc. are disallowed as well?

And is there no choice for me but to choose between the two poor choices I mention above to deal with this problem?

There is a 3rd one: use r'[ ' + '\u3000' + ']'. Not very nice to read, but should do the trick...

Thomas

I suggest approaching the problem differently. Python 3 succeeded in putting order into the mismatch in the coding of characters that Python 2 offered. In your case, the

>>> import unicodedata as ud
>>> ud.name('\u3000')
'IDEOGRAPHIC SPACE'

character (in fact a unicode code point) is just a character, like

>>> ud.name('a')
'LATIN SMALL LETTER A'

The code point / unicode logic that Python 3 proposes and follows becomes straightforward.

>>> s = 'a\u3000é\u3000€'
>>> s.split('\u3000')
['a', 'é', '€']
>>> import re
>>> re.split('\u3000', s)
['a', 'é', '€']

The backslash, used as a real backslash, remains what it really was in Python 2. Note the absence of r'...'.

>>> s = 'a\\b\\c'
>>> print(s)
a\b\c
>>> s.split('\\')
['a', 'b', 'c']
>>> re.split('\\\\', s)
['a', 'b', 'c']
>>> hex(ord('\\'))
'0x5c'
>>> re.split('\u005c\u005c', s)
['a', 'b', 'c']

jmf
Re: python3 raw strings and \u escapes
On 30 May, 08:52, "ru...@yahoo.com" <ru...@yahoo.com> wrote:
> In python2, \u escapes are processed in raw unicode strings.
> That is, ur'\u3000' is a string of length 1 consisting of
> the IDEOGRAPHIC SPACE unicode character.
>
> In python3, \u escapes are not processed in raw strings.
> r'\u3000' is a string of length 6 consisting of a backslash,
> 'u', '3' and three '0' characters.
>
> This breaks a lot of my code because in python 2
>     re.split (ur'[\u3000]', u'A\u3000A') == [u'A', u'A']
> but in python 3 (the result of running 2to3),
>     re.split (r'[\u3000]', 'A\u3000A' ) == ['A\u3000A']
>
> I can remove the r prefix from the regex string but then if I have
> other regex backslash symbols in it, I have to double all the other
> backslashes -- the very thing that the r-prefix was invented to avoid.
>
> Or I can leave the r prefix and replace something like r'[ \u3000]'
> with r'[ ]'. But that is confusing because one can't distinguish
> between the space character and the ideographic space character. It
> is also a problem if a reader of the code doesn't have a font that
> can display the character.
>
> Was there a reason for dropping the lexical processing of \u escapes
> in strings in python3 (other than to add another annoyance in a long
> list of python3 annoyances?)
>
> And is there no choice for me but to choose between the two poor
> choices I mention above to deal with this problem?

I suggest approaching the problem differently. Python 3 succeeded in putting some order into the mismatched character codings that Python 2 offered.

The 'IDEOGRAPHIC SPACE' and 'REVERSE SOLIDUS' (backslash) characters (in fact unicode code points) are just (normal) characters. The backslash, used as an escaping command, keeps its function. Note the absence of r'...'.

>>> s = 'a\u3000é\u3000€'
>>> s.split('\u3000')
['a', 'é', '€']
>>> import re
>>> re.split('\u3000', s)
['a', 'é', '€']
>>> s = 'a\\b\\c'
>>> print(s)
a\b\c
>>> s.split('\\')
['a', 'b', 'c']
>>> re.split('\\\\', s)
['a', 'b', 'c']
>>> hex(ord('\\'))
'0x5c'
>>> re.split('\u005c\u005c', s)
['a', 'b', 'c']

jmf
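A side note on the original complaint, as a sketch rather than a definitive answer: from Python 3.3 on, the re module itself understands \uXXXX escapes inside a pattern, so the r prefix can stay and no other backslash needs doubling. The sample strings below are invented for the illustration; the last line is the concatenation idea from the thread, which should also work on older Python 3 releases.

```python
import re

text = 'A\u3000A'

# Raw pattern: Python leaves the six characters \u3000 untouched and the
# re module (Python 3.3 and later) interprets the escape itself.
by_re_escape = re.split(r'[\u3000]', text)

# The same raw pattern mixed with another regex backslash, no doubling.
mixed = re.split(r'[\u3000]\d+', 'A\u300012B')

# Concatenating a raw part and a non-raw part, as suggested in the thread.
by_concat = re.split(r'[ ' + '\u3000' + r']', 'A\u3000B C')

print(by_re_escape, mixed, by_concat)
```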
Re: str.isnumeric and Cuneiforms
On 17 May, 21:32, Marco <marc...@nsgmail.com> wrote:
> Is it normal the str.isnumeric() returns False for these Cuneiforms?
>
> '\U00012456'
> '\U00012457'
> '\U00012432'
> '\U00012433'
>
> They are all in the Nl category.

Indeed they are, but Unicode (ver. 5.0.0) does not assign numeric values to these code points.

Do not ask me why.

jmf
Re: str.isnumeric and Cuneiforms
On 18 May, 17:08, Marco Buttu <name.surn...@gmail.com> wrote:
> On 05/17/2012 09:32 PM, Marco wrote:
>
>> Is it normal the str.isnumeric() returns False for these Cuneiforms?
>>
>> '\U00012456'
>> '\U00012457'
>> '\U00012432'
>> '\U00012433'
>>
>> They are all in the Nl category.
>>
>> Marco
>
> It's ok, I found that they don't have a number assigned in the
> ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt database.
>
> -- Marco

Good. I was about to send this information. I have all this (not updated) stuff locally on my hd.
Re: str.isnumeric and Cuneiforms
On 18 May, 17:08, Marco Buttu <name.surn...@gmail.com> wrote:
> On 05/17/2012 09:32 PM, Marco wrote:
>
>> Is it normal the str.isnumeric() returns False for these Cuneiforms?
>>
>> '\U00012456'
>> '\U00012457'
>> '\U00012432'
>> '\U00012433'
>>
>> They are all in the Nl category.
>>
>> Marco
>
> It's ok, I found that they don't have a number assigned in the
> ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt database.
>
> -- Marco

Unofficial but really practical:

http://www.fileformat.info/info/unicode/index.htm
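To look at this without digging through UnicodeData.txt, a small sketch with the unicodedata module may help. The helper name describe is made up here. What it reports for the cuneiform signs depends on the Unicode database version bundled with the interpreter (printed first), so no particular output is claimed for them.

```python
import unicodedata as ud

def describe(ch):
    """Return (general category, numeric value or None) for one character."""
    # The second argument of ud.numeric() is returned instead of raising
    # ValueError when the database assigns no numeric value.
    return ud.category(ch), ud.numeric(ch, None)

print('Unicode database version:', ud.unidata_version)
for ch in ('1', '\u216b', '\U00012456', '\U00012432', 'a'):
    print('U+%04X' % ord(ch), describe(ch), 'isnumeric:', ch.isnumeric())
```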
Re: Difference between str.isdigit() and str.isdecimal() in Python 3
On 16 May, 17:48, Marco <marc...@nsgmail.com> wrote:
> Hi all, because "There should be one-- and preferably only one --obvious
> way to do it", there should be a difference between the two methods in
> the subject, but I can't find it:
>
> >>> '123'.isdecimal(), '123'.isdigit()
> (True, True)
> >>> print('\u0660123')
> ٠123
> >>> '\u0660123'.isdigit(), '\u0660123'.isdecimal()
> (True, True)
> >>> print('\u216B')
> Ⅻ
> >>> '\u216B'.isdecimal(), '\u216B'.isdigit()
> (False, False)
>
> Can anyone give me some help?
> Regards, Marco

It seems to me that it is correct, and the reason lies in this:

>>> import unicodedata as ud
>>> ud.category('\u216b')
'Nl'
>>> ud.category('1')
'Nd'
>>> # Note
>>> ud.numeric('\u216b')
12.0

jmf
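The three methods do differ, but only for characters outside the plain decimal digits. A short sketch with a few characters picked for this illustration (they are not from the thread, apart from U+0660 and U+216B):

```python
samples = {
    'DIGIT ONE': '1',
    'ARABIC-INDIC DIGIT ZERO': '\u0660',
    'SUPERSCRIPT TWO': '\u00b2',
    'VULGAR FRACTION ONE HALF': '\u00bd',
    'ROMAN NUMERAL TWELVE': '\u216b',
}

def classify(ch):
    """Return (isdecimal, isdigit, isnumeric) for one character."""
    return ch.isdecimal(), ch.isdigit(), ch.isnumeric()

for name, ch in samples.items():
    print('%-26s decimal=%s digit=%s numeric=%s' % ((name,) + classify(ch)))
```

Roughly: isdecimal() covers the Nd category, isdigit() adds digit-like characters such as superscripts, and isnumeric() adds everything that carries a numeric value, such as fractions and Roman numerals.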
On u'Unicode string literals' (Py3)
For those who do not know: the u'' string literal trick has never worked in Python 2.

>>> sys.version
'2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
>>> print u'Un oeuf à zéro EURO uro'
Un uf à zéro uro

jmf
Re: On u'Unicode string literals' reintroduction (Py3)
On 29 Feb, 14:45, jmfauth <wxjmfa...@gmail.com> wrote:
> For those who do not know:
> The u'' string literal trick has never worked in Python 2.
>
> >>> sys.version
> '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
> >>> print u'Un oeuf à zéro EURO uro'
> Un uf à zéro uro
>
> jmf

Sorry, I just wanted to show a small example. It seems Google has changed again.

You should read (2nd attempt) u'Un œuf à zéro €' with the *correctly* typed glyphs 'LATIN SMALL LIGATURE OE' in œuf and 'EURO SIGN' in '€uro'.

jmf
Re: Python math is off by .000000000000045
On 25 Feb, 23:51, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> On Sat, 25 Feb 2012 13:25:37 -0800, jmfauth wrote:
>
>> >>> (2.0).hex()
>> '0x1.0000000000000p+1'
>> >>> (4.0).hex()
>> '0x1.0000000000000p+2'
>> >>> (1.5).hex()
>> '0x1.8000000000000p+0'
>> >>> (1.1).hex()
>> '0x1.199999999999ap+0'
>>
>> jmf
>
> What's your point? I'm afraid my crystal ball is out of order and I have
> no idea whether you have a question or are just demonstrating your
> mastery of copy and paste from the Python interactive interpreter.

It should be enough to point casual, interested readers in the right direction.
Re: Python math is off by .000000000000045
>>> (2.0).hex()
'0x1.0000000000000p+1'
>>> (4.0).hex()
'0x1.0000000000000p+2'
>>> (1.5).hex()
'0x1.8000000000000p+0'
>>> (1.1).hex()
'0x1.199999999999ap+0'

jmf
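For readers who want the hint spelled out: float.hex() shows the binary value that is really stored, and 1.1 has no finite binary expansion, so what is stored is only the nearest neighbour. A minimal sketch using the standard decimal and fractions modules to make the stored value visible:

```python
from decimal import Decimal
from fractions import Fraction

x = 1.1

# The stored binary value, written in hexadecimal.
print(x.hex())

# The same stored value, written out exactly in decimal.
print(Decimal(x))

# The stored value is a ratio of integers, but not 11/10.
print(Fraction(x) == Fraction(11, 10))

# Values with a short binary expansion round-trip exactly.
print(float.fromhex('0x1.8p+0'))

# The classic consequence of the tiny representation errors.
print(0.1 + 0.2 == 0.3)
```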
Re: distutils bdist_wininst failure on Linux
On 23 Feb, 15:06, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> Following instructions here:
>
> http://docs.python.org/py3k/distutils/builtdist.html#creating-windows...
>
> I am trying to create a Windows installer for a pure-module distribution
> using Python 3.2. I get a "LookupError: unknown encoding: mbcs"
>
> Here is the full output of distutils and the traceback:
>
> [steve@ando pyprimes]$ python3.2 setup.py bdist_wininst
> running bdist_wininst
> running build
> running build_py
> creating build/lib
> copying src/pyprimes.py -> build/lib
> installing to build/bdist.linux-i686/wininst
> running install_lib
> creating build/bdist.linux-i686/wininst
> creating build/bdist.linux-i686/wininst/PURELIB
> copying build/lib/pyprimes.py -> build/bdist.linux-i686/wininst/PURELIB
> running install_egg_info
> Writing build/bdist.linux-i686/wininst/PURELIB/pyprimes-0.1.1a-py3.2.egg-info
> creating '/tmp/tmp3utw4_.zip' and adding '.' to it
> adding 'PURELIB/pyprimes.py'
> adding 'PURELIB/pyprimes-0.1.1a-py3.2.egg-info'
> creating dist
> Warning: Can't read registry to find the necessary compiler setting
> Make sure that Python modules winreg, win32api or win32con are installed.
> Traceback (most recent call last):
>   File "setup.py", line 60, in <module>
>     "License :: OSI Approved :: MIT License",
>   File "/usr/local/lib/python3.2/distutils/core.py", line 148, in setup
>     dist.run_commands()
>   File "/usr/local/lib/python3.2/distutils/dist.py", line 917, in run_commands
>     self.run_command(cmd)
>   File "/usr/local/lib/python3.2/distutils/dist.py", line 936, in run_command
>     cmd_obj.run()
>   File "/usr/local/lib/python3.2/distutils/command/bdist_wininst.py", line 179, in run
>     self.create_exe(arcname, fullname, self.bitmap)
>   File "/usr/local/lib/python3.2/distutils/command/bdist_wininst.py", line 262, in create_exe
>     cfgdata = cfgdata.encode("mbcs")
> LookupError: unknown encoding: mbcs
>
> How do I fix this, and is it a bug in distutils?
>
> -- Steven

Because the 'mbcs' codec is missing on your Linux, :-)

>>> 'abcéœ€'.encode('cp1252')
b'abc\xe9\x9c\x80'
>>> 'abcéœ€'.encode('missing')
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
LookupError: unknown encoding: missing

jmf
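Whether a codec exists on a given platform can be tested before it is used. A minimal sketch, assuming nothing beyond the standard codecs module; the helper name codec_available is invented here, and this is not what distutils does internally.

```python
import codecs

def codec_available(name):
    """Return True if the running Python can look up the codec `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

# 'mbcs' is only expected on Windows; 'missing' is not a codec anywhere.
for name in ('utf-8', 'cp1252', 'mbcs', 'missing'):
    print('%-8s %s' % (name, codec_available(name)))

print('abcéœ€'.encode('cp1252'))
```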
Re: format a measurement result and its error in scientific way
On 16 Feb, 01:18, Daniel Fetchinson <fetchin...@googlemail.com> wrote:
> Hi folks, often times in science one expresses a value (say
> 1.03789291) and its error (say 0.00089) in a short way by parentheses
> like so: 1.0379(9)

Before swallowing any Python solution, you should realize that the values (value, error) you are using make no sense:

1.03789291 +/- 0.00089

You express more precision in the value than in the error.

---

As an example, in a 1.234(5) notation, the () is usually used to indicate the accuracy of the digit in (). E.g. 1.345(7).

Typographically, the () is sometimes replaced by a bold digit or a subscripted digit.

jmf
Re: format a measurement result and its error in scientific way
On 17 Feb, 11:03, Daniel Fetchinson <fetchin...@googlemail.com> wrote:
>>> Hi folks, often times in science one expresses a value (say
>>> 1.03789291) and its error (say 0.00089) in a short way by parentheses
>>> like so: 1.0379(9)
>>
>> Before swallowing any Python solution, you should realize that the
>> values (value, error) you are using make no sense:
>>
>> 1.03789291 +/- 0.00089
>>
>> You express more precision in the value than in the error.
>
> My impression is that you didn't understand the original problem:
> given an arbitrary value to arbitrary digits and an arbitrary error,
> find the relevant number of digits for the value that makes sense for
> the given error. So what you call "non sense" is part of the problem
> to be solved.

I do not know where these numbers (value, error) are coming from. But when the value and the error do not have the same precision, something is already wrong somewhere.

And this, *prior* to any representation of these values/numbers.

jmf
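Whatever the origin of the numbers, the formatting step asked for can be sketched in a few lines. This is only one possible approach, not a solution anyone in the thread posted: the function name format_with_error and the parameter sig (number of significant digits kept for the error) are invented, and the sketch only handles errors between 0 and 1. The value is rounded to the precision of the error, which is exactly the point made above.

```python
import math

def format_with_error(value, error, sig=1):
    """Format value and error in the short form, e.g. '1.0379(9)'.

    Sketch only: assumes 0 < error < 1.
    """
    if not 0 < error < 1:
        raise ValueError('this sketch only handles 0 < error < 1')
    # Decimal position of the leading significant digit of the error.
    exponent = math.floor(math.log10(error))
    decimals = sig - 1 - exponent
    err_digits = round(error * 10 ** decimals)
    # Rounding may carry over (0.00096 gives 10 at 4 decimals);
    # in that case keep one decimal less.
    if err_digits >= 10 ** sig:
        decimals -= 1
        err_digits = round(error * 10 ** decimals)
    return '%.*f(%d)' % (decimals, value, err_digits)

print(format_with_error(1.03789291, 0.00089))
print(format_with_error(1.03789291, 0.00089, sig=2))
```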
Re: Python usage numbers
On 13 Feb, 04:09, Terry Reedy <tjre...@udel.edu> wrote:
> * The new internal unicode scheme for 3.3 is pretty much a mixture of
> the 3 storage formats (I am of course, skipping some details) by using
> the widest one needed for each string. The advantage is avoiding
> problems with each of the three. The disadvantage is greater internal
> complexity, but that should be hidden from users. They will not need to
> care about the internals. They will be able to forget about 'narrow'
> versus 'wide' builds and the possible requirement to code differently
> for each. There will only be one scheme that works the same on all
> platforms. Most apps should require less space and about the same time.

Python 2 was built for ascii users. Now, Python 3(.3) is *optimized* for the ascii users.

And the rest of the crowd? Not so sure that French users (among others), who can not write their texts with iso-8859-1/latin1, will be very happy.

No doubt, it will work. Is this however the correct approach?

jmf
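The effect of "the widest one needed for each string" can be observed from Python itself. The sketch below is specific to CPython 3.3 or later and relies on sys.getsizeof(), whose exact numbers are an implementation detail; only the ordering of the sizes is the point. The sample characters are chosen for this illustration.

```python
import sys

samples = [
    ('ASCII', 'a'),
    ('Latin-1', '\u00e9'),        # é, fits in one byte per character
    ('BMP', '\u20ac'),            # €, needs two bytes per character
    ('astral', '\U00012456'),     # outside the BMP, four bytes per character
]

n = 1000
for label, ch in samples:
    size = sys.getsizeof(ch * n)
    print('%-8s %6d bytes for %d characters' % (label, size, n))
```

A single wide character is enough to widen the storage of the whole string, which is what the remark about non-ascii users is aimed at.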
Re: Python usage numbers
There is so much to say on the subject, I do not know where to start. Some points.

Today, Sunday, 12 February 2012, 90%, if not more, of the Python applications that are supposed to work with text, and that I am toying with, simply do not work. Two reasons:

1) Most devs understand nothing, or not enough, about the coding of characters.

2) In GUI applications, most devs understand nothing, or not enough, about keyboard key/char handling.

---

I have known Python since version 1.5.2 or 1.5.6 (?). Among the applications I wrote, my fun is in writing GUI interactive interpreters with Python 2 or 3, tkinter, Tkinter, wxPython, PySide, PyQt4 on Windows.

Believe it or not, my interactive interpreters are the only ones where I can enter text and where text is displayed correctly. IDLE, wxPython/PyShell, DrPython, ... all are failing. (I do not count console applications.)

Python popularity? I have no popularity-meter. What I know: I can not type French text in IDLE on Windows. It has been like this for about ten years and I never saw any complaint about it. (The problem is bad programming.)

Ditto for PyShell in wxPython. I have lost count of the corrections I proposed. In one case, it took me 18 months before I finally decided to propose a correction. During this time, I never heard of the problem. (Now, it is broken again.)

---

Is there a way to fix the current state?

- Yes, and *very easily*.

Will it be fixed?

- No, because there is no willingness to solve it.

---

Roy Smith's quote: "... that we'll all just be using UTF-32, ..."

Considering PEP 393, Python is not taking this road.

---

How many devs know that one can not write text in French with the iso-8859-1 coding? (see PEP 393)

How does one explain that companies like MS or Apple, with their cp1252 or mac-roman codings, managed to know this? Ditto for foundries (Adobe, LinoType, ...)

---

Python is 20 years old. It was developed with ascii in mind. Before Python was born, all this was already a non-issue with Windows and VB. Going one step further back, before Windows was born, it was a non-issue at the DOS level (e.g. TurboPascal), 30 years ago!

Design mistake.

---

Python 2 introduced the unicode type. Very nice. The problem: the introduction of the automatic ascii-unicode coercion, which somehow breaks everything.

Very bad design mistake. (In my mind, the biggest one.)

---

One day, I stumbled on the web upon a very old discussion about Python, related to the introduction of unicode in Python 2. Something like:

Python core dev (it was VS or AP): "... let's go with ucs-4 and we have no problem in the future"

Look at the situation today.

---

And so on.

---

Conclusion. A Windows programmer is better served by downloading VB.NET Express. A Windows end user is better served with an application developed with VB.NET Express.

I find it somehow funny that Python is able to produce this:

>>> (1.1).hex()
'0x1.199999999999ap+0'

and, on the other side, Python and Python applications are not able to deal correctly with text entry and text display. Probably the two most important tasks a computer has to do!

jmf

PS I'm not a computer scientist, only a computer user.
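The claim about French and iso-8859-1 can be checked in a few lines. A sketch, with a helper name (unencodable) and a sample string invented for the purpose; the sample contains the characters usually cited as missing from latin1 (œ, Œ, Ÿ) plus the euro sign.

```python
def unencodable(text, encoding):
    """Return the characters of `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing

sample = 'éàçœŒŸ€'
for encoding in ('iso-8859-1', 'iso-8859-15', 'cp1252', 'mac-roman', 'utf-8'):
    print('%-12s missing: %s' % (encoding, unencodable(sample, encoding)))
```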
Re: changing sys.path
On 2 Feb, 11:03, Andrea Crotti <andrea.crott...@gmail.com> wrote:
> On 02/02/2012 12:51 AM, Steven D'Aprano wrote:
>> On Wed, 01 Feb 2012 17:47:22 +, Andrea Crotti wrote:
>>
>>> Yes they are exactly the same, because in that file I just write exactly
>>> the same list, but when modifying it at run-time it doesn't work, while
>>> if at the application start there is this file everything works
>>> correctly... That's what really puzzles me.. What could that be then?
>>
>> Are you using IDLE or WingIDE or some other IDE which may not be
>> honouring sys.path? If so, that's a BAD bug in the IDE.
>>
>> Are you changing the working directory manually, by calling os.chdir? If
>> so, that could be interfering with the import somehow. It shouldn't, but
>> you never know...
>>
>> Are you adding absolute paths or relative paths?
>
> No, no and absolute paths..
>
>> You say that you get an ImportError, but that covers a lot of things
>> going wrong. Here's a story. Could it be correct? I can't tell because
>> you haven't posted the traceback.
>>
>> When you set site-packages/my_paths.pth you get a sys path that looks
>> like ['a', 'b', 'fe', 'fi', 'fo', 'fum']. You then call "import spam"
>> which locates b/spam.py and everything works.
>>
>> But when you call sys.path.extend(['a', 'b']) you get a path that looks
>> like ['fe', 'fi', 'fo', 'fum', 'a', 'b']. Calling "import spam" locates
>> some left over junk file, fi/spam.py or fi/spam.pyc, which doesn't
>> import, and you get an ImportError.
>
> And no the problem is not that I already checked inspecting at run-time..
> This is the traceback and it might be related to the fact that it runs
> from the .exe wrapper generated by setuptools:
>
> Traceback (most recent call last):
>   File "c:\python25\scripts\dev_main-script.py", line 8, in <module>
>     load_entry_point('psi.devsonly==0.1', 'console_scripts', 'dev_main')()
>   File "h:\git_projs\psi\psi.devsonly\psi\devsonly\bin\dev_main.py", line 152, in main
>     Develer(ns).full_run()
>   File "h:\git_projs\psi\psi.devsonly\psi\devsonly\bin\dev_main.py", line 86, in full_run
>     run(project_name, test_only=self.ns.test_only)
>   File "h:\git_projs\psi\psi.devsonly\psi\devsonly\environment.py", line 277, in run
>     from psi.devsonly.run import Runner
>   File "h:\git_projs\psi\psi.devsonly\psi\devsonly\run.py", line 7, in <module>
>     from psi.workbench.api import Workbench, set_new_dev_main
> ImportError: No module named workbench.api
>
> Another thing which might matter is that I'm launching Envisage
> applications, which heavily rely on the use of entry points, so I guess
> that if something is not in the path the entry point is not loaded
> automatically (but it can be forced I guess somehow).
>
> I solved in another way now, since I also need to keep a dev_main.pth in
> site-packages to make Eclipse happy, just respawning the same process on
> ImportError works already perfectly..

There is something strange here. I can not figure out how correct code would fail with sys.path. It seems to me that the lib you are using is somehow not able to recognize its own structure (its own sys.path).

An idea: are you sure you are modifying sys.path at the right place, that is, at the right time in Python's processing?

I use this sys.path tweaking at run time very often, e.g. to test or to run different versions of the same lib residing in different dirs, and this in *any* dir and independently of *any* .pth file.

jmf
Re: changing sys.path
On 1 Feb, 17:15, Andrea Crotti <andrea.crott...@gmail.com> wrote:
> So suppose I want to modify the sys.path on the fly before running some code
> which imports from one of the modules added.
>
> at run time I do
> sys.path.extend(paths_to_add)
>
> but it still doesn't work and I get an import error.
>
> If I take these paths and add them to site-packages/my_paths.pth
> everything works, but at run-time the paths which I actually see before
> importing are exactly the same.
>
> So there is something I guess that depends on the order, but what can I
> reset/reload to make these paths available (I thought I didn't need
> anything in theory)?

>>> import mod
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
ImportError: No module named mod
>>> sys.path.append(r'd:\\jm\\junk')
>>> import mod
>>> mod
<module 'mod' from 'd:\\jm\\junk\mod.py'>
>>> mod.hello()
fct hello in mod.py

sys.path? Probably the most ingenious Python idea.

jmf
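The session above can be reproduced on any system with a throwaway directory. This is a sketch under assumptions: the module name jmf_demo_mod is made up, and insert(0, ...) is used instead of append()/extend() because the order of sys.path decides which file wins when two directories hold a module of the same name, as suggested earlier in the thread.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical throwaway module, created only for this demonstration.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'jmf_demo_mod.py'), 'w') as f:
    f.write("def hello():\n    return 'fct hello in jmf_demo_mod.py'\n")

# insert(0, ...) puts the directory in front of sys.path, so a stale module
# of the same name further down the list cannot shadow it.
sys.path.insert(0, tmpdir)

# Make sure the import system notices a file created after start-up.
importlib.invalidate_caches()

import jmf_demo_mod

print(jmf_demo_mod.__file__)
print(jmf_demo_mod.hello())
```

The change must happen before the first import of the module; once a name is in sys.modules, later changes to sys.path have no effect on it.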
Re: sys.argv as a list of bytes
> In short: if you need to write system scripts on Unix, and you need them
> to work reliably, you need to stick with Python 2.x.

I think understanding the coding of the characters helps a bit.

I cannot see why the example below could not be done on other systems.

D:\tmp>chcp
Page de codes active : 1252

D:\tmp>c:\python32\python.exe sysarg.py a b é € \u0430 \u03b1 z
arg: 1 unicode name: LATIN SMALL LETTER A
arg: 2 unicode name: LATIN SMALL LETTER B
arg: 3 unicode name: LATIN SMALL LETTER E WITH ACUTE
arg: 4 unicode name: EURO SIGN
arg: 5 unicode name: CYRILLIC SMALL LETTER A
arg: 6 unicode name: GREEK SMALL LETTER ALPHA
arg: 7 unicode name: LATIN SMALL LETTER Z

jmf
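The script sysarg.py itself was not posted. The following is only a guess at what such a script could look like: the function names, and the way a written-out \uXXXX argument is turned into a character, are assumptions of this sketch. It expects every argument to be a single character.

```python
import unicodedata

def to_char(arg):
    """Turn a written-out escape like \\u0430 into the character it names."""
    if len(arg) == 6 and arg.startswith('\\u'):
        return chr(int(arg[2:], 16))
    return arg

def describe_args(args):
    """Build one report line per argument (each must be a single character)."""
    return ['arg: %d unicode name: %s' % (i, unicodedata.name(to_char(arg)))
            for i, arg in enumerate(args, 1)]

# In a real script the list would be sys.argv[1:]; a fixed list is used
# here so that the example runs anywhere.
for line in describe_args(['a', 'é', '€', '\\u0430', '\\u03b1']):
    print(line)
```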
Re: NaN, Null, and Sorting
On 13 Jan, 20:04, Ethan Furman <et...@stoneleaf.us> wrote:
> With NaN, it is possible to get a list that will not properly sort:
>
> --> NaN = float('nan')
> --> spam = [1, 2, NaN, 3, NaN, 4, 5, 7, NaN]
> --> sorted(spam)
> [1, 2, nan, 3, nan, 4, 5, 7, nan]
>
> I'm constructing a Null object with the semantics that if the returned
> object is Null, its actual value is unknown.

Short answer.

- NaN != NA()
- I find the current implementation (Py3.2) quite satisfying. (M. Dickinson's work)

jmf
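If the goal is simply a well-defined order, one common workaround (not something proposed in the thread) is a sort key that keeps NaN out of every comparison. The key function nan_last below is a name made up for this sketch.

```python
import math

nan = float('nan')
spam = [1, 2, nan, 3, nan, 4, 5, 7, nan]

def nan_last(x):
    """Sort key: real numbers first in numeric order, NaNs at the end."""
    is_nan = isinstance(x, float) and math.isnan(x)
    # The second slot is 0 for NaN, so NaN itself is never compared.
    return (is_nan, 0 if is_nan else x)

result = sorted(spam, key=nan_last)
print(result)
print(sorted([5, nan, 1, 7, nan, 2], key=nan_last))
```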
Re: UnicodeEncodeError in compile
On 11 Jan, 01:56, Terry Reedy <tjre...@udel.edu> wrote:
> On 1/10/2012 8:43 AM, jmfauth wrote:
>
>> D:\>c:\python32\python.exe
>> Python 3.2.2 (default, Sep 4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)] on win32
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> '\u5de5'.encode('utf-8')
>> b'\xe5\xb7\xa5'
>> >>> '\u5de5'.encode('mbcs')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character
>>
>> D:\>c:\python27\python.exe
>> Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> u'\u5de5'.encode('utf-8')
>> '\xe5\xb7\xa5'
>> >>> u'\u5de5'.encode('mbcs')
>> '?'
>
> mbcs encodes according to the current codepage. Only the chinese
> codepage(s) can encode the chinese char. So the unicode error is correct
> and 2.7 has a bug in that it is doing errors='replace' when it
> supposedly is doing errors='strict'. The Py3 fix was done in
> http://bugs.python.org/issue850997
> 2.7 was intentionally left alone because of back-compatibility
> considerations. (None of this addresses the OP's question.)

Ok. I was not aware of this.

PS Prev. post got lost.
Re: UnicodeEncodeError in compile
On 11 Jan, 01:56, Terry Reedy <tjre...@udel.edu> wrote:
> On 1/10/2012 8:43 AM, jmfauth wrote:
> ...
>
> mbcs encodes according to the current codepage. Only the chinese
> codepage(s) can encode the chinese char. So the unicode error is correct
> and 2.7 has a bug in that it is doing errors='replace' when it
> supposedly is doing errors='strict'. The Py3 fix was done in
> http://bugs.python.org/issue850997
> 2.7 was intentionally left alone because of back-compatibility
> considerations. (None of this addresses the OP's question.)

win7, cp1252

Ok. I was not aware of this.

>>> '\N{CYRILLIC SMALL LETTER A}'.encode('mbcs')
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character
>>> '\N{GREEK SMALL LETTER ALPHA}'.encode('mbcs')
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character

jmf
Re: UnicodeEncodeError in compile
1) If I copy/paste these CJK chars from Google Groups into two of my interactive interpreters (no dos/cmd console), I have no problem.

>>> import unicodedata as ud
>>> ud.name('工')
'CJK UNIFIED IDEOGRAPH-5DE5'
>>> ud.name('具')
'CJK UNIFIED IDEOGRAPH-5177'
>>> hex(ord(('工')))
'0x5de5'
>>> hex(ord('具'))
'0x5177'

2) It seems the mbcs codec has some difficulties with these chars.

>>> '\u5de5'.encode('mbcs')
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character
>>> '\u5de5'.encode('utf-8')
b'\xe5\xb7\xa5'
>>> '\u5de5'.encode('utf-32-be')
b'\x00\x00]\xe5'

3) On the usage of mbcs in file I/O: a matter for the core devs.

My conclusion: the bottleneck is on the mbcs side.

jmf
Re: UnicodeEncodeError in compile
On 10 Jan, 11:53, 8 Dihedral <dihedral88...@googlemail.com> wrote:
> Terry Reedy wrote on Tuesday, 10 January 2012, 4:08:40 PM UTC+8:
>
>> I get the same error running 3.2.2 under IDLE but not when pasting into
>> Command Prompt. However, Command Prompt may be cheating by replacing the
>> Chinese chars with '??' upon pasting, so that Python never gets them --
>> whereas they appear just fine in IDLE.

Tested with *my* Windows GUI interactive interpreters. It seems to me there is a problem with the mbcs codec.

>>> hex(ord('工'))
'0x5de5'
>>> '\u5de5'
'工'
>>> '\u5de5'.encode('mbcs')
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character
>>> '\u5de5'.encode('utf-8')
b'\xe5\xb7\xa5'
>>> '\u5de5'.encode('utf-32-be')
b'\x00\x00]\xe5'
>>> sys.version
'3.2.2 (default, Sep 4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)]'
>>> '\u5de5'.encode('mbcs', 'replace')
b'?'

--

>>> u'\u5de5'.encode('mbcs', 'replace')
'?'
>>> repr(u'\u5de5'.encode('utf-8'))
'\\xe5\\xb7\\xa5'
>>> repr(u'\u5de5'.encode('utf-32-be'))
'\\x00\\x00]\\xe5'
>>> sys.version
'2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'

jmf
Re: UnicodeEncodeError in compile
On 10 Jan, 13:28, jmfauth <wxjmfa...@gmail.com> wrote:

Addendum, Python console (dos box)

D:\>c:\python32\python.exe
Python 3.2.2 (default, Sep 4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> '\u5de5'.encode('utf-8')
b'\xe5\xb7\xa5'
>>> '\u5de5'.encode('mbcs')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character
>>> ^Z

D:\>c:\python27\python.exe
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> u'\u5de5'.encode('utf-8')
'\xe5\xb7\xa5'
>>> u'\u5de5'.encode('mbcs')
'?'
>>> ^Z

D:\>

jmf
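The difference between the two interpreters comes down to the error handler used when a character has no place in the target code page. Since mbcs only exists on Windows, the sketch below uses cp1252 as a stand-in (an assumption of this example; on a Western European Windows, mbcs usually resolves to that code page) and walks through the standard handlers.

```python
ch = '\u5de5'   # CJK UNIFIED IDEOGRAPH-5DE5

# 'strict' is the default and raises; the others substitute something.
for errors in ('strict', 'replace', 'backslashreplace', 'xmlcharrefreplace'):
    try:
        print('%-18s %r' % (errors, ch.encode('cp1252', errors)))
    except UnicodeEncodeError as exc:
        print('%-18s UnicodeEncodeError: %s' % (errors, exc.reason))

# A Unicode transformation format can always encode the code point.
print(ch.encode('utf-8'))
```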
Re: How to support a non-standard encoding?
On 6 Jan, 11:03, Ivan <i...@llaisdy.com> wrote:
> Dear All
>
> I'm developing a python application for which I need to support a
> non-standard character encoding (specifically ISO 6937/2-1983, Addendum
> 1-1989). Here are some of the properties of the encoding and its use in
> the application:
>
> - I need to read and write data to/from files. The file format
>   includes two sections in different character encodings (so I
>   shan't be able to use codecs.open()).
>
> - iso-6937 sections include non-printing control characters
>
> - iso-6937 is a variable width encoding, e.g. A = [41],
>   Ä = [0xC8, 0x41]; all non-spacing diacritical marks are in the
>   range 0xC0-0xCF.
>
> By any chance is there anyone out there working on iso-6937?
>
> Otherwise, I think I need to write a new codec to support reading and
> writing this data. Does anyone know of any tutorials or blog posts on
> implementing a codec for a non-standard character encoding? Would
> anyone be interested in reading one?

Take a look at the files, the Python modules, in ...\Lib\encodings. This is the place where all codecs are centralized. Python magically uses them as long as they are present in that dir.

I remember that, a long time ago, I created such a codec quite easily, just for fun. I picked one of the files as a template and modified its table. It was a byte <-> byte table.

For a multibyte coding scheme, it may be a little bit more complicated; you may take a look, e.g., at the mbcs.py codec.

The distribution of such a codec may be a problem.

Another simple approach, OS independent: you probably do not write your code in iso-6937, you only need to encode/decode some byte sequences on the fly. In that case, work with bytes and create a couple of coding / decoding functions with a hand-made dict [*] as helper. It's not so complicated. Use unicode (Py2) or str (Py3, the recommended way ;-) ) as the pivot encoding.

[*] I also once created such a dict from
# http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt

I never checked whether it corresponds to the official cp1252 codec.

jmf
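The "bytes plus a helper dict" approach could look like the sketch below. It is deliberately incomplete and partly guessed: only the 0xC8 entry (diaeresis) is confirmed by the example in the question; the 0xC1 (grave) and 0xC2 (acute) entries are quoted from memory of the ISO 6937 layout and must be checked against the standard. The function name is invented, and bytes outside the ASCII range and the 0xC0-0xCF range are simply rejected.

```python
import unicodedata

# Non-spacing diacritic byte -> Unicode combining character.
# ASSUMPTION: only 0xC8 comes from the thread; verify the others.
COMBINING = {
    0xC1: '\u0300',   # grave (unverified)
    0xC2: '\u0301',   # acute (unverified)
    0xC8: '\u0308',   # diaeresis, as in [0xC8, 0x41] -> Ä
}

def decode_iso6937_sketch(data):
    """Decode a tiny subset of ISO 6937 bytes into a str (sketch only)."""
    out = []
    pending = None                      # diacritic waiting for its base letter
    for byte in data:
        if 0xC0 <= byte <= 0xCF:
            if byte not in COMBINING:
                raise ValueError('diacritic 0x%02X not in the sketch table' % byte)
            pending = COMBINING[byte]
        elif byte < 0x80:
            out.append(chr(byte))
            if pending is not None:     # Unicode wants the mark after the base
                out.append(pending)
                pending = None
        else:
            raise ValueError('byte 0x%02X not handled by this sketch' % byte)
    if pending is not None:
        raise ValueError('diacritic without a base character')
    # Compose base + mark into a single code point where one exists.
    return unicodedata.normalize('NFC', ''.join(out))

print(decode_iso6937_sketch(b'\xc8A'))
```

The main design point is the order swap: ISO 6937 puts the diacritic before the base letter, Unicode puts the combining mark after it.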
Re: Python 2 or 3
On 3 Dec, 04:54, Antti J Ylikoski <antti.yliko...@tkk.fi> wrote:
> Helsinki, Finland, the EU

>>> sys.version
'2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
>>> 'éléphant'
'\xe9l\xe9phant'

>>> sys.version
'3.2.2 (default, Sep 4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)]'
>>> 'éléphant'
'éléphant'

jmf
Re: encoding problem with BeautifulSoup - problem when writing parsed text to file
On 6 Oct, 06:39, Greg <gregor.hochsch...@googlemail.com> wrote:
> Brilliant! It worked. Thanks!
>
> Here is the final code for those who are struggling with similar
> problems:
>
> ## open and decode file
> # In this case, the encoding comes from the charset argument in a meta tag
> # e.g. <meta charset="iso-8859-2">
> fileObj = open(filePath,"r").read()
> fileContent = fileObj.decode("iso-8859-2")
> fileSoup = BeautifulSoup(fileContent)
>
> ## Do some BeautifulSoup magic and preserve unicode, presume result is saved in 'text' ##
>
> ## write extracted text to file
> f = open(outFilePath, 'w')
> f.write(text.encode('utf-8'))
> f.close()

or (Python2/Python3)

>>> import io
>>> with io.open('abc.txt', 'r', encoding='iso-8859-2') as f:
...     r = f.read()
...
>>> repr(r)
u'a\nb\nc\n'
>>> with io.open('def.txt', 'w', encoding='utf-8-sig') as f:
...     t = f.write(r)
...
>>> f.closed
True

jmf
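Put together as a function, the io.open() variant could look like this. A sketch only: the function name reencode, the file names, and the sample word are invented, and a temporary directory is used so that the example does not touch real files.

```python
import io
import os
import tempfile

def reencode(src_path, dst_path, src_encoding, dst_encoding='utf-8'):
    """Read text in one encoding and write it back in another."""
    with io.open(src_path, 'r', encoding=src_encoding) as src:
        text = src.read()               # decoded once, a str from here on
    with io.open(dst_path, 'w', encoding=dst_encoding) as dst:
        dst.write(text)                 # encoded once, on the way out
    return text

# Demonstration with a throwaway iso-8859-2 file.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'abc.txt')
dst_path = os.path.join(workdir, 'def.txt')
with open(src_path, 'wb') as f:
    f.write('łódź'.encode('iso-8859-2'))

text = reencode(src_path, dst_path, 'iso-8859-2')
with open(dst_path, 'rb') as f:
    converted = f.read()
print(repr(text), converted)
```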
Re: How do I automate the removal of all non-ascii characters from my code?
On 12 Sep, 23:39, "Rhodri James" <rho...@wildebst.demon.co.uk> wrote:
> Now read what Steven wrote again. The issue is that the program contains
> characters that are syntactically illegal. The "engine" can be perfectly
> correctly translating a character as a smart quote or a non breaking space
> or an e-umlaut or whatever, but that doesn't make the character legal!

Yes, you are right. I did not understand it that way.

However, a small correction/clarification. Illegal characters do not exist. One can only have an ill-formed encoded code point, or an illegal encoded code point representing a character/glyph.

Basically, in the present case, the issue is most probably a mismatch between the coding directive and the real coding, with "no coding directive" == 'ascii'.

jmf
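Before removing anything automatically, it helps to see which non-ascii characters are in the source and where. A sketch that works on already decoded text; the function name find_non_ascii and the sample source are invented for the illustration.

```python
import unicodedata

def find_non_ascii(source):
    """Return (line, column, character, unicode name) for each non-ascii char."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), 1):
        for col, ch in enumerate(line, 1):
            if ord(ch) > 127:
                hits.append((lineno, col, ch, unicodedata.name(ch, 'UNNAMED')))
    return hits

# Sample source with smart quotes and a no-break space, the usual suspects
# after a copy/paste from a web page or a word processor.
sample = 'x = 1\nprint(\u201chello\u201d)\ny\u00a0= 2\n'
for hit in find_non_ascii(sample):
    print('line %d, column %d: %r %s' % hit)
```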
Re: How do I automate the removal of all non-ascii characters from my code?
On 13 Sep, 10:15, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:

The intrinsic coding of the characters is one thing; the use of a byte stream supposed to represent text is another.

jmf