Re: Unicode issue with Python v3.3
Hello Cameron, did you receive the mail I sent yesterday? -- http://mail.python.org/mailman/listinfo/python-list
Re: Unicode issue with Python v3.3
On Sunday, 14 April 2013 12:28:32 PM UTC+3, Cameron Simpson wrote:
> [...]

Cameron, can you help please, or tell me what else I need to try?
Re: Unicode issue with Python v3.3
On Wed, Apr 17, 2013 at 4:56 PM, nagia.rets...@gmail.com wrote:
> can you help please or tell me what else i need to try?

You need to try trimming quoted text in replies, not double-spacing, and paying for help.

ChrisA
Re: Unicode issue with Python v3.3
On 14Apr2013 04:22, nagia.rets...@gmail.com <nagia.rets...@gmail.com> wrote:
| | Cameron would it be too much to ask to provide you with root
| | access to my VPS server so you can have a look there too?
| | i can pay you if you like if you wait a few days to gather some money.
|
| I really do not recommend that:
|   - it is nuts to blithely allow a stranger root access to your system
|   - you won't learn anything about CGI scripts
[...]
| I insist that you will make the most of this if you access the VPS yourself.
| it runs CentOS 6.4
| Please accept, i trust you.

Very well. Let's take this off list to personal email (note that the reply-to on this message is just myself, not the list/group). We can return here after sorting CGI issues, should there be any further python specific issues.

Reply to this message. I will email you my ssh public key. Please make me an _ordinary_ user account called cameron and send me the ssh details of your VPS.

-- 
Cameron Simpson <c...@zip.com.au>

TeX: When you pronounce it correctly to your computer, the terminal may become slightly moist.
        - D. E. Knuth.
Re: Unicode issue with Python v3.3
On Thursday, 18 April 2013 2:00:48 AM UTC+3, Cameron Simpson wrote:
> Reply to this message. I will email you my ssh public key. Please make me an _ordinary_ user account called cameron and send me the ssh details of your VPS.

Thank you very much, Cameron. I appreciate all your help, and I am willing to open a free lifetime premium account for you at my web hosting service as a token of appreciation. I have just mailed you the login credentials.
Re: Unicode issue with Python v3.3
Hello, can you still help me please? -- http://mail.python.org/mailman/listinfo/python-list
Re: Unicode issue with Python v3.3
On Wednesday, 10 April 2013 12:10:13 AM UTC+3, Νίκος Γκρ33κ wrote:
> Hello, I am still trying to convert the code from Python 2.6 to 3.3. Everything is set up except for the Unicode error that you can see if you go to http://superhost.gr
> Can anyone help with this? I even tried to replace print() with sys.stdout.buffer(), but I still get the same Unicode issue. I don't know what to try anymore.

    root@nikos [/home/nikos/public_html/foo-py]# pwd
    /home/nikos/public_html/foo-py
    root@nikos [/home/nikos/public_html/foo-py]# cat foo.py
    #!/bin/sh
    exec 2>>/home/nikos/cgi.err.out
    echo "$0 $*" >&2
    id >&2
    env | sort >&2
    set -x
    exec /full/path/to/foo-py ${1+"$@"}

    root@nikos [/home/nikos/public_html/foo-py]# python3 foo.py
      File "foo.py", line 2
        exec 2>>/home/nikos/cgi.err.out
             ^
    SyntaxError: invalid syntax
    root@nikos [/home/nikos/public_html/foo-py]#

As for the tail -f of the error_log:

    root@nikos [/home/nikos/public_html]# touch /var/log/httpd/error_log
    root@nikos [/home/nikos/public_html]# tail -f /var/log/httpd/error_log

It stays empty even when, at the exact same time, I run 'python3 metrites.py' from another interactive prompt, when it is supposed to give a live feed of the error messages.

Cameron, would it be too much to ask to provide you with root access to my VPS server so you can have a look there too? I can pay you if you like, if you wait a few days for me to gather some money.
Re: Unicode issue with Python v3.3
On 13Apr2013 23:00, nagia.rets...@gmail.com <nagia.rets...@gmail.com> wrote:
| root@nikos [/home/nikos/public_html/foo-py]# pwd
| /home/nikos/public_html/foo-py
| root@nikos [/home/nikos/public_html/foo-py]# cat foo.py
| #!/bin/sh
| exec 2>>/home/nikos/cgi.err.out
| echo "$0 $*" >&2
| id >&2
| env | sort >&2
| set -x
| exec /full/path/to/foo-py ${1+"$@"}
|
| root@nikos [/home/nikos/public_html/foo-py]# python3 foo.py
|   File "foo.py", line 2
|     exec 2>>/home/nikos/cgi.err.out
|          ^
| SyntaxError: invalid syntax

That is because foo.py isn't a python script anymore, it is a shell script. Its purpose is to divert stderr to a file and to recite various things about the environment to that file in addition to any error messages.

Just run it directly:

    ./foo.py

The #! line should cause it to be run by the shell.

I also recommend you try to do all this as your normal user account. Root is for administration, such as stopping/starting apache and so on. It is not for test running scripts from the command line; consider: if the script has bugs, as root it can do an awful lot of damage.

| root@nikos [/home/nikos/public_html/foo-py]#
| As far as the tail -f of the error_log:
| root@nikos [/home/nikos/public_html]# touch /var/log/httpd/error_log

That won't do you much good; apache has not opened it, and so it will not be writing to it. It was writing to a file of that name, but you removed that file. Apache probably still has its hooks in the old file (which now has no name).

Restarting apache should open (or create if missing) this file for you.

| root@nikos [/home/nikos/public_html]# tail -f /var/log/httpd/error_log
| and its empty even when at the exact same time i run 'python3
| metrites.py' from another interactive prompt when it supposed to
| give live feed of the error messages.

No, _apache_ writes to that file. So only when you visit the web page will stuff appear there.

If you just run things from the command line, error messages will appear on your terminal. Or, after this line of the wrapper script:

    exec 2>>/home/nikos/cgi.err.out

the error messages will appear in cgi.err.out.

| Cameron would it be too much to ask to provide you with root
| access to my VPS server so you can have a look there too?
| i can pay you if you like if you wait a few days to gather some money.

I really do not recommend that:
  - it is nuts to blithely allow a stranger root access to your system
  - you won't learn anything about CGI scripts

What you need for further debugging of your python issues is access to the error messages from the CGI script. That is the purpose of the wrapper script. Get the wrapper running on the command line and then test it via the browser.

Cheers,
-- 
Cameron Simpson <c...@zip.com.au>

Lord grant me the serenity to accept the things I can not change, the courage to change the things that I can, and the wisdom to hide the bodies of those people I had to kill because they pissed me off.
        - Jeffrey Papen <jpa...@asucla.ucla.edu>
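The point above about the removed error_log can be tried out without touching apache at all. What follows is only a minimal sketch, assuming a POSIX filesystem (where an open file can be removed) and using a temporary directory rather than /var/log/httpd; the function name demo_unlinked_log is made up for illustration. It shows that a process which opened a log keeps writing to the old, now nameless, file after the name is removed and recreated, and that only reopening the file (which is what a restart does) attaches it to the new one.

```python
import os
import tempfile


def demo_unlinked_log():
    """Show that a writer keeps using a removed file, not a new one of the same name."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "error_log")

    # The long-running "server" opens its log once, at startup.
    server_log = open(path, "a")
    server_log.write("before removal\n")
    server_log.flush()

    # Someone removes the log, then recreates it the way touch would.
    os.remove(path)
    open(path, "a").close()

    # The server still holds the old file, which no longer has a name.
    server_log.write("after removal\n")
    server_log.flush()
    size_of_new_file = os.path.getsize(path)

    # Reopening by name (what a restart does) attaches to the new file.
    server_log.close()
    server_log = open(path, "a")
    server_log.write("after restart\n")
    server_log.close()
    with open(path) as f:
        contents = f.read()

    # Clean up the temporary file and directory.
    os.remove(path)
    os.rmdir(workdir)
    return size_of_new_file, contents


if __name__ == "__main__":
    print(demo_unlinked_log())
```

The first returned value is the size of the recreated file while the "server" is still writing to the old one; the second is what the recreated file holds after the reopen.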
Re: Unicode issue with Python v3.3
On Sunday, 14 April 2013 12:28:32 PM UTC+3, Cameron Simpson wrote:
> [...]

Well, I trust you, because you and Lele are the only ones helping me here.

I tried what you said:

    root@nikos [/home/nikos/public_html/cgi-bin]# service httpd restart
    root@nikos [/home/nikos/public_html/cgi-bin]# python3 metrites.py
    root@nikos [/home/nikos/public_html]# cd foo-py/
    root@nikos [/home/nikos/public_html/foo-py]# ls
    ./  ../  foo.py*
    root@nikos [/home/nikos/public_html/foo-py]# ./foo.py
    root@nikos [/home/nikos/public_html/foo-py]# cd ..
    root@nikos [/home/nikos/public_html]# cat cgi.err.out
    root@nikos [/home/nikos/public_html/cgi-bin]# cat /var/log/httpd/error_log
    root@nikos [/home/nikos/public_html/cgi-bin]#

I have also run the script from the browser, but I still see nothing.

I insist that you will make the most of this if you access the VPS yourself. It runs CentOS 6.4. Please accept; I trust you.
Re: Unicode issue with Python v3.3
On 12Apr2013 21:50, nagia.rets...@gmail.com <nagia.rets...@gmail.com> wrote:
| Ookey after that is corrected, i then tried the plain solution and i got this response back from the shell:
|
| Traceback (most recent call last):
|   File "metrites.py", line 213, in <module>
|     htmldata = f.read()
|   File "/root/.local/lib/python2.7/lib/python3.3/encodings/iso8859_7.py", line 23, in decode
|     return codecs.charmap_decode(input,self.errors,decoding_table)[0]
| UnicodeDecodeError: 'charmap' codec can't decode byte 0xae in position 47: character maps to <undefined>
|
| then i switched to:
|
| with open('/home/nikos/www/' + page, encoding='utf-8') as f:
|     htmldata = f.read()
|
| and i got no error at all, just pure run *from the shell*!

Ok, so you need to specify utf-8 to decode the file. Good.

| But i get internal server error when i try to run the webpage from the browser(Chrome).

That is standard for a CGI script that does not complete successfully.

| So, can you tell me please where can i find the apache error log so to display here please?

That depends on the install. Have a look in /var/log/apache or similar. Otherwise you need to find the httpd.conf for the apache and look for its log configuration settings.

| Apache error_log is always better than running 'python3 metrites.py' because even if the python script has no error apache will also display more web related things?

The error log is where error messages from CGI scripts go. And other error messages. It is very useful when testing CGI scripts.

Of course, it's best to work out as much as possible from the command line first; you have much more direct control and access to errors there. That only gets you so far though; the environment the CGI script runs in is not the same as your command line, and some different behaviour can come from this.

BTW, are you sure python3 is running your CGI script?

Also, the CGI script may not be running as you, but as the apache user. In that case, it may fail if it does not have permission to access various files owned by you.

Anyway, you need to see the error messages to work this out.

If you can't find the error log you can divert stderr at the start of your python program:

    sys.stderr = open('/home/nikos/cgi.err.out', 'a')

and watch that in a shell:

    tail -f cgi.err.out

Cheers,
-- 
Cameron Simpson <c...@zip.com.au>

If you 'aint falling off, you ar'nt going hard enough.
        - Fred Gassit
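The decoding failure quoted at the top of this message can be reproduced in isolation. This is only a small sketch with a made-up sample string (SAMPLE is not text from metrites.py or index.html), and the helper names try_decode and read_html are hypothetical. It encodes Greek text as UTF-8, shows that decoding those bytes as ISO-8859-7 fails because the UTF-8 form of 'ή' contains the byte 0xae, and shows that naming the encoding in open() reads the same bytes back correctly.

```python
import os
import tempfile

# Hypothetical sample text; the UTF-8 encoding of "ή" is the byte pair CE AE.
SAMPLE = "Ελληνικά ή όχι"


def try_decode(raw, encoding):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


def read_html(path):
    """Read a file as text, naming the encoding explicitly as in the thread."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def demo():
    raw = SAMPLE.encode("utf-8")
    wrong = try_decode(raw, "iso8859_7")  # the wrong codec for these bytes
    right = try_decode(raw, "utf-8")      # the codec the bytes were written in

    # Write the bytes to a temporary file and read them back as text.
    fd, path = tempfile.mkstemp(suffix=".html")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        from_file = read_html(path)
    finally:
        os.remove(path)
    return wrong, right, from_file


if __name__ == "__main__":
    wrong, right, from_file = demo()
    # Print only ASCII so the demo does not depend on the terminal encoding.
    print(type(wrong).__name__)
    print(right == SAMPLE, from_file == SAMPLE)
```

Passing encoding= explicitly means the result no longer depends on whatever default encoding the process happens to start with, which may differ between a login shell and the CGI environment.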
Re: Unicode issue with Python v3.3
On Saturday, 13 April 2013 1:28:07 PM UTC+3, Cameron Simpson wrote:
> [...]

    root@macgyver [/home/nikos/public_html/cgi-bin]# ls ../cgi.err.out
    ../cgi.err.out
    root@macgyver [/home/nikos/public_html/cgi-bin]# cat ../cgi.err.out
    root@macgyver [/home/nikos/public_html/cgi-bin]#

Also, I found the error log and tried to view it, but it was empty. I then removed it and ran the script from both the shell and the browser, but it did not reappear.

    root@macgyver [/home/nikos/public_html/cgi-bin]# cat /var/log/httpd/error_log
    cat: /var/log/httpd/error_log: No such file or directory
    root@macgyver [/home/nikos/public_html/cgi-bin]#

Maybe something is wrong with my environment? Should we check the Apache and CGI environment somehow, and also make sure, as you say, that *I* run the CGI scripts and not the user 'Apache'?

Please tell me what commands I should issue and I will show you the output.

Thank you, Cameron, for helping me. Somehow the script doesn't seem to be the issue on my VPS.
Re: Unicode issue with Python v3.3
On Sun, Apr 14, 2013 at 12:16 AM, nagia.rets...@gmail.com wrote:
> Also i have found the error log and i tried to view it but it was empty and then i removed it and then ran the script both from shell and browser but it didn't reappear.
>
> root@macgyver [/home/nikos/public_html/cgi-bin]# cat /var/log/httpd/error_log
> cat: /var/log/httpd/error_log: No such file or directory
> root@macgyver [/home/nikos/public_html/cgi-bin]#

https://www.google.com/search?q=log+file+rotation

ChrisA
Re: Unicode issue with Python v3.3
On 13Apr2013 07:16, nagia.rets...@gmail.com <nagia.rets...@gmail.com> wrote:
| root@macgyver [/home/nikos/public_html/cgi-bin]# ls ../cgi.err.out
| ../cgi.err.out

I prefer ls -ld myself.

| root@macgyver [/home/nikos/public_html/cgi-bin]# cat ../cgi.err.out
|
| Also i have found the error log and i tried to view it but it was
| empty and then i removed it and then ran the script both from shell
| and browser but it didn't reappear.

Never remove it. It is only created by the web server at startup or log rotation time. So now you need to restart the apache to get it back.

Just open a spare terminal and run:

    tail -f /var/log/httpd/error_log

| Should we check the Apache and CGI environment somehow and also
| make sure as you say that *I* run the CGI scripts and not user
| 'Apache' ?

Well, it is helpful to know. If the CGI script tries to write any data to files, and it runs as a different user, it will need different permissions on the files.

| Tell me what commands i should issue please and i will display the output to you.

I would be tempted to wrap the CGI script in a shell script. Suppose your script is named foo.py. You can move the script to foo-py and make a shell script called foo.py looking like this:

    #!/bin/sh
    exec 2>>/home/nikos/cgi.err.out
    echo "$0 $*" >&2
    id >&2
    env | sort >&2
    set -x
    exec /full/path/to/foo-py ${1+"$@"}

and make sure it, like the original, is readable and executable:

    chmod a+rx foo.py foo-py

Make sure cgi.err.out is publicly writable (in case the apache is not running the CGIs as you):

    chmod a+w cgi.err.out

Then:

    tail -f cgi.err.out

in a spare window. Then try the script.

It should transcribe information about the script's user and environment and also catch errors. This should help in debugging.

Cheers,
-- 
I die. I have a terrible fever in my head and it gets hotter and hotter until my head is a fire, a forge, a star. I set the world on fire and all die. O the embarrassment.
        - Joe Haldeman, _A !Tangled Web_
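The same diagnostics as the wrapper above can also be gathered from inside Python, for cases where wrapping the CGI script in a shell script is not convenient. This is a rough sketch only, not a replacement for the shell wrapper: the function names divert_stderr and write_diagnostics are made up, os.getuid() and friends assume a POSIX system, and reassigning sys.stderr only catches messages written through Python's sys.stderr, not output written directly to file descriptor 2 by other programs.

```python
import os
import sys


def divert_stderr(path):
    """Send later Python error messages to a log file, appending to it."""
    log = open(path, "a", encoding="utf-8")
    sys.stderr = log
    return log


def write_diagnostics(stream, argv=None, environ=None):
    """Record who is running the script and with what environment."""
    argv = sys.argv if argv is None else argv
    environ = os.environ if environ is None else environ

    # Counterpart of: echo "$0 $*" >&2
    stream.write(" ".join(argv) + "\n")

    # Rough counterpart of: id >&2 (numeric ids only)
    stream.write("uid=%d euid=%d gid=%d\n"
                 % (os.getuid(), os.geteuid(), os.getgid()))

    # Counterpart of: env | sort >&2
    for name in sorted(environ):
        stream.write("%s=%s\n" % (name, environ[name]))
    stream.flush()


if __name__ == "__main__":
    # At the top of a CGI script one might instead write, with a path
    # that the web server's user is allowed to append to:
    #     log = divert_stderr('/home/nikos/cgi.err.out')
    #     write_diagnostics(log)
    write_diagnostics(sys.stderr)
```

The uid line answers the question of whether the CGI runs as you or as the apache user, and the sorted environment makes it easy to compare a run from the browser with a run from the command line.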
Re: Unicode issue with Python v3.3
Someone HEELP ME!! -- http://mail.python.org/mailman/listinfo/python-list
Re: Unicode issue with Python v3.3
On Fri, Apr 12, 2013 at 10:50 PM, nagia.rets...@gmail.com wrote: Someone HEELP ME!! http://youtu.be/VxMYwjp8t0o ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: Unicode issue with Python v3.3
On Friday, 12 April 2013 4:14:39 PM UTC+3, Chris Angelico wrote:
> On Fri, Apr 12, 2013 at 10:50 PM, nagia.rets...@gmail.com wrote:
> > Someone HEELP ME!!
>
> http://youtu.be/VxMYwjp8t0o
>
> ChrisA

Well, instead of being a smartass it would be nice if you could actually help for once.
Re: Unicode issue with Python v3.3
On Fri, Apr 12, 2013 at 11:18 PM, nagia.rets...@gmail.com wrote:
> On Friday, 12 April 2013 4:14:39 PM UTC+3, Chris Angelico wrote:
> > On Fri, Apr 12, 2013 at 10:50 PM, nagia.rets...@gmail.com wrote:
> > > Someone HEELP ME!!
> >
> > http://youtu.be/VxMYwjp8t0o
> >
> > ChrisA
>
> Well, instead of being a smartass it would be nice if you could actually help for once.

Yeah, I'm done with that. Your whining ran through my patience a few posts ago. But you should feel special; I clipped that just for you.

ChrisA
Re: Unicode issue with Python v3.3
On Apr 12, 6:18 pm, nagia.rets...@gmail.com wrote:
> On Friday, 12 April 2013 4:14:39 PM UTC+3, Chris Angelico wrote:
> > On Fri, Apr 12, 2013 at 10:50 PM, nagia.rets...@gmail.com wrote:
> > > Someone HEELP ME!!
> >
> > http://youtu.be/VxMYwjp8t0o
> >
> > ChrisA
>
> Well, instead of being a smartass it would be nice if you could actually help for once.

Interesting! Among the things which you don't seem to know is the meaning of the word 'once'.
Re: Unicode issue with Python v3.3
On Friday, 12 April 2013 4:29:51 PM UTC+3, rusi wrote:
> On Apr 12, 6:18 pm, nagia.rets...@gmail.com wrote:
> > Well, instead of being a smartass it would be nice if you could actually help for once.
>
> Interesting! Among the things which you don't seem to know is the meaning of the word 'once'.

Same applies for you too. Stop being smartasses.
Re: Unicode issue with Python v3.3
On Fri, Apr 12, 2013 at 8:36 AM, nagia.rets...@gmail.com wrote:
> On Friday, 12 April 2013 4:29:51 PM UTC+3, rusi wrote:
> > On Apr 12, 6:18 pm, nagia.rets...@gmail.com wrote:
> > > Well, instead of being a smartass it would be nice if you could actually help for once.
> >
> > Interesting! Among the things which you don't seem to know is the meaning of the word 'once'.
>
> Same applies for you too. Stop being smartasses.

Please keep in mind that this is a community of volunteers. Nobody here is being paid for their time to help you fix your website, and if you manage to irritate us in the process, we're likely to just walk away from it.

I looked over the code that you have provided us with, and based on that I could not see any reason why the html would be in the form of bytes instead of a str. Since nobody else here seems to have any further insight into the problem either, you're just going to have to find a way to debug the code. If you cannot do that on your own, then I suggest that you find a contractor who can, hire them, and grant them the access they need to do a real debugging session.

I would also recommend that in the future you should stop deploying untested code to your production website. Set up a development environment for yourself, make the changes there, and only deploy when you know that everything is working.
Re: Unicode issue with Python v3.3
In article <mailman.533.1365792239.3114.python-l...@python.org>, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> I would also recommend that in the future you should stop deploying untested code to your production website. Set up a development environment for yourself, make the changes there, and only deploy when you know that everything is working.

But that takes all the fun out of it :-)
Re: Unicode issue with Python v3.3
On Friday, 12 April 2013 9:37:29 PM UTC+3, Ian wrote:
> [...]

I agree with what you say, except for the claim that I try to irritate people. Look at the thread and you will see who irritated whom first.
Re: Unicode issue with Python v3.3
On 11Apr2013 09:55, Nikos <nagia.rets...@gmail.com> wrote:
| On Thursday, 11 April 2013 1:45:22 PM UTC+3, Cameron Simpson wrote:
| On 10Apr2013 21:50, nagia.rets...@gmail.com <nagia.rets...@gmail.com> wrote:
| | the doctype is coming from the attempt of script metrites.py to open and read the 'index.html' file.
| | But i don't know how to try to open it as a byte file instead of a text file.

Lele Gaifax showed one way:

    from codecs import open
    with open('index.html', encoding='utf-8') as f:
        content = f.read()

But a plain open() should also do:

    with open('index.html') as f:
        content = f.read()

if you're not taking tight control of the file encoding.

The point here is to get _text_ (i.e. str) data from the file, not bytes.

If the text turns out to be incorrectly decoded (i.e. incorrectly reading the file bytes and assembling them into text strings) because the default encoding is wrong, then you may need to reach for Lele's more verbose open() example to select the correct encoding.

But first ignore that and get text (str) instead of bytes. If you're already getting text from the file, something later is making bytes and handing it to print().

Another approach to try is to use sys.stdout.write() instead of print().

The print() function will take _anything_ and write text of some form. The write() function will throw an exception if it gets the wrong type of data.

If sys.stdout is opened in binary mode then write() will require bytes as data; strings will need to be explicitly turned into bytes via .encode() in order to not raise an exception.

If sys.stdout is open in text mode, write() will require str data. The sys.stdout file itself will transcribe to bytes for you.

If you take that route, at least you will not have confusion about str versus bytes.

For an HTML output page I would advocate arranging that sys.stdout is in text mode; that way you can do the natural thing and .write() str data and lovely UTF-8 bytes will come out the other end.
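The str versus bytes behaviour of write() described above can be checked without a web server. The following is a minimal sketch using in-memory streams from the io module in place of sys.stdout; the helper names utf8_text_stream and accepts are invented for this illustration. It shows that a text stream accepts str and rejects bytes, that a binary stream does the opposite, that str written to a UTF-8 text stream comes out as UTF-8 bytes, and what the text form of a bytes object looks like when something hands bytes to print().

```python
import io


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that str written to it comes out as UTF-8 bytes."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")


def accepts(stream, data):
    """Report whether stream.write() takes this type of data."""
    try:
        stream.write(data)
        return True
    except TypeError:
        return False


def demo():
    text = "υλικό"
    raw = io.BytesIO()            # stands in for the binary sys.stdout.buffer
    out = utf8_text_stream(raw)   # stands in for a text mode sys.stdout

    results = {
        "text stream, str": accepts(out, text),
        "text stream, bytes": accepts(out, text.encode("utf-8")),
        "binary stream, str": accepts(io.BytesIO(), text),
        "binary stream, bytes": accepts(io.BytesIO(), text.encode("utf-8")),
    }
    out.flush()
    written = raw.getvalue()

    # print() of a bytes object emits this text form, b'...', rather than Greek.
    shown = repr(text.encode("utf-8"))
    return results, written, shown


if __name__ == "__main__":
    results, written, shown = demo()
    for case in sorted(results):
        print(case, results[case])
    print(shown)
```

In a real CGI script, one way to arrange a UTF-8 text mode stdout along these lines would be sys.stdout = utf8_text_stream(sys.stdout.buffer) near the top of the program; treat that as an assumption to test in your own setup rather than a guaranteed fix.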
If the above test (using .write() instead of print()) shows it to be in binary mode we can fix that. But you need to find out.

You will want access to the error messages from the CGI environment; do you have access to the web server's error_log? You can tail that in a terminal while you reload the page to see what's going on.

| This works in the shell, but doesn't work on my website:
|
| $ cat utf8.txt
| υλικό!Πρόκειται γ

Ok, so your terminal is using UTF-8 as its output encoding. (And so is your mail posting program, since we see it unmangled on my screen here.)

| $ python3
| Python 3.2.3 (default, Oct 19 2012, 20:10:41)
| [GCC 4.6.3] on linux2
| Type "help", "copyright", "credits" or "license" for more information.
| >>> data = open('utf8.txt').read()
| >>> print(data)
| υλικό!Πρόκειται γ

Likewise. However, in an exciting twist, I seem to recall that Python invoked interactively with a terminal as output will have the default terminal encoding in place on sys.stdout. Producing what you expect. _However_, for python invoked in a batch environment where stdout is not a terminal (such as in the CGI environment producing your web page), that is _not_ necessarily the case.

| >>> print(data.encode('utf-8'))
| b'\xcf\x85\xce\xbb\xce\xb9\xce\xba\xcf\x8c!\xce\xa0\xcf\x81\xcf\x8c\xce\xba\xce\xb5\xce\xb9\xcf\x84\xce\xb1\xce\xb9 \xce\xb3\n'
|
| See, the last line is what i'am getting on my website.

The above line takes your Unicode text in data and transcribes it to bytes using UTF-8 as the encoding. And print() is then receiving that bytes object and printing its str() representation as b'...'. That str is itself unicode, and when print passes it to sys.stdout, _that_ transcribes the unicode b'...' string as bytes to your terminal.
Using UTF-8 based on the previous examples above, but since all those characters are in the bottom 127 code range the byte sequence will be the same if it uses ASCII or ISO8859-1 or almost anything else:-) As you can see, there's a lot of encoding/decoding going on behind the scenes even in this superficially simple example. | If i remove | the encode('utf-8') part in metrites.py, the webpage will not show | anything at all... Ah, but data will be being output. The print() function _will_ be writing data out in some form. I suggest you remove the .encode() and then examine the _source_ text of the web page, not its visible form. So: remove .encode(), reload the web page, view page source (depends on your browser, it is ctrl-U in Firefox ((Cmd-U in firefox on a Mac))). I think a lot of the issue you have in this thread is that your page is too complex. Make another page to do the same thing, and start with nothing. Add stuff to it a single item at a time until the page behaves incorrectly. Then you will know the exact item of code that introduced the issue. And then that single item can be examined in detail for the decode/encode issues. The other issue in the thread is that people losing
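The text-mode versus binary-mode distinction described above can be sketched on in-memory streams. This is only an illustration, not code from metrites.py: the helper names `describe_stream` and `utf8_text_stream` are made up for the example, and an `io.BytesIO` object stands in for the real output so nothing is sent to a web server.

```python
import io


def describe_stream(stream):
    """Return 'text' if the stream expects str, 'binary' otherwise."""
    return "text" if isinstance(stream, io.TextIOBase) else "binary"


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that str written to it comes out as UTF-8 bytes."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8")


# Stand-in for the underlying byte stream of stdout (an assumption for the demo).
raw = io.BytesIO()
out = utf8_text_stream(raw)

out.write("υλικό")            # str goes in ...
out.flush()
encoded = raw.getvalue()      # ... UTF-8 bytes come out the other end

try:
    out.write(b"bytes")       # a text-mode stream refuses bytes
    rejected = False
except TypeError:
    rejected = True
```

In a CGI script the same wrapper could probably be applied as `sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")`, assuming `sys.stdout` still has its usual `.buffer` attribute at that point.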
Re: Unicode issue with Python v3.3
Since we now know the problem, maybe we can tell metrites.py to open index.html using utf-8 encoding rather than as binary, don't you think?
Re: Unicode issue with Python v3.3
On Thu, 11 Apr 2013 00:13:46 -0700, nagia.retsina wrote:

> Since we now know the problem, maybe we can tell metrites.py to open
> index.html using utf-8 encoding rather than as binary, don't you think?

What makes you think it is UTF-8? Last time you tried decoding content as UTF-8, you got an error that it wasn't a legal UTF-8 file.

Where does index.html come from? Whatever program generates that, you need to find out what encoding it is using.

-- Steven
Re: Unicode issue with Python v3.3
On Thu, 11 Apr 2013 07:50:19 +, Steven D'Aprano wrote:

> On Thu, 11 Apr 2013 00:13:46 -0700, nagia.retsina wrote:
>
>> Since we now know the problem, maybe we can tell metrites.py to open
>> index.html using utf-8 encoding rather than as binary, don't you think?
>
> What makes you think it is UTF-8? Last time you tried decoding content as
> UTF-8, you got an error that it wasn't a legal UTF-8 file.

Oops, sorry, correction. It wasn't a legal UTF-8 string. It was an environment variable that was causing the decoding error, since it contained illegal bytes for a UTF-8 string.

> Where does index.html come from? Whatever program generates that, you
> need to find out what encoding it is using.

-- Steven
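Whether a given run of bytes is legal UTF-8 can be checked directly instead of guessed. The following is a minimal sketch; the helper name `decodes_as` and the sample strings are invented for the illustration and are not taken from the thread.

```python
def decodes_as(data, encoding):
    """Return True if the bytes in `data` are valid in `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# The same Greek word stored in two different encodings (sample data only).
utf8_bytes = "Πρόκειται".encode("utf-8")
greek_bytes = "Πρόκειται".encode("iso-8859-7")

utf8_is_utf8 = decodes_as(utf8_bytes, "utf-8")      # True
greek_is_utf8 = decodes_as(greek_bytes, "utf-8")    # False: illegal UTF-8 bytes
```

A successful decode does not prove the guess was right, since single-byte encodings such as ISO8859-1 accept any byte sequence; a failed decode, however, does prove the guess was wrong.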
Re: Unicode issue with Python v3.3
On Thursday, 11 April 2013 11:20:47 AM UTC+3, Steven D'Aprano wrote:

> On Thu, 11 Apr 2013 07:50:19 +, Steven D'Aprano wrote:
>
>> On Thu, 11 Apr 2013 00:13:46 -0700, nagia.retsina wrote:
>>
>>> Since we now know the problem, maybe we can tell metrites.py to open
>>> index.html using utf-8 encoding rather than as binary, don't you think?
>>
>> What makes you think it is UTF-8? Last time you tried decoding content as
>> UTF-8, you got an error that it wasn't a legal UTF-8 file.
>
> Oops, sorry, correction. It wasn't a legal UTF-8 string. It was an
> environment variable that was causing the decoding error, since it
> contained illegal bytes for a UTF-8 string.
>
>> Where does index.html come from? Whatever program generates that, you
>> need to find out what encoding it is using.

Hello Steven,

index.html was written by hand by me, using HTML + CSS.

metrites.py tries to open it, so we must tell it to open it as utf-8 text and not as a binary file. How can we do that?
Re: Unicode issue with Python v3.3
nagia.rets...@gmail.com writes:

> metrites.py tries to open that script so we must tell it to open as utf-8
> text and not as a binary file.

One way is the following:

    from codecs import open
    with open('index.html', encoding='utf-8') as f:
        content = f.read()

ciao, lele.
--
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
l...@metapensiero.it | -- Fortunato Depero, 1929.
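In Python 3 the built-in open() takes the same encoding= argument, so the import from codecs is optional. The sketch below assumes a throwaway file in a temporary directory as a stand-in for index.html, and the helper name `read_text` is made up for the example. It also shows what happens when the wrong encoding is chosen.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a whole file as text (str), decoding with the given encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()


# Hypothetical stand-in for index.html so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "index.html")
with open(path, "w", encoding="utf-8") as f:
    f.write("<p>Αριθμός Επισκεπτών</p>")

content = read_text(path)                 # decodes correctly as UTF-8

try:
    read_text(path, encoding="ascii")     # wrong encoding for this file
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

Passing the encoding explicitly matters most where the default encoding is not under your control, which may well be the case for a script started by a web server.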
Re: Unicode issue with Python v3.3
On 10Apr2013 21:50, nagia.rets...@gmail.com wrote:
| Firstly thank you for taking a look into the code.
| the doctype is coming from the attempt of script metrites.py to open and read the 'index.html' file.
| But i don't know how to try to open it as a byte file instead of a text file.

I think you've got it backwards. It looks like metrites.py has opened the file as bytes instead of as text (probably utf8, but that remains to be seen). Because it has opened it in binary mode you're getting bytes when you read from the file.

Can you show the relevant code that opens the file and reads from it, and the print statement that is putting it back out?

You probably need to ensure that metrites.py is opening it as text, with the correct encoding.

Note that the encoding has nothing to do with your _output_. It is the encoding of the data in the file you are reading, and that is dictated by the editor used to make the file.

Anyway, code first. What does it look like?

Cheers,
--
Cameron Simpson c...@zip.com.au

Six trillion RFID tags is four orders of magnitude bigger than any electronic item ever made.
- overheard by WIRED at the Intelligent Printing conference Oct2006
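The difference between the two modes is easy to see by reading one file both ways. This is a sketch under assumptions: the function name `read_both_ways` and the sample file are invented here, since the actual open() call in metrites.py had not been posted at this point in the thread.

```python
import os
import tempfile


def read_both_ways(path, encoding="utf-8"):
    """Read the same file once in binary mode and once in text mode."""
    with open(path, "rb") as f:
        raw = f.read()                    # bytes, exactly as stored on disk
    with open(path, "r", encoding=encoding) as f:
        text = f.read()                   # str, decoded from those bytes
    return raw, text


# Hypothetical sample file standing in for index.html.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.html")
with open(sample_path, "wb") as f:
    f.write("<p>υλικό</p>".encode("utf-8"))

raw, text = read_both_ways(sample_path)
print(type(raw).__name__, type(text).__name__)
```

Only the text-mode result is suitable for handing to print(); the binary-mode result would show up on the page with a leading b' as described above.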
Re: Unicode issue with Python v3.3
Of course, here is how it looks:

    if page.endswith('.html'):
        f = open( "/home/nikos/www/" + page, encoding="utf-8" )
        htmldata = f.read()
        htmldata = htmldata % (quote, music)

        counter = ''' <center>
                      <a href="mailto:supp...@superhost.gr"> <img src="/data/images/mail.png"> </a>
                      <table border=2 cellpadding=2 bgcolor=black>
                          <td><font color=lime>Αριθμός Επισκεπτών</td>
                          <td><a href="http://superhost.gr/?show=logpage=%s"><font color=yellow> %d </td>
                      </table><br>
                  ''' % (page, data[0])

        template = htmldata + counter
        print( template )
Re: Unicode issue with Python v3.3
On Thursday, 11 April 2013 1:45:22 PM UTC+3, Cameron Simpson wrote:

> On 10Apr2013 21:50, nagia.rets...@gmail.com wrote:
> | Firstly thank you for taking a look into the code.
> | the doctype is coming from the attempt of script metrites.py to open and read the 'index.html' file.
> | But i don't know how to try to open it as a byte file instead of a text file.
>
> I think you've got it backwards. It looks like metrites.py has opened
> the file as bytes instead of as text (probably utf8, but that remains
> to be seen). Because it has opened it in binary mode you're getting
> bytes when you read from the file.
>
> Can you show the relevant code that opens the file and reads from it,
> and the print statement that is putting it back out?
>
> You probably need to ensure that metrites.py is opening it as text,
> with the correct encoding.
>
> Note that the encoding has nothing to do with your _output_. It is the
> encoding of the data in the file you are reading, and that is dictated
> by the editor used to make the file.

This works in the shell, but doesn't work on my website:

    $ cat utf8.txt
    υλικό!Πρόκειται γ
    $ python3
    Python 3.2.3 (default, Oct 19 2012, 20:10:41)
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> data = open('utf8.txt').read()
    >>> print(data)
    υλικό!Πρόκειται γ
    >>> print(data.encode('utf-8'))
    b'\xcf\x85\xce\xbb\xce\xb9\xce\xba\xcf\x8c!\xce\xa0\xcf\x81\xcf\x8c\xce\xba\xce\xb5\xce\xb9\xcf\x84\xce\xb1\xce\xb9 \xce\xb3\n'

See, the last line is what I am getting on my website. If I remove the encode('utf-8') part in metrites.py, the webpage will not show anything at all...
Re: People in the python community [was Re: Unicode issue with Python v3.3]
On 04/10/2013 10:50 AM, Νίκος Γκρ33κ wrote:

> I am not sure I follow you. How did my topic change?! Is this possible?

This is a mailing list/nntp newsgroup. The subject line can be changed arbitrarily by anyone replying to another message. Normally this is done to indicate a natural progression of the conversation in a new direction. In this case, Steven D'Aprano wrote a reply that did not answer your pleas, but instead made some observations, and so he changed the subject line to reflect that.

If you read your messages using a threaded message display, this will make more sense to you. But if you use Gmail's (or Google's) broken conversation view, then this information about who is responding to whom does get lost--actually in conversation view a lot of information about the message flow is lost; it really is unfortunate that this way of communicating has become so widespread.

> How about the code I posted at pastebin.com. Did anyone by any chance
> have a look into it? It's only a single thing I am missing for the
> encoding and then the script will load properly with python 3.3

I'm truly sorry, but I simply do not have the time to do so.
Re: Unicode issue with Python v3.3
Well, can somebody else propose something please? I have pasted the whole script and even the relevant snippet that is perhaps causing this encoding confusion in 3.3.
Re: Unicode issue with Python v3.3
On Apr 12, 2:36 pm, nagia.rets...@gmail.com wrote:

> Well, can somebody else propose something please?

Pay for a professional.
Re: Unicode issue with Python v3.3
On Apr 10, 10:06 am, rusi rustompm...@gmail.com wrote:

> An interesting case of two threads:
>
> On Apr 10, 9:46 am, Chris Angelico ros...@gmail.com wrote:
>> On Wed, Apr 10, 2013 at 2:25 PM, Steven D'Aprano
>>> Obviously you know what the problem is much better than the Python
>>> interpreter.
>>
>> I just went to the page and it started playing sound. Between that and
>> this arrogant refusal to believe either the interpreter or the people
>> who are freely donating time to assist, I'm done. No more looking at
>> Nikos's home page to try to figure out his problems. Have fun, Nikos.
>>
>> ChrisA
>
> Some swans are black
> Some homo sapiens have negative IQ

Hmm, I see some cut-paste goofup on my part. I was meaning to juxtapose this thread, where we put up with an inordinate amount of nonsense from the OP, with the recent thread in which a newcomer who thinks he has found a bug in pdb is made fun of.

Then I thought better of it and deleted the stuff. However I did not do a good delete-job, so I had better now say what I avoided saying: if those who habitually post rubbish are given much of our time and effort, whereas newcomers and first-timers are treated rudely, the list begins to smell like a club of old farts.
Re: Unicode issue with Python v3.3
rusi rustompmody at gmail.com writes:

> Hmm, I see some cut-paste goofup on my part. I was meaning to juxtapose
> this thread, where we put up with an inordinate amount of nonsense from
> the OP, with the recent thread in which a newcomer who thinks he has
> found a bug in pdb is made fun of.
>
> Then I thought better of it and deleted the stuff. However I did not do
> a good delete-job, so I had better now say what I avoided saying: if
> those who habitually post rubbish are given much of our time and
> effort, whereas newcomers and first-timers are treated rudely, the list
> begins to smell like a club of old farts.

+1. If you think you have something intelligent to say to jmfauth, you might as well start a private discussion with him.

As far as I'm concerned, python-list is *already* a club of old farts. Many regular posters are more interested in being right on the Internet than in helping people out.

(this is where the StackOverflow mechanics probably work better, sadly)

Regards

Antoine.
Re: Unicode issue with Python v3.3
On Wednesday, 10 April 2013 7:25:21 AM UTC+3, Steven D'Aprano wrote:

> What does os.environ['REMOTE_ADDR'] give? Until you answer that question,
> you won't make any progress.

I insist, Steven. Look at what 'python3 metrites.py' gives me:

<!-- The above is a description of an error in a Python program, formatted
     for a Web browser because the 'cgitb' module was enabled. In case you
     are not reading this in a Web browser, here is the original traceback:

Traceback (most recent call last):
  File "metrites.py", line 34, in <module>
    userinfo = os.environ['HTTP_USER_AGENT']
  File "/root/.local/lib/python2.7/lib/python3.3/os.py", line 669, in __getitem__
    value = self._data[self.encodekey(key)]
KeyError: b'HTTP_USER_AGENT'

-->
Re: Unicode issue with Python v3.3
Here is the whole code for metrites.py in case someone wants to take a look.

Everything is correct after altering it to meet python 3.3, everything apart from the weird unicode error thing.

http://pastebin.com/5Mpjx5Fd

Please take a look. Thank you.
Re: Unicode issue with Python v3.3
On Tue, 09 Apr 2013 23:04:35 -0700, rusi wrote:

> Hmm I see some cut-paste goofup on my part. I was meaning to juxtapose
> this thread where we put up with inordinate amount of nonsense from OP
> along with the recent thread in which a newcomer who thinks he has
> found a bug in pdb is made fun of.

Curious. Is this making fun of the newcomer?

    If you are able to supply more details, we might be able to follow up
    on the registration problem. And, as someone else suggested, you could
    post the details of the pdb problem here. Note, there are already a
    number of currently open issues with pdb reported on the bug tracker.
    If you haven't already, you could search for pdb and see if your
    problem has been reported. Thanks for bringing the problem(s) up!

Or perhaps this is making fun of them?

    Post the 10-line program here, so others can verify whether it is a
    bug.

I think it is quite unfair of you to mischaracterise the entire community response in this way. One person made a light-hearted, silly, unhelpful response. (As sarcasm, I'm afraid it missed the target.) Two people made good, sensible responses -- and you were not either of them.

If you want to be helpful, how about leading by example and taking on some of the less coherent newbie questions, instead of just bitching that others don't? It's easy, and a pleasure, to give good answers to well-written, carefully thought out questions. It's much harder to do the same for those questions which are... shall we say... less optimal. We could do with a few more people who make an effort to be helpful and friendly, instead of scolds who just tell us off when we stumble.

> Then thought better of it and deleted the stuff. However I did not do a
> good delete-job so I better now say what I avoided saying: If those who
> habitually post rubbish are given much of our time and effort, whereas
> newcomers and first-timers are treated rudely, the list begins to smell
> like a club of old farts.

It's often the newcomers who are posting rubbish. Should we ignore them for posting rubbish, or welcome them for being newcomers?

-- Steven
People in the python community [was Re: Unicode issue with Python v3.3]
On Wed, 10 Apr 2013 08:28:55 +, Steven D'Aprano wrote:

> If you want to be helpful, how about leading by example and taking on
> some of the less coherent newbie questions [...]

On that note, I think I'll take the opportunity to give thanks to Peter Otten, who (if I remember correctly) has been here for longer than I have, and I've been here for a long time. In all that time, I don't think I've ever seen him snap at or be rude to anyone, not even those who deserved it, and he doesn't shy away from answering even the most poorly written questions.

Peter, I don't know how you do it, but you're doing a fantastic job.

-- Steven
Re: People in the python community [was Re: Unicode issue with Python v3.3]
On 10/04/2013 09:34, Steven D'Aprano wrote:

> On that note, I think I'll take the opportunity to give thanks to Peter
> Otten, who (if I remember correctly) has been here for longer than I
> have, and I've been here for a long time. In all that time, I don't
> think I've ever seen him snap at or be rude to anyone, not even those
> who deserved it, and he doesn't shy away from answering even the most
> poorly written questions.
>
> Peter, I don't know how you do it, but you're doing a fantastic job.

Seconded. For those who don't know, Peter is always responding to queries on the tutor mailing list as well. Definite case of the patience of a saint.

--
If you're using GoogleCrap™ please read this http://wiki.python.org/moin/GoogleGroupsPython.

Mark Lawrence
Re: People in the python community [was Re: Unicode issue with Python v3.3]
os.environ['HTTP_USER_AGENT'] is only set when running from a browser, so I faked it by using:

    userinfo = os.environ.get('HTTP_USER_AGENT', 'some default')

but the encoding issues are still there.
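The same .get() fallback works for any CGI variable that is absent when the script is run from a shell. A small sketch, with the helper name `cgi_var` and its `environ` parameter introduced purely for illustration:

```python
import os


def cgi_var(name, default="unknown", environ=None):
    """Look up a CGI variable, returning `default` when it is not set."""
    env = os.environ if environ is None else environ
    return env.get(name, default)


# From a shell these are normally unset, so the defaults are returned;
# under a web server the real values would come back instead.
userinfo = cgi_var("HTTP_USER_AGENT")
remote_addr = cgi_var("REMOTE_ADDR", default="127.0.0.1")
```

Passing a plain dict as `environ` makes it possible to exercise the script's logic without a web server at all.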
Re: People in the python community [was Re: Unicode issue with Python v3.3]
Thank you, I just altered it but I still get the same encoding issues. Please, it is only a matter of a simple alteration that I am not able to see. When you have the time, please take a look. Thank you!
Re: People in the python community [was Re: Unicode issue with Python v3.3]
Steven D'Aprano wrote:

> On Wed, 10 Apr 2013 08:28:55 +, Steven D'Aprano wrote:
>
>> If you want to be helpful, how about leading by example and taking on
>> some of the less coherent newbie questions [...]
>
> On that note, I think I'll take the opportunity to give thanks to Peter
> Otten, who (if I remember correctly) has been here for longer than I
> have, and I've been here for a long time. In all that time, I don't
> think I've ever seen him snap at or be rude to anyone, not even those
> who deserved it, and he doesn't shy away from answering even the most
> poorly written questions.
>
> Peter, I don't know how you do it, but you're doing a fantastic job.

Thank you :)
Re: People in the python community [was Re: Unicode issue with Python v3.3]
Mark Lawrence wrote:

> On 10/04/2013 09:34, Steven D'Aprano wrote:
>
>> On that note, I think I'll take the opportunity to give thanks to Peter
>> Otten, who (if I remember correctly) has been here for longer than I
>> have, and I've been here for a long time. In all that time, I don't
>> think I've ever seen him snap at or be rude to anyone, not even those
>> who deserved it, and he doesn't shy away from answering even the most
>> poorly written questions.
>>
>> Peter, I don't know how you do it, but you're doing a fantastic job.
>
> Seconded. For those who don't know, Peter is always responding to queries
> on the tutor mailing list as well. Definite case of the patience of a
> saint.

You're invited as a speaker to my funeral ;)
Re: People in the python community [was Re: Unicode issue with Python v3.3]
Anyone please?
Re: People in the python community [was Re: Unicode issue with Python v3.3]
On 10/04/2013 15:43, Νίκος Γκρ33κ wrote:

> Anyone please?

I have already shown my support for Peter Otten on this thread. Are you asking for more people to do so?

--
If you're using GoogleCrap™ please read this http://wiki.python.org/moin/GoogleGroupsPython.

Mark Lawrence
Re: People in the python community [was Re: Unicode issue with Python v3.3]
On Thu, Apr 11, 2013 at 1:15 AM, Mark Lawrence breamore...@yahoo.co.uk wrote:

> On 10/04/2013 15:43, Νίκος Γκρ33κ wrote:
>
>> Anyone please?
>
> I have already shown my support for Peter Otten on this thread. Are you
> asking for more people to do so?

Sure, I can! He's one of the people who keeps this list/ng productive and helpful. People can come here with Python problems and get Python solutions.

(I wouldn't normally "me too" a thread, but hey, with that opening!)

ChrisA
Re: People in the python community [was Re: Unicode issue with Python v3.3]
I am not sure I follow you. How did my topic change?! Is this possible?

How about the code I posted at pastebin.com? Did anyone by any chance have a look into it? It's only a single thing I am missing for the encoding and then the script will load properly with python 3.3.
Re: Unicode issue with Python v3.3
On Wed, 10 Apr 2013 00:23:46 -0700, nagia.retsina wrote:

> Look at what 'python3 metrites.py' gives me

> File /root/.local/lib/python2.7/lib/python3.3/os.py, line 669, ...
>                             ^^^           ^^^
Re: Unicode issue with Python v3.3
On Wednesday, 10 April 2013 9:08:38 PM UTC+3, Nobody wrote:

> On Wed, 10 Apr 2013 00:23:46 -0700, nagia.retsina wrote:
>
>> Look at what 'python3 metrites.py' gives me
>
>> File /root/.local/lib/python2.7/lib/python3.3/os.py, line 669, ...
>>                             ^^^           ^^^

Yes, I see it in the traceback but I don't know what it means. Please explain it to me. Thank you.
Re: Unicode issue with Python v3.3
On Wed, Apr 10, 2013 at 12:25 PM, Νίκος Γκρ33κ nikos.gr...@gmail.com wrote:

> On Wednesday, 10 April 2013 9:08:38 PM UTC+3, Nobody wrote:
>
>> On Wed, 10 Apr 2013 00:23:46 -0700, nagia.retsina wrote:
>>
>>> Look at what 'python3 metrites.py' gives me
>>
>>> File /root/.local/lib/python2.7/lib/python3.3/os.py, line 669, ...
>>>                             ^^^           ^^^
>
> Yes, I see it in the traceback but I don't know what it means. Please
> explain it to me. Thank you.

It means that there is something very strange about the way that your Python 3.3 is installed, as the libraries appear to be installed under your Python 2.7 library directory.
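One way to see where a given interpreter and its standard library actually live is to ask the interpreter itself. A minimal sketch; the function name `installation_report` is invented for the example, and the values it prints will differ from machine to machine.

```python
import os
import sys


def installation_report():
    """Collect the paths that show which Python installation is running."""
    return {
        "version": sys.version.split()[0],
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Location of the os module, i.e. the file named in the traceback.
        "os_module": getattr(os, "__file__", None),
    }


for key, value in installation_report().items():
    print(key, "=", value)
```

Running this with the same python3 command that runs metrites.py should show whether the standard library really sits below a python2.7 directory, as the traceback suggests.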
Re: Unicode issue with Python v3.3
On 10 April 2013 09:28, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:

> On Tue, 09 Apr 2013 23:04:35 -0700, rusi wrote:
> [...]
> I think it is quite unfair of you to mischaracterise the entire community
> response in this way. One person made a light-hearted, silly, unhelpful
> response. (As sarcasm, I'm afraid it missed the target.) Two people made
> good, sensible responses -- and you were not either of them.

Enough already with the thought police.

It was me who made the silly reply to the guy who was ranting about everything being broken, giving us nothing to help him on, ending his message with an edifying and, in my judgement, largely rhetorical "Suggestions?". So I gave him some silly suggestions (*not* intended to be sarcasm), and I'm not apologising for it. At least I'm not presuming to take the moral high ground at every half-opportunity.

Recently I gave a very quick reply to someone who was wondering why he couldn't get the docstring from his descriptor - I didn't have the time to expand because two of my kids had jumped on my knees almost as soon as I'd got on the computer. I decided to post the reply anyway as I thought it would give the OP something to get started on and nobody else seemed to have replied so far - but I got remonstrated with for not being complete enough in my reply! What is that about?

AFAIK, this is not Python Customer Service, but a place for people who are interested in Python to discuss problems and *freely* exchange thoughts about the language and its ecosystem. Over the years I've posted the occasional silly message but I think my record is overwhelmingly that I've tried to be helpful, and when I've needed some help myself, I've got some great advice. My first question on this list was answered by Alex Martelli and nowadays I get most excellent and concise tips from Peter Otten - thanks, Peter! If there's one person on this list I don't want to offend, it's you!

So here's to lots more good and bad humour on this list, and the occasional slightly un-pc remark even!

Cheers,

-- Arnaud
Re: Unicode issue with Python v3.3
On 10Apr2013 01:06, Νίκος Γκρ33κ nikos.gr...@gmail.com wrote:
| Here is the whole code for metrites.py in case someone wants to take a look.
|
| Everything is correct after altering it to meet python 3.3,
| everything apart from the weird unicode error thing.
|
| http://pastebin.com/5Mpjx5Fd
|
| please take a look.

From looking at the HTML source of the page:

    http://superhost.gr/

I see near the start:

    b'<!DOCTYPE html

I'd say you have a bytes object that you've fed to print().

In python2, str is effectively bytes. In python3, str is a sequence of Unicode code points, and bytes are arrays of small integers. If you feed a bytes object to print it will print a string representation of it, starting with b'

The question is: where did the bytes object come from? A cursory glance through your pastebin code doesn't show me anything very obvious. I'd start by asking: where does the string <!DOCTYPE come from? Wherever that is, it seems to be bytes rather than str. Start with that.

Cheers,
--
Cameron Simpson c...@zip.com.au

You don't have to live on the edge, but you have to know where it is.
- Scott Lilliott, c...@swl.msd.ray.com
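The effect described here can be reproduced in a few lines. The sketch below is illustrative only: `ensure_text` is a hypothetical helper, not something from metrites.py, and UTF-8 is assumed as the encoding of the bytes.

```python
def ensure_text(value, encoding="utf-8"):
    """Return str unchanged; decode bytes to str using `encoding`."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


data = b"<!DOCTYPE html>"

as_printed = str(data)          # what print(data) would show: b'<!DOCTYPE html>'
as_text = ensure_text(data)     # the text the page was meant to contain

print(as_printed)
print(as_text)
```

A helper like this only hides the symptom; finding the place where the bytes are created in the first place is still the better fix.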
Re: Unicode issue with Python v3.3
Firstly, thank you for taking a look into the code.

The doctype is coming from the attempt of script metrites.py to open and read the 'index.html' file. But I don't know how to try to open it as a byte file instead of a text file.
Unicode issue with Python v3.3
Hello, I am still trying to alter the code from python 2.6 to 3.3.

Everything is set up except for that unicode error that you can see if you go to http://superhost.gr

Can anyone help with this? I even tried to change print() to sys.stdout.buffer() but I still get the same unicode issue. I don't know what to try anymore.
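For reference, sys.stdout.buffer is an attribute (the binary stream underneath sys.stdout), not a function, so it is used through its write() method with bytes. A minimal sketch, in which `write_utf8` is a made-up helper and an `io.BytesIO` object stands in for `sys.stdout.buffer`:

```python
import io


def write_utf8(binary_stream, text):
    """Encode `text` as UTF-8, write it to a binary stream, return the byte count."""
    data = text.encode("utf-8")
    binary_stream.write(data)
    return len(data)


# In a real script the first argument would be sys.stdout.buffer.
fake_stdout_buffer = io.BytesIO()
written = write_utf8(fake_stdout_buffer, "Αριθμός Επισκεπτών\n")
```

Mixing this with print() on the same stream needs care, because text written through print() may still be sitting in a separate buffer; flushing sys.stdout before writing to its .buffer avoids out-of-order output.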
Re: Unicode issue with Python v3.3
On Tue, Apr 9, 2013 at 3:10 PM, Νίκος Γκρ33κ nikos.gr...@gmail.com wrote:

> Hello, I am still trying to alter the code from python 2.6 to 3.3.
>
> Everything is set up except for that unicode error that you can see if
> you go to http://superhost.gr
>
> Can anyone help with this? I even tried to change print() to
> sys.stdout.buffer() but I still get the same unicode issue.
>
> I don't know what to try anymore.

It seems to be failing on the line:

    host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0]

So the obvious question to ask is: what are the contents of os.environ['REMOTE_ADDR'] when this line is reached?

And why are you still trying to solve these sorts of problems on your production website? Do you not have a development or staging environment?
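A defensive version of that lookup might look like the following. It is a sketch under assumptions, not the fix for the reported error: the function name `remote_host`, the `environ` parameter and the fallback value are all invented for the illustration.

```python
import os
import socket


def remote_host(environ=None, fallback="unknown"):
    """Reverse-resolve REMOTE_ADDR, tolerating a missing variable or failed lookup."""
    env = os.environ if environ is None else environ
    addr = env.get("REMOTE_ADDR")
    if not addr:
        return fallback                 # not running under a web server
    try:
        return socket.gethostbyaddr(addr)[0]
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return addr                     # no reverse DNS entry: keep the address


host = remote_host(environ={})          # no REMOTE_ADDR set, so the fallback is used
print(host)
```

Logging the raw value of REMOTE_ADDR before the lookup would also answer the question asked above.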
Re: Unicode issue with Python v3.3
On Wednesday, 10 April 2013 12:34:25 AM UTC+3, Ian wrote:

> On Tue, Apr 9, 2013 at 3:10 PM, Νίκος Γκρ33κ nikos.gr...@gmail.com wrote:
>
>> Hello, I am still trying to alter the code from python 2.6 to 3.3.
>>
>> Everything is set up except for that unicode error that you can see if
>> you go to http://superhost.gr
>>
>> Can anyone help with this? I even tried to change print() to
>> sys.stdout.buffer() but I still get the same unicode issue.
>>
>> I don't know what to try anymore.
>
> It seems to be failing on the line:
>
>     host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0]
>
> So the obvious question to ask is: what are the contents of
> os.environ['REMOTE_ADDR'] when this line is reached?
>
> And why are you still trying to solve these sorts of problems on your
> production website? Do you not have a development or staging
> environment?

No, forget this line. This is not the problem. No, I don't have a testing environment; I altered all the code from 2.6 to 3.3 in the live environment.

I strongly believe there is something going wrong with the print() calls. Those are causing the unicode issues, much like this change, from:

    quote = random.choice( list( open( "/home/nikos/www/data/private/quotes.txt" ) ) )

to:

    quote = random.choice( list( open( "/home/nikos/www/data/private/quotes.txt", encoding="utf-8" ) ) )

which was needed in order for the open() to work.
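The quote-picking line can be written so that the file is also closed after reading. This is only a sketch: `random_quote` is a hypothetical helper, and the temporary file with two made-up lines stands in for the real quotes.txt.

```python
import os
import random
import tempfile


def random_quote(path, encoding="utf-8"):
    """Pick one non-empty line at random from a text file."""
    with open(path, encoding=encoding) as f:
        quotes = [line.strip() for line in f if line.strip()]
    return random.choice(quotes)


# Made-up sample file so the example can run anywhere.
quotes_path = os.path.join(tempfile.mkdtemp(), "quotes.txt")
with open(quotes_path, "w", encoding="utf-8") as f:
    f.write("πρώτο απόφθεγμα\nδεύτερο απόφθεγμα\n")

quote = random_quote(quotes_path)
```

The result is a str in both cases, so it can be interpolated into the HTML template without any .encode() call.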
Re: Unicode issue with Python v3.3
On Tue, 09 Apr 2013 20:16:12 -0700, nagia.retsina wrote: On Wednesday, 10 April 2013 at 12:34:25 AM UTC+3, Ian wrote: On Tue, Apr 9, 2013 at 3:10 PM, Νίκος Γκρ33κ nikos.gr...@gmail.com wrote: Hello, I am still trying to port the code from Python 2.6 => 3.3. Everything is set up except for the Unicode error that you can see if you go to http://superhost.gr Can anyone help with this? I even tried to replace print() with sys.stdout.buffer() but I still get the same Unicode issue. I don't know what to try anymore. It seems to be failing on the line: host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0] So the obvious question to ask is: what are the contents of os.environ['REMOTE_ADDR'] when this line is reached? [...] No, forget this line; this is not the problem. No, I don't have a testing environment; I altered all the code from 2.6 to 3.3 in the live environment. I strongly believe there is something going wrong with the print() calls. Obviously you know what the problem is much better than the Python interpreter. I suggest you open a bug report, "Errors printing bytes are wrongly claimed to be socket errors", and see what happens. Or, you can listen to people who actually know what they are talking about, and look at the actual error, which has NOTHING to do with print. What does os.environ['REMOTE_ADDR'] give? Until you answer that question, you won't make any progress. -- Steven
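The question asked above (what does os.environ['REMOTE_ADDR'] contain?) can be answered defensively in code. The following is a minimal sketch, not the fix that was eventually applied in the thread; the function name `resolve_host` and the injectable `resolver` parameter are assumptions made so the logic can be exercised without DNS.

```python
import socket


def resolve_host(environ, resolver=socket.gethostbyaddr):
    """Return a host name for environ['REMOTE_ADDR'], or a fallback.

    `resolver` defaults to socket.gethostbyaddr; it is a parameter only
    so that the function can be tried out without a network.
    """
    addr = environ.get("REMOTE_ADDR")
    if not addr:
        # The script was probably run from a shell, not by the web server.
        return "unknown (REMOTE_ADDR not set)"
    try:
        return resolver(addr)[0]
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses in
        # Python 3.3+: no reverse DNS record, or the lookup failed.
        return addr
```

Called as `resolve_host(os.environ)`, this returns the raw address instead of raising when the reverse lookup fails, and logging `addr` at that point shows exactly what the failing line received.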
Re: Unicode issue with Python v3.3
On Wed, Apr 10, 2013 at 2:25 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Tue, 09 Apr 2013 20:16:12 -0700, nagia.retsina wrote: On Wednesday, 10 April 2013 at 12:34:25 AM UTC+3, Ian wrote: On Tue, Apr 9, 2013 at 3:10 PM, Νίκος Γκρ33κ nikos.gr...@gmail.com wrote: Hello, I am still trying to port the code from Python 2.6 => 3.3. Everything is set up except for the Unicode error that you can see if you go to http://superhost.gr Can anyone help with this? I even tried to replace print() with sys.stdout.buffer() but I still get the same Unicode issue. I don't know what to try anymore. It seems to be failing on the line: host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0] So the obvious question to ask is: what are the contents of os.environ['REMOTE_ADDR'] when this line is reached? [...] No, forget this line; this is not the problem. No, I don't have a testing environment; I altered all the code from 2.6 to 3.3 in the live environment. I strongly believe there is something going wrong with the print() calls. Obviously you know what the problem is much better than the Python interpreter. I just went to the page and it started playing sound. Between that and this arrogant refusal to believe either the interpreter or the people who are freely donating time to assist, I'm done. No more looking at Nikos's home page to try to figure out his problems. Have fun, Nikos. ChrisA
Re: Unicode issue with Python v3.3
An interesting case of two threads: On Apr 10, 9:46 am, Chris Angelico ros...@gmail.com wrote: On Wed, Apr 10, 2013 at 2:25 PM, Steven D'Aprano Obviously you know what the problem is much better than the Python interpreter. I just went to the page and it started playing sound. Between that and this arrogant refusal to believe either the interpreter or the people who are freely donating time to assist, I'm done. No more looking at Nikos's home page to try to figure out his problems. Have fun, Nikos. ChrisA Some swans are black Some homo sapiens have negative IQ -- http://mail.python.org/mailman/listinfo/python-list
[issue6077] Unicode issue with tempfile on Windows
Amaury Forgeot d'Arc amaur...@gmail.com added the comment: Fixed with r76593 (py3k) and r76594 (release31-maint). -- resolution: accepted -> fixed; status: open -> closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6077 ___
[issue6077] Unicode issue with tempfile on Windows
Changes by Antoine Pitrou pit...@free.fr: -- components: +IO -Library (Lib); priority: -> normal; stage: -> patch review; versions: +Python 3.1, Python 3.2 -Python 3.0
[issue6077] Unicode issue with tempfile on Windows
Antoine Pitrou pit...@free.fr added the comment: The patch looks ok to me. -- assignee: -> amaury.forgeotdarc; nosy: +pitrou; resolution: -> accepted; stage: patch review -> commit review
Re: unicode issue
On Thu, 01 Oct 2009 12:10:58 -0300, Walter Dörwald wal...@livinglogic.de wrote: On 01.10.09 16:09, Hyuga wrote: On Sep 30, 3:34 am, gentlestone tibor.b...@hotmail.com wrote:

    _MAP = {
        # LATIN
        u'À': 'A', u'Á': 'A', u'Â': 'A', u'Ã': 'A', u'Ä': 'A', u'Å': 'A', u'Æ': 'AE', u'Ç': 'C',
        [...long table...]
    }

    def downcode(name):
        """
        >>> downcode(u"Žabovitá zmiešaná kaša")
        u'Zabovita zmiesana kasa'
        """
        for key, value in _MAP.iteritems():
            name = name.replace(key, value)
        return name

    import unicodedata

    def downcode(name):
        return unicodedata.normalize("NFD", name)\
                          .encode("ascii", "ignore")\
                          .decode("ascii")

This article [1] shows a mixed technique, decomposing characters when such info is available in the Unicode tables, and also allowing for a custom mapping when not.

[1] http://effbot.org/zone/unicode-convert.htm

-- Gabriel Genellina
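The mixed technique described above (decompose where Unicode provides a decomposition, fall back to a custom mapping where it does not) might look roughly like this in Python 3. It is a sketch written for this thread, not the code from the linked article; the `_FALLBACK` table is a small hypothetical excerpt.

```python
import unicodedata

# Hypothetical fallback table for characters that have no canonical
# decomposition into "base letter + combining mark".
_FALLBACK = {
    "ß": "ss", "Æ": "AE", "æ": "ae", "Ø": "O", "ø": "o",
    "Ð": "D", "ð": "d", "Þ": "TH", "þ": "th", "Ł": "L", "ł": "l",
}


def downcode(name):
    """Replace accented characters by unaccented approximations."""
    out = []
    for ch in name:
        if ch in _FALLBACK:
            # Custom mapping wins where decomposition cannot help.
            out.append(_FALLBACK[ch])
            continue
        # NFD splits e.g. "Ž" into "Z" followed by U+030C (combining caron).
        decomposed = unicodedata.normalize("NFD", ch)
        # combining() is non-zero for combining marks; drop those.
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)
```

Characters that are neither decomposable nor listed in the fallback table pass through unchanged, which makes gaps in the table easy to spot.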
Re: unicode issue
save in utf-8 the coding declaration also has to be utf-8

OK, I understand, but what is the problem? Unfortunately, it seems that the Python interactive mode doesn't have Unicode support; it recognizes only the latin-1 encoding. So I have two options for writing the doctest:

1. Replace native characters with their escaped representation, like u"\u017dabovit\xe1 zmie\u0161an\xe1 ka\u0161a" instead of u"Žabovitá zmiešaná kaša".
2. Declare the latin-1 encoding even though the file is saved in utf-8.

The first is bad because doctest is a great documentation tool, and it is probably the main reason I use Python; something like u"\u017dabovit\xe1 zmie\u0161an\xe1 ka\u0161a" is not the best documentation style. But the tests work. The second is bad because the declaration is incorrect, and if I use it in a Django model declaration, for example, I get bad data in the application. So what is the solution? Back to Java? :-)
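Why a latin-1 declaration on a UTF-8 file yields "bad data" can be shown without any editor involved. The snippet below is an illustration in Python 3 (variable names are arbitrary): it encodes the sample string as UTF-8, as the editor does on save, and then decodes the bytes the way a reader trusting a latin-1 declaration would.

```python
text = "Žabovitá zmiešaná kaša"

# Bytes as a UTF-8 aware editor writes them to disk.
raw = text.encode("utf-8")

# A reader that assumes latin-1 turns every byte into one character,
# so each two-byte UTF-8 sequence becomes two wrong characters.
misread = raw.decode("latin-1")

# A reader that assumes utf-8 recovers the original text.
correct = raw.decode("utf-8")

# ascii() keeps the output printable on any console.
print(ascii(misread))
print(ascii(correct))
```

The misread string is longer than the original and contains none of the intended accented letters, which matches the symptom of a lookup table whose keys never match.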
Re: unicode issue
gentlestone wrote: save in utf-8 the coding declaration also has to be utf-8

OK, I understand, but what is the problem? Unfortunately, it seems that the Python interactive mode doesn't have Unicode support; it recognizes only the latin-1 encoding. So I have two options for writing the doctest:

1. Replace native characters with their escaped representation, like u"\u017dabovit\xe1 zmie\u0161an\xe1 ka\u0161a" instead of u"Žabovitá zmiešaná kaša".
2. Declare the latin-1 encoding even though the file is saved in utf-8.

The first is bad because doctest is a great documentation tool, and it is probably the main reason I use Python; something like u"\u017dabovit\xe1 zmie\u0161an\xe1 ka\u0161a" is not the best documentation style. But the tests work. The second is bad because the declaration is incorrect, and if I use it in a Django model declaration, for example, I get bad data in the application. So what is the solution? Back to Java? :-)

Wait -- don't give up yet. Since I'm one of the ones who (partially) steered you wrong, let me try to help.

The key variable here is how your text editor behaves. Since I've never taken my (programming) text editor out of ASCII mode before this week, it took some experimenting (and more importantly a message from Piet on this thread) to make sense of things. I think I now know how to make my own editor (Komodo IDE) behave in this environment, and you probably can do as well or better. In fact, judging from your messages, you probably are doing much better on the editor front.

When I tried this morning to re-open that test file from yesterday, many of the characters were all messed up. I was okay as long as the project was still open, but not today. The editor itself apparently looks to that encoding declaration when it's deciding how to interpret the bytes on disk.

So I did the following, using Komodo IDE. I created a new file in the project. Before saving it, I used Edit->CurrentFileSettings->Properties->Encoding to set it to UTF-8. *NOW* I pasted the stuff from your email message, and added

    #-*- coding: utf-8 -*-

as the second line of the file. Notice it's *NOT* latin-1. At this point I save and run the file, and it seems to work fine. My guess is that I could set these as default settings in Komodo, if I were doing UTF-8 very often, and it would become painless. I know I have certain stuff in my python template, and could add that encoding line as well.

Anyway, that gets us to the step of running the doctest. The trick here seems to be that we need to define the docstring as a Unicode docstring to have it interpreted correctly. Try adding the u in front of the triple quote as follows:

    def downcode(name):
        u"""
        >>> downcode(u"Žabovitá zmiešaná kaša")
        u'Zabovita zmiesana kasa'
        """
        for key, value in _MAP.iteritems():
            name = name.replace(key, value)
        return name

Now, if the doctest passes, we seem to be in good shape. There's another problem, that hopefully somebody else can help with. That's if doctest needs to report an error. When I deliberately changed the expected string I got an error like the following:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\u017d' in position 150: ordinal not in range(128)

I get a similar error if running the -v option on doctest. (Note that I do *NOT* get the error when running inside Komodo. And what I've read implies that the same would be true if running inside IDLE.) The problem is similar to the one you'd have doing a simple:

    print u"\u017d"

I think these are avoided if sys.stdout.encoding (and maybe sys.stderr.encoding) are set to utf-8. On my system they're set to None, which says to use the system default encoding. On my system that would be ASCII, so I get the error. But perhaps yours is already something better.

I found links:

    http://drj11.wordpress.com/2007/05/14/python-how-is-sysstdoutencoding-chosen/
    http://wiki.python.org/moin/PrintFails
    http://lists.macromates.com/textmate/2008-June/025735.html

which indicate you may want to try:

    set LC_CTYPE=en_GB.utf-8 python

at the command prompt before running python. This could be system specific; it didn't work for me on XP. The workaround that works for me (so far) is:

    if __name__ == "__main__":
        import sys, codecs
        sys.stdout = codecs.getwriter('utf8')(sys.stdout)
        print u"Žabovitá zmiešaná kaša"
        import doctest
        doctest.testmod()

The codecs line tells python that stdout should use utf-8. That doesn't make the characters look good on my console, but at least it avoids the errors. I'm guessing that on my system I should use latin1 here instead of utf8. But I don't want to confuse things.

HTH
DaveA
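The stdout workaround above is Python 2 code. A rough Python 3 counterpart, offered as a sketch rather than as what the poster ran, wraps a binary stream in `io.TextIOWrapper`. The helper name `utf8_writer` is an assumption, and the demo writes to an in-memory buffer that stands in for `sys.stdout.buffer`.

```python
import io


def utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", errors="strict")


if __name__ == "__main__":
    sink = io.BytesIO()  # stands in for sys.stdout.buffer
    out = utf8_writer(sink)
    out.write("Žabovitá zmiešaná kaša")
    out.flush()  # TextIOWrapper buffers; flush before inspecting the bytes
    print(sink.getvalue())
```

On Python 3.7 and later, `sys.stdout.reconfigure(encoding="utf-8")` changes the encoding of the existing stream in place, and setting the `PYTHONIOENCODING` environment variable has a similar effect without touching the code.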
Re: unicode issue
On Sep 30, 3:34 am, gentlestone tibor.b...@hotmail.com wrote: Why doesn't this code work on Python 2.6? Or how can I do this job?

    [snip _MAP]

    def downcode(name):
        """
        >>> downcode(u"Žabovitá zmiešaná kaša")
        u'Zabovita zmiesana kasa'
        """
        for key, value in _MAP.iteritems():
            name = name.replace(key, value)
        return name

Though CPython is pretty optimized under the hood for this sort of single-character replacement, this still seems pretty inefficient since you're calling replace for every character you want to map. I think that a better approach might be something like:

    def downcode(name):
        return ''.join(_MAP.get(c, c) for c in name)

Or using string.translate:

    import string

    def downcode(name):
        table = string.maketrans('ÀÁÂÃÄÅ...', 'AA...')
        return name.translate(table)
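Both suggestions above carry over to Python 3, with one caveat: a one-to-one translation table cannot express replacements such as 'ß' to 'ss'. In Python 3, `str.maketrans` also accepts a dict whose values may be strings of any length, which covers that case. The sketch below uses a tiny hypothetical excerpt of `_MAP`; the function names are made up for the comparison.

```python
# Small excerpt of the thread's _MAP, enough for the example.
_MAP = {
    "Ž": "Z", "á": "a", "š": "s", "ß": "ss", "Æ": "AE", "ч": "ch",
}

# str.maketrans accepts a dict whose keys are single characters and whose
# values may be strings of any length (or None to delete the character).
_TABLE = str.maketrans(_MAP)


def downcode_join(name):
    """One dict lookup per input character."""
    return "".join(_MAP.get(c, c) for c in name)


def downcode_translate(name):
    """Same result, with the per-character loop done inside str.translate."""
    return name.translate(_TABLE)
```

Either version makes a single pass over the input, instead of one `replace` call per table entry as in the original loop.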
Re: unicode issue
On 01.10.09 16:09, Hyuga wrote: On Sep 30, 3:34 am, gentlestone tibor.b...@hotmail.com wrote: Why doesn't this code work on Python 2.6? Or how can I do this job?

    [snip _MAP]

    def downcode(name):
        """
        >>> downcode(u"Žabovitá zmiešaná kaša")
        u'Zabovita zmiesana kasa'
        """
        for key, value in _MAP.iteritems():
            name = name.replace(key, value)
        return name

Though CPython is pretty optimized under the hood for this sort of single-character replacement, this still seems pretty inefficient since you're calling replace for every character you want to map. I think that a better approach might be something like:

    def downcode(name):
        return ''.join(_MAP.get(c, c) for c in name)

Or using string.translate:

    import string

    def downcode(name):
        table = string.maketrans('ÀÁÂÃÄÅ...', 'AA...')
        return name.translate(table)

Or even simpler:

    import unicodedata

    def downcode(name):
        return unicodedata.normalize("NFD", name)\
                          .encode("ascii", "ignore")\
                          .decode("ascii")

Servus, Walter
Re: unicode issue
On Thu, 01 Oct 2009 08:10:58 -0700, Walter Dörwald wal...@livinglogic.de wrote: On 01.10.09 16:09, Hyuga wrote: On Sep 30, 3:34 am, gentlestone tibor.b...@hotmail.com wrote: Why doesn't this code work on Python 2.6? Or how can I do this job?

    [snip _MAP]

    def downcode(name):
        """
        >>> downcode(u"Žabovitá zmiešaná kaša")
        u'Zabovita zmiesana kasa'
        """
        for key, value in _MAP.iteritems():
            name = name.replace(key, value)
        return name

Though CPython is pretty optimized under the hood for this sort of single-character replacement, this still seems pretty inefficient since you're calling replace for every character you want to map. I think that a better approach might be something like:

    def downcode(name):
        return ''.join(_MAP.get(c, c) for c in name)

Or using string.translate:

    import string

    def downcode(name):
        table = string.maketrans('ÀÁÂÃÄÅ...', 'AA...')
        return name.translate(table)

Or even simpler:

    import unicodedata

    def downcode(name):
        return unicodedata.normalize("NFD", name)\
                          .encode("ascii", "ignore")\
                          .decode("ascii")

Servus, Walter

As I understand it, the "ignore" argument to str.encode *removes* the unencodable characters, rather than replacing them with an ASCII approximation. Is that correct? If so, wouldn't that rather defeat the purpose?

-- Rami Chowdhury
Re: unicode issue
On 01.10.09 17:50, Rami Chowdhury wrote: On Thu, 01 Oct 2009 08:10:58 -0700, Walter Dörwald wal...@livinglogic.de wrote: On 01.10.09 16:09, Hyuga wrote: On Sep 30, 3:34 am, gentlestone tibor.b...@hotmail.com wrote: Why doesn't this code work on Python 2.6? Or how can I do this job?

    [snip _MAP]

    def downcode(name):
        """
        >>> downcode(u"Žabovitá zmiešaná kaša")
        u'Zabovita zmiesana kasa'
        """
        for key, value in _MAP.iteritems():
            name = name.replace(key, value)
        return name

Though CPython is pretty optimized under the hood for this sort of single-character replacement, this still seems pretty inefficient since you're calling replace for every character you want to map. I think that a better approach might be something like:

    def downcode(name):
        return ''.join(_MAP.get(c, c) for c in name)

Or using string.translate:

    import string

    def downcode(name):
        table = string.maketrans('ÀÁÂÃÄÅ...', 'AA...')
        return name.translate(table)

Or even simpler:

    import unicodedata

    def downcode(name):
        return unicodedata.normalize("NFD", name)\
                          .encode("ascii", "ignore")\
                          .decode("ascii")

Servus, Walter

As I understand it, the "ignore" argument to str.encode *removes* the unencodable characters, rather than replacing them with an ASCII approximation. Is that correct? If so, wouldn't that rather defeat the purpose?

Yes, but any accented characters have been split into the base character and the combining accent via normalize() before, so only the accent gets removed. Of course non-decomposable characters will be removed completely, but it would be possible to replace .encode("ascii", "ignore").decode("ascii") with something like this, which drops only the combining marks (category Mn) and keeps everything else:

    u"".join(c for c in name if unicodedata.category(c) != "Mn")

Servus, Walter
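The difference between the two variants discussed above shows up on characters that have no decomposition. The following Python 3 sketch puts them side by side; the function names `strip_marks` and `ascii_only` are made up for the comparison.

```python
import unicodedata


def strip_marks(name):
    """Drop combining marks (category Mn) but keep everything else."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def ascii_only(name):
    """The encode/ignore variant: also drops any remaining non-ASCII character."""
    decomposed = unicodedata.normalize("NFD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

For input such as "Łódź", `strip_marks` keeps the undecomposable "Ł" and removes only the accents, while `ascii_only` silently loses the first letter. Which behaviour is preferable depends on whether the result must be pure ASCII.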
Re: unicode issue
Rami Chowdhury wrote: On Thu, 01 Oct 2009 08:10:58 -0700, Walter Dörwald wal...@livinglogic.de wrote: On 01.10.09 16:09, Hyuga wrote: On Sep 30, 3:34 am, gentlestone tibor.b...@hotmail.com wrote: Why doesn't this code work on Python 2.6? Or how can I do this job?

    [snip _MAP]

    def downcode(name):
        """
        >>> downcode(u"Žabovitá zmiešaná kaša")
        u'Zabovita zmiesana kasa'
        """
        for key, value in _MAP.iteritems():
            name = name.replace(key, value)
        return name

Though CPython is pretty optimized under the hood for this sort of single-character replacement, this still seems pretty inefficient since you're calling replace for every character you want to map. I think that a better approach might be something like:

    def downcode(name):
        return ''.join(_MAP.get(c, c) for c in name)

Or using string.translate:

    import string

    def downcode(name):
        table = string.maketrans('ÀÁÂÃÄÅ...', 'AA...')
        return name.translate(table)

Or even simpler:

    import unicodedata

    def downcode(name):
        return unicodedata.normalize("NFD", name)\
                          .encode("ascii", "ignore")\
                          .decode("ascii")

Servus, Walter

As I understand it, the "ignore" argument to str.encode *removes* the unencodable characters, rather than replacing them with an ASCII approximation. Is that correct? If so, wouldn't that rather defeat the purpose?

You didn't take the normalization step into consideration. Example:

    >>> import unicodedata
    >>> s = u"Ä"
    >>> unicodedata.normalize("NFD", s)
    u'A\u0308'
    >>> _.encode("ascii", "ignore")
    'A'
Re: unicode issue
On Thu, 01 Oct 2009 09:03:38 -0700, Walter Dörwald wal...@livinglogic.de wrote: Yes, but any accented characters have been split into the base character and the combining accent via normalize() before, so only the accent gets removed. Of course non-decomposable characters will be removed completely, but it would be possible to replace .encode("ascii", "ignore").decode("ascii") with something like this, which drops only the combining marks (category Mn) and keeps everything else:

    u"".join(c for c in name if unicodedata.category(c) != "Mn")

Servus, Walter

Thank you for the clarification!

-- Rami Chowdhury
unicode issue
Why doesn't this code work on Python 2.6? Or how can I do this job?

    _MAP = {
        # LATIN
        u'À': 'A', u'Á': 'A', u'Â': 'A', u'Ã': 'A', u'Ä': 'A', u'Å': 'A', u'Æ': 'AE', u'Ç': 'C',
        u'È': 'E', u'É': 'E', u'Ê': 'E', u'Ë': 'E', u'Ì': 'I', u'Í': 'I', u'Î': 'I', u'Ï': 'I',
        u'Ð': 'D', u'Ñ': 'N', u'Ò': 'O', u'Ó': 'O', u'Ô': 'O', u'Õ': 'O', u'Ö': 'O', u'Ő': 'O',
        u'Ø': 'O', u'Ù': 'U', u'Ú': 'U', u'Û': 'U', u'Ü': 'U', u'Ű': 'U', u'Ý': 'Y', u'Þ': 'TH',
        u'ß': 'ss', u'à': 'a', u'á': 'a', u'â': 'a', u'ã': 'a', u'ä': 'a', u'å': 'a', u'æ': 'ae',
        u'ç': 'c', u'è': 'e', u'é': 'e', u'ê': 'e', u'ë': 'e', u'ì': 'i', u'í': 'i', u'î': 'i',
        u'ï': 'i', u'ð': 'd', u'ñ': 'n', u'ò': 'o', u'ó': 'o', u'ô': 'o', u'õ': 'o', u'ö': 'o',
        u'ő': 'o', u'ø': 'o', u'ù': 'u', u'ú': 'u', u'û': 'u', u'ü': 'u', u'ű': 'u', u'ý': 'y',
        u'þ': 'th', u'ÿ': 'y',
        # LATIN_SYMBOLS
        u'©': '(c)',
        # GREEK
        u'α': 'a', u'β': 'b', u'γ': 'g', u'δ': 'd', u'ε': 'e', u'ζ': 'z', u'η': 'h', u'θ': '8',
        u'ι': 'i', u'κ': 'k', u'λ': 'l', u'μ': 'm', u'ν': 'n', u'ξ': '3', u'ο': 'o', u'π': 'p',
        u'ρ': 'r', u'σ': 's', u'τ': 't', u'υ': 'y', u'φ': 'f', u'χ': 'x', u'ψ': 'ps', u'ω': 'w',
        u'ά': 'a', u'έ': 'e', u'ί': 'i', u'ό': 'o', u'ύ': 'y', u'ή': 'h', u'ώ': 'w', u'ς': 's',
        u'ϊ': 'i', u'ΰ': 'y', u'ϋ': 'y', u'ΐ': 'i',
        u'Α': 'A', u'Β': 'B', u'Γ': 'G', u'Δ': 'D', u'Ε': 'E', u'Ζ': 'Z', u'Η': 'H', u'Θ': '8',
        u'Ι': 'I', u'Κ': 'K', u'Λ': 'L', u'Μ': 'M', u'Ν': 'N', u'Ξ': '3', u'Ο': 'O', u'Π': 'P',
        u'Ρ': 'R', u'Σ': 'S', u'Τ': 'T', u'Υ': 'Y', u'Φ': 'F', u'Χ': 'X', u'Ψ': 'PS', u'Ω': 'W',
        u'Ά': 'A', u'Έ': 'E', u'Ί': 'I', u'Ό': 'O', u'Ύ': 'Y', u'Ή': 'H', u'Ώ': 'W', u'Ϊ': 'I',
        u'Ϋ': 'Y',
        # TURKISH
        u'ş': 's', u'Ş': 'S', u'ı': 'i', u'İ': 'I', u'ç': 'c', u'Ç': 'C', u'ü': 'u', u'Ü': 'U',
        u'ö': 'o', u'Ö': 'O', u'ğ': 'g', u'Ğ': 'G',
        # RUSSIAN
        u'а': 'a', u'б': 'b', u'в': 'v', u'г': 'g', u'д': 'd', u'е': 'e', u'ё': 'yo', u'ж': 'zh',
        u'з': 'z', u'и': 'i', u'й': 'j', u'к': 'k', u'л': 'l', u'м': 'm', u'н': 'n', u'о': 'o',
        u'п': 'p', u'р': 'r', u'с': 's', u'т': 't', u'у': 'u', u'ф': 'f', u'х': 'h', u'ц': 'c',
        u'ч': 'ch', u'ш': 'sh', u'щ': 'sh', u'ъ': '', u'ы': 'y', u'ь': '', u'э': 'e', u'ю': 'yu',
        u'я': 'ya',
        u'А': 'A', u'Б': 'B', u'В': 'V', u'Г': 'G', u'Д': 'D', u'Е': 'E', u'Ё': 'Yo', u'Ж': 'Zh',
        u'З': 'Z', u'И': 'I', u'Й': 'J', u'К': 'K', u'Л': 'L', u'М': 'M', u'Н': 'N', u'О': 'O',
        u'П': 'P', u'Р': 'R', u'С': 'S', u'Т': 'T', u'У': 'U', u'Ф': 'F', u'Х': 'H', u'Ц': 'C',
        u'Ч': 'Ch', u'Ш': 'Sh', u'Щ': 'Sh', u'Ъ': '', u'Ы': 'Y', u'Ь': '', u'Э': 'E', u'Ю': 'Yu',
        u'Я': 'Ya',
        # UKRAINIAN
        u'Є': 'Ye', u'І': 'I', u'Ї': 'Yi', u'Ґ': 'G', u'є': 'ye', u'і': 'i', u'ї': 'yi', u'ґ': 'g',
        # CZECH
        u'č': 'c', u'ď': 'd', u'ě': 'e', u'ň': 'n', u'ř': 'r', u'š': 's', u'ť': 't', u'ů': 'u',
        u'ž': 'z', u'Č': 'C', u'Ď': 'D', u'Ě': 'E', u'Ň': 'N', u'Ř': 'R', u'Š': 'S', u'Ť': 'T',
        u'Ů': 'U', u'Ž': 'Z',
        # POLISH
        u'ą': 'a', u'ć': 'c', u'ę': 'e', u'ł': 'l', u'ń': 'n', u'ó': 'o', u'ś': 's', u'ź': 'z',
        u'ż': 'z', u'Ą': 'A', u'Ć': 'C', u'Ę': 'e', u'Ł': 'L', u'Ń': 'N', u'Ó': 'o', u'Ś': 'S',
        u'Ź': 'Z', u'Ż': 'Z',
        # LATVIAN
        u'ā': 'a', u'č': 'c', u'ē': 'e', u'ģ': 'g', u'ī': 'i', u'ķ': 'k', u'ļ': 'l', u'ņ': 'n',
        u'š': 's', u'ū': 'u', u'ž': 'z', u'Ā': 'A', u'Č': 'C', u'Ē': 'E', u'Ģ': 'G', u'Ī': 'i',
        u'Ķ': 'k', u'Ļ': 'L', u'Ņ': 'N', u'Š': 'S', u'Ū': 'u', u'Ž': 'Z'
    }

    def downcode(name):
        """
        >>> downcode(u"Žabovitá zmiešaná kaša")
        u'Zabovita zmiesana kasa'
        """
        for key, value in _MAP.iteritems():
            name = name.replace(key, value)
        return name
Re: unicode issue
On Wed, Sep 30, 2009 at 9:34 AM, gentlestone tibor.b...@hotmail.com wrote: Why doesn't this code work on Python 2.6? Or how can I do this job?

Please be more specific than "it doesn't work":

* What exactly are you doing?
* What were you expecting the result of that to be?
* What is the actual result?

-- André Engels, andreeng...@gmail.com
Re: unicode issue
On 30 Sep., 09:41, Andre Engels andreeng...@gmail.com wrote: On Wed, Sep 30, 2009 at 9:34 AM, gentlestone tibor.b...@hotmail.com wrote: Why doesn't this code work on Python 2.6? Or how can I do this job? Please be more specific than "it doesn't work": * What exactly are you doing? * What were you expecting the result of that to be? * What is the actual result? -- André Engels, andreeng...@gmail.com

* What exactly are you doing?
Replacing non-ASCII characters; see the doctest in the function's docstring.

* What were you expecting the result of that to be?
See the doctest in the function's docstring.

* What is the actual result?
The actual result is the unchanged name.
Re: unicode issue
I get the feeling that the problem is with the Python interactive mode. It does not have full Unicode support, so u"Žabovitá zmiešaná kaša" is changed to u'\x8eabovit\xe1 zmie\x9aan\xe1 ka\x9aa'. If you call your code from another program, it might work correctly.

-- André Engels, andreeng...@gmail.com
Re: unicode issue
On 30 Sep., 10:35, Andre Engels andreeng...@gmail.com wrote: I get the feeling that the problem is with the Python interactive mode. It does not have full Unicode support, so u"Žabovitá zmiešaná kaša" is changed to u'\x8eabovit\xe1 zmie\x9aan\xe1 ka\x9aa'. If you call your code from another program, it might work correctly. -- André Engels, andreeng...@gmail.com

Thanks a lot. I spent two days of my life because of this. So it seems that doctests are unusable for non-English users in Python.
Re: unicode issue
On 30 Sep., 10:43, gentlestone tibor.b...@hotmail.com wrote: On 30 Sep., 10:35, Andre Engels andreeng...@gmail.com wrote: I get the feeling that the problem is with the Python interactive mode. It does not have full Unicode support, so u"Žabovitá zmiešaná kaša" is changed to u'\x8eabovit\xe1 zmie\x9aan\xe1 ka\x9aa'. If you call your code from another program, it might work correctly. -- André Engels, andreeng...@gmail.com Thanks a lot. I spent two days of my life because of this. So it seems that doctests are unusable for non-English users in Python.

Yes, you are right, now it works:

    def slugify(name):
        """
        >>> slugify(u'\u017dabovit\xe1 zmie\u0161an\xe1 ka\u0161a s.r.o')
        u'zabovita-zmiesana-kasa-sro'
        """
        for key, value in _MAP.iteritems():
            name = name.replace(key, value)
        return defaultfilters.slugify(name)
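For readers on Python 3, where source files are UTF-8 by default and doctests may contain non-ASCII literals directly, the slug function above could be approximated without Django as follows. This is a hypothetical stand-in, not Django's implementation: it uses Unicode normalization instead of the thread's `_MAP`, so characters without a decomposition are dropped rather than transliterated.

```python
import re
import unicodedata


def slugify(name):
    """Hypothetical stand-in for a Django-style slugify (Python 3 only).

    >>> slugify("Žabovitá zmiešaná kaša s.r.o")
    'zabovita-zmiesana-kasa-sro'
    """
    # Decompose accented letters, then drop whatever is not ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Remove punctuation, lower-case, and join words with single hyphens.
    cleaned = re.sub(r"[^\w\s-]", "", ascii_name).strip().lower()
    return re.sub(r"[-\s]+", "-", cleaned)
```

Combining this with a lookup table like `_MAP` before the normalization step would restore transliteration for Greek or Cyrillic input.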
Re: unicode issue
gentlestone wrote:
> Why doesn't this code work on Python 2.6? Or how can I do this job?
>
> _MAP = {
>     # LATIN
>     u'À': 'A', u'Á': 'A', u'Â': 'A', u'Ã': 'A', u'Ä': 'A', u'Å': 'A', u'Æ': 'AE', u'Ç': 'C',
>     u'È': 'E', u'É': 'E', u'Ê': 'E', u'Ë': 'E', u'Ì': 'I', u'Í': 'I', u'Î': 'I', u'Ï': 'I',
>     u'Ð': 'D', u'Ñ': 'N', u'Ò': 'O', u'Ó': 'O', u'Ô': 'O', u'Õ': 'O', u'Ö': 'O', u'Ő': 'O',
>     u'Ø': 'O', u'Ù': 'U', u'Ú': 'U', u'Û': 'U', u'Ü': 'U', u'Ű': 'U', u'Ý': 'Y', u'Þ': 'TH',
>     u'ß': 'ss', u'à': 'a', u'á': 'a', u'â': 'a', u'ã': 'a', u'ä': 'a', u'å': 'a', u'æ': 'ae',
>     u'ç': 'c', u'è': 'e', u'é': 'e', u'ê': 'e', u'ë': 'e', u'ì': 'i', u'í': 'i', u'î': 'i',
>     u'ï': 'i', u'ð': 'd', u'ñ': 'n', u'ò': 'o', u'ó': 'o', u'ô': 'o', u'õ': 'o', u'ö': 'o',
>     u'ő': 'o', u'ø': 'o', u'ù': 'u', u'ú': 'u', u'û': 'u', u'ü': 'u', u'ű': 'u', u'ý': 'y',
>     u'þ': 'th', u'ÿ': 'y',
>     # LATIN_SYMBOLS
>     u'©': '(c)',
>     # GREEK
>     u'α': 'a', u'β': 'b', u'γ': 'g', u'δ': 'd', u'ε': 'e', u'ζ': 'z', u'η': 'h', u'θ': '8',
>     u'ι': 'i', u'κ': 'k', u'λ': 'l', u'μ': 'm', u'ν': 'n', u'ξ': '3', u'ο': 'o', u'π': 'p',
>     u'ρ': 'r', u'σ': 's', u'τ': 't', u'υ': 'y', u'φ': 'f', u'χ': 'x', u'ψ': 'ps', u'ω': 'w',
>     u'ά': 'a', u'έ': 'e', u'ί': 'i', u'ό': 'o', u'ύ': 'y', u'ή': 'h', u'ώ': 'w', u'ς': 's',
>     u'ϊ': 'i', u'ΰ': 'y', u'ϋ': 'y', u'ΐ': 'i',
>     u'Α': 'A', u'Β': 'B', u'Γ': 'G', u'Δ': 'D', u'Ε': 'E', u'Ζ': 'Z', u'Η': 'H', u'Θ': '8',
>     u'Ι': 'I', u'Κ': 'K', u'Λ': 'L', u'Μ': 'M', u'Ν': 'N', u'Ξ': '3', u'Ο': 'O', u'Π': 'P',
>     u'Ρ': 'R', u'Σ': 'S', u'Τ': 'T', u'Υ': 'Y', u'Φ': 'F', u'Χ': 'X', u'Ψ': 'PS', u'Ω': 'W',
>     u'Ά': 'A', u'Έ': 'E', u'Ί': 'I', u'Ό': 'O', u'Ύ': 'Y', u'Ή': 'H', u'Ώ': 'W', u'Ϊ': 'I',
>     u'Ϋ': 'Y',
>     # TURKISH
>     u'ş': 's', u'Ş': 'S', u'ı': 'i', u'İ': 'I', u'ç': 'c', u'Ç': 'C', u'ü': 'u', u'Ü': 'U',
>     u'ö': 'o', u'Ö': 'O', u'ğ': 'g', u'Ğ': 'G',
>     # RUSSIAN
>     u'а': 'a', u'б': 'b', u'в': 'v', u'г': 'g', u'д': 'd', u'е': 'e', u'ё': 'yo', u'ж': 'zh',
>     u'з': 'z', u'и': 'i', u'й': 'j', u'к': 'k', u'л': 'l', u'м': 'm', u'н': 'n', u'о': 'o',
>     u'п': 'p', u'р': 'r', u'с': 's', u'т': 't', u'у': 'u', u'ф': 'f', u'х': 'h', u'ц': 'c',
>     u'ч': 'ch', u'ш': 'sh', u'щ': 'sh', u'ъ': '', u'ы': 'y', u'ь': '', u'э': 'e', u'ю': 'yu',
>     u'я': 'ya',
>     u'А': 'A', u'Б': 'B', u'В': 'V', u'Г': 'G', u'Д': 'D', u'Е': 'E', u'Ё': 'Yo', u'Ж': 'Zh',
>     u'З': 'Z', u'И': 'I', u'Й': 'J', u'К': 'K', u'Л': 'L', u'М': 'M', u'Н': 'N', u'О': 'O',
>     u'П': 'P', u'Р': 'R', u'С': 'S', u'Т': 'T', u'У': 'U', u'Ф': 'F', u'Х': 'H', u'Ц': 'C',
>     u'Ч': 'Ch', u'Ш': 'Sh', u'Щ': 'Sh', u'Ъ': '', u'Ы': 'Y', u'Ь': '', u'Э': 'E', u'Ю': 'Yu',
>     u'Я': 'Ya',
>     # UKRAINIAN
>     u'Є': 'Ye', u'І': 'I', u'Ї': 'Yi', u'Ґ': 'G', u'є': 'ye', u'і': 'i', u'ї': 'yi', u'ґ': 'g',
>     # CZECH
>     u'č': 'c', u'ď': 'd', u'ě': 'e', u'ň': 'n', u'ř': 'r', u'š': 's', u'ť': 't', u'ů': 'u',
>     u'ž': 'z', u'Č': 'C', u'Ď': 'D', u'Ě': 'E', u'Ň': 'N', u'Ř': 'R', u'Š': 'S', u'Ť': 'T',
>     u'Ů': 'U', u'Ž': 'Z',
>     # POLISH
>     u'ą': 'a', u'ć': 'c', u'ę': 'e', u'ł': 'l', u'ń': 'n', u'ó': 'o', u'ś': 's', u'ź': 'z',
>     u'ż': 'z', u'Ą': 'A', u'Ć': 'C', u'Ę': 'E', u'Ł': 'L', u'Ń': 'N', u'Ó': 'O', u'Ś': 'S',
>     u'Ź': 'Z', u'Ż': 'Z',
>     # LATVIAN
>     u'ā': 'a', u'č': 'c', u'ē': 'e', u'ģ': 'g', u'ī': 'i', u'ķ': 'k', u'ļ': 'l', u'ņ': 'n',
>     u'š': 's', u'ū': 'u', u'ž': 'z', u'Ā': 'A', u'Č': 'C', u'Ē': 'E', u'Ģ': 'G', u'Ī': 'I',
>     u'Ķ': 'K', u'Ļ': 'L', u'Ņ': 'N', u'Š': 'S', u'Ū': 'U', u'Ž': 'Z'
> }
>
> def downcode(name):
>     """
>     >>> downcode(u"Žabovitá zmiešaná kaša")
>     u'Zabovita zmiesana kasa'
>     """
>     # Replace every mapped character with its ASCII approximation.
>     for key, value in _MAP.iteritems():
>         name = name.replace(key, value)
>     return name

Works for me:

    rrr = downcode(u"Žabovitá zmiešaná kaša")
    print repr(rrr)
    print rrr

prints out:

    u'Zabovita zmiesana kasa'
    Zabovita zmiesana kasa

I did have to add an encoding declaration as line 2 of the file:

    #-*- coding: latin-1 -*-

and I had to convince my editor (Komodo) to save the file in utf-8.

DaveA
-- http://mail.python.org/mailman/listinfo/python-list
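As a side note on the approach in downcode(): when every key is a single character, the same replacement can be done in one pass with a translation table instead of one replace() call per key. The following is only a minimal sketch in Python 3 syntax (the thread itself is about Python 2.6); DEMO_MAP and downcode_demo are hypothetical names, and DEMO_MAP is a tiny stand-in for the full _MAP.

```python
# -*- coding: utf-8 -*-
# Sketch only: DEMO_MAP is a hypothetical, much smaller stand-in for _MAP.
DEMO_MAP = {
    u'Ž': u'Z', u'á': u'a', u'š': u's', u'Æ': u'AE', u'ß': u'ss',
}

# str.translate() expects code points (ints) as keys; the values may be
# strings of any length, so multi-letter replacements such as 'AE' work.
DEMO_TABLE = {ord(key): value for key, value in DEMO_MAP.items()}


def downcode_demo(name):
    """Replace every mapped character in a single pass over the string."""
    return name.translate(DEMO_TABLE)


print(downcode_demo(u'Žabovitá zmiešaná kaša'))
```

Characters that are not in the table pass through unchanged, which matches what the replace() loop does.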
Re: unicode issue
On 30 Sep, 11:45, Dave Angel da...@dejaviewphoto.com wrote:
> gentlestone wrote:
>> Why doesn't this code work on Python 2.6? Or how can I do this job?
>> [snip: _MAP table and downcode() as posted]
>
> Works for me:
>
>     rrr = downcode(u"Žabovitá zmiešaná kaša")
>     print repr(rrr)
>     print rrr
>
> prints out:
>
>     u'Zabovita zmiesana kasa'
>     Zabovita zmiesana kasa
>
> I did have to add an encoding declaration as line 2 of the file:
>
>     #-*- coding: latin-1 -*-
>
> and I had to convince my editor (Komodo) to save the file in utf-8.
>
> DaveA

Great, thank you all. I changed utf-8 to latin-1 in the header and it works for me too. How much time I could have saved if I had just asked in this forum.
Re: unicode issue
I recommend using UTF-8 encoding (especially on GNU/Linux), saving the file as UTF-8, and then writing this on the second line:

    #-*- coding: utf-8 -*-
Re: unicode issue
Dave Angel da...@dejaviewphoto.com wrote in message news:4ac328d4.3060...@dejaviewphoto.com...
> gentlestone wrote:
>> Why doesn't this code work on Python 2.6? Or how can I do this job?
>> [snip: _MAP table and downcode() as posted]
>
> Works for me:
> [snip]
> I did have to add an encoding declaration as line 2 of the file:
>
>     #-*- coding: latin-1 -*-
>
> and I had to convince my editor (Komodo) to save the file in utf-8.

Why declare latin-1 and save in utf-8? I'm not sure how you got that to work, because those encodings aren't equivalent. I get:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "testit.py", line 1
    SyntaxError: encoding problem: utf-8

In fact, some of the characters in the above code don't map to latin-1:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0150' in position 309: ordinal not in range(256)
    >>> import unicodedata as ud
    >>> ud.name(u'\u0150')

-Mark
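To make the point about latin-1 concrete, here is a small sketch (Python 3 syntax, whereas the thread uses Python 2) that lists which characters of a string the latin-1 codec cannot encode and prints their Unicode names. The helper name not_in_latin1 and the sample string are made up for illustration.

```python
import unicodedata

# Hypothetical sample: three characters from the posted table that are
# outside latin-1, plus one (U+00E1) that is inside it.
sample = u'\u0150\u0170\u017d\u00e1'


def not_in_latin1(text):
    """Return the characters of text that the latin-1 codec cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode('latin-1')
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


for ch in not_in_latin1(sample):
    # U+0150 is reported as LATIN CAPITAL LETTER O WITH DOUBLE ACUTE.
    print('U+%04X %s' % (ord(ch), unicodedata.name(ch)))
```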
Re: unicode issue
Dave Angel da...@dejaviewphoto.com (DA) wrote:

>DA> Works for me:
>DA> rrr = downcode(u"Žabovitá zmiešaná kaša")
>DA> print repr(rrr)
>DA> print rrr
>DA> prints out:
>DA> u'Zabovita zmiesana kasa'
>DA> Zabovita zmiesana kasa
>DA> I did have to add an encoding declaration as line 2 of the file:
>DA> #-*- coding: latin-1 -*-
>DA> and I had to convince my editor (Komodo) to save the file in utf-8.

*Seems to work*. If you save in utf-8, the coding declaration also has to be utf-8. Besides, many of these characters can't be represented in latin-1. The reason it worked is that these characters were translated into sequences of two or more bytes, and replace did work with these. But it's dangerous, as they are then no longer the Unicode characters they were intended to be.

--
Piet van Oostrum p...@vanoostrum.org
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]
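The mismatch described above can be reproduced without an editor. This is a sketch in Python 3 syntax that imitates the situation by hand: it encodes a string as UTF-8 (what the editor saved) and decodes the bytes as latin-1 (what the declaration claimed). The variable names are illustrative only.

```python
text = u'\u017dabovit\u00e1'              # 'Žabovitá', 8 characters
saved_bytes = text.encode('utf-8')        # what an editor saving as UTF-8 writes
misread = saved_bytes.decode('latin-1')   # what a latin-1 declaration makes of it

# Each non-ASCII character became two latin-1 characters, so the string
# grew from 8 to 10 characters and no longer contains 'Ž' or 'á'.
print(len(text), len(misread))
print(repr(misread))

# The damage is reversible only if you know both encodings involved.
print(misread.encode('latin-1').decode('utf-8') == text)
```

Because both the literals in the table and the literal in the test call are misread in the same way, the replacements can still happen to line up, which is consistent with the "seems to work" observation.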
Re: unicode issue
Piet van Oostrum wrote:
> Dave Angel da...@dejaviewphoto.com (DA) wrote:
>
>>DA> Works for me:
>>DA> [snip]
>>DA> I did have to add an encoding declaration as line 2 of the file:
>>DA> #-*- coding: latin-1 -*-
>>DA> and I had to convince my editor (Komodo) to save the file in utf-8.
>
> *Seems to work*. If you save in utf-8, the coding declaration also has to be utf-8. Besides, many of these characters can't be represented in latin-1. The reason it worked is that these characters were translated into sequences of two or more bytes, and replace did work with these. But it's dangerous, as they are then no longer the Unicode characters they were intended to be.

Thanks for the correction. What I meant by "works for me" is that the single example in the docstring translated okay. But I do have a lot to learn about using Unicode in sources, and I want to learn.

So tell me, how were we supposed to guess what encoding the original message used? I originally had the mailing list message (in Thunderbird email). When I copied (copy/paste) it to Komodo IDE (text editor), it wouldn't let me save because the file type was ASCII. So I randomly chose latin-1 for the file type, and it seemed to like it.

At that point I expected and got errors from Python because I had no coding declaration. I used latin-1, and still had problems, though I forget what they were. Only when I changed the file encoding type again, to utf-8, did the errors go away.

I agree that they should agree, but I don't know how to reconcile the copy/paste boundary, the file type (without BOM, which is another variable), the coding declaration, and the implicit ASCII encoding of stdout. I understand a bunch of it, but not enough to be able to safely walk through the choices.

Is this all written up in one place, where an experienced programmer can make sense of it? I've nibbled at the edges (even wrote a UTF-8 encoder/decoder a dozen years ago).

DaveA
Re: unicode issue
Dave Angel da...@ieee.org (DA) wrote:

[snip]

>DA> Thanks for the correction. What I meant by "works for me" is that the
>DA> single example in the docstring translated okay. But I do have a lot to
>DA> learn about using Unicode in sources, and I want to learn.

>DA> So tell me, how were we supposed to guess what encoding the original
>DA> message used? I originally had the mailing list message (in Thunderbird
>DA> email). When I copied (copy/paste) to Komodo IDE (text editor), it wouldn't
>DA> let me save because the file type was ASCII. So I randomly chose latin-1
>DA> for file type, and it seemed to like it.

You can see the encoding of the message in its headers. But it is not important, because what matters is the Unicode characters you see. You just copy and paste them into your Python file. The Python file does not have to use the same encoding as the message from which you pasted. The editor will do the proper conversion. (If it doesn't, throw it away immediately.)

Only for the Python file do you have to choose an encoding that can encode all the characters that are in the file. In this case utf-8 is the only reasonable choice, but if there are only latin-1 characters in the file then of course latin-1 (iso-8859-1) will also be good. Any decent editor will only allow you to save in an encoding that can encode all the characters in the file; otherwise you would lose some characters.

Because Python must also know which encoding you used, and this cannot be deduced from the file contents alone, you need the coding declaration. It must be the same as the encoding in which the file is saved, otherwise Python will see something different from what you saw in your editor. Sooner or later this will give you a big headache.

>DA> At that point I expected and got errors from Python because I had no coding
>DA> declaration. I used latin-1, and still had problems, though I forget what
>DA> they were. Only when I changed the file encoding type again, to utf-8, did
>DA> the errors go away. I agree that they should agree, but I don't know how to
>DA> reconcile the copy/paste boundary, the file type (without BOM, which is
>DA> another variable), the coding declaration, and the stdout implicit ASCII
>DA> encoding. I understand a bunch of it, but not enough to be able to safely
>DA> walk through the choices.

>DA> Is this all written up in one place, to where an experienced programmer can
>DA> make sense of it? I've nibbled at the edges (even wrote a UTF-8
>DA> encoder/decoder a dozen years ago).

I don't know of such a place. Usually utf-8 is a safe bet, but in some cases it can be overkill. And then in your Python input/output (read/write) you may have to use a different encoding if the programs that you have to communicate with expect something different.

--
Piet van Oostrum p...@vanoostrum.org
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]
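The rule that the file encoding must be able to represent every character in the file can be checked mechanically. Below is a minimal sketch (Python 3 syntax; the function name usable_encodings and the candidate list are assumptions for illustration, not something from the thread) that reports which of a few common encodings can hold a given piece of text.

```python
def usable_encodings(text, candidates=('ascii', 'latin-1', 'cp1252', 'utf-8')):
    """Return the candidate encodings that can encode every character of text."""
    usable = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character has no representation here
        usable.append(name)
    return usable


print(usable_encodings(u'caf\u00e9'))            # only latin-1 characters
print(usable_encodings(u'\u017dabovit\u00e1'))   # 'Ž' is outside latin-1
print(usable_encodings(u'\u0150'))               # 'Ő' from the posted table
```

For the full table in this thread, which mixes Greek, Cyrillic and Central European letters, only utf-8 among these candidates would qualify.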
Re: unicode issue
Piet van Oostrum wrote:
> Dave Angel da...@ieee.org (DA) wrote:
>
> [snip]
>
> Only for the Python file do you have to choose an encoding that can encode all the characters that are in the file. In this case utf-8 is the only reasonable choice, but if there are only latin-1 characters in the file then of course latin-1 (iso-8859-1) will also be good.
>
> [snip]
>
> I don't know of such a place. Usually utf-8 is a safe bet, but in some cases it can be overkill. And then in your Python input/output (read/write) you may have to use a different encoding if the programs that you have to communicate with expect something different.

I know what I was missing. The copy/paste must be done in pure Unicode, and the in-memory version of the source text is in Unicode. So the text editor's encoding affects how that Unicode is encoded into 8-bit bytes for the file (and how it will be decoded when the file is reloaded next time). OK, that seems to make sense.

I know that the clipboard has type tags, but I haven't looked at them in so long that I forget what they look like. For text, is it just ASCII and Unicode? Or are there other possible encodings that the source and sink negotiate?

Thanks for the clear explanation.

DaveA
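The model described above (Unicode in memory, bytes on disk, with the encoding applied only at the file boundary) can be sketched as follows. This is Python 3 syntax using io.open, which also exists in Python 2.6; the temporary file and variable names are purely illustrative.

```python
import io
import os
import tempfile

text = u'\u017dabovit\u00e1 ka\u0161a'   # 'Žabovitá kaša', 13 characters

# Create a scratch file; mkstemp() returns an open descriptor we don't need.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    # Encoding happens here: Unicode text -> UTF-8 bytes on disk.
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    # The raw bytes are longer than the text: three characters need 2 bytes.
    with io.open(path, 'rb') as f:
        raw = f.read()
    # Decoding with the same encoding restores the original text.
    with io.open(path, 'r', encoding='utf-8') as f:
        reloaded = f.read()
finally:
    os.remove(path)

print(len(text), len(raw), reloaded == text)
```

Reading the same bytes back with a different encoding would either fail or silently produce different characters, which is the mismatch discussed earlier in the thread.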
[issue6077] Unicode issue with tempfile on Windows
Amaury Forgeot d'Arc amaur...@gmail.com added the comment:

File descriptors wrapped by the new IO module should be opened in binary mode. The attached patch changes TemporaryFile and NamedTemporaryFile to always call os.open() in binary mode; the requested mode is actually applied by the io.open() function. mkstemp() returns a raw file descriptor and was not changed.

--
keywords: +needs review, patch
nosy: +amaury.forgeotdarc
Added file: http://bugs.python.org/file14092/tempfile.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6077
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue6077] Unicode issue with tempfile on Windows
New submission from Ugra Dániel daniel.u...@gmail.com:

Opening a file with tempfile.TemporaryFile in "wt+" mode and then reading the content back causes reading to stop (without any exception) at byte 0x1a (a.k.a. Ctrl+Z) on Windows, even though UTF-16 encoding is used. When the built-in open is used with the same parameters (plus a file name, of course), everything works as expected. On Linux this issue does not exist.

--
components: Library (Lib)
files: UnicodeTest.py
messages: 88151
nosy: daniel.ugra
severity: normal
status: open
title: Unicode issue with tempfile on Windows
type: behavior
versions: Python 3.0
Added file: http://bugs.python.org/file14032/UnicodeTest.py
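The attached UnicodeTest.py is not reproduced in this archive, so the following is only a guess at the shape of such a test, written for a current Python 3: write text containing U+001A to a text-mode TemporaryFile with UTF-16 encoding and read it back. The payload string is an assumption for illustration.

```python
import tempfile

# Hypothetical payload: ordinary text with a Ctrl+Z (0x1a) character inside.
payload = u'before\x1aafter'

# Text mode, read/write, UTF-16; the mode letters may be given in any order.
with tempfile.TemporaryFile(mode='w+t', encoding='utf-16') as tmp:
    tmp.write(payload)
    tmp.seek(0)              # rewind before reading back
    read_back = tmp.read()

# On an unaffected interpreter the whole payload comes back; the report
# says that on Windows with Python 3.0 reading stopped at the 0x1a byte.
print(read_back == payload)
```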
[issue6077] Unicode issue with tempfile on Windows
Changes by Ezio Melotti ezio.melo...@gmail.com:

--
nosy: +ezio.melotti
Unicode issue on Windows cmd line
Having an issue on Windows cmd.

    > Python.exe
    >>> a = u'\xf0'
    >>> print a

This gives a unicode error. It works fine in IDLE, PythonWin, and on my MacBook, but I need to run this from a Windows batch file. The character should look like this: ð.

Please help!
Re: Unicode issue on Windows cmd line
On Wed, 2009-02-11 at 10:35 -0800, jeffg wrote:
> Having an issue on Windows cmd.
>
>     > Python.exe
>     >>> a = u'\xf0'
>     >>> print a
>
> This gives a unicode error. It works fine in IDLE, PythonWin, and on my MacBook, but I need to run this from a Windows batch file. The character should look like this: ð.
>
> Please help!

You forgot to paste the error.
Re: Unicode issue on Windows cmd line
On Feb 11, 2:35 pm, Albert Hopkins mar...@letterboxes.org wrote:
> On Wed, 2009-02-11 at 10:35 -0800, jeffg wrote:
>> Having an issue on Windows cmd.
>> [snip]
>
> You forgot to paste the error.

The error looks like this:

      File "<stdin>", line 1, in <module>
      File "C:\python25\lib\encodings\cp437.py", line 12, in encode
        return codecs.charmap_encode(input,errors,encoding_map)
    UnicodeEncodeError: 'charmap' codec can't encode character u'\xf0' in position 0: character maps to <undefined>

Running Python 2.5.4 on Windows XP
Re: Unicode issue on Windows cmd line
On Wed, Feb 11, 2009 at 2:50 PM, jeffg jeffgem...@gmail.com wrote:
> The error looks like this:
>
>       File "<stdin>", line 1, in <module>
>       File "C:\python25\lib\encodings\cp437.py", line 12, in encode
>         return codecs.charmap_encode(input,errors,encoding_map)
>     UnicodeEncodeError: 'charmap' codec can't encode character u'\xf0' in position 0: character maps to <undefined>
>
> Running Python 2.5.4 on Windows XP

That isn't a Python problem, it's a Windows problem. For compatibility reasons, Microsoft never added Unicode support to cmd. When you do print u'', Python tries to convert the characters to the console encoding (the really old cp437, not even the Windows standard cp1252), and it fails when the character has no mapping there. AFAIK, you'll have to use the chcp command to switch to an encoding that has the character and then print u'\xf0'.encode(the_encoding) to get it to display. There isn't any way around it; we've tried.
Re: Unicode issue on Windows cmd line
On Wed, Feb 11, 2009 at 2:50 PM, jeffg jeffgem...@gmail.com wrote:
> The error looks like this:
> [snip]
> UnicodeEncodeError: 'charmap' codec can't encode character u'\xf0' in position 0: character maps to <undefined>
>
> Running Python 2.5.4 on Windows XP

First, you may need to change your command prompt Properties -> Font to use Lucida Console rather than raster fonts. Then you'll need to change the code page using chcp to something that has a mapping for the character you want. E.g.:

    D:\>chcp
    Active code page: 437

    D:\>python
    Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> a = u'\xf0'
    >>> print a
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "D:\bin\Python2.5.2\lib\encodings\cp437.py", line 12, in encode
        return codecs.charmap_encode(input,errors,encoding_map)
    UnicodeEncodeError: 'charmap' codec can't encode character u'\xf0' in position 0: character maps to <undefined>
    >>> quit()

    D:\>chcp 1252
    Active code page: 1252

    D:\>python
    Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> a = u'\xf0'
    >>> print a
    ð
    >>> quit()

    D:\>

(Just changing the code page is enough to avoid the UnicodeEncodeError, but with raster fonts that character displays as three horizontal bars.)

Karen
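The difference between the two code pages in the session above can also be checked without touching the console, by asking the codecs directly. A small sketch in Python 3 syntax (the session above is Python 2.5); can_encode is a made-up helper name.

```python
ch = u'\xf0'   # the character from the original question (ð)


def can_encode(char, codec):
    """Return True if codec has a byte representation for char."""
    try:
        char.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# cp437 has no slot for this character; cp1252 maps it to the byte 0xF0.
print(can_encode(ch, 'cp437'))
print(can_encode(ch, 'cp1252'))
print(ch.encode('cp1252'))
```

This only tells you whether encoding will succeed; whether the glyph then displays correctly still depends on the console font, as noted above.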
Re: Unicode issue on Windows cmd line
> Having an issue on Windows cmd.
>
>     > Python.exe
>     >>> a = u'\xf0'
>     >>> print a
>
> This gives a unicode error. It works fine in IDLE, PythonWin, and on my MacBook, but I need to run this from a Windows batch file. The character should look like this: ð.
>
> Please help!

Well, your terminal just cannot display this character by default; you need to use a different terminal program, or reconfigure your terminal.

For example, do

    chcp 1252

and select Lucida Console as the terminal font, then try again.

Of course, this will cause *different* characters to become non-displayable.

Regards,
Martin
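The trade-off mentioned above (switching code pages fixes one character and breaks others) can be illustrated with the 'replace' error handler, which substitutes '?' for anything the target encoding lacks instead of raising. This is a sketch in Python 3 syntax; console_view is a hypothetical helper, not a standard function, and it only approximates what a console limited to that code page could show.

```python
def console_view(text, encoding):
    """Approximate what a console limited to `encoding` could display."""
    # Unencodable characters become '?', then we decode back to text.
    return text.encode(encoding, 'replace').decode(encoding)


sample = u'\xf0 \u03b1'   # 'ð' (in cp1252 only) and 'α' (in cp437 only)

print(repr(console_view(sample, 'cp437')))    # ð is lost, α survives
print(repr(console_view(sample, 'cp1252')))   # ð survives, α is lost
```

For a batch script that must not crash on output, encoding with an explicit error handler like this is one possible fallback, at the cost of losing the characters the code page cannot represent.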
Re: Unicode issue on Windows cmd line
On Wed, Feb 11, 2009 at 3:57 PM, Martin v. Löwis mar...@v.loewis.de wrote:
> Well, your terminal just cannot display this character by default; you need to use a different terminal program, or reconfigure your terminal.
>
> For example, do
>
>     chcp 1252
>
> and select Lucida Console as the terminal font, then try again.
>
> Of course, this will cause *different* characters to become non-displayable.

Well,

> Regards,
> Martin
Re: Unicode issue on Windows cmd line
On Wed, Feb 11, 2009 at 4:10 PM, Benjamin Kaplan benjamin.kap...@case.edu wrote:
> On Wed, Feb 11, 2009 at 3:57 PM, Martin v. Löwis mar...@v.loewis.de wrote:
>> [snip]
>> Of course, this will cause *different* characters to become non-displayable.
>
> Well,

Whoops. Didn't mean to hit send there. I was going to say, you can't have everything when Microsoft is only willing to break the programs that average people are going to use on a daily basis. I mean, why would they do something nice for the international community at the expense of breaking some 20-year-old batch scripts? Those were the only things that still worked when Vista first came out.