subject:"unicode issue"

Someone HEELP ME!!
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Unicode issue with Python v3.3

2013-04-12 Thread Chris Angelico

On Fri, Apr 12, 2013 at 10:50 PM,  nagia.rets...@gmail.com wrote:
 Someone HEELP ME!!

http://youtu.be/VxMYwjp8t0o

ChrisA

Re: Unicode issue with Python v3.3

On Friday, 12 April 2013 4:14:39 PM UTC+3, Chris Angelico wrote:
 On Fri, Apr 12, 2013 at 10:50 PM,  nagia.rets...@gmail.com wrote:
  Someone HEELP ME!!

 http://youtu.be/VxMYwjp8t0o

 ChrisA


Well, instead of being a smartass it would be nice if you could actually help 
for once.

Re: Unicode issue with Python v3.3

2013-04-12 Thread Chris Angelico

On Fri, Apr 12, 2013 at 11:18 PM,  nagia.rets...@gmail.com wrote:
 On Friday, 12 April 2013 4:14:39 PM UTC+3, Chris Angelico wrote:
 On Fri, Apr 12, 2013 at 10:50 PM,  nagia.rets...@gmail.com wrote:

  Someone HEELP ME!!

 http://youtu.be/VxMYwjp8t0o

 ChrisA


 Well, instead of being a smartass it would be nice if you could actually help 
 for once.

Yeah, I'm done with that. Your whining ran through my patience a few
posts ago. But you should feel special; I clipped that just for you.

ChrisA

Re: Unicode issue with Python v3.3

2013-04-12 Thread rusi

On Apr 12, 6:18 pm, nagia.rets...@gmail.com wrote:
 On Friday, 12 April 2013 4:14:39 PM UTC+3, Chris Angelico wrote:

  On Fri, Apr 12, 2013 at 10:50 PM,  nagia.rets...@gmail.com wrote:

   Someone HEELP ME!!

 http://youtu.be/VxMYwjp8t0o

  ChrisA

 Well, instead of being a smartass it would be nice if you could actually help 
 for once.

Interesting!
Among the things which you don't seem to know is the meaning of the
word 'once'.

Re: Unicode issue with Python v3.3

On Friday, 12 April 2013 4:29:51 PM UTC+3, rusi wrote:
 On Apr 12, 6:18 pm, nagia.rets...@gmail.com wrote:
  On Friday, 12 April 2013 4:14:39 PM UTC+3, Chris Angelico wrote:
   On Fri, Apr 12, 2013 at 10:50 PM,  nagia.rets...@gmail.com wrote:
    Someone HEELP ME!!

   http://youtu.be/VxMYwjp8t0o

   ChrisA

  Well, instead of being a smartass it would be nice if you could actually
  help for once.

 Interesting!
 Among the things which you don't seem to know is the meaning of the
 word 'once'.

Same applies for you too. Stop being smartasses.

Re: Unicode issue with Python v3.3

2013-04-12 Thread Ian Kelly

On Fri, Apr 12, 2013 at 8:36 AM,  nagia.rets...@gmail.com wrote:
 On Friday, 12 April 2013 4:29:51 PM UTC+3, rusi wrote:
 On Apr 12, 6:18 pm, nagia.rets...@gmail.com wrote:
  Well, instead of being a smartass it would be nice if you could actually 
  help for once.

 Interesting!

 Among the things which you don't seem to know is the meaning of the
 word 'once'.

 Same applies for you too. Stop being smartasses.

Please keep in mind that this is a community of volunteers.  Nobody
here is being paid for their time to help you fix your website, and if
you manage to irritate us in the process, we're likely to just walk
away from it.

I looked over the code that you have provided us with, and based on
that I could not see any reason why the html would be in the form of a
bytes instead of a str.  Since nobody else here seems to have any
further insight into the problem either, you're just going to have to
find a way to debug the code.  If you cannot do that on your own,
then I suggest that you find a contractor who can, hire them, and
grant them the access they need to do a real debugging session.

I would also recommend that in the future you should stop deploying
untested code to your production website.  Set up a development
environment for yourself, make the changes there, and only deploy when
you know that everything is working.

Re: Unicode issue with Python v3.3

2013-04-12 Thread Roy Smith

In article mailman.533.1365792239.3114.python-l...@python.org,
 Ian Kelly ian.g.ke...@gmail.com wrote:

 I would also recommend that in the future you should stop deploying
 untested code to your production website.  Set up a development
 environment for yourself, make the changes there, and only deploy when
 you know that everything is working.

But that takes all the fun out of it :-)

Re: Unicode issue with Python v3.3

On Friday, 12 April 2013 9:37:29 PM UTC+3, Ian wrote:
 On Fri, Apr 12, 2013 at 8:36 AM, nagia.rets...@gmail.com wrote:
  On Friday, 12 April 2013 4:29:51 PM UTC+3, rusi wrote:
   On Apr 12, 6:18 pm, nagia.rets...@gmail.com wrote:
    Well, instead of being a smartass it would be nice if you could actually
    help for once.

   Interesting!
   Among the things which you don't seem to know is the meaning of the
   word 'once'.

  Same applies for you too. Stop being smartasses.

 Please keep in mind that this is a community of volunteers. Nobody
 here is being paid for their time to help you fix your website, and if
 you manage to irritate us in the process, we're likely to just walk
 away from it.

 I looked over the code that you have provided us with, and based on
 that I could not see any reason why the html would be in the form of a
 bytes instead of a str. Since nobody else here seems to have any
 further insight into the problem either, you're just going to have to
 find a way to debug the code. If you cannot do that on your own,
 then I suggest that you find a contractor who can, hire them, and
 grant them the access they need to do a real debugging session.

 I would also recommend that in the future you should stop deploying
 untested code to your production website. Set up a development
 environment for yourself, make the changes there, and only deploy when
 you know that everything is working.

I agree with what you say, except for the claim that I try to irritate people.
Look at the thread and you will see who's irritating whom first.

Re: Unicode issue with Python v3.3

2013-04-12 Thread Cameron Simpson

On 11Apr2013 09:55, Nikos nagia.rets...@gmail.com wrote:
| On Thursday, 11 April 2013 1:45:22 PM UTC+3, Cameron Simpson wrote:
|  On 10Apr2013 21:50, nagia.rets...@gmail.com nagia.rets...@gmail.com wrote:
|  | the doctype is coming from the attempt of script metrites.py to open and
|  | read the 'index.html' file.
|  | But i don't know how to try to open it as a byte file instead of a text
|  | file.

Lele Gaifax showed one way:

    from codecs import open
    with open('index.html', encoding='utf-8') as f:
        content = f.read()

But a plain open() should also do:

    with open('index.html') as f:
        content = f.read()

if you're not taking tight control of the file encoding.

The point here is to get _text_ (i.e. str) data from the file, not bytes.

If the text turns out to be incorrectly decoded (i.e. the file bytes
are assembled into the wrong text strings) because the default
encoding is wrong, then you may need to reach for Lele's more verbose
open() example to select the correct encoding.

But first ignore that and get text (str) instead of bytes.
If you're already getting text from the file, something later is
making bytes and handing it to print().
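
To make the str-versus-bytes distinction concrete, here is a minimal sketch. The temporary file stands in for index.html and is purely an assumption for illustration; it is not the file from metrites.py.

```python
import os
import tempfile

# Hypothetical sample file standing in for index.html.
path = os.path.join(tempfile.mkdtemp(), "index.html")
with open(path, "w", encoding="utf-8") as f:
    f.write("<p>υλικό</p>")

# Text mode with an explicit encoding: read() returns str.
with open(path, encoding="utf-8") as f:
    as_text = f.read()

# Binary mode: read() returns the raw, undecoded bytes.
with open(path, "rb") as f:
    as_bytes = f.read()

print(type(as_text).__name__, type(as_bytes).__name__)
```

If the second form (or an explicit .encode() later on) is what the script ends up using, print() will be handed bytes rather than text.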

Another approach to try is to use
  sys.stdout.write()
instead of
  print()

The print() function will take _anything_ and write text of some form.
The write() function will throw an exception if it gets the wrong type of data.

If sys.stdout is opened in binary mode then write() will require
bytes as data; strings will need to be explicitly turned into bytes
via .encode() in order to not raise an exception.

If sys.stdout is open in text mode, write() will require str data.
The sys.stdout file itself will transcribe to bytes for you.

If you take that route, at least you will not have confusion about
str versus bytes.

For an HTML output page I would advocate arranging that sys.stdout
is in text mode; that way you can do the natural thing and .write()
str data and lovely UTF-8 bytes will come out the other end.

If the above test (using .write() instead of print()) shows it to
be in binary mode we can fix that. But you need to find out.
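
A small sketch of how write() exposes the mode, using in-memory streams instead of the real sys.stdout (the stream names here are illustrative assumptions):

```python
import io

# A text-mode stream layered over a binary buffer, as sys.stdout usually is.
raw = io.BytesIO()
text_stream = io.TextIOWrapper(raw, encoding="utf-8")

text_stream.write("υλικό")            # str is accepted by a text stream
try:
    text_stream.write(b"bytes")       # bytes are rejected
    text_error = None
except TypeError as exc:
    text_error = exc

try:
    raw.write("str")                  # a binary stream rejects str
    binary_error = None
except TypeError as exc:
    binary_error = exc

text_stream.flush()
encoded = raw.getvalue()              # the UTF-8 bytes the text layer produced
```

Either TypeError would show up in the web server's error log, which tells you which mode the stream is in.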

You will want access to the error messages from the CGI environment;
do you have access to the web server's error_log? You can tail that
in a terminal while you reload the page to see what's going on.

| This works in the shell, but doesn't work on my website:
| 
| $ cat utf8.txt
| υλικό!Πρόκειται γ

Ok, so your terminal is using UTF-8 as its output coding. (And so
is your mail posting program, since we see it unmangled on my screen
here.)

| $ python3
| Python 3.2.3 (default, Oct 19 2012, 20:10:41)
| [GCC 4.6.3] on linux2
| Type "help", "copyright", "credits" or "license" for more information.
| >>> data = open('utf8.txt').read()
| >>> print(data)
| υλικό!Πρόκειται γ

Likewise.

However, in an exciting twist, I seem to recall that Python invoked
interactively with a terminal as output will have the default terminal
encoding in place on sys.stdout, producing what you expect. _However_,
for python invoked in a batch environment where stdout is not a terminal
(such as in the CGI environment producing your web page), that is
_not_ necessarily the case.
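
One way to check this, and to force UTF-8 if needed, might look like the sketch below. The helper name utf8_text_stream is hypothetical, and whether rewrapping is needed at all depends on what the CGI environment reports.

```python
import io
import sys

def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that str written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")

if __name__ == "__main__":
    # Report what Python chose for stdout; under CGI this may not be UTF-8.
    print(sys.stdout.encoding, sys.stdout.isatty(), file=sys.stderr)

    # In a CGI script one might then do (an assumption, not from the thread):
    # sys.stdout = utf8_text_stream(sys.stdout.buffer)
```

Writing the report to stderr means it lands in the server's error log rather than in the page.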

| >>> print(data.encode('utf-8'))
| b'\xcf\x85\xce\xbb\xce\xb9\xce\xba\xcf\x8c!\xce\xa0\xcf\x81\xcf\x8c\xce\xba\xce\xb5\xce\xb9\xcf\x84\xce\xb1\xce\xb9 \xce\xb3\n'
| 
| See, the last line is what I am getting on my website.

The above line takes your Unicode text in data and transcribes
it to bytes using UTF-8 as the encoding. And print() is then receiving
that bytes object and printing its str() representation as b'...'.
That str is itself unicode, and when print passes it to sys.stdout,
_that_ transcribes the unicode b'...' string as bytes to your
terminal. Using UTF-8 based on the previous examples above, but
since all those characters are in the bottom 128 code points (the
ASCII range) the byte sequence will be the same if it uses ASCII or
ISO8859-1 or almost anything else:-)

As you can see, there's a lot of encoding/decoding going on behind
the scenes even in this superficially simple example.
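
The effect can be reproduced without a terminal or a file. This sketch uses only the first word of the sample text; the variable names are illustrative.

```python
data = "υλικό"
encoded = data.encode("utf-8")

# print(encoded) writes the repr of the bytes object, not the Greek text.
shown = repr(encoded)

# Decoding with the same codec gets the original text back.
round_trip = encoded.decode("utf-8")

print(shown)
print(round_trip)
```

The string in shown is pure ASCII, which is why it looks the same on the website regardless of the output encoding.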

| If i remove
| the encode('utf-8') part in metrites.py, the webpage will not show
| anything at all...

Ah, but data will be being output. The print() function _will_ be
writing data out in some form.  I suggest you remove the .encode()
and then examine the _source_ text of the web page, not its visible
form.

So: remove .encode(), reload the web page, view page source
(depends on your browser; it is Ctrl-U in Firefox, or Cmd-U in Firefox
on a Mac).

I think a lot of the issue you have in this thread is that your
page is too complex. Make another page to do the same thing, and
start with nothing. Add stuff to it a single item at a time until
the page behaves incorrectly. Then you will know the exact item of
code that introduced the issue. And then that single item can be
examined in detail for the decode/encode issues.

The other issue in the thread is that people losing

Re: Unicode issue with Python v3.3

Since we now know the problem, maybe we can tell metrites.py to open index.html
using utf-8 encoding rather than as binary, don't you think?

Re: Unicode issue with Python v3.3

2013-04-11 Thread Steven D'Aprano

On Thu, 11 Apr 2013 00:13:46 -0700, nagia.retsina wrote:

 Since we now know the problem, maybe we can tell metrites.py to open
 index.html using utf-8 encoding rather than as binary, don't you think?

What makes you think it is UTF-8?

Last time you tried decoding content as UTF-8, you got an error that it 
wasn't a legal UTF-8 file. 


Where does index.html come from? Whatever program generates that, you 
need to find out what encoding it is using.



-- 
Steven

Re: Unicode issue with Python v3.3

2013-04-11 Thread Steven D'Aprano

On Thu, 11 Apr 2013 07:50:19 +, Steven D'Aprano wrote:

 On Thu, 11 Apr 2013 00:13:46 -0700, nagia.retsina wrote:
 
 Since we now know the problem, maybe we can tell metrites.py to open
 index.html using utf-8 encoding rather than as binary, don't you think?
 
 What makes you think it is UTF-8?
 
 Last time you tried decoding content as UTF-8, you got an error that it
 wasn't a legal UTF-8 file.

Oops, sorry, correction. It wasn't a legal UTF-8 string. It was an 
environment variable that was causing the decoding error, since it 
contained illegal bytes for a UTF-8 string.
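
For reference, this is the kind of failure being described, shown as a sketch on a made-up byte string (not the actual environment variable from the thread):

```python
# Valid UTF-8 for 'γ' followed by 0xFF, a byte that is never valid in UTF-8.
bad = b"\xce\xb3\xff"

try:
    bad.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
lenient = bad.decode("utf-8", errors="replace")
```

The exception's start attribute gives the offset of the offending byte, which helps locate where the bad data came from.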


 Where does index.html come from? Whatever program generates that, you
 need to find out what encoding it is using.


-- 
Steven

Re: Unicode issue with Python v3.3

On Thursday, 11 April 2013 11:20:47 AM UTC+3, Steven D'Aprano wrote:
 On Thu, 11 Apr 2013 07:50:19 +, Steven D'Aprano wrote:
  On Thu, 11 Apr 2013 00:13:46 -0700, nagia.retsina wrote:
   Since we now know the problem, maybe we can tell metrites.py to open
   index.html using utf-8 encoding rather than as binary, don't you think?

  What makes you think it is UTF-8?

  Last time you tried decoding content as UTF-8, you got an error that it
  wasn't a legal UTF-8 file.

 Oops, sorry, correction. It wasn't a legal UTF-8 string. It was an
 environment variable that was causing the decoding error, since it
 contained illegal bytes for a UTF-8 string.

  Where does index.html come from? Whatever program generates that, you
  need to find out what encoding it is using.

Hello Steven, index.html was written by hand by me, using html + css.

metrites.py tries to open that file, so we must tell it to open it as utf-8 text
and not as a binary file.

How can we do that?

Re: Unicode issue with Python v3.3

2013-04-11 Thread Lele Gaifax

nagia.rets...@gmail.com writes:

 metrites.py tries to open that script so we must tell it to open as
 utf-8 text and not as a binary file.

One way is the following:

    from codecs import open

    with open('index.html', encoding='utf-8') as f:
        content = f.read()

ciao, lele.
-- 
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
l...@metapensiero.it  | -- Fortunato Depero, 1929.


Re: Unicode issue with Python v3.3

2013-04-11 Thread Cameron Simpson

On 10Apr2013 21:50, nagia.rets...@gmail.com nagia.rets...@gmail.com wrote:
| Firstly thank you for taking a look into the code.
| the doctype is coming from the attempt of script metrites.py to open and
| read the 'index.html' file.
| But i don't know how to try to open it as a byte file instead of a text
| file.

I think you've got it backwards. It looks like metrites.py has
opened the file as bytes instead of as text (probably utf8, but
that remains to be seen). Because it has opened it in binary mode
you're getting bytes when you read from the file.

Can you show the relevant code that opens the files and reads from
it, and the print statement that is putting it back out?

You probably need to ensure that metrites.py is opening it as text,
with the correct encoding.  Note that the encoding is nothing to
do with your _output_. It is the encoding of the data in the file
you are reading, and that is dictated by the editor used to make
the file.
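
To illustrate that the encoding belongs to the file and not to the output, here is a sketch that pretends an editor saved a page as ISO-8859-7. That encoding and the file name are assumptions chosen for the example; nothing in the thread says which encoding index.html actually uses.

```python
import os
import tempfile

text = "Αριθμός Επισκεπτών"

# Simulate an editor that saved the page as ISO-8859-7 (Greek), not UTF-8.
path = os.path.join(tempfile.mkdtemp(), "page.html")
with open(path, "w", encoding="iso-8859-7") as f:
    f.write(text)

# Guessing the wrong encoding fails while decoding.
try:
    with open(path, encoding="utf-8") as f:
        f.read()
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True

# Naming the encoding the file was really saved in recovers the text.
with open(path, encoding="iso-8859-7") as f:
    recovered = f.read()
```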

Anyway, code first. What does it look like?

Cheers,
-- 
Cameron Simpson c...@zip.com.au

Six trillion RFID tags is four orders of magnitude bigger than any electronic 
item ever made.
- overhead by WIRED at the Intelligent Printing conference Oct2006

Re: Unicode issue with Python v3.3

Of course, here is what it looks like:

if page.endswith('.html'):
    f = open( "/home/nikos/www/" + page, encoding="utf-8" )
    htmldata = f.read()
    htmldata = htmldata % (quote, music)

    counter = ''' <center>
                  <a href="mailto:supp...@superhost.gr"> <img src="/data/images/mail.png"> </a>
                  <table border=2 cellpadding=2 bgcolor=black>
                      <td><font color=lime>Αριθμός Επισκεπτών</td>
                      <td><a href="http://superhost.gr/?show=logpage=%s"><font color=yellow> %d </td>
                  </table><br>
              ''' % (page, data[0])

    template = htmldata + counter
    print( template )

Re: Unicode issue with Python v3.3

2013-04-11 Thread Nikos

On Thursday, 11 April 2013 1:45:22 PM UTC+3, Cameron Simpson wrote:
 On 10Apr2013 21:50, nagia.rets...@gmail.com nagia.rets...@gmail.com wrote:
 | Firstly thank you for taking a look into the code.
 | the doctype is coming from the attempt of script metrites.py to open and
 | read the 'index.html' file.
 | But i don't know how to try to open it as a byte file instead of a text
 | file.

 I think you've got it backwards. It looks like metrites.py has
 opened the file as bytes instead of as text (probably utf8, but
 that remains to be seen). Because it has opened it in binary mode
 you're getting bytes when you read from the file.

 Can you show the relevant code that opens the files and reads from
 it, and the print statement that is putting it back out?

 You probably need to ensure that metrites.py is opening it as text,
 with the correct encoding. Note that the encoding is nothing to
 do with your _output_. It is the encoding of the data in the file
 you are reading, and that is dictated by the editor used to make
 the file.

This works in the shell, but doesn't work on my website:

$ cat utf8.txt
υλικό!Πρόκειται γ
$ python3
Python 3.2.3 (default, Oct 19 2012, 20:10:41)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> data = open('utf8.txt').read()
>>> print(data)
υλικό!Πρόκειται γ

>>> print(data.encode('utf-8'))
b'\xcf\x85\xce\xbb\xce\xb9\xce\xba\xcf\x8c!\xce\xa0\xcf\x81\xcf\x8c\xce\xba\xce\xb5\xce\xb9\xcf\x84\xce\xb1\xce\xb9 \xce\xb3\n'

See, the last line is what I am getting on my website. If I remove the
encode('utf-8') part in metrites.py, the webpage will not show anything at
all...

Re: People in the python community [was Re: Unicode issue with Python v3.3]

2013-04-11 Thread Michael Torrie

On 04/10/2013 10:50 AM, Νίκος Γκρ33κ wrote:
 I am not sure I follow you. How did my topic change?! Is this
 possible?

This is a mailing list/nntp newsgroup.  The subject line can be changed
arbitrarily by anyone replying to another message.  Normally this is
done to indicate a natural progression of the conversation in a new
direction.  In this case, Steven D'Aprano wrote a reply that did not
answer your pleas, but instead made some observations, and so he changed
the subject line to reflect that.

If you read your messages using a threaded message display, this will
make more sense to you.  But if you use Gmail's (or Google's) broken
conversation view, then this information about who is responding to whom
does get lost--actually in conversation view a lot of information about
the message flow is lost; it really is unfortunate that this way of
communicating has become so widespread.

 How about the code I posted at pastebin.com? Did anyone by any chance
 have a look at it?

 It's only a single thing I am missing for the encoding and then the
 script will load properly with python 3.3

I'm truly sorry, but I simply do not have the time to do so.

Re: Unicode issue with Python v3.3

Well, can somebody else propose something, please?

I have pasted the whole script and even the relevant snippet that is perhaps
causing this encoding confusion in 3.3

Re: Unicode issue with Python v3.3

2013-04-11 Thread alex23

On Apr 12, 2:36 pm, nagia.rets...@gmail.com wrote:
 Well, can somebody else propose something, please?

Pay for a professional.

Re: Unicode issue with Python v3.3

2013-04-10 Thread rusi

On Apr 10, 10:06 am, rusi rustompm...@gmail.com wrote:
 An interesting case of two threads:

 On Apr 10, 9:46 am, Chris Angelico ros...@gmail.com wrote:

  On Wed, Apr 10, 2013 at 2:25 PM, Steven D'Aprano
   Obviously you know what the problem is much better than the Python
   interpreter.

  I just went to the page and it started playing sound. Between that and
  this arrogant refusal to believe either the interpreter or the people
  who are freely donating time to assist, I'm done. No more looking at
  Nikos's home page to try to figure out his problems. Have fun, Nikos.

  ChrisA

 Some swans are black
 Some homo sapiens have negative IQ

Hmm I see some cut-paste goofup on my part.
I was meaning to juxtapose this thread where we put up with inordinate
amount of nonsense from OP
along with the recent thread in which a newcomer who thinks he has
found a bug in pdb is made fun of.

Then thought better of it and deleted the stuff.
However I did not do a good delete-job so I better now say what I
avoided saying:

If those who habitually post rubbish are given much of our time and
effort,
whereas newcomers and first-timers are treated rudely, the list begins
to smell like a club of old farts.

Re: Unicode issue with Python v3.3

2013-04-10 Thread Antoine Pitrou

rusi rustompmody at gmail.com writes:
 
 Hmm I see some cut-paste goofup on my part.
 I was meaning to juxtapose this thread where we put up with inordinate
 amount of nonsense from OP
 along with the recent thread in which a newcomer who thinks he has
 found a bug in pdb is made fun of.
 
 Then thought better of it and deleted the stuff.
 However I did not do a good delete-job so I better now say what I
 avoided saying:
 
 If those who habitually post rubbish are given much of our time and
 effort,
 whereas newcomers and first-timers are treated rudely, the list begins
 to smell like a club of old farts.

+1. If you think you have something intelligent to say to jmfauth,
you might as well start a private discussion with him.

As far as I'm concerned, python-list is *already* a club of old
farts. Many regular posters are more interested in being right on the
Internet than in helping people out.

(this is where the StackOverflow mechanics probably work better, sadly)

Regards

Antoine.



Re: Unicode issue with Python v3.3

2013-04-10 Thread nagia . retsina

On Wednesday, 10 April 2013 7:25:21 AM UTC+3, Steven D'Aprano wrote:

 What does os.environ['REMOTE_ADDR'] give? Until you answer that question, 
 you won't make any progress.

I insist, Steven.

Look at what 'python3 metrites.py' gives me:

<!-- The above is a description of an error in a Python program, formatted
     for a Web browser because the 'cgitb' module was enabled.  In case you
     are not reading this in a Web browser, here is the original traceback:

Traceback (most recent call last):
  File "metrites.py", line 34, in <module>
    userinfo = os.environ['HTTP_USER_AGENT']
  File "/root/.local/lib/python2.7/lib/python3.3/os.py", line 669, in __getitem__
    value = self._data[self.encodekey(key)]
KeyError: b'HTTP_USER_AGENT'

-->


Re: Unicode issue with Python v3.3

Here is the whole code for metrites.py in case someone wants to take a look.

Everything is correct after altering it to work with python 3.3, everything apart
from the weird unicode error thing.

http://pastebin.com/5Mpjx5Fd

please take a look.
Thank you. 

Re: Unicode issue with Python v3.3

2013-04-10 Thread Steven D'Aprano

On Tue, 09 Apr 2013 23:04:35 -0700, rusi wrote:

 Hmm I see some cut-paste goofup on my part. I was meaning to juxtapose
 this thread where we put up with inordinate amount of nonsense from OP
 along with the recent thread in which a newcomer who thinks he has found
 a bug in pdb is made fun of.

Curious. Is this making fun of the newcomer?

  If you are able to supply more details, we might be able to
  follow up on the registration problem.  And,  as someone else
  suggested, you could post the details of the pdb problem here.
  Note, there are already a number of currently open issues with
  pdb reported on the bug tracker. If you haven't already, you
  could search for pdb and see if your problem has been reported.
  Thanks for bringing the problem(s) up!


Or perhaps this is making fun of them?

  Post the 10-line program here, so others can verify whether it is a bug.


I think it is quite unfair of you to mischaracterise the entire community 
response in this way. One person made a light-hearted, silly, unhelpful 
response. (As sarcasm, I'm afraid it missed the target.) Two people made 
good, sensible responses -- and you were not either of them.

If you want to be helpful, how about leading by example and taking on 
some of the less coherent newbie questions, instead of just bitching that 
others don't? It's easy, and a pleasure, to give good answers to well-
written, carefully thought out questions. It's much harder to do the same 
for those questions which are... shall we say... less optimal. We could 
do with a few more people who make an effort to be helpful and friendly, 
instead of scolds who just tell us off when we stumble.



 Then thought better of it and deleted the stuff. However I did not do a
 good delete-job so I better now say what I avoided saying:
 
 If those who habitually post rubbish are given much of our time and
 effort,
 whereas newcomers and first-timers are treated rudely, the list begins
 to smell like a club of old farts.


It's often the newcomers who are posting rubbish. Should we ignore them 
for posting rubbish, or welcome them for being newcomers?



-- 
Steven

People in the python community [was Re: Unicode issue with Python v3.3]

2013-04-10 Thread Steven D'Aprano

On Wed, 10 Apr 2013 08:28:55 +, Steven D'Aprano wrote:

 If you want to be helpful, how about leading by example and taking on
 some of the less coherent newbie questions
[...]


On that note, I think I'll take the opportunity to give thanks to Peter 
Otten, who (if I remember correctly) has been here for longer than I 
have, and I've been here for a long time. In all that time, I don't think 
I've ever seen him snap at or be rude to anyone, not even those who 
deserved it, and he doesn't shy away from answering even the most poorly 
written questions.


Peter, I don't know how you do it, but you're doing a fantastic job.



-- 
Steven

Re: People in the python community [was Re: Unicode issue with Python v3.3]

2013-04-10 Thread Mark Lawrence


On 10/04/2013 09:34, Steven D'Aprano wrote:


On that note, I think I'll take the opportunity to give thanks to Peter
Otten, who (if I remember correctly) has been here for longer than I
have, and I've been here for a long time. In all that time, I don't think
I've ever seen him snap at or be rude to anyone, not even those who
deserved it, and he doesn't shy away from answering even the most poorly
written questions.


Peter, I don't know how you do it, but you're doing a fantastic job.



Seconded.  For those who don't know, Peter is always responding to 
queries on the tutor mailing list as well.  A definite case of the 
patience of a saint.


--
If you're using GoogleCrap™ please read this 
http://wiki.python.org/moin/GoogleGroupsPython.


Mark Lawrence


Re: People in the python community [was Re: Unicode issue with Python v3.3]

 os.environ['HTTP_USER_AGENT'] is only set when running from browser.

So I faked it by using:

userinfo = os.environ.get('HTTP_USER_AGENT', 'some default')

but the encoding issues are still there.
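
The fallback above can be sketched as a small helper. This is only an illustration: the function name get_user_agent and the default value "unknown" are made up here, and the environment is passed in as a plain mapping so the behaviour can be checked without a web server.

```python
import os


def get_user_agent(environ=os.environ, default="unknown"):
    """Return the browser's User-Agent, or a default outside a web server."""
    # HTTP_USER_AGENT is set by the web server for CGI requests only;
    # .get() avoids the KeyError raised by environ['HTTP_USER_AGENT'].
    return environ.get("HTTP_USER_AGENT", default)


# Under a web server the header is present; from the shell it is not.
print(get_user_agent({"HTTP_USER_AGENT": "Mozilla/5.0"}))
print(get_user_agent({}))
```

Note that this only avoids the KeyError when the script is started from the command line; it does not touch any encoding behaviour.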

Re: People in the python community [was Re: Unicode issue with Python v3.3]

Thank you, I just altered it, but I still get the same encoding issues.

Please, it's only a matter of a simple alteration that I am not able to see.

When you have the time, please take a look.

Thank you!

Re: People in the python community [was Re: Unicode issue with Python v3.3]

2013-04-10 Thread Peter Otten

Steven D'Aprano wrote:

 On Wed, 10 Apr 2013 08:28:55 +, Steven D'Aprano wrote:
 
 If you want to be helpful, how about leading by example and taking on
 some of the less coherent newbie questions
 [...]
 
 
 On that note, I think I'll take the opportunity to give thanks to Peter
 Otten, who (if I remember correctly) has been here for longer than I
 have, and I've been here for a long time. In all that time, I don't think
 I've ever seen him snap at or be rude to anyone, not even those who
 deserved it, and he doesn't shy away from answering even the most poorly
 written questions.
 
 
 Peter, I don't know how you do it, but you're doing a fantastic job.

Thank you :)


Re: People in the python community [was Re: Unicode issue with Python v3.3]

2013-04-10 Thread Peter Otten

Mark Lawrence wrote:

 On 10/04/2013 09:34, Steven D'Aprano wrote:

 On that note, I think I'll take the opportunity to give thanks to Peter
 Otten, who (if I remember correctly) has been here for longer than I
 have, and I've been here for a long time. In all that time, I don't think
 I've ever seen him snap at or be rude to anyone, not even those who
 deserved it, and he doesn't shy away from answering even the most poorly
 written questions.


 Peter, I don't know how you do it, but you're doing a fantastic job.

 
 Seconded.  For those who don't know Peter is always responding to
 queries on the tutor mailing list as well.  Definite case of the
 patience of a saint.

You're invited as a speaker to my funeral ;)


Re: People in the python community [was Re: Unicode issue with Python v3.3]

Anyone please?

Re: People in the python community [was Re: Unicode issue with Python v3.3]

2013-04-10 Thread Mark Lawrence


On 10/04/2013 15:43, Νίκος Γκρ33κ wrote:

Anyone please?



I have already shown my support for Peter Otten on this thread.  Are you 
asking for more people to do so?


--
If you're using GoogleCrap™ please read this 
http://wiki.python.org/moin/GoogleGroupsPython.


Mark Lawrence


Re: People in the python community [was Re: Unicode issue with Python v3.3]

2013-04-10 Thread Chris Angelico

On Thu, Apr 11, 2013 at 1:15 AM, Mark Lawrence breamore...@yahoo.co.uk wrote:
 On 10/04/2013 15:43, Νίκος Γκρ33κ wrote:

 Anyone please?


 I have already shown my support for Peter Otten on this thread.  Are you
 asking for more people to do so?

Sure, I can! He's one of the people who keeps this list/ng productive
and helpful. People can come here with Python problems and get Python
solutions.

(I wouldn't normally "me too" a thread, but hey, with that opening!)

ChrisA

Re: People in the python community [was Re: Unicode issue with Python v3.3]

I'm not sure I follow you.
How did my topic change?! Is this possible?

How about the code I posted at pastebin.com?
Did anyone by any chance have a look at it?

It's only a single thing I am missing for the encoding, and then the script will 
load properly with Python 3.3.

Re: Unicode issue with Python v3.3

2013-04-10 Thread Nobody

On Wed, 10 Apr 2013 00:23:46 -0700, nagia.retsina wrote:

 Look at what 'python3 metrites.py' gives me

   File "/root/.local/lib/python2.7/lib/python3.3/os.py", line 669, ...
                                ^^^           ^^^



Re: Unicode issue with Python v3.3

On Wednesday, 10 April 2013 9:08:38 PM UTC+3, user Nobody wrote:
 On Wed, 10 Apr 2013 00:23:46 -0700, nagia.retsina wrote:
 
  Look at what 'python3 metrites.py' gives me
 
    File "/root/.local/lib/python2.7/lib/python3.3/os.py", line 669, ...
                                 ^^^           ^^^

Yes, I see it in the traceback, but I don't know what it means.
Please explain it to me.
Thank you.

Re: Unicode issue with Python v3.3

2013-04-10 Thread Ian Kelly

On Wed, Apr 10, 2013 at 12:25 PM, Νίκος Γκρ33κ nikos.gr...@gmail.com wrote:
 On Wednesday, 10 April 2013 9:08:38 PM UTC+3, user Nobody wrote:
 On Wed, 10 Apr 2013 00:23:46 -0700, nagia.retsina wrote:

  Look at what 'python3 metrites.py' gives me

    File "/root/.local/lib/python2.7/lib/python3.3/os.py", line 669, ...
                                 ^^^           ^^^

 Yes, I see it in the traceback, but I don't know what it means.
 Please explain it to me.
 Thank you.

It means that there is something very strange about the way that your
Python 3.3 is installed, as the libraries appear to be installed under
your Python 2.7 library directory.
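
A quick way to see where a given interpreter actually loads its standard library from is to ask it directly. The sketch below is an assumption about how one might check this, not something taken from the thread; describe_interpreter is a made-up helper name.

```python
import os
import sys
import sysconfig


def describe_interpreter():
    """Collect where the running interpreter and its standard library live."""
    return {
        "version": "%d.%d" % sys.version_info[:2],
        "executable": sys.executable,
        "prefix": sys.prefix,
        "stdlib": sysconfig.get_paths()["stdlib"],
        # The file the os module was loaded from (may be missing on unusual builds).
        "os_module": getattr(os, "__file__", None),
    }


for key, value in describe_interpreter().items():
    print(key, "=", value)
```

Run with the same python3 that runs the script: on a cleanly installed Python 3.3 one would expect the stdlib and os_module paths to mention python3.3 and not a python2.7 directory.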

Re: Unicode issue with Python v3.3

2013-04-10 Thread Arnaud Delobelle

On 10 April 2013 09:28, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
 On Tue, 09 Apr 2013 23:04:35 -0700, rusi wrote:
[...]
 I think it is quite unfair of you to mischaracterise the entire community
 response in this way. One person made a light-hearted, silly, unhelpful
 response. (As sarcasm, I'm afraid it missed the target.) Two people made
 good, sensible responses -- and you were not either of them.

Enough already with the thought police.

It was me who made the silly reply to the guy who was ranting about
everything being broken, giving us nothing to go on, and ending his
message with an edifying and, in my judgement, largely rhetorical
"Suggestions?".  So I gave him some silly suggestions (*not* intended
to be sarcasm), and I'm not apologising for it.  At least I'm not
presuming to take the moral high ground at every half-opportunity.

Recently I gave a very quick reply to someone who was wondering why he
couldn't get the docstring from his descriptor - I didn't have the
time to expand because two of my kids had jumped on my knees almost as
soon as I'd got on the computer.  I decided to post the reply anyway
as I thought it would give the OP something to get started on and
nobody else seemed to have replied so far - but I got remonstrated for
not being complete enough in my reply!  What is that about?

AFAIK, this is not Python Customer Service, but a place for people who
are interested in Python to discuss problems and *freely* exchange
thoughts about the language and its ecosystem.  Over the years I've
posted the occasional silly message but I think my record is
overwhelmingly that I've tried to be helpful, and when I've needed
some help myself, I've got some great advice.  My first question on
this list was answered by Alex Martelli and nowadays I get most
excellent and concise tips from Peter Otten - thanks, Peter! If
there's one person on this list I don't want to offend, it's you!

So here's to lots more good and bad humour on this list, and the
occasional slightly un-pc remark even!

Cheers,

-- 
Arnaud

Re: Unicode issue with Python v3.3

2013-04-10 Thread Cameron Simpson

On 10Apr2013 01:06, Νίκος Γκρ33κ nikos.gr...@gmail.com wrote:
| Here is the whole code for metrites.py in case someone wants to take a look.
| 
| Everything is correct after altering it to meet python 3.3,
| everything apart from the weird unicode error thing.
| 
| http://pastebin.com/5Mpjx5Fd
| 
| please take a look.

From looking at the HTML source of the page:

  http://superhost.gr/

I see near the start:

  b'<!DOCTYPE html

I'd say you have a bytes object that you've fed to print().
In python2, str is effectively bytes.
In python3, str is a sequence of Unicode code points, and bytes are
arrays of small integers.
If you feed a bytes object to print it will print a string representation
of it, starting with b'

The question is: where did the bytes object come from? A cursory
glance through your pastebin code doesn't show me anything very
obvious.

I'd start by asking: where does the string "<!DOCTYPE" come from?
Wherever that is, it seems to be bytes rather than str.
Start with that.
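
The difference described above can be reproduced in a few lines of Python 3. This is a minimal sketch; the sample value and the choice of UTF-8 for decoding are assumptions, since the thread does not show how the page bytes were produced.

```python
page = b"<!DOCTYPE html>"   # bytes, e.g. as returned by a file opened in 'rb' mode

# print() calls str() on its argument; for bytes that is the repr,
# including the b'' wrapper seen in the page source.
as_printed = str(page)

# Decoding turns the bytes into text; UTF-8 is assumed here.
as_text = page.decode("utf-8")

print(as_printed)
print(as_text)
```

So if the output starts with b', the fix is usually to decode the bytes (or to read text in the first place) before they reach print().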

Cheers,
-- 
Cameron Simpson c...@zip.com.au

You don't have to live on the edge, but you have to know where it is.
- Scott Lilliott, c...@swl.msd.ray.com

Re: Unicode issue with Python v3.3

2013-04-10 Thread nagia . retsina

Firstly, thank you for taking a look at the code.

The doctype comes from the attempt of the script metrites.py to open and read 
the 'index.html' file.

But I don't know how to open it as a byte file instead of a text file.
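
For reference, the two ways of opening a file in Python 3 can be sketched as follows. The temporary file stands in for index.html and the UTF-8 encoding is an assumption; neither is taken from the actual script.

```python
import os
import tempfile

# Stand-in for index.html so the sketch is self-contained.
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write("<!DOCTYPE html>\n<p>Καλημέρα</p>\n".encode("utf-8"))
    path = tmp.name

with open(path, "rb") as f:                # binary mode: read() returns bytes
    raw = f.read()

with open(path, encoding="utf-8") as f:    # text mode: read() returns str
    text = f.read()

os.remove(path)

print(type(raw).__name__, type(text).__name__)
```

Data read in binary mode has to be decoded before it is printed as text; data read in text mode can be passed to print() directly.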

Unicode issue with Python v3.3

2013-04-09 Thread Νίκος Γκρ33κ

Hello, I am still trying to alter the code from Python 2.6 to 3.3.

Everything is set up except for the unicode error that you can see if you go to 
http://superhost.gr

Can anyone help with this?
I even tried to replace print() with sys.stdout.buffer() but I still get the 
same unicode issue.

I don't know what to try anymore.

Re: Unicode issue with Python v3.3

2013-04-09 Thread Ian Kelly

On Tue, Apr 9, 2013 at 3:10 PM, Νίκος Γκρ33κ nikos.gr...@gmail.com wrote:
 Hello, I am still trying to alter the code from Python 2.6 to 3.3.

 Everything is set up except for the unicode error that you can see if you go to 
 http://superhost.gr

 Can anyone help with this?
 I even tried to replace print() with sys.stdout.buffer() but I still get the 
 same unicode issue.

 I don't know what to try anymore.

It seems to be failing on the line:

host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0]

So the obvious question to ask is: what are the contents of
os.environ['REMOTE_ADDR'] when this line is reached?

And why are you still trying to solve these sorts of problems on your
production website?  Do you not have a development or staging
environment?
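
One way to look at the REMOTE_ADDR question without taking the page down is to make the lookup defensive. A minimal sketch; resolve_host and its fallback argument are illustrative names, not part of the original script, and the environment is passed in as a mapping so it can be tried from the shell.

```python
import socket


def resolve_host(environ, fallback="unknown"):
    """Reverse-resolve REMOTE_ADDR, falling back when it is missing or unresolvable."""
    addr = environ.get("REMOTE_ADDR")
    if not addr:
        # Not running under a web server (e.g. started from the shell).
        return fallback
    try:
        return socket.gethostbyaddr(addr)[0]
    except (socket.herror, socket.gaierror):
        # No reverse DNS entry for this address: use the raw address instead.
        return addr


print(resolve_host({}))
```

In the CGI script this would be called as resolve_host(os.environ); logging the raw address next to the result shows what the variable contained when the line was reached.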

Re: Unicode issue with Python v3.3

2013-04-09 Thread nagia . retsina

On Wednesday, 10 April 2013 12:34:25 AM UTC+3, user Ian wrote:
 On Tue, Apr 9, 2013 at 3:10 PM, Νίκος Γκρ33κ nikos.gr...@gmail.com wrote:
 
  Hello, I am still trying to alter the code from Python 2.6 to 3.3.
 
  Everything is set up except for the unicode error that you can see if you go 
  to http://superhost.gr
 
  Can anyone help with this?
  I even tried to replace print() with sys.stdout.buffer() but I still get the 
  same unicode issue.
 
  I don't know what to try anymore.
 
 It seems to be failing on the line:
 
 host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0]
 
 So the obvious question to ask is: what are the contents of
 os.environ['REMOTE_ADDR'] when this line is reached?
 
 And why are you still trying to solve these sorts of problems on your
 production website?  Do you not have a development or staging
 environment?

No, forget this line; this is not the problem.
No, I don't have a testing environment; I altered all the code from 2.6 to 3.3 
in the live environment.

I strongly believe there is something going wrong with the print() calls. Those are 
causing the unicode issues, much like these changes from:

quote = random.choice( list( open( "/home/nikos/www/data/private/quotes.txt", ) 
) )

to:

quote = random.choice( list( open( "/home/nikos/www/data/private/quotes.txt", 
encoding="utf-8" ) ) )

in order for the open() to work.
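
The change described above can be written a little more defensively with a context manager, so the file is closed after reading. This is a sketch under assumptions: random_quote is a made-up helper, the temporary file stands in for quotes.txt, and the file is assumed to be UTF-8.

```python
import os
import random
import tempfile


def random_quote(path, encoding="utf-8"):
    """Return one random non-empty line from a text file."""
    # Passing encoding explicitly means the result does not depend on
    # the locale the web server happens to run under.
    with open(path, encoding=encoding) as f:
        quotes = [line.strip() for line in f if line.strip()]
    return random.choice(quotes)


# Self-contained demo with a temporary stand-in for quotes.txt.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("Γνῶθι σεαυτόν\nΠάντα ῥεῖ\n")
    demo_path = tmp.name

picked = random_quote(demo_path)
os.remove(demo_path)
print(picked)
```

Without the encoding argument, Python 3 falls back to the locale's preferred encoding, which is a plausible reason the original open() call failed under the web server.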

Re: Unicode issue with Python v3.3

2013-04-09 Thread Steven D'Aprano

On Tue, 09 Apr 2013 20:16:12 -0700, nagia.retsina wrote:

On Wednesday, 10 April 2013 12:34:25 AM UTC+3, user Ian wrote:
On Tue, Apr 9, 2013 at 3:10 PM, Νίκος Γκρ33κ nikos.gr...@gmail.com
wrote:

Hello, I am still trying to alter the code from Python 2.6 to 3.3.

Everything is set up except for the unicode error that you can see if
you go to http://superhost.gr

Can anyone help with this?

I even tried to replace print() with sys.stdout.buffer() but I still
get the same unicode issue.

I don't know what to try anymore.

It seems to be failing on the line:

host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0]

So the obvious question to ask is: what are the contents of
os.environ['REMOTE_ADDR'] when this line is reached?
[...]

No, forget this line; this is not the problem. No, I don't have a testing
environment; I altered all the code from 2.6 to 3.3 in the live
environment.

I strongly believe there is something going wrong with the print() calls.

Obviously you know what the problem is much better than the Python
interpreter.

I suggest you open a bug report:

Errors printing bytes are wrongly claimed to be socket errors

and see what happens.

Or, you can listen to people who actually know what they are talking
about, and look at the actual error, which has NOTHING to do with print.

What does os.environ['REMOTE_ADDR'] give? Until you answer that question,
you won't make any progress.

--
Steven

Re: Unicode issue with Python v3.3

2013-04-09 Thread Chris Angelico

On Wed, Apr 10, 2013 at 2:25 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
On Tue, 09 Apr 2013 20:16:12 -0700, nagia.retsina wrote:

On Wednesday, 10 April 2013 12:34:25 AM UTC+3, user Ian wrote:
On Tue, Apr 9, 2013 at 3:10 PM, Νίκος Γκρ33κ nikos.gr...@gmail.com
wrote:

Hello, I am still trying to alter the code from Python 2.6 to 3.3.

Everything is set up except for the unicode error that you can see if
you go to http://superhost.gr

Can anyone help with this?

I even tried to replace print() with sys.stdout.buffer() but I still
get the same unicode issue.

I don't know what to try anymore.

It seems to be failing on the line:

host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0]

So the obvious question to ask is: what are the contents of
os.environ['REMOTE_ADDR'] when this line is reached?
[...]

No, forget this line; this is not the problem. No, I don't have a testing
environment; I altered all the code from 2.6 to 3.3 in the live
environment.

I strongly believe there is something going wrong with the print() calls.

Obviously you know what the problem is much better than the Python
interpreter.

I just went to the page and it started playing sound. Between that and
this arrogant refusal to believe either the interpreter or the people
who are freely donating time to assist, I'm done. No more looking at
Nikos's home page to try to figure out his problems. Have fun, Nikos.

ChrisA

Re: Unicode issue with Python v3.3

2013-04-09 Thread rusi

An interesting case of two threads:

On Apr 10, 9:46 am, Chris Angelico ros...@gmail.com wrote:
 On Wed, Apr 10, 2013 at 2:25 PM, Steven D'Aprano

  Obviously you know what the problem is much better than the Python
  interpreter.

 I just went to the page and it started playing sound. Between that and
 this arrogant refusal to believe either the interpreter or the people
 who are freely donating time to assist, I'm done. No more looking at
 Nikos's home page to try to figure out his problems. Have fun, Nikos.

 ChrisA

Some swans are black
Some homo sapiens have negative IQ

[issue6077] Unicode issue with tempfile on Windows

2009-11-29 Thread Amaury Forgeot d'Arc


Amaury Forgeot d'Arc amaur...@gmail.com added the comment:

Fixed with r76593 (py3k) and r76594 (release31-maint)

--
resolution: accepted -> fixed
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6077
___

[issue6077] Unicode issue with tempfile on Windows

2009-11-20 Thread Antoine Pitrou


Changes by Antoine Pitrou pit...@free.fr:


--
components: +IO -Library (Lib)
priority:  -> normal
stage:  -> patch review
versions: +Python 3.1, Python 3.2 -Python 3.0


[issue6077] Unicode issue with tempfile on Windows

2009-11-20 Thread Antoine Pitrou


Antoine Pitrou pit...@free.fr added the comment:

The patch looks ok to me.

--
assignee:  -> amaury.forgeotdarc
nosy: +pitrou
resolution:  -> accepted
stage: patch review -> commit review


Re: unicode issue

2009-10-06 Thread Gabriel Genellina

On Thu, 01 Oct 2009 12:10:58 -0300, Walter Dörwald wal...@livinglogic.de  
wrote:

On 01.10.09 16:09, Hyuga wrote:

On Sep 30, 3:34 am, gentlestone tibor.b...@hotmail.com wrote:



_MAP = {
    # LATIN
    u'À': 'A', u'Á': 'A', u'Â': 'A', u'Ã': 'A', u'Ä': 'A', u'Å': 'A',
    u'Æ': 'AE', u'Ç':'C', [...long table...]
}

def downcode(name):
    """
    >>> downcode(u"Žabovitá zmiešaná kaša")
    u'Zabovita zmiesana kasa'
    """
    for key, value in _MAP.iteritems():
        name = name.replace(key, value)
    return name


import unicodedata

def downcode(name):
    return unicodedata.normalize("NFD", name)\
        .encode("ascii", "ignore")\
        .decode("ascii")


This article [1] shows a mixed technique, decomposing characters when such  
info is available in the Unicode tables, and also allowing for a custom  
mapping when not.


[1] http://effbot.org/zone/unicode-convert.htm
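
The mixed technique can be sketched roughly as follows. This is not the code from the linked article, only an illustration written for Python 3; the FALLBACK table and its few entries are hypothetical and would need to be extended for real use.

```python
import unicodedata

# Hypothetical fallback table for characters that have no canonical decomposition.
FALLBACK = {"ł": "l", "Ł": "L", "ß": "ss", "ø": "o", "Ø": "O", "æ": "ae", "Æ": "AE"}


def downcode(name):
    out = []
    for ch in name:
        if ch in FALLBACK:
            # Custom mapping wins where the Unicode tables cannot help.
            out.append(FALLBACK[ch])
            continue
        # Decompose into base letter + combining marks, then keep only
        # the ASCII part, which drops the combining accents.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append(decomposed.encode("ascii", "ignore").decode("ascii"))
    return "".join(out)


print(downcode("Žabovitá zmiešaná kaša"))
print(downcode("Łódź, Straße"))
```

Characters that have neither a decomposition nor a table entry (Cyrillic or Greek letters, for instance) are silently dropped by this sketch, so the table has to cover every script that matters.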

--
Gabriel Genellina


Re: unicode issue

2009-10-01 Thread gentlestone

save in utf-8 the coding declaration also has to be utf-8

OK, I understand, but what's the problem? Unfortunately it seems that
the Python interactive
mode doesn't have unicode support. It recognizes the latin-1 encoding
only.

So I have 2 options for how to write the doctest:
1. Replace native characters with their encoded representation, like
u"\u017dabovit\xe1 zmie\u0161an\xe1 ka\u0161a" instead of u"Žabovitá
zmiešaná kaša"
2. Use the latin-1 encoding declaration, while the file is saved in utf-8

The first is bad because doctest is a great documentation tool and it
is probably the main reason I use Python. And something like
u"\u017dabovit\xe1 zmie\u0161an\xe1 ka\u0161a" is not the best
documentation style. But the tests work.

The second is bad because the declaration is incorrect, and if I use
it in a Django model declaration, for example, I get bad data in the
application.

So what is the solution? Back to Java? :-)

Re: unicode issue

2009-10-01 Thread Dave Angel


gentlestone wrote:

save in utf-8 the coding declaration also has to be utf-8



OK, I understand, but what's the problem? Unfortunately it seems that
the Python interactive
mode doesn't have unicode support. It recognizes the latin-1 encoding
only.

So I have 2 options for how to write the doctest:
1. Replace native characters with their encoded representation, like
u"\u017dabovit\xe1 zmie\u0161an\xe1 ka\u0161a" instead of u"Žabovitá
zmiešaná kaša"
2. Use the latin-1 encoding declaration, while the file is saved in utf-8

The first is bad because doctest is a great documentation tool and it
is probably the main reason I use Python. And something like
u"\u017dabovit\xe1 zmie\u0161an\xe1 ka\u0161a" is not the best
documentation style. But the tests work.

The second is bad because the declaration is incorrect, and if I use
it in a Django model declaration, for example, I get bad data in the
application.

So what is the solution? Back to Java? :-)

  
Wait -- don't give up yet.  Since I'm one of the ones who (partially) 
steered you wrong, let me try to help.


Key variable here is how your text editor behaves.  Since I've never 
taken my (programming) text editor out of ASCII mode before this week, 
it took some experimenting (and more importantly a message from Piet on 
this thread) to make sense of things.  I think I now know how to make my 
own editor (Komodo IDE) behave in this environment, and you probably can 
do as well or better.  In fact, judging from your messages, you probably 
are doing much better on the editor front.


When I tried this morning to re-open that test file from yesterday, many 
of the characters were all messed up.  I was okay as long as the project 
was still open, but not today.  The editor itself apparently looks to 
that encoding declaration when it's deciding how to interpret the bytes 
on disk.


So I did the following, using Komodo IDE.  I created a new file in the 
project.  Before saving it, I used 
Edit->CurrentFileSettings->Properties->Encoding to set it to UTF-8.  
*NOW* I pasted the stuff from your email message.  And added the

#-*- coding: utf-8 -*-

as the second line of the file.   Notice it's *NOT* latin-1.

At this point I save and run the file, and it seems to work fine.

My guess is that I could set these as default settings in Komodo, if I 
were doing UTF-8 very often, and it would become painless.  I know I 
have certain stuff in my python template, and could add that encoding 
line as well.



Anyway, that gets us to the step of running the doctest.  The trick here 
seems to be that we need to define the docstring as a Unicode docstring 
to have it interpreted correctly.  Try adding the u in front of the 
triple quote as follows:


def downcode(name):
    u"""
    >>> downcode(u"Žabovitá zmiešaná kaša")
    u'Zabovita zmiesana kasa'
    """
    for key, value in _MAP.iteritems():
        name = name.replace(key, value)
    return name

Now, if the doctest passes, we seem to be in good shape.
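
For comparison, here is how the same doctest might look on Python 3, where every string literal is already Unicode and no u prefix is needed on the docstring. This is a sketch under assumptions: the three-entry table stands in for the full _MAP, and the test is run through DocTestFinder and DocTestRunner directly so that the result can be inspected.

```python
import doctest


def downcode(name):
    """Replace a few accented letters with ASCII ones.

    >>> downcode("Žabovitá zmiešaná kaša")
    'Zabovita zmiesana kasa'
    """
    # Tiny stand-in for the full _MAP table from the thread.
    table = str.maketrans({"Ž": "Z", "á": "a", "š": "s"})
    return name.translate(table)


# Collect and run the examples found in downcode's docstring.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(downcode, name="downcode", globs={"downcode": downcode}):
    runner.run(test)

results = runner.summarize(verbose=False)
print(results)
```

A result with failed equal to 0 means the docstring example passed; in a normal module, doctest.testmod() does the collecting and running in one call.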

There's another problem, that hopefully somebody else can help with.  
That's if doctest needs to report an error.  When I deliberately changed 
the expected string I get an error like the following.


UnicodeEncodeError: 'ascii' codec can't encode character u'\u017d' in 
position 150: ordinal not in range(128)

I get a similar error if running the -v option on doctest.   (Note that 
I do *NOT* get the error when running inside Komodo.  And what I've read 
implies that the same would be true if running inside IDLE.)  The 
problem is similar to the one you'd have doing a simple:


   print u"\u017d"

I think these are avoided if  sys.stdout.encoding (and maybe 
sys.stderr.encoding) are set to utf-8.  On my system they're set to 
None, which says to use the system default encoding.  On my system 
that would be ASCII, so I get the error.  But perhaps yours is already 
something better.


I found links:  
http://drj11.wordpress.com/2007/05/14/python-how-is-sysstdoutencoding-chosen/

http://wiki.python.org/moin/PrintFails

http://lists.macromates.com/textmate/2008-June/025735.html
  which indicate you may want to try:  


set LC_CTYPE=en_GB.utf-8 python

at the command prompt before running python.  This could be system specific;  
it didn't work for me on XP.

The workaround that works for me (so far) is:

if __name__ == "__main__":
    import sys, codecs
    sys.stdout = codecs.getwriter('utf8')(sys.stdout)

    print u"Žabovitá zmiešaná kaša"
    import doctest
    doctest.testmod()

The codecs line tells python that stdout should use utf-8.  That doesn't make 
the characters look good on my console, but at least it avoids the errors.  I'm 
guessing that on my system I should use latin1 here instead of utf8.  But I 
don't want to confuse things.
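
The same idea on Python 3 can be sketched with io.TextIOWrapper. This is only an illustration: a BytesIO object stands in for sys.stdout.buffer so that the encoded output can be inspected, and the choice of UTF-8 is an assumption about what the console or web server expects.

```python
import io
import sys

text = "Žabovitá zmiešaná kaša"

# What the interpreter picked for the real stdout (may be None for odd streams).
print("stdout encoding:", getattr(sys.stdout, "encoding", None))

# An ASCII-only stream cannot represent this text at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# A text stream that always encodes to UTF-8, whatever the locale says.
raw = io.BytesIO()
utf8_out = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
print(text, file=utf8_out)
utf8_out.flush()

encoded = raw.getvalue()
print(encoded)
```

With a real console one would wrap sys.stdout.buffer in the same way, or set the PYTHONIOENCODING environment variable before starting Python.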


HTH

DaveA


Re: unicode issue

2009-10-01 Thread Hyuga

On Sep 30, 3:34 am, gentlestone tibor.b...@hotmail.com wrote:
 Why doesn't this code work on Python 2.6? Or how can I do this job?

 _MAP = {
     # LATIN
     u'À': 'A', u'Á': 'A', u'Â': 'A', u'Ã': 'A', u'Ä': 'A', u'Å': 'A',
 u'Æ': 'AE', u'Ç':'C',
     u'È': 'E', u'É': 'E', u'Ê': 'E', u'Ë': 'E', u'Ì': 'I', u'Í': 'I',
 u'Î': 'I',
     u'Ï': 'I', u'Ð': 'D', u'Ñ': 'N', u'Ò': 'O', u'Ó': 'O', u'Ô': 'O',
 u'Õ': 'O', u'Ö':'O',
     u'Ő': 'O', u'Ø': 'O', u'Ù': 'U', u'Ú': 'U', u'Û': 'U', u'Ü': 'U',
 u'Ű': 'U',
     u'Ý': 'Y', u'Þ': 'TH', u'ß': 'ss', u'à':'a', u'á':'a', u'â': 'a',
 u'ã': 'a', u'ä':'a',
     u'å': 'a', u'æ': 'ae', u'ç': 'c', u'è': 'e', u'é': 'e', u'ê': 'e',
 u'ë': 'e',
     u'ì': 'i', u'í': 'i', u'î': 'i', u'ï': 'i', u'ð': 'd', u'ñ': 'n',
 u'ò': 'o', u'ó':'o',
     u'ô': 'o', u'õ': 'o', u'ö': 'o', u'ő': 'o', u'ø': 'o', u'ù': 'u',
 u'ú': 'u',
     u'û': 'u', u'ü': 'u', u'ű': 'u', u'ý': 'y', u'þ': 'th', u'ÿ': 'y',
     # LATIN_SYMBOLS
     u'©':'(c)',
     # GREEK
     u'α':'a', u'β':'b', u'γ':'g', u'δ':'d', u'ε':'e', u'ζ':'z',
 u'η':'h', u'θ':'8',
     u'ι':'i', u'κ':'k', u'λ':'l', u'μ':'m', u'ν':'n', u'ξ':'3',
 u'ο':'o', u'π':'p',
     u'ρ':'r', u'σ':'s', u'τ':'t', u'υ':'y', u'φ':'f', u'χ':'x',
 u'ψ':'ps', u'ω':'w',
     u'ά':'a', u'έ':'e', u'ί':'i', u'ό':'o', u'ύ':'y', u'ή':'h',
 u'ώ':'w', u'ς':'s',
     u'ϊ':'i', u'ΰ':'y', u'ϋ':'y', u'ΐ':'i',
     u'Α':'A', u'Β':'B', u'Γ':'G', u'Δ':'D', u'Ε':'E', u'Ζ':'Z',
 u'Η':'H', u'Θ':'8',
     u'Ι':'I', u'Κ':'K', u'Λ':'L', u'Μ':'M', u'Ν':'N', u'Ξ':'3',
 u'Ο':'O', u'Π':'P',
     u'Ρ':'R', u'Σ':'S', u'Τ':'T', u'Υ':'Y', u'Φ':'F', u'Χ':'X',
 u'Ψ':'PS', u'Ω':'W',
     u'Ά':'A', u'Έ':'E', u'Ί':'I', u'Ό':'O', u'Ύ':'Y', u'Ή':'H',
 u'Ώ':'W', u'Ϊ':'I', u'Ϋ':'Y',
     # TURKISH
     u'ş':'s', u'Ş':'S', u'ı':'i', u'İ':'I', u'ç':'c', u'Ç':'C',
 u'ü':'u', u'Ü':'U',
     u'ö':'o', u'Ö':'O', u'ğ':'g', u'Ğ':'G',
     # RUSSIAN
     u'а':'a', u'б':'b', u'в':'v', u'г':'g', u'д':'d', u'е':'e',
 u'ё':'yo', u'ж':'zh',
     u'з':'z', u'и':'i', u'й':'j', u'к':'k', u'л':'l', u'м':'m',
 u'н':'n', u'о':'o',
     u'п':'p', u'р':'r', u'с':'s', u'т':'t', u'у':'u', u'ф':'f',
 u'х':'h', u'ц':'c',
     u'ч':'ch', u'ш':'sh', u'щ':'sh', u'ъ':'', u'ы':'y', u'ь':'',
 u'э':'e', u'ю':'yu', u'я':'ya',
     u'А':'A', u'Б':'B', u'В':'V', u'Г':'G', u'Д':'D', u'Е':'E',
 u'Ё':'Yo', u'Ж':'Zh',
     u'З':'Z', u'И':'I', u'Й':'J', u'К':'K', u'Л':'L', u'М':'M',
 u'Н':'N', u'О':'O',
     u'П':'P', u'Р':'R', u'С':'S', u'Т':'T', u'У':'U', u'Ф':'F',
 u'Х':'H', u'Ц':'C',
     u'Ч':'Ch', u'Ш':'Sh', u'Щ':'Sh', u'Ъ':'', u'Ы':'Y', u'Ь':'',
 u'Э':'E', u'Ю':'Yu', u'Я':'Ya',
     # UKRAINIAN
     u'Є':'Ye', u'І':'I', u'Ї':'Yi', u'Ґ':'G', u'є':'ye', u'і':'i',
 u'ї':'yi', u'ґ':'g',
     # CZECH
     u'č':'c', u'ď':'d', u'ě':'e', u'ň':'n', u'ř':'r', u'š':'s',
 u'ť':'t', u'ů':'u',
     u'ž':'z', u'Č':'C', u'Ď':'D', u'Ě':'E', u'Ň':'N', u'Ř':'R',
 u'Š':'S', u'Ť':'T', u'Ů':'U', u'Ž':'Z',
     # POLISH
     u'ą':'a', u'ć':'c', u'ę':'e', u'ł':'l', u'ń':'n', u'ó':'o',
 u'ś':'s', u'ź':'z',
     u'ż':'z', u'Ą':'A', u'Ć':'C', u'Ę':'e', u'Ł':'L', u'Ń':'N',
 u'Ó':'o', u'Ś':'S',
     u'Ź':'Z', u'Ż':'Z',
     # LATVIAN
     u'ā':'a', u'č':'c', u'ē':'e', u'ģ':'g', u'ī':'i', u'ķ':'k',
 u'ļ':'l', u'ņ':'n',
     u'š':'s', u'ū':'u', u'ž':'z', u'Ā':'A', u'Č':'C', u'Ē':'E',
 u'Ģ':'G', u'Ī':'i',
     u'Ķ':'k', u'Ļ':'L', u'Ņ':'N', u'Š':'S', u'Ū':'u', u'Ž':'Z'

 }

 def downcode(name):
     """
     >>> downcode(u"Žabovitá zmiešaná kaša")
     u'Zabovita zmiesana kasa'
     """
     for key, value in _MAP.iteritems():
         name = name.replace(key, value)
     return name

Though C Python is pretty optimized under the hood for this sort of
single-character replacement, this still seems pretty inefficient
since you're calling replace for every character you want to map.  I
think that a better approach might be something like:

def downcode(name):
    return ''.join(_MAP.get(c, c) for c in name)

Or using string.translate:

import string
def downcode(name):
    table = string.maketrans(
        'ÀÁÂÃÄÅ...',
        'AA...')
    return name.translate(table)
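
Both suggestions can be tried side by side on Python 3. This is a sketch, not the poster's code: the five-entry _MAP is a stand-in for the full table, and str.maketrans with a dict is used instead of string.maketrans, which in Python 2 only handles byte strings of equal length and so cannot express replacements such as 'AE'.

```python
# A tiny excerpt standing in for the full _MAP table from the thread.
_MAP = {"Ž": "Z", "á": "a", "š": "s", "Æ": "AE", "ß": "ss"}

# str.maketrans accepts a dict whose keys are single characters;
# the values may be strings of any length.
_TABLE = str.maketrans(_MAP)


def downcode_join(name):
    # One dictionary lookup per character of the input.
    return "".join(_MAP.get(c, c) for c in name)


def downcode_translate(name):
    # Same mapping, applied by str.translate in a single pass.
    return name.translate(_TABLE)


sample = "Žabovitá zmiešaná kaša, Æther, Straße"
print(downcode_join(sample))
print(downcode_translate(sample))
```

Either version walks the input once, instead of calling replace() once per table entry.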

Re: unicode issue

2009-10-01 Thread Walter Dörwald

On 01.10.09 16:09, Hyuga wrote:
 On Sep 30, 3:34 am, gentlestone tibor.b...@hotmail.com wrote:
 Why doesn't this code work on Python 2.6? Or how can I do this job?

 _MAP = {
 # LATIN
 u'À': 'A', u'Á': 'A', u'Â': 'A', u'Ã': 'A', u'Ä': 'A', u'Å': 'A',
 u'Æ': 'AE', u'Ç':'C',
 u'È': 'E', u'É': 'E', u'Ê': 'E', u'Ë': 'E', u'Ì': 'I', u'Í': 'I',
 u'Î': 'I',
 u'Ï': 'I', u'Ð': 'D', u'Ñ': 'N', u'Ò': 'O', u'Ó': 'O', u'Ô': 'O',
 u'Õ': 'O', u'Ö':'O',
 u'Ő': 'O', u'Ø': 'O', u'Ù': 'U', u'Ú': 'U', u'Û': 'U', u'Ü': 'U',
 u'Ű': 'U',
 u'Ý': 'Y', u'Þ': 'TH', u'ß': 'ss', u'à':'a', u'á':'a', u'â': 'a',
 u'ã': 'a', u'ä':'a',
 u'å': 'a', u'æ': 'ae', u'ç': 'c', u'è': 'e', u'é': 'e', u'ê': 'e',
 u'ë': 'e',
 u'ì': 'i', u'í': 'i', u'î': 'i', u'ï': 'i', u'ð': 'd', u'ñ': 'n',
 u'ò': 'o', u'ó':'o',
 u'ô': 'o', u'õ': 'o', u'ö': 'o', u'ő': 'o', u'ø': 'o', u'ù': 'u',
 u'ú': 'u',
 u'û': 'u', u'ü': 'u', u'ű': 'u', u'ý': 'y', u'þ': 'th', u'ÿ': 'y',
 # LATIN_SYMBOLS
 u'©':'(c)',
 # GREEK
 u'α':'a', u'β':'b', u'γ':'g', u'δ':'d', u'ε':'e', u'ζ':'z',
 u'η':'h', u'θ':'8',
 u'ι':'i', u'κ':'k', u'λ':'l', u'μ':'m', u'ν':'n', u'ξ':'3',
 u'ο':'o', u'π':'p',
 u'ρ':'r', u'σ':'s', u'τ':'t', u'υ':'y', u'φ':'f', u'χ':'x',
 u'ψ':'ps', u'ω':'w',
 u'ά':'a', u'έ':'e', u'ί':'i', u'ό':'o', u'ύ':'y', u'ή':'h',
 u'ώ':'w', u'ς':'s',
 u'ϊ':'i', u'ΰ':'y', u'ϋ':'y', u'ΐ':'i',
 u'Α':'A', u'Β':'B', u'Γ':'G', u'Δ':'D', u'Ε':'E', u'Ζ':'Z',
 u'Η':'H', u'Θ':'8',
 u'Ι':'I', u'Κ':'K', u'Λ':'L', u'Μ':'M', u'Ν':'N', u'Ξ':'3',
 u'Ο':'O', u'Π':'P',
 u'Ρ':'R', u'Σ':'S', u'Τ':'T', u'Υ':'Y', u'Φ':'F', u'Χ':'X',
 u'Ψ':'PS', u'Ω':'W',
 u'Ά':'A', u'Έ':'E', u'Ί':'I', u'Ό':'O', u'Ύ':'Y', u'Ή':'H',
 u'Ώ':'W', u'Ϊ':'I', u'Ϋ':'Y',
 # TURKISH
 u'ş':'s', u'Ş':'S', u'ı':'i', u'İ':'I', u'ç':'c', u'Ç':'C',
 u'ü':'u', u'Ü':'U',
 u'ö':'o', u'Ö':'O', u'ğ':'g', u'Ğ':'G',
 # RUSSIAN
 u'а':'a', u'б':'b', u'в':'v', u'г':'g', u'д':'d', u'е':'e',
 u'ё':'yo', u'ж':'zh',
 u'з':'z', u'и':'i', u'й':'j', u'к':'k', u'л':'l', u'м':'m',
 u'н':'n', u'о':'o',
 u'п':'p', u'р':'r', u'с':'s', u'т':'t', u'у':'u', u'ф':'f',
 u'х':'h', u'ц':'c',
 u'ч':'ch', u'ш':'sh', u'щ':'sh', u'ъ':'', u'ы':'y', u'ь':'',
 u'э':'e', u'ю':'yu', u'я':'ya',
 u'А':'A', u'Б':'B', u'В':'V', u'Г':'G', u'Д':'D', u'Е':'E',
 u'Ё':'Yo', u'Ж':'Zh',
 u'З':'Z', u'И':'I', u'Й':'J', u'К':'K', u'Л':'L', u'М':'M',
 u'Н':'N', u'О':'O',
 u'П':'P', u'Р':'R', u'С':'S', u'Т':'T', u'У':'U', u'Ф':'F',
 u'Х':'H', u'Ц':'C',
 u'Ч':'Ch', u'Ш':'Sh', u'Щ':'Sh', u'Ъ':'', u'Ы':'Y', u'Ь':'',
 u'Э':'E', u'Ю':'Yu', u'Я':'Ya',
 # UKRAINIAN
 u'Є':'Ye', u'І':'I', u'Ї':'Yi', u'Ґ':'G', u'є':'ye', u'і':'i',
 u'ї':'yi', u'ґ':'g',
 # CZECH
 u'č':'c', u'ď':'d', u'ě':'e', u'ň':'n', u'ř':'r', u'š':'s',
 u'ť':'t', u'ů':'u',
 u'ž':'z', u'Č':'C', u'Ď':'D', u'Ě':'E', u'Ň':'N', u'Ř':'R',
 u'Š':'S', u'Ť':'T', u'Ů':'U', u'Ž':'Z',
 # POLISH
 u'ą':'a', u'ć':'c', u'ę':'e', u'ł':'l', u'ń':'n', u'ó':'o',
 u'ś':'s', u'ź':'z',
 u'ż':'z', u'Ą':'A', u'Ć':'C', u'Ę':'E', u'Ł':'L', u'Ń':'N',
 u'Ó':'O', u'Ś':'S',
 u'Ź':'Z', u'Ż':'Z',
 # LATVIAN
 u'ā':'a', u'č':'c', u'ē':'e', u'ģ':'g', u'ī':'i', u'ķ':'k',
 u'ļ':'l', u'ņ':'n',
 u'š':'s', u'ū':'u', u'ž':'z', u'Ā':'A', u'Č':'C', u'Ē':'E',
 u'Ģ':'G', u'Ī':'I',
 u'Ķ':'K', u'Ļ':'L', u'Ņ':'N', u'Š':'S', u'Ū':'U', u'Ž':'Z'

 }

 def downcode(name):
     """
     >>> downcode(u"Žabovitá zmiešaná kaša")
     u'Zabovita zmiesana kasa'
     """
     for key, value in _MAP.iteritems():
         name = name.replace(key, value)
     return name
 
 Though CPython is pretty optimized under the hood for this sort of
 single-character replacement, this still seems pretty inefficient
 since you're calling replace for every character you want to map.  I
 think that a better approach might be something like:
 
 def downcode(name):
     return ''.join(_MAP.get(c, c) for c in name)
 
 Or using string.translate:
 
 import string
 def downcode(name):
     table = string.maketrans(
         'ÀÁÂÃÄÅ...',
         'AA...')
     return name.translate(table)
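
The translate idea quoted above can be sketched as follows. This is only an illustration under stated assumptions: it is written for Python 3 (the thread targets Python 2.6, where string.maketrans builds a byte table from two equal-length byte strings and so cannot express multi-character replacements such as 'ß' -> 'ss'), and _SMALL_MAP is a hypothetical, shortened stand-in for the full _MAP.

```python
# Python 3 sketch; _SMALL_MAP is a hypothetical, shortened stand-in for _MAP.
_SMALL_MAP = {
    "Ž": "Z", "á": "a", "š": "s",
    "ß": "ss", "Æ": "AE",
}

# str.maketrans accepts a dict whose keys are single characters and whose
# values may be replacement strings of any length.
_TABLE = str.maketrans(_SMALL_MAP)


def downcode(name):
    """Replace every mapped character in a single pass over the string."""
    return name.translate(_TABLE)


print(downcode("Žabovitá zmiešaná kaša"))
```

Building the table once at import time keeps each call to downcode() a single pass, instead of one replace() call per key in the map.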

Or even simpler:

import unicodedata

def downcode(name):
    return unicodedata.normalize("NFD", name)\
        .encode("ascii", "ignore")\
        .decode("ascii")

Servus,
   Walter

Re: unicode issue

2009-10-01 Thread Rami Chowdhury

On Thu, 01 Oct 2009 08:10:58 -0700, Walter Dörwald wal...@livinglogic.de  
wrote:



On 01.10.09 16:09, Hyuga wrote:

On Sep 30, 3:34 am, gentlestone tibor.b...@hotmail.com wrote:

Why doesn't this code work on Python 2.6? Or how can I do this job?

[snip _MAP]

[snip]

Or even simpler:

import unicodedata

def downcode(name):
    return unicodedata.normalize("NFD", name)\
        .encode("ascii", "ignore")\
        .decode("ascii")

Servus,
   Walter


As I understand it, the "ignore" argument to str.encode *removes* the
unencodable characters, rather than replacing them with an ASCII
approximation. Is that correct? If so, wouldn't that rather defeat the  
purpose?


--
Rami Chowdhury
Never attribute to malice that which can be attributed to stupidity --  
Hanlon's Razor

408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)

Re: unicode issue

2009-10-01 Thread Walter Dörwald

On 01.10.09 17:50, Rami Chowdhury wrote:
 On Thu, 01 Oct 2009 08:10:58 -0700, Walter Dörwald
 wal...@livinglogic.de wrote:
 
 On 01.10.09 16:09, Hyuga wrote:
 On Sep 30, 3:34 am, gentlestone tibor.b...@hotmail.com wrote:
 Why doesn't this code work on Python 2.6? Or how can I do this job?

 [snip _MAP]

 [snip]

 Or even simpler:

 import unicodedata

 def downcode(name):
     return unicodedata.normalize("NFD", name)\
         .encode("ascii", "ignore")\
         .decode("ascii")

 Servus,
Walter
 
 As I understand it, the "ignore" argument to str.encode *removes* the
 unencodable characters, rather than replacing them with an ASCII
 approximation. Is that correct? If so, wouldn't that rather defeat the
 purpose?

Yes, but any accented characters have been split into the base character
and the combining accent via normalize() before, so only the accent gets
removed. Of course non-decomposable characters will be removed
completely, but it would be possible to replace

   .encode("ascii", "ignore").decode("ascii")

with something like this:

   u"".join(c for c in name if unicodedata.category(c) != "Mn")
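
A minimal sketch of that variant, assuming Python 3 (where every str is Unicode); the function name strip_accents is made up for this illustration. It decomposes the string, then drops only the combining marks (category "Mn"), so characters without a decomposition are kept rather than deleted.

```python
import unicodedata


def strip_accents(name):
    """Drop combining marks (category "Mn") after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


print(strip_accents("Žabovitá zmiešaná kaša"))
# "Ł" has no decomposition, so it passes through unchanged here.
print(strip_accents("Łódź"))
```

The trade-off is that the result is no longer guaranteed to be pure ASCII; a mapping table is still needed for letters like "Ł" or "ß".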

Servus,
   Walter

Re: unicode issue

2009-10-01 Thread Peter Otten

Rami Chowdhury wrote:

 On Thu, 01 Oct 2009 08:10:58 -0700, Walter Dörwald wal...@livinglogic.de
 wrote:
 
 On 01.10.09 16:09, Hyuga wrote:
 On Sep 30, 3:34 am, gentlestone tibor.b...@hotmail.com wrote:
 Why doesn't this code work on Python 2.6? Or how can I do this job?

 [snip _MAP]

 [snip]

 Or even simpler:

 import unicodedata

 def downcode(name):
     return unicodedata.normalize("NFD", name)\
         .encode("ascii", "ignore")\
         .decode("ascii")

 Servus,
Walter
 
 As I understand it, the "ignore" argument to str.encode *removes* the
 unencodable characters, rather than replacing them with an ASCII
 approximation. Is that correct? If so, wouldn't that rather defeat the
 purpose?

You didn't take the normalization step into consideration. Example:

>>> import unicodedata
>>> s = u"Ä"
>>> unicodedata.normalize("NFD", s)
u'A\u0308'
>>> _.encode("ascii", "ignore")
'A'
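
The same session can be restated for Python 3, together with the caveat about non-decomposable characters. This is a sketch, not part of the original message; the helper name ascii_fold is an assumption made for the example.

```python
import unicodedata


def ascii_fold(text):
    """NFD-decompose, then discard everything that is not ASCII."""
    return (unicodedata.normalize("NFD", text)
            .encode("ascii", "ignore")
            .decode("ascii"))


# "Ä" decomposes into "A" plus U+0308, so only the accent is lost.
print(unicodedata.decomposition("Ä"), repr(ascii_fold("Ä")))

# "ł" has no decomposition, so the whole character disappears.
print(repr(unicodedata.decomposition("ł")), repr(ascii_fold("ł")))
```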




Re: unicode issue

2009-10-01 Thread Rami Chowdhury

On Thu, 01 Oct 2009 09:03:38 -0700, Walter Dörwald wal...@livinglogic.de  
wrote:


Yes, but any accented characters have been split into the base character
and the combining accent via normalize() before, so only the accent gets
removed. Of course non-decomposable characters will be removed
completely, but it would be possible to replace

   .encode("ascii", "ignore").decode("ascii")

with something like this:

   u"".join(c for c in name if unicodedata.category(c) != "Mn")

Servus,
   Walter


Thank you for the clarification!

--
Rami Chowdhury
Never attribute to malice that which can be attributed to stupidity --  
Hanlon's Razor

408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)

unicode issue

Why doesn't this code work on Python 2.6? Or how can I do this job?

_MAP = {
# LATIN
u'À': 'A', u'Á': 'A', u'Â': 'A', u'Ã': 'A', u'Ä': 'A', u'Å': 'A',
u'Æ': 'AE', u'Ç':'C',
u'È': 'E', u'É': 'E', u'Ê': 'E', u'Ë': 'E', u'Ì': 'I', u'Í': 'I',
u'Î': 'I',
u'Ï': 'I', u'Ð': 'D', u'Ñ': 'N', u'Ò': 'O', u'Ó': 'O', u'Ô': 'O',
u'Õ': 'O', u'Ö':'O',
u'Ő': 'O', u'Ø': 'O', u'Ù': 'U', u'Ú': 'U', u'Û': 'U', u'Ü': 'U',
u'Ű': 'U',
u'Ý': 'Y', u'Þ': 'TH', u'ß': 'ss', u'à':'a', u'á':'a', u'â': 'a',
u'ã': 'a', u'ä':'a',
u'å': 'a', u'æ': 'ae', u'ç': 'c', u'è': 'e', u'é': 'e', u'ê': 'e',
u'ë': 'e',
u'ì': 'i', u'í': 'i', u'î': 'i', u'ï': 'i', u'ð': 'd', u'ñ': 'n',
u'ò': 'o', u'ó':'o',
u'ô': 'o', u'õ': 'o', u'ö': 'o', u'ő': 'o', u'ø': 'o', u'ù': 'u',
u'ú': 'u',
u'û': 'u', u'ü': 'u', u'ű': 'u', u'ý': 'y', u'þ': 'th', u'ÿ': 'y',
# LATIN_SYMBOLS
u'©':'(c)',
# GREEK
u'α':'a', u'β':'b', u'γ':'g', u'δ':'d', u'ε':'e', u'ζ':'z',
u'η':'h', u'θ':'8',
u'ι':'i', u'κ':'k', u'λ':'l', u'μ':'m', u'ν':'n', u'ξ':'3',
u'ο':'o', u'π':'p',
u'ρ':'r', u'σ':'s', u'τ':'t', u'υ':'y', u'φ':'f', u'χ':'x',
u'ψ':'ps', u'ω':'w',
u'ά':'a', u'έ':'e', u'ί':'i', u'ό':'o', u'ύ':'y', u'ή':'h',
u'ώ':'w', u'ς':'s',
u'ϊ':'i', u'ΰ':'y', u'ϋ':'y', u'ΐ':'i',
u'Α':'A', u'Β':'B', u'Γ':'G', u'Δ':'D', u'Ε':'E', u'Ζ':'Z',
u'Η':'H', u'Θ':'8',
u'Ι':'I', u'Κ':'K', u'Λ':'L', u'Μ':'M', u'Ν':'N', u'Ξ':'3',
u'Ο':'O', u'Π':'P',
u'Ρ':'R', u'Σ':'S', u'Τ':'T', u'Υ':'Y', u'Φ':'F', u'Χ':'X',
u'Ψ':'PS', u'Ω':'W',
u'Ά':'A', u'Έ':'E', u'Ί':'I', u'Ό':'O', u'Ύ':'Y', u'Ή':'H',
u'Ώ':'W', u'Ϊ':'I', u'Ϋ':'Y',
# TURKISH
u'ş':'s', u'Ş':'S', u'ı':'i', u'İ':'I', u'ç':'c', u'Ç':'C',
u'ü':'u', u'Ü':'U',
u'ö':'o', u'Ö':'O', u'ğ':'g', u'Ğ':'G',
# RUSSIAN
u'а':'a', u'б':'b', u'в':'v', u'г':'g', u'д':'d', u'е':'e',
u'ё':'yo', u'ж':'zh',
u'з':'z', u'и':'i', u'й':'j', u'к':'k', u'л':'l', u'м':'m',
u'н':'n', u'о':'o',
u'п':'p', u'р':'r', u'с':'s', u'т':'t', u'у':'u', u'ф':'f',
u'х':'h', u'ц':'c',
u'ч':'ch', u'ш':'sh', u'щ':'sh', u'ъ':'', u'ы':'y', u'ь':'',
u'э':'e', u'ю':'yu', u'я':'ya',
u'А':'A', u'Б':'B', u'В':'V', u'Г':'G', u'Д':'D', u'Е':'E',
u'Ё':'Yo', u'Ж':'Zh',
u'З':'Z', u'И':'I', u'Й':'J', u'К':'K', u'Л':'L', u'М':'M',
u'Н':'N', u'О':'O',
u'П':'P', u'Р':'R', u'С':'S', u'Т':'T', u'У':'U', u'Ф':'F',
u'Х':'H', u'Ц':'C',
u'Ч':'Ch', u'Ш':'Sh', u'Щ':'Sh', u'Ъ':'', u'Ы':'Y', u'Ь':'',
u'Э':'E', u'Ю':'Yu', u'Я':'Ya',
# UKRAINIAN
u'Є':'Ye', u'І':'I', u'Ї':'Yi', u'Ґ':'G', u'є':'ye', u'і':'i',
u'ї':'yi', u'ґ':'g',
# CZECH
u'č':'c', u'ď':'d', u'ě':'e', u'ň':'n', u'ř':'r', u'š':'s',
u'ť':'t', u'ů':'u',
u'ž':'z', u'Č':'C', u'Ď':'D', u'Ě':'E', u'Ň':'N', u'Ř':'R',
u'Š':'S', u'Ť':'T', u'Ů':'U', u'Ž':'Z',
# POLISH
u'ą':'a', u'ć':'c', u'ę':'e', u'ł':'l', u'ń':'n', u'ó':'o',
u'ś':'s', u'ź':'z',
u'ż':'z', u'Ą':'A', u'Ć':'C', u'Ę':'E', u'Ł':'L', u'Ń':'N',
u'Ó':'O', u'Ś':'S',
u'Ź':'Z', u'Ż':'Z',
# LATVIAN
u'ā':'a', u'č':'c', u'ē':'e', u'ģ':'g', u'ī':'i', u'ķ':'k',
u'ļ':'l', u'ņ':'n',
u'š':'s', u'ū':'u', u'ž':'z', u'Ā':'A', u'Č':'C', u'Ē':'E',
u'Ģ':'G', u'Ī':'I',
u'Ķ':'K', u'Ļ':'L', u'Ņ':'N', u'Š':'S', u'Ū':'U', u'Ž':'Z'
}

def downcode(name):
    """
    >>> downcode(u"Žabovitá zmiešaná kaša")
    u'Zabovita zmiesana kasa'
    """
    for key, value in _MAP.iteritems():
        name = name.replace(key, value)
    return name

Re: unicode issue

2009-09-30 Thread Andre Engels

On Wed, Sep 30, 2009 at 9:34 AM, gentlestone tibor.b...@hotmail.com wrote:
 Why doesn't this code work on Python 2.6? Or how can I do this job?

Please be more specific than "it doesn't work":
* What exactly are you doing
* What were you expecting the result of that to be
* What is the actual result?

-- 
André Engels, andreeng...@gmail.com

Re: unicode issue

On 30. Sep., 09:41 h., Andre Engels andreeng...@gmail.com wrote:
 On Wed, Sep 30, 2009 at 9:34 AM, gentlestone tibor.b...@hotmail.com wrote:
  Why doesn't this code work on Python 2.6? Or how can I do this job?

 Please be more specific than "it doesn't work":
 * What exactly are you doing
 * What were you expecting the result of that to be
 * What is the actual result?

 --
 André Engels, andreeng...@gmail.com

* What exactly are you doing
replace non-ascii characters - see doctest documentation

* What were you expecting the result of that to be
see doctest documentation

* What is the actual result?
the actual result is the unchanged name

Re: unicode issue

2009-09-30 Thread Andre Engels

I get the feeling that the problem is with the Python interactive
mode. It does not have full Unicode support, so u"Žabovitá zmiešaná
kaša" is changed to u'\x8eabovit\xe1 zmie\x9aan\xe1 ka\x9aa'. If you
call your code from another program, it might work correctly.
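
One way to reproduce the escaped string above is sketched here in Python 3. The encodings are an assumption made for illustration (the thread does not name them): the console is taken to send cp1250 bytes, and each byte is then misread as one Latin-1 character.

```python
# Python 3 sketch. cp1250 as the console encoding is an assumption made
# for illustration; the thread does not say which code page was in use.
text = "Žabovitá zmiešaná kaša"

# The bytes a cp1250 terminal would send for this text ...
raw = text.encode("cp1250")

# ... misread one byte per character, as if they were Latin-1.
misread = raw.decode("latin-1")

print(ascii(misread))

# "Ž" (U+017D) has turned into U+008E, so a map keyed on "Ž" never matches.
print("Ž" in misread)
```

Under that assumption, characters such as Ž and š arrive as different code points, which would explain why replacing by map keys leaves them untouched.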


-- 
André Engels, andreeng...@gmail.com

Re: unicode issue

On 30. Sep., 10:35 h., Andre Engels andreeng...@gmail.com wrote:
 I get the feeling that the problem is with the Python interactive
 mode. It does not have full Unicode support, so u"Žabovitá zmiešaná
 kaša" is changed to u'\x8eabovit\xe1 zmie\x9aan\xe1 ka\x9aa'. If you
 call your code from another program, it might work correctly.

 --
 André Engels, andreeng...@gmail.com

Thanks a lot.

I spent 2 days of my life because of this.

So it seems doctests are unusable for non-English users in Python.

Re: unicode issue

On 30. Sep., 10:43 h., gentlestone tibor.b...@hotmail.com wrote:
 On 30. Sep., 10:35 h., Andre Engels andreeng...@gmail.com wrote:

  I get the feeling that the problem is with the Python interactive
  mode. It does not have full Unicode support, so u"Žabovitá zmiešaná
  kaša" is changed to u'\x8eabovit\xe1 zmie\x9aan\xe1 ka\x9aa'. If you
  call your code from another program, it might work correctly.

  --
  André Engels, andreeng...@gmail.com

 Thanks a lot.

 I spent 2 days of my life because of this.

 So it seems doctests are unusable for non-English users in Python.

yes, you are right, now it works:

def slugify(name):
    """
    >>> slugify(u'\u017dabovit\xe1 zmie\u0161an\xe1 ka\u0161a s.r.o')
    u'zabovita-zmiesana-kasa-sro'
    """
    for key, value in _MAP.iteritems():
        name = name.replace(key, value)
    return defaultfilters.slugify(name)
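
For comparison, here is a sketch of the same kind of doctest under Python 3, where string literals are Unicode and source files default to UTF-8, so the escapes are not needed. It assumes the file is saved as UTF-8; _SMALL_MAP and run_doctest are hypothetical names introduced only for this example, and the Django slugify step is left out.

```python
import doctest

# Hypothetical, shortened stand-in for the full _MAP.
_SMALL_MAP = {"Ž": "Z", "á": "a", "š": "s"}


def downcode(name):
    """
    >>> downcode("Žabovitá zmiešaná kaša")
    'Zabovita zmiesana kasa'
    """
    return "".join(_SMALL_MAP.get(c, c) for c in name)


def run_doctest(func):
    """Run the doctest examples in func's docstring and return the results."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0)
    return doctest.DocTestRunner(verbose=False).run(test)


results = run_doctest(downcode)
print(results.failed, results.attempted)
```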

Re: unicode issue

2009-09-30 Thread Dave Angel


gentlestone wrote:

Why doesn't this code work on Python 2.6? Or how can I do this job?

[snip _MAP]

def downcode(name):
    """
    >>> downcode(u"Žabovitá zmiešaná kaša")
    u'Zabovita zmiesana kasa'
    """
    for key, value in _MAP.iteritems():
        name = name.replace(key, value)
    return name

  

Works for me:

rrr = downcode(u"Žabovitá zmiešaná kaša")
print repr(rrr)
print rrr

prints out:

u'Zabovita zmiesana kasa'
Zabovita zmiesana kasa

I did have to add an encoding declaration as line 2 of the file:

#-*- coding: latin-1 -*-

and I had to convince my editor (Komodo) to save the file in utf-8.

DaveA


Re: unicode issue