Re: unicode question

2015-01-28 Thread Albert-Jan Roskam
On Wed, Jan 28, 2015 8:21 AM CET Terry Reedy wrote: On 1/27/2015 12:17 AM, Rehab Habeeb wrote: Hi there, Python staff. Does Python support the Arabic language for texts? And what should I do if it supports it? I wrote hello in Arabic using CodeSkulptor and PowerShell

Re: unicode question

2015-01-28 Thread Michael Torrie
On 01/28/2015 03:17 PM, Albert-Jan Roskam wrote: I do not know how complete the support is, but this is copied from 3.4.2, which uses tcl/tk 8.6. t = "الحركات" for c in t: print(c) # Prints rightmost char above first ا ل ح ر ك ا ت Wow, I never knew this was so clever. Is that with or
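A minimal Python 3 sketch of the behaviour described above, using only the standard library: a str holds Arabic letters in logical (reading) order, so iterating visits the first-read letter, the rightmost one on screen, first. Printing code points and names instead of the glyphs keeps the output independent of the console font and encoding.

```python
import unicodedata

# The word from the post above; Python stores it in logical (reading) order.
t = "الحركات"

for index, ch in enumerate(t):
    # Show the code point and its Unicode name rather than the glyph,
    # so the output does not depend on right-to-left rendering support.
    print(index, "U+%04X" % ord(ch), unicodedata.name(ch))
```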

Re: unicode question

2015-01-27 Thread random832
On Tue, Jan 27, 2015, at 12:25, Mark Lawrence wrote: People might find this http://bugs.python.org/issue1602 and hence this https://github.com/Drekin/win-unicode-console useful. The latter is available on pypi. However, Arabic is one of those scripts that runs up against the real

Re: unicode question

2015-01-27 Thread Terry Reedy
On 1/27/2015 12:17 AM, Rehab Habeeb wrote: Hi there, Python staff. Does Python support the Arabic language for texts? And what should I do if it supports it? I wrote hello in Arabic using CodeSkulptor and PowerShell just for testing, and the same error appeared (a syntax error in Unicode)!! I do not

Re: unicode question

2015-01-27 Thread random832
On Tue, Jan 27, 2015, at 00:17, Rehab Habeeb wrote: Hi there, Python staff. Does Python support the Arabic language for texts? And what should I do if it supports it? I wrote hello in Arabic using CodeSkulptor and PowerShell just for testing, and the same error appeared (a syntax error in Unicode)!!

Re: unicode question

2015-01-27 Thread Mark Lawrence
On 27/01/2015 16:13, random...@fastmail.us wrote: On Tue, Jan 27, 2015, at 00:17, Rehab Habeeb wrote: Hi there, Python staff. Does Python support the Arabic language for texts? And what should I do if it supports it? I wrote hello in Arabic using CodeSkulptor and PowerShell just for testing, and the

Re: unicode question

2015-01-26 Thread Chris Angelico
On Tue, Jan 27, 2015 at 4:17 PM, Rehab Habeeb moonlight06082...@gmail.com wrote: Hi there, Python staff. Does Python support the Arabic language for texts? And what should I do if it supports it? I wrote hello in Arabic using CodeSkulptor and PowerShell just for testing, and the same error appeared (a

unicode question

2015-01-26 Thread Rehab Habeeb
Hi there, Python staff. Does Python support the Arabic language for texts? And what should I do if it supports it? I wrote hello in Arabic using CodeSkulptor and PowerShell just for testing, and the same error appeared (a syntax error in Unicode)!!
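For the question above, a minimal sketch assuming Python 3, where source files are read as UTF-8 by default (PEP 3120) and str is Unicode text. Whether the reported error came from CodeSkulptor or from the PowerShell console encoding cannot be told from the post; this only shows that Arabic literals are ordinary strings.

```python
# -*- coding: utf-8 -*-
# The coding line is optional in Python 3 (UTF-8 is already the default
# source encoding); Python 2 needs it for any non-ASCII literal.
greeting = "مرحبا"  # "hello" in Arabic

encoded = greeting.encode("utf-8")    # bytes for a file, socket, or console
decoded = encoded.decode("utf-8")     # and back to text

print(len(greeting), "characters,", len(encoded), "bytes in UTF-8")
```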

Beginner python 3 unicode question

2013-11-16 Thread Laszlo Nagy
Example interactive: $ python3 Python 3.3.1 (default, Sep 25 2013, 19:29:01) [GCC 4.7.3] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import uuid >>> import base64 >>> base64.b32encode(uuid.uuid1().bytes)[:-6].lower() b'zsz653co6ii6hgjejqhw42ncgy' But when I put the same
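In Python 3, base64.b32encode() returns bytes, which is why the session above shows a b'...' literal. A minimal sketch of getting a str token instead, assuming ASCII is acceptable (Base32 output always is):

```python
import base64
import uuid

# 16 UUID bytes -> 32 Base32 characters, the last 6 of which are '=' padding.
raw = base64.b32encode(uuid.uuid1().bytes)[:-6].lower()   # bytes
token = raw.decode("ascii")                                 # str

print(type(raw).__name__, type(token).__name__, token)
```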

Re: Beginner python 3 unicode question

2013-11-16 Thread Luuk
On 16-11-2013 20:12, Laszlo Nagy wrote: Example interactive: $ python3 Python 3.3.1 (default, Sep 25 2013, 19:29:01) [GCC 4.7.3] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import uuid >>> import base64 >>> base64.b32encode(uuid.uuid1().bytes)[:-6].lower()

Re: Beginner python 3 unicode question

2013-11-16 Thread Laszlo Nagy
the error is in one of the lines you did not copy here because this works without problems: BEGIN-of script #!/usr/bin/python Most probably, your /usr/bin/python program is python version 2, and not python version 3 Try the same program with /usr/bin/python3. And also try the

Re: Beginner python 3 unicode question

2013-11-16 Thread Laszlo Nagy
Why is it behaving differently on the command line? What should I do to fix this? I was experimenting with this a bit more and found some more confusing things. Can somebody please enlighten me? Here is a test function: def password_hash(self,password): public =

Re: Beginner python 3 unicode question

2013-11-16 Thread Luuk
On 16-11-2013 21:57, Laszlo Nagy wrote: the error is in one of the lines you did not copy here because this works without problems: BEGIN-of script #!/usr/bin/python Most probably, your /usr/bin/python program is python version 2, and not python version 3 Try the same program with

Re: Beginner python 3 unicode question [SOLVED]

2013-11-16 Thread Laszlo Nagy
So is the default utf-8 or not? Should the documentation be updated? Or do we have a bug in the interactive shell? It was my fault, sorry. The other program used os.system at some places, and it accidentally used python2 instead of python 3. :-(
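The cause here was os.system() launching a different interpreter. One way to avoid that, sketched as an assumption rather than what the poster actually did, is to start child scripts with sys.executable so the child is always the same Python as the parent:

```python
import subprocess
import sys

# sys.executable is the path of the running interpreter, so the child
# cannot silently fall back to a Python 2 found first on PATH.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.version_info[0])"],
    stdout=subprocess.PIPE,
    universal_newlines=True,   # decode the child's output to str
    check=True,                # raise CalledProcessError on failure
)
print("child major version:", result.stdout.strip())
```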

Re: Beginner python 3 unicode question

2013-11-16 Thread Chris Angelico
On Sun, Nov 17, 2013 at 8:19 AM, Laszlo Nagy gand...@shopzeus.com wrote: print("digest",digest,type(digest)) This function was called inside a script, and gave me this: ('digest', '\xa0\x98\x8b\xff\x04\xf9V;\xbd\x1eIHzh\x10-\xc5!\x14\x1b', <type 'str'>) This looks very much like you're
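The printed tuple and <type 'str'> are Python 2 output; in Python 3 a hash digest is bytes. A minimal sketch that assumes SHA-1 purely for illustration (the body of password_hash() is not shown) and uses a made-up password:

```python
import hashlib

password = "example-password"   # hypothetical input

# digest() returns raw bytes in Python 3 (a str in Python 2); hexdigest()
# returns a str of hex characters that is safe to print, store, or compare.
digest = hashlib.sha1(password.encode("utf-8")).digest()
hex_digest = hashlib.sha1(password.encode("utf-8")).hexdigest()

print("digest", digest, type(digest))
print("hex", hex_digest)
```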

Re: Beginner python 3 unicode question [SOLVED]

2013-11-16 Thread Chris Angelico
On Sun, Nov 17, 2013 at 8:44 AM, Laszlo Nagy gand...@shopzeus.com wrote: So is the default utf-8 or not? Should the documentation be updated? Or do we have a bug in the interactive shell? It was my fault, sorry. The other program used os.system at some places, and it accidentally used

tkinter unicode question

2010-07-27 Thread jyoung79
Just curious if anyone could shed some light on this? I'm using tkinter, but I can't seem to get certain unicode characters to show in the label for Python 3. In my test, the label and button will contain the same 3 characters - a Greek Alpha, a Greek Omega with a circumflex and soft

Re: tkinter unicode question

2010-07-27 Thread Ned Deily
In article 20100727204532.r7gmz.27213.r...@cdptpa-web20-z02, jyoun...@kc.rr.com wrote: Just curious if anyone could shed some light on this? I'm using tkinter, but I can't seem to get certain unicode characters to show in the label for Python 3. In my test, the label and button will

Another (simple) unicode question

2009-10-29 Thread Rustom Mody
Construct http://construct.wikispaces.com/ is a kick-ass binary file structurer (written by a 21 year old!) I thought of trying to port it to python3 but it barfs on some unicode related stuff (after running 2to3) which I am unable to wrap my head around. Can anyone direct me to what I should

Re: Another (simple) unicode question

2009-10-29 Thread John Machin
On Oct 29, 10:02 pm, Rustom Mody rustompm...@gmail.com wrote: Construct http://construct.wikispaces.com/ is a kick-ass binary file structurer (written by a 21 year old!) I thought of trying to port it to python3 but it barfs on some unicode related stuff (after running 2to3) which I am unable to

Re: Another (simple) unicode question

2009-10-29 Thread Carl Banks
On Oct 29, 4:02 am, Rustom Mody rustompm...@gmail.com wrote: Construct http://construct.wikispaces.com/ is a kick-ass binary file structurer (written by a 21 year old!) I thought of trying to port it to python3 but it barfs on some unicode related stuff (after running 2to3) which I am unable to

Re: Another (simple) unicode question

2009-10-29 Thread Scott David Daniels
John Machin wrote: On Oct 29, 10:02 pm, Rustom Mody rustompm...@gmail.com wrote:... I thought of trying to port it to python3 but it barfs on some unicode related stuff (after running 2to3) which I am unable to wrap my head around. Can anyone direct me to what I should read to try to
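A frequent snag when running 2to3 over binary-parsing code is that Python 3 separates bytes from str. Whether that is exactly what broke Construct is not stated in the snippets; this is a minimal sketch of the two behaviours that most often matter:

```python
data = b"\x01\x02AB"

# Indexing bytes yields an int in Python 3 (a 1-char str in Python 2) ...
first = data[0]
# ... while slicing still yields bytes.
first_slice = data[0:1]

# Text must be decoded explicitly at the boundary.
label = data[2:].decode("ascii")

# Mixing the two types raises TypeError instead of guessing an encoding.
try:
    data + "C"
except TypeError as exc:
    print("cannot mix bytes and str:", exc)
```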

Re: a simple unicode question

2009-10-28 Thread Gabriel Genellina
On Wed, 28 Oct 2009 02:28:01 -0300, Chris Jones cjns1...@gmail.com wrote: On Tue, Oct 27, 2009 at 06:21:11AM EDT, Lie Ryan wrote: Chris Jones wrote: Best part of Unicode is that there are multiple encodings, right? ;-) No, the best part about Unicode is there is no encoding! Unicode does

Re: a simple unicode question

2009-10-28 Thread Tim Arnold
Chris Jones cjns1...@gmail.com wrote in message news:mailman.2149.1256707687.2807.python-l...@python.org... On Tue, Oct 27, 2009 at 06:21:11AM EDT, Lie Ryan wrote: Chris Jones wrote: [..] Best part of Unicode is that there are multiple encodings, right? ;-) No, the best part about Unicode

Re: a simple unicode question

2009-10-27 Thread Chris Jones
On Tue, Oct 27, 2009 at 06:21:11AM EDT, Lie Ryan wrote: Chris Jones wrote: [..] Best part of Unicode is that there are multiple encodings, right? ;-) No, the best part about Unicode is there is no encoding! Unicode does not define any encoding; RFC 3629: ISO/IEC 10646 and Unicode define
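The distinction argued in this exchange, sketched in Python 3: a character is a single code point, and the bytes you get depend on which encoding you pick (UTF-8, the one RFC 3629 specifies, is only one choice).

```python
degree = "\N{DEGREE SIGN}"   # one code point, U+00B0

encodings = {codec: degree.encode(codec) for codec in ("utf-8", "utf-16-le", "latin-1")}
for codec, data in encodings.items():
    print(codec, data)
```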

Re: a simple unicode question

2009-10-27 Thread Lie Ryan
Chris Jones wrote: On Wed, Oct 21, 2009 at 12:35:11PM EDT, Nobody wrote: [..] Characters outside the 16-bit range aren't supported on all builds. They won't be supported on most Windows builds, as Windows uses 16-bit Unicode extensively: I knew nothing about UTF-16 & friends before this

Re: a simple unicode question

2009-10-22 Thread Gabriel Genellina
On Wed, 21 Oct 2009 15:14:32 -0300, ru...@yahoo.com wrote: On Oct 21, 4:59 am, Bruno Desthuilliers bruno. 42.desthuilli...@websiteburo.invalid wrote: beSTEfar wrote: (snip) When parsing strings, use Regular Expressions. And now you have _two_ problems <g> For some simple parsing

Re: a simple unicode question

2009-10-22 Thread Chris Jones
On Wed, Oct 21, 2009 at 12:35:11PM EDT, Nobody wrote: [..] Characters outside the 16-bit range aren't supported on all builds. They won't be supported on most Windows builds, as Windows uses 16-bit Unicode extensively: I knew nothing about UTF-16 & friends before this thread. Best part of
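On 16-bit (narrow) builds, a character outside the Basic Multilingual Plane was stored as a UTF-16 surrogate pair. A minimal Python 3 sketch of that arithmetic; since Python 3.3 (PEP 393), len() counts such a character as 1 on every build, so the pair only appears once you encode:

```python
import unicodedata

ch = "\U0001D11E"   # outside the 16-bit Basic Multilingual Plane

utf16 = ch.encode("utf-16-be")                 # 4 bytes: a surrogate pair
high = int.from_bytes(utf16[:2], "big")
low = int.from_bytes(utf16[2:], "big")

# The same pair computed by hand from the code point.
offset = ord(ch) - 0x10000
calc_high = 0xD800 + (offset >> 10)
calc_low = 0xDC00 + (offset & 0x3FF)

print(unicodedata.name(ch), len(ch), hex(high), hex(low))
```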

Re: a simple unicode question

2009-10-22 Thread rurpy
On 10/22/2009 03:23 AM, Gabriel Genellina wrote: On Wed, 21 Oct 2009 15:14:32 -0300, ru...@yahoo.com wrote: On Oct 21, 4:59 am, Bruno Desthuilliers bruno. 42.desthuilli...@websiteburo.invalid wrote: beSTEfar wrote: (snip) When parsing strings, use Regular Expressions. And now you

Re: a simple unicode question

2009-10-22 Thread Gabriel Genellina
On Thu, 22 Oct 2009 17:08:21 -0300, ru...@yahoo.com wrote: On 10/22/2009 03:23 AM, Gabriel Genellina wrote: On Wed, 21 Oct 2009 15:14:32 -0300, ru...@yahoo.com wrote: On Oct 21, 4:59 am, Bruno Desthuilliers bruno. 42.desthuilli...@websiteburo.invalid wrote: beSTEfar wrote: (snip)

Re: a simple unicode question

2009-10-21 Thread Mark Tolonen
George Trojan george.tro...@noaa.gov wrote in message news:hbktk6$8b...@news.nems.noaa.gov... Thanks for all suggestions. It took me a while to find out how to configure my keyboard to be able to type the degree sign. I prefer to stick with pure ASCII if possible. Where are the literals (i.e.

Re: a simple unicode question

2009-10-21 Thread Scott David Daniels
George Trojan wrote: Scott David Daniels wrote: ... And if you are unsure of the name to use: import unicodedata unicodedata.name(u'\xb0') 'DEGREE SIGN' Thanks for all suggestions. It took me a while to find out how to configure my keyboard to be able to type the degree sign. I prefer
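The lookup shown above also works in reverse. A small Python 3 sketch; every Python 3 str is Unicode, so the u prefix is unnecessary:

```python
import unicodedata

sign = "\xb0"
name = unicodedata.name(sign)              # code point -> name
same = unicodedata.lookup("DEGREE SIGN")   # name -> code point

# "\N{...}" resolves the same name at compile time, in pure-ASCII source.
print(name, same == "\N{DEGREE SIGN}" == sign)
```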

Re: a simple unicode question

2009-10-21 Thread Chris Jones
On Wed, Oct 21, 2009 at 12:20:35AM EDT, Nobody wrote: On Tue, 20 Oct 2009 17:56:21 +, George Trojan wrote: [..] Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? You can get them from the unicodedata module, e.g.: import unicodedata for i in xrange(0x1):

Re: a simple unicode question

2009-10-21 Thread Bruno Desthuilliers
beSTEfar wrote: (snip) When parsing strings, use Regular Expressions. And now you have _two_ problems <g> For some simple parsing problems, Python's string methods are powerful enough to make REs overkill. And for any complex enough parsing (any recursive construct for example - think XML,
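To make the trade-off concrete for the coordinate string from this thread, here is a minimal Python 3 sketch that parses it both ways. The layout (degrees, minutes, seconds and hemisphere separated by single spaces) is assumed from the example and not validated beyond that:

```python
import re

s = "48\N{DEGREE SIGN} 13' 16.80 N"

# String methods: split on whitespace, then strip the unit marks.
deg, minutes, seconds, hemi = s.split()
by_split = (int(deg.rstrip("\N{DEGREE SIGN}")), int(minutes.rstrip("'")), float(seconds), hemi)

# Regular expression for the same fixed layout (\u00b0 is the degree sign).
m = re.match(r"(\d+)\u00b0 (\d+)' ([\d.]+) ([NSEW])$", s)
by_regex = (int(m.group(1)), int(m.group(2)), float(m.group(3)), m.group(4))

print(by_split, by_regex)
```

For a layout this regular, the split version is shorter; the regex earns its keep once you also need to reject malformed input.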

Re: a simple unicode question

2009-10-21 Thread Nobody
On Wed, 21 Oct 2009 05:16:56 -0400, Chris Jones wrote: Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? You can get them from the unicodedata module, e.g.: import unicodedata for i in xrange(0x1): n = unicodedata.name(unichr(i),None) if n is not
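A Python 3 version of the loop above (range and chr replace xrange and unichr). The default limit of 0x10000 covers the 16-bit range discussed in the thread; pass 0x110000 to scan every plane. The function name is made up for this sketch:

```python
import unicodedata

def named_characters(limit=0x10000):
    """Yield (code point, name) for every named character below `limit`."""
    for cp in range(limit):
        name = unicodedata.name(chr(cp), None)   # None for unnamed code points
        if name is not None:
            yield cp, name

degree_like = [(hex(cp), name) for cp, name in named_characters() if "DEGREE" in name]
print(degree_like)
```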

Re: a simple unicode question

2009-10-21 Thread rurpy
On Oct 21, 4:59 am, Bruno Desthuilliers bruno. 42.desthuilli...@websiteburo.invalid wrote: beSTEfar wrote: (snip) When parsing strings, use Regular Expressions. And now you have _two_ problems <g> For some simple parsing problems, Python's string methods are powerful enough to make REs

Re: a simple unicode question

2009-10-21 Thread Terry Reedy
Nobody wrote: Just curious, why did you choose to set the upper boundary at 0x? Characters outside the 16-bit range aren't supported on all builds. They won't be supported on most Windows builds, as Windows uses 16-bit Unicode extensively: Python 2.5.1 (r251:54863, Apr 18 2007,

Re: a simple unicode question

2009-10-20 Thread Scott David Daniels
Mark Tolonen wrote: Is there a better way of getting the degrees? It seems your string is UTF-8. \xc2\xb0 is UTF-8 for DEGREE SIGN. If you type non-ASCII characters in source code, make sure to declare the encoding the file is *actually* saved in: # coding: utf-8 s = '''48° 13' 16.80
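The advice to declare the encoding the file is actually saved in, sketched with compile() so both cases fit in one runnable Python 3 snippet: the same two bytes \xc2\xb0 become one DEGREE SIGN when the declaration matches (UTF-8), and two wrong characters when it does not. The helper name is made up for this example:

```python
def literal_from_source(source_bytes):
    """Compile source bytes (honouring any coding declaration) and return s."""
    namespace = {}
    exec(compile(source_bytes, "<example>", "exec"), namespace)
    return namespace["s"]

saved_as_utf8 = b"s = '48\xc2\xb0'\n"   # what a UTF-8 editor writes for 48°

right = literal_from_source(b"# coding: utf-8\n" + saved_as_utf8)
wrong = literal_from_source(b"# coding: latin-1\n" + saved_as_utf8)

print(ascii(right), ascii(wrong))
```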

Re: a simple unicode question

2009-10-20 Thread George Trojan
Thanks for all suggestions. It took me a while to find out how to configure my keyboard to be able to type the degree sign. I prefer to stick with pure ASCII if possible. Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? I found http://www.unicode.org/Public/5.1.0/ucd/UnicodeData.txt

Re: a simple unicode question

2009-10-20 Thread Nobody
On Tue, 20 Oct 2009 17:56:21 +, George Trojan wrote: Thanks for all suggestions. It took me a while to find out how to configure my keyboard to be able to type the degree sign. I prefer to stick with pure ASCII if possible. Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? I

Re: a simple unicode question

2009-10-20 Thread Martin v. Löwis
Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? I found http://www.unicode.org/Public/5.1.0/ucd/UnicodeData.txt Is that the place to look? Correct - you are supposed to fill in a Unicode character name into the \N escape. The specific list of names depends on the version of the UCD

a simple unicode question

2009-10-19 Thread George Trojan
A trivial one, this is the first time I have to deal with Unicode. I am trying to parse a string s='''48° 13' 16.80 N'''. I know the charset is iso-8859-1. To get the degrees I did encoding='iso-8859-1' q=s.decode(encoding) q.split() [u'48\xc2\xb0', u"13'", u'16.80', u'N'] r=q.split()[0]
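The output above (u'48\xc2\xb0', with two characters after the digits) is what you get when UTF-8 bytes are decoded as iso-8859-1. A minimal Python 3 sketch, assuming the bytes really are UTF-8 despite the stated charset:

```python
raw = "48\N{DEGREE SIGN}".encode("utf-8")   # b'48\xc2\xb0', what the string held

wrong = raw.decode("iso-8859-1")   # '48\xc2\xb0': 'Â' followed by '°'
right = raw.decode("utf-8")        # '48\xb0': a single DEGREE SIGN

degrees = int(right.rstrip("\N{DEGREE SIGN}"))
print(ascii(wrong), ascii(right), degrees)
```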

Re: a simple unicode question

2009-10-19 Thread Diez B. Roggisch
George Trojan wrote: A trivial one, this is the first time I have to deal with Unicode. I am trying to parse a string s='''48° 13' 16.80 N'''. I know the charset is iso-8859-1. To get the degrees I did encoding='iso-8859-1' q=s.decode(encoding) q.split() [u'48\xc2\xb0', u"13'", u'16.80',

Re: a simple unicode question

2009-10-19 Thread beSTEfar
On 19 Oct, 21:07, George Trojan george.tro...@noaa.gov wrote: A trivial one, this is the first time I have to deal with Unicode. I am trying to parse a string s='''48° 13' 16.80 N'''. I know the charset is iso-8859-1. To get the degrees I did encoding='iso-8859-1' q=s.decode(encoding)

Re: a simple unicode question

2009-10-19 Thread Mark Tolonen
George Trojan george.tro...@noaa.gov wrote in message news:hbidd7$i9...@news.nems.noaa.gov... A trivial one, this is the first time I have to deal with Unicode. I am trying to parse a string s='''48° 13' 16.80 N'''. I know the charset is iso-8859-1. To get the degrees I did

Re: python 3.1 unicode question

2009-09-16 Thread Duncan Booth
jeffunit j...@jeffunit.com wrote: That looks like a surrogate escape (See PEP 383) http://www.python.org/dev/peps/pep-0383/. It indicates the wrong encoding was used to decode the filename. That seems likely. How do I set the encoding to something correct to decode the filename?
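A minimal Python 3 sketch of the PEP 383 surrogate escapes mentioned above, using a made-up Latin-1 file name that is not valid UTF-8:

```python
raw_name = b"caf\xe9.txt"   # hypothetical Latin-1 file name; not valid UTF-8

# PEP 383: undecodable bytes become lone surrogates U+DC80..U+DCFF ...
name = raw_name.decode("utf-8", "surrogateescape")
# ... and encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", "surrogateescape")

# If the real encoding is known, decoding with it gives proper text.
fixed = raw_name.decode("latin-1")

print(ascii(name), restored == raw_name, ascii(fixed))
```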

Re: python 3.1 unicode question

2009-09-15 Thread Mark Tolonen
jeffunit j...@jeffunit.com wrote in message news:20090915144123964.ljka6...@cdptpa-omta01.mail.rr.com... I wrote a program that diffs files and prints out matching file names. I will be executing the output with sh, to delete select files. Most of the files names are plain ascii, but about 10%

Re: python 3.1 unicode question

2009-09-15 Thread jeffunit
At 09:25 PM 9/15/2009, Mark Tolonen wrote: jeffunit j...@jeffunit.com wrote in message news:20090915144123964.ljka6...@cdptpa-omta01.mail.rr.com... I wrote a program that diffs files and prints out matching file names. I will be executing the output with sh, to delete select files. Most of the

Re: python 3.1 unicode question

2009-09-15 Thread Chris Rebert
On Tue, Sep 15, 2009 at 9:48 PM, jeffunit j...@jeffunit.com wrote: At 09:25 PM 9/15/2009, Mark Tolonen wrote: jeffunit j...@jeffunit.com wrote in message news:20090915144123964.ljka6...@cdptpa-omta01.mail.rr.com... I wrote a program that diffs files and prints out matching file names. I

Re: (Simple?) Unicode Question

2009-08-30 Thread Nobody
On Sun, 30 Aug 2009 02:36:49 +, Steven D'Aprano wrote: So long as your terminal has a sensible encoding, and you have a good quality font, you should be able to print any string you can create. UTF-8 isn't a particularly sensible encoding for terminals. Did I mention UTF-8? Out of

Re: (Simple?) Unicode Question

2009-08-29 Thread Thorsten Kampe
* Rami Chowdhury (Thu, 27 Aug 2009 09:44:41 -0700) Further, does anything, except a printing device need to know the encoding of a piece of text? Python needs to know if you are processing the text. I may be wrong, but I believe that's part of the idea behind the separation of string and

Re: (Simple?) Unicode Question

2009-08-29 Thread Steven D'Aprano
On Sat, 29 Aug 2009 09:34:43 +0200, Thorsten Kampe wrote: * Rami Chowdhury (Thu, 27 Aug 2009 09:44:41 -0700) Further, does anything, except a printing device need to know the encoding of a piece of text? Python needs to know if you are processing the text. Python only needs to know when

Re: (Simple?) Unicode Question

2009-08-29 Thread Nobody
On Sat, 29 Aug 2009 08:26:54 +, Steven D'Aprano wrote: Python only needs to know when you convert the text to or from bytes. I can do this: s = "hello" t = "world" print(' '.join([s, t])) hello world and not need to care anything about encodings. So long as your terminal has a

Re: (Simple?) Unicode Question

2009-08-29 Thread Steven D'Aprano
On Sat, 29 Aug 2009 20:09:12 +0100, Nobody wrote: On Sat, 29 Aug 2009 08:26:54 +, Steven D'Aprano wrote: Python only needs to know when you convert the text to or from bytes. I can do this: s = "hello" t = "world" print(' '.join([s, t])) hello world and not need to care anything

(Simple?) Unicode Question

2009-08-27 Thread Shashank Singh
Hi All! I have a very simple (and probably stupid) question eluding me. When exactly is the char-set information needed? To make my question clear consider reading a file. While reading a file, all I get is basically an array of bytes. Now suppose a file has 10 bytes in it (all is data, no

Re: (Simple?) Unicode Question

2009-08-27 Thread Rami Chowdhury
Further, does anything, except a printing device need to know the encoding of a piece of text? I may be wrong, but I believe that's part of the idea behind the separation of string and bytes types in Python 3.x. I believe, if you are using Python 3.x, you don't need the character encoding

Re: (Simple?) Unicode Question

2009-08-27 Thread Albert Hopkins
On Thu, 2009-08-27 at 22:09 +0530, Shashank Singh wrote: Hi All! I have a very simple (and probably stupid) question eluding me. When exactly is the char-set information needed? To make my question clear consider reading a file. While reading a file, all I get is basically an array of
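A minimal Python 3 sketch of the answer given in this thread: the charset matters only where bytes become text or text becomes bytes. In-memory files stand in for a real file so the example is self-contained:

```python
import io

# Hypothetical file contents: the word "naïve" saved as UTF-8 (6 bytes).
raw = "na\N{LATIN SMALL LETTER I WITH DIAERESIS}ve".encode("utf-8")

# Binary mode needs no charset: you simply get the bytes back.
with io.BytesIO(raw) as f:
    data = f.read()

# Text mode needs one, because bytes must be turned into characters.
with io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8") as f:
    text = f.read()

# Choosing the wrong charset does not fail here; it silently changes the text.
wrong = raw.decode("latin-1")
print(len(data), len(text), ascii(text), ascii(wrong))
```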

Unicode question

2006-07-28 Thread Ben Edwards (lists)
I am using python 2.4 on Ubuntu dapper, I am working through Dive into Python. There are a couple of inconsistencies. Firstly sys.setdefaultencoding('iso−8859−1') does not work, I have to do sys.setdefaultencoding = 'iso−8859−1' secondly the following does not give a 'UnicodeError: ASCII

Re: Unicode question

2006-07-28 Thread Max Erickson
Ben Edwards (lists) [EMAIL PROTECTED] wrote: I am using python 2.4 on Ubuntu dapper, I am working through Dive into Python. ... Any insight? Ben Did you follow all the instructions, or did you try to call sys.setdefaultencoding interactively? See:

Re: Unicode question

2006-07-28 Thread Steve M
Ben Edwards (lists) wrote: I am using python 2.4 on Ubuntu dapper, I am working through Dive into Python. There are a couple of inconsistencies. Firstly sys.setdefaultencoding('iso-8859-1') does not work, I have to do sys.setdefaultencoding = 'iso-8859-1' When you run a Python script, the

Re: Unicode question

2006-07-28 Thread Martin v. Löwis
Ben Edwards (lists) wrote: Firstly sys.setdefaultencoding('iso−8859−1') does not work, I have to do sys.setdefaultencoding = 'iso−8859−1' That works, but has no effect. You bind the variable sys.setdefaultencoding to some value, but that value is never used for anything (do
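This exchange concerns Python 2, where site.py deletes sys.setdefaultencoding during startup, so assigning to that name merely rebinds an attribute. In Python 3 the setter is gone and the default is fixed; a minimal sketch of that, plus the portable alternative of naming the encoding explicitly:

```python
import sys

# Python 3 fixes the default encoding at UTF-8 and exposes no setter.
print(sys.getdefaultencoding())
print(hasattr(sys, "setdefaultencoding"))

# The portable approach is to name the encoding wherever bytes are produced.
data = "caf\xe9".encode("iso-8859-1")
print(data)
```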

[OT] Re: a unicode question?

2006-04-11 Thread Peter Otten
John Machin wrote: ... and yes Peter, info travels faster also from China than it does from Armenia :-()) Q: Can info travel faster from Armenia than from China? Radio Yerevan: In principle, yes. Just make sure that it doesn't go the other way round the globe or meets some friends on the

Re: a unicode question?

2006-04-10 Thread Serge Orlov
[EMAIL PROTECTED] wrote: Mr. John Machin This question comes from the following code. I use the PyXml to build a DOM tree. from xml.dom.ext.reader import HtmlLib doc = HtmlLib.FromHtmlUrl('http://stock.business.sohu.com/q/nbcg.php?code=600028') title_elem =

Re: a unicode question?

2006-04-10 Thread John Machin
E, it gets worse: not only is the title written in Chinese, it is encoded as gb2312 -- here is the repr() of the first few chunks: <html>\n<head>\n<title>\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) : \xc4\xda\xb2\xbf\xc8\xcb\xd4\xb1\xb3\xd6\xb9\xc9 - \xcb\xd1\xba\xfc\xb9\xc9\xc6\xb1</ti

a unicode question?

2006-04-09 Thread zdwang
Hello, there is a Unicode string that I want to change to an ANSI string, but it raises an exception. Could you help me? ## I want to change s1 to s2. s1 = u'\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) ' s2 = '\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '

Re: a unicode question?

2006-04-09 Thread John Machin
What do you mean by ansi string? Here is a superficially not-unreasonable answer to your more specific question: # s1 = u'\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) ' # s2 = '\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) ' # s3 = s1.encode('latin1') # s2 == s3 # True But what are you really trying
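Given the thread's observation that the page is GB2312, s1 looks like GB2312 bytes that were decoded one byte per character. A minimal Python 3 sketch of undoing that, assuming the diagnosis is right:

```python
# s1 as posted: each code point is really one byte of GB2312-encoded text.
s1 = "\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) "

# Latin-1 maps code points 0-255 one-to-one onto bytes, so this recovers
# the original bytes (the same s1.encode('latin1') step shown above).
raw = s1.encode("latin-1")

# Decoding those bytes with the page's real charset gives proper text.
title = raw.decode("gb2312")
print(ascii(title))
```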

Re: a unicode question?

2006-04-09 Thread zdwang
Mr. John Machin, Thank you very much!

Re: a unicode question?

2006-04-09 Thread zdwang
Mr. John Machin This question comes from the following code. I use the PyXml to build a DOM tree. from xml.dom.ext.reader import HtmlLib doc = HtmlLib.FromHtmlUrl('http://stock.business.sohu.com/q/nbcg.php?code=600028') title_elem = doc.documentElement.getElementsByTagName("TITLE")[0] title_string =

Unicode question : turn "José" into u"José"

2006-04-05 Thread Ian Sparks
This is probably stupid and/or misguided but supposing I'm passed a byte-string value that I want to be unicode, this is what I do. I'm sure I'm missing something very important. Short version : s = José #Start with non-unicode string unicoded = eval(u'%s' % José) Long version : s = José

Re: Unicode question : turn "José" into u"José"

2006-04-05 Thread aurora
First of all, if you run this on the console, find out your console's encoding. In my case it is English Windows XP. It uses 'cp437'. C:\>chcp Active code page: 437 Then s = "José" u = u"Jos\u00e9" # same thing in unicode escape s.decode('cp437') == u # use encoding that match your

Re: Unicode question : turn "José" into u"José"

2006-04-05 Thread ianaré
maybe a bit off topic, but how does one find the console's encoding from within python?

Re: Unicode question : turn "José" into u"José"

2006-04-05 Thread John Machin
The most important thing that you are missing is that you need to know the encoding used for the 8-bit-character string. Let's guess that it's Latin1. Then all you have to do is use the unicode() builtin function, or the string decode method. # s = 'Jos\xe9' # s # 'Jos\xe9' # u = unicode(s,
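The same idea in Python 3, where unicode() no longer exists and bytes have a decode() method. The Latin-1 byte follows the example above; the cp437 byte assumes a Western Windows console with code page 437:

```python
s = b"Jos\xe9"                 # 8-bit bytes whose encoding is known to be Latin-1
u = s.decode("latin-1")       # Python 2: unicode(s, 'latin1') or s.decode('latin1')

console = b"Jos\x82"          # the same name as typed into a cp437 console
u_console = console.decode("cp437")

# No eval() needed: decoding with the right codec is the whole conversion.
print(ascii(u), u == u_console)
```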

Re: Unicode question : turn "José" into u"José"

2006-04-05 Thread Kent Johnson
ianaré wrote: maybe a bit off topic, but how does one find the console's encoding from within python? In [1]: import sys In [3]: sys.stdout.encoding Out[3]: 'cp437' In [4]: sys.stdin.encoding Out[4]: 'cp437' Kent

Re: Unicode question : turn "José" into u"José"

2006-04-05 Thread Ben Finney
Ian Sparks [EMAIL PROTECTED] writes: This is probably stupid and/or misguided but supposing I'm passed a byte-string value that I want to be unicode, this is what I do. I'm sure I'm missing something very important. Perhaps you need to read one of the good Python Unicode tutorials, such as:

Re: unicode question

2006-03-01 Thread Walter Dörwald
Edward Loper wrote: Walter Dörwald wrote: Edward Loper wrote: [...] Surely there's a better way than converting back and forth 3 times? Is there a reason that the 'backslashreplace' error mode can't be used with codecs.decode? 'abc \xff\xe8 def'.decode('ascii', 'backslashreplace')

Re: unicode question

2006-02-27 Thread Walter Dörwald
Edward Loper wrote: [...] Surely there's a better way than converting back and forth 3 times? Is there a reason that the 'backslashreplace' error mode can't be used with codecs.decode? 'abc \xff\xe8 def'.decode('ascii', 'backslashreplace') Traceback (most recent call last): File

Re: unicode question

2006-02-27 Thread Edward Loper
Walter Dörwald wrote: Edward Loper wrote: [...] Surely there's a better way than converting back and forth 3 times? Is there a reason that the 'backslashreplace' error mode can't be used with codecs.decode? 'abc \xff\xe8 def'.decode('ascii', 'backslashreplace') Traceback (most recent

Re: unicode question

2006-02-25 Thread Tim Roberts
Edward Loper [EMAIL PROTECTED] wrote: I would like to convert an 8-bit string (i.e., a str) into unicode, treating chars \x00-\x7f as ascii, and converting any chars \x80-\xff into backslashed escape sequences. I.e., I want something like this: decode_with_backslashreplace('abc \xff\xe8

Re: unicode question

2006-02-25 Thread Kent Johnson
Edward Loper wrote: I would like to convert an 8-bit string (i.e., a str) into unicode, treating chars \x00-\x7f as ascii, and converting any chars \x80-\xff into backslashed escape sequences. I.e., I want something like this: decode_with_backslashreplace('abc \xff\xe8 def') u'abc

unicode question

2006-02-24 Thread Edward Loper
I would like to convert an 8-bit string (i.e., a str) into unicode, treating chars \x00-\x7f as ascii, and converting any chars \x80-\xff into backslashed escape sequences. I.e., I want something like this: decode_with_backslashreplace('abc \xff\xe8 def') u'abc \\xff\\xe8 def' The best I
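In Python 3.5 and later, the built-in 'backslashreplace' handler is accepted when decoding, which gives exactly the requested result. For older versions, a custom handler registered with codecs.register_error does the same; the handler name below is made up for this sketch:

```python
import codecs

raw = b"abc \xff\xe8 def"

# Python 3.5+: 'backslashreplace' works for decoding as well as encoding.
text = raw.decode("ascii", "backslashreplace")

# Older Python 3: register an equivalent handler (hypothetical name).
def hex_escape(exc):
    bad = exc.object[exc.start:exc.end]          # the undecodable bytes
    return "".join("\\x%02x" % b for b in bad), exc.end

codecs.register_error("hexescape", hex_escape)
text_old = raw.decode("ascii", "hexescape")

print(text)   # abc \xff\xe8 def
```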

Re: Unicode Question

2006-01-09 Thread Erik Max Francis
David Pratt wrote: This is not working for me. Can someone explain why. Many thanks. Because '\xbe' isn't UTF-8 for the character you want, '\xc2\xbe' is, as you just showed yourself in the code snippet.

Unicode Question

2006-01-09 Thread David Pratt
Hi. I am working through some tutorials on unicode and am hoping that someone can help explain this for me. I am on mac platform using python 2.4.1 at the moment. I am experimenting with unicode with the 3/4 symbol. I want to prepare strings for db storage that come from normal Windows

Re: Unicode Question

2006-01-09 Thread Max Erickson
The encoding argument to unicode() is used to specify the encoding of the string that you want to translate into unicode. The interpreter stores unicode as unicode, it isn't encoded... unicode('\xbe','cp1252') u'\xbe' unicode('\xbe','cp1252').encode('utf-8') '\xc2\xbe' max
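The same two steps in Python 3, where unicode(s, enc) becomes s.decode(enc) on a bytes object. A minimal sketch, assuming the source really is cp1252 as in the question:

```python
import unicodedata

raw = b"\xbe"                      # one byte from a Windows (cp1252) source
text = raw.decode("cp1252")        # Python 2: unicode('\xbe', 'cp1252')
utf8 = text.encode("utf-8")        # bytes ready for a UTF-8 database column

print(unicodedata.name(text), utf8)
```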

Re: Unicode Question

2006-01-09 Thread David Pratt
Hi Martin. Many thanks for your reply. What I am really after, the following accomplishes. If you are looking for at the same time, perhaps this is also interesting: py unicode('\xbe', 'windows-1252').encode('utf-8') '\xc2\xbe' Your answer really helped quite a bit to clarify this for

Re: Unicode Question

2006-01-09 Thread David Pratt
Hi Erik. Thank you for your reply. The advice has helped clarify this for me. Regards, David Erik Max Francis wrote: David Pratt wrote: This is not working for me. Can someone explain why. Many thanks. Because '\xbe' isn't UTF-8 for the character you want, '\xc2\xbe' is, as you

Re: Unicode Question

2006-01-09 Thread David Pratt
Hi Max. Many thanks for helping to realize where I was missing the point and making this clearer. Regards, David Max Erickson wrote: The encoding argument to unicode() is used to specify the encoding of the string that you want to translate into unicode. The interpreter stores unicode as

Once again a unicode question

2005-03-26 Thread Nicolas Evrard
Hello, I'm puzzled by this test I made while trying to transform a page in html to plain text. Because I cannot send unicode to feed, nor str so how can I do this ? [EMAIL PROTECTED]:~$ python2.4 Python 2.4.1c2 (#2, Mar 19 2005, 01:04:19) [GCC 3.3.5 (Debian 1:3.3.5-12)] on linux2 Type "help",

Re: Once again a unicode question

2005-03-26 Thread Serge Orlov
Nicolas Evrard wrote: Hello, I'm puzzled by this test I made while trying to transform a page in html to plain text. Because I cannot send unicode to feed, nor str so how can I do this ? Seems like the parser is in the broken state after the first exception. Feed only binary strings to it.
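The advice above applies to the Python 2.4 parser in this thread. In Python 3, html.parser.HTMLParser.feed() takes str, so a minimal sketch is to decode the page once (UTF-8 assumed here) and feed only text. The class name is made up for this example:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the character data of an HTML document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)   # turn &amp; etc. into text
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

page = "<html><body><p>Caf\xe9 &amp; cr\xe8me</p></body></html>".encode("utf-8")

parser = TextExtractor()
parser.feed(page.decode("utf-8"))   # decode once; feed() only ever sees str
parser.close()
text = "".join(parser.chunks)
print(ascii(text))
```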

Re: Once again a unicode question

2005-03-26 Thread Nicolas Evrard
* Serge Orlov [23:45 26/03/05 CET]: Nicolas Evrard wrote: Hello, I'm puzzled by this test I made while trying to transform a page in html to plain text. Because I cannot send unicode to feed, nor str so how can I do this ? Seems like the parser is in the broken state after the first exception.

Re: unicode question

2004-11-29 Thread Bengt Richter
On Tue, 23 Nov 2004 20:37:04 +0100, "Martin v. Löwis" [EMAIL PROTECTED] wrote: Steve Holden wrote: Am I the only person who found it scary that Bengt could apparently casually drop on a polynomial that would decode to Löwis? Well, don't give me too much credit, though