Another (simple) unicode question

2009-10-29 Thread Rustom Mody
Construct http://construct.wikispaces.com/ is a kick-ass binary file structurer (written by a 21 year old!) I thought of trying to port it to python3 but it barfs on some unicode related stuff (after running 2to3) which I am unable to wrap my head around. Can anyone direct me to what I should

Re: Another (simple) unicode question

2009-10-29 Thread John Machin
On Oct 29, 10:02 pm, Rustom Mody rustompm...@gmail.com wrote: Construct http://construct.wikispaces.com/ is a kick-ass binary file structurer (written by a 21 year old!) I thought of trying to port it to python3 but it barfs on some unicode related stuff (after running 2to3) which I am unable to

Re: Another (simple) unicode question

2009-10-29 Thread Carl Banks
On Oct 29, 4:02 am, Rustom Mody rustompm...@gmail.com wrote: Construct http://construct.wikispaces.com/ is a kick-ass binary file structurer (written by a 21 year old!) I thought of trying to port it to python3 but it barfs on some unicode related stuff (after running 2to3) which I am unable to

Re: Another (simple) unicode question

2009-10-29 Thread Scott David Daniels
John Machin wrote: On Oct 29, 10:02 pm, Rustom Mody rustompm...@gmail.com wrote:... I thought of trying to port it to python3 but it barfs on some unicode related stuff (after running 2to3) which I am unable to wrap my head around. Can anyone direct me to what I should read to try to

Re: a simple unicode question

2009-10-28 Thread Gabriel Genellina
On Wed, 28 Oct 2009 02:28:01 -0300, Chris Jones cjns1...@gmail.com wrote: On Tue, Oct 27, 2009 at 06:21:11AM EDT, Lie Ryan wrote: Chris Jones wrote: Best part of Unicode is that there are multiple encodings, right? ;-) No, the best part about Unicode is there is no encoding! Unicode does

Re: a simple unicode question

2009-10-28 Thread Tim Arnold
Chris Jones cjns1...@gmail.com wrote in message news:mailman.2149.1256707687.2807.python-l...@python.org... On Tue, Oct 27, 2009 at 06:21:11AM EDT, Lie Ryan wrote: Chris Jones wrote: [..] Best part of Unicode is that there are multiple encodings, right? ;-) No, the best part about Unicode

Re: a simple unicode question

2009-10-27 Thread Chris Jones
On Tue, Oct 27, 2009 at 06:21:11AM EDT, Lie Ryan wrote: Chris Jones wrote: [..] Best part of Unicode is that there are multiple encodings, right? ;-) No, the best part about Unicode is there is no encoding! Unicode does not define any encoding; RFC 3629: ISO/IEC 10646 and Unicode define

Re: a simple unicode question

2009-10-27 Thread Lie Ryan
Chris Jones wrote: On Wed, Oct 21, 2009 at 12:35:11PM EDT, Nobody wrote: [..] Characters outside the 16-bit range aren't supported on all builds. They won't be supported on most Windows builds, as Windows uses 16-bit Unicode extensively: I knew nothing about UTF-16 & friends before this

Re: a simple unicode question

2009-10-22 Thread Gabriel Genellina
On Wed, 21 Oct 2009 15:14:32 -0300, ru...@yahoo.com wrote: On Oct 21, 4:59 am, Bruno Desthuilliers bruno. 42.desthuilli...@websiteburo.invalid wrote: beSTEfar wrote: (snip) When parsing strings, use Regular Expressions. And now you have _two_ problems <g> For some simple parsing

Re: a simple unicode question

2009-10-22 Thread Chris Jones
On Wed, Oct 21, 2009 at 12:35:11PM EDT, Nobody wrote: [..] Characters outside the 16-bit range aren't supported on all builds. They won't be supported on most Windows builds, as Windows uses 16-bit Unicode extensively: I knew nothing about UTF-16 & friends before this thread. Best part of

Re: a simple unicode question

2009-10-22 Thread rurpy
On 10/22/2009 03:23 AM, Gabriel Genellina wrote: On Wed, 21 Oct 2009 15:14:32 -0300, ru...@yahoo.com wrote: On Oct 21, 4:59 am, Bruno Desthuilliers bruno. 42.desthuilli...@websiteburo.invalid wrote: beSTEfar wrote: (snip) When parsing strings, use Regular Expressions. And now you

Re: a simple unicode question

2009-10-22 Thread Gabriel Genellina
On Thu, 22 Oct 2009 17:08:21 -0300, ru...@yahoo.com wrote: On 10/22/2009 03:23 AM, Gabriel Genellina wrote: On Wed, 21 Oct 2009 15:14:32 -0300, ru...@yahoo.com wrote: On Oct 21, 4:59 am, Bruno Desthuilliers bruno. 42.desthuilli...@websiteburo.invalid wrote: beSTEfar wrote: (snip)

Re: a simple unicode question

2009-10-21 Thread Mark Tolonen
George Trojan george.tro...@noaa.gov wrote in message news:hbktk6$8b...@news.nems.noaa.gov... Thanks for all suggestions. It took me a while to find out how to configure my keyboard to be able to type the degree sign. I prefer to stick with pure ASCII if possible. Where are the literals (i.e.

Re: a simple unicode question

2009-10-21 Thread Scott David Daniels
George Trojan wrote: Scott David Daniels wrote: ... And if you are unsure of the name to use: import unicodedata unicodedata.name(u'\xb0') 'DEGREE SIGN' Thanks for all suggestions. It took me a while to find out how to configure my keyboard to be able to type the degree sign. I prefer
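
A minimal Python 3 sketch of the lookup quoted above (the thread itself uses Python 2 literals such as u'\xb0'). The variable names are only illustrative:

```python
import unicodedata

# U+00B0 written with an ASCII-only escape, so the source file needs no
# special encoding declaration.
degree = "\xb0"

# unicodedata.name() returns the official Unicode character name.
name = unicodedata.name(degree)

# unicodedata.lookup() goes the other way, from name to character.
round_trip = unicodedata.lookup(name)

print(name)
```

This keeps the source pure ASCII, which is what George says he prefers.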

Re: a simple unicode question

2009-10-21 Thread Chris Jones
On Wed, Oct 21, 2009 at 12:20:35AM EDT, Nobody wrote: On Tue, 20 Oct 2009 17:56:21 +, George Trojan wrote: [..] Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? You can get them from the unicodedata module, e.g.: import unicodedata for i in xrange(0x1):

Re: a simple unicode question

2009-10-21 Thread Bruno Desthuilliers
beSTEfar wrote: (snip) When parsing strings, use Regular Expressions. And now you have _two_ problems <g> For some simple parsing problems, Python's string methods are powerful enough to make REs overkill. And for any complex enough parsing (any recursive construct for example - think XML,

Re: a simple unicode question

2009-10-21 Thread Nobody
On Wed, 21 Oct 2009 05:16:56 -0400, Chris Jones wrote: Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? You can get them from the unicodedata module, e.g.: import unicodedata for i in xrange(0x1): n = unicodedata.name(unichr(i),None) if n is not
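
The loop quoted above is cut off in this archive, so the following is only a Python 3 sketch of the same idea, not the poster's exact code. The helper name `named_chars` and the default limit of 0x10000 (the 16-bit range discussed elsewhere in the thread) are assumptions made for illustration:

```python
import unicodedata


def named_chars(limit=0x10000):
    """Yield (code point, name) for every named character below `limit`.

    `limit` is an assumed value; the bound used in the original post is
    truncated in the archive.
    """
    for code_point in range(limit):
        # The second argument is the default returned for characters
        # that have no name (control characters, surrogates, unassigned).
        name = unicodedata.name(chr(code_point), None)
        if name is not None:
            yield code_point, name


names = dict(named_chars())
print(names[0xB0])
```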

Re: a simple unicode question

2009-10-21 Thread rurpy
On Oct 21, 4:59 am, Bruno Desthuilliers bruno. 42.desthuilli...@websiteburo.invalid wrote: beSTEfar wrote: (snip) When parsing strings, use Regular Expressions. And now you have _two_ problems <g> For some simple parsing problems, Python's string methods are powerful enough to make REs

Re: a simple unicode question

2009-10-21 Thread Terry Reedy
Nobody wrote: Just curious, why did you choose to set the upper boundary at 0x? Characters outside the 16-bit range aren't supported on all builds. They won't be supported on most Windows builds, as Windows uses 16-bit Unicode extensively: Python 2.5.1 (r251:54863, Apr 18 2007,
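
To make the "outside the 16-bit range" remark concrete, here is a small Python 3 sketch. The choice of U+1D11E as the sample character is an assumption for illustration and does not come from the thread. In UTF-16 such a character is stored as two 16-bit code units (a surrogate pair):

```python
# A character above U+FFFF, written with an ASCII-only escape.
clef = "\U0001D11E"

# Encode as big-endian UTF-16 without a byte order mark.
utf16 = clef.encode("utf-16-be")

# Split the 4 bytes into two 16-bit code units.
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

print([hex(u) for u in units])
```

On current Python 3 the string itself has length 1; the narrow builds discussed in the thread are a Python 2 era concern.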

Re: a simple unicode question

2009-10-20 Thread Scott David Daniels
Mark Tolonen wrote: Is there a better way of getting the degrees? It seems your string is UTF-8. \xc2\xb0 is UTF-8 for DEGREE SIGN. If you type non-ASCII characters in source code, make sure to declare the encoding the file is *actually* saved in: # coding: utf-8 s = '''48° 13' 16.80
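
A short Python 3 sketch of the diagnosis above: the two bytes \xc2\xb0 are the UTF-8 form of DEGREE SIGN, and decoding them as iso-8859-1 produces two characters instead of one. The variable names are illustrative only:

```python
raw = b"48\xc2\xb0"

# Decoding with the wrong charset succeeds, but yields two characters
# (U+00C2 and U+00B0) where one degree sign was intended.
as_latin1 = raw.decode("iso-8859-1")

# Decoding with the charset the bytes were actually written in.
as_utf8 = raw.decode("utf-8")

print(len(as_latin1), len(as_utf8))
```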

Re: a simple unicode question

2009-10-20 Thread George Trojan
Thanks for all suggestions. It took me a while to find out how to configure my keyboard to be able to type the degree sign. I prefer to stick with pure ASCII if possible. Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? I found http://www.unicode.org/Public/5.1.0/ucd/UnicodeData.txt

Re: a simple unicode question

2009-10-20 Thread Nobody
On Tue, 20 Oct 2009 17:56:21 +, George Trojan wrote: Thanks for all suggestions. It took me a while to find out how to configure my keyboard to be able to type the degree sign. I prefer to stick with pure ASCII if possible. Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? I

Re: a simple unicode question

2009-10-20 Thread Martin v. Löwis
Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? I found http://www.unicode.org/Public/5.1.0/ucd/UnicodeData.txt Is that the place to look? Correct - you are supposed to fill in a Unicode character name into the \N escape. The specific list of names depends on the version of the UCD
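
As a hedged illustration in Python 3: the \N escape, the \x escape and the \u escape all produce the same character, and the `unicodedata` module reports which UCD version the running interpreter was built against. The variable names are illustrative:

```python
import unicodedata

by_name = "\N{DEGREE SIGN}"
by_hex = "\xb0"
by_short = "\u00b0"

# The set of names accepted by \N{...} depends on this UCD version.
ucd_version = unicodedata.unidata_version

print(by_name == by_hex == by_short, ucd_version)
```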

a simple unicode question

2009-10-19 Thread George Trojan
A trivial one, this is the first time I have to deal with Unicode. I am trying to parse a string s='''48° 13' 16.80 N'''. I know the charset is iso-8859-1. To get the degrees I did encoding='iso-8859-1' q=s.decode(encoding) q.split() [u'48\xc2\xb0', u"13'", u'16.80', u'N'] r=q.split()[0]
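
One possible way to get the degrees with plain string methods, sketched in Python 3. It assumes the input bytes are really UTF-8, as the replies in this thread suggest, and not iso-8859-1. The function name `parse_dms` and its return shape are hypothetical:

```python
def parse_dms(raw, encoding="utf-8"):
    """Split b"48<deg> 13' 16.80 N" into (degrees, minutes, seconds, hemisphere).

    The default encoding is an assumption based on the replies in the
    thread; pass another one if the data says otherwise.
    """
    text = raw.decode(encoding)
    degrees, minutes, seconds, hemisphere = text.split()
    return (
        int(degrees.rstrip("\N{DEGREE SIGN}")),
        int(minutes.rstrip("'")),
        float(seconds.rstrip('"')),  # tolerate an optional seconds mark
        hemisphere,
    )


parts = parse_dms(b"48\xc2\xb0 13' 16.80 N")
print(parts)
```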

Re: a simple unicode question

2009-10-19 Thread Diez B. Roggisch
George Trojan wrote: A trivial one, this is the first time I have to deal with Unicode. I am trying to parse a string s='''48° 13' 16.80 N'''. I know the charset is iso-8859-1. To get the degrees I did encoding='iso-8859-1' q=s.decode(encoding) q.split() [u'48\xc2\xb0', u"13'", u'16.80',

Re: a simple unicode question

2009-10-19 Thread beSTEfar
On 19 Oct, 21:07, George Trojan george.tro...@noaa.gov wrote: A trivial one, this is the first time I have to deal with Unicode. I am trying to parse a string s='''48° 13' 16.80 N'''. I know the charset is iso-8859-1. To get the degrees I did encoding='iso-8859-1' q=s.decode(encoding)

Re: a simple unicode question

2009-10-19 Thread Mark Tolonen
George Trojan george.tro...@noaa.gov wrote in message news:hbidd7$i9...@news.nems.noaa.gov... A trivial one, this is the first time I have to deal with Unicode. I am trying to parse a string s='''48° 13' 16.80 N'''. I know the charset is iso-8859-1. To get the degrees I did

Re: a simple unicode question

2009-10-19 Thread Mark Tolonen
George Trojan george.tro...@noaa.gov wrote in message news:hbidd7$i9...@news.nems.noaa.gov... A trivial one, this is the first time I have to deal with Unicode. I am trying to parse a string s='''48° 13' 16.80 N'''. I know the charset is iso-8859-1. To get the degrees I did

Re: (Simple?) Unicode Question

2009-08-30 Thread Nobody
On Sun, 30 Aug 2009 02:36:49 +, Steven D'Aprano wrote: So long as your terminal has a sensible encoding, and you have a good quality font, you should be able to print any string you can create. UTF-8 isn't a particularly sensible encoding for terminals. Did I mention UTF-8? Out of

Re: (Simple?) Unicode Question

2009-08-29 Thread Thorsten Kampe
* Rami Chowdhury (Thu, 27 Aug 2009 09:44:41 -0700) Further, does anything, except a printing device need to know the encoding of a piece of text? Python needs to know if you are processing the text. I may be wrong, but I believe that's part of the idea behind separation of string and

Re: (Simple?) Unicode Question

2009-08-29 Thread Steven D'Aprano
On Sat, 29 Aug 2009 09:34:43 +0200, Thorsten Kampe wrote: * Rami Chowdhury (Thu, 27 Aug 2009 09:44:41 -0700) Further, does anything, except a printing device need to know the encoding of a piece of text? Python needs to know if you are processing the text. Python only needs to know when

Re: (Simple?) Unicode Question

2009-08-29 Thread Nobody
On Sat, 29 Aug 2009 08:26:54 +, Steven D'Aprano wrote: Python only needs to know when you convert the text to or from bytes. I can do this: s = "hello" t = "world" print(' '.join([s, t])) hello world and not need to care anything about encodings. So long as your terminal has a
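
A small Python 3 sketch of the point being quoted: text operations need no encoding, and the choice only appears when the text is converted to bytes. The sample word with a non-ASCII letter is an assumption added so that the two encodings give visibly different results:

```python
s = "hello"
t = "w\N{LATIN SMALL LETTER O WITH DIAERESIS}rld"

# Pure text processing: no encoding is involved here.
joined = " ".join([s, t])

# Only at the bytes boundary does an encoding have to be chosen,
# and different choices give different byte sequences.
utf8_bytes = joined.encode("utf-8")
latin1_bytes = joined.encode("iso-8859-1")

print(len(joined), len(utf8_bytes), len(latin1_bytes))
```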

Re: (Simple?) Unicode Question

2009-08-29 Thread Steven D'Aprano
On Sat, 29 Aug 2009 20:09:12 +0100, Nobody wrote: On Sat, 29 Aug 2009 08:26:54 +, Steven D'Aprano wrote: Python only needs to know when you convert the text to or from bytes. I can do this: s = "hello" t = "world" print(' '.join([s, t])) hello world and not need to care anything

(Simple?) Unicode Question

2009-08-27 Thread Shashank Singh
Hi All! I have a very simple (and probably stupid) question eluding me. When exactly is the char-set information needed? To make my question clear consider reading a file. While reading a file, all I get is basically an array of bytes. Now suppose a file has 10 bytes in it (all is data, no
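
As a rough Python 3 illustration of the question: reading a file in binary mode gives only bytes, and the charset is needed at the moment those bytes are turned into text. The sample content and the use of a temporary file are assumptions made for this sketch:

```python
import os
import tempfile

text = "48\N{DEGREE SIGN} N"

# Write the text as UTF-8 bytes to a scratch file.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(text.encode("utf-8"))
    path = handle.name

# Binary mode: an array of bytes, no charset involved.
with open(path, "rb") as f:
    raw = f.read()

# Text mode: the charset must be known to get characters back.
with open(path, encoding="utf-8") as f:
    decoded = f.read()

os.remove(path)
print(len(raw), len(decoded))
```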

Re: (Simple?) Unicode Question

2009-08-27 Thread Rami Chowdhury
Further, does anything, except a printing device need to know the encoding of a piece of text? I may be wrong, but I believe that's part of the idea behind separation of string and bytes types in Python 3.x. I believe, if you are using Python 3.x, you don't need the character encoding

Re: (Simple?) Unicode Question

2009-08-27 Thread Albert Hopkins
On Thu, 2009-08-27 at 22:09 +0530, Shashank Singh wrote: Hi All! I have a very simple (and probably stupid) question eluding me. When exactly is the char-set information needed? To make my question clear consider reading a file. While reading a file, all I get is basically an array of