Re: Question on Strings

2009-02-06 Thread John Machin
On Feb 6, 9:24 pm, Chris Rebert c...@rebertia.com wrote:
 On Fri, Feb 6, 2009 at 1:49 AM, Kalyankumar Ramaseshan

 soft_sm...@yahoo.com wrote:

  Hi,

  Excuse me if this is a repeat question!

  I just wanted to know how are strings represented in python?

  I need to know in terms of:

  a) Strings are stored as UTF-16 (LE/BE) or UTF-32 characters?

Neither.


 IIRC, Depends on what the build settings were when CPython was
 compiled. UTF-16 is the default.

Unicode strings are held as arrays of 16-bit numbers or 32-bit numbers
[of which only 21 are used]. If you must use an acronym, use UCS-2 or
UCS-4.
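
As an illustration (not part of the original post), a rough way to see which layout a given interpreter uses is to look at sys.maxunicode. The helper name describe_build below is made up for this sketch. Note that from CPython 3.3 onwards (PEP 393) the narrow/wide build distinction is gone and sys.maxunicode is always 0x10FFFF, so the narrow branch only applies to older builds.

```python
import sys


def describe_build(max_code_point=sys.maxunicode):
    """Classify an interpreter by its largest supported code point.

    describe_build is a hypothetical helper, for illustration only.
    """
    if max_code_point == 0xFFFF:
        return "narrow build: 16-bit code units (UCS-2 style)"
    return "wide build or CPython 3.3+: full range up to U+10FFFF"


# The largest Unicode code point, U+10FFFF, fits in 21 bits,
# which is why only 21 of the 32 bits are ever used.
BITS_NEEDED = (0x10FFFF).bit_length()

print(hex(sys.maxunicode), describe_build())
print("bits needed for U+10FFFF:", BITS_NEEDED)
```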

The UTF-n siblings are *external* representations.
2.x: a_unicode_object.decode('UTF-16') -> an_str_object
3.x: an_str_object.decode('UTF-16') -> a_bytes_object

By the way, has anyone come up with a name for the shifting effect
observed above on str, and also with repr, range, and the iter*
family? If not, I suggest that the language's association with the
best of English humour be widened so that it be dubbed the Mad
Hatter's Tea Party effect.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Question on Strings

2009-02-06 Thread Hendrik van Rooyen
John Machin s...@le..n.net wrote:

By the way, has anyone come up with a name for the shifting effect
observed above on str, and also with repr, range, and the iter*
family? If not, I suggest that the language's association with the
best of English humour be widened so that it be dubbed the Mad
Hatter's Tea Party effect.

The MHTP effect.

Sounds educated, almost like
a network protocol.

+1

- Hendrik





Re: Question on Strings

2009-02-06 Thread MRAB

John Machin wrote:
 On Feb 6, 9:24 pm, Chris Rebert c...@rebertia.com wrote:
 On Fri, Feb 6, 2009 at 1:49 AM, Kalyankumar Ramaseshan

 soft_sm...@yahoo.com wrote:

 Hi,
 Excuse me if this is a repeat question!
 I just wanted to know how are strings represented in python?
 I need to know in terms of:
 a) Strings are stored as UTF-16 (LE/BE) or UTF-32 characters?

 Neither.

 IIRC, Depends on what the build settings were when CPython was
 compiled. UTF-16 is the default.

 Unicode strings are held as arrays of 16-bit numbers or 32-bit numbers
 [of which only 21 are used]. If you must use an acronym, use UCS-2 or
 UCS-4.

 The UTF-n siblings are *external* representations.
 2.x: a_unicode_object.decode('UTF-16') -> an_str_object
 3.x: an_str_object.decode('UTF-16') -> a_bytes_object

 By the way, has anyone come up with a name for the shifting effect
 observed above on str, and also with repr, range, and the iter*
 family? If not, I suggest that the language's association with the
 best of English humour be widened so that it be dubbed the Mad
 Hatter's Tea Party effect.

Bitwise shifts and rotates are collectively referred to as skew
operations. I therefore suggest the term skewing. :-)


Re: Question on Strings

2009-02-06 Thread Chris Rebert
On Fri, Feb 6, 2009 at 1:49 AM, Kalyankumar Ramaseshan
soft_sm...@yahoo.com wrote:

 Hi,

 Excuse me if this is a repeat question!

 I just wanted to know how are strings represented in python?

 I need to know in terms of:

 a) Strings are stored as UTF-16 (LE/BE) or UTF-32 characters?

IIRC, Depends on what the build settings were when CPython was
compiled. UTF-16 is the default.

 b) They are converted to utf-8 format when it is needed for e.g. when storing 
 the string to disk or sending it through a socket (tcp/ip)?

No. In Python 2.x they are implicitly encoded with the ASCII codec in
such cases, which fails on non-ASCII characters. To handle non-ASCII
Unicode characters properly, you need to encode/decode the strings
to/from bytes yourself, specifying the encoding.
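
A minimal sketch of that explicit step, written for Python 3 and added here for illustration. io.BytesIO is only a stand-in for a binary file or a socket, and the variable names are assumptions of this example, not anything from the original post.

```python
import io

text = "na\u00efve caf\u00e9"   # contains two non-ASCII characters

# Encode explicitly before handing the data to anything byte-oriented.
payload = text.encode("utf-8")

# io.BytesIO stands in for a binary file or a socket in this sketch.
channel = io.BytesIO()
channel.write(payload)

# The receiving side must decode with the same encoding.
received = channel.getvalue().decode("utf-8")

# Relying on ASCII fails as soon as a non-ASCII character shows up.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(payload, received == text, ascii_ok)
```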

Cheers,
Chris

-- 
Follow the path of the Iguana...
http://rebertia.com


Re: Question on Strings

2009-02-06 Thread Tino Wildenhain

Hi,

Kalyankumar Ramaseshan wrote:

Hi,

Excuse me if this is a repeat question!

I just wanted to know how are strings represented in python?


It depends on whether you mean Python 2.x or Python 3.x; the model
changed.

Python 2.x has str and unicode: the former is a sequence
of single-byte characters, while unicode uses either 16 or 32 bits
per character, depending on the configure options.

str in Python 3.x replaces unicode, and what used to be
str is now bytes (iirc).
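
To make the Python 3.x side of this concrete, here is a small sketch added for illustration. It assumes a Python 3 interpreter, and the variable names are arbitrary.

```python
text = "abc"     # str: a sequence of Unicode code points
raw = b"abc"     # bytes: a sequence of integers in range(256)

first_char = text[0]   # indexing a str gives a one-character str
first_byte = raw[0]    # indexing bytes gives an int

print(type(text).__name__, type(raw).__name__, first_char, first_byte)
```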


I need to know in terms of:

a) Strings are stored as UTF-16 (LE/BE) or UTF-32 characters?


It uses an internal fixed-length encoding for unicode, not a UTF.

b) They are converted to utf-8 format when it is needed for e.g. when storing the string to disk or sending it through a socket (tcp/ip)? 


Nope. You need to do this explicitly. The default encoding for implicit
conversion in Python 2.x is ASCII.

In Python 2.x you would use unicodestr.encode('utf-8') to get a UTF-8
encoded string, and simplestr.decode('utf-8') to convert a UTF-8 encoded
string back to internal unicode.

There are many encodings available to select from.
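
As a rough illustration of how the choice of encoding changes the bytes produced, the sketch below (Python 3, added here and not part of the original post) encodes a single character with four standard codecs. The selection of codecs is arbitrary.

```python
sample = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

# Map each codec name to the number of bytes it needs for the sample.
sizes = {
    codec: len(sample.encode(codec))
    for codec in ("latin-1", "utf-8", "utf-16-le", "utf-32-le")
}

print(sizes)
```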


Any help in this regard is appreciated.


Please also see Python's documentation, which is very
good, and just try it out in the interactive interpreter.

Regards
Tino




Re: Question on Strings

2009-02-06 Thread Terry Reedy

John Machin wrote:


The UTF-n siblings are *external* representations.
2.x: a_unicode_object.decode('UTF-16') -> an_str_object
3.x: an_str_object.decode('UTF-16') -> a_bytes_object


That should be .encode() to bytes, which is the coded form.
.decode is bytes => str/unicode
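
The two directions can be checked in a Python 3 interpreter with a sketch like the following, added for illustration. The variable names are arbitrary, and 'utf-16-le' is chosen so that no byte order mark is emitted.

```python
text = "caf\u00e9"

encoded = text.encode("utf-16-le")      # str -> bytes (the coded form)
decoded = encoded.decode("utf-16-le")   # bytes -> str

# In Python 3 the wrong-direction methods do not exist at all.
str_has_decode = hasattr(text, "decode")
bytes_has_encode = hasattr(encoded, "encode")

print(encoded, decoded == text, str_has_decode, bytes_has_encode)
```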



Re: Question on Strings

2009-02-06 Thread John Machin
On Feb 7, 5:23 am, Terry Reedy tjre...@udel.edu wrote:
 John Machin wrote:
  The UTF-n siblings are *external* representations.
  2.x: a_unicode_object.decode('UTF-16') -> an_str_object
  3.x: an_str_object.decode('UTF-16') -> a_bytes_object

 That should be .encode() to bytes, which is the coded form.
 .decode is bytes => str/unicode

True. I guess that makes me the Dohmouse :-)