Re: byte count unicode string

Gabriel Genellina Wed, 20 Sep 2006 17:00:54 -0700

At Wednesday 20/9/2006 19:53, willie wrote:

What is the proper way to describe "ustr" below?


 >>> ustr = buf.decode('UTF-8')
 >>> type(ustr)
<type 'unicode'>


Is it a "unicode object that contains a UTF-8 encoded
string object?"

ustr is a unicode object. Period. A unicode object contains characters (not bytes). buf, apparently, is a string - a string of bytes. Those bytes apparently represent some unicode characters encoded using the UTF-8 encoding. So you can decode them, using the decode() method, to get the unicode object.
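
As a rough illustration of the bytes/characters distinction, here is a minimal sketch written in Python 3 spelling, where `bytes` plays the role of the byte string and `str` plays the role of the unicode object (the thread itself uses Python 2). The sample bytes in `buf` and the name `utf16` are made up for the example.

```python
# Assumption: Python 3, where bytes ~ the Python 2 byte string
# and str ~ the Python 2 unicode object.
buf = b"caf\xc3\xa9"            # hypothetical sample: "café" encoded as UTF-8
ustr = buf.decode("utf-8")      # bytes -> characters

print(type(ustr).__name__)      # the decoded object is a character string
print(len(buf), len(ustr))      # byte count vs. character count

# The character string has no encoding of its own: the same characters
# can be encoded in different ways, giving different byte counts.
utf16 = ustr.encode("utf-16-le")
print(len(ustr.encode("utf-8")), len(utf16))
```

The byte count only exists once an encoding is chosen; the character count of `ustr` stays the same regardless.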

Very roughly, the difference is like that of an integer and its representations:

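import struct

w = 1                                      # decimal literal
x = 0x0001                                 # hexadecimal literal
y = 001                                    # octal literal (Python 2 syntax)
z = struct.unpack('>h','\x00\x01')[0]      # unpack() returns a tuple; take its only item

All four objects are the *same* integer, 1.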

There is no way of knowing *how* the integer was spelled, i.e., which representation it came from. Like the unicode object, it has no "encoding" by itself.

You can go back and forth between an integer and its decimal representation, much as you can with astring.decode() and ustring.encode().
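
The round trip between an integer and its representations can be sketched as follows. This is only an illustration in Python 3 spelling; the variable names (`decimal`, `hexadecimal`, `packed`) are invented for the example.

```python
import struct

n = 1

# Three different representations ("encodings") of the same integer.
decimal = str(n)                    # decimal text
hexadecimal = format(n, "#06x")     # zero-padded hex text with 0x prefix
packed = struct.pack(">h", n)       # big-endian 16-bit binary form

# Going back ("decoding") always yields the same integer, which keeps
# no memory of the representation it came from.
from_decimal = int(decimal)
from_hex = int(hexadecimal, 16)
from_packed = struct.unpack(">h", packed)[0]

print(from_decimal == from_hex == from_packed == n)
```

Here str() and int() play the part that encode() and decode() play for strings.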




Gabriel Genellina

Softlab SRL


        
        
                