Re: About size of Unicode string
Frank Abel Cancio Bello wrote:

> Can I get how many bytes a string object has, independently of its encoding?

strings hold characters, not bytes. an encoding is used to convert a stream of characters to a stream of bytes. if you need to know the number of bytes needed to hold an encoded string, you need to know the encoding. (and in some cases, including UTF-8, you need to *do* the encoding before you can tell how many bytes you get)

> Is the len function the right way to get it?

len() on the encoded string, yes.

> Laci, look at the following code:
>
>     import urllib2
>     request = urllib2.Request(url='http://localhost:6000')
>     data = 'data to send\n'.encode('utf_8')
>     request.add_data(data)
>     request.add_header('content-length', str(len(data)))
>     request.add_header('content-encoding', 'UTF-8')
>     file = urllib2.urlopen(request)
>
> Is it always true that the size of the entity-body is len(data), independently of the encoding of data?

your data variable contains bytes, not characters, so the answer is yes.

on the other hand, that add_header line isn't really needed -- if you leave it out, urllib2 will add the content-length header all by itself.

/F
--
http://mail.python.org/mailman/listinfo/python-list
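The point that you have to do the encoding before you can count bytes can be sketched as follows. This is only an illustration and assumes Python 3, where `str` holds characters and `bytes` holds encoded data (the thread itself uses Python 2). The helper name `encoded_size` is made up for this example.

```python
def encoded_size(text, encoding="utf-8"):
    """Return the number of bytes `text` occupies once encoded.

    `encoded_size` is a hypothetical helper, not part of any library.
    """
    return len(text.encode(encoding))


# Both strings have three characters, but UTF-8 uses one byte for an
# ASCII character and two or more bytes for any other character.
ascii_text = "abc"
accented_text = "\u00e5\u00e4\u00f6"  # three non-ASCII Latin letters

ascii_chars = len(ascii_text)
ascii_bytes = encoded_size(ascii_text)
accented_chars = len(accented_text)
accented_bytes = encoded_size(accented_text)
```

The character counts match while the byte counts do not, which is why the length cannot be predicted from `len()` of the unencoded string alone.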
About size of Unicode string
Hi all!

I need to know the size of a string object independently of its encoding. For example:

    len('123') == len('123'.encode('utf_8'))

while the size of the '123' object is different from the size of '123'.encode('utf_8').

More: I need to send a string in an HTTP request, so I need to know the length of the string to set the Content-Length header independently of its encoding.

Any idea?

Thanks in advance,
Frank
Re: About size of Unicode string
Frank Abel Cancio Bello wrote:

> Hi all!
>
> I need to know the size of a string object independently of its encoding. For example:
>
>     len('123') == len('123'.encode('utf_8'))
>
> while the size of the '123' object is different from the size of '123'.encode('utf_8').
>
> More: I need to send a string in an HTTP request, so I need to know the length of the string to set the Content-Length header independently of its encoding.
>
> Any idea?

This is from the RFC:

    The Content-Length entity-header field indicates the size of the
    entity-body, in decimal number of OCTETs, sent to the recipient or,
    in the case of the HEAD method, the size of the entity-body that
    would have been sent had the request been a GET.

        Content-Length    = "Content-Length" ":" 1*DIGIT

    An example is

        Content-Length: 3495

    Applications SHOULD use this field to indicate the transfer-length
    of the message-body, unless this is prohibited by the rules in
    section 4.4
    (http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4).

    Any Content-Length greater than or equal to zero is a valid value.
    Section 4.4 describes how to determine the length of a message-body
    if a Content-Length is not given.

It looks to me like the Content-Length header has nothing to do with the encoding. It is very low-level stuff. The content length is given in OCTETs and it represents the size of the body. Clearly, it has nothing to do with MIME/encoding etc. It is about the number of octets (bytes) transferred in the body. Try writing your unicode strings into a StringIO and taking its length.

Laci
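As a rough illustration of Content-Length counting octets of the body, here is a sketch that assumes Python 3 and `urllib.request` (the successor of the `urllib2` module used in the thread). The URL is the one from the question; nothing is sent, because constructing a `Request` does not open a connection.

```python
import urllib.request

text = "G\u00e5"             # two characters, one of them non-ASCII
body = text.encode("utf-8")  # three octets once encoded

# Building the Request object does not contact the server.
request = urllib.request.Request("http://localhost:6000", data=body)

# Content-Length is the number of octets in the encoded body,
# not the number of characters in the original string.
request.add_header("Content-Length", str(len(body)))

# Normalise header names to lower case before looking one up.
headers = {name.lower(): value for name, value in request.header_items()}
content_length = headers["content-length"]
```

The header is set by hand here only to make the value visible; as pointed out elsewhere in the thread, the library can fill it in by itself.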
RE: About size of Unicode string
Well, I will repeat the question:

Can I get how many bytes a string object has, independently of its encoding? Is the len function the right way to get it?

Laci, look at the following code:

    import urllib2
    request = urllib2.Request(url='http://localhost:6000')
    data = 'data to send\n'.encode('utf_8')
    request.add_data(data)
    request.add_header('content-length', str(len(data)))
    request.add_header('content-encoding', 'UTF-8')
    file = urllib2.urlopen(request)

Is it always true that the size of the entity-body is len(data), independently of the encoding of data?

-----Original Message-----
From: Laszlo Zsolt Nagy [mailto:[EMAIL PROTECTED]
Sent: Monday, June 06, 2005 1:43 PM
To: Frank Abel Cancio Bello; python-list@python.org
Subject: Re: About size of Unicode string

[snip]
RE: About size of Unicode string
Frank Abel Cancio Bello wrote:

> Can I get how many bytes a string object has, independently of its encoding? Is the len function the right way to get it?

No. len(unicode_string) returns the number of characters in the unicode_string.

The number of bytes depends on how the unicode characters are represented. Different encodings will use different numbers of bytes.

    >>> u = u"G\N{Latin small letter A with ring above}"
    >>> u
    u'G\xe5'
    >>> len(u)
    2
    >>> u.encode("utf-8")
    'G\xc3\xa5'
    >>> len(u.encode("utf-8"))
    3
    >>> u.encode("latin1")
    'G\xe5'
    >>> len(u.encode("latin1"))
    2
    >>> u.encode("utf16")
    '\xfe\xff\x00G\x00\xe5'
    >>> len(u.encode("utf16"))
    6

> Laci, look at the following code:
>
>     import urllib2
>     request = urllib2.Request(url='http://localhost:6000')
>     data = 'data to send\n'.encode('utf_8')
>     request.add_data(data)
>     request.add_header('content-length', str(len(data)))
>     request.add_header('content-encoding', 'UTF-8')
>     file = urllib2.urlopen(request)
>
> Is it always true that the size of the entity-body is len(data), independently of the encoding of data?

For this case it is true, because the logical length of 'data' (which is a byte string) is equal to the number of bytes in the string, and the utf-8 encoding of a byte string with character values in the range 0-127, inclusive, is unchanged from the original string.

In general, if 'data' is a unicode string, no. len() returns the logical length of 'data'. That number does not need to be the number of bytes used to represent 'data'. To get the bytes you must encode the object.

Andrew
[EMAIL PROTECTED]
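The same comparison can be written for Python 3, where text is `str` and encoded data is `bytes`. This is a sketch under that assumption rather than a transcript of the session above, and the variable names are chosen for the example.

```python
text = "G\N{LATIN SMALL LETTER A WITH RING ABOVE}"

# Byte count of the same two-character string under each encoding.
# The utf-16 codec prepends a two-byte byte order mark.
sizes = {
    name: len(text.encode(name))
    for name in ("utf-8", "latin-1", "utf-16")
}

# An ASCII-only string has as many UTF-8 bytes as it has characters,
# which is why the 'data to send\n' case from the question works out.
ascii_text = "data to send\n"
ascii_unchanged = len(ascii_text) == len(ascii_text.encode("utf-8"))
```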
Re: About size of Unicode string
Frank Abel Cancio Bello wrote:

>     request.add_header('content-encoding', 'UTF-8')

The Content-Encoding header is for things like gzip, not for specifying the text encoding. Use the charset parameter to the Content-Type header for that, as in

    Content-Type: text/plain; charset=utf-8
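To make the distinction concrete, here is a small sketch assuming Python 3 and its standard `gzip` module. The `headers` dictionary is just an illustration of which value belongs in which header; whether a particular server accepts a gzip-compressed request body is a separate question and is assumed here.

```python
import gzip

text = "data to send\n"

# Step 1: the character encoding turns text into bytes.
# This is what the charset parameter of Content-Type describes.
encoded = text.encode("utf-8")

# Step 2: the content coding transforms those bytes for transfer.
# This is what Content-Encoding describes.
compressed = gzip.compress(encoded)

headers = {
    "Content-Type": "text/plain; charset=utf-8",
    "Content-Encoding": "gzip",
    # Content-Length counts the octets actually sent.
    "Content-Length": str(len(compressed)),
}

# The receiver undoes the steps in reverse order.
round_trip = gzip.decompress(compressed).decode("utf-8")
```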
RE: About size of Unicode string
Thanks to all. Andrew's answer was an excellent explanation. Thanks, Leif, for your suggestion.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Leif K-Brooks
Sent: Monday, June 06, 2005 4:29 PM
To: python-list@python.org
Subject: Re: About size of Unicode string

[snip]