Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

2017-08-10 Thread eryk sun
On Fri, Aug 11, 2017 at 2:34 AM, Cameron Simpson wrote:
>
> In files however, the default encoding for text files is 'utf-8': Python
> will read the file's bytes as UTF-8 data and will write Python string
> characters in UTF-8 encoding when writing.

The default encoding for
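To make the file-encoding point concrete, here is a minimal sketch, not taken from the thread. It assumes a Python 3 of the era discussed, where open() called without an encoding argument falls back to a platform-dependent default (reported by locale.getpreferredencoding(False)) rather than always using UTF-8. The file name demo.txt and the sample text are made up for illustration. Passing encoding= explicitly removes the dependence on the platform:

```python
import locale
import os
import tempfile

# Sample text (hypothetical): 10 characters, two of them non-ASCII.
text = "na\u00efve caf\u00e9"

# The platform's preferred text encoding; the value differs between systems.
print(locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    # Write with an explicit encoding instead of relying on the default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Read the raw bytes back to see what actually landed on disk.
    with open(path, "rb") as f:
        raw = f.read()

    # Read as text again, naming the same encoding.
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

print(len(text), len(raw))   # characters vs. bytes on disk
print(round_trip == text)
```

The character count and the byte count differ because each of the two accented letters takes two bytes in UTF-8.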

Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

2017-08-10 Thread Cameron Simpson
On 10Aug2017 20:40, boB Stepp wrote: (By the way, it is nearly 14 years later, and PHP still believes that the world is ASCII.) I thought you must surely be engaging in hyperbole, but at http://php.net/manual/en/xml.encoding.php I found: "The default source encoding

Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

2017-08-10 Thread boB Stepp
On Thu, Aug 10, 2017 at 8:40 PM, boB Stepp wrote:
> On Thu, Aug 10, 2017 at 8:01 AM, Steven D'Aprano wrote:
>> Python 3 makes Unicode about as easy as it can get. To include a unicode
>> string in your source code, you just need to ensure your editor

Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

2017-08-10 Thread boB Stepp
On Thu, Aug 10, 2017 at 8:01 AM, Steven D'Aprano wrote:
>
> Another **Must Read** resource for unicode is:
>
> The Absolute Minimum Every Software Developer Absolutely Positively Must
> Know About Unicode (No Excuses!)
>

Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

2017-08-10 Thread Steven D'Aprano
On Mon, Aug 07, 2017 at 10:04:21PM -0500, Zachary Ware wrote:
> Next, take a dive into the wonderful* world of Unicode:
>
> https://nedbatchelder.com/text/unipain.html
> https://www.youtube.com/watch?v=7m5JA3XaZ4k

Another **Must Read** resource for unicode is:

The Absolute Minimum Every

Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

2017-08-08 Thread Cameron Simpson
On 08Aug2017 22:30, boB Stepp wrote:
On Mon, Aug 7, 2017 at 10:20 PM, Cameron Simpson wrote:
On 07Aug2017 21:44, boB Stepp wrote:
py3: s = 'Hello!'
py3: len(s.encode("UTF-8"))
6
py3: len(s.encode("UTF-16"))
14
py3:

Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

2017-08-08 Thread boB Stepp
On Tue, Aug 8, 2017 at 10:17 PM, boB Stepp wrote:
> On Mon, Aug 7, 2017 at 10:01 PM, Ben Finney wrote:
>> boB Stepp writes:
>>
>>> How is len() getting these values?
>>
>
> It is translating the Unicode code points

Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

2017-08-08 Thread boB Stepp
On Tue, Aug 8, 2017 at 10:29 PM, Mats Wichmann wrote:
> eh? the bytes are ff fe h 0
> 0xff is not literally four bytes, it's the hex repr of an 8-bit quantity with
> all bits on

ARG! (space inserted for visual clarity) truly is ff in hex. Again ARGH!!! All I can

Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

2017-08-08 Thread boB Stepp
On Mon, Aug 7, 2017 at 11:30 PM, eryk sun wrote:
> On Tue, Aug 8, 2017 at 3:20 AM, Cameron Simpson wrote:
>>
>> As you note, the 16 and 32 forms are (6 + 1) times 2 or 4 respectively. This
>> is because each encoding has a leading byte order marker to indicate

Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

2017-08-08 Thread boB Stepp
On Mon, Aug 7, 2017 at 10:20 PM, Cameron Simpson wrote:
> On 07Aug2017 21:44, boB Stepp wrote:
>>
>> py3: s = 'Hello!'
>> py3: len(s.encode("UTF-8"))
>> 6
>> py3: len(s.encode("UTF-16"))
>> 14
>> py3: len(s.encode("UTF-32"))
>> 28
>>
>> How is len()

Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

2017-08-08 Thread Mats Wichmann
Eh? The bytes are ff fe h 0.
0xff is not literally four bytes; it's the hex repr of an 8-bit quantity with all bits on.

On August 8, 2017 9:17:49 PM MDT, boB Stepp wrote:
> On Mon, Aug 7, 2017 at 10:01 PM, Ben Finney wrote:
>> boB Stepp

Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

2017-08-08 Thread boB Stepp
On Mon, Aug 7, 2017 at 10:04 PM, Zachary Ware wrote:
> Next, take a dive into the wonderful* world of Unicode:
>
> https://nedbatchelder.com/text/unipain.html
> https://www.youtube.com/watch?v=7m5JA3XaZ4k
>
> Hope this helps,

Thanks, Zach, this actually clarifies

Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

2017-08-08 Thread boB Stepp
On Mon, Aug 7, 2017 at 10:01 PM, Ben Finney wrote:
> boB Stepp writes:
>
>> How is len() getting these values?
>
> By asking the objects themselves to report their length. You are
> creating different objects with different content::
>
>

Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

2017-08-07 Thread eryk sun
On Tue, Aug 8, 2017 at 3:20 AM, Cameron Simpson wrote:
>
> As you note, the 16 and 32 forms are (6 + 1) times 2 or 4 respectively. This
> is because each encoding has a leading byte order marker to indicate the big
> endianness or little endianness. For big endian data that is
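The (6 + 1) arithmetic quoted above can be checked directly. The following is a small sketch, not taken from the thread, that compares the generic "utf-16" and "utf-32" codecs (which prepend a byte order mark) with the explicit little-endian variants (which do not). The variable names are chosen for illustration only:

```python
import codecs

s = "Hello!"  # 6 characters, all in the ASCII range

# Generic codecs: a byte order mark (BOM) is written first, in native order.
utf16 = s.encode("utf-16")
utf32 = s.encode("utf-32")

# Explicit-endianness codecs: no BOM, only the code units themselves.
utf16_le = s.encode("utf-16-le")
utf32_le = s.encode("utf-32-le")

print(len(utf16), len(utf16_le))   # 14 12 -> (6 + 1) * 2 versus 6 * 2
print(len(utf32), len(utf32_le))   # 28 24 -> (6 + 1) * 4 versus 6 * 4

# The extra leading code unit is the BOM for this machine's byte order.
has_bom16 = utf16.startswith(codecs.BOM_UTF16)
has_bom32 = utf32.startswith(codecs.BOM_UTF32)
print(has_bom16, has_bom32)        # True True

# Show the BOM and the first character as hex; the byte order seen here
# depends on the machine (ff fe on little-endian hardware).
print(utf16[:2].hex(), utf16[2:4].hex())
```

Each hex pair printed on the last line is a single byte, which is why ff counts as one byte toward the length and not as several.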

Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

2017-08-07 Thread Cameron Simpson
On 07Aug2017 21:44, boB Stepp wrote:
py3: s = 'Hello!'
py3: len(s.encode("UTF-8"))
6
py3: len(s.encode("UTF-16"))
14
py3: len(s.encode("UTF-32"))
28
How is len() getting these values? And I am sure it will turn out not to be a coincidence that 2 * (6 + 1) = 14 and 4 *

Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

2017-08-07 Thread Zachary Ware
On Mon, Aug 7, 2017 at 9:44 PM, boB Stepp wrote:
> py3: s = 'Hello!'
> py3: len(s.encode("UTF-8"))
> 6
> py3: len(s.encode("UTF-16"))
> 14
> py3: len(s.encode("UTF-32"))
> 28
>
> How is len() getting these values? And I am sure it will turn out not
> to be a coincidence

Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

2017-08-07 Thread Ben Finney
boB Stepp writes:
> How is len() getting these values?

By asking the objects themselves to report their length. You are creating different objects with different content::

    >>> s = 'Hello!'
    >>> s_utf8 = s.encode("UTF-8")
    >>> s == s_utf8
    False
    >>>
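As a rough illustration of the "different objects" point, and not a continuation of the truncated session above, this sketch shows that len() on a str counts code points while len() on a bytes object counts bytes. The accented example string is an assumption added here to make the two counts differ:

```python
s = "Hello!"
s_utf8 = s.encode("UTF-8")

# Encoding produces a new object of a different type.
print(type(s).__name__, type(s_utf8).__name__)   # str bytes

# A str never compares equal to a bytes object, even with "the same" text.
print(s == s_utf8)                    # False
print(s_utf8.decode("UTF-8") == s)    # True after decoding back to str

# For pure ASCII text the two lengths happen to coincide in UTF-8 ...
print(len(s), len(s_utf8))            # 6 6

# ... but not once a non-ASCII character is involved (hypothetical example).
accented = "h\u00e9llo"               # 5 code points
accented_utf8 = accented.encode("UTF-8")
print(len(accented), len(accented_utf8))   # 5 6
```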