On Fri, Aug 11, 2017 at 2:34 AM, Cameron Simpson wrote:
>
> In files however, the default encoding for text files is 'utf-8': Python
> will read the file's bytes as UTF-8 data and will write Python string
> characters in UTF-8 encoding when writing.
The default encoding for
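A small sketch on the file-encoding point, under one stated assumption: as far as I know, in the Python 3 versions current at the time of this thread, open() without an encoding= argument falls back to whatever locale.getpreferredencoding(False) reports, which is not UTF-8 on every platform. Passing encoding='utf-8' explicitly avoids relying on that. The sample text and the file name demo.txt are made up for illustration.

```python
import locale
import tempfile
from pathlib import Path

# Platform-dependent: this is the fallback used when encoding= is omitted
# (assumption stated above; the printed value varies between machines).
print(locale.getpreferredencoding(False))

text = 'na\u00efve caf\u00e9'  # 10 characters, two of them non-ASCII

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'demo.txt'  # hypothetical file name

    # Write the string as UTF-8, stating the encoding explicitly.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    # Look at the raw bytes that actually landed on disk.
    raw = path.read_bytes()

    # Read the bytes back as text, again stating the encoding.
    with open(path, encoding='utf-8') as f:
        round_trip = f.read()

print(len(text), len(raw))   # 10 characters, 12 bytes
print(round_trip == text)    # True
```

The two non-ASCII characters take two bytes each in UTF-8, which is why the byte count is larger than the character count.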
On 10Aug2017 20:40, boB Stepp wrote:
(By the way, it is nearly 14 years later, and PHP still believes that
the world is ASCII.)
I thought you must surely be engaging in hyperbole, but at
http://php.net/manual/en/xml.encoding.php I found:
"The default source encoding
On Thu, Aug 10, 2017 at 8:40 PM, boB Stepp wrote:
> On Thu, Aug 10, 2017 at 8:01 AM, Steven D'Aprano wrote:
>> Python 3 makes Unicode about as easy as it can get. To include a unicode
>> string in your source code, you just need to ensure your editor
On Thu, Aug 10, 2017 at 8:01 AM, Steven D'Aprano wrote:
>
> Another **Must Read** resource for unicode is:
>
> The Absolute Minimum Every Software Developer Absolutely Positively Must
> Know About Unicode (No Excuses!)
>
>
On Mon, Aug 07, 2017 at 10:04:21PM -0500, Zachary Ware wrote:
> Next, take a dive into the wonderful* world of Unicode:
>
> https://nedbatchelder.com/text/unipain.html
> https://www.youtube.com/watch?v=7m5JA3XaZ4k
Another **Must Read** resource for unicode is:
The Absolute Minimum Every
On 08Aug2017 22:30, boB Stepp wrote:
On Mon, Aug 7, 2017 at 10:20 PM, Cameron Simpson wrote:
On 07Aug2017 21:44, boB Stepp wrote:
py3: s = 'Hello!'
py3: len(s.encode("UTF-8"))
6
py3: len(s.encode("UTF-16"))
14
py3:
On Tue, Aug 8, 2017 at 10:17 PM, boB Stepp wrote:
> On Mon, Aug 7, 2017 at 10:01 PM, Ben Finney
> wrote:
>> boB Stepp writes:
>>
>>> How is len() getting these values?
>>
>
> It is translating the Unicode code points
On Tue, Aug 8, 2017 at 10:29 PM, Mats Wichmann wrote:
> eh? the bytes are ff fe h 0
> 0xff is not literally four bytes; it's the hex repr of an 8-bit quantity with
> all bits on
ARG! 1111 1111 (space inserted for visual clarity) truly is ff in
hex. Again ARGH!!!
All I can
On Mon, Aug 7, 2017 at 11:30 PM, eryk sun wrote:
> On Tue, Aug 8, 2017 at 3:20 AM, Cameron Simpson wrote:
>>
>> As you note, the 16 and 32 forms are (6 + 1) times 2 or 4 respectively. This
>> is because each encoding has a leading byte order marker to indicate
On Mon, Aug 7, 2017 at 10:20 PM, Cameron Simpson wrote:
> On 07Aug2017 21:44, boB Stepp wrote:
>>
>> py3: s = 'Hello!'
>> py3: len(s.encode("UTF-8"))
>> 6
>> py3: len(s.encode("UTF-16"))
>> 14
>> py3: len(s.encode("UTF-32"))
>> 28
>>
>> How is len()
eh? the bytes are ff fe h 0
0xff is not literally four bytes; it's the hex repr of an 8-bit quantity with all
bits on
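To make the "ff fe h 0" point concrete, here is a minimal sketch. It assumes little-endian byte order and builds the data from codecs.BOM_UTF16_LE plus the 'utf-16-le' codec, so the result does not depend on the machine it runs on (plain 'utf-16' picks the native byte order).

```python
import codecs

s = 'Hello!'

# 'utf-16-le' writes no byte order mark, so prepend one by hand.
body = s.encode('utf-16-le')
data = codecs.BOM_UTF16_LE + body

# The first four bytes: the two-byte BOM, then 'H' as a 16-bit unit.
first_four = list(data[:4])
print(first_four)        # [255, 254, 72, 0]
print(data[:4].hex())    # fffe4800

# 0xff is one byte (value 255, all eight bits set), not four.
print(bin(0xff))         # 0b11111111
print(len(data))         # 14 = 2 (BOM) + 6 * 2
```

In the repr of a bytes object, \xff is just a four-character way of printing that single byte.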
On August 8, 2017 9:17:49 PM MDT, boB Stepp wrote:
>On Mon, Aug 7, 2017 at 10:01 PM, Ben Finney
> wrote:
>> boB Stepp
On Mon, Aug 7, 2017 at 10:04 PM, Zachary Ware
wrote:
> Next, take a dive into the wonderful* world of Unicode:
>
> https://nedbatchelder.com/text/unipain.html
> https://www.youtube.com/watch?v=7m5JA3XaZ4k
>
> Hope this helps,
Thanks, Zach, this actually clarifies
On Mon, Aug 7, 2017 at 10:01 PM, Ben Finney wrote:
> boB Stepp writes:
>
>> How is len() getting these values?
>
> By asking the objects themselves to report their length. You are
> creating different objects with different content::
>
>
On Tue, Aug 8, 2017 at 3:20 AM, Cameron Simpson wrote:
>
> As you note, the 16 and 32 forms are (6 + 1) times 2 or 4 respectively. This
> is because each encoding has a leading byte order marker to indicate the big
> endianness or little endianness. For big endian data that is
On 07Aug2017 21:44, boB Stepp wrote:
py3: s = 'Hello!'
py3: len(s.encode("UTF-8"))
6
py3: len(s.encode("UTF-16"))
14
py3: len(s.encode("UTF-32"))
28
How is len() getting these values? And I am sure it will turn out not
to be a coincidence that 2 * (6 + 1) = 14 and 4 *
On Mon, Aug 7, 2017 at 9:44 PM, boB Stepp wrote:
> py3: s = 'Hello!'
> py3: len(s.encode("UTF-8"))
> 6
> py3: len(s.encode("UTF-16"))
> 14
> py3: len(s.encode("UTF-32"))
> 28
>
> How is len() getting these values? And I am sure it will turn out not
> to be a coincidence
boB Stepp writes:
> How is len() getting these values?
By asking the objects themselves to report their length. You are
creating different objects with different content::
>>> s = 'Hello!'
>>> s_utf8 = s.encode("UTF-8")
>>> s == s_utf8
False
>>>
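Extending the session above, a short sketch of where 6, 14 and 28 come from. The variable name sizes is just for illustration. The '-le' codec names carry an explicit byte order and therefore write no byte order mark, which makes the BOM's contribution visible by comparison.

```python
s = 'Hello!'

# len() of a bytes object counts bytes, so each encoding reports
# however many bytes it needed for the same six characters.
sizes = {}
for name in ('utf-8', 'utf-16', 'utf-32', 'utf-16-le', 'utf-32-le'):
    encoded = s.encode(name)
    sizes[name] = len(encoded)
    print(name, type(encoded).__name__, len(encoded))

# utf-16:  2-byte BOM + 6 * 2 bytes = 14
# utf-32:  4-byte BOM + 6 * 4 bytes = 28
# The explicit-endian variants omit the BOM: 12 and 24.

# A str never compares equal to bytes; decode to get the text back.
print(s == s.encode('utf-8'))                    # False
print(s.encode('utf-8').decode('utf-8') == s)    # True
```

So the (6 + 1) in the arithmetic is the six characters plus one extra code unit for the BOM.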