Re: [Python-ideas] Proposal for default character representation

Cory Benfield Thu, 13 Oct 2016 03:05:56 -0700

> On 13 Oct 2016, at 09:43, Greg Ewing <greg.ew...@canterbury.ac.nz> wrote:
> 
> Mikhail V wrote:
>> Did you see much code written with hex literals?
> 
> From /usr/include/sys/fcntl.h:
>


Backing Greg up for a moment, hex literals are extremely common in any code 
that needs to work with binary data, such as network programming or 
fine-grained data structure manipulation. For example, consider the frequent 
requirement to mask out certain bits of a given integer (e.g., keep the low 24 
bits of a 32-bit integer). Here are a few ways to represent that:

integer & 0x00FFFFFF  # Hex
integer & 16777215  # Decimal
integer & 0o77777777  # Octal
integer & 0b111111111111111111111111  # Binary

Of those four, hexadecimal has the advantage of being both extremely concise 
and clear. The octal representation is infuriating because one octal digit 
refers to *three* bits, which means that there is a non-whole number of octal 
digits in a byte (that is, one byte with all bits set is represented by 0o377). 
This causes problems both with reading comprehension and with most other common 
tasks. For example, moving from 0xFF to 0xFFFF (or 255 to 65535, also known as 
setting the next most significant byte to all 1s) is represented in octal by 
moving from 0o377 to 0o177777. This is not an obvious transition, and I doubt 
many programmers could do it from memory in any representation but hex or 
binary.
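
To make the byte-extension point concrete, here is a minimal sketch. The helper 
name set_next_byte is my own invention for illustration, and the binary literal 
uses underscore grouping (Python 3.6+) purely so the 24 bits can be counted.

```python
# The four mask spellings all denote the same integer.
masks = [
    0x00FFFFFF,
    16777215,
    0o77777777,
    0b1111_1111_1111_1111_1111_1111,
]
assert len(set(masks)) == 1


def set_next_byte(mask: int) -> int:
    """Set the next most significant byte above `mask` to all 1s.

    Hypothetical helper, written only for this example.
    """
    n_bytes = (mask.bit_length() + 7) // 8  # whole bytes already occupied
    return mask | (0xFF << (8 * n_bytes))


one_byte = 0xFF
two_bytes = set_next_byte(one_byte)

print(hex(one_byte), "->", hex(two_bytes))  # 0xff -> 0xffff
print(oct(one_byte), "->", oct(two_bytes))  # 0o377 -> 0o177777
print(one_byte, "->", two_bytes)            # 255 -> 65535
```

The hex transition just appends two more F digits; the octal and decimal ones 
have to be recomputed.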

Decimal is no clearer. Programmers know how to represent certain bit patterns 
from memory in decimal simply because they see them a lot: usually they can do 
the all 1s case, and often the 0 followed by all 1s case (255 and 127 for one 
byte, 65535 and 32767 for two bytes, and then increasingly few programmers know 
the next set). But trying to work out what mask to use for setting only bits 15 
and 14 is tricky in decimal, while in hex it’s fairly easy (in hex it’s 0xC000, 
in decimal it’s 49152).
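
As a quick check of that example, a mask can be built from bit positions and 
then printed in both bases. The helper mask_from_bits is a hypothetical name 
used only for this sketch, with bit 0 taken as the least significant bit.

```python
def mask_from_bits(*positions: int) -> int:
    """Return an integer with exactly the given bit positions set.

    Hypothetical helper; bit 0 is the least significant bit.
    """
    mask = 0
    for position in positions:
        mask |= 1 << position
    return mask


mask = mask_from_bits(15, 14)
print(hex(mask))  # 0xc000
print(mask)       # 49152
```

In hex the two set bits are visible as the leading C (binary 1100); the decimal 
form gives no such hint.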

Binary notation seems like the solution, but note the above case: the only way 
to work out how many bits are being masked out is to count them, and there can 
be quite a lot. IIRC there’s some new syntax coming for binary literals that 
would let us represent them as 0b1111_1111_1111_1111, which would help the 
readability case, but it’s still substantially less dense and loses clarity for 
many kinds of unusual bit patterns. Additionally, as the number of bits 
increases life gets really hard: masking out certain bits of a 64-bit number 
requires a literal that’s at least 66 characters long, and grouping it in fours 
adds another 15 underscores, for a literal that is 81 characters long (more 
than the PEP 8 line length recommendation of 79). That starts getting unwieldy 
fast, while the hex representation is still down at 18 characters.
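
Those lengths can be checked with a short sketch. It assumes Python 3.6 or 
later, where the format mini-language accepts "_" as a grouping option for 
binary output; the function name literal_lengths is my own.

```python
def literal_lengths(bits: int) -> dict:
    """Length of an all-ones literal of `bits` bits in three spellings.

    Hypothetical helper for illustration only.
    """
    value = (1 << bits) - 1
    return {
        "binary": len(format(value, "#b")),           # 0b + one digit per bit
        "binary_grouped": len(format(value, "#_b")),  # underscore every 4 bits
        "hex": len(format(value, "#x")),              # 0x + one digit per 4 bits
    }


print(literal_lengths(64))
# {'binary': 66, 'binary_grouped': 81, 'hex': 18}
```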

Hexadecimal has the clear advantage that each character wholly represents 4 
bits, and the next 4 bits are independent of the previous bits. That’s not true 
of decimal or octal, and while it’s true of binary it costs a fourfold increase 
in the length of the representation. It’s definitely not as intuitive to the 
average human being, but that’s ok: it’s a specialised use case, and we aren’t 
requiring that all human beings learn this skill.
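
A small sketch of the "each character wholly represents 4 bits" property: 
every hex digit can be expanded on its own, and concatenating the pieces gives 
the full bit pattern. The helper hex_digit_bits is a hypothetical name, and it 
assumes the width is a multiple of 4 bits.

```python
def hex_digit_bits(value: int, width_bits: int) -> list:
    """Pair each hex digit of `value` with the 4 bits it stands for.

    Hypothetical helper; assumes width_bits is a multiple of 4.
    """
    digits = format(value, "0{}X".format(width_bits // 4))
    return [(digit, format(int(digit, 16), "04b")) for digit in digits]


pairs = hex_digit_bits(0xC000, 16)
print(pairs)
# [('C', '1100'), ('0', '0000'), ('0', '0000'), ('0', '0000')]

# Expanding digit by digit reproduces the whole bit pattern.
joined = "".join(bits for _, bits in pairs)
assert joined == format(0xC000, "016b")
```

No such digit-by-digit expansion works for decimal, and for octal the 3-bit 
groups do not line up with byte boundaries.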

This is a very long way of saying that your argument against hexadecimal 
literals (namely, that they use 16 glyphs as opposed to the 10 glyphs used in 
decimal) is too simple to be correct. Different collections of glyphs are 
clearer in different contexts. For example, decimal numerals can be represented 
using 10 glyphs, while the English language requires 26 glyphs plus 
punctuation. But I don’t think you’re seriously proposing we should swap from 
writing English using the larger glyph set to writing it as the decimal 
representation of ASCII bytes.

Given this, the argument that the Unicode Consortium said “write the number in 
hex” is good enough for me.

Cory

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
