# Re: [Python-ideas] Proposal for default character representation

```> On 13 Oct 2016, at 09:43, Greg Ewing <greg.ew...@canterbury.ac.nz> wrote:
>
> Mikhail V wrote:
>> Did you see much code written with hex literals?
>
> From /usr/include/sys/fcntl.h:
> ```
```
Backing Greg up for a moment, hex literals are extremely common in any code
that needs to work with binary data, such as network programming or fine data
structure manipulation. For example, consider the frequent requirement to mask
out certain bits of a given integer (e.g., keep only the low 24 bits of a 32-bit
integer). Here are a few ways to write that mask:

integer & 0x00FFFFFF  # Hex
integer & 16777215  # Decimal
integer & 0o77777777  # Octal
integer & 0b111111111111111111111111  # Binary
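
As a quick sanity check, here is a minimal Python sketch showing that the four spellings are the same mask. The sample value is made up for illustration, and the binary literal is written with underscore grouping (Python 3.6+) purely so the 24 digits can be counted.

```python
# Hypothetical sample value; any 32-bit integer would do.
value = 0xDEADBEEF

# The same 24-bit mask written in each of the four bases.
masks = {
    "hex": 0x00FFFFFF,
    "decimal": 16777215,
    "octal": 0o77777777,
    "binary": 0b1111_1111_1111_1111_1111_1111,
}

# All four literals denote one and the same integer.
assert len(set(masks.values())) == 1

# Keep only the low 24 bits of the sample value.
low24 = value & masks["hex"]
print(hex(low24))  # 0xadbeef
```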

Of those four, hexadecimal has the advantage of being both extremely concise
and clear. The octal representation is infuriating because one octal digit
refers to *three* bits, which means that there is a non-whole number of octal
digits in a byte (that is, one byte with all bits set is represented by 0o377).
This causes problems both with reading comprehension and with most other common
tasks. For example, moving from 0xFF to 0xFFFF (or 255 to 65535, also known as
setting the next most significant byte to all 1s) is represented in octal by
moving from 0o377 to 0o177777. This is not an obvious transition, and I doubt
many programmers could do it from memory in any representation but hex or
binary.
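
To see the misalignment concretely, a small sketch (the helper name `all_ones` is mine, not from any library) that prints the all-bits-set value for 1 to 4 bytes in hex, octal and decimal:

```python
def all_ones(n_bytes):
    """Return the integer with every bit set in n_bytes bytes."""
    return (1 << (8 * n_bytes)) - 1


for n_bytes in (1, 2, 3, 4):
    value = all_ones(n_bytes)
    # Hex grows by exactly two digits per byte; octal and decimal do not
    # line up with byte boundaries.
    print(f"{n_bytes} byte(s): {value:#x}  {value:#o}  {value}")
```

Only the hex column grows by a fixed two digits per byte; in octal the leading digit changes from step to step because 8 is not a multiple of 3.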

Decimal is no clearer. Programmers know how to represent certain bit patterns
from memory in decimal simply because they see them a lot: usually they can do
the all 1s case, and often the 0 followed by all 1s case (255 and 127 for one
byte, 65535 and 32767 for two bytes, and then increasingly few programmers know
the next set). But trying to work out what mask to use for setting only bits 15
and 14 is tricky in decimal, while in hex it’s fairly easy (in hex it’s 0xC000,
in decimal it’s 49152).
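
The bit 15 and bit 14 example can be checked with a short sketch. The helper `mask_for_bits` is a hypothetical name introduced here, with bit 0 taken as the least significant bit.

```python
def mask_for_bits(*positions):
    """Build a mask with a 1 at each given bit position (bit 0 = LSB)."""
    mask = 0
    for position in positions:
        mask |= 1 << position
    return mask


mask = mask_for_bits(15, 14)
print(hex(mask), mask)  # 0xc000 49152
```

In hex the answer can be read off directly: bits 15 and 14 are the top two bits of the highest nibble (binary 1100, i.e. C), followed by three zero nibbles.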

Binary notation seems like the solution, but note the above case: the only way
to work out how many bits are being masked out is to count them, and there can
be quite a lot. IIRC there’s some new syntax coming (PEP 515, underscores in
numeric literals) that would let us write binary literals as
0b1111_1111_1111_1111, which would help the readability case, but it’s still
substantially less dense and loses clarity for many kinds of unusual bit
patterns. Additionally, as the number of bits increases life gets really hard:
a mask for a 64-bit number needs a binary literal that’s at least 66 characters
long, and grouping the digits in fours adds another 15 underscores, for a
literal that is 81 characters long (more than the PEP 8 line width
recommendation of 79). That gets unwieldy fast, while the hex representation is
still down at 18 characters.
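
The character counts can be reproduced with the built-in `format()`, as a rough sketch. It assumes Python 3.6 or later, where the `_` format option groups binary digits in fours.

```python
# A 64-bit value with every bit set.
ALL_ONES_64 = (1 << 64) - 1

plain_binary = format(ALL_ONES_64, "#b")     # "0b" + 64 digits
grouped_binary = format(ALL_ONES_64, "#_b")  # same, with "_" every 4 digits
hex_form = format(ALL_ONES_64, "#x")         # "0x" + 16 digits

print(len(plain_binary), len(grouped_binary), len(hex_form))  # 66 81 18
```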

Hexadecimal has the clear advantage that each digit represents exactly 4
bits, and the next 4 bits are independent of the previous bits. That’s not true
of decimal or octal, and while it’s true of binary it costs a fourfold increase
in the length of the representation. It’s definitely not as intuitive to the
average human being, but that’s ok: it’s a specialised use case, and we aren’t
requiring that all human beings learn this skill.
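
A small sketch of that independence property, using a hypothetical helper `nibbles` and made-up sample values: changing one hex digit changes exactly one 4-bit group, while the decimal spelling of the same numbers changes in more than one place.

```python
def nibbles(value, width=4):
    """Split value into `width` 4-bit groups, most significant first."""
    return [(value >> shift) & 0xF for shift in range(4 * (width - 1), -1, -4)]


before = 0x12C4
after = 0x12F4  # only the third hex digit differs: C -> F

print(nibbles(before))  # [1, 2, 12, 4]
print(nibbles(after))   # [1, 2, 15, 4]
print(before, after)    # 4804 4852
```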

This is a very long way of saying that your argument against hexadecimal
literals (namely, that they use 16 glyphs as opposed to the 10 glyphs used in
decimal) is too simple to be correct. Different collections of glyphs are
clearer in different contexts. For example, decimal numerals can be represented
using 10 glyphs, while the English language requires 26 glyphs plus
punctuation. But I don’t think you’re seriously proposing we should swap from
writing English using the larger glyph set to writing it as the decimal
representation of ASCII bytes.

Given this, the argument that the Unicode Consortium said “write the number in
hex” is good enough for me.

Cory

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/```