On Mon, 9 Aug 2010 07:23:56 pm Dave Angel wrote:

> Big difference between 2.x and 3.x. In 3.x, strings are Unicode, and
> may be stored either in 16bit or 32bit form (Windows usually compiled
> using the former, and Linux the latter).

That's internal storage that you (generic you), the Python programmer, don't see, except perhaps indirectly via memory consumption. Do you know how many bits are used to store floats? If you try:

    >>> import sys
    >>> sys.getsizeof(1.1)
    16

in Python 2.6 or better, it tells you that a float *object* takes 16 bytes, but it doesn't tell you anything about the underlying C-level floating point data. And a float that prints like:

    1.0

takes up exactly the same storage as one that prints like:

    1.234567890123456789

We can do a bit better with unicode strings:

    >>> sys.getsizeof(u'a') - sys.getsizeof(u'')
    2

but frankly, who cares? It doesn't *mean* anything. Whether a character takes up two bytes, or twenty-two bytes, is irrelevant to how it prints.

> Presumably in 3.x, urandom returns a byte string (see the b'xxxx'
> form), which is 8 bits each, same as 2.x strings. So you'd expect
> only two hex digits for each character.

It looks like you've missed the point that bytes don't always display as two hex digits. Using Python 3.1, we can see some bytes display in hex-escape form, e.g.:

    >>> bytes([0, 1, 20, 30, 200])
    b'\x00\x01\x14\x1e\xc8'

some will display in character-escape form:

    >>> bytes([9, 10, 13])
    b'\t\n\r'

and some will display as unescaped ASCII characters:

    >>> bytes([40, 41, 80, 90, 110])
    b'()PZn'

So you can't make any definitive statement that the output of urandom will be displayed in hex form. Because the output is random, you might, by some incredible fluke, get:

    >>> import os
    >>> os.urandom(6)
    b'hello '
    >>> os.urandom(6)
    b'world!'

I wouldn't like to bet on it though. By my calculation, the odds of that exact output (twelve specific bytes in all) are 1 in 79228162514264337593543950336, which is 256**12.

The odds of getting nothing but hex-escaped characters are a bit better. By my estimate, the odds of getting 12 hex-escaped characters in a row are about 1 in 330. For six in a row, it's about 1 in 18 or so.

By the way, an interesting aside... bytes aren't always 8 bits.
Of course, on just about all machines that have Python on them, they will be, but there are still machines and devices such as signal processors where bytes are something other than 8 bits. Historically, common values included 5, 6, 7, 9, or 16 bits, and the C and C++ standards still define a constant CHAR_BIT to specify the number of bits in a byte.

-- 
Steven D'Aprano
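
P.S. For anyone who would rather check the odds quoted above than take them on trust, here is a rough sketch for Python 3. The helper name display_form is made up for illustration, and the sketch assumes that the repr() of a single byte is a fair guide to how that byte shows up inside a longer bytes object (true for the hex escapes, although quote characters can be treated differently in longer objects).

```python
from collections import Counter


def display_form(value):
    """Classify how bytes([value]) is displayed: 'hex', 'escape' or 'plain'.

    Hypothetical helper, not part of any library.
    """
    shown = repr(bytes([value]))[2:-1]  # drop the leading b' and the closing quote
    if shown.startswith('\\x'):
        return 'hex'       # e.g. \x00, \x1e, \xc8
    if shown.startswith('\\'):
        return 'escape'    # e.g. \t, \n, \r and the backslash itself
    return 'plain'         # unescaped ASCII such as ( ) P Z n


# Tally all 256 possible byte values by display form.
counts = Counter(display_form(v) for v in range(256))

# Probability that one uniformly random byte is shown as a hex escape.
p_hex = counts['hex'] / 256

odds_exact = 256 ** 12          # one specific 12-byte result
odds_hex_12 = 1 / p_hex ** 12   # twelve hex-escaped bytes in a row
odds_hex_6 = 1 / p_hex ** 6     # six hex-escaped bytes in a row

print(counts)
print(odds_exact)
print(round(odds_hex_12), round(odds_hex_6))
```

On CPython 3 this counts 158 hex-escaped values, 4 character escapes (\t, \n, \r and the backslash) and 94 plain ASCII characters, which works out to roughly 1 in 327 for twelve hex escapes in a row and 1 in 18 for six.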
