On 2017-11-01, Ned Batchelder <n...@nedbatchelder.com> wrote:
> On 11/1/17 4:17 PM, MRAB wrote:
>> On 2017-11-01 19:26, Ned Batchelder wrote:
>>> From David Beazley
>>> (https://twitter.com/dabeaz/status/925787482515533830):
>>>
>>>      >>> a = 'n'
>>>      >>> b = 'ñ'
>>>      >>> sys.getsizeof(a)
>>>      50
>>>      >>> sys.getsizeof(b)
>>>      74
>>>      >>> float(b)
>>>      Traceback (most recent call last):
>>>        File "<stdin>", line 1, in <module>
>>>      ValueError: could not convert string to float: 'ñ'
>>>      >>> sys.getsizeof(b)
>>>      77
>>>
>>> Huh?
>>>
>> It's all explained in PEP 393.
>>
>> It's creating an additional representation (UTF-8 + zero-byte
>> terminator) of the value and is caching that, so there'll then be the
>> bytes for 'ñ' and the bytes for the UTF-8 (0xC3 0xB1 0x00).
>>
>> When the string is ASCII, the bytes of the UTF-8 representation are
>> identical to those of the original string, so it can share them.
>
> That explains why b is larger than a to begin with

No, that size difference is due to the additional bytes required for
the internal representation of the string.

> but it doesn't explain why float(b) is changing the size of b.

The additional UTF-8 representation isn't being created and cached
until the float() call is made.

-- 
Grant Edwards               grant.b.edwards at gmail.com
Yow! ONE LIFE TO LIVE for ALL MY CHILDREN in ANOTHER WORLD all THE DAYS OF OUR LIVES.
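
For anyone who wants to check the arithmetic, here is a minimal sketch. The helper names utf8_cache_cost and size_delta_after are made up for illustration and are not part of any library. The first computes how many bytes a cached UTF-8 copy would need (encoded bytes plus the zero terminator). The second measures whether sys.getsizeof() changes across a call. Whether float() actually triggers the caching is an implementation detail of the interpreter, so the measured delta may well differ (or be zero) on other CPython versions.

```python
import sys


def utf8_cache_cost(s):
    """Bytes a cached UTF-8 copy of s would occupy: encoded bytes plus one NUL terminator."""
    return len(s.encode("utf-8")) + 1


def size_delta_after(s, action):
    """Return how much sys.getsizeof(s) changes across a call to action(s)."""
    before = sys.getsizeof(s)
    try:
        action(s)
    except ValueError:
        pass  # float('ñ') is expected to fail; only the side effect matters
    return sys.getsizeof(s) - before


b = "\u00f1"  # 'ñ'
print(utf8_cache_cost(b))          # 3, i.e. 0xC3 0xB1 0x00, matching 77 - 74
print(size_delta_after(b, float))  # interpreter-dependent; 3 in the session quoted above
```

For an ASCII string such as 'n' nothing extra should be needed, since the UTF-8 bytes are the same as the stored bytes and can be shared.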