This is a repeat of a question that came up on the "Decimal(unicode)" thread a little while ago. I think it needs an answer, so I'm reposting it in its own thread. I couldn't find any other previous discussion of this; apologies if I'm rehashing old issues.
Currently, int() and float() accept bytes instances. For example:

    >>> int(bytes([49, 50, 51]))
    123
    [40381 refs]
    >>> int(b'123')
    123
    [40381 refs]

Philosophically, this seems wrong: it's not clear why bytes([49, 50, 51]) should represent an integer, or even which integer it should represent; if it's intended that the bytes sequence be thought of as an ascii string then really it should be explicitly decoded as such first:

    >>> int(b'123'.decode('ascii'))
    123

On the other hand, there's at least some sense in which bytes already acts as a sort of poor-man's string: witness bytes.lower and friends. Maybe practicality beats purity here?

What do people think about changing the int() and float() constructors so that they don't accept bytes?

I experimented with removing int(bytes) and int(bytearray) support in longobject.c's long_new and in PyNumber_Long in abstract.c, to see how much breakage occurred. The results:

    11 tests failed:
        test_email test_httplib test_io test_mimetools test_pickle
        test_pickletools test_random test_smtplib test_sqlite
        test_tarfile test_uu

(random.py needed some patching to get the test-suite to run in the first place.)

None of the breakage looks particularly serious or difficult to fix. I haven't tried removing float(bytes) support yet.

See also http://bugs.python.org/issue2483

Mark
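To make the "which integer should it represent" point concrete, here is a minimal sketch of two defensible readings of the same three bytes. It assumes a later Python than the one discussed in this thread: int.from_bytes was only added in Python 3.2, so the second reading is an illustration, not something available when this was posted. The variable names are made up for the example.

```python
data = bytes([49, 50, 51])  # the same three bytes as b'123'

# Reading 1: the bytes are ASCII text, so decode explicitly and parse the text.
as_text = int(data.decode('ascii'))

# Reading 2: the bytes are a big-endian binary integer.
# (Assumption: Python 3.2 or later, where int.from_bytes exists.)
as_binary = int.from_bytes(data, 'big')

print(as_text)    # 123
print(as_binary)  # 3224115, i.e. 49*256**2 + 50*256 + 51
```

Both results are reasonable answers for the same input, which is the ambiguity that an implicit int(bytes) hides.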
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000