On Thu, Jan 09, 2014 at 05:11:06PM +1000, Nick Coghlan wrote:
> On 9 January 2014 10:07, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
> > So, if what you want is to parse text and not get gibberish, you need to
> > *tell* Python what the encoding is. That's a brute fact of the world of
> > text in computing.
>
> Set the mode to "rb", process it as binary. Done.

A nice point, but really, you lose a lot by doing so, even simple things
like the ability to write:

    if word[0] == 'X'

Instead, you have to write things like:

    if word[0:1] == b'X'
    if chr(word[0]) == 'X'
    if word[0] == ord('X')
    if word[0] == 0x58

(pick the one that annoys you the least). And while bytes objects do have
a surprising (to me) number of string-ish methods, like upper(), a few
are missing, like format() and isnumeric().

So it's not quite as straightforward as "done". If it were, we wouldn't
need text strings :-)

--
Steven
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
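The difference Steven describes can be checked in a few lines of Python 3. This is only a sketch: the sample word and the `starts_with_x` helper are hypothetical, and reading a file in "rb" mode is simulated with `str.encode`.

```python
# Indexing a str gives a 1-character str; indexing bytes gives an int.
text = "Xylophone"
data = text.encode("ascii")  # stand-in for data read from a file opened with "rb"

print(repr(text[0]))   # 'X'
print(data[0])         # 88
print(data[0] == "X")  # False: an int never compares equal to a str


def starts_with_x(word: bytes) -> list:
    """Return the result of each bytes-friendly workaround from the message."""
    return [
        word[0:1] == b"X",    # slicing keeps a bytes object
        chr(word[0]) == "X",  # convert the int to a str
        word[0] == ord("X"),  # convert the str to an int
        word[0] == 0x58,      # compare against the code point directly
    ]


print(starts_with_x(data))  # [True, True, True, True]

# bytes has many str-like methods, but not all of them.
for name in ("upper", "format", "isnumeric"):
    print(name, hasattr(bytes, name))  # upper True, format False, isnumeric False
```

Each workaround gives the right answer; the cost is that none of them reads as naturally as the str version.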