Steven D'Aprano wrote:

> On Fri, Apr 24, 2015 at 04:34:19PM -0700, Jim Mooney wrote:
>
>> I was looking things up and although there are aliases for utf_8 (utf8
>> and utf-8) I see no aliases for utf_8_sig, so I'm surprised the utf-8-sig
>> I tried using, worked at all. Actually, I was trying to find the file
>> where the aliases are so I could change it and have utf_8_sig called up
>> when I used utf8, but it appears to be hard-coded.
>
> I believe that Python's codecs system automatically normalises the
> encoding name by removing spaces, dashes and underscores, but I'm afraid
> that either I don't understand how it works or it is buggy:
>
> py> 'Hello'.encode('utf___---___ -- ___8')  # Works.
> b'Hello'
>
> py> 'Hello'.encode('ut-f8')  # Fails.
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> LookupError: unknown encoding: ut-f8

I don't think this is a bug. Normalization of the name converts to lowercase
and collapses arbitrary sequences of punctuation into a single "_". The
lookup that follows maps "utf8" to "utf_8" via a table:

>>> import encodings.aliases
>>> [n for n, v in encodings.aliases.aliases.items() if v == "utf_8"]
['utf8_ucs2', 'utf8', 'u8', 'utf', 'utf8_ucs4']

"ut-f8", on the other hand, normalizes to "ut_f8", which is neither in that
table nor the name of a codec module, hence the LookupError.

Hm, who the heck uses "u8"? I'd rather go with

>>> encodings.aliases.aliases["steven_s_preferred_encoding"] = "utf_8"
>>> "Hello".encode("--- Steven's preferred encoding ---")
b'Hello'

;)
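
For anyone who wants to watch the two steps separately, here is a minimal sketch. It assumes CPython 3 and leans on encodings.normalize_encoding(), which is a helper inside the encodings package rather than a documented API, so treat it as illustration only. The resolve() function and the "my_preferred_encoding" alias are made up for this example.

```python
import codecs
import encodings
import encodings.aliases


def resolve(name):
    """Return (normalized name, canonical codec name or None) for an encoding name."""
    # normalize_encoding() collapses runs of punctuation into "_" but keeps case;
    # the .lower() mirrors the lowercasing that the codec lookup applies itself.
    normalized = encodings.normalize_encoding(name).lower()
    try:
        canonical = codecs.lookup(name).name
    except LookupError:
        canonical = None
    return normalized, canonical


for name in ("utf___---___ -- ___8", "utf-8-sig", "utf8", "ut-f8"):
    print(repr(name), "->", resolve(name))

# Hypothetical alias, added for illustration only.
encodings.aliases.aliases["my_preferred_encoding"] = "utf_8"
print(repr("--- My preferred encoding ---"), "->",
      resolve("--- My preferred encoding ---"))
```

This would also explain why "utf-8-sig" works without an alias: it normalizes to "utf_8_sig", which is already the name of a codec module, so the table is not needed. One caveat I have not checked across versions: the alias should be added before the first lookup of that name, because failed lookups appear to be cached as well.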