Robert Sjoblom wrote:

> Okay, so here's a fun one. Since I'm on a japanese locale my native
> encoding is cp932. I was thinking of writing a parser for a bunch of
> text files, but I stumbled on even printing the contents due to ...
> something. I don't know what encoding the text file uses, which isn't
> helping my case either (I have asked, but I've yet to get an answer).
>
> Okay, so:
>
> address = "C:/Path/to/file/file.ext"
> with open(address, encoding="cp1252") as alpha:

Superfluous readlines() alert:

>     text = alpha.readlines()
>     for line in text:
>         print(line)

You can iterate over the file directly with

    # python3
    for line in alpha:
        print(line, end="")  # each line already ends with a newline

or even

    import sys
    sys.stdout.writelines(alpha)

> It starts to print until it hits the wonderful character é or '\xe9',
> where it gives me this happy traceback:
> Traceback (most recent call last):
>   File "C:\Users\Azaz\Desktop\CK2 Map Painter\Parser\test parser.py",
> line 8, in <module>
>     print(line)
> UnicodeEncodeError: 'cp932' codec can't encode character '\xe9' in
> position 13: illegal multibyte sequence
>
> I can open the document and view it in UltraEdit -- and it displays
> correct characters there -- but UE can't give me what encoding it
> uses. Any chance of solving this without having to switch from my
> japanese locale? Also, the cp1252 is just an educated guess, but it
> doesn't really matter because it always comes back to the cp932 error.

Reading the file works; the error is raised when print() encodes the
text for your cp932 console. You can wrap stdout in a writer that
replaces characters it cannot encode:

    # python3
    import codecs
    import sys

    output_encoding = sys.stdout.encoding or "UTF-8"
    error_handling = "replace"
    # Wrap the binary stdout so that unencodable characters are
    # replaced instead of raising UnicodeEncodeError.
    Writer = codecs.getwriter(output_encoding)
    outstream = Writer(sys.stdout.buffer, error_handling)

    with open(filename, "r", encoding="cp1252") as instream:
        for line in instream:
            print(line, end="", file=outstream)

error_handling = "replace" prints "?" for characters that cannot be
displayed in the target encoding.
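
To see what the different error handlers do before wiring one into your script, here is a minimal sketch. It assumes "cp932" as the target encoding (taken from the traceback); SAMPLE, encode_with() and can_encode() are made-up names for illustration, not part of any library.

```python
# Illustrative only: how the standard error handlers treat a character
# that the target codec cannot represent.
SAMPLE = "caf\xe9"  # "café"; '\xe9' has no cp932 representation


def encode_with(handler, text=SAMPLE, encoding="cp932"):
    """Encode text with the given error handler (hypothetical helper)."""
    return text.encode(encoding, errors=handler)


def can_encode(text, encoding="cp932"):
    """Return True if text encodes cleanly under the default strict handler."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print("strict ok?", can_encode(SAMPLE))
    for handler in ("replace", "backslashreplace", "xmlcharrefreplace"):
        print(handler, encode_with(handler))
```

"replace" loses the character, while "backslashreplace" keeps it recognisable as an escape, which may be more useful while you are still guessing the input encoding. On Python 3.7 or later, sys.stdout.reconfigure(errors="replace") should give a similar result without the codecs wrapper.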