On Sat, 12 Dec 2015 21:35:36 +0100
Peter Otten <__pete...@web.de> wrote:
> def read_file(filename):
>     for encoding in ["utf-8", "iso-8859-1"]:
>         try:
>             with open(filename, encoding=encoding) as f:
>                 return f.read()
>         except UnicodeDecodeError:
>             pass
>     raise AssertionError("unreachable")
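For reference, here is a minimal, self-contained way to exercise the quoted function. It is only a sketch: the write_temp helper and the two sample files are assumptions made up for illustration and are not part of the original suggestion. It writes the word "sauté" once as UTF-8 and once as Latin-1, then reads both back.

```python
import os
import tempfile


def read_file(filename):
    # Try UTF-8 first. ISO-8859-1 maps every byte to a character,
    # so the second attempt cannot raise UnicodeDecodeError.
    for encoding in ["utf-8", "iso-8859-1"]:
        try:
            with open(filename, encoding=encoding) as f:
                return f.read()
        except UnicodeDecodeError:
            pass
    raise AssertionError("unreachable")


def write_temp(data):
    # Hypothetical helper: store raw bytes in a temporary file, return its path.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


utf8_path = write_temp(bytes.fromhex("73617574c3a9"))  # "sauté" encoded as UTF-8
latin1_path = write_temp(bytes.fromhex("73617574e9"))  # "sauté" encoded as Latin-1

utf8_text = read_file(utf8_path)
latin1_text = read_file(latin1_path)

os.remove(utf8_path)
os.remove(latin1_path)
```

Both calls should return the same five-character str, which is the normalization the function is meant to provide. Note that the order of the encodings matters: trying ISO-8859-1 first would accept UTF-8 files too and silently produce mojibake.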
I replaced this in my test and it works. However, I still have a problem with my actual code. The point of this code is that I expect all the files I am reading to be ASCII, UTF-8 or Latin-1, and I want to normalize my input. My problem may actually be elsewhere.

My application is a web page of my wife's recipes. She has hundreds of files with one recipe in each. Often she simply typed them in, but sometimes she cut and pasted from another source and got non-ASCII characters. So far they seem to fit into the three categories above.

I added test prints to sys.stderr so that I can see what is happening. In one particular case I have the bytes "73 61 75 74 c3 a9" in the file. When I open the file with

    open(filename, "r", encoding="utf-8").read()

I get what appears to be a Latin-1 string. I print it to stderr and view it in the web log, where the above string prints as "saut\xe9". The trailing "\xe9" is four actual characters in the file. When I try to print it to the web page it fails because the \xe9 character is not valid ASCII. However, my default encoding is utf-8, and other web pages on the same server display fine. By the way, I have the following in the Apache config:

    SetEnv PYTHONIOENCODING utf8

So my file is utf-8, I am reading it as utf-8, and my Apache server output is set to utf-8. How is ASCII sneaking in?

--
D'Arcy J.M. Cain
Vybe Networks Inc.
http://www.VybeNetworks.com/
IM:da...@vex.net VoIP: sip:da...@vybenetworks.com
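The symptoms described above can be reproduced in isolation. The following is a minimal sketch, not a diagnosis of the server setup: it assumes only the six bytes quoted in the message, and the variable names are made up for illustration. It shows that decoding those bytes as UTF-8 yields a proper str containing U+00E9, and that both symptoms (the "saut\xe9" in the log and the failure on output) are what Python 3 produces when a stream ends up encoding as ASCII. sys.stderr uses the "backslashreplace" error handler by default, while a strict ASCII encode raises.

```python
# The bytes quoted in the message: "saut" followed by the UTF-8 encoding of é.
raw = bytes.fromhex("73 61 75 74 c3 a9")

# Decoding as UTF-8 gives a five-character str ending in U+00E9.
text = raw.decode("utf-8")

# What an ASCII stream with the "backslashreplace" handler (the default
# handler for sys.stderr) would write: the é becomes the four characters \xe9.
escaped = text.encode("ascii", "backslashreplace").decode("ascii")

# What a strict ASCII stream does with the same str: it refuses.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding back to UTF-8 round-trips to the original bytes.
utf8_out = text.encode("utf-8")

print(repr(text), escaped, ascii_ok, utf8_out == raw)
```

If this matches what the log shows, the read step is probably fine and the question becomes which encoding the output stream is really using. Checking sys.stdout.encoding and os.environ.get("PYTHONIOENCODING") from inside the running application would show whether the Apache setting is actually reaching the Python process.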