Adam Funk wrote:

> On 2015-12-03, Adam Funk wrote:
>
>> I'm having trouble with some input files that are almost all proper
>> UTF-8 but with a couple of troublesome characters mixed in, which I'd
>> like to ignore instead of throwing ValueError.  I've found the
>> openhook for the encoding
>>
>> for line in fileinput.input(options.files,
>>                             openhook=fileinput.hook_encoded("utf-8")):
>>     do_stuff(line)
>>
>> which the documentation describes as "a hook which opens each file
>> with codecs.open(), using the given encoding to read the file", but
>> I'd like codecs.open() to also have the errors='ignore' or
>> errors='replace' effect.  Is it possible to do this?
>
> I forgot to mention: this is for Python 2.7.3 & 2.7.10 (on different
> machines).
Have a look at the source of fileinput.hook_encoded:

    def hook_encoded(encoding):
        import io
        def openhook(filename, mode):
            mode = mode.replace('U', '').replace('b', '') or 'r'
            return io.open(filename, mode, encoding=encoding, newline='')
        return openhook

You can use it as a template to write your own factory function:

    def my_hook_encoded(encoding, errors=None):
        import io
        def openhook(filename, mode):
            # Drop the 'U' and 'b' flags so io.open() gets a text mode.
            mode = mode.replace('U', '').replace('b', '') or 'r'
            return io.open(
                filename, mode, encoding=encoding, newline='',
                errors=errors)
        return openhook

    for line in fileinput.input(
            options.files,
            openhook=my_hook_encoded("utf-8", errors="ignore")):
        do_stuff(line)

Another option is to create the function on the fly (this needs
"import functools" and "import io" at the top of the script):

    for line in fileinput.input(
            options.files,
            openhook=functools.partial(
                io.open, encoding="utf-8", errors="replace")):
        do_stuff(line)

Note that this variant passes the mode through unchanged, so it skips
the mode clean-up that hook_encoded does.

(codecs.open() instead of io.open() should also work.)
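To see what the two error policies actually do, here is a small self-contained sketch. It is only an illustration, not part of the original suggestion: the temporary file, its contents, and the read_lines() helper are made up for the demo, and it uses fileinput.FileInput directly (rather than fileinput.input) so that two reads in a row do not trip over the module's global state.

```python
import fileinput
import io
import os
import tempfile


def my_hook_encoded(encoding, errors=None):
    """Like fileinput.hook_encoded, but forwards an `errors` policy."""
    def openhook(filename, mode):
        # Drop the 'U' and 'b' flags so io.open() gets a plain text mode.
        mode = mode.replace('U', '').replace('b', '') or 'r'
        return io.open(filename, mode, encoding=encoding,
                       newline='', errors=errors)
    return openhook


def read_lines(path, errors):
    """Hypothetical helper: read `path` via fileinput with the given policy."""
    reader = fileinput.FileInput(
        files=[path], openhook=my_hook_encoded("utf-8", errors=errors))
    try:
        return list(reader)
    finally:
        reader.close()


# Made-up sample data: valid UTF-8 apart from one stray 0xff byte.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"good line\nbad \xff byte\n")

try:
    ignored = read_lines(sample_path, "ignore")    # bad byte silently dropped
    replaced = read_lines(sample_path, "replace")  # bad byte becomes U+FFFD
finally:
    os.remove(sample_path)
```

With errors="ignore" the offending byte simply disappears from the line; with errors="replace" it turns into the U+FFFD replacement character, which makes it easier to spot afterwards where the damage was. On Python 3.6 and later, fileinput.hook_encoded() accepts an errors argument itself, so the custom factory is only needed on older versions such as the 2.7 releases mentioned above.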