I need to access the text in hundreds of Microsoft .doc files on
an Ubuntu system. I looked at win32, but only saw support for Windows. I am
going through all of these files to create a fairly simple delimited
text file for a spreadsheet.
The options I can think of:
A) Batch convert to text files so I can access them
B) Import some module that allows me to decode this format
C) Open Office allows batch conversion to .odt, but I still don't know how
to access those
D) Buy a 24 pack, some Twinkies, and go watch David Hasselhoff reruns
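
For option A, here is a rough sketch of what the batch conversion might look like from Python. It assumes the external antiword command-line tool is installed and on the PATH, and that it prints the document text to standard output when given a .doc path. The helper names (run_antiword, convert_all) and the output encoding are my own assumptions for illustration, not anything tested against these files.

```python
import subprocess
from pathlib import Path


def run_antiword(doc_path):
    """Return the text of one .doc file by calling the external antiword tool.

    Assumption: antiword is installed and on PATH; the output encoding may
    differ on your system, so undecodable bytes are replaced, not fatal.
    """
    result = subprocess.run(
        ["antiword", str(doc_path)],
        capture_output=True,
        check=True,  # raise CalledProcessError if antiword fails
    )
    return result.stdout.decode("utf-8", errors="replace")


def convert_all(src_dir, dst_dir, extract=run_antiword):
    """Write one .txt file into dst_dir for every .doc file in src_dir.

    `extract` is any callable taking a path and returning text, so the
    conversion tool can be swapped out.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for doc_path in sorted(src_dir.glob("*.doc")):
        txt_path = dst_dir / (doc_path.stem + ".txt")
        txt_path.write_text(extract(doc_path), encoding="utf-8")
        written.append(txt_path)
    return written
```

Something like convert_all("docs", "txt_out") (both directory names hypothetical) would then leave plain .txt files that open() can read normally. Passing the extractor in as a parameter keeps the loop independent of whichever conversion tool turns out to work.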
Opening .txt documents works fine.
With a .doc file I currently get:
inFile = open("myTestFile.doc", "r")
testRead = inFile.read()
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    test = inFile.read()
  File "/usr/lib/python3.0/io.py", line 1728, in read
    decoder.decode(self.buffer.read(), final=True))
  File "/usr/lib/python3.0/io.py", line 1299, in decode
    output = self.decoder.decode(input, final=final)
  File "/usr/lib/python3.0/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data
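
As far as I can tell, the error comes from the mode rather than the file: a .doc file is binary, and open(..., "r") in Python 3 tries to decode its bytes as text (UTF-8 here). Opening with "rb" returns raw bytes and avoids the decode error, although the bytes still need a real .doc parser or converter to get readable text out. The sketch below only checks whether a file starts with the 8-byte OLE2 compound-file signature that legacy .doc files normally begin with. The function name looks_like_ole2 is made up for illustration.

```python
# Signature at the start of OLE2 compound files (the container legacy .doc uses).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(path):
    """Return True if the file starts with the OLE2 compound-file signature."""
    with open(path, "rb") as in_file:  # "rb": read raw bytes, no text decoding
        header = in_file.read(len(OLE2_MAGIC))
    return header == OLE2_MAGIC
```

A check like this could be used to skip files that have a .doc extension but are really something else, before handing the rest to a converter.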
Any help is greatly appreciated. Thanks bunches.