Kushal Kumaran wrote:
On Wed, May 13, 2009 at 4:28 PM, Shailja Gulati <shailja.gul...@tcs.com> wrote:
Hi ,

I am currently working on "Information retrieval from semi structured
Documents" in which there is a need to read data from Resumes.

Could anyone tell me whether there is any Python API to read Word docs?

If you're using Windows, you can use COM APIs to read Word documents.
Or you can use OpenOffice.org using uno.  You can find examples of
either by googling.
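
As a rough illustration of the COM route, here is a minimal sketch. It assumes Windows, an installed copy of Word, and the pywin32 package; the function name read_word_text is made up for this example and is not from the thread.

```python
import os


def read_word_text(path):
    """Return the plain text of a Word document via COM (Windows only)."""
    # Imported here so the module still loads on machines without pywin32.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Word wants an absolute path.
        # Positional arguments: FileName, ConfirmConversions=False, ReadOnly=True.
        doc = word.Documents.Open(os.path.abspath(path), False, True)
        try:
            return doc.Content.Text
        finally:
            doc.Close(False)  # close without saving changes
    finally:
        word.Quit()
```

Calling read_word_text("resume.doc") should hand back the document body as one string. Tables and headers may need extra handling that this sketch does not attempt.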

One problem I keep hitting with OOo, UNO, and Python: when asked to output a .txt file, the result comes out more or less PK-zipped. The same goes for the .csv files it outputs. If you can, I suggest you work with Microsoft's COM. I have had better luck there. Not much, but better: I usually get a real .txt.
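
A quick way to confirm the "pk-zipped" symptom is to ask Python whether the output file is really a ZIP container. This is only a sketch, and the helper name looks_zipped is hypothetical. One plausible cause, which is an assumption and not something confirmed in this thread, is that the document was stored without an explicit export filter, so OOo wrote its native zipped format under the .txt name.

```python
import zipfile


def looks_zipped(path):
    """Return True if the file at `path` is really a ZIP container."""
    # is_zipfile inspects the file contents, not the file extension.
    return zipfile.is_zipfile(path)
```

If looks_zipped("out.txt") comes back True, the file is not plain text, whatever its name says.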

For what it is worth, in OOo I did make some progress by creating a macro that writes out the text, setting it to run on EVERY file OOo opens, and then closing OOo after the write. Then I batched the OOo file.doc processing with a script:

files2process.sh            #files2process.bat  in window$
swriter file1.doc
swriter file2.doc

not very elegant, but it worked for me.
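
For comparison, the same batching idea can be sketched in Python. This is a different technique from the macro approach above: it assumes a LibreOffice-style soffice binary on the PATH that accepts --headless and --convert-to, which the OOo version discussed here may not support. The function names build_command and convert_all are made up for the example.

```python
import subprocess
from pathlib import Path


def build_command(doc_path, out_dir, soffice="soffice"):
    """Build the command line that converts one document to plain text."""
    return [
        soffice,
        "--headless",
        "--convert-to", "txt:Text",
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def convert_all(src_dir, out_dir, soffice="soffice"):
    """Convert every .doc file in src_dir, one soffice call per file."""
    for doc in sorted(Path(src_dir).glob("*.doc")):
        # check=True raises CalledProcessError if soffice exits with an error.
        subprocess.run(build_command(doc, out_dir, soffice), check=True)
```

Running convert_all("resumes", "text_out") would then process the whole folder in one go, with no macro needed.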

To be honest, these days I just give those to a clerk and let them point and click until done. Less frustrating. The documentation is bad for both.

