Re: Parsing MS Word documents

Bill Nalen Tue, 04 Feb 2003 21:42:40 -0800

Bill J. wrote:
>The right way to approach it might be to add a new couple of lines
>to PyPlucker/Parser.py, and add a new file containing the WordParser
>class. Or, you could call a converter to convert it to text and/or HTML,
>and then call the PlainTextParser or the StructuredHTMLParser to grok
>that converted content.

Okay, I just spent a couple of hours getting wvWare to run and making some changes to Parser.py. I think I've got it working. Here's what I did:
1. Set up an elif branch for the application/msword content type.
2. Save the data to a temporary .doc file.
3. Run wvWare on the temporary .doc file, redirecting its output (>) to a temporary HTML file.
4. Read the contents of the temporary HTML file.
5. Create a StructuredHTMLParser with the converted data.
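Roughly, the steps above look like this as a sketch in modern Python (not the actual Parser.py patch). `word_to_html` and `parse_document` are hypothetical names. Because I don't know StructuredHTMLParser's exact constructor, the HTML parser is passed in as a hypothetical `make_html_parser(url, html)` factory. The converter defaults to `wvWare`, which writes HTML to stdout.

```python
import os
import subprocess
import tempfile


def word_to_html(data, converter=("wvWare",)):
    """Convert raw MS Word bytes to HTML with an external converter."""
    doc_fd, doc_path = tempfile.mkstemp(suffix=".doc")
    html_fd, html_path = tempfile.mkstemp(suffix=".html")
    try:
        # Step 2: save the downloaded data to a temporary .doc file.
        with os.fdopen(doc_fd, "wb") as doc_file:
            doc_file.write(data)
        # Step 3: run the converter with stdout redirected to the .html file,
        # the equivalent of "wvWare file.doc > file.html".
        with os.fdopen(html_fd, "wb") as html_file:
            subprocess.run(list(converter) + [doc_path], stdout=html_file, check=True)
        # Step 4: read the converted HTML back in.
        with open(html_path, "rb") as html_file:
            return html_file.read()
    finally:
        # Clean up both temporary files whether or not the conversion worked.
        os.remove(doc_path)
        os.remove(html_path)


def parse_document(url, data, content_type, make_html_parser, converter=("wvWare",)):
    # Step 1: the new branch for application/msword (hypothetical dispatch).
    if content_type == "application/msword":
        html = word_to_html(data, converter)
        # Step 5: hand the converted HTML to the HTML parser.
        return make_html_parser(url, html)
    raise ValueError("unsupported content type: " + content_type)
```

Passing the converter as a command tuple keeps wvWare swappable for another program, which ties in with the config question below.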

It seems to work. I think there also needs to be an addition to the plucker.ini file giving the location of the wvWare (or other converter) program.
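Something like the following could handle that lookup. It is a minimal sketch that assumes an INI-style config file. The option name `word_converter` in the `[DEFAULT]` section is made up for illustration and is not an existing Plucker option. If the option is missing, the code falls back to plain `wvWare` on the PATH.

```python
import configparser


def find_word_converter(config_text, default="wvWare"):
    """Return the Word converter program named in an INI-style config.

    Assumption: "word_converter" is a hypothetical option name.
    """
    config = configparser.ConfigParser()
    config.read_string(config_text)
    # fallback= returns the default when the option is absent.
    return config.get("DEFAULT", "word_converter", fallback=default)
```

For example, a config line `word_converter = /opt/wv/bin/wvWare` would return that path. The returned value could then be used as the converter command for the conversion step.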

Bill
