Bill J. wrote:
>The right way to approach it might be to add a new couple of lines
>to PyPlucker/Parser.py, and add a new file containing the WordParser
>class. Or, you could call a converter to convert it to text and/or HTML,
>and then call the PlainTextParser or the StructuredHTMLParser to grok
>that converted content.
Okay, I just spent a couple of hours getting wvWare to run and making some changes to Parser.py, and I think I've got it working. Here's what I did:
- Set up an else-if branch for the application/msword content type
- Save the data to a temporary .doc file
- Run wvWare on the temporary .doc file, redirecting (>) its output to a temporary HTML file
- Read the contents of the temporary HTML file
- Create a StructuredHTMLParser with the new data
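For anyone who wants to try the same thing, the conversion steps above can be sketched roughly as follows. This is only a minimal sketch in current Python, not the actual change to Parser.py: the function name `word_to_html` and the `converter_cmd` parameter are made up for illustration, the converter is assumed to write HTML to standard output when given a .doc path, and the output is captured directly rather than redirected to a second temporary file.

```python
import os
import subprocess
import tempfile


def word_to_html(data, converter_cmd=("wvWare",)):
    """Convert raw Word bytes to HTML text with an external converter.

    Hypothetical helper: converter_cmd is the command prefix, the path of
    the temporary .doc file is appended to it, and the converter is
    assumed to print HTML on standard output.
    """
    # Save the fetched data to a temporary .doc file.
    fd, doc_path = tempfile.mkstemp(suffix=".doc")
    try:
        with os.fdopen(fd, "wb") as doc_file:
            doc_file.write(data)
        # Run the converter and capture its output, which stands in for
        # redirecting with ">" to a temporary HTML file.
        result = subprocess.run(
            list(converter_cmd) + [doc_path],
            stdout=subprocess.PIPE,
            check=True,
        )
        return result.stdout.decode("utf-8", errors="replace")
    finally:
        # Always clean up the temporary .doc file.
        os.remove(doc_path)
```

The returned string is what would then be handed to the StructuredHTMLParser in the application/msword branch; how that parser is constructed is not shown here.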
That seems to work okay. I think there then needs to be a new entry in the plucker.ini file giving the location of the wvWare (or other converter) program.
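Looking up the converter location might be done along these lines. Again, this is just a sketch under assumptions: the option name `word_converter`, the use of the `[DEFAULT]` section, and the sample path are all invented for illustration and are not existing Plucker settings, and it uses the standard `configparser` module rather than Plucker's own configuration code.

```python
import configparser

# Hypothetical ini contents; the option name and path are illustrative only.
SAMPLE_INI = """\
[DEFAULT]
word_converter = /usr/local/bin/wvWare
"""


def get_word_converter(ini_text, default="wvWare"):
    """Return the configured converter path, or a default found via PATH."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Fall back to the bare program name when the option is not set.
    return parser.get("DEFAULT", "word_converter", fallback=default)


print(get_word_converter(SAMPLE_INI))
```

Falling back to the bare program name lets the conversion still work when the converter is already on the PATH and nothing has been configured.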
Bill
