Hello,

I'm using Nutch to crawl some MS Word documents. I want to add custom
properties to the Word files so that Nutch extracts and indexes those
properties when it crawls.

Nutch doesn't do that, so I concluded that I'll have to change the msword
parser. From searching the web, the best approach seems to be POI's
CustomProperties and DocumentSummaryInformation classes. I started from
this example:

http://www.docjar.com/html/api/org/apache/poi/hpsf/examples/ModifyDocumentSummaryInformation.java.html

I kept the parts that I needed to read the custom properties, but I get
compile errors on these two lines:

    dsi = PropertySetFactory.newDocumentSummaryInformation();
    si = PropertySetFactory.newSummaryInformation();

The errors (both reported in MSExtractor.java) say:

    The method newDocumentSummaryInformation() is undefined for the type
    PropertySetFactory

    The method newSummaryInformation() is undefined for the type
    PropertySetFactory

When I check the API page, those methods are defined. Why am I getting
these errors?
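
One thing I can check myself is whether the POI jar on my build path
actually matches the API page. That there is a mismatch is only my guess,
not something I have confirmed. A minimal sketch using plain Java
reflection follows; the ApiCheck class and its helper names are made up for
illustration, and the default class name assumes POI is on the classpath
when it is run.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of POI or Nutch.
class ApiCheck {

    /** Returns true if the type exposes a public static method with this name. */
    static boolean hasStaticMethod(Class<?> type, String name) {
        for (Method m : type.getMethods()) {
            if (m.getName().equals(name) && Modifier.isStatic(m.getModifiers())) {
                return true;
            }
        }
        return false;
    }

    /** Returns the jar or directory the type was loaded from, or "unknown". */
    static String loadedFrom(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "unknown"; // typical for classes from the JDK itself
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Assumption: POI is on the classpath; pass another class name to override.
        String className = args.length > 0
                ? args[0]
                : "org.apache.poi.hpsf.PropertySetFactory";
        Class<?> type = Class.forName(className);

        System.out.println(className + " loaded from: " + loadedFrom(type));
        String[] wanted = {"newDocumentSummaryInformation", "newSummaryInformation"};
        for (String name : wanted) {
            System.out.println(name + " present: " + hasStaticMethod(type, name));
        }
    }
}
```

If the reported jar is not the one I expected, or either method shows as
not present, that would point to an older POI version being picked up
rather than a mistake in my code.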

Also, this is the only approach I've found so far. If there is a better
way, please suggest it. This is only the first step of the process; I still
need to figure out how to integrate this with Nutch's msword parser.

Cheers

-- 
View this message in context: 
http://www.nabble.com/Extracting-the-custom-properties-from-msword-files.-tp21754082p21754082.html
Sent from the POI - User mailing list archive at Nabble.com.

