Hello, I'm working with Nutch to crawl some msword documents. What I'm trying to do is to add custom properties to the msword files, so that when Nutch crawls, it extracts those custom properties and indexes them.
Nutch doesn't do that, so I concluded that I'll have to change the msword parser. From searching the web, it looks like the best way to do this is to use the CustomProperties and DocumentSummaryInformation classes. I followed this example:

http://www.docjar.com/html/api/org/apache/poi/hpsf/examples/ModifyDocumentSummaryInformation.java.html

I kept only the parts I need to read the custom properties, but I get compile errors for these two lines:

    dsi = PropertySetFactory.newDocumentSummaryInformation();
    si = PropertySetFactory.newSummaryInformation();

The errors say:

    The method newDocumentSummaryInformation() is undefined for the type PropertySetFactory
    The method newSummaryInformation() is undefined for the type PropertySetFactory

MSExtractor.java

When I check the API page, those methods are defined. Why am I getting these errors?

Also, this is the only way I know of to do this. If there is a better way, please suggest it. This is only the first step of the process; I still need to figure out how to integrate this with Nutch's msword parser.

Cheers

--
View this message in context: http://www.nabble.com/Extracting-the-custom-properties-from-msword-files.-tp21754082p21754082.html
Sent from the POI - User mailing list archive at Nabble.com.
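In case it helps with diagnosing this: below is a minimal sketch of a reflection probe that reports which jar a class was loaded from and whether it exposes a given public static method. It uses only the standard library. The class name ClasspathProbe and its helper methods are hypothetical names made up for this illustration, and the probe only inspects the runtime classpath, which I am assuming matches the build path the compiler uses (that may not be the case in every setup).

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.security.CodeSource;

// Hypothetical helper, not part of POI or Nutch.
class ClasspathProbe {

    /** True if the class, as found on the current classpath, has a public static method with this name. */
    static boolean hasPublicStaticMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName) && Modifier.isStatic(m.getModifiers())) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is missing or could not be linked/initialized.
            return false;
        }
    }

    /** Location (usually a jar URL) the class was loaded from, or a short note if unknown. */
    static String loadedFrom(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            return src == null ? "(no code source reported)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on the classpath)";
        }
    }

    public static void main(String[] args) {
        String cls = "org.apache.poi.hpsf.PropertySetFactory";
        System.out.println("loaded from: " + loadedFrom(cls));
        for (String name : new String[] {"newDocumentSummaryInformation", "newSummaryInformation"}) {
            System.out.println(name + " present: " + hasPublicStaticMethod(cls, name));
        }
    }
}
```

Run with the same classpath as the parser plugin. If the methods are reported as absent, the jar location printed on the first line shows which POI jar is actually being picked up, which can then be compared against the version the API page documents.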
