Is there any way to catch the IllegalPropertySetDataException? It seems to me that the documents must have an RTF header while the text encoding is the same as in a Word document (I don't know if this makes sense), because with the RTF handler it reads the documents but doesn't index the contents :s
Thank you,
Chris

Rainer Schwarze wrote:
>
> chris.b wrote:
>> here's a sample file that i wasn't able to index
>> http://www.nabble.com/file/p13972759/monte.doc monte.doc
>> thanks for the help :)
>
> As a last thing today I took a quick look at the file. A quick solution
> might be to skip the readProperties() call in the HWPFDocument
> constructor (don't know right now, whether the properties are really
> needed if you only read the Word file):
>
> public HWPFDocument(POIFSFileSystem pfilesystem) throws IOException
> {
>     // Sort out the hpsf properties
>     filesystem = pfilesystem;
>     readProperties(); // <---- remove that one
>     ...
>
> Depending on how much work you intend to do, you could either comment
> the line out and rebuild the library or subclass HWPFDocument and
> override readProperties() with an empty method (what I would recommend
> to try first). For the second case, you should get along by changing the
> WordExtractor constructor call in the code which you posted to:
>
> WordExtractor docextractor = new WordExtractor(new MyHWPFDoc(docfin));
>
> (MyHWPFDoc being a subclass of HWPFDocument with the empty
> readProperties() )
>
> Best wishes, Rainer
> --

--
View this message in context: http://www.nabble.com/Problem-with-word-documents-tf4877644.html#a14022545
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
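For reference, here is a rough sketch of the two ideas in this thread: catching the exception, and overriding the hook that the constructor calls with an empty method. It deliberately uses stand-in classes so that it runs without POI on the classpath. BaseDocument, LenientDocument, PropertyDataException and ExtractDemo are hypothetical names, not POI classes; BaseDocument plays the role of HWPFDocument and LenientDocument the role of the MyHWPFDoc subclass mentioned above. It assumes that the real exception is unchecked (so a plain try/catch works) and that readProperties() is visible to subclasses, which is worth checking against the POI version in use.

```java
// Stand-in for an unchecked exception thrown while parsing document properties.
class PropertyDataException extends RuntimeException {
    PropertyDataException(String message) {
        super(message);
    }
}

// Stand-in for a document class whose constructor calls an overridable hook.
class BaseDocument {
    private final String text;
    private final boolean corruptProperties;

    BaseDocument(String text, boolean corruptProperties) {
        this.text = text;
        this.corruptProperties = corruptProperties;
        readProperties(); // hook called from the constructor, as in the quoted snippet
    }

    // Fails when the (simulated) property data is unusable.
    protected void readProperties() {
        if (corruptProperties) {
            throw new PropertyDataException("property set is corrupt");
        }
    }

    String getText() {
        return text;
    }
}

// Subclass that skips property parsing entirely by overriding the hook.
class LenientDocument extends BaseDocument {
    LenientDocument(String text, boolean corruptProperties) {
        super(text, corruptProperties);
    }

    @Override
    protected void readProperties() {
        // Intentionally empty: the properties are not needed for text extraction.
    }
}

class ExtractDemo {
    // Try the normal class first and fall back to the lenient subclass on failure.
    static String extract(String text, boolean corruptProperties) {
        try {
            return new BaseDocument(text, corruptProperties).getText();
        } catch (PropertyDataException e) {
            return new LenientDocument(text, corruptProperties).getText();
        }
    }

    public static void main(String[] args) {
        System.out.println(extract("text of the document", true));
    }
}
```

With real POI classes the same shape would apply: wrap the document constructor in a try/catch, or pass the subclass with the empty readProperties() to WordExtractor as Rainer describes. One caveat of the override approach is that the hook runs before the subclass's own fields are initialised, so the empty method should not rely on them.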
