You had to milk yourself?
Guess that makes you a Farming Ninja.
-----Original Message-----
From: chris.b [mailto:[EMAIL PROTECTED]
Sent: Monday, November 26, 2007 2:19 PM
To: [email protected]
Subject: Problem with word documents
Got the wrong forum the first time round, so here goes...
Okay, so I'm very new to Lucene, so it may be my bad, but I can get it to
index .txt files. When I try to index Word documents (using POI), the
program starts running, and when it reaches a .doc file I get the following
errors:
Exception in thread "main" org.apache.poi.hpsf.IllegalPropertySetDataException: The property set claims to have a size of 16 bytes. However, it exceeds 16 bytes.
    at org.apache.poi.hpsf.Section.<init>(Section.java:255)
    at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:454)
    at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:249)
    at org.apache.poi.hpsf.PropertySetFactory.create(PropertySetFactory.java:61)
    at org.apache.poi.POIDocument.getPropertySet(POIDocument.java:92)
    at org.apache.poi.POIDocument.readProperties(POIDocument.java:69)
    at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:147)
    at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:56)
    at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:48)
    at Indexer.indexFile(Indexer.java:76)
    at Indexer.indexDirectory(Indexer.java:57)
    at Indexer.index(Indexer.java:38)
    at Indexer.main(Indexer.java:20)
And my code is as follows:

    private static void indexFile(IndexWriter writer, File f) throws IOException {
        // Skip anything that cannot be read.
        if (f.isHidden() || !f.exists() || !f.canRead()) {
            return;
        }
        // The message is Portuguese for "Adding <path> to the index."
        System.out.println("A acrescentar " + f.getCanonicalPath() + " ao indice.");
        Document doc = new Document();
        if (f.getName().endsWith(".doc")) {
            // For .doc files: extract the text with POI's WordExtractor.
            FileInputStream docfin = new FileInputStream(f.getAbsolutePath());
            try {
                WordExtractor docextractor = new WordExtractor(docfin);
                String content = docextractor.getText();
                doc.add(new Field("contents", content, Field.Store.NO, Field.Index.TOKENIZED));
            } finally {
                // Release the file handle even if extraction throws.
                docfin.close();
            }
        } else if (f.getName().endsWith(".txt")) {
            // For .txt files: hand Lucene a Reader over the file.
            doc.add(new Field("contents", new FileReader(f)));
        }
        doc.add(new Field("filename", f.getCanonicalPath(), Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
    }
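
Since the exception in the trace is thrown from inside the WordExtractor constructor and is not an IOException, it ends the whole run at the first .doc file that triggers it. A rough sketch of one way to keep going past such a file, by catching failures per file, might look like the following. This is only an illustration under assumptions: TextExtractor and SafeIndexer are hypothetical names made up for the example (not POI or Lucene types), and the fake extractor in main stands in for the real extraction call. It does not address why the property set is rejected in the first place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for whatever turns a file into text;
// not part of POI or Lucene.
interface TextExtractor {
    String extract(String path) throws Exception;
}

class SafeIndexer {
    final List<String> indexed = new ArrayList<String>();
    final List<String> skipped = new ArrayList<String>();

    // Handle each file on its own, so a failure in one does not abort the rest.
    void indexAll(List<String> paths, TextExtractor extractor) {
        for (String path : paths) {
            try {
                String text = extractor.extract(path);
                // A real indexer would add `text` to the index here.
                indexed.add(path);
            } catch (Exception e) {
                // Exception also covers unchecked (runtime) exceptions.
                skipped.add(path + " (" + e.getClass().getSimpleName() + ")");
            }
        }
    }

    public static void main(String[] args) {
        // Fake extractor that fails on one file, to simulate a malformed document.
        TextExtractor fake = new TextExtractor() {
            public String extract(String path) {
                if (path.startsWith("bad")) {
                    throw new IllegalStateException("malformed property set");
                }
                return "some text";
            }
        };
        SafeIndexer indexer = new SafeIndexer();
        indexer.indexAll(Arrays.asList("a.doc", "bad.doc", "c.doc"), fake);
        System.out.println("indexed: " + indexer.indexed);
        System.out.println("skipped: " + indexer.skipped);
    }
}
```

Recording the skipped paths along with the exception type makes it easy to see afterwards which documents need a closer look, instead of losing the whole indexing run to one of them.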
(I think I included everything that's necessary.)
Thanks in advance for any help.
--
View this message in context:
http://www.nabble.com/Problem-with-word-documents-tf4877644.html#a13957674
Sent from the POI - User mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]