Hi folks,
I was looking at the Lucene FAQ and I found this very interesting.
How can I index OpenOffice.org files?
These files (.sxw, .sxc, etc) are ZIP archives that contain XML files.
Uncompress the file using Java's ZIP support, then parse meta.xml to get
title etc. and content.xml to get the document's content.
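The FAQ's recipe can be sketched roughly as below. This is only an illustration, not code from Lucene: the class name OpenOfficeTextExtractor and its methods are hypothetical, and the element names dc:title and text:p are assumed from the OpenOffice.org 1.x file format. It uses only the JDK (java.util.zip and the built-in DOM parser) and returns plain strings that you would then put into Lucene fields yourself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: not part of Lucene or OpenOffice.org.
public class OpenOfficeTextExtractor {

    /** Returns {title, body}; either is empty if its entry is missing. */
    public static String[] extract(InputStream in) throws Exception {
        String title = "";
        String body = "";
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals("meta.xml")) {
                    // Assumption: the title lives in a <dc:title> element.
                    title = joinText(parse(readEntry(zip)), "dc:title");
                } else if (name.equals("content.xml")) {
                    // Assumption: body text lives in <text:p> paragraphs.
                    // Headings (<text:h>) are ignored in this sketch.
                    body = joinText(parse(readEntry(zip)), "text:p");
                }
            }
        }
        return new String[] { title, body };
    }

    // Buffer the current ZIP entry so the XML parser cannot close the archive stream.
    private static byte[] readEntry(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    private static Document parse(byte[] xml) throws Exception {
        // Default factory is not namespace-aware, so tag names keep their prefix.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The XML files may reference a DTD that is not in the archive;
        // resolve every external entity to an empty document instead of failing.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new ByteArrayInputStream(xml));
    }

    // Concatenate the text of all elements with the given tag name, one per line.
    private static String joinText(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(nodes.item(i).getTextContent());
        }
        return sb.toString();
    }
}
```

You would call it as OpenOfficeTextExtractor.extract(new FileInputStream("report.sxw")) and add the two strings to your Lucene document as, say, a title field and a contents field. Paragraphs nested inside other paragraphs (footnotes, for example) may show up twice with this simple approach.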
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/ant/src/java/org/apache/lucene/ant/
Reporter: DURGA DEEP
Priority: Blocker
We are writing an e-mail parser and are impeded by this error.
HtmlDocument hd = new HtmlDocument (p.getInputStream