Looking to Index Various Document Types.

2008-03-12 Thread DURGA DEEP
Hi folks, I was looking at the Lucene FAQ and found this very interesting: How can I index OpenOffice.org files? These files (.sxw, .sxc, etc.) are ZIP archives that contain XML files. Uncompress the file using Java's ZIP support, then parse meta.xml to get the title, etc., and content.xml to get the
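A minimal sketch of the FAQ's approach, using only the JDK (`java.util.zip` plus a DOM parser). The class `OOoTextExtractor` and its methods are hypothetical names for illustration. The sketch assumes the title is stored in a Dublin Core `dc:title` element in meta.xml. It also simplifies body extraction by taking all character data in content.xml, so adjacent paragraphs may run together. Handing the strings to a Lucene `Document` is left out because that API differs between Lucene versions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: reads the title and plain text from an OpenOffice.org ZIP archive (.sxw, .sxc, ...). */
class OOoTextExtractor {

    // Dublin Core namespace assumed for <dc:title> in meta.xml
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Parses one XML entry of the archive into a namespace-aware DOM tree. */
    static Document parseEntry(ZipFile zip, String name) throws Exception {
        ZipEntry entry = zip.getEntry(name);
        if (entry == null) {
            throw new IOException("Missing entry: " + name);
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // OOo files may declare a DOCTYPE pointing at office.dtd, which is not in the
        // archive; resolve every external entity to empty input instead of failing.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        try (InputStream in = zip.getInputStream(entry)) {
            return builder.parse(in);
        }
    }

    /** Returns the dc:title from meta.xml, or an empty string if none is present. */
    static String title(ZipFile zip) throws Exception {
        NodeList titles = parseEntry(zip, "meta.xml").getElementsByTagNameNS(DC_NS, "title");
        return titles.getLength() > 0 ? titles.item(0).getTextContent().trim() : "";
    }

    /** Returns all character data in content.xml (paragraphs are not separated). */
    static String body(ZipFile zip) throws Exception {
        return parseEntry(zip, "content.xml").getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        try (ZipFile zip = new ZipFile(args[0])) {
            System.out.println("Title: " + title(zip));
            System.out.println("Text:  " + body(zip));
        }
    }
}
```

Run it as `java OOoTextExtractor.java report.sxw` (the file name is only an example). The entity resolver matters in practice: without it, the parser tries to fetch the DTD named in the DOCTYPE and throws when it cannot find the file.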

[jira] Created: (LUCENE-1041) This document has errors that must be fixed before

2007-10-31 Thread DURGA DEEP (JIRA)
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/ant/src/java/org/apache/lucene/ant/ Reporter: DURGA DEEP Priority: Blocker We are writing an e-mail parser and are blocked by this error. HtmlDocument hd = new HtmlDocument(p.getInputStream