You don't need a Lucene parser (they don't exist). Instead, use a Java XML
parser (such as dom4j). I personally prefer DOM: it lets you use XPath to
extract exactly what you need. SAX is an alternative to DOM. However, SAX is
not a W3C Recommendation and lacks many of the extraction methods available
in DOM.
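
For illustration, here is a minimal sketch of the DOM plus XPath approach.
It uses only the JDK's built-in javax.xml APIs rather than dom4j, and the
sample XML, its element names, and the XPathExtract class are all made up
for the example, so adjust them to match your own files.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class XPathExtract {
    // Hypothetical sample input; real data would be read from disk instead.
    static final String SAMPLE =
        "<docs>"
      + "<doc id=\"1\"><title>Intro</title><body>Hello Lucene</body></doc>"
      + "<doc id=\"2\"><title>Next</title><body>More text</body></doc>"
      + "</docs>";

    // Parse the XML into a DOM tree and return the text of every node
    // matched by the given XPath expression.
    static List<String> extract(String xml, String expr) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expr, dom, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Print the title of every <doc> element in the sample.
        for (String title : extract(SAMPLE, "/docs/doc/title")) {
            System.out.println(title);
        }
    }
}
```

The strings that come back are the values you would then add to your index
as field content.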

Hi Karthik,
Sounds like you know what you have to do. The only problem I saw with your
statement is about parsing it with Lucene. You can read the files from
disk (basic I/O) and use a SAX parser to extract the information you want
to search against and then build your index from that informati