Hi,

I am using Tika with Solr to index Outlook .msg files. Looking at
OutlookExtractor.java, I understand that the parser generates the XHTML stream
so that "h1" contains the subject and "dl" contains From, To, Cc, Bcc,
Recipients and so on. Please correct me if I am wrong. I would like to know
which element the body of the message goes into, because I plan to capture it
with the capture parameter of the ExtractingRequestHandler in solrconfig.xml.
All I want is to capture the body of the .msg file, and only the body, into a
field that I can index and search in Solr.
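To make the question concrete, here is a small sketch of what "capturing" an element means: it collects the text inside every element with a given name from an XHTML stream. It assumes (unconfirmed) that the body lands in "p" elements after the "dl" header block. The sample XHTML is hand-written to mimic that shape, not real Tika output, and CaptureHandler/capture are hypothetical helpers, not Solr's or Tika's code. In Solr the corresponding request parameters would be something like capture=p plus fmap.p=<your field>.

```python
import xml.sax


class CaptureHandler(xml.sax.ContentHandler):
    """Collect the text of every element with a given name (a sketch of the
    idea behind Solr's capture parameter, not Solr's implementation)."""

    def __init__(self, element):
        super().__init__()
        self.element = element
        self.depth = 0       # > 0 while inside a captured element
        self.parts = []      # text fragments of the element being captured
        self.captured = []   # one string per captured element

    def startElement(self, name, attrs):
        if name == self.element:
            self.depth += 1
            if self.depth == 1:
                self.parts = []

    def characters(self, content):
        if self.depth:
            self.parts.append(content)

    def endElement(self, name):
        if name == self.element and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.captured.append("".join(self.parts).strip())


def capture(xhtml, element):
    """Return the text of each `element` in the XHTML string."""
    handler = CaptureHandler(element)
    xml.sax.parseString(xhtml.encode("utf-8"), handler)
    return handler.captured


# Hand-written sample shaped like the structure described above (assumption).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h1>Quarterly report</h1>
<dl><dt>From</dt><dd>alice@example.com</dd><dt>To</dt><dd>bob@example.com</dd></dl>
<p>Please find the figures attached.</p>
</body></html>"""

if __name__ == "__main__":
    print(capture(SAMPLE, "p"))   # ['Please find the figures attached.']
```

If the body really is emitted as "p", note that capturing "p" may also pick up paragraphs produced by other parts of the message (for example, attachments), so the result should be checked against real output.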

Thanks and Regards,
Swapna.

____________________________________________________________
Electronic mail messages entering and leaving Arup  business
systems are scanned for acceptability of content and viruses
