Hi,

I am using Tika with Solr to index Outlook .msg files. Looking at OutlookExtractor.java, I understand that the parser generates the XHTML stream in such a way that "h1" contains the subject, and "dl" contains From, To, Cc, Bcc, Recipients and so on. Please correct me if I am wrong.

I would like to know where the body of the message goes. I am asking because I plan to capture it using the capture parameter of the ExtractingRequestHandler configured in solrconfig.xml. All I am interested in is capturing the body of the .msg file, exclusively, into a field that I can index and search in Solr.
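To make the goal concrete, below is a rough sketch of the kind of configuration I have in mind. It is only an illustration under assumptions: the element name "div" is a placeholder guess for wherever the body ends up (that is exactly the part I am unsure about), "msg_body" is a hypothetical field name that would have to exist in the schema, and the handler path is just an example.

```xml
<!-- Sketch only: "div" and "msg_body" are assumptions, not confirmed values -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- capture the text of the (assumed) body element separately -->
    <str name="capture">div</str>
    <!-- map the captured element onto a hypothetical Solr field -->
    <str name="fmap.div">msg_body</str>
  </lst>
</requestHandler>
```

If the body is written to a different element, only the capture value and the matching fmap name should need to change.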
Thanks and Regards,
Swapna.
