AarKay,

We have a unit test for an MSG embedded within an MSG in POIContainerExtractionTest. I also just tried a newly created MSG within an MSG file, and I can extract the embedded content with TikaTest.RecursiveMetaParser. This suggests that the issue is not within the OutlookParser.
If you want the bytes of the embedded file, have you tried (or are you using) the Unpacker Resource? IIRC, this gets the attachments (non-recursively!!!) out of each doc you send it and sends you back a zip (or tar). You should be able to step through the ZipEntr(ies) and get the original attachment bytes.

Best,

Tim

-----Original Message-----
From: AarKay [mailto:ksu.wildc...@gmail.com]
Sent: Thursday, July 31, 2014 12:30 AM
To: user@tika.apache.org
Subject: Tika - Outlook msg file with another Outlook msg as an attachment - OutlookExtractor passes empty stream

I am using Tika Server (TikaJaxRs) for text extraction. I also need to extract the attachments in the file and save them to disk in their native format. I was able to do this with a CustomParser that writes the file to disk using the 'stream' argument of its parse method. Here is the post I used as a reference for building the CustomParser:

http://stackoverflow.com/questions/20172465/get-embedded-resourses-in-doc-files-using-apache-tika

This works fine if the attachment is anything but an Outlook msg file. When the attachment is an Outlook msg file, the stream passed to CustomParser.parse is empty, so the file written to disk is always 0 KB.

Digging through the code, I noticed that in OutlookExtractor.java such an attachment is handled by the OfficeParser, because attachData is always null when the attachment is an Outlook msg; that is where the empty stream sent to CustomParser comes from.

Here is the snippet of code from OutlookExtractor that iterates through the attachment files. It calls handleEmbeddedResource only when attachment.attachData is not null, but attachData is always null when the attachment is an Outlook msg, so the stream delegated to CustomParser.parse is always empty.

Can someone please tell me how I can access the msg attachment and save it to disk in its native format?
    for (AttachmentChunks attachment : msg.getAttachmentFiles()) {
        xhtml.startElement("div", "class", "attachment-entry");

        // Prefer the long file name, fall back to the short file name
        String filename = null;
        if (attachment.attachLongFileName != null) {
            filename = attachment.attachLongFileName.getValue();
        } else if (attachment.attachFileName != null) {
            filename = attachment.attachFileName.getValue();
        }
        if (filename != null && filename.length() > 0) {
            xhtml.element("h1", filename);
        }

        // Regular attachments carry their bytes in attachData; this is the
        // only branch that hands a stream to the embedded-resource handler
        if (attachment.attachData != null) {
            handleEmbeddedResource(
                  TikaInputStream.get(attachment.attachData.getValue()),
                  filename, null, xhtml, true
            );
        }

        // An attached Outlook msg has no attachData, only a POIFS directory,
        // which goes to handleEmbededOfficeDoc instead
        if (attachment.attachmentDirectory != null) {
            handleEmbededOfficeDoc(
                  attachment.attachmentDirectory.getDirectory(),
                  xhtml
            );
        }

        xhtml.endElement("div");
    }

Thanks
-AarKay
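Tim's suggestion of stepping through the ZipEntr(ies) returned by the Unpacker Resource can be sketched in plain Java with java.util.zip.ZipInputStream. This is a minimal sketch, assuming you have already sent the document to Tika Server's Unpacker Resource and asked for a zip (the tar variant is not handled), and that you hold the response body as an InputStream. The class and method names (UnpackZipSketch, extractEntries) are hypothetical. The sketch also reports entries that come back with 0 bytes, which is the symptom AarKay describes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: assumes the zip bytes came from the Unpacker Resource.
public class UnpackZipSketch {

    /**
     * Writes every file entry of the zip stream into outDir, keeping the
     * original attachment bytes, and returns the names of entries that were
     * empty (0 bytes).
     */
    public static List<String> extractEntries(InputStream zipBytes, Path outDir) throws IOException {
        List<String> emptyEntries = new ArrayList<>();
        Path root = outDir.toAbsolutePath().normalize();
        try (ZipInputStream zip = new ZipInputStream(zipBytes)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Resolve against outDir and reject names that escape it (e.g. "../x")
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry outside target dir: " + entry.getName());
                }
                Files.createDirectories(target.getParent());
                // ZipInputStream reports end-of-stream at the end of each entry,
                // so Files.copy reads only the current entry and does not close zip
                long written = Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                if (written == 0) {
                    emptyEntries.add(entry.getName());
                }
            }
        }
        return emptyEntries;
    }
}
```

Because Tim notes the Unpacker works non-recursively, attachments nested inside an attached msg would presumably need a second pass over that entry's bytes.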