Hey,

I am currently experimenting with Tika to see how it handles extraction of subfiles (embedded documents and attachments).

My requirement is to have Tika take in a parent document, a .docx or .eml for example, and extract the text content, the metadata, and all subfiles so that I can save them to disk. So far I have worked out the metadata and content extraction, but I haven't been able to find any tutorials on subfile extraction. If you could point me to resources I could use to work this out, or to sample code that already does this, it would be much appreciated.

Thanks,
Anthony
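
To make the "save them to disk" part of the requirement concrete, here is a minimal sketch of that step only. It uses nothing but the JDK. The names `SubfileSaver`, `safeName`, and `save` are hypothetical helpers invented for illustration and are not part of Tika. The assumption is that the stream and resource name would come from Tika's embedded-document hook, the `org.apache.tika.extractor.EmbeddedDocumentExtractor` interface registered on the `ParseContext`; that wiring is described only in comments here, since it depends on the Tika version in use.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not a Tika class): writes each subfile stream to disk.
// Assumption: in a Tika setup, save(...) would be called from an implementation of
// EmbeddedDocumentExtractor#parseEmbedded(InputStream, ContentHandler, Metadata, boolean),
// registered with parseContext.set(EmbeddedDocumentExtractor.class, extractor),
// passing the embedded stream and the "resourceName" metadata value (which may be null).
class SubfileSaver {
    private final Path outputDir;
    private int counter = 0;

    SubfileSaver(Path outputDir) throws IOException {
        // Create the output directory (and any missing parents) up front.
        this.outputDir = Files.createDirectories(outputDir);
    }

    // Build a file name that cannot escape the output directory:
    // drop any path components, replace unusual characters, and
    // prefix an index so that duplicate names do not collide.
    static String safeName(String resourceName, int index) {
        String base = (resourceName == null) ? "" : resourceName;
        int slash = Math.max(base.lastIndexOf('/'), base.lastIndexOf('\\'));
        base = base.substring(slash + 1).replaceAll("[^A-Za-z0-9._-]", "_");
        if (base.isEmpty() || base.equals(".") || base.equals("..")) {
            base = "embedded.bin"; // fallback for unnamed subfiles
        }
        return index + "-" + base;
    }

    // Copy the subfile's bytes to disk and return the path that was written.
    Path save(InputStream stream, String resourceName) throws IOException {
        Path target = outputDir.resolve(safeName(resourceName, counter++));
        Files.copy(stream, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an attachment stream; real input would come from the parser.
        SubfileSaver saver = new SubfileSaver(Files.createTempDirectory("subfiles"));
        InputStream fake = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("Saved to " + saver.save(fake, "../attachments/report 1.txt"));
    }
}
```

Sanitizing the name matters because the names of attachments in a .eml or embedded parts in a .docx come from the document itself and may contain path separators or be missing entirely. The index prefix is one simple way to keep two subfiles with the same name from overwriting each other.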
