On Sun, 25 Mar 2018, McGreevy, Anthony wrote:
I am currently playing with Tika to see how it works with regards to extraction of subfiles.

Do you mean files or resources embedded within another file?

If so... With the Tika App, you want -z to have these extracted. With the Tika Java classes, you want to pop something like a
https://tika.apache.org/1.17/api/org/apache/tika/parser/RecursiveParserWrapper.html
or a
https://tika.apache.org/1.17/api/org/apache/tika/extractor/ContainerExtractor.html
on your ParseContext to get called for embedded resources. See
https://wiki.apache.org/tika/RecursiveMetadata for more on how it works, and on how to have Tika parse and return all the embedded files and resources.
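
As a rough illustration of the RecursiveParserWrapper route, here is a minimal sketch along the lines of the wiki page above. It assumes the Tika 1.17 API (tika-core plus tika-parsers on the classpath). The class name RecursiveExtractSketch and the helper parseRecursively are made up for this example and are not part of Tika. Later 1.x releases changed how the wrapper is driven, so check the Javadoc for your version.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.RecursiveParserWrapper;
import org.apache.tika.sax.BasicContentHandlerFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of Tika.
public class RecursiveExtractSketch {

    // Parses the stream and returns one Metadata object per document:
    // the container first, followed by each embedded resource Tika found.
    static List<Metadata> parseRecursively(InputStream stream, String resourceName)
            throws Exception {
        // Wrap the usual auto-detecting parser; each document's text is
        // collected by a handler created from the factory (-1 = no write limit).
        RecursiveParserWrapper wrapper = new RecursiveParserWrapper(
                new AutoDetectParser(),
                new BasicContentHandlerFactory(
                        BasicContentHandlerFactory.HANDLER_TYPE.TEXT, -1));

        Metadata metadata = new Metadata();
        metadata.set(Metadata.RESOURCE_NAME_KEY, resourceName);

        // The handler passed here is not used for the extracted content;
        // the content ends up in the per-document Metadata objects instead.
        wrapper.parse(stream, new DefaultHandler(), metadata, new ParseContext());
        return wrapper.getMetadata();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: RecursiveExtractSketch <file>");
            return;
        }
        try (InputStream stream = Files.newInputStream(Paths.get(args[0]))) {
            for (Metadata m : parseRecursively(stream, args[0])) {
                // Name of the container or embedded resource, then its text.
                System.out.println("== " + m.get(Metadata.RESOURCE_NAME_KEY));
                System.out.println(m.get(RecursiveParserWrapper.TIKA_CONTENT));
            }
        }
    }
}
```

Run it against something like a .docx or .zip with attachments and you should see one entry for the outer file plus one per embedded resource. If you only want the embedded files written out to disk, the -z option of the Tika App is the quicker way.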

Nick
