Hey team,
I'm wondering if there is a way to filter the content that Tika extracts, for example by filename. Let's say I have a zip file containing foo.js, foo.pdf, foo.html, and foo.png, and I only want to extract text from the PDF and HTML files.

Also, I can see that a zip is currently extracted as one single String:

"""
doc/ab1.js
CONTENT1
abc/abc2.pdf
CONTENT2
...
"""

Would it be possible to extract the content as separate objects, something like:

```
[
  { "name": "doc/ab1.js", "content": "CONTENT1", "metadata": [ /* ... */ ] },
  { "name": "abc/abc2.pdf", "content": "CONTENT2", "metadata": [ /* ... */ ] },
  ...
]
```

Thanks!
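
P.S. To make the desired behavior more concrete, here is a rough sketch of the shape I'm after. It deliberately does not use Tika: it only walks the archive with Python's standard `zipfile` module, filters entries by filename suffix, and returns one object per entry. The names `extract_entries`, `build_sample_zip`, and `WANTED_SUFFIXES` are made up for illustration, the "content" is just the decoded raw bytes (a real version would hand each entry to Tika at that point), and the metadata is reduced to the entry size.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical allow-list, taken from the PDF/HTML example in the question.
WANTED_SUFFIXES = {".pdf", ".html"}


def extract_entries(zip_bytes, wanted_suffixes=WANTED_SUFFIXES):
    """Return one dict per matching ZIP entry instead of one big string."""
    results = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries such as "doc/"
            suffix = PurePosixPath(info.filename).suffix.lower()
            if suffix not in wanted_suffixes:
                continue  # filter by filename before any parsing happens
            raw = archive.read(info)
            results.append({
                "name": info.filename,
                # Placeholder: a real version would pass `raw` to Tika here.
                "content": raw.decode("utf-8", errors="replace"),
                "metadata": {"size": info.file_size},
            })
    return results


def build_sample_zip():
    """Build a small in-memory ZIP so the sketch can run on its own."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("doc/ab1.js", "CONTENT1")
        archive.writestr("abc/abc2.pdf", "CONTENT2")
        archive.writestr("abc/abc3.html", "CONTENT3")
    return buffer.getvalue()


if __name__ == "__main__":
    for entry in extract_entries(build_sample_zip()):
        print(entry)
```

With the sample archive, only the `.pdf` and `.html` entries come back, each as its own object. What I'm looking for is the Tika equivalent of both steps: the filename filter and the per-entry result.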