(I’m starting a new thread because I did not want to hijack the previous discussion on Metadata obj reuse, etc.)
My original intent was to find out whether the Tika project has a tabulation of parsers: a mapping from file type to parser to Maven artifact. Maven artifacts have proliferated, and it’s now more important to know how it all ties together, because you have to get your `tika-config.xml` just right … More thoughts below.

From: Tim Allison [email protected]
Date: Tuesday, March 7, 2023 at 12:48 PM
Subject: Re: [EXT] Re: Best practice for extracting content and metadata repeatedly

// Thank you, Marc.
//
// Please let us know how we can improve the documentation here:
// https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0
//
// and/or if we need to add documentation elsewhere.
//
// Tim

Thanks Tim,

A few more ideas below. And yes, I think a new Parser Index page is needed to tie this all together, or else an update to the Parsers page:

https://cwiki.apache.org/confluence/display/TIKA/Parsers

This page looks close, but it’s jargon-based, possibly not comprehensive, and more a list of worked examples?

The “Migrating to Tika 2.x” page is also fun reading, if you are migrating. For those finding Tika now and starting out on 2.x, the concept of migration is not relevant.

Can you provide a simple tabulation of file type, parser(s), and Maven artifact as a new page?

Possible model: the Maven Plugins table, https://maven.apache.org/plugins/index.html

Marc
