(I’m starting a new thread because I did not want to hijack the previous 
discussion on Metadata object reuse, etc.)

My original intent was to find out whether the Tika project has a tabulation 
of parsers, mapping each file type to a parser and to a Maven artifact. Maven 
artifacts have proliferated, and it’s now more important to know how it all 
ties together because you have to get your `tika-config.xml` just right. More 
thoughts below.

From: Tim Allison [email protected]
Date: Tuesday, March 7, 2023 at 12:48 PM
Subject: Re: [EXT] Re: Best practice for extracting content and metadata 
repeatedly
// Thank you, Marc.
//
// Please let us know how we can improve the documentation here:
// https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0
//
// and/or if we need to add documentation elsewhere.
//
// Tim


Thanks, Tim. A few more ideas below. And yes, I think a new Parser Index page 
is needed to tie this all together, or else an update to the Parsers page below:

https://cwiki.apache.org/confluence/display/TIKA/Parsers - This page looks 
close, but it’s jargon-based. It is possibly not comprehensive, and reads more 
like a list of worked examples?

The “Migrating to Tika 2.x” page is also fun reading, if you are migrating. 
For those finding Tika now and starting with 2.x, the concept of migration is 
not relevant.

Can you provide a simple tabulation of file type, parser(s), and Maven 
artifact as a new page?
Possible model: the Maven Plugins index table, 
https://maven.apache.org/plugins/index.html
Marc




