Hi Brian,
  A few thoughts:

1) tika-app is basically tika-core + tika-parsers-standard-package. Which
components are you trying to avoid? tika-serialization and jackson?
boilerpipecontenthandler and some of its dependencies? I ask because we
could factor out a tika-app-core with no parsers in Tika 3.x, mirroring
the split we have now between tika-server-core and tika-server-standard.

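2) Unrelatedly, there are probably more efficient ways of running Tika than
invoking it once per file on the command line, since every invocation starts
a fresh JVM. It is at least a robust option, though: a problem file can only
take down its own process.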

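If all you want is detection and text extraction, and you want to run it from
the command line, write two classes whose main() methods call:

System.out.println(new Tika().detect(file));

or

System.out.println(new Tika().parseToString(file));

Here file is a java.io.File built from the command-line argument. Both calls
are instance methods on the org.apache.tika.Tika facade in tika-core, so you
need a Tika object rather than a static call.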
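
For what it's worth, here is a rough sketch of how those two calls could sit in a single wrapper, so you can see the shape of it. The class name MiniTika and the -d / -t argument handling are made up for illustration (they just echo the tika-app flags you use today) and are not part of Tika. I'm assuming tika-core is on the classpath for detection, plus tika-parsers-standard-package at runtime if you want real text out of anything beyond the simplest formats. One thing to be aware of: parseToString() truncates its output at a default limit unless you change it with setMaxStringLength(); passing -1 removes the limit.

```java
import java.io.File;
import java.io.IOException;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

// Hypothetical wrapper; the name and flag handling are illustrative only.
public class MiniTika {

    // One shared facade instance, built with Tika's default configuration.
    private static final Tika TIKA = new Tika();

    static {
        // parseToString() truncates by default; -1 lifts the limit.
        TIKA.setMaxStringLength(-1);
    }

    // Returns the detected media type, e.g. "text/plain".
    static String detect(File file) throws IOException {
        return TIKA.detect(file);
    }

    // Returns the extracted plain text. Which formats are actually parsed
    // depends on the parser jars present on the classpath (assumption:
    // tika-parsers-standard-package for parity with tika-app).
    static String extractText(File file) throws IOException, TikaException {
        return TIKA.parseToString(file);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: MiniTika -d|-t <file>");
            System.exit(2);
        }
        File file = new File(args[1]);
        switch (args[0]) {
            case "-d":
                System.out.println(detect(file));
                break;
            case "-t":
                System.out.println(extractText(file));
                break;
            default:
                System.err.println("unknown option: " + args[0]);
                System.exit(2);
        }
    }
}
```

You would run it with something like java -cp <your classpath> MiniTika -d <file>. I have not compared its output byte for byte against tika-app, so treat it as a starting point rather than a drop-in replacement.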

On Thu, Mar 7, 2024 at 5:04 PM Brian Laskey <[email protected]> wrote:

> Hello Tika community,
>
>
>
> Our team is migrating away from usage of tika-app.jar (2.6 currently) to
> something with more minimal third party dependencies which we can control.
>
>
>
> Is there any good documentation or pathway to describe how a team could map
> the tika-app functionality we use to the same behavior using just tika-core
> and tika-parsers-standard-package (I assume)?
>
>
>
> The tika-app functions we use today are:
>
>
>
> Mime-type detection
>
> java -jar tika-app.jar -d <file>
>
>
>
> and
>
> Text extraction attempts
>
> java -jar tika-app.jar -t <file>
>
>
>
> Is there a subset of tika parser jars we would need to include to have
> equivalent functionality if we wrote our own wrapper main class?
>
>
>
> Thank you,
>
> Brian Laskey
>
>
>
