Hi Brian,

A few thoughts:

1) tika-app is basically tika-core + tika-parsers-standard-package. Which components are you trying to avoid? tika-serialization and jackson? The BoilerpipeContentHandler and some of its dependencies? I ask because we could factor out a tika-app-core with no parsers in Tika 3.x, which is what we do now with tika-server-core and tika-server-standard.

2) Unrelated, there are probably more efficient ways of running Tika than calling it per file on the command line. That is a robust option, at least!

If all you want is detection and text extraction, and you want to run it from the command line, write two classes whose main()s call the Tika facade (org.apache.tika.Tika). Note that detect() and parseToString() are instance methods, so you need a Tika object:

    System.out.println(new Tika().detect(file));

or

    System.out.println(new Tika().parseToString(file));

On Thu, Mar 7, 2024 at 5:04 PM Brian Laskey wrote:

> Hello Tika community,
>
> Our team is migrating away from usage of tika-app.jar (2.6 currently) to
> something with more minimal third party dependencies which we can control.
>
> Is there any good documentation or pathway to describe how a team could map
> the tika-app functionality we use to the same behavior using just tika-core
> and tika-parsers-standard-package (I assume)?
>
> The tika-app functions we use today are:
>
> Mime-type detection
> java -jar tika-app.jar -d <file>
>
> and
>
> Text extraction attempts
> java -jar tika-app.jar -t <file>
>
> Is there a subset of tika parser jars we would need to include to have
> equivalent functionality if we wrote our own wrapper main class?
>
> Thank you,
> Brian Laskey
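
For what it's worth, here is a minimal sketch of the wrapper described in point 2. It is only an illustration under a few assumptions: the class name MiniTika is made up, it folds the two suggested classes into one with a -d / -t switch, and it expects tika-core plus tika-parsers-standard-package on the classpath. Its output may not match tika-app exactly. The setMaxStringLength(-1) call is there because parseToString() otherwise truncates at 100,000 characters by default.

```java
import java.io.File;

import org.apache.tika.Tika;

/**
 * Hypothetical stand-in for "tika-app -d <file>" and "tika-app -t <file>".
 * Not part of Tika; the class and method names are illustrative only.
 */
public class MiniTika {

    // The Tika facade with default configuration. With the standard parser
    // package on the classpath, it picks up those parsers and detectors.
    private static final Tika TIKA = new Tika();

    static {
        // Lift the default 100,000-character limit on parseToString().
        TIKA.setMaxStringLength(-1);
    }

    /** Returns the detected media type, e.g. "application/pdf". */
    static String detect(File file) throws Exception {
        return TIKA.detect(file);
    }

    /** Returns the extracted plain text of the file. */
    static String extractText(File file) throws Exception {
        return TIKA.parseToString(file);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: MiniTika (-d | -t) <file>");
            System.exit(1);
        }
        File file = new File(args[1]);
        switch (args[0]) {
            case "-d":
                System.out.println(detect(file));
                break;
            case "-t":
                System.out.println(extractText(file));
                break;
            default:
                System.err.println("unknown option: " + args[0]);
                System.exit(1);
        }
    }
}
```

Run it with both jars (and their transitive dependencies) on the classpath, e.g. `java -cp "lib/*:." MiniTika -d <file>`. If you process many files, passing them all to a single JVM invocation avoids paying the startup cost per file.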
