I should add that the safest way to use Tika is to isolate it in its
own JVM, using tika-batch, the ForkParser, or tika-server. This
prevents jar hell and keeps Tika from crashing your application in
the rare cases where things go wrong [1].
That said, if this requires a respin of 1.26.1, I'm happy to do so.
Thank you, again.
Best,
Tim
[1]
https://cwiki.apache.org/confluence/display/TIKA/The+Robustness+of+Apache+Tika
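
As a rough illustration of the isolation advice above, here is a minimal
sketch that starts tika-server in a child JVM using only the JDK's
ProcessBuilder. The jar file name and its location in the working
directory are assumptions for the example, as is the class name
TikaServerLauncher; -p is tika-server's port option and 9998 is its
usual default port. Adjust these to your own setup.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class TikaServerLauncher {

    // Builds the command line for a child JVM that runs tika-server.
    // jarPath is a hypothetical location of the tika-server jar.
    static List<String> buildCommand(String jarPath, int port) {
        // Reuse the java binary of the JVM we are currently running in.
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-jar");
        cmd.add(jarPath);
        cmd.add("-p");
        cmd.add(Integer.toString(port));
        return cmd;
    }

    // Starts the child JVM; its stdout/stderr are forwarded to ours.
    static Process start(String jarPath, int port) throws IOException {
        return new ProcessBuilder(buildCommand(jarPath, port))
                .inheritIO()
                .start();
    }

    public static void main(String[] args) throws Exception {
        // Assumed jar name and port; not taken from any project config.
        Process server = start("tika-server-1.26.jar", 9998);
        // Stop the child JVM when this application shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(server::destroy));
        int exitCode = server.waitFor();
        System.out.println("tika-server exited with code " + exitCode);
    }
}
```

Because Tika and its dependencies live only on the child JVM's
classpath, a dependency conflict or a crash there does not affect the
calling application, which talks to the server over HTTP.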
On Wed, Mar 31, 2021 at 9:52 AM Tim Allison wrote:
>
> Sorry about that and thank you for raising this issue. I upgraded the
> version 28 days ago as part of general upgrades (TIKA-3244).
>
> This dependency is only required for org.apache.cxf:cxf-rt-rs-client
> and org.apache.cxf:cxf-rt-frontend-jaxrs.
>
> In branch_1x (and 1.26), we use the cxf client in:
>
> tika-parsers:
> GrobidRestParser which is used by the JournalParser
> NLTKNERecogniser
> TensorflowRESTVideoRecogniser
>
> tika-langdetect:
> Lingo24LangDetector
> TextLangDetector
>
>
> tika-server:
> cxf frontend
>
> If you don't use those parsers or those langdetectors, you'll be ok
> excluding that dependency. If we need to fix this and respin a
> 1.26.1, I'm happy to do so.
>
> Again, I'm sorry for the surprise, and thank you for the notification.
>
> Best,
>
> Tim
>
> On Wed, Mar 31, 2021 at 4:15 AM Andreas Hubold wrote:
> >
> > Hi,
> >
> > from version 1.25 to 1.26, tika-parsers' dependency
> > org.glassfish.jaxb:jaxb-runtime:jar was updated from 2.3.3 to 3.0.0.
> >
> > The new version is part of Jakarta EE 9 and uses the Java package
> > org.glassfish.jaxb, while version 2.3.3 (Jakarta EE 8) used package
> > com.sun.xml.bind.
> > Its transitive dependency jakarta.xml.bind:jakarta.xml.bind-api was also
> > updated from 2.3.3 to 3.0.0, which changed the package from
> > javax.xml.bind to jakarta.xml.bind.
> > (And it also pulls in jakarta.activation from Jakarta EE 9 now)
> >
> > This now causes some problems in a project that already uses Jakarta EE
> > 8 bind-api, version 2.3.3.
> >
> > I found some background info on these versions here:
> > https://www.eclipse.org/community/eclipse_newsletter/2020/november/1.php
> >
> > > mixing Jakarta EE 8 and Jakarta EE 9 APIs will cause issues with
> > > Maven because they both use the same Maven coordinates.
> >
> > For such a case, they propose to use the old Java EE 8 artifacts with
> > different Maven coordinates instead of Jakarta EE 8 artifacts. Maybe I
> > could do that, but it can get very complicated in our project, because
> > these artifacts are again transitive dependencies of other libraries.
> >
> > This led me to the following questions:
> >
> > 1) Was this change intended, or did you just increase the version as
> > part of general updates?
> >
> > 2) Which parsers need this dependency? We've excluded some parsers, and
> > might be able to simply exclude the dependency as well. I haven't found
> > any direct usage of this dependency in Tika. Maybe it was just added for
> > version management, and is used transitively by some parser?
> >
> > Thanks and regards,
> > Andreas
> >
> >
> >
> >
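
To see which JAXB API namespace actually ends up on a given classpath
(javax.xml.bind from bind-api 2.3.3 versus jakarta.xml.bind from
bind-api 3.0.0, as described in the original message), a small check
like the following may help. It is only a diagnostic sketch: the class
name JaxbNamespaceCheck and the helper isPresent are made up for this
example, and it reports what is loadable, not which Maven artifact
supplied it.

```java
class JaxbNamespaceCheck {

    // Returns true if the named class can be loaded from this classpath.
    // The class is not initialized, so no static code in it is run.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false,
                    JaxbNamespaceCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // JAXBContext is the entry point of the bind API in both namespaces.
        System.out.println("javax.xml.bind present:   "
                + isPresent("javax.xml.bind.JAXBContext"));
        System.out.println("jakarta.xml.bind present: "
                + isPresent("jakarta.xml.bind.JAXBContext"));
    }
}
```

Running it with the same classpath as the application shows whether one
or both namespaces are visible after tika-parsers 1.26 is added or the
dependency is excluded.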