I would add that there is a surprising dependency on javax.xml.bind:jaxb-api 2.3.1 in the tika-parent pom, but only when building with Java 9+ (I did not notice it before since XWiki is built with Java 8).
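For anyone wanting to see which JAXB API flavour actually ends up on their classpath, here is a minimal sketch. It only assumes the two well-known entry points, javax.xml.bind.JAXBContext (Java EE 8 / Jakarta EE 8) and jakarta.xml.bind.JAXBContext (Jakarta EE 9). The class name JaxbApiCheck and its helper methods are made up for illustration and are not part of Tika or XWiki.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Tika or XWiki): reports which JAXB API
// namespaces are loadable from the current classpath.
public class JaxbApiCheck {

    // Returns true if the named class can be located, without initializing it.
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, JaxbApiCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Checks the javax (2.x) and jakarta (3.x) JAXB entry points.
    public static Map<String, Boolean> detect() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        result.put("javax.xml.bind.JAXBContext", isPresent("javax.xml.bind.JAXBContext"));
        result.put("jakarta.xml.bind.JAXBContext", isPresent("jakarta.xml.bind.JAXBContext"));
        return result;
    }

    public static void main(String[] args) {
        detect().forEach((name, present) ->
                System.out.println(name + " -> " + (present ? "present" : "absent")));
    }
}
```

If both entries come back as present, the project is probably mixing the 2.x and 3.x APIs, which is the situation described in the thread below.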

> If we need to fix this and respin a 1.26.1, I'm happy to do so.

I don't think we use those specific parsers, so we are probably OK on our side for now (though I guess it's theoretically possible to hit those parsers depending on the kind of file people attach, since we don't really limit it). But if your plan is to go back to the jaxb 2.x dependency anyway, it would probably be better to do it ASAP, before people adapt their projects based on this change :)

On Wed, Mar 31, 2021 at 3:59 PM Tim Allison <[email protected]> wrote:
>
> I should add that the safest way to use Tika is to isolate it in its
> own jvm, with tika-batch, the ForkParser or tika-server. This
> prevents jar hell and will keep Tika from crashing your application in
> rare cases where things go wrong [1].
>
> That said, if this requires a respin of 1.26.1, I'm happy to do so.
>
> Thank you, again.
>
> Best,
>
> Tim
>
> [1]
> https://cwiki.apache.org/confluence/display/TIKA/The+Robustness+of+Apache+Tika
>
> On Wed, Mar 31, 2021 at 9:52 AM Tim Allison <[email protected]> wrote:
> >
> > Sorry about that and thank you for raising this issue. I upgraded the
> > version 28 days ago as part of general upgrades (TIKA-3244).
> >
> > This dependency is only required for org.apache.cxf:cxf-rt-rs-client
> > and org.apache.cxf:cxf-rt-frontend-jaxrs.
> >
> > In branch_1x (and 1.26), we use the cxf client in:
> >
> > tika-parsers:
> > GrobidRestParser which is used by the JournalParser
> > NLTKNERecogniser
> > TensorflowRESTVideoRecogniser
> >
> > tika-langdetect:
> > Lingo24LangDetector
> > TextLangDetector
> >
> >
> > tika-server:
> > cxf frontend
> >
> > If you don't use those parsers or those langdetectors, you'll be ok
> > excluding that dependency. If we need to fix this and respin a
> > 1.26.1, I'm happy to do so.
> >
> > Again, I'm sorry for the surprise, and thank you for the notification.
> >
> > Best,
> >
> > Tim
> >
> > On Wed, Mar 31, 2021 at 4:15 AM Andreas Hubold
> > <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > from version 1.25 to 1.26, tika-parsers' dependency
> > > org.glassfish.jaxb:jaxb-runtime:jar was updated from 2.3.3 to 3.0.0.
> > >
> > > The new version is part of Jakarta EE 9 and uses the Java package
> > > org.glassfish.jaxb, while version 2.3.3 (Jakarta EE 8) used package
> > > com.sun.xml.bind.
> > > Its transitive dependency jakarta.xml.bind:jakarta.xml.bind-api was also
> > > updated from 2.3.3 to 3.0.0, which changed the package from
> > > javax.xml.bind to jakarta.xml.bind.
> > > (And it also pulls in jakarta.activation from Jakarta EE 9 now)
> > >
> > > This now causes some problems in a project that already uses Jakarta EE
> > > 8 bind-api, version 2.3.3.
> > >
> > > I found some background info on these versions here:
> > > https://www.eclipse.org/community/eclipse_newsletter/2020/november/1.php
> > >
> > > > mixing Jakarta EE 8 and Jakarta EE 9 APIs will cause issues with
> > > > Maven because they both use the same Maven coordinates.
> > >
> > > For such a case, they propose to use the old Java EE 8 artifacts with
> > > different Maven coordinates instead of Jakarta EE 8 artifacts. Maybe I
> > > could do that, but it can get very complicated in our project, because
> > > these artifacts are again transitive dependencies of other libraries.
> > >
> > > This led me to the questions:
> > >
> > > 1) Was this change intended, or did you just increase the version as
> > > part of general updates?
> > >
> > > 2) Which parsers need this dependency? We've excluded some parsers, and
> > > might be able to simply exclude the dependency as well. I haven't found
> > > any direct usage of this dependency in Tika. Maybe it was just added for
> > > version management, and is used transitively by some parser?
> > >
> > > Thanks and regards,
> > > Andreas

--
Thomas
