Tim,

Thanks for your timely reply and the information on where to find the tika nlp jar. To answer your question: for me personally, I don't think it would be necessary to add those jars to the Apache mirrors. I would have been happy with a note of some sort telling me I could find the jars in the maven repository. If a ticket is necessary for that, please advise and I'll submit one. Thanks again.

John

________________________________
From: Tim Allison <[email protected]>
Sent: Thursday, November 18, 2021 1:36 PM
To: [email protected] <[email protected]>
Subject: [EXTERNAL] Re: missing parsers in tika 2.X

Hi John,

We're not currently releasing those modules via our Apache mirrors (well, Apache CDN), as you found (https://dlcdn.apache.org/tika/2.1.0/). You can manually grab the jar from maven -- e.g. https://search.maven.org/artifact/org.apache.tika/tika-parser-nlp-module/2.1.0/jar or https://mvnrepository.com/artifact/org.apache.tika/tika-parser-nlp-module.

I confirmed that adding a maven dependency does bring in the parsers...see below.

If you feel that we should include the tika-parser-nlp-module as part of the release and that it should be available on the Apache CDN, please open an issue on our JIRA.

Thank you!

Best,
Tim

<dependencies>
    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-core</artifactId>
        <version>2.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-parser-nlp-module</artifactId>
        <version>2.1.0</version>
    </dependency>
</dependencies>

@Test
public void testNLPParsers() throws Exception {
    Tika tika = new Tika();
    CompositeParser parser = (CompositeParser) tika.getParser();
    CompositeParser defaultParser = (CompositeParser) parser.getAllComponentParsers().get(0);
    for (Parser p : defaultParser.getAllComponentParsers()) {
        System.out.println(p);
    }
}

results in:

org.apache.tika.parser.geo.GeoParser@134d26af
org.apache.tika.parser.journal.JournalParser@66ac5762
org.apache.tika.parser.pdf.PDFParser@797cf65c
org.apache.tika.parser.sentiment.SentimentAnalysisParser@31bcf236

On Thu, Nov 18, 2021 at 12:48 PM Bankert, John (CTFV) <[email protected]> wrote:
>
> Hi all,
>
> I'm trying to upgrade from Tika 1.18 to Tika 2.1.0, and am having a problem.
> GeoParser.java has been moved into the tika-parsers-ml/tika-parser-nlp-module
> source tree. I can't seem to find any of the modules under tika-parsers-ml in
> any of the available tika-2.1.0 jar files. I also checked the Tika 2.0 jar
> files, and they seem to be in the same state as the 2.1 jar files. Am I simply not
> looking in the right place, are those modules under tika-parsers-ml being
> deprecated and removed, or is this perhaps an oversight in the build process?
> Any information anyone can shed on the issue is appreciated. Thanks!
>
> John Bankert
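
For anyone else checking whether the tika-parser-nlp-module classes are actually present after an upgrade, here is a minimal sketch that probes the classpath by class name. It does not use any Tika API, only the JDK's Class.forName. The class name ParserPresenceCheck and the helper isOnClasspath are made up for illustration; the two parser class names are taken from the output listed in this thread.

```java
public class ParserPresenceCheck {

    // Hypothetical helper: returns true if the named class can be located
    // by the class loader that loaded this class.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ParserPresenceCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is missing, or one of its own dependencies could not be linked
            return false;
        }
    }

    public static void main(String[] args) {
        // Parser class names as they appear in the output quoted in this thread
        String[] names = {
            "org.apache.tika.parser.geo.GeoParser",
            "org.apache.tika.parser.sentiment.SentimentAnalysisParser"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "missing"));
        }
    }
}
```

If both lines report "missing", the jar is most likely not on the classpath at all. This only shows whether the classes can be loaded, not whether Tika has registered them as parsers.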
