Tim,

Thanks for your timely reply and the information on where to find the tika
nlp jar. To answer your question: personally, I don't think it would be
necessary to add those jars to the Apache mirrors. I would have been happy with
a note of some sort telling me I could find the jars in the Maven repository. If
a ticket is necessary for that, please advise and I'll submit one. Thanks again.

John
________________________________
From: Tim Allison <[email protected]>
Sent: Thursday, November 18, 2021 1:36 PM
To: [email protected] <[email protected]>
Subject: [EXTERNAL] Re: missing parsers in tika 2.X

Hi John,
  We're not currently releasing those modules via our Apache mirrors
(well, Apache CDN) as you found
(https://dlcdn.apache.org/tika/2.1.0/). You can manually grab the jar
from maven -- e.g.
https://search.maven.org/artifact/org.apache.tika/tika-parser-nlp-module/2.1.0/jar
or https://mvnrepository.com/artifact/org.apache.tika/tika-parser-nlp-module.
I confirmed that adding a maven dependency does bring in the
parsers...see below.
  If you feel that we should include the tika-parser-nlp-module as
part of the release and that it should be available on the Apache CDN,
please open an issue on our JIRA.

 Thank you!

Best,

           Tim

<dependencies>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>2.1.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parser-nlp-module</artifactId>
    <version>2.1.0</version>
  </dependency>
</dependencies>

@Test
public void testNLPParsers() throws Exception {
    Tika tika = new Tika();
    CompositeParser parser = (CompositeParser)tika.getParser();
    CompositeParser defaultParser =
            (CompositeParser) parser.getAllComponentParsers().get(0);
    for (Parser p : defaultParser.getAllComponentParsers()) {
        System.out.println(p);
    }
}

results in:
org.apache.tika.parser.geo.GeoParser@134d26af
org.apache.tika.parser.journal.JournalParser@66ac5762
org.apache.tika.parser.pdf.PDFParser@797cf65c
org.apache.tika.parser.sentiment.SentimentAnalysisParser@31bcf236

On Thu, Nov 18, 2021 at 12:48 PM Bankert, John (CTFV)
<[email protected]> wrote:
>
> Hi all,
>
> I'm trying to upgrade from Tika 1.18 to Tika 2.1.0, and am having a problem. 
> GeoParser.java has been moved into the tika-parsers-ml/tika-parser-nlp-module 
> source tree. I can't seem to find any of the modules under tika-parsers-ml in 
> any of the available tika-2.1.0 jar files. I also checked the Tika 2.0 jar 
> files, and they seem to be in the same state as the 2.1 jar files. Am I simply not 
> looking in the right place, are those modules under tika-parsers-ml being 
> deprecated and removed, or is this perhaps an oversight in the build process? 
> Any information anyone can shed on the issue is appreciated. Thanks!
>
> John Bankert
