Hi David,

W00t! You should definitely also look into the async/pipes option
for FSCrawler once I get the documentation in order. I'm in the
process of putting together minimal config files for
fileshare->fileshare, and then I'll put together an example of
fileshare->OpenSearch, which should, for a while at least, also work
with Elasticsearch. If it doesn't, it should be fairly easy to write
your own emitter.

The benefit of the pipes package is that all of the parsing is done
in isolated JVMs, so a catastrophic parser failure isn't catastrophic
for the indexing process or the indexer. :D The other benefit is that
we have fetchers for fileshare, S3 and HTTP, so you can easily add
new data sources.

The new pipes module takes a bit of explanation (in lieu of the
still-tbd documentation), but not much. I'm always happy to chat.

Cheers,
Tim

On Wed, Jul 21, 2021 at 10:16 AM David Pilato <[email protected]> wrote:
>
> Ha. Found it...
>
> <dependency>
>   <groupId>org.apache.tika</groupId>
>   <artifactId>tika-parsers-standard-package</artifactId>
> </dependency>
> <dependency>
>   <groupId>org.apache.tika</groupId>
>   <artifactId>tika-parser-scientific-module</artifactId>
> </dependency>
> <dependency>
>   <groupId>org.apache.tika</groupId>
>   <artifactId>tika-parser-sqlite3-module</artifactId>
> </dependency>
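>
> (The snippet above omits <version>, so it presumably relies on the
> version being managed elsewhere, e.g. in a <dependencyManagement>
> section or a parent POM. A minimal sketch with the version spelled
> out, assuming the 2.0.0 release discussed in this thread and showing
> only the first dependency:)
>
> ```xml
> <dependency>
>   <groupId>org.apache.tika</groupId>
>   <artifactId>tika-parsers-standard-package</artifactId>
>   <!-- assumption: explicit version, matching the 2.0.0 upgrade -->
>   <version>2.0.0</version>
> </dependency>
> ```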
>
>
>
> I guess we just need to update the documentation?
>
> David
> On Jul 21, 2021 at 16:10 +0200, David Pilato <[email protected]> wrote:
>
> Hey team
>
>
> I'm trying to upgrade my project to 2.0.0.
> I'm confused. The doc says to include:
>
> <dependency>
>   <groupId>org.apache.tika</groupId>
>   <artifactId>tika-parsers</artifactId>
>   <version>2.0.0</version>
> </dependency>
>
>
> But the release notes say to include modules like:
>
> <dependency>
>   <groupId>org.apache.tika</groupId>
>   <artifactId>tika-parsers-standard</artifactId>
>   <version>2.0.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.tika</groupId>
>   <artifactId>tika-parsers-extended</artifactId>
>   <version>2.0.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.tika</groupId>
>   <artifactId>tika-parser-scientific-module</artifactId>
>   <version>2.0.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.tika</groupId>
>   <artifactId>tika-parser-sqlite3-module</artifactId>
>   <version>2.0.0</version>
> </dependency>
>
>
>
> But AFAICS all those modules are packaged as pom, not as jar, so Maven
> fails when I try to use them.
>
> What am I missing here?
>
>
> David