Thanks Tim!

For whatever reason, this thread went into my spam box. :(
I'll definitely look at async/pipes, as it could help me add parallelism to
FSCrawler.
But I'll most likely consider it for FSCrawler v3, where I want to redesign
everything.

Now that async/pipes comes with fetchers and emitters (which is basically what
I wanted to implement in v3), I have to think about it.


Best

David
On 21 Jul 2021 at 16:43 +0200, Tim Allison <[email protected]> wrote:
> Hi David,
> W00t! You should definitely also look into the async/pipes option
> for FSCrawler once I get the documentation in order. I'm in the
> process of putting together the minimal config files for
> fileshare->fileshare, and then I'll put together an example of
> fileshare->OpenSearch, which, um, should work for a bit at least with
> Elasticsearch. If it doesn't work with Elasticsearch, it should be
> fairly easy to write your own emitter.
> The benefit of the pipes package is that all of the parsing is done
> in isolated JVMs so that catastrophic problems aren't catastrophic for
> the indexing process or the indexer. :D The other benefit is that we
> have fetchers for fileshare, S3 and HTTP so that you can easily add
> new data sources.
> The new pipes module takes a bit of explanation (in lieu of the TBD
> documentation), but not much. I'm always happy to chat.
>
> Cheers,
>
> Tim
>
>
> On Wed, Jul 21, 2021 at 10:16 AM David Pilato <[email protected]> wrote:
> >
> > Ha. Found it...
> >
> > <dependency>
> >   <groupId>org.apache.tika</groupId>
> >   <artifactId>tika-parsers-standard-package</artifactId>
> > </dependency>
> > <dependency>
> >   <groupId>org.apache.tika</groupId>
> >   <artifactId>tika-parser-scientific-module</artifactId>
> > </dependency>
> > <dependency>
> >   <groupId>org.apache.tika</groupId>
> >   <artifactId>tika-parser-sqlite3-module</artifactId>
> > </dependency>
> >
> >
> >
> > I guess we just need to update the documentation?
> >
> > David
> > On 21 Jul 2021 at 16:10 +0200, David Pilato <[email protected]> wrote:
> >
> > Hey team
> >
> >
> > I'm trying to upgrade my project to Tika 2.0.0.
> > I'm confused. The docs say to include:
> >
> > <dependency>
> >   <groupId>org.apache.tika</groupId>
> >   <artifactId>tika-parsers</artifactId>
> >   <version>2.0.0</version>
> > </dependency>
> >
> >
> > But the release notes say to include modules like:
> >
> > <dependency>
> >   <groupId>org.apache.tika</groupId>
> >   <artifactId>tika-parsers-standard</artifactId>
> >   <version>2.0.0</version>
> > </dependency>
> > <dependency>
> >   <groupId>org.apache.tika</groupId>
> >   <artifactId>tika-parsers-extended</artifactId>
> >   <version>2.0.0</version>
> > </dependency>
> > <dependency>
> >   <groupId>org.apache.tika</groupId>
> >   <artifactId>tika-parser-scientific-module</artifactId>
> >   <version>2.0.0</version>
> > </dependency>
> > <dependency>
> >   <groupId>org.apache.tika</groupId>
> >   <artifactId>tika-parser-sqlite3-module</artifactId>
> >   <version>2.0.0</version>
> > </dependency>
> >
> >
> >
> > But AFAICS all those modules are packaged as pom, not as jar, so Maven
> > fails when I try to use them.
> >
> > What am I missing here?
> >
> >
> > David
