It's often much easier to approach this by running Tika separately, in a client outside Solr, rather than through DIH.
Here's a blog on both the reasoning and sample code:

Among other things, you have a lot more control over how Tika operates.
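
For illustration only, here is a minimal sketch of that approach, not the code from the blog. It assumes the Tika parsers and SolrJ jars are on the classpath, that Tesseract is installed so Tika can OCR the extracted images, and that the target collection has "id" and "text" fields. The class name, the field names, and the command-line arguments are all hypothetical. The point is that the PDFParserConfig setting you were trying to pass through parseContext.config is set directly in code:

```java
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.pdf.PDFParserConfig;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical class name; a sketch of client-side Tika extraction plus SolrJ indexing.
public class TikaOcrIndexer {

    // Build a ParseContext that asks the PDF parser to extract inline images.
    static ParseContext buildContext(Parser parser) {
        PDFParserConfig pdfConfig = new PDFParserConfig();
        pdfConfig.setExtractInlineImages(true);

        ParseContext context = new ParseContext();
        context.set(PDFParserConfig.class, pdfConfig);
        // Registering the parser lets Tika recurse into embedded documents,
        // which is how the extracted images get parsed in turn.
        context.set(Parser.class, parser);
        return context;
    }

    // Run Tika on one file and return the extracted plain text.
    static String extractText(Path file, Parser parser, ParseContext context) throws Exception {
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 removes the write limit
        Metadata metadata = new Metadata();
        try (InputStream in = Files.newInputStream(file)) {
            parser.parse(in, handler, metadata, context);
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        Path folder = Paths.get(args[0]); // folder holding the files to index
        String solrUrl = args[1];         // assumed form: http://host:8983/solr/<collection>

        Parser parser = new AutoDetectParser();
        ParseContext context = buildContext(parser);

        try (SolrClient solr = new HttpSolrClient.Builder(solrUrl).build();
             DirectoryStream<Path> files = Files.newDirectoryStream(folder)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories
                }
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", file.toString());                        // assumed field name
                doc.addField("text", extractText(file, parser, context));   // assumed field name
                solr.add(doc);
            }
            solr.commit();
        }
    }
}
```

You would run it with the folder and the collection URL as the two arguments. A real indexer would also want batching and per-file error handling, so that one bad PDF doesn't stop the whole run.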


On Tue, Mar 6, 2018 at 12:36 AM, lala <> wrote:
> Hi,
> I am working with Solr 7, indexing multilingual files in a folder
> using DIH (FileListEntityProcessor for the base entity and
> TikaEntityProcessor for the child entity in the configuration file).
> My problem is this: I want to extract text from images inside PDF
> files. That works fine with the /update/extract request handler, where I set
> the "parseContext.config" attribute to an XML file, let's say "context.xml",
> in which I set the property "extractInlineImages" for the entry
> [PDFParserConfig] to true. But I have no idea how to set
> parseContext.config in the DIH configuration.
> I tried these approaches; neither worked:
>     - setting the tikaConfig attribute in the DIH config file to my "context.xml",
> which obviously won't work since the Tika config format is different.
>     - setting the parseContext.config attribute on my "\dataImport"
> request handler, which didn't work either.
> I googled a lot with no result. I would really appreciate any help here!