I am working with Solr 7, indexing multilingual files from a folder using
DIH (FileListEntityProcessor for the parent entity and
TikaEntityProcessor for the child entity in the configuration file).
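Roughly, the DIH configuration has this shape. This is a simplified sketch,
not my exact file: the baseDir path, the fileName pattern, the entity names
and the target field "content" are placeholders.

```xml
<dataConfig>
  <!-- Binary data source that TikaEntityProcessor reads the files through -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Parent entity: lists the files in a folder (placeholder path and pattern) -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/docs" fileName=".*\.pdf"
            recursive="true" rootEntity="false" dataSource="null">
      <!-- Child entity: runs Tika on each file found by the parent entity -->
      <entity name="tika" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text" dataSource="bin">
        <!-- "text" is the column Tika emits; "content" is a placeholder field -->
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```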
My problem is this: I want to extract text from images embedded in PDF
files. That works fine with the /update/extract request handler, where I set
the "parseContext.config" attribute to an XML file, let's say "context.xml",
in which the property "extractInlineImages" of the [PDFParserConfig] entry
is set to true. But I have no idea how to set parseContext.config in the
DIH configuration.
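For reference, the "context.xml" that works with /update/extract is along
these lines. It is a sketch following the format shown in the Solr Reference
Guide; the file name is only my example.

```xml
<entries>
  <!-- Tells the Tika PDF parser to also process images embedded in the PDF -->
  <entry class="org.apache.tika.parser.pdf.PDFParserConfig"
         impl="org.apache.tika.parser.pdf.PDFParserConfig">
    <property name="extractInlineImages" value="true"/>
  </entry>
</entries>
```

The /update/extract handler in solrconfig.xml then points at this file with
<str name="parseContext.config">context.xml</str>.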
I tried these approaches; none of them worked:
- Setting the tikaConfig attribute in the DIH config file to my
"context.xml". Obviously this won't work, since the Tika config format is
different.
- Setting the parseContext.config attribute on my "/dataImport"
requestHandler. This didn't work either.
I googled a lot with no result. I would really appreciate any help here!