You may need to extract the text outside Solr and index the plain text through an external ingestion process. That gives you much more control over Tika's attributes and behavior.
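A minimal sketch of what such an external ingestion step could look like, assuming a standalone tika-server is running on its default port and is set up to OCR images (for example, with Tesseract installed). The collection name `mycollection`, the `content` field, and the helper function names are hypothetical placeholders, not taken from your setup:

```python
import json
import urllib.request

# Assumption: tika-server running locally on its default port.
TIKA_URL = "http://localhost:9998/tika"
# Assumption: "mycollection" is a hypothetical collection name.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update?commit=true"


def extract_text(path, tika_url=TIKA_URL):
    """Send the raw file to tika-server and return the extracted plain text."""
    with open(path, "rb") as fh:
        body = fh.read()
    req = urllib.request.Request(
        tika_url, data=body, method="PUT",
        headers={"Accept": "text/plain"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")


def build_update_payload(doc_id, text, content_field="content"):
    """Build the JSON body for Solr's update handler: a list with one document."""
    doc = {"id": doc_id, content_field: text.strip()}
    return json.dumps([doc]).encode("utf-8")


def index_document(doc_id, text, solr_url=SOLR_UPDATE_URL):
    """POST the already-extracted text to Solr and return the HTTP status."""
    req = urllib.request.Request(
        solr_url, data=build_update_payload(doc_id, text), method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

With both services running, something like `index_document("scan-001", extract_text("scan-001.jpg"))` would push the extracted text into the content field. Because extraction happens before Solr sees the document, you can inspect or post-process the text and tune the Tika configuration independently of Solr.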
--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Apr 9, 2018, 10:23 PM -0400, Zheng Lin Edwin Yeo <edwinye...@gmail.com>, wrote:
> Hi,
>
> Currently I am facing issue whereby the text in images file like jpg, bmp
> are not being extracted out and indexed. After the indexing, Tika did
> extract all the meta data out and index them under the fields attr_*.
> However, the content field is always empty for images file. For other types
> of document files like .doc, the content is extracted correctly.
>
> I have already updated the tika-parsers-1.17.jar, under
> \prg\apache\tika\parser\pdf\ for extractInlineImages to true.
>
>
> What could be the reason?
>
> I have just upgraded to Solr 7.3.0.
>
> Regards,
> Edwin