Does anyone know? If no one does, I guess I'll need to look at the code or turn on debug logging. :)
David

--
David Pilato, elastic.co
Developer | Evangelist

On 18 Dec 2018 at 21:43 +0100, David Pilato <[email protected]> wrote:
> Heya
>
> When OCR is available, what should happen when I have a document containing
> both text and images with text?
>
> For example, I have a PDF with the text "hello world" and an image containing
> "foo bar".
> When I run Tika with Tesseract to extract text, I can see that only the text
> part is extracted, that is, "hello world".
>
> If I run the same configuration on a PDF that contains only an image with
> "foo bar", then "foo bar" is extracted.
>
> Is that expected?
> If so, does this mean that as soon as some text is extracted from a document,
> we don't run OCR at all?
>
> Thanks for your insights.
>
> David
