Yes, that's the idea: each page is evaluated on its own.  I've seen some
PDFs, and I am not making this up, where alternating pages were either
image-only or text.
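
To make the per-page behavior concrete, here is a rough sketch of the
decision.  It is not the code in AbstractPDF2XHTML; the class name, the
method names, and the 10-character cutoff are all made up for illustration,
and the real criteria may look at more than a simple character count.

```java
// Illustrative sketch only; NOT Tika's actual implementation.
// All names and the threshold below are hypothetical.
final class PerPageOcrSketch {

    enum Action { OCR_PAGE, EXTRACT_TEXT }

    // Hypothetical cutoff: a page with fewer extractable characters
    // than this is treated as "mostly image" and sent to OCR.
    static final int MIN_CHARS_FOR_TEXT = 10;

    // Decide for a single page, looking only at that page's own text.
    static Action decide(int extractableChars) {
        return extractableChars < MIN_CHARS_FOR_TEXT
                ? Action.OCR_PAGE
                : Action.EXTRACT_TEXT;
    }

    // Each page is decided independently; no page influences another.
    static Action[] decideAll(int[] charsPerPage) {
        Action[] actions = new Action[charsPerPage.length];
        for (int i = 0; i < charsPerPage.length; i++) {
            actions[i] = decide(charsPerPage[i]);
        }
        return actions;
    }

    public static void main(String[] args) {
        // A document whose pages alternate between text and scanned images.
        int[] charsPerPage = {1850, 0, 2100, 3};
        Action[] actions = decideAll(charsPerPage);
        for (int i = 0; i < actions.length; i++) {
            System.out.println("page " + (i + 1) + ": " + actions[i]);
        }
    }
}
```

With the alternating document above, the text pages would be extracted
directly and the near-empty pages would be OCRed, which is the case the
page-by-page check is meant to handle.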

On Mon, Jan 11, 2021 at 9:41 AM Peter Kronenberg <[email protected]>
wrote:

> Can you check my understanding of OCR_STRATEGY=AUTO?  Looking at the code
> in AbstractPDF2XHTML, it appears to be done on a page by page basis.  So if
> the page satisfies the criteria of having a small amount of text, then the
> entire page is OCRed.  If the page is mostly searchable text, however, then
> the text will be extracted.  Is this correct?  Each page is processed
> independently?
>
