Re: OCR_STRATEGY=AUTO

Tim Allison Mon, 11 Jan 2021 06:50:37 -0800

Yes, that's the idea.  Each page is handled independently.  I've seen some
PDFs, and I am not making this up, where the pages alternated between
image-only and text.
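
As a rough sketch of that per-page decision (this is not the actual code in
AbstractPDF2XHTML; the class name, method names and the 10-character cutoff
are all made up for illustration), each page is looked at on its own and
either sent to OCR or has its text extracted:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and threshold are hypothetical, not Tika's.
final class PageOcrSketch {

    enum Action { EXTRACT_TEXT, OCR_PAGE }

    // Hypothetical cutoff: a page with fewer extractable characters than this
    // is treated as image-only and OCRed as a whole.
    static final int MIN_CHARS_PER_PAGE = 10;

    // Decide what to do with a single page, looking only at that page.
    static Action decide(int extractableChars) {
        return extractableChars < MIN_CHARS_PER_PAGE
                ? Action.OCR_PAGE
                : Action.EXTRACT_TEXT;
    }

    // Apply the decision independently to every page of a document.
    static List<Action> plan(int[] charsPerPage) {
        List<Action> actions = new ArrayList<>();
        for (int chars : charsPerPage) {
            actions.add(decide(chars));
        }
        return actions;
    }

    public static void main(String[] args) {
        // A document whose pages alternate between text and image-only.
        int[] charsPerPage = {1800, 0, 2100, 3};
        // prints [EXTRACT_TEXT, OCR_PAGE, EXTRACT_TEXT, OCR_PAGE]
        System.out.println(plan(charsPerPage));
    }
}
```

So a document like the alternating ones I mentioned ends up with a mix of
extracted and OCRed pages.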


On Mon, Jan 11, 2021 at 9:41 AM Peter Kronenberg <[email protected]>
wrote:

> Can you check my understanding of OCR_STRATEGY=AUTO?  Looking at the code
> in AbstractPDF2XHTML, it appears to be done on a page by page basis.  So if
> the page satisfies the criteria of having a small amount of text, then the
> entire page is OCRed.  If the page is mostly searchable text, however, then
> the text will be extracted.  Is this correct?  Each page is processed
> independently?
>
