OCR_STRATEGY=AUTO

Peter Kronenberg Mon, 11 Jan 2021 06:41:20 -0800

Can you check my understanding of OCR_STRATEGY=AUTO? Looking at the code in
AbstractPDF2XHTML, the decision appears to be made on a page-by-page basis. If
a page meets the criterion of having only a small amount of text, the entire
page is OCRed. If the page is mostly searchable text, the text is extracted
instead. Is this correct? Is each page processed independently?
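To make the question concrete, here is a minimal sketch of the per-page behavior as described above. It is not Tika code. The class `AutoOcrSketch`, the `decide`/`decideAll` methods, the `Action` enum, and the `MIN_CHARS_PER_PAGE` cutoff are all hypothetical names and values chosen for illustration. The real check in AbstractPDF2XHTML may use different or additional criteria.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the per-page AUTO decision described in the question.
 * Not Tika's API: names and the threshold below are assumptions.
 */
public class AutoOcrSketch {
    // Assumed cutoff: a page with fewer extracted characters than this is OCRed.
    static final int MIN_CHARS_PER_PAGE = 10;

    enum Action { EXTRACT_TEXT, OCR_PAGE }

    // Decide for one page using only that page's extracted text.
    static Action decide(String extractedText) {
        int chars = extractedText.trim().length();
        return chars < MIN_CHARS_PER_PAGE ? Action.OCR_PAGE : Action.EXTRACT_TEXT;
    }

    // Apply the decision to every page independently; no page affects another.
    static List<Action> decideAll(List<String> pages) {
        List<Action> actions = new ArrayList<>();
        for (String page : pages) {
            actions.add(decide(page));
        }
        return actions;
    }

    public static void main(String[] args) {
        List<String> pages = List.of(
            "This page has plenty of searchable text on it.",
            "",      // e.g. a scanned image with no text layer
            "p. 3"); // only a page number
        System.out.println(decideAll(pages));
    }
}
```

Under these assumptions, running `main` prints `[EXTRACT_TEXT, OCR_PAGE, OCR_PAGE]`: the text-rich first page is extracted, while the two near-empty pages would each be OCRed in full.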
