tesseract image001.png - --psm 0
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 625
Warning. Invalid resolution 0 dpi. Using 70 instead.
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 5.30
*Script: Latin*
Script confidence: 3.64
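If it helps, here is a minimal sketch of how the `--psm 0` output could drive the "reject CJK documents" check. It only parses text that has already been captured from the command. The names `parse_osd`, `should_reject_as_cjk`, `CJK_SCRIPTS` and the `min_confidence` threshold of 5.0 are my own assumptions for illustration, not part of Tesseract, and the list of script names should be checked against what your build actually reports.

```python
# Hypothetical helper: decide whether to reject a page based on OSD output.

# Assumed script names to treat as CJK; verify against your Tesseract build.
CJK_SCRIPTS = {"Han", "Hangul", "Hiragana", "Katakana", "Japanese", "Korean"}

# Keys that appear in the OSD report; warnings and other lines are ignored.
OSD_KEYS = {
    "Page number",
    "Orientation in degrees",
    "Rotate",
    "Orientation confidence",
    "Script",
    "Script confidence",
}


def parse_osd(text):
    """Collect the known 'Key: value' pairs from the OSD text."""
    fields = {}
    for line in text.splitlines():
        # Strip whitespace and any '*' emphasis markers added in an email.
        key, sep, value = line.strip().strip("*").partition(":")
        if sep and key.strip() in OSD_KEYS:
            fields[key.strip()] = value.strip()
    return fields


def should_reject_as_cjk(fields, min_confidence=5.0):
    """Reject only when a CJK script is reported with enough confidence.

    min_confidence is an arbitrary illustrative threshold, not a
    recommended value; tune it on your own documents.
    """
    script = fields.get("Script")
    confidence = float(fields.get("Script confidence", 0))
    return script in CJK_SCRIPTS and confidence >= min_confidence


SAMPLE = """Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 625
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 5.30
Script: Latin
Script confidence: 3.64
"""

if __name__ == "__main__":
    osd = parse_osd(SAMPLE)
    print(osd["Script"], osd["Script confidence"], should_reject_as_cjk(osd))
```

Requiring a minimum script confidence is one possible way to avoid rejecting low-quality pages where the detection is shaky, but whether it helps with the "Table of Contents" dots would need to be checked on real samples.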
On Monday, November 25, 2019 at 8:13:43 PM UTC+5:30, Jeetendra Ahuja wrote:
>
> Before processing a document, we want to reject the ones that are CJK, so
> I've used Tesseract for this. It does a pretty good job, but sometimes, when
> the document quality is low, most of the dots on a "Table of Contents" page
> are recognized as "CJK" characters. I am planning to create my own
> training data but wanted to get advice from experts.