Nope, I haven't tried it yet, but I will. Thanks.

On Monday, November 25, 2019 at 9:48:08 AM UTC-5, shree wrote:
>
> Have you tried `osd` (orientation and script detection)?
>
> On Mon, Nov 25, 2019 at 8:13 PM Jeetendra Ahuja wrote:
>
>> Before processing a document, we want to reject the ones that are CJK, so
>> I've used Tesseract for this. It does a pretty good job, but sometimes,
>> when document quality is low, most of the dots on the "Table of Contents"
>> page are recognized as CJK characters. I am planning to create my own
>> training data but wanted to get advice from experts first.
>>
>> *Config:*
>>
>> - Tesseract 4.0
>> - instance.setLanguage("chi_simB+chi_traB+korB+jpnB+engB");
>> - instance.setOcrEngineMode(1);
>>
>> The image is zoomed to 600% in Adobe PDF Reader.
>>
>> Please let me know.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/95138faa-307f-4417-b72c-648ab84993d9%40googlegroups.com
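The CJK rejection step described in the question could be made less sensitive to misread punctuation by scoring only letters in the OCR output. The following is a minimal Java sketch, assuming the recognized text is already available as a String (for example from Tess4J's doOCR). The class CjkFilter, the method looksCjk and the 0.3 threshold are hypothetical choices for illustration, not part of Tesseract or Tess4J. It counts letters whose Unicode script is Han, Hiragana, Katakana or Hangul against letters of any other specific script, and skips digits, whitespace, ASCII and CJK punctuation, and script-neutral (Common/Inherited) letters.

```java
import java.lang.Character.UnicodeScript;

// Hypothetical helper (not part of Tesseract or Tess4J): decides whether
// OCR output looks like CJK text by comparing counts of letters per script.
public class CjkFilter {

    // Returns true when CJK letters make up more than `threshold` of all
    // letters that belong to a specific script. Non-letters (digits, spaces,
    // dots, ASCII and CJK punctuation) and COMMON/INHERITED letters are skipped.
    public static boolean looksCjk(String ocrText, double threshold) {
        int cjk = 0;
        int other = 0;
        // Walk code points so characters outside the BMP are counted once.
        for (int i = 0; i < ocrText.length(); ) {
            int cp = ocrText.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue;
            }
            switch (UnicodeScript.of(cp)) {
                case HAN:
                case HIRAGANA:
                case KATAKANA:
                case HANGUL:
                    cjk++;
                    break;
                case COMMON:
                case INHERITED:
                    break; // script-neutral letters are not scored
                default:
                    other++;
            }
        }
        int total = cjk + other;
        // A page with no scored letters at all is not treated as CJK.
        return total > 0 && (double) cjk / total > threshold;
    }

    public static void main(String[] args) {
        // Latin contents line with leader dots: prints false
        System.out.println(looksCjk("Table of Contents ...... 12", 0.3));
        // Han heading followed by CJK full stops: prints true
        System.out.println(looksCjk("\u76EE\u5F55 \u3002\u3002\u3002 12", 0.3));
    }
}
```

If the leader dots on a contents page come out as CJK punctuation such as 。 (U+3002), they are not letters and do not affect the score. If they are misread as Han ideographs, this heuristic will not filter them out.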

