I tried 400 DPI and set the page segmentation mode to 1 (AUTO_OSD), but saw no improvement. The problem is that the PDF itself is of low quality.
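
A minimal sketch of one possible workaround, not something proposed in this thread: rather than rejecting a document as soon as any CJK character shows up in the OCR text, count what share of the recognized letters belong to a CJK script and reject only above a threshold, so a few leader dots misread as CJK characters on a table-of-contents page do not tip the decision. The class name `CjkRatio`, the methods `cjkRatio` and `looksCjk`, and the 0.3 threshold are all hypothetical; the input is assumed to be whatever text the OCR call already returns.

```java
import java.lang.Character.UnicodeScript;

final class CjkRatio {

    /** Fraction of letters in the text that belong to a CJK script (0.0 if there are no letters). */
    static double cjkRatio(String text) {
        int letters = 0;
        int cjk = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue; // digits, dots, whitespace and punctuation are ignored
            }
            letters++;
            UnicodeScript script = UnicodeScript.of(cp);
            if (script == UnicodeScript.HAN
                    || script == UnicodeScript.HIRAGANA
                    || script == UnicodeScript.KATAKANA
                    || script == UnicodeScript.HANGUL) {
                cjk++;
            }
        }
        return letters == 0 ? 0.0 : (double) cjk / letters;
    }

    /** True when the CJK share of letters reaches the (assumed) threshold. */
    static boolean looksCjk(String text, double threshold) {
        return cjkRatio(text) >= threshold;
    }

    public static void main(String[] args) {
        // A table-of-contents line with one stray misrecognized character.
        String tocLine = "Introduction ..... 一 3";
        System.out.println("CJK ratio: " + cjkRatio(tocLine));
        System.out.println("Reject as CJK: " + looksCjk(tocLine, 0.3));
    }
}
```

The threshold would need tuning on real documents. A mostly English page with a handful of misread dots should stay well below it, while a genuinely CJK page should sit well above it.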

On Monday, November 25, 2019 at 11:36:48 AM UTC-5, shree wrote:
>
> Also try with 300 dpi
>
> On Mon, Nov 25, 2019 at 9:45 PM Jeetendra Ahuja <[email protected]> wrote:
>
>> Nope, I will do it. Thanks.
>>
>> On Monday, November 25, 2019 at 9:48:08 AM UTC-5, shree wrote:
>>>
>>> have you tried `osd` - orientation and script detection?
>>>
>>> On Mon, Nov 25, 2019 at 8:13 PM Jeetendra Ahuja <[email protected]> wrote:
>>>
>>>> Before processing a document, we want to reject the ones that are CJK,
>>>> so I've used Tesseract for this. It does a pretty good job, but sometimes,
>>>> when the document quality is low, most of the dots on the "Table of
>>>> Contents" page are recognized as CJK characters. I am planning to create
>>>> my own training data but wanted to get advice from experts.
>>>>
>>>> *Config:*
>>>>
>>>> - Tesseract 4.0
>>>> - instance.setLanguage("chi_simB+chi_traB+korB+jpnB+engB");
>>>> - instance.setOcrEngineMode(1);
>>>>
>>>> Image is zoomed to 600% in Adobe PDF reader.
>>>>
>>>> Please let me know.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/95138faa-307f-4417-b72c-648ab84993d9%40googlegroups.com
>>>
>>> --
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

