After several tests, I think "line removal" is the cause rather than the binarization.
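
For anyone who wants to repeat this kind of comparison, here is a minimal sketch of how the test run could be scripted. It only wraps the `-c tessedit_write_image=1` option from the quoted message below. The helper names `build_tesseract_command` and `run_and_dump` are hypothetical, and the `chi_tra` language code is an assumption about which traineddata is installed, so adjust it to your setup.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="chi_tra"):
    """Build a tesseract CLI call that also dumps tesseract's processed input image.

    `lang` defaults to "chi_tra" as an assumption; use whichever language
    data you actually have installed.
    """
    return [
        "tesseract",
        str(image_path),
        str(output_base),
        "-l", lang,
        # Option quoted in the original post: write out the image tesseract works on.
        "-c", "tessedit_write_image=1",
    ]


def run_and_dump(image_path, output_base, lang="chi_tra"):
    """Run tesseract on one image and return the path of the text output."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    cmd = build_tesseract_command(image_path, output_base, lang)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(f"{output_base}.txt")
```

Running `run_and_dump("original.png", "out_original")` and then `run_and_dump("my_bin.png", "out_my_bin")` gives two text outputs plus the dumped images, which can be compared side by side to see at which stage characters go missing.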

On Thursday, July 2, 2020 at 5:54:42 PM UTC+8, xian wrote:
>
> For Chinese text, I found that tesseract's binarization produces really bad results.
> I use -c tessedit_write_image=1 to get the image that results from tesseract's binarization.
>
> The attachments are:
> original
> tess_bin -> tesseract's binarization of original.png
> my_bin -> my preprocessing applied to original.png
> tess_my_bin -> tesseract's binarization of my_bin.png
>
> You can see that some characters disappear.
> Before I pass the images to tesseract, I want to apply my own preprocessing function first.
> But tesseract's binarization makes the result worse.
>
> I want to handle the image preprocessing part myself.
> How can I disable tesseract's image preprocessing? Or is the only way to do this to modify the source code?
> Thanks!!