First of all: you did not mention important details such as which
tesseract version and which language model you use.

Next: "-c tessedit_write_image=1" produces "Could not set option:
tessedit_write_image=1" ;-) The parameter is called tessedit_write_images
(plural).

Next: If you want to avoid tesseract's binarization (Otsu), you must provide
a truly binarized image [1] as input. Your my_bin.png is stored as a
256-color / 8 bits-per-pixel image, so it is not binary.
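
A quick way to check what a PNG really contains is to read the bit depth
from its IHDR header. The following is only a minimal sketch using the
Python standard library. The helper names png_bit_depth and is_one_bit are
made up for illustration, and treating only "1-bit grayscale" as binary is
my conservative assumption, not something taken from the tesseract code:

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_bit_depth(data: bytes) -> Tuple[int, int]:
    """Return (bit_depth, color_type) from the IHDR chunk of PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # After the 8-byte signature: 4-byte chunk length, 4-byte chunk type,
    # then the IHDR payload (width, height, bit depth, color type, ...).
    if data[12:16] != b"IHDR":
        raise ValueError("IHDR chunk missing")
    _width, _height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return bit_depth, color_type


def is_one_bit(data: bytes) -> bool:
    """Hypothetical check: True only for 1-bit grayscale PNG data."""
    bit_depth, color_type = png_bit_depth(data)
    # Color type 0 is grayscale, 3 is palette ("256 color" when depth is 8).
    return bit_depth == 1 and color_type == 0
```

Usage would be something like is_one_bit(open("my_bin.png", "rb").read()).
An 8 bits-per-pixel file returns False even if it only contains black and
white pixels.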

And last: I am not able to reproduce your problem with the latest tesseract
code:

tesseract real_bin.png real_bin2 -c tessedit_write_images=1 -l chi_tra

See the attached tessinput.tif: it is different from your tess_my_bin.tif.
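
If your preprocessing ends with 8-bit gray values, one possible way to get a
file that is stored at 1 bit per pixel is to pack the pixels yourself. This
is only a sketch under assumptions: pack_pbm and the fixed threshold of 128
are hypothetical (your own binarization would replace the threshold), the
output is binary PBM (P4) rather than PNG, and I assume your
tesseract/Leptonica build reads PNM files:

```python
def pack_pbm(rows, threshold=128):
    """Threshold 8-bit gray rows and return binary PBM (P4) bytes.

    rows is a list of equal-length lists of ints in 0..255.
    The threshold value is a placeholder for your own binarization.
    """
    height = len(rows)
    width = len(rows[0]) if rows else 0
    out = bytearray("P4\n{} {}\n".format(width, height).encode("ascii"))
    for row in rows:
        if len(row) != width:
            raise ValueError("all rows must have the same length")
        byte = 0
        nbits = 0
        for value in row:
            # In PBM a set bit means black, a cleared bit means white.
            byte = (byte << 1) | (1 if value < threshold else 0)
            nbits += 1
            if nbits == 8:
                out.append(byte)
                byte = 0
                nbits = 0
        if nbits:
            # Each row is padded with zero bits up to a whole byte.
            out.append(byte << (8 - nbits))
    return bytes(out)
```

Writing the returned bytes to a file opened in "wb" mode gives an image
with only two levels, so there is nothing left for a threshold to decide.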

[1]
https://github.com/tesseract-ocr/tesseract/blob/e910b3c20b831017b3152378bdaa4c567e62c65a/src/ccmain/thresholder.cpp#L185-L199

Zdenko


On Thu, 2 Jul 2020 at 11:54, xian <chenux...@gmail.com> wrote:

> For the Chinese words, I found that binarization in tesseract makes really
> bad results.
> I use -c tessedit_write_image=1 to get the result image from tesseract's
> binarization.
>
> As attachments,
> original
> tess_bin -> tesseract binarize the original.png
> my_bin -> my preprocessing to the original.png
> tess_my_bin ->  tesseract binarize the my_bin.png
>
> You can find that some characters disappear.
> Before I pass all the images to the tesseract, I want to use my own
> function (pre-processing) first.
> But tesseract's binarization makes the result worse.
>
>
> I want to handle the image preprocessing part by myself.
> How can I disable tesseract's image preprocessing? ....Or the only chance
> to do this is to modify the source code?
> Thanks!!
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/fe0850ae-6138-4736-a855-fb691b16056co%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/fe0850ae-6138-4736-a855-fb691b16056co%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
