[tesseract-ocr] Re: Tesseract joins characters that are not touching

Mohit Jain Tue, 27 Nov 2018 03:46:50 -0800

Hi,
   Can you tell me how you extracted the intermediate binary image 
created by Tesseract?
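The thread does not say how Binary.bmp was produced. One route I know of is Tesseract's `tessedit_write_images` parameter: when it is set, Tesseract writes the thresholded image it actually recognizes to `tessinput.tif` in the current working directory. Below is a minimal Python sketch, assuming the `tesseract` binary is on PATH and accepts `-c NAME=VALUE` (3.04 and later, as far as I know); the helper names `binary_image_cmd` and `dump_binary_image` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def binary_image_cmd(image: str, out_base: str) -> List[str]:
    """Build a tesseract command line that also dumps the binarized page.

    Assumption: the CLI accepts `-c NAME=VALUE` (3.04 and later).
    With tessedit_write_images=1, Tesseract writes the thresholded image
    it recognizes to "tessinput.tif" in the current working directory.
    """
    return ["tesseract", image, out_base, "-c", "tessedit_write_images=1"]


def dump_binary_image(image: str, out_base: str) -> Path:
    """Run tesseract (must be on PATH) and return where the binary image lands."""
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(binary_image_cmd(image, out_base), check=True)
    # The file name is fixed by Tesseract, not by out_base.
    return Path("tessinput.tif")
```

For example, `dump_binary_image("Original.bmp", "out")` would write the recognized text to `out.txt` and leave the binarized page in `./tessinput.tif`. The command builder is kept separate from the call that runs the process so the arguments can be inspected without Tesseract installed.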


On Saturday, June 18, 2016 at 10:16:57 PM UTC+5:30, Julian Einhaus wrote:
>
> Hi,
> I am trying to read three lines of text from a well-defined image (practically 
> no background noise, characters clearly separated).
> Tesseract preprocesses the original image (Original.bmp) into a good binary 
> image (Binary.bmp). 
> All characters are clearly separated and no artifacts are present.
>
> With Tesseract 3.02, the OCR output for the first line is correct: 
> "FfVvZzmrnebocC 10 cm2"
>
> After updating to Tesseract 3.04, the output is:
> "FvaszrnebocC 10 cm2"
>
> It merges the characters "fVv" into a single symbol "va" and the characters 
> "Zzm" into a single symbol "sz". If I crop each character and set the engine 
> to recognize only a single character, each one is recognized correctly.
> I have tried many settings, such as not loading the dictionary and setting
> "edges_max_children_per_outline 1",
> but nothing has helped so far.
>
> Does anyone have an idea how to improve the output? It's strange 
> that the older Tesseract version performs so much better on these images.
>
> Thank you for your help!
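For anyone trying to reproduce the comparison, here is a minimal Python sketch of passing the settings mentioned in the quoted post on the tesseract command line. The helper `ocr_cmd` and the constants `THREAD_PARAMS` and `SINGLE_CHAR_PSM` are hypothetical. Mapping "not loading the dictionary" to `load_system_dawg`/`load_freq_dawg` is my assumption, and `-c` needs a build that supports it (3.04 or later, as far as I know). Page segmentation mode 10 is Tesseract's "treat the image as a single character" mode, which matches the per-character test described above.

```python
from typing import Dict, List, Optional


def ocr_cmd(
    image: str,
    out_base: str,
    params: Optional[Dict[str, str]] = None,
    psm: Optional[int] = None,
    psm_flag: str = "--psm",
) -> List[str]:
    """Build a tesseract command line with -c overrides and an optional PSM.

    psm_flag defaults to "--psm" (Tesseract 4.x); 3.x builds spell it "-psm".
    """
    cmd = ["tesseract", image, out_base]
    if psm is not None:
        cmd += [psm_flag, str(psm)]
    # One "-c NAME=VALUE" pair per parameter, in dict insertion order.
    for name, value in (params or {}).items():
        cmd += ["-c", f"{name}={value}"]
    return cmd


# Settings from the thread. Disabling the word lists through
# load_system_dawg / load_freq_dawg is an assumed reading of
# "not loading the dictionary", not something quoted from the post.
THREAD_PARAMS = {
    "load_system_dawg": "0",
    "load_freq_dawg": "0",
    "edges_max_children_per_outline": "1",
}

# Page segmentation mode 10: treat the image as a single character.
SINGLE_CHAR_PSM = 10
```

Usage would be something like `subprocess.run(ocr_cmd("Original.bmp", "out", THREAD_PARAMS), check=True)` for the full line, or `ocr_cmd(cropped_glyph, "out", psm=SINGLE_CHAR_PSM)` to repeat the per-character check on a cropped image.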

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/55ac48d9-d54b-468f-8c38-bd8a57424ff1%40googlegroups.com.
