Hi, can you tell me how you extracted the intermediate binary image created by Tesseract?
On Saturday, June 18, 2016 at 10:16:57 PM UTC+5:30, Julian Einhaus wrote:
> Hi,
> I am trying to read three lines of text on a well-defined image (almost
> no background noise, characters separated very clearly).
> Tesseract preprocesses the original image (Original.bmp) into a good
> binary image (Binary.bmp).
> All characters are clearly separated and no artifacts are present.
>
> When I use Tesseract 3.0.2, the OCR output correctly detects the first
> line as: "FfVvZzmrnebocC 10 cm2"
>
> When I update to Tesseract 3.0.4, the output is:
> "FvaszrnebocC 10 cm2"
>
> It joins the characters "fVv" into a single symbol "va" and the characters
> "Zzm" into the single symbol "sz". If I cut out each character and set the
> engine to detect only one character, each is recognized correctly.
> I tried a lot of settings, like not loading the dictionary and setting
> "edges_max_children_per_outline 1",
> but nothing has helped so far.
>
> Does anyone have any idea how to improve the output? It's kind of strange
> that the old Tesseract version performs so much better on these images.
>
> Thank you for your help!
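For anyone following along, the settings mentioned in the quoted message could be passed to the command-line tool roughly as below. This is only a sketch under assumptions, not what the original poster actually ran: it assumes a `tesseract` build whose CLI accepts `-c name=value` overrides, it assumes "not loading the dictionary" means switching off `load_system_dawg` and `load_freq_dawg`, and the helper `build_tesseract_command` and the output base `result` are made-up names for illustration. As far as I know, `tessedit_write_images` makes Tesseract dump the thresholded image (as `tessinput.tif` in the working directory), which may be one way to get at the binary image asked about above, but please verify that on your version.

```python
def build_tesseract_command(image_path, output_base, variables):
    """Build an argument list for the tesseract CLI (hypothetical helper).

    Each entry in `variables` becomes one `-c name=value` override.
    """
    command = ["tesseract", image_path, output_base]
    for name, value in variables.items():
        command += ["-c", "{}={}".format(name, value)]
    return command


# Assumed settings: dictionaries off, the parameter from the quoted post,
# and a request to write out the thresholded image.
settings = {
    "load_system_dawg": "0",
    "load_freq_dawg": "0",
    "edges_max_children_per_outline": "1",
    "tessedit_write_images": "1",
}

command = build_tesseract_command("Original.bmp", "result", settings)

# Print the command so it can be copied into a shell; pass the list to
# subprocess.run(command) instead to actually run it.
print(" ".join(command))
```

Building the arguments as a list keeps each override a separate argument, so nothing needs shell quoting if the list is handed to `subprocess.run`.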