Hi, I went through this discussion thread and updated to Tesseract 3.05.02. Previously I was working with version 3.05. I was getting the same error, "FAILURE: Couldn't find a matching blob", for about 15% of my training characters.
But even after updating, I am still getting exactly the same number of errors as before. Could there be another reason for this? I have about 174 training images, which are nearly identical in brightness, sharpness, and background noise, and have identical character spacing and resolution. Of the 174 images, 48 had no such error and 106 had 5 or fewer such errors. Each image has, on average, 170 characters. So I am fairly certain that the image type and other factors such as character size, scaling, and spacing have nothing to do with it. Any recommended tests to identify the issue would be much appreciated.

Best Regards
Mehul

On Tuesday, June 5, 2018 at 9:23:16 PM UTC+5:30, Paul Kitchen wrote:
>
> Thank you for your help with these issues. The 3.05 branch now has all the
> issues fixed that I found.
>
> On Tuesday, June 5, 2018 at 8:59:08 AM UTC-6, zdenop wrote:
>>
>> Yes, it is OK, but you do not have to create a separate issue for a PR
>> (a PR is an issue too).
>>
>> Zdenko
>>
>>
>> Tue, Jun 5, 2018 at 16:52, Paul Kitchen <[email protected]> wrote:
>>
>>> ZDenko,
>>>
>>> I'm new to this so hopefully I did everything correctly. Here is the
>>> issue I created:
>>>
>>> https://github.com/tesseract-ocr/tesseract/issues/1636
>>>
>>> And here is the pull request:
>>>
>>> https://github.com/tesseract-ocr/tesseract/pull/1637
>>>
>>> On Tuesday, June 5, 2018 at 7:23:41 AM UTC-6, zdenop wrote:
>>>>
>>>> You need to fork the official repository, and then you have all the
>>>> permissions you need. When you have made your changes, you can send a
>>>> pull request with them to the official repository.
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> Tue, Jun 5, 2018 at 15:06, Paul Kitchen <[email protected]> wrote:
>>>>
>>>>> ZDenko,
>>>>>
>>>>> Unfortunately I don't seem to have write permissions on the tesseract
>>>>> repo so I am unable to create a branch off of master to make the changes.
>>>>> Who do I need to lobby to get write permission?
>>>>>
>>>>> On Tuesday, June 5, 2018 at 3:00:23 AM UTC-6, zdenop wrote:
>>>>>>
>>>>>> Please make a PR for the master (4.0) branch and I will cherry-pick
>>>>>> it for 3.05...
>>>>>>
>>>>>> Zdenko
>>>>>>
>>>>>>
>>>>>> Tue, Jun 5, 2018 at 4:38, Paul Kitchen <[email protected]> wrote:
>>>>>>
>>>>>>> ZDenko,
>>>>>>>
>>>>>>> I checked out the latest tesseract code and updated to branch 3.05.
>>>>>>> I see that the int64_t area bug is already fixed (thanks!). I also
>>>>>>> see that the buffer read overrun is partially fixed. There is this
>>>>>>> line in ReadAllBoxes():
>>>>>>>
>>>>>>> box_data.push_back('\0');
>>>>>>>
>>>>>>> Since the memory will have to be deleted and reallocated, this will
>>>>>>> be quite inefficient. That is why I added this line to
>>>>>>> LoadDataFromFile():
>>>>>>>
>>>>>>> data->reserve(size + 1);
>>>>>>>
>>>>>>> I'm willing to make the change in a feature branch and then create
>>>>>>> the pull request. I tried to create a branch in GitHub but apparently
>>>>>>> I don't have branch creation privilege. I thought about forking but
>>>>>>> I'm not familiar with how that works, or if it would even be
>>>>>>> appropriate. Can you either make the change yourself or grant me
>>>>>>> branch creation privilege in the repo so I can make the change in a
>>>>>>> branch and then create a pull request?
>>>>>>>
>>>>>>> By the way, I checked out the master branch and it also has the same
>>>>>>> problem in LoadDataFromFile().
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> To post to this group, send email to [email protected].
>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/307a7e38-bb5d-4870-ac12-29c735c3c9f8%40googlegroups.com
>>>>>>> .
>>>>>>> For more options, visit https://groups.google.com/d/optout.
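
For anyone following the `reserve(size + 1)` point quoted above, here is a minimal sketch of the idea. It is not the Tesseract code: it uses `std::vector<char>` as a stand-in for Tesseract's own container, and `LoadFileWithSpareByte` is a hypothetical name chosen for illustration. The assumption is simply that a caller will append a terminating `'\0'` after loading, so one spare byte is reserved up front and that later `push_back` does not need to reallocate and copy the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not Tesseract's LoadDataFromFile): reads a whole file
// into *data and reserves one extra byte of capacity for a later '\0'.
bool LoadFileWithSpareByte(const std::string& path, std::vector<char>* data) {
  data->clear();
  FILE* fp = std::fopen(path.c_str(), "rb");
  if (fp == nullptr) return false;

  bool ok = false;
  if (std::fseek(fp, 0, SEEK_END) == 0) {
    const long size = std::ftell(fp);
    if (size >= 0 && std::fseek(fp, 0, SEEK_SET) == 0) {
      const std::size_t n = static_cast<std::size_t>(size);
      // Reserve room for the file contents plus one terminator byte.
      data->reserve(n + 1);
      // resize() never reduces capacity, so the spare byte survives.
      data->resize(n);
      ok = (n == 0) || std::fread(data->data(), 1, n, fp) == n;
    }
  }
  std::fclose(fp);
  return ok;
}

// Appends the terminator the way the quoted ReadAllBoxes() line does.
// With the spare byte reserved above, this stays within existing capacity.
void TerminateBuffer(std::vector<char>* data) {
  data->push_back('\0');
}
```

After `LoadFileWithSpareByte` succeeds, `data->capacity()` is at least `data->size() + 1`, so `TerminateBuffer` leaves `data->data()` pointing at the same allocation. Without the `reserve` call, whether the `push_back` reallocates depends on the container's growth policy.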

