Hi,

I went through this discussion thread and updated from Tesseract 3.05 to 
3.05.02. With 3.05 I was getting the same "FAILURE: Couldn't find a 
matching blob" error for about 15% of my training characters. 

Even after updating, I am still getting exactly the same number of errors 
as before.

Could there be any other reason for this?

I have about 174 training images, which are nearly identical in 
brightness, sharpness and background noise, and have identical character 
spacing and resolution.
Of the 174 images, 48 had no such error and 106 had 5 or fewer such 
errors. Each image has, on average, 170 characters. So I am fairly 
certain that the image type and other factors such as character size, 
scaling and spacing have nothing to do with it.

Any recommended tests to identify the issue would be much appreciated.

Best Regards
Mehul

On Tuesday, June 5, 2018 at 9:23:16 PM UTC+5:30, Paul Kitchen wrote:
>
> Thank you for your help with these issues. The 3.05 branch now has all the 
> issues fixed that I found.
>
> On Tuesday, June 5, 2018 at 8:59:08 AM UTC-6, zdenop wrote:
>>
>> Yes, it is OK, but you do not have to create a separate issue for a PR 
>> (a PR is an issue too).
>>
>> Zdenko
>>
>>
>> On Tue, 5 June 2018 at 16:52, Paul Kitchen <paul.k...@hexagonmetrology.com> 
>> wrote:
>>
>>> ZDenko,
>>>
>>> I'm new to this so hopefully I did everything correctly. Here is the 
>>> issue I created:
>>>
>>> https://github.com/tesseract-ocr/tesseract/issues/1636
>>>
>>> And here is the pull request:
>>>
>>> https://github.com/tesseract-ocr/tesseract/pull/1637
>>>
>>> On Tuesday, June 5, 2018 at 7:23:41 AM UTC-6, zdenop wrote:
>>>>
>>>> You need to fork the official repository; then you have all the 
>>>> permissions you need. When you have made your changes, you can send a 
>>>> pull request with them to the official repository.
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Tue, 5 June 2018 at 15:06, Paul Kitchen <paul.k...@hexagonmetrology.com> 
>>>> wrote:
>>>>
>>>>> ZDenko,
>>>>>
>>>>> Unfortunately I don't seem to have write permissions on the tesseract 
>>>>> repo so I am unable to create a branch off of master to make the changes. 
>>>>> Who do I need to lobby to get write permission?
>>>>>
>>>>> On Tuesday, June 5, 2018 at 3:00:23 AM UTC-6, zdenop wrote:
>>>>>>
>>>>>> Please make a PR for the master (4.0) branch and I will cherry-pick 
>>>>>> it for 3.05...
>>>>>>
>>>>>> Zdenko
>>>>>>
>>>>>>
>>>>>> On Tue, 5 June 2018 at 4:38, Paul Kitchen <paul.k...@hexagonmetrology.com> 
>>>>>> wrote:
>>>>>>
>>>>>>> ZDenko,
>>>>>>>
>>>>>>> I checked out the latest tesseract code and updated to branch 3.05. 
>>>>>>> I see that the int64_t area bug is already fixed (thanks!). I also see 
>>>>>>> that the buffer read overrun is partially fixed. There is this line 
>>>>>>> in ReadAllBoxes():
>>>>>>>
>>>>>>> box_data.push_back('\0');
>>>>>>>
>>>>>>> Since the memory will have to be deleted and reallocated, this will 
>>>>>>> be quite inefficient. That is why I added this line to 
>>>>>>> LoadDataFromFile():
>>>>>>>
>>>>>>> data->reserve(size + 1);
>>>>>>>
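To illustrate the point about reallocation, here is a minimal sketch. It is not the Tesseract code: it assumes std::vector<char> in place of Tesseract's own vector type, and the function names LoadSketch and AppendTerminatorReallocates are hypothetical. The file contents are passed in as a string so the example stays self-contained.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for LoadDataFromFile(): copies `contents` (playing the
// role of the bytes read from disk) into *data. When reserve_terminator is
// true, room for one extra byte is reserved up front.
void LoadSketch(const std::string& contents, bool reserve_terminator,
                std::vector<char>* data) {
  assert(data != nullptr);
  data->clear();
  if (reserve_terminator) {
    data->reserve(contents.size() + 1);  // mirrors data->reserve(size + 1)
  }
  data->assign(contents.begin(), contents.end());
}

// Mirrors box_data.push_back('\0'). Returns true if the vector was already
// full, i.e. the push_back had to allocate a new buffer and copy the data.
bool AppendTerminatorReallocates(std::vector<char>* data) {
  assert(data != nullptr);
  const bool must_grow = data->size() == data->capacity();
  data->push_back('\0');
  return must_grow;
}
```

With reserve_terminator set to true, AppendTerminatorReallocates() returns false, because the capacity is already at least size + 1. Without the reserve, the result depends on how much capacity the standard library allocated for the initial copy, so a reallocation is possible but not guaranteed.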
>>>>>>> I'm willing to make the change in a feature branch then create the 
>>>>>>> pull request. I tried to create a branch in github but apparently I 
>>>>>>> don't have branch creation privilege. I thought about forking but I'm 
>>>>>>> not familiar with how that works, or if it would even be appropriate. 
>>>>>>> Can you either make the change yourself or grant me branch creation 
>>>>>>> privilege in the repo so I can make the change in a branch then create 
>>>>>>> a pull request?
>>>>>>>
>>>>>>> By the way, I checked out master branch and it also has the same 
>>>>>>> problem in LoadDataFromFile().
>>>>>>>
>>>>>>> -- 
>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>> To view this discussion on the web visit 
>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/307a7e38-bb5d-4870-ac12-29c735c3c9f8%40googlegroups.com
>>>>>>>  
>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/307a7e38-bb5d-4870-ac12-29c735c3c9f8%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/cf634bae-84e4-40e6-b2f3-c9ff2302d40e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
