Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-08-10 Thread Mehul Bhardwaj
Hi,

I went through this discussion thread and updated to Tesseract 3.05.02. 
Previously I was working with version 3.05, and I was getting the same 
"FAILURE: Couldn't find a matching blob" error for about 15% of my training 
characters. 

But even after updating, I am still getting the exact same number of errors 
as before.

Could there be any other reason for this?

I have about 174 training images, which are nearly identical in terms of 
brightness, sharpness and background noise, and have identical character 
spacing and resolution.
Out of 174 images, 48 had no such error and 106 had 5 or fewer such errors. 
Each image has, on average, 170 characters. So I am fairly certain that the 
image type and other factors such as character size, scaling and spacing 
have nothing to do with it.

Any recommended tests for identifying the issue would be much appreciated.

Best Regards
Mehul

On Tuesday, June 5, 2018 at 9:23:16 PM UTC+5:30, Paul Kitchen wrote:
>
> Thank you for your help with these issues. The 3.05 branch now has fixes 
> for all the issues I found.
>
> On Tuesday, June 5, 2018 at 8:59:08 AM UTC-6, zdenop wrote:
>>
>> Yes, that is fine, but you do not have to create a separate issue for a 
>> PR (a PR is an issue too).
>>
>> Zdenko
>>
>>
>> On Tue, 5 Jun 2018 at 16:52, Paul Kitchen wrote:
>>
>>> Zdenko,
>>>
>>> I'm new to this, so hopefully I did everything correctly. Here is the 
>>> issue I created:
>>>
>>> https://github.com/tesseract-ocr/tesseract/issues/1636
>>>
>>> And here is the pull request:
>>>
>>> https://github.com/tesseract-ocr/tesseract/pull/1637
>>>
>>> On Tuesday, June 5, 2018 at 7:23:41 AM UTC-6, zdenop wrote:

 You need to fork the official repository; then you have all the 
 permissions you need. Once you have made your changes, you can send a pull 
 request with them to the official repository.

 Zdenko


 On Tue, 5 Jun 2018 at 15:06, Paul Kitchen wrote:

> Zdenko,
>
> Unfortunately I don't seem to have write permission on the tesseract 
> repo, so I am unable to create a branch off master to make the changes. 
> Who do I need to lobby to get write permission?
>
> On Tuesday, June 5, 2018 at 3:00:23 AM UTC-6, zdenop wrote:
>>
>> Please make a PR against the master (4.0) branch and I will cherry-pick 
>> it into 3.05.
>>
>> Zdenko
>>
>>
>> On Tue, 5 Jun 2018 at 4:38, Paul Kitchen wrote:
>>
>>> Zdenko,
>>>
>>> I checked out the latest tesseract code and switched to the 3.05 
>>> branch. I see that the int64_t area bug is already fixed (thanks!). I 
>>> also see that the buffer read overrun is partially fixed. There is this 
>>> line in ReadAllBoxes():
>>>
>>> box_data.push_back('\0');
>>>
>>> Since the buffer will have to be freed and reallocated (and its 
>>> contents copied), this is quite inefficient. That is why I added this 
>>> line to LoadDataFromFile():
>>>
>>> data->reserve(size + 1);
>>>
>>> I'm willing to make the change in a feature branch and then create the 
>>> pull request. I tried to create a branch on GitHub, but apparently I 
>>> don't have branch-creation privileges. I thought about forking, but I'm 
>>> not familiar with how that works, or whether it would even be 
>>> appropriate. Can you either make the change yourself or grant me 
>>> branch-creation privileges in the repo, so I can make the change in a 
>>> branch and then create a pull request?
>>>
>>> By the way, I checked out the master branch, and it has the same 
>>> problem in LoadDataFromFile().
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, 
>>> send an email to tesseract-oc...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/tesseract-ocr/307a7e38-bb5d-4870-ac12-29c735c3c9f8%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>

[tesseract-ocr] Python wrappers for Tesseract

2018-04-13 Thread Mehul Bhardwaj
Hi, 

I am trying to train Tesseract-3.05 for a new language.

I am processing images using PIL and then need to do some post-processing 
on the predicted text, so I am using Python. In my Python code I use a 
python-tesseract wrapper to return the box-file data as a dictionary.

My question is this: when I train Tesseract on my local machine, do such 
wrappers also give me updated output, or does training Tesseract separately 
have no effect on the wrapper's output? If it has no effect, how can I update 
the wrapper so that it gives me updated output after every round of training?

I am working on Ubuntu 16.04 with Tesseract 3.05 and Python 2.7, using 
Pytesseract/tesserocr. 

Best Regards
Mehul
