Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2019-03-19 Thread ryan
Wondering if this issue was fixed in Tesseract 3.05.02. Any ideas?

On Friday, August 10, 2018 at 7:51:59 AM UTC-4, Mehul Bhardwaj wrote:
>
> Hi,
>
> I went through this discussion thread and updated to Tesseract 3.05.02. 
> Previously I was working with version 3.05. I was getting the same error of 
> "FAILURE: Couldn't find a matching blob" for about 15% of my training 
> characters. 
>
> But even after updating, I am still getting the exact same number of 
> errors as before.
>
> Could there be any other reason for this?
>
> I have about 174 training images, which are nearly identical in terms of 
> brightness, sharpness, and background noise, and have identical character 
> spacing and resolution.
> Out of 174 images, 48 images had no such error. 106 images had 5 or fewer 
> such errors. Each image has, on average, 170 characters. So I am fairly 
> certain that the image type and other factors such as character size, 
> scaling, and spacing have nothing to do with it.
>
> Any recommended tests to identify the issue would be much appreciated.
>
> Best Regards
> Mehul

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-08-10 Thread Mehul Bhardwaj
Hi,

I went through this discussion thread and updated to Tesseract 3.05.02. 
Previously I was working with version 3.05. I was getting the same error of 
"FAILURE: Couldn't find a matching blob" for about 15% of my training 
characters. 

But even after updating, I am still getting the exact same number of errors 
as before.

Could there be any other reason for this?

I have about 174 training images, which are nearly identical in terms of 
brightness, sharpness, and background noise, and have identical character 
spacing and resolution.
Out of 174 images, 48 images had no such error. 106 images had 5 or fewer 
such errors. Each image has, on average, 170 characters. So I am fairly 
certain that the image type and other factors such as character size, 
scaling, and spacing have nothing to do with it.

Any recommended tests to identify the issue would be much appreciated.

Best Regards
Mehul

On Tuesday, June 5, 2018 at 9:23:16 PM UTC+5:30, Paul Kitchen wrote:
>
> Thank you for your help with these issues. The 3.05 branch now has all the 
> issues fixed that I found.

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-05 Thread Paul Kitchen
Thank you for your help with these issues. The 3.05 branch now has all the 
issues fixed that I found.

On Tuesday, June 5, 2018 at 8:59:08 AM UTC-6, zdenop wrote:
>
> Yes, it is OK, but you do not have to create a separate issue for a PR 
> (a PR is an issue too).
>
> Zdenko

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-05 Thread Zdenko Podobny
Yes, it is OK, but you do not have to create a separate issue for a PR
(a PR is an issue too).

Zdenko


On Tue, Jun 5, 2018 at 16:52, Paul Kitchen wrote:

> ZDenko,
>
> I'm new to this so hopefully I did everything correctly. Here is the issue
> I created:
>
> https://github.com/tesseract-ocr/tesseract/issues/1636
>
> And here is the pull request:
>
> https://github.com/tesseract-ocr/tesseract/pull/1637

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-05 Thread Paul Kitchen
ZDenko,

I'm new to this so hopefully I did everything correctly. Here is the issue 
I created:

https://github.com/tesseract-ocr/tesseract/issues/1636

And here is the pull request:

https://github.com/tesseract-ocr/tesseract/pull/1637

On Tuesday, June 5, 2018 at 7:23:41 AM UTC-6, zdenop wrote:
>
> You need to fork the official repository; then you have all the permissions 
> you need. Once you have made your changes, you can send a pull request to 
> the official repository.
>
> Zdenko

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/c048b1a4-759e-4e88-8675-a73ef62b69e1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-05 Thread Zdenko Podobny
You need to fork the official repository; then you have all the permissions
you need. Once you have made your changes, you can send a pull request to
the official repository.

Zdenko


On Tue, Jun 5, 2018 at 15:06, Paul Kitchen wrote:

> ZDenko,
>
> Unfortunately I don't seem to have write permissions on the tesseract repo
> so I am unable to create a branch off of master to make the changes. Who do
> I need to lobby to get write permission?


Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-05 Thread Paul Kitchen
ZDenko,

Unfortunately I don't seem to have write permissions on the tesseract repo 
so I am unable to create a branch off of master to make the changes. Who do 
I need to lobby to get write permission?

On Tuesday, June 5, 2018 at 3:00:23 AM UTC-6, zdenop wrote:
>
> Please make a PR for the master (4.0) branch and I will cherry-pick it for 3.05.
>
> Zdenko


Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-05 Thread Zdenko Podobny
Please make a PR for the master (4.0) branch and I will cherry-pick it for 3.05.

Zdenko


On Tue, Jun 5, 2018 at 4:38, Paul Kitchen wrote:

> ZDenko,
>
> I checked out the latest tesseract code and updated to branch 3.05. I see
> that the int64_t area bug is already fixed (thanks!). I also see that the
> buffer read overrun is partially fixed. There is this line
> in ReadAllBoxes():
>
> box_data.push_back('\0');
>
> Since the memory will have to be deleted and reallocated, this will be
> quite inefficient. That is why I added this line to LoadDataFromFile():
>
> data->reserve(size + 1);
>
> I'm willing to make the change in a feature branch then create the pull
> request. I tried to create a branch in github but apparently I don't have
> branch creation privilege. I thought about forking but I'm not familiar
> with how that works, or if it would even be appropriate. Can you either
> make the change yourself or grant me branch creation privilege in the repo
> so I can make the change in a branch then create a pull request?
>
> By the way, I checked out master branch and it also has the same problem
> in LoadDataFromFile().


Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-04 Thread Paul Kitchen
Here is a sample of the problem it causes. I run the following to train the 
attached image and box file:

tesseract gdt.symbols.exp0.tif gdt.symbols.exp0 box.train

And here is the output:

Tesseract Open Source OCR Engine v3.05.00dev with Leptonica

Page 1
Bad box coordinates in boxfile string! ▌▌▌╦ÇƧ≡¿←
APPLY_BOXES:
   Boxes read from boxfile:   7
   Found 7 good blobs.
Generated training data for 3 words

The message about bad box coordinates appears because function 
ReadMemBoxes() reads memory past the end of its const char* box_data 
parameter.

With the fix I suggested, this is the output:

Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
Page 1
APPLY_BOXES:
   Boxes read from boxfile:   7
   Found 7 good blobs.
Generated training data for 3 words


On Monday, June 4, 2018 at 12:42:05 AM UTC-6, zdenop wrote:
>
> Paul,
>
> At the moment the focus is on the 4.0 release, but I understand that some 
> users still need or prefer to use 3.05.
>
> Can you create a test/demonstration case for your last bugfix? Is it not 
> fixed in 4.00?
>
> Zdenko
>
>
> On Sun, Jun 3, 2018 at 4:03, Paul Kitchen wrote:
>
>> Zdenko,
>>
>> Thanks for making that fix. I am currently running tesseract from source 
>> on my computer. I've already made the fix on my source. However, if the fix 
>> were in an official release, then I could go back to using the officially 
>> released product.
>>
>> I did find one other bug that I fixed locally in my tesseract code. 
>> Unless this other bug is also fixed in the official version, I won't 
>> be able to drop my custom code. Here are the bug details:
>>
>> 1)  In file boxread.cpp, function ReadAllBoxes(), we convert 
>> GenericVector<char> to const char* without a trailing '\0'. This can cause 
>> a buffer read overrun inside the call to ReadMemBoxes(). To fix this, change 
>> function LoadDataFromFile() to always reserve an extra byte so the caller 
>> can add a '\0' if they want. Then in ReadAllBoxes(), append '\0' to the 
>> vector after calling LoadDataFromFile(). Here are the fixed functions:
>>
>>
>> inline bool LoadDataFromFile(const STRING& filename,
>>  GenericVector* data) {
>>   bool result = false;
>>   FILE* fp = fopen(filename.string(), "rb");
>>   if (fp != NULL) {
>> fseek(fp, 0, SEEK_END);
>> size_t size = ftell(fp);
>> fseek(fp, 0, SEEK_SET);
>> if (size > 0) {
>>   // reserve an extra byte in case caller wants to append a '\0' 
>> character
>>   data->reserve(size + 1);
>>   data->resize_no_init(size);
>>   result = fread(&(*data)[0], 1, size, fp) == size;
>> }
>> fclose(fp);
>>   }
>>   return result;
>> }
>>
>> bool ReadAllBoxes(int target_page, bool skip_blanks, const STRING& 
>> filename,
>>   GenericVector* boxes,
>>   GenericVector* texts,
>>   GenericVector* box_texts,
>>   GenericVector* pages) {
>>   GenericVector box_data;
>>   if (!tesseract::LoadDataFromFile(BoxFileName(filename), _data))
>> return false;
>>   box_data.push_back('\0');
>>   return ReadMemBoxes(target_page, skip_blanks, _data[0], boxes, 
>> texts,
>>   box_texts, pages);
>> }
>>
>>
>>
>> On Saturday, June 2, 2018 at 2:22:16 AM UTC-6, zdenop wrote:
>>>
>>> Please check if this is ok now. If yes, I am willing to make 3.05.02 
>>> release ;-)
>>>
>>> Zdenko
>>>
>>>
>>> so 2. 6. 2018 o 10:16 Zdenko Podobny  napísal(a):
>>>
 done in 
 https://github.com/tesseract-ocr/tesseract/commit/bc5dfc4b953babcc865f68a55c3bf415f4280b1a
 Zdenko


 št 31. 5. 2018 o 22:39 shree  napísal(a):

> This has been an issue for long. Thanks for finding the problem.
>
> Please submit a PR on github.
>
> On Friday, June 1, 2018 at 1:55:25 AM UTC+5:30, Paul Kitchen wrote:
>>
>> After a lot of stepping through tesseract code, I found the problem. 
>>
>> 1)  In file coutln.cpp, function C_OUTLINE::IsLegallyNested(), 
>> we assign outer_area() to an inT32, parent_area. Then lower in the 
>> function, we multiple child->outer_area() by parent_area. This caused an 
>> integer overflow which resulted in a bad sign for the multiplication. 
>> The 
>> fix was to make parent_area an inT64 so that integer overflow cannot 
>> happen.
>>
>>
>> The two 32-bit integers being multiplied were -51874 and 60218. The 
>> true result should be -3123748532 but the maximum result cannot be 
>> greater 
>> than 2^31 or you will have sign/overflow problems, which is the case 
>> here. 
>> The computer result was 1171218764, causing the if-statement to go down 
>> the 
>> wrong path.
>>
>> dfs
>>
>>
>>
>>
>>
>>
>>
>> -- 
> You received this message because you are subscribed to the Google 
> Groups "tesseract-ocr" group.
> To unsubscribe from this group and stop 

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-04 Thread Zdenko Podobny
Stefan,

Paul also suggested modifying LoadDataFromFile (ccutil/genericvector.h).
Is that modification not needed?

Zdenko


Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-04 Thread 'Stefan Weil' via tesseract-ocr
As far as I can see, 4.0.0 is good. I have sent a pull request which 
backports the fix from 4.0.0 (a simplified variant of Paul's fix) to 3.05.

Stefan


Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-04 Thread Zdenko Podobny
Paul,

At the moment the focus is on the 4.0 release, but I understand that some
users still need or prefer to use 3.05.

Can you create a test or demonstration case for your last bug fix? Is it
not fixed in 4.00?

Zdenko


Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-02 Thread Paul Kitchen
Zdenko,

Thanks for making that fix. I am currently running Tesseract built from 
source on my computer, and I had already applied the fix there. If the fix 
were in an official release, I could go back to using the officially 
released product.

I did find one other bug that I fixed locally in my Tesseract code. Unless 
this bug is also fixed in the official version, I won't be able to drop my 
custom build. Here are the details:

1)  In file boxread.cpp, function ReadAllBoxes(), we convert a 
GenericVector<char> to const char* without a trailing '\0'. This can cause 
a buffer read overrun inside the call to ReadMemBoxes(). To fix this, change 
function LoadDataFromFile() to always reserve an extra byte so the caller 
can append a '\0' if needed. Then, in ReadAllBoxes(), append '\0' to the 
vector after calling LoadDataFromFile(). Here are the fixed functions:


inline bool LoadDataFromFile(const STRING& filename,
                             GenericVector<char>* data) {
  bool result = false;
  FILE* fp = fopen(filename.string(), "rb");
  if (fp != NULL) {
    fseek(fp, 0, SEEK_END);
    size_t size = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    if (size > 0) {
      // Reserve an extra byte in case the caller wants to append a '\0'
      // character.
      data->reserve(size + 1);
      data->resize_no_init(size);
      result = fread(&(*data)[0], 1, size, fp) == size;
    }
    fclose(fp);
  }
  return result;
}

bool ReadAllBoxes(int target_page, bool skip_blanks, const STRING& filename,
                  GenericVector<TBOX>* boxes,
                  GenericVector<STRING>* texts,
                  GenericVector<STRING>* box_texts,
                  GenericVector<int>* pages) {
  GenericVector<char> box_data;
  if (!tesseract::LoadDataFromFile(BoxFileName(filename), &box_data))
    return false;
  // Terminate the buffer so ReadMemBoxes() can treat it as a C string.
  box_data.push_back('\0');
  return ReadMemBoxes(target_page, skip_blanks, &box_data[0], boxes, texts,
                      box_texts, pages);
}
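
For readers without a Tesseract build at hand, the same 
reserve-then-terminate pattern can be tried with the standard library. This 
is only a sketch: std::vector<char> stands in for GenericVector<char>, the 
function name LoadFileAsCString is made up for the example, and unlike the 
code above it treats an empty file as success and checks the ftell() 
result before using it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for the pattern above: read a whole file into a
// vector and append '\0' so data->data() is safe to use as a C string.
bool LoadFileAsCString(const std::string& filename, std::vector<char>* data) {
  assert(data != nullptr);
  data->clear();
  std::FILE* fp = std::fopen(filename.c_str(), "rb");
  if (fp == nullptr) return false;
  bool ok = false;
  if (std::fseek(fp, 0, SEEK_END) == 0) {
    const long size = std::ftell(fp);  // returns -1 on failure
    if (size >= 0 && std::fseek(fp, 0, SEEK_SET) == 0) {
      const std::size_t bytes = static_cast<std::size_t>(size);
      // Reserve one extra byte up front so the push_back below does not
      // force a reallocation.
      data->reserve(bytes + 1);
      data->resize(bytes);
      ok = bytes == 0 ||
           std::fread(data->data(), 1, bytes, fp) == bytes;
    }
  }
  std::fclose(fp);
  if (ok) {
    data->push_back('\0');  // the terminator the parser relies on
  } else {
    data->clear();
  }
  return ok;
}
```

After a successful call the vector holds the file contents plus one 
terminating byte, so its size is the file size plus one.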



Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-02 Thread Zdenko Podobny
Please check if this is OK now. If so, I am willing to make a 3.05.02
release ;-)

Zdenko


Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-02 Thread Zdenko Podobny
done in
https://github.com/tesseract-ocr/tesseract/commit/bc5dfc4b953babcc865f68a55c3bf415f4280b1a
Zdenko


On Thu, 31 May 2018 at 22:39, shree wrote:

> This has been an issue for a long time. Thanks for finding the problem.
>
> Please submit a PR on GitHub.
>
> On Friday, June 1, 2018 at 1:55:25 AM UTC+5:30, Paul Kitchen wrote:
>>
>> After a lot of stepping through the Tesseract code, I found the problem.
>>
>> 1)  In file coutln.cpp, function C_OUTLINE::IsLegallyNested(), we
>> assign outer_area() to an inT32, parent_area. Lower in the function, we
>> multiply child->outer_area() by parent_area. This caused an integer
>> overflow, which gave the product the wrong sign. The fix was to make
>> parent_area an inT64 so that the multiplication cannot overflow.
>>
>>
>> The two 32-bit integers being multiplied were -51874 and 60218. The true
>> result is -3123748532, but a signed 32-bit integer can only hold values
>> from -2^31 to 2^31 - 1, so the product overflows. The computed result
>> was 1171218764, causing the if-statement to go down the wrong path.
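
The arithmetic in the quoted message can be checked with a short sketch. 
The function names here are made up for the example and do not come from 
Tesseract. Because signed overflow is undefined behaviour in C++, the 
32-bit wrap-around is emulated with unsigned arithmetic, assuming the 
usual platform where int is 32 bits wide:

```cpp
#include <cassert>
#include <cstdint>

// Emulates a 32-bit multiply that wraps modulo 2^32. Unsigned arithmetic is
// used because overflowing a signed int is undefined behaviour. Converting a
// value above INT32_MAX back to int32_t is implementation-defined before
// C++20 (modular from C++20 on); the thread's numbers stay below that limit.
std::int32_t WrappedProduct32(std::int32_t a, std::int32_t b) {
  const std::uint32_t wrapped =
      static_cast<std::uint32_t>(a) * static_cast<std::uint32_t>(b);
  return static_cast<std::int32_t>(wrapped);
}

// Widening one operand first makes the whole multiplication 64-bit, which is
// the effect of declaring parent_area as a 64-bit integer.
std::int64_t WideProduct(std::int32_t a, std::int32_t b) {
  return static_cast<std::int64_t>(a) * b;
}

// Checks the two results reported in the thread.
void CheckThreadExample() {
  assert(WideProduct(-51874, 60218) == -3123748532LL);
  assert(WrappedProduct32(-51874, 60218) == 1171218764);
}
```

The 64-bit product is negative while the wrapped 32-bit value is positive, 
which is the sign flip that sent the comparison down the wrong branch.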