Paul,

at the moment the focus is on the 4.0 release, but I understand that some
users still need or prefer to use 3.05.

Can you create a test/demonstration case for your last bugfix? Is it not
already fixed in 4.00?

Zdenko


On Sun, 3 June 2018 at 4:03, Paul Kitchen <[email protected]> wrote:

> Zdenko,
>
> Thanks for making that fix. I am currently running tesseract from source
> on my computer. I've already made the fix on my source. However, if the fix
> were in an official release, then I could go back to using the officially
> released product.
>
> I did find one other bug that I fixed locally in my tesseract code. Unless
> this other bug is also fixed in the official version, I won't be able to
> drop my custom code. Here are the bug details:
>
> 1) In file boxread.cpp, function ReadAllBoxes(), we convert
> GenericVector<char> to const char* without a trailing '\0'. This can cause
> a buffer read overrun inside the call to ReadMemBoxes(). To fix this, change
> function LoadDataFromFile() to always reserve an extra byte so the caller
> can add a '\0' if they want. Then in ReadAllBoxes(), append '\0' to the
> vector after calling LoadDataFromFile(). Here are the fixed functions:
>
>
> inline bool LoadDataFromFile(const STRING& filename,
>                              GenericVector<char>* data) {
>   bool result = false;
>   FILE* fp = fopen(filename.string(), "rb");
>   if (fp != NULL) {
>     fseek(fp, 0, SEEK_END);
>     size_t size = ftell(fp);
>     fseek(fp, 0, SEEK_SET);
>     if (size > 0) {
>       // reserve an extra byte in case caller wants to append a '\0' character
>       data->reserve(size + 1);
>       data->resize_no_init(size);
>       result = fread(&(*data)[0], 1, size, fp) == size;
>     }
>     fclose(fp);
>   }
>   return result;
> }
>
> bool ReadAllBoxes(int target_page, bool skip_blanks, const STRING& filename,
>                   GenericVector<TBOX>* boxes,
>                   GenericVector<STRING>* texts,
>                   GenericVector<STRING>* box_texts,
>                   GenericVector<int>* pages) {
>   GenericVector<char> box_data;
>   if (!tesseract::LoadDataFromFile(BoxFileName(filename), &box_data))
>     return false;
>   box_data.push_back('\0');
>   return ReadMemBoxes(target_page, skip_blanks, &box_data[0], boxes, texts,
>                       box_texts, pages);
> }
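
A minimal sketch of the reserve-then-terminate pattern described above,
assuming std::vector<char> as a stand-in for GenericVector<char>. LoadBytes
and TerminatedLength are hypothetical names used only for illustration; they
are not Tesseract functions.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a file loader: copies raw bytes that carry no
// terminator, but reserves one extra byte so the caller can append '\0'
// later without forcing a reallocation.
std::vector<char> LoadBytes(const char* bytes, std::size_t size) {
  std::vector<char> data;
  data.reserve(size + 1);
  data.assign(bytes, bytes + size);
  return data;
}

// Appends the terminator before handing the buffer to a C-string function.
// Without the push_back, strlen would read past the end of the buffer.
std::size_t TerminatedLength(std::vector<char> data) {
  data.push_back('\0');
  return std::strlen(data.data());
}
```

The loader only reserves the byte; it is the caller that decides whether to
terminate, so code that treats the data as binary still sees the true size.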
>
>
>
> On Saturday, June 2, 2018 at 2:22:16 AM UTC-6, zdenop wrote:
>>
>> Please check if this is ok now. If yes, I am willing to make 3.05.02
>> release ;-)
>>
>> Zdenko
>>
>>
>> On Sat, 2 June 2018 at 10:16, Zdenko Podobny <[email protected]> wrote:
>>
>>> done in
>>> https://github.com/tesseract-ocr/tesseract/commit/bc5dfc4b953babcc865f68a55c3bf415f4280b1a
>>> Zdenko
>>>
>>>
>>> On Thu, 31 May 2018 at 22:39, shree <[email protected]> wrote:
>>>
>>>> This has been an issue for long. Thanks for finding the problem.
>>>>
>>>> Please submit a PR on github.
>>>>
>>>> On Friday, June 1, 2018 at 1:55:25 AM UTC+5:30, Paul Kitchen wrote:
>>>>>
>>>>> After a lot of stepping through tesseract code, I found the problem.
>>>>>
>>>>> 1) In file coutln.cpp, function C_OUTLINE::IsLegallyNested(), we
>>>>> assign outer_area() to an inT32, parent_area. Then lower in the function,
>>>>> we multiply child->outer_area() by parent_area. This caused an integer
>>>>> overflow which resulted in a bad sign for the multiplication. The fix was
>>>>> to make parent_area an inT64 so that integer overflow cannot happen.
>>>>>
>>>>>
>>>>> The two 32-bit integers being multiplied were -51874 and 60218. The
>>>>> true result is -3123748532, which is below the minimum 32-bit signed
>>>>> value of -2^31 (-2147483648), so the multiplication overflowed. The
>>>>> computed result was 1171218764, which has the wrong sign and caused the
>>>>> if-statement to go down the wrong path.
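
The arithmetic above can be reproduced with a short sketch. The helper names
below are hypothetical and not part of Tesseract. Because signed overflow is
undefined behaviour in C++, the wrapped value is modelled here by multiplying
in 64 bits and keeping only the low 32 bits, which is an assumption about
what the original build did in practice.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the product as a 32-bit variable would hold it after
// wrap-around. The multiplication is done in 64 bits and then truncated, so
// this function itself never overflows.
int32_t WrappedProduct32(int32_t a, int32_t b) {
  int64_t wide = static_cast<int64_t>(a) * b;
  return static_cast<int32_t>(static_cast<uint32_t>(wide));
}

// Hypothetical helper: the product with a 64-bit operand, which is the
// effect of declaring parent_area as a 64-bit integer.
int64_t Product64(int32_t a, int32_t b) {
  return static_cast<int64_t>(a) * b;
}

// Checks the figures quoted in the message above.
void CheckQuotedFigures() {
  assert(Product64(-51874, 60218) == -3123748532LL);
  assert(WrappedProduct32(-51874, 60218) == 1171218764);
}
```

The 64-bit product keeps its negative sign, while the truncated 32-bit value
comes out positive, which is why a sign test on it takes the wrong branch.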
>>>>>
>>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/1ef0e822-9518-4cbb-af39-5a8ec6370d00%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/1ef0e822-9518-4cbb-af39-5a8ec6370d00%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
