Paul, at the moment the focus is on the 4.0 release, but I understand that some users still need or prefer to use 3.05.
Can you create a test/demonstration case for your last bugfix? Is it not fixed in 4.00?

Zdenko

On Sun, 3 Jun 2018 at 4:03, Paul Kitchen <[email protected]> wrote:

> Zdenko,
>
> Thanks for making that fix. I am currently running tesseract from source
> on my computer. I've already made the fix on my source. However, if the fix
> were in an official release, then I could go back to using the officially
> released product.
>
> I did find one other bug that I fixed locally in my tesseract code. Unless
> this other bug is also fixed in the official version, I won't be
> able to leave my custom code. Here are the bug details:
>
> 1) In file boxread.cpp, function ReadAllBoxes(), we convert
> GenericVector<char> to const char* without a trailing '\0'. This can cause
> a buffer read overrun inside the call to ReadMemBoxes(). To fix this, change
> function LoadDataFromFile() to always reserve an extra byte so the caller
> can add a '\0' if they want. Then in ReadAllBoxes(), append '\0' to the
> vector after calling LoadDataFromFile().
> Here are the fixed functions:
>
> inline bool LoadDataFromFile(const STRING& filename,
>                              GenericVector<char>* data) {
>   bool result = false;
>   FILE* fp = fopen(filename.string(), "rb");
>   if (fp != NULL) {
>     fseek(fp, 0, SEEK_END);
>     size_t size = ftell(fp);
>     fseek(fp, 0, SEEK_SET);
>     if (size > 0) {
>       // reserve an extra byte in case caller wants to append a '\0' character
>       data->reserve(size + 1);
>       data->resize_no_init(size);
>       result = fread(&(*data)[0], 1, size, fp) == size;
>     }
>     fclose(fp);
>   }
>   return result;
> }
>
> bool ReadAllBoxes(int target_page, bool skip_blanks, const STRING& filename,
>                   GenericVector<TBOX>* boxes,
>                   GenericVector<STRING>* texts,
>                   GenericVector<STRING>* box_texts,
>                   GenericVector<int>* pages) {
>   GenericVector<char> box_data;
>   if (!tesseract::LoadDataFromFile(BoxFileName(filename), &box_data))
>     return false;
>   box_data.push_back('\0');
>   return ReadMemBoxes(target_page, skip_blanks, &box_data[0], boxes, texts,
>                       box_texts, pages);
> }
>
> On Saturday, June 2, 2018 at 2:22:16 AM UTC-6, zdenop wrote:
>>
>> Please check if this is ok now. If yes, I am willing to make 3.05.02
>> release ;-)
>>
>> Zdenko
>>
>> On Sat, 2 Jun 2018 at 10:16, Zdenko Podobny <[email protected]> wrote:
>>
>>> done in
>>> https://github.com/tesseract-ocr/tesseract/commit/bc5dfc4b953babcc865f68a55c3bf415f4280b1a
>>> Zdenko
>>>
>>> On Thu, 31 May 2018 at 22:39, shree <[email protected]> wrote:
>>>
>>>> This has been an issue for long. Thanks for finding the problem.
>>>>
>>>> Please submit a PR on github.
>>>>
>>>> On Friday, June 1, 2018 at 1:55:25 AM UTC+5:30, Paul Kitchen wrote:
>>>>>
>>>>> After a lot of stepping through tesseract code, I found the problem.
>>>>>
>>>>> 1) In file coutln.cpp, function C_OUTLINE::IsLegallyNested(), we
>>>>> assign outer_area() to an inT32, parent_area. Then lower in the function,
>>>>> we multiply child->outer_area() by parent_area.
>>>>> This caused an integer
>>>>> overflow which resulted in a bad sign for the multiplication. The fix was
>>>>> to make parent_area an inT64 so that integer overflow cannot happen.
>>>>>
>>>>> The two 32-bit integers being multiplied were -51874 and 60218. The
>>>>> true result should be -3123748532, but a signed 32-bit result must lie
>>>>> between -2^31 and 2^31 - 1, so the multiplication overflowed.
>>>>> The computed result was 1171218764, causing the if-statement to go down
>>>>> the wrong path.
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/1ef0e822-9518-4cbb-af39-5a8ec6370d00%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/1ef0e822-9518-4cbb-af39-5a8ec6370d00%40googlegroups.com?utm_medium=email&utm_source=footer>.
>>>> For more options, visit https://groups.google.com/d/optout.

