[tesseract-ocr] Re: Mathematical equation detection & recognition

2020-05-28 Thread Владимир Калачихин
On Thursday, 28 May 2020 at 14:59:05 UTC+3, Weslley Torres wrote: > I thought we should use "equ" instead of "eng" for equation detection. I mean, how would "eng" recognise Greek letters? And Greek letters are > commonly used in equations. > No. The base concept of my naive

[tesseract-ocr] Re: Mathematical equation detection & recognition

2020-05-28 Thread Weslley Torres
Hi. Yes, indeed the equations are recognised very badly =/. You are correct, "equ" only works with the legacy engine, but I thought we should use "equ" instead of "eng" for equation detection. I mean, how would "eng" recognise Greek letters? And Greek letters are commonly used in equations. In

[tesseract-ocr] Re: Mathematical equation detection & recognition

2020-05-28 Thread Владимир Калачихин
Hi Weslley! On Thursday, 28 May 2020 at 2:42:23 UTC+3, Weslley Torres wrote: > > you have probably done this already, but just in case.. > Yes, I did. The equations are recognized very badly, with or without textord_equation_detect=1. This works with the legacy engine only; LSTM does not

[tesseract-ocr] Re: Mathematical equation detection & recognition

2020-05-27 Thread Weslley Torres
Hi, you have probably done this already, but just in case: at line 40, try this: ocrData = pytesseract.image_to_data(thresh, output_type=Output.DICT, config='--tessdata-dir /new/folder/address/Share/ --oem 0 -c textord_equation_detect=1', lang='equ') Please create one folder with the files
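The call quoted above can be sketched as a small helper. This is only an illustration, not code from the thread: the function names `build_tesseract_config` and `run_equation_ocr` are hypothetical, and the folder path is the placeholder used in the message. It assumes pytesseract is installed and that the given folder holds legacy-engine traineddata for "equ".

```python
def build_tesseract_config(tessdata_dir, oem=0, equation_detect=True):
    """Assemble the config string for pytesseract (hypothetical helper).

    oem=0 selects the legacy engine, which the thread reports is the only
    engine that honours textord_equation_detect.
    """
    parts = ["--tessdata-dir", tessdata_dir, "--oem", str(oem)]
    if equation_detect:
        parts += ["-c", "textord_equation_detect=1"]
    return " ".join(parts)


def run_equation_ocr(image, tessdata_dir):
    """Run the same image_to_data call as in the message above."""
    # Imported lazily so build_tesseract_config works without pytesseract.
    import pytesseract
    from pytesseract import Output

    return pytesseract.image_to_data(
        image,
        output_type=Output.DICT,
        config=build_tesseract_config(tessdata_dir),
        lang="equ",
    )


if __name__ == "__main__":
    # Placeholder path taken from the message; replace with a real folder.
    print(build_tesseract_config("/new/folder/address/Share/"))
```

A tessdata path containing spaces would need quoting inside the config string; the sketch does not handle that case.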

[tesseract-ocr] Re: Mathematical equation detection & recognition

2020-05-27 Thread Weslley Torres
Thank you very much, I will have a look at it =). Kind regards. On Wednesday, 27 May 2020 at 23:01:48 UTC+2, Владимир Калачихин wrote: > > This is not production code, just a sketch. > -- You received this message because you are subscribed to the Google Groups "tesseract-ocr"

[tesseract-ocr] Re: Mathematical equation detection & recognition

2020-05-27 Thread Владимир Калачихин
This is not production code, just a sketch.

[tesseract-ocr] Re: Mathematical equation detection & recognition

2020-05-27 Thread Weslley Torres
Hi! I think what you accomplished is good enough for me. Would you mind sharing your code/script? Kind regards. On Wednesday, 27 May 2020 at 18:20:43 UTC+2, Владимир Калачихин wrote: > > Hi Weslley > On Wednesday, 27 May 2020 at 18:02:59 UTC+3, Weslley Torres wrote: >> >> >>

[tesseract-ocr] Re: Mathematical equation detection & recognition

2020-05-27 Thread Владимир Калачихин
Hi Weslley. On Wednesday, 27 May 2020 at 18:02:59 UTC+3, Weslley Torres wrote: > > > Did you manage to detect the area of equations in a picture? > > I did it with a naive approach, by consolidating areas with badly recognized symbols: [image: Снимок экрана в 2020-05-18 00-10-39.png] It is not so
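The script itself is not shown in the thread, so the following is only one possible reading of "consolidating areas with badly recognized symbols", not the author's code. It assumes a dictionary shaped like the output of `pytesseract.image_to_data(..., output_type=Output.DICT)` (keys `text`, `conf`, `left`, `top`, `width`, `height`). The function names, the confidence threshold of 50 and the 10-pixel gap are all illustrative assumptions.

```python
def low_confidence_boxes(ocr_data, conf_threshold=50):
    """Collect (x1, y1, x2, y2) boxes of words recognised with low confidence."""
    boxes = []
    for i, text in enumerate(ocr_data["text"]):
        conf = float(ocr_data["conf"][i])
        if conf < 0 or not text.strip():
            continue  # conf of -1 marks layout rows that carry no word
        if conf < conf_threshold:
            left, top = ocr_data["left"][i], ocr_data["top"][i]
            boxes.append((left, top,
                          left + ocr_data["width"][i],
                          top + ocr_data["height"][i]))
    return boxes


def _close(a, b, gap):
    """True when two boxes overlap or lie within `gap` pixels of each other."""
    return (a[0] - gap <= b[2] and b[0] - gap <= a[2]
            and a[1] - gap <= b[3] and b[1] - gap <= a[3])


def merge_boxes(boxes, gap=10):
    """Repeatedly merge nearby boxes until no two remaining boxes are close."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        result = []
        for box in merged:
            for j, other in enumerate(result):
                if _close(box, other, gap):
                    # Replace the stored box with the union of the two.
                    result[j] = (min(box[0], other[0]), min(box[1], other[1]),
                                 max(box[2], other[2]), max(box[3], other[3]))
                    changed = True
                    break
            else:
                result.append(box)
        merged = result
    return merged
```

The merged rectangles are candidate equation regions only: ordinary text that happens to be recognised poorly (small print, noise) would be flagged as well, so the threshold and gap need tuning per document.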

[tesseract-ocr] Re: Mathematical equation detection & recognition

2020-05-27 Thread Weslley Torres
Hi, I have a similar situation. In my case I "just" need to identify/detect the equation in the picture; I don't need to "read" it. Knowing the location is enough for me, just like in the paper you mentioned, "A Simple Equation Region Detector for Printed Document Images in Tesseract

[tesseract-ocr] Re: Mathematical equation detection & recognition

2020-05-27 Thread Владимир Калачихин
Heh, the "equ" language is not present in language-specific.sh, so training Tesseract 4 on math symbols is impossible. A general question: is there a real way to create a language model from scratch, for a new, unknown language?

[tesseract-ocr] Re: Mathematical equation detection & recognition

2020-05-20 Thread Владимир Калачихин
As pointed out in https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html, the "equ Math / equation detection module" is not present in Tesseract 4, but the traineddata is present. Does this mean that I must retrain the equ module from scratch?