On Thursday, May 28, 2020 at 14:59:05 UTC+3, Weslley Torres wrote:
> I thought we should use "equ" instead of "eng" for equation detection. I
> mean, how would "eng" recognise Greek letters? And Greek letters are
> commonly used in equations.
>
No. The basic concept of my naive
Hi..
Yes, indeed, the equations are recognised very badly =/. You are correct,
"equ" only works with the legacy engine, but I thought we should use "equ"
instead of "eng" for equation detection. I mean, how would "eng" recognise
Greek letters? And Greek letters are commonly used in equations.
In
Hi Weslley!
On Thursday, May 28, 2020 at 2:42:23 UTC+3, Weslley Torres wrote:
>
> probably you have done it already, but in any case..
>
Yes, I did.
The equations are recognized very badly, with or without
textord_equation_detect=1. This works with the legacy engine only, LSTM does not
Hi,
you have probably done this already, but in any case: on line 40, try this:
    ocrData = pytesseract.image_to_data(
        thresh, output_type=Output.DICT, lang='equ',
        config='--tessdata-dir /new/folder/address/Share/ --oem 0 '
               '-c textord_equation_detect=1')
Please create one folder with the files
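
The options in the call above can be assembled in one place, which makes it easier to compare runs with and without equation detection. This is only a minimal sketch: `build_config` is a hypothetical helper, not part of pytesseract, and it assumes the config string is later split shell-style, so the directory is quoted in case it contains spaces. The path is the placeholder from the message.

```python
import shlex


def build_config(tessdata_dir, oem=0, equation_detect=True):
    """Build a Tesseract config string for the legacy engine (hypothetical helper)."""
    # --oem 0 selects the legacy engine, which the thread says "equ" requires.
    parts = ["--tessdata-dir", shlex.quote(tessdata_dir), "--oem", str(oem)]
    if equation_detect:
        # Parameter name taken from the message above.
        parts += ["-c", "textord_equation_detect=1"]
    return " ".join(parts)


config = build_config("/new/folder/address/Share/")
```

The result would then be passed as `config=config` together with `lang='equ'`, as in the call above.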
thank you very much, I will have a look at it =).
Kind regards,
On Wednesday, May 27, 2020 at 23:01:48 UTC+2, Владимир Калачихин wrote:
>
> This is not production code, just a sketch.
>
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr"
This is not production code, just a sketch.
Hi!!
I think what you accomplished is good enough for me.
Do you mind sharing your code/script?
Kind regards
On Wednesday, May 27, 2020 at 18:20:43 UTC+2, Владимир Калачихин wrote:
>
> Hi Weslley
> On Wednesday, May 27, 2020 at 18:02:59 UTC+3, Weslley Torres wrote:
>>
>>
>>
Hi Weslley
On Wednesday, May 27, 2020 at 18:02:59 UTC+3, Weslley Torres wrote:
>
>
> Did you manage to detect the area of equations in a picture?
>
>
I did it with a naive approach: consolidating areas with badly recognized symbols:
[image: Снимок экрана в 2020-05-18 00-10-39.png]
It is not so
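
The naive approach described above might look roughly like the following. This is a sketch under stated assumptions, not the author's actual script: it assumes the dictionary layout returned by `pytesseract.image_to_data(..., output_type=Output.DICT)` (keys `text`, `conf`, `left`, `top`, `width`, `height`), and the function names, the `max_conf` threshold and the `gap` distance are all hypothetical values chosen for illustration.

```python
def low_conf_boxes(ocr_data, max_conf=40):
    """Collect (left, top, right, bottom) boxes of poorly recognized words."""
    boxes = []
    for i, text in enumerate(ocr_data["text"]):
        conf = float(ocr_data["conf"][i])
        # A negative confidence marks layout entries (page, block, line), not words.
        if conf < 0 or conf > max_conf or not text.strip():
            continue
        left, top = ocr_data["left"][i], ocr_data["top"][i]
        boxes.append((left, top,
                      left + ocr_data["width"][i],
                      top + ocr_data["height"][i]))
    return boxes


def merge_boxes(boxes, gap=15):
    """Merge boxes that overlap or lie within `gap` pixels, until nothing changes."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        result = []
        for box in merged:
            for j, other in enumerate(result):
                close = (box[0] <= other[2] + gap and other[0] <= box[2] + gap and
                         box[1] <= other[3] + gap and other[1] <= box[3] + gap)
                if close:
                    # Replace the stored box with the union of the two.
                    result[j] = (min(box[0], other[0]), min(box[1], other[1]),
                                 max(box[2], other[2]), max(box[3], other[3]))
                    changed = True
                    break
            else:
                result.append(box)
        merged = result
    return merged


# Made-up sample data in the image_to_data dictionary layout.
sample = {
    "text": ["", "Hello", "x^2", "+y", "far"],
    "conf": ["-1", "96", "12", "30", "20"],
    "left": [0, 10, 100, 130, 400],
    "top": [0, 10, 50, 52, 300],
    "width": [500, 40, 25, 20, 30],
    "height": [500, 12, 14, 12, 12],
}
regions = merge_boxes(low_conf_boxes(sample))
```

Each resulting tuple is a candidate equation region; a real page would likely need the threshold and gap tuned to its resolution.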
Hi,
I have a similar situation: in my case I "just" need to identify/detect the
equation in the picture. I don't need to "read" it. Knowing the location is
enough for me, just like in the paper you mentioned, "A Simple Equation Region
Detector for Printed Document Images in Tesseract
Heh, the "equ" language is not present in language-specific.sh, so training
Tesseract 4 on math symbols is impossible.
A general question:
Is there a real way to create a language model from scratch, for a new,
unknown language?
As pointed out in
https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
:
"equ Math / equation detection module" is not present in Tesseract 4, but
the traineddata file is present.
Does this mean that I must retrain the equ module from scratch?