What is the current situation with subj?
I can find only "A Simple Equation Region Detector for Printed Document Images
in Tesseract".
As noted in
https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
:
"equ Math / equation detection module" is not present in Tesseract 4, but
the traineddata is present.
Does this mean that I must retrain the equ module from scratch?
--
You received this message because you are
Heh, the "equ" language is not present in language-specific.sh, so training
Tesseract 4 on math symbols is impossible.
A general question:
Is there a real way to create a language model from scratch, for a new,
unknown language?
Hi Weslley
On Wednesday, May 27, 2020 at 18:02:59 UTC+3, Weslley Torres wrote:
>
>
> Did you manage to detect the area of equations in a picture?
>
>
I did it with a naive approach: consolidating areas with badly recognized symbols:
[image: Снимок экрана в 2020-05-18 00-10-39.png]
It is not so
This is not production code, just a sketch.
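The author's own sketch is not preserved in this snippet. As a rough illustration of the idea described above (consolidating areas with badly recognized symbols), a minimal Python sketch might look like the following. It assumes that word boxes with a per-word confidence are already available, for example from Tesseract's TSV output; the `Box` class, the `merge_low_confidence` function, and the threshold and gap values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    left: int
    top: int
    right: int
    bottom: int
    conf: float  # recognition confidence, 0..100 (assumed scale)


def merge_low_confidence(boxes: List[Box], conf_threshold: float = 60.0,
                         gap: int = 10) -> List[Box]:
    """Greedily group badly recognized boxes that lie close together.

    This is a single pass: a box joins the first region it touches, and
    regions are not merged with each other afterwards.
    """
    regions: List[Box] = []
    # Keep only suspicious boxes, scanned top-to-bottom, left-to-right.
    bad = sorted((b for b in boxes if b.conf < conf_threshold),
                 key=lambda b: (b.top, b.left))
    for b in bad:
        for r in regions:
            # A box within `gap` pixels of a region in both axes joins it.
            if (b.left <= r.right + gap and b.right >= r.left - gap and
                    b.top <= r.bottom + gap and b.bottom >= r.top - gap):
                r.left, r.top = min(r.left, b.left), min(r.top, b.top)
                r.right = max(r.right, b.right)
                r.bottom = max(r.bottom, b.bottom)
                r.conf = min(r.conf, b.conf)
                break
        else:
            # No nearby region: start a new candidate region.
            regions.append(Box(b.left, b.top, b.right, b.bottom, b.conf))
    return regions
```

Each returned region is a candidate equation area; whether low confidence really indicates an equation depends on the document, so the threshold would need tuning on real pages.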
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion
Hi!
On Thursday, May 28, 2020 at 8:04:03 UTC+3, Piyush Chandra wrote:
>
> Hope below information helps: :)
>
>
Please, some questions:
Are "--words...", "--numbers..." and "--puncs" required?
Why is "--net_spec..." needed?
Hi Weslley!
On Thursday, May 28, 2020 at 2:42:23 UTC+3, Weslley Torres wrote:
>
> probably you have done it already, but in any case..
>
Yes, I did.
The equations are recognized very badly, with or without
textord_equation_detect=1. This works with the legacy engine only; LSTM does not
On Thursday, May 28, 2020 at 14:59:05 UTC+3, Weslley Torres wrote:
> I thought we should use "equ" instead of "eng" for equation detection. I
> mean, how "eng" would recognise Greek letters? And Greek letters are
> commonly used in equations.
>
No. The basic concept of my naive
Hi!
Another question:
On Thursday, May 28, 2020 at 8:04:03 UTC+3, Piyush Chandra wrote:
>
>
> Create box files: tesseract /path/to/image.tif
> path/and/nameof/boxfile/image lstmbox
>
>
>
At this step, does Tesseract recognize the image? What if it does so badly?
Can I specify what text is
On Thursday, May 28, 2020 at 14:46:10 UTC+3, Piyush Chandra wrote:
>
> Read about --Net spec here:
> https://tesseract-ocr.github.io/tessdoc/VGSLSpecs
>
Yes, but why use a custom net configuration for a common task?
And which net configuration is well suited for training on math symbols?
I'm trying to
https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00.html#tesstutorial
I repeated all the steps as given.
On
src/training/tesstrain.sh...
I get an error:
ERROR: /tmp/eng-2020-05-25.QY7/eng.Century_Schoolbook_L_Bold.exp0.lstmf does not exist or is not readable
Both
On Thursday, May 28, 2020 at 16:36:14 UTC+3, shree wrote:
> Alternately you can use wordstrbox config file.
>
What is a "wordstrbox config file"?
creates line level box files.
>
> If using wordstrbox, please use the groundtruth text for creating
> unicharset instead of the box files.
>
> On Thu, May 28, 2020, 20:49 Владимир Калачихин wrote:
>
>>
>> On Thursday, May 28, 2020 at 16:36:14 UTC+3, shree
Hi!
I still don't understand.
On Friday, May 29, 2020 at 15:02:22 UTC+3, shree wrote:
> Input Files
>
> myfile1.png
> myfile1.gt.txt
>
>
Is "myfile1.png" the picture with the training text?
What is "myfile1.gt.txt"?
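The reply to this question is not included in the snippet. Assuming, as the ".gt.txt" suffix and the earlier mention of ground-truth text suggest, that each `.gt.txt` file holds the transcription of the image with the same base name, the pairing could be checked with a short Python sketch like the one below. The `pair_ground_truth` function and the folder layout are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def pair_ground_truth(folder: Path) -> Tuple[Dict[str, str], List[str]]:
    """Map each PNG to the text of its .gt.txt file; list images lacking one.

    Assumption: myfile1.png is described by myfile1.gt.txt in the same folder.
    """
    pairs: Dict[str, str] = {}
    missing: List[str] = []
    for image in sorted(folder.glob("*.png")):
        gt = image.with_name(image.stem + ".gt.txt")
        if gt.is_file():
            # The transcription is the text that the image is expected to show.
            pairs[image.name] = gt.read_text(encoding="utf-8").strip()
        else:
            missing.append(image.name)
    return pairs, missing
```

Running it over a training folder before training gives a quick check that no image is left without a transcription.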
OK, I want to train from training text and fonts.
Which method should I use?
On Sunday, May 31, 2020 at 19:16:55 UTC+3, shree wrote:
>
> Use tesstrain.sh or tesstrain.py
>
> On Sun, May 31, 2020 at 6:45 PM Владимир Калачихин wrote:
>
>> OK, I want to train from training text and fonts.
>> Which method should I use?
>>
Hi!
On Monday, June 1, 2020 at 11:23:39 UTC+3, shree wrote:
>
>
> ### create tif and box using fonts and training text
> text2image --fonts_dir=/home/ubuntu/.fonts
> --outputbase=/mylang.myfont.exp0 --max_pages=0 --font=myfont
> --text=../langdata/mylang/mylang.training_text
>
I don't see any problems.
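For scripting this step, the quoted text2image call can be assembled as an argument list. The following Python sketch uses only the flags and values that appear in the quoted message; the `text2image_command` wrapper itself is hypothetical, and `shlex.join` requires Python 3.8 or later.

```python
import shlex
from typing import List


def text2image_command(fonts_dir: str, font: str, training_text: str,
                       outputbase: str, max_pages: int = 0) -> List[str]:
    """Build the text2image argument list from the flags quoted above."""
    return [
        "text2image",
        f"--fonts_dir={fonts_dir}",
        f"--outputbase={outputbase}",
        f"--max_pages={max_pages}",
        f"--font={font}",
        f"--text={training_text}",
    ]


# Values copied from the quoted message; adjust them to the local setup.
cmd = text2image_command(
    fonts_dir="/home/ubuntu/.fonts",
    font="myfont",
    training_text="../langdata/mylang/mylang.training_text",
    outputbase="/mylang.myfont.exp0",
)
print(shlex.join(cmd))  # show the command without running it
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with font names that contain spaces.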
On Monday, June 1, 2020 at 19:36:07 UTC+3, shree wrote:
This is for Latin script not Latin language.
> wget the file from
> https://github.com/tesseract-ocr/langdata_lstm/blob/master/Latin.unicharset
>
>
OK, I did it, and some of the next steps.
At the step
### Train:
> lstmtraining .
On Monday, June 1, 2020 at 19:37:25 UTC+3, shree wrote:
>
> You may find this repo useful
>
> https://github.com/UYousafzai/easy_train_tesseract
>
You don't understand. I don't want to train new fonts for an existing
language. I want a new language.
Subj
Numbers, for example.
I have returned to this task.
On Thursday, June 4, 2020 at 19:13:58 UTC+3, Piyush Chandra wrote:
>
> This is what is missing: --net_spec. Check the line below that I
> mentioned before.
>
> lstmtraining --traineddata ./out/own/own.traineddata --model_output
> ./output/own --net_spec
Heh. It's an old issue.
For 100% accuracy, you must use a digit-only language model, but there is
no such thing.
Besides, a trivial perceptron shows good results on digit recognition.
On Saturday, January 30, 2021 at 18:41:13 UTC+3, Benek wrote:
> Hello! I'm trying to read some digits and I thought it was
Digits are included in the language model together with letters, and the model
is mostly trained for phrase recognition, not separate digits. Mistakes on
digits are unavoidable.
On Saturday, January 30, 2021 at 19:12:39 UTC+3, Benek wrote:
> I still need to read the dot in the correct place which makes it a bit
> harder. So you
Are there any examples of the recognition of code-stamped digits, such as
ZIP codes?
Or a real approach to recognize handwritten digits?