Thanks for your reply.
I will test these as soon as possible.
One of Tesseract's weaknesses is OCR on multiple languages.
For example, if an image contains both Persian and English text,
Tesseract does not recognize it as well as it recognizes single-language text.
Do you have any
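For mixed Persian/English pages, Tesseract can load several models in one run by joining language codes with `+` in the `-l` option (for example `-l fas+eng`). Below is a minimal Python sketch, assuming the `tesseract` binary is on `PATH`. The helper names and `image.png` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(image: str, langs: list[str], out_base: str = "stdout") -> list[str]:
    """Build a tesseract command line that loads several language models at once.

    Language codes are joined with '+', e.g. ['fas', 'eng'] -> '-l fas+eng'.
    'stdout' as the output base prints the text instead of writing a .txt file.
    """
    if not langs:
        raise ValueError("at least one language code is required")
    return ["tesseract", image, out_base, "-l", "+".join(langs)]


def ocr_mixed(image: str, langs: list[str]) -> str:
    """Hypothetical helper: run tesseract (if installed) and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract binary not found on PATH")
    if not Path(image).is_file():
        raise FileNotFoundError(image)
    result = subprocess.run(
        build_ocr_command(image, langs), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    print(build_ocr_command("image.png", ["fas", "eng"]))
```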
Hi Reza,
Attached are two scripts and one log file. You will need to change the
directories in the scripts.
finetune.sh and the finetune log file are for a sample fine-tuning run for eng. By
changing the language code, you can run it for fas.
You can use that as a test.
plus-fas.sh is for the plus-minus type
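The attached scripts are not reproduced in this thread. The following is only a sketch of the two core commands of a Tesseract 4 fine-tuning run, modeled on the examples in the Tesseract 4.00 training wiki: `combine_tessdata -e`, then `lstmtraining` with `--continue_from`, `--traineddata`, `--train_listfile` and `--max_iterations`. It shows how only the language code changes between eng and fas. The directory names, the 400-iteration default and the `<lang>.training_files.txt` list (assumed to be the list of line-image files produced by the training-data step) are assumptions. For the plus-minus case, a new starter traineddata is passed and the original model goes to `--old_traineddata`.

```python
import shlex
from pathlib import PurePosixPath
from typing import Optional


def finetune_commands(
    lang: str,
    tessdata_best: str,
    work_dir: str,
    max_iterations: int = 400,
    new_traineddata: Optional[str] = None,
) -> list[list[str]]:
    """Return [combine_tessdata argv, lstmtraining argv] for one fine-tuning run.

    Only the language code differs between an eng and a fas run.
    If new_traineddata (a starter traineddata with an extended unicharset) is given,
    the original model is also passed as --old_traineddata, as in the plus-minus case.
    """
    best = PurePosixPath(tessdata_best) / f"{lang}.traineddata"
    work = PurePosixPath(work_dir)
    lstm = work / f"{lang}.lstm"
    # Step 1: extract the LSTM network from the existing best model.
    extract = ["combine_tessdata", "-e", str(best), str(lstm)]
    # Step 2: continue training from that network on the new line images.
    train = [
        "lstmtraining",
        "--model_output", str(work / lang),
        "--continue_from", str(lstm),
        "--train_listfile", str(work / f"{lang}.training_files.txt"),
        "--max_iterations", str(max_iterations),
    ]
    if new_traineddata is None:
        train += ["--traineddata", str(best)]
    else:
        train += ["--traineddata", new_traineddata, "--old_traineddata", str(best)]
    return [extract, train]


if __name__ == "__main__":
    # Placeholder directories; change them to match your setup.
    for cmd in finetune_commands("fas", "tessdata_best", "finetune_fas"):
        print(shlex.join(cmd))
```

After training, the checkpoint still has to be converted into a usable `.traineddata` with `lstmtraining --stop_training`, which this sketch does not cover.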
Hi ShreeDevi,
Thanks.
I tested the 2 models that you provided. The accuracy on samples
without noise was about 98%, but on scanned samples or captured images
it was about 80%.
But it still didn't work on different fonts.
Could you send all the files needed for training models? I want fine
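The thread does not say how the 98% and 80% figures were measured. One common way to get a comparable number is character accuracy based on edit distance against a ground-truth transcription. Below is a minimal sketch, assuming you have the correct text for each test image. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def char_accuracy(ground_truth: str, ocr_text: str) -> float:
    """Character accuracy = 1 - edit_distance / len(ground_truth), clamped at 0."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    errors = levenshtein(ground_truth, ocr_text)
    return max(0.0, 1.0 - errors / len(ground_truth))


if __name__ == "__main__":
    print(char_accuracy("abcd", "abxd"))
```

Averaging this per image (or summing errors over a whole test set) makes clean, scanned and camera samples directly comparable.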
I have posted a couple of test models for Farsi at
https://github.com/Shreeshrii/tessdata_shreetest
These have not been trained on text with diacritics, as the normalization
and training process was giving errors on the combining marks.
Please give them a try and see if they provide better
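Persian diacritics such as fatha (U+064E) and kasra (U+0650) are Unicode nonspacing combining marks (general category `Mn`). If normalization fails on them, one possible workaround is to strip them from the training text before generating training data. This is an assumption, not what was done for these test models. The helper names below are hypothetical. Zero-width non-joiner (U+200C, category `Cf`) is left untouched, since Persian relies on it.

```python
import unicodedata


def has_combining_marks(line: str) -> bool:
    """True if the line contains any nonspacing combining mark (category Mn)."""
    return any(unicodedata.category(ch) == "Mn" for ch in line)


def strip_combining_marks(text: str) -> str:
    """NFC-normalize, then drop nonspacing marks.

    NFC first composes sequences that have a precomposed letter
    (e.g. alef + maddah -> U+0622), so those letters survive.
    """
    normalized = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in normalized if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    print(strip_combining_marks("س\u064eلام"))
```

Applied line by line to a training text file, this keeps the number of lines unchanged, so it can be used as a preprocessing step without touching the rest of the pipeline.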
Hi again,
thanks for your reply.
I need more fonts, for example:
B Koodak
B Lotus
B Titr
B Zar
B Yekan
Iran Nastaliq
If needed, should I send the .ttf files of those fonts?
Thanks
On Tuesday, May 15, 2018 at 5:35:10 PM UTC+4:30, shree wrote:
I will try to put together the complete steps.
I am doing a test run for training Persian.
Are the following fonts OK for it?
'55_Sarchia_Kurdish' \
'56_Sarchia_Kurdish_Bold Bold' \
'Amiri' \
'Arabic Typesetting' \
'Arial' \
'Arial Unicode MS' \
'B Nazanin' \
'B Nazanin Bold' \
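The list above uses the shell-array layout of the training scripts: one quoted font name per line, followed by a backslash. A small Python sketch that merges such a list with the extra fonts requested earlier in the thread, drops duplicates, and prints the result in the same layout. The helper names are hypothetical. Font names must match what the rendering tool sees for your fonts directory (`text2image --list_available_fonts` lists them), which is worth checking before a run.

```python
from typing import Iterable


def merge_fonts(*font_lists: Iterable[str]) -> list[str]:
    """Merge font-name lists in order, dropping blanks and duplicates."""
    seen: set[str] = set()
    merged: list[str] = []
    for fonts in font_lists:
        for name in fonts:
            name = name.strip()
            if name and name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


def as_shell_lines(fonts: Iterable[str]) -> list[str]:
    """Format names like the font list above: one quoted name per line plus a backslash."""
    # Inside single quotes, a literal ' must be written as '\''.
    return ["'" + name.replace("'", "'\\''") + "' \\" for name in fonts]


if __name__ == "__main__":
    from_thread = ["Arial", "B Nazanin", "B Nazanin Bold"]
    requested = ["B Koodak", "B Lotus", "B Titr", "B Zar", "B Yekan", "Iran Nastaliq"]
    print("\n".join(as_shell_lines(merge_fonts(from_thread, requested))))
```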
I tested it on Ubuntu, and it raised an error too.
Could you help me and send me a new bash file for fine-tuning with the new fonts?
I put the "eng.traineddata" file in the tessdata_best folder,
and "eng.training_text" and "eng.traineddata" in langdata\eng.
Is that correct and sufficient, or are more files needed?
Thanks
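The exact set of input files depends on the version of the training scripts. The list below contains only the files mentioned in this thread. A sketch that reports which of them are missing or unreadable before starting a run. The helper name and the directory layout are assumptions. The message format mirrors the "does not exist or is not readable" error quoted in this thread.

```python
import os
from pathlib import Path
from typing import Iterable


def missing_files(root: str, relative_paths: Iterable[str]) -> list[str]:
    """Return the relative paths under root that do not exist or are not readable."""
    missing = []
    for rel in relative_paths:
        path = Path(root) / rel
        if not (path.is_file() and os.access(path, os.R_OK)):
            missing.append(rel)
    return missing


# Files mentioned in this thread for an eng run; extend the list for your setup.
REQUIRED = [
    "tessdata_best/eng.traineddata",
    "langdata/eng/eng.training_text",
]

if __name__ == "__main__":
    for rel in missing_files(".", REQUIRED):
        print(f"ERROR: {rel} does not exist or is not readable")
```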
Please use the latest Windows binaries from
https://github.com/UB-Mannheim/tesseract/wiki provided by @stweil.
How do you run a bash script on Windows 10?
@stweil, I have not tried training on Windows. Do you have feedback from
others who have tried it?
ShreeDevi
Thanks for the reply.
Tesseract 4 beta
Windows 10
On Tuesday, May 15, 2018 at 1:12:20 PM UTC+4:30, shree wrote:
>
> What o/s are you running it on?
>
> Which version of tesseract?
>
> > ICU ERROR: U_FILE_ACCESS_ERROR
> > ERROR: /tmp/tmp.6m4B2TUln1/eng/eng.unicharset does not exist or is not readable
Windows 10
Tesseract 4 alpha
What o/s are you running it on?
Which version of tesseract?
> ICU ERROR: U_FILE_ACCESS_ERROR
> ERROR: /tmp/tmp.6m4B2TUln1/eng/eng.unicharset does not exist or is not readable
Which version of the ICU library?
ShreeDevi
I used the attached finetune.sh file, but it raised an error. Could you
help me?
Thanks
> ## MAKING TRAINING DATA ##
>
>
>> === Starting training for language 'eng'
>
> [Tue, May 15, 2018 11:42:36 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fonts_dir=C:WindowsFonts
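The logged `--fonts_dir=C:WindowsFonts` has lost its backslashes. One plausible cause, which is not confirmed in the thread, is that bash treats an unquoted `\` as an escape character, so `C:\Windows\Fonts` collapses to `C:WindowsFonts`. Python's `shlex.split` follows the same POSIX rule and reproduces the effect. Converting the directory to forward slashes with `pathlib` avoids the problem. Whether the Windows `text2image` build accepts forward-slash paths is an assumption to verify. The helper name is hypothetical.

```python
import shlex
from pathlib import PureWindowsPath


def bash_safe_dir(windows_path: str) -> str:
    """Return the path with forward slashes, which bash passes through unchanged."""
    return PureWindowsPath(windows_path).as_posix()


if __name__ == "__main__":
    raw = r"--fonts_dir=C:\Windows\Fonts"
    # shlex.split follows POSIX shell rules: an unquoted backslash escapes the next
    # character and is itself dropped, reproducing the path seen in the log.
    print(shlex.split(raw))
    print(shlex.split("--fonts_dir=" + bash_safe_dir(r"C:\Windows\Fonts")))
```

Quoting the argument in the script (`--fonts_dir='C:\Windows\Fonts'`) would be the other way to keep the backslashes.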
Thanks for your reply.
I read that, but I am confused. Could you send me a bash file for fine-tuning for
Impact?
Thanks
On Monday, May 14, 2018 at 6:18:11 PM UTC+4:30, shree wrote:
>
> please see
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact
>
>
Hi,
I tested Tesseract 4 beta on Persian, and the results were good, but I
think it needs more training on more fonts and texts.
How can we train more fonts and texts on the existing Tesseract 4
beta model for Persian?
And my last question: how can we apply a dictionary to correct that
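Inside Tesseract, the dictionary is part of the traineddata file (the training tools include `wordlist2dawg` for building it from a word list). As a separate and much simpler illustration, the sketch below does dictionary-based post-correction. It replaces each OCR token with its closest word from a word list using Python's `difflib`. This is a swapped-in technique, not Tesseract's mechanism. The word list, the 0.8 cutoff and the helper name are assumptions.

```python
import difflib
from typing import Iterable


def correct_tokens(text: str, wordlist: Iterable[str], cutoff: float = 0.8) -> str:
    """Replace each whitespace-separated token with its closest dictionary word.

    Tokens already in the dictionary, or with no match scoring >= cutoff, are kept.
    Note: whitespace in the output is normalized to single spaces.
    """
    words = list(dict.fromkeys(wordlist))  # de-duplicate, keep order
    vocabulary = set(words)
    corrected = []
    for token in text.split():
        if token in vocabulary:
            corrected.append(token)
            continue
        matches = difflib.get_close_matches(token, words, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return " ".join(corrected)


if __name__ == "__main__":
    print(correct_tokens("helo world", ["hello", "world"]))
```

A lower cutoff corrects more aggressively but also risks replacing correct words that are simply missing from the list.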