You may find this repo useful
https://github.com/UYousafzai/easy_train_tesseract
On Mon, Jun 1, 2020 at 10:05 PM Shree Devi Kumar
wrote:
> >Failed to load script unicharset from:./langdata/Latin.unicharset"
>
> This is for the Latin script, not the Latin language.
> wget the file from
>
>Failed to load script unicharset from:./langdata/Latin.unicharset"
This is for the Latin script, not the Latin language.
wget the file from
https://github.com/tesseract-ocr/langdata_lstm/blob/master/Latin.unicharset
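Note that the link above points at GitHub's HTML viewer page, so passing it to wget unchanged would likely save a web page rather than the unicharset file itself. A minimal sketch, assuming GitHub's usual /blob/ to /raw/ URL convention; the helper name blob_to_raw is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: turn a GitHub "blob" page URL into its "raw" download URL.
# Assumes the usual https://github.com/<owner>/<repo>/blob/<branch>/<path> layout.
blob_to_raw() {
    printf '%s\n' "$1" | sed 's#/blob/#/raw/#'
}

url=$(blob_to_raw "https://github.com/tesseract-ocr/langdata_lstm/blob/master/Latin.unicharset")
echo "$url"

# Download into ./langdata, the folder named in the error message.
# Uncomment to run (needs network access):
# mkdir -p ./langdata && wget -P ./langdata "$url"
```

The download line is left commented out so the script can be checked offline first.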
On Mon, Jun 1, 2020 at 8:16 PM Владимир Калачихин
wrote:
> Hi!
> Monday, June 1
Hi!
On Monday, June 1, 2020, 11:23:39 UTC+3, user shree wrote:
>
>
> ### create tif and box using fonts and training text
> text2image --fonts_dir=/home/ubuntu/.fonts
> --outputbase=/mylang.myfont.exp0 --max_pages=0 --font=myfont
> --text=../langdata/mylang/mylang.training_text
>
Ah, right ..
there ...
Thank you so much!
Truly appreciate the quick reply.
On Monday, June 1, 2020 at 12:01:15 PM UTC+2, Mike Dewul wrote:
>
> I am trying "(a9t9)FreeOcrWindowsDesktop", which performs OCR of images
> (batch)
> However, I need the Dutch (NLD) language pack.
>
> Where to get
https://github.com/tesseract-ocr/tessdata_fast
https://github.com/tesseract-ocr/tessdoc/blob/master/Data-Files.md
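For the Dutch case asked about here, the file in that repository would be nld.traineddata ("nld" being Tesseract's code for Dutch). A rough sketch of building the download link; the helper name traineddata_url is hypothetical, and the branch name "main" is an assumption that may need adjusting:

```shell
#!/bin/sh
# Hypothetical helper: build the download URL for one language's data file.
# $1 = Tesseract language code, $2 = branch (assumed "main"; adjust if needed).
traineddata_url() {
    printf 'https://github.com/tesseract-ocr/tessdata_fast/raw/%s/%s.traineddata\n' "${2:-main}" "$1"
}

url=$(traineddata_url nld)
echo "$url"

# Uncomment to download (needs network access), then copy the file into the
# tessdata folder your OCR tool reads from:
# wget "$url"
```

Where the tessdata folder lives depends on the tool, so check its own documentation for the location.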
On Mon, Jun 1, 2020 at 3:31 PM Mike Dewul wrote:
> I am trying "(a9t9)FreeOcrWindowsDesktop", which performs OCR of images
> (batch)
> However, I need the Dutch (NLD) language pack.
I am trying "(a9t9)FreeOcrWindowsDesktop", which performs OCR of images
(batch)
However, I need the Dutch (NLD) language pack.
Where to get it?
Vainly searched for hours ...
Any other free tool similar to the (a9t9)FreeOcrWindowsDesktop ?
i.e. batch, images, using Tesseract.
Thanks.
Thank you very much. I would like to know whether tesstrain follows the same
logic as tesseract-4.0.0 once the dataset is ready. Thank you.
On Monday, June 1, 2020 at 4:09:42 PM UTC+8, shree wrote:
>
> If your image dataset and ground truth are for line images, you can use
>
> https://github.com/tesseract-ocr/tesstrain
>
> On
So, modify the info given by Piyush Chandra earlier in this thread. The
paths need to be based on where you have the files.
### create tif and box using fonts and training text
text2image --fonts_dir=/home/ubuntu/.fonts
--outputbase=/mylang.myfont.exp0 --max_pages=0 --font=myfont
If your image dataset and ground truth are for line images, you can use
https://github.com/tesseract-ocr/tesstrain
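tesstrain pairs each line image with a transcription file of the same base name ending in .gt.txt. A minimal sketch of a pre-flight check, assuming .tif line images; the function name missing_gt and the folder name data/mylang-ground-truth are placeholders for illustration:

```shell
#!/bin/sh
# Hypothetical check: list line images that lack a matching transcription.
# Assumes <name>.tif / <name>.gt.txt pairs sitting in the same folder.
missing_gt() {
    for img in "$1"/*.tif; do
        [ -e "$img" ] || continue        # glob matched nothing
        gt="${img%.tif}.gt.txt"
        [ -f "$gt" ] || printf '%s\n' "$img"
    done
}

# Placeholder folder name; pass your own ground-truth folder as the first argument.
missing_gt "${1:-data/mylang-ground-truth}"
```

If the script prints nothing, every .tif in the folder has a transcription next to it.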
On Mon, Jun 1, 2020 at 11:16 AM 易鑫 wrote:
> Hello, everyone:
> As we all know, since Tesseract v4.0 it can generate datasets
> automatically. But for me, the accuracy is not as good
I am trying to scan a Santali book with multiple scripts (Ol Chiki script
+ English script + Odia script) with gImageReader 3.3.1 (17fa17), which uses
Tesseract 4.1.0, but I am unable to get satisfactory results.
I have tried English + Odia script, which work fine and give
very good