Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-06-01 Thread Shree Devi Kumar
You may find this repo useful https://github.com/UYousafzai/easy_train_tesseract On Mon, Jun 1, 2020 at 10:05 PM Shree Devi Kumar wrote: > >Failed to load script unicharset from:./langdata/Latin.unicharset" > > This is for Latin script not Latin language. > wget the file from >

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-06-01 Thread Shree Devi Kumar
>Failed to load script unicharset from:./langdata/Latin.unicharset" This is for the Latin script, not the Latin language. wget the file from https://github.com/tesseract-ocr/langdata_lstm/blob/master/Latin.unicharset On Mon, Jun 1, 2020 at 8:16 PM Владимир Калачихин wrote: > Hi! > Monday, June 1
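
Note on the link above: it is GitHub's HTML view (`/blob/`), so passing it to wget as written would save a web page rather than the unicharset file. A minimal sketch of one way around that, assuming GitHub's usual `/raw/` path form and the `./langdata/` directory named in the error message; the helper name `to_raw_url` is made up for illustration and is not part of Tesseract:

```shell
#!/bin/sh
# Hypothetical helper: rewrite a GitHub "/blob/" page URL to its "/raw/" form.
to_raw_url() {
    printf '%s\n' "$1" | sed 's#/blob/#/raw/#'
}

url=$(to_raw_url "https://github.com/tesseract-ocr/langdata_lstm/blob/master/Latin.unicharset")
echo "$url"

# Uncomment to download into the directory named in the error message
# (assumes training is run from the directory that contains ./langdata):
# mkdir -p ./langdata && wget -O ./langdata/Latin.unicharset "$url"
```

The download line is left commented out so the snippet can be checked without network access.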

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-06-01 Thread Владимир Калачихин
Hi! On Monday, June 1, 2020 at 11:23:39 UTC+3, shree wrote: > > > ### create tif and box using fonts and training text > text2image --fonts_dir=/home/ubuntu/.fonts > --outputbase=/mylang.myfont.exp0 --max_pages=0 --font=myfont > --text=../langdata/mylang/mylang.training_text >

[tesseract-ocr] Re: Where to download the dutch language pack?

2020-06-01 Thread Mike Dewul
Ah, right ... there. Thank you so much! I truly appreciate the quick reply. On Monday, June 1, 2020 at 12:01:15 PM UTC+2, Mike Dewul wrote: > > I am trying "(a9t9)FreeOcrWindowsDesktop", which performs OCR of images > (batch). > However, I need the Dutch (NLD) language pack. > > Where to get

Re: [tesseract-ocr] Where to download the dutch language pack?

2020-06-01 Thread Shree Devi Kumar
https://github.com/tesseract-ocr/tessdata_fast https://github.com/tesseract-ocr/tessdoc/blob/master/Data-Files.md On Mon, Jun 1, 2020 at 3:31 PM Mike Dewul wrote: > I am trying "(a9t9)FreeOcrWindowsDesktop" which perform OCR of images > (batch) > However, I need the Dutch (NLD) language pack.
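
For the Dutch pack specifically, Tesseract's language code is `nld`, so the file to look for in the repository above is `nld.traineddata`. A rough sketch of building the download link, assuming the repository's branch is named `main` (older links use `master`) and that the OCR tool reads language files from its own `tessdata` folder; the helper name `traineddata_url` is invented for this example:

```shell
#!/bin/sh
# Hypothetical helper: build a download URL for a language file in tessdata_fast.
# The default branch name ("main") is an assumption; pass another as the 2nd argument.
traineddata_url() {
    lang=$1
    branch=${2:-main}
    printf 'https://github.com/tesseract-ocr/tessdata_fast/raw/%s/%s.traineddata\n' "$branch" "$lang"
}

traineddata_url nld

# Uncomment to download, then copy the file into the OCR tool's tessdata folder:
# wget "$(traineddata_url nld)"
```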

[tesseract-ocr] Where to download the dutch language pack?

2020-06-01 Thread Mike Dewul
I am trying "(a9t9)FreeOcrWindowsDesktop", which performs OCR of images (batch). However, I need the Dutch (NLD) language pack. Where can I get it? I have searched in vain for hours ... Is there any other free tool similar to (a9t9)FreeOcrWindowsDesktop, i.e. batch, images, using Tesseract? Thanks.

Re: [tesseract-ocr] How can use tessract for training using my own image dataset

2020-06-01 Thread 易鑫
Thank you very much. I would like to know whether tesstrain uses the same training logic as tesseract-4.0.0 once the dataset is prepared. Thank you. On Monday, June 1, 2020 at 4:09:42 PM UTC+8, shree wrote: > > If your image dataset and groundtruth is for line images you can use > > https://github.com/tesseract-ocr/tesstrain > > On

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-06-01 Thread Shree Devi Kumar
So, modify the info given by Piyush Chandra earlier in this thread. The paths need to be based on where you have the files. ### create tif and box using fonts and training text text2image --fonts_dir=/home/ubuntu/.fonts --outputbase=/mylang.myfont.exp0 --max_pages=0 --font=myfont

Re: [tesseract-ocr] How can use tessract for training using my own image dataset

2020-06-01 Thread Shree Devi Kumar
If your image dataset and ground truth are for line images, you can use https://github.com/tesseract-ocr/tesstrain On Mon, Jun 1, 2020 at 11:16 AM 易鑫 wrote: > Hello, everyone: > As we all know, after tesseract v4.0, it can generate dataset > automatically. But for me, the accuracy is not as good

[tesseract-ocr] Multiple language OCR (Santali+Odia+English) combination is not working with gImageReader

2020-06-01 Thread Prasanta Hembram
I am trying to scan a Santali book containing multiple scripts (Ol Chiki + English + Odia) with gImageReader 3.3.1 (17fa17), which uses Tesseract 4.1.0, but I am unable to get satisfactory results. I have tried English + Odia; they work fine and give very good