Re: [tesseract-ocr] Adding Modi Script to Tesseract

2020-01-28 Thread Shree Devi Kumar
The default language that tesseract uses when none is specified is eng. Hence you get a box file with English characters. There is currently no `Modi` traineddata, so you can't use that. You could use `-l mar` to use Marathi, but obviously the recognition will not be correct. I suggest that you use

[tesseract-ocr] lstmtraining creates an unuseable .traineddata file

2020-01-28 Thread Amory Kisch
I followed the instructions for Fine Tuning in the "TrainingTesseract 4.00" tutorial. The first time I did this process, it worked fine; I ended up with a new model that improved performance. However, whenever I have subsequently tried to train a new model, after running through the process I

[tesseract-ocr] Re: Announcement: Python package pytesstrain (Tesseract training helpers)

2020-01-28 Thread shree
Hi Wincent, Thank you for sharing these tools. I find create-dictdata to be very useful. I wanted to know if you have modified any ocr-evaluation tools to handle the high Unicode range, such as for the Akkadian language. I was trying to test regarding Modi script (*Range*: U+11600..U+1165F; (96
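
For readers wondering what "high Unicode range" means in practice, here is a minimal Python sketch. It assumes only the Modi range quoted in the message above (U+11600..U+1165F); the helper names `is_modi` and `modi_fraction` are hypothetical and are not part of pytesstrain or any ocr-evaluation tool. It shows that Python 3 treats each such code point as a single character, which is the property an evaluation script needs when comparing OCR output character by character.

```python
# Bounds of the Modi block, taken from the range quoted in the message.
MODI_START = 0x11600
MODI_END = 0x1165F


def is_modi(ch: str) -> bool:
    """Return True if the single character ch lies in the Modi block."""
    return MODI_START <= ord(ch) <= MODI_END


def modi_fraction(text: str) -> float:
    """Fraction of non-whitespace characters in text that are Modi.

    Hypothetical helper: a quick sanity check that OCR output or
    ground truth is really in Modi script rather than, say, Latin.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(is_modi(c) for c in chars) / len(chars)


# Number of code points in the block (inclusive range).
block_size = MODI_END - MODI_START + 1

# In Python 3 a code point above U+FFFF is still one character.
first_modi = chr(MODI_START)
```

A check such as `modi_fraction(ocr_text)` close to 0.0 would suggest the engine fell back to another script, for example when no language is passed and `eng` is used.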

Re: [tesseract-ocr] Pros and cons of .tiff vs .png

2020-01-28 Thread Thad Guidry
There are a few Wiki pages that cover some of this. You can see the pages that have "png" mentioned by doing a search on GitHub and then filtering on Wiki (instead of the default Code). Here are the filtered result pages from the Wiki that talk about "png".

[tesseract-ocr] Pros and cons of .tiff vs .png

2020-01-28 Thread teksts
Hi all, I'm fairly new to tesseract (and to programming work in general), and am trying to get my bearings. Almost everything I have seen recommends/assumes that I feed .tiff files into tesseract to be ocr'd, but I recently came across some posts suggesting that .png is less finicky, and might

Re: [tesseract-ocr] Adding Modi Script to Tesseract

2020-01-28 Thread 'Nilambari Joshi' via tesseract-ocr
I tried using MarathiCursiveT Medium as font in fontlist and it worked. Thanks for that. It created traineddata and unicharset files in the destination folder. I hope now I can continue with further instructions as mentioned at https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00

Re: [tesseract-ocr] Adding Modi Script to Tesseract

2020-01-28 Thread Shree Devi Kumar
*MarathiCursiveT Medium* *Use the above as the font with tesstrain.sh* *How are you creating the box file for the image?* On Tue, Jan 28, 2020, 21:56 'Nilambari Joshi' via tesseract-ocr <tesseract-ocr@googlegroups.com> wrote: > I was trying to do with image. I got one image online with all

Re: [tesseract-ocr] Incremental Training Tesseract 4.0+ for fraktur

2020-01-28 Thread Shree Devi Kumar
Please see https://github.com/tesseract-ocr/tesstrain/wiki There are already newly trained models by @stweil for Fraktur. On Tue, Jan 28, 2020, 22:46 Val LNB wrote: > *How to perform incremental training on Tesseract 4.0+?* > > > I want to improve the existing fraktur (frk) model with some

[tesseract-ocr] Incremental Training Tesseract 4.0+ for fraktur

2020-01-28 Thread Val LNB
*How to perform incremental training on Tesseract 4.0+?* I want to improve the existing Fraktur (frk) model with some 6000 hand-curated lines from our library. The ground truth for these lines has 10 new Unicode characters not present in the German Fraktur model. How can I continue training from

Re: [tesseract-ocr] Adding Modi Script to Tesseract

2020-01-28 Thread 'Nilambari Joshi' via tesseract-ocr
I was trying to do it with an image. I got one image online with all Modi script characters and tried to create a box file for that image. In the box file I can see that it is treating each character as an English character. *My question is how to make it realise that it should refer to it as a modi

Re: [tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2020-01-28 Thread Shree Devi Kumar
Please see https://github.com/Shreeshrii/tesstrain-ckb This is for finetune training from script/Arabic, using text and fonts. You would need to do steps similar to https://github.com/Shreeshrii/tesstrain-ckb/blob/master/0-setup.sh