The default language that tesseract uses when none is specified is eng.
Hence you get a box file with English characters.
There is currently no `Modi` traineddata, so you can't use that. You could
use `-l mar` to use Marathi, but obviously the recognition will not be
correct.
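As a minimal sketch, here is how the box-file command changes once a language is passed explicitly. The file names are hypothetical. This assumes the standard `makebox` config is installed with tesseract. As noted above, `mar` will produce Devanagari guesses, not Modi.

```python
import shlex

def makebox_command(image: str, outbase: str, lang: str = "eng") -> list[str]:
    """Build a tesseract command line that writes <outbase>.box.

    If -l is omitted, tesseract falls back to "eng", which is why the
    box file ends up with Latin characters.
    """
    # "makebox" is the config name passed as the last argument
    return ["tesseract", image, outbase, "-l", lang, "makebox"]

# Hypothetical file names; "mar" gives Devanagari guesses, not Modi.
cmd = makebox_command("modi_sample.png", "modi_sample", lang="mar")
print(shlex.join(cmd))
```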
I suggest that you use
I followed the instructions for Fine Tuning in the "TrainingTesseract 4.00"
tutorial. The first time I did this process, it worked fine; I ended up
with a new model that improved performance. However, whenever I have
subsequently tried to train a new model, after running through the process
I
Hi Wincent,
Thank you for sharing these tools. I find create-dictdata to be very useful.
I wanted to know if you have modified any ocr-evaluation tools to handle
high Unicode ranges, such as those used for Akkadian.
I was trying to test this with the Modi script (*Range*: U+11600..U+1165F;
(96
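The Modi block lies above U+FFFF, which is one reason it behaves differently from scripts in the Basic Multilingual Plane. The small check below (a sketch, not taken from any existing evaluation tool) confirms three things:

* the block size is 96 code points;
* each Modi character is a single Python `str` character;
* each Modi character takes two UTF-16 code units, i.e. a surrogate pair.

A tool that counts or slices text by UTF-16 units would therefore see twice as many "characters".

```python
MODI_START, MODI_END = 0x11600, 0x1165F  # Modi block: U+11600..U+1165F

def is_modi(ch: str) -> bool:
    """True if the single character ch lies in the Modi block."""
    return MODI_START <= ord(ch) <= MODI_END

def utf16_units(text: str) -> int:
    # Characters above U+FFFF need a surrogate pair (2 code units) in UTF-16
    return len(text.encode("utf-16-le")) // 2

block_size = MODI_END - MODI_START + 1
sample = chr(0x11600) + chr(0x11601)
print(block_size, len(sample), utf16_units(sample))
```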
There are a few Wiki pages that cover some of this.
You can see the pages that mention "png" by searching on GitHub
and then filtering on Wiki (instead of the default, Code).
Here are the filtered result pages from the Wiki that talk about "png".
Hi all,
I'm fairly new to tesseract (and to programming work in general), and am
trying to get my bearings. Almost everything I have seen recommends/assumes
that I feed .tiff files into tesseract to be ocr'd, but I recently came
across some posts suggesting that .png is less finicky, and might
I tried using MarathiCursiveT Medium as the font in the fontlist and it worked.
Thanks for that.
It created traineddata and unicharset files in the destination folder.
I hope I can now continue with the further instructions at
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
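The sketch below builds a tesstrain.sh invocation in the style of the TrainingTesseract 4.00 wiki. The point it illustrates is that a font name containing a space must reach `--fontlist` as one argument. The language code `mar` and all paths are placeholder assumptions; adjust them to your setup.

```python
import shlex

def tesstrain_command(font: str, lang: str, fonts_dir: str, langdata_dir: str,
                      tessdata_dir: str, output_dir: str) -> list[str]:
    """Build a tesstrain.sh invocation for a single font (paths are placeholders)."""
    return [
        "tesstrain.sh",
        "--fonts_dir", fonts_dir,
        "--lang", lang,
        "--linedata_only",
        "--noextract_font_properties",
        "--langdata_dir", langdata_dir,
        "--tessdata_dir", tessdata_dir,
        # The whole font name, including the space, must stay one argument.
        "--fontlist", font,
        "--output_dir", output_dir,
    ]

cmd = tesstrain_command("MarathiCursiveT Medium", "mar", "/usr/share/fonts",
                        "../langdata", "./tessdata", "./mar_train")
print(shlex.join(cmd))  # shlex.join quotes the font name for the shell
```

If a font is not picked up, `text2image --list_available_fonts --fonts_dir <dir>` shows the exact names tesseract sees.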
*MarthiCursiveT Medium*
*Use the above as the font with tesstrain.sh*
*How are you creating the box file for the image?*
On Tue, Jan 28, 2020, 21:56 'Nilambari Joshi' via tesseract-ocr <
tesseract-ocr@googlegroups.com> wrote:
> I was trying to do with image. I got one image online with all
Please see https://github.com/tesseract-ocr/tesstrain/wiki
There are already newly trained models by @stweil for Fraktur.
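With tesstrain, continuing from an existing model is usually driven by Makefile variables. Here is a rough sketch of such a call, assuming the `MODEL_NAME`, `START_MODEL`, `TESSDATA`, `GROUND_TRUTH_DIR` and `MAX_ITERATIONS` variables described in the tesstrain README. The model name and paths are hypothetical.

```python
import shlex

def tesstrain_make(model_name: str, start_model: str, tessdata: str,
                   ground_truth_dir: str, max_iterations: int) -> list[str]:
    """Build a `make training` call for the tesstrain Makefile."""
    return [
        "make", "training",
        f"MODEL_NAME={model_name}",
        f"START_MODEL={start_model}",            # existing model to continue from
        f"TESSDATA={tessdata}",                  # directory containing frk.traineddata
        f"GROUND_TRUTH_DIR={ground_truth_dir}",  # line images + .gt.txt files
        f"MAX_ITERATIONS={max_iterations}",
    ]

# Hypothetical names and paths.
cmd = tesstrain_make("frk_library", "frk", "/opt/tessdata_best",
                     "data/frk_library-ground-truth", 10000)
print(shlex.join(cmd))
```

The starting model should be a "best" (float) traineddata; the integer "fast" models cannot be trained further.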
On Tue, Jan 28, 2020, 22:46 Val LNB wrote:
> *How to perform incremental training on Tesseract 4.0+?*
>
>
> I want to improve the existing fraktur (frk) model with some
*How to perform incremental training on Tesseract 4.0+?*
I want to improve the existing Fraktur (frk) model with some 6000 hand-curated
lines from our library.
The ground truth for these lines has 10 new Unicode characters not present in
the German Fraktur model.
How can I continue training from
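Before training, it can help to list exactly which characters in the ground truth are missing from the existing model. The sketch below scans tesstrain-style `*.gt.txt` files and NFC-normalizes the text. It assumes you already have the model's character set as a Python `set`; how you extract that set from frk.traineddata is left open here.

```python
import unicodedata
from pathlib import Path

def new_characters(gt_dir: str, known: set[str]) -> set[str]:
    """Return characters used in *.gt.txt files that are not in `known`."""
    found: set[str] = set()
    for path in Path(gt_dir).glob("*.gt.txt"):
        # Normalize so composed/decomposed forms compare consistently
        text = unicodedata.normalize("NFC", path.read_text(encoding="utf-8"))
        found.update(ch for ch in text if not ch.isspace())
    return found - known
```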
I was trying to do this with an image. I got one image online with all Modi script
characters and tried to create a box file for that image.
In the box file I can see that it is treating each character as an English
character.
*My question is how to make it realise that it should refer to it as a Modi
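One quick way to see what a generated box file actually contains is to read its symbol column. Each box line is `<symbol> <left> <bottom> <right> <top> <page>`. The helper below is a hypothetical sketch that reports what fraction of symbols are Modi code points. A result near zero means the model in use emitted Latin (or Devanagari) guesses instead of Modi.

```python
MODI_START, MODI_END = 0x11600, 0x1165F  # Modi block: U+11600..U+1165F

def box_symbols(box_text: str) -> list[str]:
    """Return the symbol column of a box file.

    Splitting from the right keeps a symbol that is itself a space intact.
    """
    symbols = []
    for line in box_text.splitlines():
        parts = line.rsplit(" ", 5)
        if len(parts) == 6:
            symbols.append(parts[0])
    return symbols

def modi_fraction(box_text: str) -> float:
    """Fraction of box symbols made up only of Modi code points."""
    symbols = box_symbols(box_text)
    if not symbols:
        return 0.0
    modi = sum(1 for s in symbols
               if s and all(MODI_START <= ord(c) <= MODI_END for c in s))
    return modi / len(symbols)
```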
Please see https://github.com/Shreeshrii/tesstrain-ckb
This is for finetune training from script/Arabic, using text and fonts.
You would need to do steps similar to
https://github.com/Shreeshrii/tesstrain-ckb/blob/master/0-setup.sh
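For reference, the generic fine-tuning sequence from the TrainingTesseract 4.00 wiki runs in two steps. First, extract the LSTM network from the starting traineddata with `combine_tessdata -e`. Then continue training it with `lstmtraining`. The sketch below only builds those two command lines; it is not a copy of what 0-setup.sh does. All paths, including the `ckb` traineddata and training list, are hypothetical placeholders.

```python
import shlex

def finetune_commands(start_traineddata: str, new_traineddata: str,
                      output_prefix: str, train_listfile: str,
                      max_iterations: int) -> list[list[str]]:
    """Build the two commands for fine-tuning an existing LSTM model."""
    start_lstm = f"{output_prefix}_start.lstm"
    # 1. Pull the LSTM network out of the starting traineddata.
    extract = ["combine_tessdata", "-e", start_traineddata, start_lstm]
    # 2. Continue training it; --old_traineddata lets lstmtraining map the
    #    old unicharset onto the one in the new traineddata.
    train = [
        "lstmtraining",
        "--continue_from", start_lstm,
        "--old_traineddata", start_traineddata,
        "--traineddata", new_traineddata,
        "--model_output", output_prefix,
        "--train_listfile", train_listfile,
        "--max_iterations", str(max_iterations),
    ]
    return [extract, train]

for cmd in finetune_commands("tessdata_best/script/Arabic.traineddata",
                             "train/ckb/ckb.traineddata", "train/ckb_finetune",
                             "train/ckb.training_files.txt", 3000):
    print(shlex.join(cmd))
```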