Re: [tesseract-ocr] Making custom traineddata

2019-04-09 Thread shree
Correction: fast version is *ocrb_int (not ocrb-int).*

Re: [tesseract-ocr] Making custom traineddata

2019-04-09 Thread shree
see https://github.com/Shreeshrii/tessdata_ocrb Retrained to add missing X using 3 fonts at 3 exposures and a larger training text compared to previous version. Both float/best and integer/fast versions are provided. - Download best version

Re: [tesseract-ocr] Making custom traineddata

2019-04-08 Thread Shree Devi Kumar
If you can provide another 40-50 lines of training data (text file), I will rerun the training. On Mon, 8 Apr 2019, 22:11 Jankees Korstanje, wrote: > Hi Shree, > > We have tried your traineddata file for MRZ and noticed that it does not > detect the character X. > > Looking at >
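A minimal sketch of how a text file of that size could be drafted, assuming that random lines over the MRZ character set (A-Z, 0-9 and `<`) are acceptable as a starting point. The function name, the seed and the 44-character width (the line length of a passport MRZ) are illustrative assumptions, and the output is not valid MRZ data: it has no field layout and no check digits.

```python
import random
import string

# Characters that can appear in a machine-readable zone.
MRZ_CHARS = string.ascii_uppercase + string.digits + "<"

def random_mrz_like_lines(count, width=44, seed=0):
    """Return `count` random lines of `width` MRZ characters (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    return ["".join(rng.choice(MRZ_CHARS) for _ in range(width))
            for _ in range(count)]

lines = random_mrz_like_lines(50)
print("\n".join(lines[:3]))  # preview the first three lines
```

Real MRZ lines taken from specimen documents would probably make better training text; random lines mainly help ensure that every character is represented.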

Re: [tesseract-ocr] Making custom traineddata

2019-04-08 Thread Jankees Korstanje
Hi Shree, We have tried your traineddata file for MRZ and noticed that it does not detect the character X. Looking at https://github.com/Shreeshrii/tessdata_ocrb/blob/master/eng.MRZ.training_text we see that it contains no X. In addition it might be good to add a couple of lines that
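The gap described above (a character that never occurs in the training text) can be checked mechanically. This is a small sketch, not part of the tessdata_ocrb repository; the function name is hypothetical, and the sample lines are the well-known ICAO specimen passport MRZ.

```python
import string

# Characters allowed in a machine-readable zone: A-Z, 0-9 and the filler '<'.
MRZ_ALPHABET = set(string.ascii_uppercase + string.digits + "<")

def missing_mrz_chars(training_text: str) -> list[str]:
    """Return the MRZ characters that never occur in training_text, sorted."""
    return sorted(MRZ_ALPHABET - set(training_text))

sample = (
    "P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<\n"
    "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
)
print(missing_mrz_chars(sample))  # characters the sample never exercises
```

To run it against a real file, read the text first, for example `missing_mrz_chars(open("eng.MRZ.training_text").read())`, assuming a local copy of that file.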

Re: [tesseract-ocr] Making custom traineddata

2018-10-16 Thread Vinod Gattani
Thanks everyone. With the suggestions and by following this link, "https://www.youtube.com/watch?v=WZLJucXZy-g", I was able to run a demo training for a font. I used Shreeshrii's GitHub repo "https://github.com/Shreeshrii/tessdata_ocrb". I need some help on the points below: If there any documentation

Re: [tesseract-ocr] Making custom traineddata

2018-10-16 Thread Soumik Ranjan Dasgupta
You should uninstall (purge) v3 first, then build v4 from scratch. On Tue, Oct 16, 2018 at 12:23 PM Vinod Gattani wrote: > Robert/ Zdenko > > Yes, in the log I see version "3.4v". > > To install v4, I used the link "https://github.com/tesseract-ocr/tesseract". > I thought it has tesseract

Re: [tesseract-ocr] Making custom traineddata

2018-10-16 Thread Zdenko Podobny
You evidently forgot to uninstall tesseract 3.04. You cannot have two installations of tesseract side by side unless you know your system well enough to handle that kind of situation. Whatever you do, you should understand what you are doing. Zdenko On Tue, 16 Oct 2018 at 8:53, Vinod Gattani

Re: [tesseract-ocr] Making custom traineddata

2018-10-16 Thread Vinod Gattani
Robert/Zdenko, Yes, in the log I see version "3.4v". To install v4, I used the link "https://github.com/tesseract-ocr/tesseract". I thought it had tesseract v4, as the Readme file says "Source code for the new LSTM based 4.0 version is available from the master branch on GitHub." So, I did a git

Re: [tesseract-ocr] Making custom traineddata

2018-10-16 Thread Zdenko Podobny
Robert is pointing you in the right direction. Did you read the log you posted here? "Tesseract Open Source OCR Engine v3.04.01 with Leptonica" You are mixing tesseract versions, so the problems are no surprise. Zdenko On Tue, 16 Oct 2018 at 8:26, Vinod Gattani wrote: > Hi, > Typo: " Why the version is

Re: [tesseract-ocr] Making custom traineddata

2018-10-16 Thread Vinod Gattani
Hi, Typo: "Why the version is not 4.0?" I installed using "git pull https://github.com/tesseract-ocr/tesseract" and then followed the instructions on the training page. Regards On Tue, Oct 16, 2018 at 11:53 AM Robert Kamiński < kaminski.robert...@gmail.com> wrote: > Hi, > " Why the version is

Re: [tesseract-ocr] Making custom traineddata

2018-10-16 Thread Robert Kamiński
Hi, "Why the version is 4.0." What do you mean by that? The log states that it is v3.04: "Tesseract Open Source OCR Engine v3.04.01 with Leptonica". The problem might be that version 4 uses lstm files, whereas you have version 3.04, which uses box files instead. Try to check the version
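As a rough illustration of the check suggested here, the banner line quoted from the log can be parsed for its major version. The function name is hypothetical and the regular expression is a simple assumption about how the version string is formatted.

```python
import re

def tesseract_major_version(banner: str):
    """Return the major version found in a banner line, or None if absent."""
    match = re.search(r"v?(\d+)\.(\d+)", banner)
    return int(match.group(1)) if match else None

# Banner line exactly as quoted from the log in this thread.
banner = "Tesseract Open Source OCR Engine v3.04.01 with Leptonica"
major = tesseract_major_version(banner)
if major is not None and major < 4:
    print("This is a 3.x build; LSTM training needs 4.0 or later.")
```

The same function can be applied to the first line printed by `tesseract --version` to see which installation is actually found on the PATH.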

Re: [tesseract-ocr] Making custom traineddata

2018-10-16 Thread Vinod Gattani
Hi All, I have started a project to do OCR on identity cards. I am learning to train tesseract models with custom fonts. Please help me with this. Steps so far: 1. git pull https://github.com/tesseract-ocr/tesseract 2. Then I followed the instructions for the training package up to the command "sudo make

Re: [tesseract-ocr] Making custom traineddata

2018-09-10 Thread kaminski . robert . it
Thank you, Shreeshrii, for the reply! Manually customizing these files might be rather annoying. If I need to experiment with the dawg files and achieve something, I'll surely let you know whether there is any difference. Again, thank you for your help and time :)

Re: [tesseract-ocr] Making custom traineddata

2018-09-06 Thread Shree Devi Kumar
> When it's combining language model I've spotted that it's making some dawg files. Yes, it takes the files from langdata repo specified in the training command. You could change langdata/pol/pol.wordlist to have only the LAST NAMES and GIVEN NAMES, pol.punc to have only < and change number

Re: [tesseract-ocr] Making custom traineddata

2018-09-06 Thread kaminski . robert . it
Thank you for your reply, Shreeshrii! Indeed, the finetune method is a much better solution for my problem. Thanks to the logs and data provided in your repo, I realized that I don't need to generate every single MRZ code separately (I'm sure it was mentioned somewhere). In fact the process of making

Re: [tesseract-ocr] Making custom traineddata

2018-09-05 Thread Shree Devi Kumar
See https://github.com/Shreeshrii/tessdata_ocrb for the files and traineddata. On Wed, Sep 5, 2018 at 8:51 PM, Shree Devi Kumar wrote: > I think finetune will be a better option than training from scratch. > > Using a small training/test text - 40 lines, I get > >

Re: [tesseract-ocr] Making custom traineddata

2018-09-05 Thread Shree Devi Kumar
I think finetune will be a better option than training from scratch. Using a small training/test text (40 lines), I get: + lstmeval --verbosity 0 --model /home/ubuntu/tessdata_best/script/Latin.traineddata --eval_listfile

[tesseract-ocr] Making custom traineddata

2018-09-05 Thread kaminski . robert . it
Hi, (I might butcher English grammar; you have been warned!) For some time I have been trying to teach tesseract to read MRZ codes. Unfortunately, it's not going very well. I'm using the latest version of tesseract (4.0), so I'm trying to train it with the lstm method. I've managed to pull it off and got