Thanks a lot, shree.  It seems you know everything.

I tried MICR0.traineddata and the first two mcr.traineddata files.  The last 
one was blocked by the browser.  Each traineddata gave mixed results.  All of 
them read the symbols fairly well, but they insert spaces at random and 
misread some numbers.

MICR0 seems the best among them.  Were you suggesting that you could update 
it?  It very often outputs a triple D where there's only one, and so on.
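
As a stopgap while the model itself still over-reads symbols, the OCR output 
could be post-processed to collapse runs of the same symbol letter.  This is 
only a sketch of a workaround, not a fix for the traineddata.  It assumes that 
MICR0 emits the letters A, B, C and D for the four E13B symbols (only D is 
mentioned above; the rest of the set is a guess) and that a symbol never 
legitimately occurs twice in a row on the line.  The function name and the 
SYMBOLS variable are made up for illustration.

```shell
#!/bin/sh
# ASSUMPTION: letters the model outputs for the four E13B symbols.
# Only "D" is confirmed by the results above; adjust to the real unicharset.
SYMBOLS=${SYMBOLS:-ABCD}

# Read OCR text on stdin and replace any run of the same symbol letter
# (e.g. "DDD") with a single occurrence ("D"). Digits are never touched.
collapse_repeated_symbols() {
  sed -E "s/([${SYMBOLS}])\1+/\1/g"
}

# Usage (hypothetical file names):
#   tesseract check.png stdout -l MICR0 | collapse_repeated_symbols
```

Digits are left alone, so a legitimate run such as 1100 passes through 
unchanged.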

Also, I tried to fine-tune from MICR0, but I found that I need to change 
language-specific.sh, which specifies some parameters for each language.  Do 
you have any guidance on it?
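
For reference, the fine-tuning flow from the training tutorial, adapted to 
start from MICR0, might look roughly like the sketch below.  The paths, the 
output names and the 400-iteration budget are all assumptions, and it presumes 
that the training text uses only characters already present in MICR0's 
unicharset.  It does not touch language-specific.sh, so it leaves that 
question open.  The commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# ASSUMPTIONS: every path below is illustrative, not taken from a real setup.
BASE=${BASE:-./MICR0.traineddata}                        # starting model
TRAIN_DIR=${TRAIN_DIR:-$HOME/tesstutorial/e13beval}      # tesstrain.sh --linedata_only output
OUT_DIR=${OUT_DIR:-$HOME/tesstutorial/e13b_from_micr0}   # hypothetical output directory

# Print each command; execute it only when RUN=1, so the plan can be reviewed first.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

finetune() {
  run mkdir -p "$OUT_DIR"
  # 1. Extract the LSTM network from the starting traineddata.
  run combine_tessdata -e "$BASE" "$OUT_DIR/MICR0.lstm"
  # 2. Continue training from that network on the E13B line data.
  #    400 iterations is an assumed budget; tune it for your data.
  run lstmtraining \
    --continue_from "$OUT_DIR/MICR0.lstm" \
    --traineddata "$BASE" \
    --model_output "$OUT_DIR/micr" \
    --train_listfile "$TRAIN_DIR/eng.training_files.txt" \
    --max_iterations 400
  # 3. Convert the final checkpoint into a traineddata file.
  run lstmtraining --stop_training \
    --continue_from "$OUT_DIR/micr_checkpoint" \
    --traineddata "$BASE" \
    --model_output "$OUT_DIR/micr.traineddata"
}

finetune
```

Running it as is prints the four commands; RUN=1 sh finetune.sh (a 
hypothetical file name) would execute them.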

On Friday, June 14, 2019 at 1:48:40 UTC+9, shree wrote:
>
> see 
> http://www.devscope.net/Content/ocrchecks.aspx 
> https://github.com/BigPino67/Tesseract-MICR-OCR
> https://groups.google.com/d/msg/tesseract-ocr/obWI4cz8rXg/6l82hEySgOgJ 
>
> On Mon, Jun 10, 2019 at 11:21 AM ElGato ElMago <[email protected]> wrote:
>
>> It would be nice if there were traineddata out there, but I didn't find 
>> any.  I see free fonts and commercial OCR software but no traineddata.  
>> The tessdata repository obviously doesn't have one, either.
>>
>> On Saturday, June 8, 2019 at 1:52:10 UTC+9, shree wrote:
>>>
>>> Please also search for existing MICR traineddata files.
>>>
>>> On Thu, Jun 6, 2019 at 1:09 PM ElGato ElMago <[email protected]> 
>>> wrote:
>>>
>>>> So I did several tests from scratch.  In the last attempt, I made a 
>>>> training text with 4,000 lines in the following format,
>>>>
>>>> 110004310510<   <02 :4002=0181:801= 0008752 <00039 ;0000001000;
>>>>
>>>>
>>>> and combined it with eng.digits.training_text in which symbols are 
>>>> converted to E13B symbols.  This makes about 12,000 lines of training 
>>>> text.  It's amazing that this thing generates a good reader out of 
>>>> nowhere.  But then it is not very good.  For example:
>>>>
>>>> <01 :1901=1386:021= 1111001<10001< ;0000090134;
>>>>
>>>> is the result for the attached image.  It's close, but the last '<' in 
>>>> the result text doesn't exist in the image.  It's a small failure, but 
>>>> it causes greater trouble in parsing.
>>>>
>>>> What would you suggest from here to increase accuracy?  
>>>>
>>>>    - Increase the number of lines in the training text
>>>>    - Mix up more variations in the training text
>>>>    - Increase the number of iterations
>>>>    - Investigate wrong reads one by one
>>>>    - Or else?
>>>>
>>>> Also, I referred to engrestrict*.* and could generate a similar result 
>>>> with the fine-tuning-from-full method.  It seems a bit faster to get to 
>>>> the same level, but it also stops at a 'good' level.  I can go either 
>>>> way if it takes me to a bright future.
>>>>
>>>> Regards,
>>>> ElMagoElGato
>>>>
>>>> On Thursday, May 30, 2019 at 15:56:02 UTC+9, ElGato ElMago wrote:
>>>>>
>>>>> Thanks a lot, Shree. I'll look into it.
>>>>>
>>>>> On Thursday, May 30, 2019 at 14:39:52 UTC+9, shree wrote:
>>>>>>
>>>>>> See https://github.com/Shreeshrii/tessdata_shreetest
>>>>>>
>>>>>> Look at the files engrestrict*.* and also 
>>>>>> https://github.com/Shreeshrii/tessdata_shreetest/blob/master/eng.digits.training_text
>>>>>>
>>>>>> Create a training text of about 100 lines and fine-tune for 400 iterations.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, May 30, 2019 at 9:38 AM ElGato ElMago <[email protected]> 
>>>>>> wrote:
>>>>>>
>>>>>>> I had about 14 lines as attached.  How many lines would you 
>>>>>>> recommend?
>>>>>>>
>>>>>>> Fine-tuning gives a much better result, but it tends to pick 
>>>>>>> characters outside E13B, which has only 14 characters: 0 through 9 
>>>>>>> and 4 symbols.  I thought training from scratch would eliminate such 
>>>>>>> confusion.
>>>>>>>
>>>>>>> On Thursday, May 30, 2019 at 10:43:08 UTC+9, shree wrote:
>>>>>>>>
>>>>>>>> For training from scratch a large training text and hundreds of 
>>>>>>>> thousands of iterations are recommended. 
>>>>>>>>
>>>>>>>> If you are just fine-tuning for a font, try following the 
>>>>>>>> instructions for fine-tuning for Impact, but with your font.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, 30 May 2019, 06:05 ElGato ElMago, <[email protected]> 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks, Shree.
>>>>>>>>>
>>>>>>>>> Yes, I saw the instruction.  The steps I made are as follows:
>>>>>>>>>
>>>>>>>>> Using tesstrain.sh:
>>>>>>>>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
>>>>>>>>>   --linedata_only \
>>>>>>>>>   --noextract_font_properties --langdata_dir ../langdata \
>>>>>>>>>   --tessdata_dir ./tessdata \
>>>>>>>>>   --fontlist "E13Bnsd" --output_dir ~/tesstutorial/e13beval \
>>>>>>>>>   --training_text ../langdata/eng/eng.training_e13b_text
>>>>>>>>>
>>>>>>>>> Training from scratch:
>>>>>>>>> mkdir -p ~/tesstutorial/e13boutput
>>>>>>>>> src/training/lstmtraining --debug_interval 100 \
>>>>>>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>>>>>>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>>>>>>>>>   --model_output ~/tesstutorial/e13boutput/base \
>>>>>>>>>   --learning_rate 20e-4 \
>>>>>>>>>   --train_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>>>>>>>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>>>>>>>>   --max_iterations 5000 &>~/tesstutorial/e13boutput/basetrain.log
>>>>>>>>>
>>>>>>>>> Test with base_checkpoint:
>>>>>>>>> src/training/lstmeval \
>>>>>>>>>   --model ~/tesstutorial/e13boutput/base_checkpoint \
>>>>>>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>>>>>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt
>>>>>>>>>
>>>>>>>>> Combining output files:
>>>>>>>>> src/training/lstmtraining --stop_training \
>>>>>>>>>   --continue_from ~/tesstutorial/e13boutput/base_checkpoint \
>>>>>>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>>>>>>   --model_output ~/tesstutorial/e13boutput/eng.traineddata
>>>>>>>>>
>>>>>>>>> Test with eng.traineddata:
>>>>>>>>> tesseract e13b.png out \
>>>>>>>>>   --tessdata-dir /home/koichi/tesstutorial/e13boutput
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The training from scratch ended as:
>>>>>>>>>
>>>>>>>>> At iteration 561/2500/2500, Mean rms=0.159%, delta=0%, char 
>>>>>>>>> train=0%, word train=0%, skip ratio=0%,  New best char error = 0 
>>>>>>>>> wrote best 
>>>>>>>>> model:/home/koichi/tesstutorial/e13boutput/base0_561.checkpoint wrote 
>>>>>>>>> checkpoint.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The test with base_checkpoint returns nothing as:
>>>>>>>>>
>>>>>>>>> At iteration 0, stage 0, Eval Char error rate=0, Word error rate=0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The test with eng.traineddata and e13b.png returns out.txt.  Both 
>>>>>>>>> files are attached.
>>>>>>>>>
>>>>>>>>> Training seems to have worked fine.  I don't know how to interpret 
>>>>>>>>> the test result from base_checkpoint.  The generated eng.traineddata 
>>>>>>>>> obviously doesn't work well.  I suspect the choice of --traineddata 
>>>>>>>>> when combining the output files is wrong, but I have no clue.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> ElMagoElGato
>>>>>>>>>
>>>>>>>>> BTW, I referred to your tess4training in the process.  It helped a 
>>>>>>>>> lot.
>>>>>>>>>
>>>>>>>>> On Wednesday, May 29, 2019 at 19:14:08 UTC+9, shree wrote:
>>>>>>>>>>
>>>>>>>>>> see 
>>>>>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#combining-the-output-files
>>>>>>>>>>
>>>>>>>>>> On Wed, May 29, 2019 at 3:18 PM ElGato ElMago <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I wish to make traineddata for the E13B font.
>>>>>>>>>>>
>>>>>>>>>>> I read the training tutorial and made a base_checkpoint file 
>>>>>>>>>>> according to the method in Training From Scratch.  Now, how can I 
>>>>>>>>>>> make traineddata from the base_checkpoint file?
>>>>>>>>>>>
>>>>>>>>>>> -- 
>>>>>>>>>>> You received this message because you are subscribed to the 
>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from 
>>>>>>>>>>> it, send an email to [email protected].
>>>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>>>> Visit this group at 
>>>>>>>>>>> https://groups.google.com/group/tesseract-ocr.
>>>>>>>>>>> To view this discussion on the web visit 
>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com
>>>>>>>>>>>  
>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>> .
>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -- 
>>>>>>>>>>
>>>>>>>>>> ____________________________________________________________
>>>>>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>>>>>>
