I have uploaded the modified nor.traineddata at

https://github.com/Shreeshrii/tessdata4alpha/blob/master/nor.traineddata

See the attached log and info file for the commands used in training. Training
ran for about 9 hours on my PC and reached only about 1700 iterations before
the PC froze. After rebooting, I created the traineddata from
norlayer0.853_1615.lstm, i.e. the checkpoint with a 0.853% character error
rate at iteration 1615.
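
The checkpoint file name carries both numbers, so they can be read back programmatically. A minimal Python sketch, assuming the `<prefix><char_error>_<iteration>.lstm` pattern seen in the file names in the log below; `parse_checkpoint_name` and `Checkpoint` are hypothetical helpers, not part of Tesseract:

```python
import re
from typing import NamedTuple


class Checkpoint(NamedTuple):
    prefix: str        # value passed as --model_output, e.g. "norlayer"
    char_error: float  # character error rate in percent
    iteration: int     # iteration number at which the model was written


# Assumed pattern: <prefix><error>_<iteration>.lstm
# A prefix that itself ends in a digit would be ambiguous under this pattern.
_NAME_RE = re.compile(
    r"^(?P<prefix>.*?)(?P<err>\d+(?:\.\d+)?)_(?P<iter>\d+)\.lstm$"
)


def parse_checkpoint_name(name: str) -> Checkpoint:
    """Split a checkpoint file name into prefix, error rate and iteration."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised checkpoint name: {name!r}")
    return Checkpoint(
        match["prefix"], float(match["err"]), int(match["iter"])
    )
```

Sorting a directory listing by the `char_error` field of the parsed names would then give the best checkpoint written so far.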


ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Jan 6, 2017 at 5:59 PM, ShreeDevi Kumar <shreesh...@gmail.com>
wrote:

> @Peter, have you tried the 4.0.0alpha version yet?
>
> @Ludvig F. Aarstad - The "add a layer" training worked for adding 'Æ'. I will
> upload the new traineddata so that you can test it. You will need the
> 4.0.0alpha version for testing.
>
> Here are a couple of the training TIFs and the OCRed text.
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Fri, Jan 6, 2017 at 5:01 PM, Peter <pe...@peterkrantz.se> wrote:
>
>>
>>
>> On Thursday, 5 January 2017 at 04:39:01 UTC+1, shree wrote:
>>>
>>> Ray is planning to retrain the languages for the new 4.0.0 version
>>> sometime in January. So it would be helpful if you could open an issue on
>>> https://github.com/tesseract-ocr/langdata/issues with this information.
>>>
>>
>> Is it possible to contribute training data for this effort? I realise
>> Swedish will not be at the top of the list, but I think it would be easy to
>> involve some of the research community here in contributing training data
>> if it could improve the language model.
>>
>> /Peter
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/9788db26-bb8a-4861-b29e-80db2b5a687f%40googlegroups.com.
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

---------------------------------
// Error rate at which to transition to stage 1.
const double kStageTransitionThreshold = 10.0;

// Appends <intro_str> iteration learning_iteration()/training_iteration()/
// sample_iteration() to the log_msg.

  // Delta error is the fraction of timesteps with >0.5 error in the top choice
  // score. If zero, then the top choice characters are guaranteed correct,
  // even when there is residue in the RMS error.

  // Skip ratio measures the difference between sample_iteration_ and
  // training_iteration_, which reflects the number of unusable samples,
  // usually due to unencodable truth text, or the text not fitting in the
  // space for the output.
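
The quantities described in these comments can be illustrated with a short Python sketch. It follows the wording of the comments only and is not the Tesseract implementation; in particular, normalising the skip ratio by the number of samples seen and using a strict `<` for the stage transition are assumptions, and all function names are hypothetical.

```python
from typing import Sequence

# Mirrors kStageTransitionThreshold from the excerpt above (percent).
K_STAGE_TRANSITION_THRESHOLD = 10.0


def delta_error(top_choice_errors: Sequence[float]) -> float:
    """Fraction of timesteps whose top-choice score error exceeds 0.5."""
    if not top_choice_errors:
        return 0.0
    bad = sum(1 for err in top_choice_errors if abs(err) > 0.5)
    return bad / len(top_choice_errors)


def skip_ratio(sample_iteration: int, training_iteration: int) -> float:
    """Unusable samples as a percentage of all samples seen.

    Assumption: the difference between the two counters is divided by
    sample_iteration; the comment above only says the ratio "measures
    the difference".
    """
    if sample_iteration <= 0:
        return 0.0
    skipped = sample_iteration - training_iteration
    return 100.0 * skipped / sample_iteration


def reaches_stage_1(char_error_percent: float) -> bool:
    """Assumed rule: move to stage 1 once the error is below the threshold."""
    return char_error_percent < K_STAGE_TRANSITION_THRESHOLD
```

With these definitions, equal sample and training counters (as in "933/1000/1000" in the log) give a skip ratio of 0%, and an error of 7.89% is below the threshold while 10.061% is not, which is consistent with where the log reports "Transitioned to stage 1".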

-------------------------------
$ mkdir -p ~/tesstutorial/nor_layer
$ combine_tessdata -e ../tessdata/nor.traineddata \
>   ~/tesstutorial/nor_layer/nor.lstm
Extracting tessdata components from ../tessdata/nor.traineddata
Wrote /home/shree/tesstutorial/nor_layer/nor.lstm
$  lstmtraining -U ~/tesstutorial/nor/nor.unicharset \
>   --script_dir ../langdata  --debug_interval 0 \
>   --continue_from ~/tesstutorial/nor_layer/nor.lstm \
>   --append_index 5 --net_spec '[Lfx256 O1c105]' \
>   --model_output ~/tesstutorial/nor_layer/norlayer \
>   --train_listfile ~/tesstutorial/nor/nor.training_files.txt \
>   --max_iterations 50000
Loaded file /home/shree/tesstutorial/nor_layer/nor.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from /home/shree/tesstutorial/nor_layer/nor.lstm
Other case É of é is not in unicharset
Other case Ö of ö is not in unicharset
Other case Ä of ä is not in unicharset
Appending a new network to an old one!!Setting unichar properties
Setting properties for script Common
Setting properties for script Latin
Warning: given outputs 105 not equal to unicharset of 101.
Num outputs,weights in serial:
  Lfx256:256, 394240
  Fc101:101, 25957
Total weights = 420197
Built network:[1,0,0,1[C5,5Ft16]Mp3,3Lfys64Lfx128Lrx128Lfx256Fc101] from request [Lfx256 O1c105]
Training parameters:
  Debug interval = 0, weights = 0.1, learning rate = 0.0001, momentum=0.9
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Arial_Bold.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Arial.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Arial_Bold_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Arial_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Century_Schoolbook_L_Medium.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Century_Schoolbook_L_Bold.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Century_Schoolbook_L_Bold_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Courier_New_Bold.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Century_Schoolbook_L_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Courier_New_Bold_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Courier_New.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Courier_New_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Georgia_Bold.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Georgia_Bold_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Georgia.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Georgia_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Times_New_Roman_Bold.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Times_New_Roman_Bold_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Times_New_Roman.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Times_New_Roman_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Trebuchet_MS_Bold.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Trebuchet_MS_Bold_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Trebuchet_MS.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Trebuchet_MS_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.URW_Bookman_L_Bold.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.URW_Bookman_L_Bold_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.URW_Bookman_L_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Verdana_Bold.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Verdana_Bold_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Verdana.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Verdana_Italic.exp0.lstmf
At iteration 100/100/100, Mean rms=6.173%, delta=57.061%, char train=121.258%, word train=100%, skip ratio=0%,  New worst char error = 121.258 wrote checkpoint.

At iteration 200/200/200, Mean rms=5.509%, delta=43.627%, char train=102.285%, word train=99.873%, skip ratio=0%,  New worst char error = 102.285 wrote checkpoint.

2 Percent improvement time=300, best error was 100 @ 0
At iteration 300/300/300, Mean rms=4.804%, delta=33.195%, char train=83.028%, word train=95.586%, skip ratio=0%,  New best char error = 83.028 wrote checkpoint.

2 Percent improvement time=100, best error was 83.028 @ 300
At iteration 400/400/400, Mean rms=4.237%, delta=26.762%, char train=68.349%, word train=85.796%, skip ratio=0%,  New best char error = 68.349 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer68.349_400.lstm wrote checkpoint.

2 Percent improvement time=100, best error was 68.349 @ 400
At iteration 500/500/500, Mean rms=3.798%, delta=22.466%, char train=57.949%, word train=76.222%, skip ratio=0%,  New best char error = 57.949 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer57.949_500.lstm wrote checkpoint.

2 Percent improvement time=100, best error was 57.949 @ 500
At iteration 600/600/600, Mean rms=3.478%, delta=19.53%, char train=50.964%, word train=69.69%, skip ratio=0%,  New best char error = 50.964 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer50.964_600.lstm wrote checkpoint.

2 Percent improvement time=97, best error was 50.964 @ 600
At iteration 697/700/700, Mean rms=3.217%, delta=17.259%, char train=45.254%, word train=63.256%, skip ratio=0%,  New best char error = 45.254 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer45.254_697.lstm wrote checkpoint.

2 Percent improvement time=91, best error was 45.254 @ 697
At iteration 788/800/800, Mean rms=2.98%, delta=15.395%, char train=40.512%, word train=57.614%, skip ratio=0%,  New best char error = 40.512 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer40.512_788.lstm wrote checkpoint.

2 Percent improvement time=78, best error was 40.512 @ 788
At iteration 866/900/900, Mean rms=2.785%, delta=13.893%, char train=36.821%, word train=53.239%, skip ratio=0%,  New best char error = 36.821 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer36.821_866.lstm wrote checkpoint.

2 Percent improvement time=67, best error was 36.821 @ 866
At iteration 933/1000/1000, Mean rms=2.618%, delta=12.657%, char train=33.723%, word train=49.372%, skip ratio=0%,  New best char error = 33.723 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer33.723_933.lstm wrote checkpoint.

2 Percent improvement time=65, best error was 33.723 @ 933
At iteration 998/1100/1100, Mean rms=2.105%, delta=7.091%, char train=22.057%, word train=40.57%, skip ratio=0%,  New best char error = 22.057 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer22.057_998.lstm wrote checkpoint.

2 Percent improvement time=67, best error was 22.057 @ 998
At iteration 1065/1200/1200, Mean rms=1.717%, delta=4.18%, char train=14.09%, word train=31.585%, skip ratio=0%,  New best char error = 14.09 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer14.09_1065.lstm wrote checkpoint.

2 Percent improvement time=61, best error was 14.09 @ 1065
At iteration 1126/1300/1300, Mean rms=1.469%, delta=3.05%, char train=10.061%, word train=23.982%, skip ratio=0%,  New best char error = 10.061 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer10.061_1126.lstm wrote checkpoint.

2 Percent improvement time=46, best error was 10.061 @ 1126
At iteration 1172/1400/1400, Mean rms=1.296%, delta=2.367%, char train=7.89%, word train=18.967%, skip ratio=0%,  New best char error = 7.89
Transitioned to stage 1 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer7.89_1172.lstm wrote checkpoint.

2 Percent improvement time=90, best error was 10.061 @ 1126
At iteration 1216/1500/1500, Mean rms=1.165%, delta=1.895%, char train=6.419%, word train=15.675%, skip ratio=0%,  New best char error = 6.419 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer6.419_1216.lstm wrote checkpoint.
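
To follow the error rate over time, the "At iteration" lines of a log like the one above can be parsed. A minimal Python sketch, assuming the log text is available as a string and that the three slash-separated numbers are the learning, training and sample iterations named in the source comment quoted earlier; `parse_training_log` and `IterationRecord` are hypothetical names. The pattern tolerates log lines that were wrapped by the mail client.

```python
import re
from typing import List, NamedTuple


class IterationRecord(NamedTuple):
    learning_iteration: int
    training_iteration: int
    sample_iteration: int
    char_error: float  # "char train" value, percent
    word_error: float  # "word train" value, percent


# DOTALL lets ".*?" and "\s*" span line breaks inside a wrapped entry.
_ENTRY_RE = re.compile(
    r"At iteration (\d+)/(\d+)/(\d+),"
    r".*?char train=([\d.]+)%,"
    r"\s*word train=([\d.]+)%",
    re.DOTALL,
)


def parse_training_log(text: str) -> List[IterationRecord]:
    """Return one record per 'At iteration' entry, in log order."""
    return [
        IterationRecord(int(a), int(b), int(c), float(d), float(e))
        for a, b, c, d, e in _ENTRY_RE.findall(text)
    ]


def best_record(records: List[IterationRecord]) -> IterationRecord:
    """Record with the lowest character error rate."""
    return min(records, key=lambda r: r.char_error)
```

Calling `best_record(parse_training_log(log_text))` on the excerpt above would select the entry at iteration 1216/1500/1500, the last one shown.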

------------------------
Created the traineddata from norlayer0.853_1615.lstm using the following commands:

lstmtraining --model_output ~/tesstutorial/nor_layer/norlayer.lstm \
  --continue_from ~/tesstutorial/nor_layer/norlayer0.853_1615.lstm \
  --stop_training

cp ../tessdata/nor.traineddata ./tessdata

combine_tessdata -o ./tessdata/nor.traineddata \
  ~/tesstutorial/nor_layer/norlayer.lstm \
  ~/tesstutorial/nor/nor.lstm-number-dawg \
  ~/tesstutorial/nor/nor.lstm-punc-dawg \
  ~/tesstutorial/nor/nor.lstm-word-dawg
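
For repeated runs, the two tool invocations above could be assembled from a script. A minimal Python sketch that only builds the argument lists, using no flags other than the ones shown in the commands above; the helper names are hypothetical, and the `cp` step is left to the caller:

```python
import os
from typing import List


def _expand(path: str) -> str:
    # Unlike a shell, subprocess does not expand "~", so do it explicitly.
    return os.path.expanduser(path)


def stop_training_command(checkpoint: str, model_output: str) -> List[str]:
    """Arguments that convert a training checkpoint into a final model."""
    return [
        "lstmtraining",
        "--model_output", _expand(model_output),
        "--continue_from", _expand(checkpoint),
        "--stop_training",
    ]


def combine_command(traineddata: str, components: List[str]) -> List[str]:
    """Arguments that overwrite components inside an existing traineddata."""
    return [
        "combine_tessdata",
        "-o", _expand(traineddata),
        *[_expand(c) for c in components],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`, which raises an exception if the tool exits with a non-zero status.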


