I have uploaded the modified nor.traineddata at https://github.com/Shreeshrii/tessdata4alpha/blob/master/nor.traineddata
See the attached log and info file for the commands used in training. Training ran for about 9 hours on my PC and reached only about 1700 iterations before the PC froze. After rebooting, I created the traineddata from norlayer0.853_1615.lstm, i.e. the checkpoint with a 0.853% character error rate at iteration 1615.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Jan 6, 2017 at 5:59 PM, ShreeDevi Kumar <shreesh...@gmail.com> wrote:

> @Peter, have you tried the 4.0.0alpha version yet?
>
> @Ludvig F. Aarstad - Training by adding a layer worked for adding 'Æ'. I will
> upload the new traineddata so that you can test it. You will need the
> 4.0.0alpha version for testing.
>
> Here are a couple of the training tifs and the OCRed text.
>
> ShreeDevi
>
> On Fri, Jan 6, 2017 at 5:01 PM, Peter <pe...@peterkrantz.se> wrote:
>
>> On Thursday, 5 January 2017 at 04:39:01 UTC+1, shree wrote:
>>>
>>> Ray is planning to retrain the languages for the new 4.0.0 version
>>> sometime in January. So it would be helpful if you could open an issue on
>>> https://github.com/tesseract-ocr/langdata/issues with this information.
>>
>> Is it possible to contribute training data for this effort? I realise
>> Swedish will not be at the top of the list, but I think it would be easy to
>> involve some of the research community here in contributing training data
>> if it could improve the language model.
>>
>> /Peter
---------------------------------
// Error rate at which to transition to stage 1.
const double kStageTransitionThreshold = 10.0;

// Appends <intro_str> iteration learning_iteration()/training_iteration()/
// sample_iteration() to the log_msg.

// Delta error is the fraction of timesteps with >0.5 error in the top choice
// score. If zero, then the top choice characters are guaranteed correct,
// even when there is residue in the RMS error.

// Skip ratio measures the difference between sample_iteration_ and
// training_iteration_, which reflects the number of unusable samples,
// usually due to unencodable truth text, or the text not fitting in the
// space for the output.
-------------------------------
$ mkdir -p ~/tesstutorial/nor_layer
$ combine_tessdata -e ../tessdata/nor.traineddata \
>   ~/tesstutorial/nor_layer/nor.lstm
Extracting tessdata components from ../tessdata/nor.traineddata
Wrote /home/shree/tesstutorial/nor_layer/nor.lstm
$ lstmtraining -U ~/tesstutorial/nor/nor.unicharset \
>   --script_dir ../langdata --debug_interval 0 \
>   --continue_from ~/tesstutorial/nor_layer/nor.lstm \
>   --append_index 5 --net_spec '[Lfx256 O1c105]' \
>   --model_output ~/tesstutorial/nor_layer/norlayer \
>   --train_listfile ~/tesstutorial/nor/nor.training_files.txt \
>   --max_iterations 50000
Loaded file /home/shree/tesstutorial/nor_layer/nor.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from /home/shree/tesstutorial/nor_layer/nor.lstm
Other case É of é is not in unicharset
Other case Ö of ö is not in unicharset
Other case Ä of ä is not in unicharset
Appending a new network to an old one!!Setting unichar properties
Setting properties for script Common
Setting properties for script Latin
Warning: given outputs 105 not equal to unicharset of 101.
Num outputs,weights in serial:
  Lfx256:256, 394240
  Fc101:101, 25957
Total weights = 420197
Built network:[1,0,0,1[C5,5Ft16]Mp3,3Lfys64Lfx128Lrx128Lfx256Fc101] from request [Lfx256 O1c105]
Training parameters:
  Debug interval = 0, weights = 0.1, learning rate = 0.0001, momentum=0.9
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Arial_Bold.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Arial.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Arial_Bold_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Arial_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Century_Schoolbook_L_Medium.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Century_Schoolbook_L_Bold.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Century_Schoolbook_L_Bold_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Courier_New_Bold.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Century_Schoolbook_L_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Courier_New_Bold_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Courier_New.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Courier_New_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Georgia_Bold.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Georgia_Bold_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Georgia.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Georgia_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Times_New_Roman_Bold.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Times_New_Roman_Bold_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Times_New_Roman.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Times_New_Roman_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Trebuchet_MS_Bold.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Trebuchet_MS_Bold_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Trebuchet_MS.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Trebuchet_MS_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.URW_Bookman_L_Bold.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.URW_Bookman_L_Bold_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.URW_Bookman_L_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Verdana_Bold.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Verdana_Bold_Italic.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Verdana.exp0.lstmf
Loaded 241/241 pages (1-241) of document /home/shree/tesstutorial/nor/nor.Verdana_Italic.exp0.lstmf
At iteration 100/100/100, Mean rms=6.173%, delta=57.061%, char train=121.258%, word train=100%, skip ratio=0%, New worst char error = 121.258 wrote checkpoint.
At iteration 200/200/200, Mean rms=5.509%, delta=43.627%, char train=102.285%, word train=99.873%, skip ratio=0%, New worst char error = 102.285 wrote checkpoint.
2 Percent improvement time=300, best error was 100 @ 0
At iteration 300/300/300, Mean rms=4.804%, delta=33.195%, char train=83.028%, word train=95.586%, skip ratio=0%, New best char error = 83.028 wrote checkpoint.
2 Percent improvement time=100, best error was 83.028 @ 300
At iteration 400/400/400, Mean rms=4.237%, delta=26.762%, char train=68.349%, word train=85.796%, skip ratio=0%, New best char error = 68.349 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer68.349_400.lstm wrote checkpoint.
2 Percent improvement time=100, best error was 68.349 @ 400
At iteration 500/500/500, Mean rms=3.798%, delta=22.466%, char train=57.949%, word train=76.222%, skip ratio=0%, New best char error = 57.949 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer57.949_500.lstm wrote checkpoint.
2 Percent improvement time=100, best error was 57.949 @ 500
At iteration 600/600/600, Mean rms=3.478%, delta=19.53%, char train=50.964%, word train=69.69%, skip ratio=0%, New best char error = 50.964 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer50.964_600.lstm wrote checkpoint.
2 Percent improvement time=97, best error was 50.964 @ 600
At iteration 697/700/700, Mean rms=3.217%, delta=17.259%, char train=45.254%, word train=63.256%, skip ratio=0%, New best char error = 45.254 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer45.254_697.lstm wrote checkpoint.
2 Percent improvement time=91, best error was 45.254 @ 697
At iteration 788/800/800, Mean rms=2.98%, delta=15.395%, char train=40.512%, word train=57.614%, skip ratio=0%, New best char error = 40.512 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer40.512_788.lstm wrote checkpoint.
2 Percent improvement time=78, best error was 40.512 @ 788
At iteration 866/900/900, Mean rms=2.785%, delta=13.893%, char train=36.821%, word train=53.239%, skip ratio=0%, New best char error = 36.821 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer36.821_866.lstm wrote checkpoint.
2 Percent improvement time=67, best error was 36.821 @ 866
At iteration 933/1000/1000, Mean rms=2.618%, delta=12.657%, char train=33.723%, word train=49.372%, skip ratio=0%, New best char error = 33.723 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer33.723_933.lstm wrote checkpoint.
2 Percent improvement time=65, best error was 33.723 @ 933
At iteration 998/1100/1100, Mean rms=2.105%, delta=7.091%, char train=22.057%, word train=40.57%, skip ratio=0%, New best char error = 22.057 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer22.057_998.lstm wrote checkpoint.
2 Percent improvement time=67, best error was 22.057 @ 998
At iteration 1065/1200/1200, Mean rms=1.717%, delta=4.18%, char train=14.09%, word train=31.585%, skip ratio=0%, New best char error = 14.09 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer14.09_1065.lstm wrote checkpoint.
2 Percent improvement time=61, best error was 14.09 @ 1065
At iteration 1126/1300/1300, Mean rms=1.469%, delta=3.05%, char train=10.061%, word train=23.982%, skip ratio=0%, New best char error = 10.061 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer10.061_1126.lstm wrote checkpoint.
2 Percent improvement time=46, best error was 10.061 @ 1126
At iteration 1172/1400/1400, Mean rms=1.296%, delta=2.367%, char train=7.89%, word train=18.967%, skip ratio=0%, New best char error = 7.89 Transitioned to stage 1 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer7.89_1172.lstm wrote checkpoint.
2 Percent improvement time=90, best error was 10.061 @ 1126
At iteration 1216/1500/1500, Mean rms=1.165%, delta=1.895%, char train=6.419%, word train=15.675%, skip ratio=0%, New best char error = 6.419 wrote best model:/home/shree/tesstutorial/nor_layer/norlayer6.419_1216.lstm wrote checkpoint.
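The source comments quoted at the top of the log say that each progress entry reports learning_iteration()/training_iteration()/sample_iteration(). A minimal sketch for pulling those three counters and the "char train" error out of a saved log follows. It assumes the lstmtraining output was captured to a file, and that grep and sed accept the -o and -E options (GNU and BSD versions do). summarize_log is a hypothetical helper name, not a Tesseract tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): summarise lstmtraining
# progress entries read from stdin.
# Output columns: learning_iteration training_iteration sample_iteration char_error
summarize_log() {
  # grep -o emits one match per line, so this also works when several
  # "At iteration" entries were run together on a single line.
  grep -oE 'At iteration [0-9]+/[0-9]+/[0-9]+, Mean rms=[0-9.]+%, delta=[0-9.]+%, char train=[0-9.]+%' |
    sed -E 's|At iteration ([0-9]+)/([0-9]+)/([0-9]+),.*char train=([0-9.]+)%|\1 \2 \3 \4|'
}

# Usage (training.log is an assumed file name):
#   summarize_log < training.log
```

For the entry at iteration 697/700/700 in the log above, this prints `697 700 700 45.254`.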
------------------------
Created the traineddata from norlayer0.853_1615.lstm using the following commands:

lstmtraining --model_output ~/tesstutorial/nor_layer/norlayer.lstm \
  --continue_from ~/tesstutorial/nor_layer/norlayer0.853_1615.lstm \
  --stop_training

cp ../tessdata/nor.traineddata ./tessdata

combine_tessdata -o ./tessdata/nor.traineddata \
  ~/tesstutorial/nor_layer/norlayer.lstm \
  ~/tesstutorial/nor/nor.lstm-number-dawg \
  ~/tesstutorial/nor/nor.lstm-punc-dawg \
  ~/tesstutorial/nor/nor.lstm-word-dawg
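The "best model" files in the log are named with the character error rate and the iteration number (for example norlayer0.853_1615.lstm), so the checkpoint to finalize can be chosen from the file names alone. The sketch below does that, assuming the names follow the pattern prefix, error, underscore, iteration, ".lstm" seen in this log. best_checkpoint is a hypothetical helper, not a Tesseract command.

```shell
#!/bin/sh
# Hypothetical helper: print the checkpoint with the lowest character error.
# Assumes file names of the form <prefix><error>_<iteration>.lstm.
# Usage: best_checkpoint <prefix> <file>...
best_checkpoint() {
  prefix=$1
  shift
  for f in "$@"; do
    name=${f##*/}            # drop the directory part
    rest=${name#"$prefix"}   # e.g. 0.853_1615.lstm
    err=${rest%%_*}          # e.g. 0.853
    case $err in
      ''|*[!0-9.]*) continue ;;  # skip names without a numeric error, e.g. norlayer.lstm
    esac
    printf '%s %s\n' "$err" "$f"
  done | LC_ALL=C sort -n | head -n 1 | cut -d' ' -f2-
}

# Usage with the directory from the commands above:
#   best=$(best_checkpoint norlayer ~/tesstutorial/nor_layer/norlayer*.lstm)
#   lstmtraining --model_output ~/tesstutorial/nor_layer/norlayer.lstm \
#     --continue_from "$best" --stop_training
```

The helper only compares strings, so it can be tried without any checkpoint files on disk.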