Hi, I'm trying to fine-tune an existing model using line images and text labels. I'm running this version:
    tesseract 4.0.0-beta.3-56-g5fda
    leptonica-1.76.0
    libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0
    Found AVX2
    Found AVX
    Found SSE

I used OCR-D to generate .lstmf files for the demo data. If I run the make command, it works fine:

    make training MODEL_NAME=prova

Then I isolated this command from the build, and it also works fine:

    lstmtraining \
      --traineddata data/prova/prova.traineddata \
      --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
      --model_output data/checkpoints/prova \
      --learning_rate 20e-4 \
      --train_listfile data/list.train \
      --eval_listfile data/list.eval \
      --max_iterations 10000

Now I'm trying to modify it to fine-tune the existing eng model. I made a few attempts, all ending in different errors (see the attached file for the full output).

I used

    combine_tessdata -e /usr/local/share/tessdata/eng.traineddata extracted/eng.lstm

to extract the eng.lstm model. This seems to work, but I'm not sure it is correct. Then I ran:

    lstmtraining \
      --continue_from extracted/eng.lstm \
      --traineddata data/prova/prova.traineddata \
      --old_traineddata extracted/eng.traineddata \
      --model_output data/checkpoints/prova \
      --learning_rate 20e-4 \
      --train_listfile data/list.train \
      --eval_listfile data/list.eval \
      --max_iterations 10000

(extracted/eng.traineddata is just a copy of eng.traineddata.)

Training resumes with exactly the RMS of prova_checkpoint (6%), so it looks like it is training from that checkpoint, not from eng.lstm. Is this correct? What should I change?

I'm following this guide: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters

I think --continue_from and --traineddata should refer to the eng model and --old_traineddata should point to prova.traineddata, but if I do that I get a segmentation fault:

    [...]
    !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
    !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
    Segmentation fault

What am I missing?

Thanks, bye
Lorenzo
aaa@host .../DATA/DeepLearning/ocrd-train $ lstmtraining \
> --continue_from extracted/eng.lstm \
> --traineddata extracted/eng.traineddata \
> --model_output data/checkpoints/prova \
> --learning_rate 20e-4 \
> --train_listfile data/list.train \
> --eval_listfile data/list.eval \
> --max_iterations 10000
Loaded file data/checkpoints/prova_checkpoint, unpacking...
Code range changed from 60 to 111!
Must supply the old traineddata for code conversion!
Loaded file extracted/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from extracted/eng.lstm
Loaded 1/1 pages (1-1) of document data/train/hoffmann_nachtstuecke01_1817_0204_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0175_017.lstmf
Loaded 1/1 pages (1-1) of document data/train/heine_reisebilder02_1827_0056_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0040_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/spielhagen_problematische02_1861_0151_022.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0127_011.lstmf
Loaded 1/1 pages (1-1) of document data/train/wackenroder_herzensergiessungen_1797_0051_001.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0304_002.lstmf
Loaded 1/1 pages (1-1) of document data/train/bismarck_erinnerungen02_1898_0150_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/paul_flegeljahre01_1804_0142_005.lstmf
Loaded 1/1 pages (1-1) of document data/train/alexis_ruhe01_1852_0311_011.lstmf
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
Segmentation fault
aaa@host .../DATA/DeepLearning/ocrd-train $
aaa@host .../DATA/DeepLearning/ocrd-train $ # attempt 1.1
aaa@host .../DATA/DeepLearning/ocrd-train $ lstmtraining \
> --continue_from data/checkpoints/prova_checkpoint \
> --traineddata extracted/eng.traineddata \
> --old_traineddata data/prova/prova.traineddata \
> --model_output data/checkpoints/prova \
> --learning_rate 20e-4 \
> --train_listfile data/list.train \
> --eval_listfile data/list.eval \
> --max_iterations 10000
Loaded file data/checkpoints/prova_checkpoint, unpacking...
Code range changed from 60 to 111!
Must supply the old traineddata for code conversion!
Loaded file data/checkpoints/prova_checkpoint, unpacking...
Code range changed from 60 to 111!
Num (Extended) outputs,weights in Series:
  1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  Lfys48:48, 12480
  Lfx96:96, 55680
  Lrx96:96, 74112
  Lfx256:256, 361472
  Fc111:111, 28527
Total weights = 532431
Previous null char=59 mapped to 110
Continuing from data/checkpoints/prova_checkpoint
Loaded 1/1 pages (1-1) of document data/train/hoffmann_nachtstuecke01_1817_0204_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0040_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/heine_reisebilder02_1827_0056_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/spielhagen_problematische02_1861_0151_022.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0127_011.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0175_017.lstmf
Loaded 1/1 pages (1-1) of document data/train/wackenroder_herzensergiessungen_1797_0051_001.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0304_002.lstmf
Loaded 1/1 pages (1-1) of document data/train/bismarck_erinnerungen02_1898_0150_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/paul_flegeljahre01_1804_0142_005.lstmf
Loaded 1/1 pages (1-1) of document data/train/alexis_ruhe01_1852_0311_011.lstmf
Loaded 1/1 pages (1-1) of document data/train/eichendorff_taugenichts_1826_0036_001.lstmf
Segmentation fault
aaa@host .../DATA/DeepLearning/ocrd-train $
aaa@host .../DATA/DeepLearning/ocrd-train $ # attempt 2, from eng.traineddata
aaa@host .../DATA/DeepLearning/ocrd-train $ lstmtraining \
> --continue_from extracted/eng.lstm \
> --traineddata extracted/eng.traineddata \
> --model_output data/checkpoints/prova \
> --learning_rate 20e-4 \
> --train_listfile data/list.train \
> --eval_listfile data/list.eval \
> --max_iterations 10000
Loaded file data/checkpoints/prova_checkpoint, unpacking...
Code range changed from 60 to 111!
Must supply the old traineddata for code conversion!
Loaded file extracted/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from extracted/eng.lstm
Loaded 1/1 pages (1-1) of document data/train/hoffmann_nachtstuecke01_1817_0204_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/heine_reisebilder02_1827_0056_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/spielhagen_problematische02_1861_0151_022.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0127_011.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0040_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0175_017.lstmf
Loaded 1/1 pages (1-1) of document data/train/wackenroder_herzensergiessungen_1797_0051_001.lstmf
Loaded 1/1 pages (1-1) of document data/train/bismarck_erinnerungen02_1898_0150_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0304_002.lstmf
Loaded 1/1 pages (1-1) of document data/train/paul_flegeljahre01_1804_0142_005.lstmf
Loaded 1/1 pages (1-1) of document data/train/alexis_ruhe01_1852_0311_011.lstmf
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
Segmentation fault
aaa@host .../DATA/DeepLearning/ocrd-train $
aaa@host .../DATA/DeepLearning/ocrd-train $ # attempt 3, with old_traineddata (this works but uses the prova checkpoint)
aaa@host .../DATA/DeepLearning/ocrd-train $
aaa@host .../DATA/DeepLearning/ocrd-train $ lstmtraining \
> --continue_from extracted/eng.lstm \
> --traineddata data/prova/prova.traineddata \
> --old_traineddata extracted/eng.traineddata \
> --model_output data/checkpoints/prova \
> --learning_rate 20e-4 \
> --train_listfile data/list.train \
> --eval_listfile data/list.eval \
> --max_iterations 10000
Loaded file data/checkpoints/prova_checkpoint, unpacking...
Successfully restored trainer from data/checkpoints/prova_checkpoint
Loaded 1/1 pages (1-1) of document data/train/hoffmann_nachtstuecke01_1817_0204_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0040_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/heine_reisebilder02_1827_0056_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/spielhagen_problematische02_1861_0151_022.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0127_011.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0175_017.lstmf
Loaded 1/1 pages (1-1) of document data/train/wackenroder_herzensergiessungen_1797_0051_001.lstmf
Loaded 1/1 pages (1-1) of document data/train/bismarck_erinnerungen02_1898_0150_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0304_002.lstmf
Loaded 1/1 pages (1-1) of document data/train/paul_flegeljahre01_1804_0142_005.lstmf
Loaded 1/1 pages (1-1) of document data/train/menzel_literatur01_1828_0165_021.lstmf
Loaded 1/1 pages (1-1) of document data/train/alexis_ruhe01_1852_0311_011.lstmf
Loaded 1/1 pages (1-1) of document data/train/wackenroder_herzensergiessungen_1797_0204_018.lstmf
Loaded 1/1 pages (1-1) of document data/train/rosenkranz_aesthetik_1853_0167_017.lstmf
Loaded 1/1 pages (1-1) of document data/train/frapan_bittersuess_1891_0256_005.lstmf
Loaded 1/1 pages (1-1) of document data/train/clauren_liebe_1827_0205_021.lstmf
Loaded 1/1 pages (1-1) of document data/train/gutzkow_wally_1835_0143_007.lstmf
Loaded 1/1 pages (1-1) of document data/train/paul_flegeljahre01_1804_0057_011.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0024_023.lstmf
Loaded 1/1 pages (1-1) of document data/train/perthes_buchhandel_1816_0012_016.lstmf
Loaded 1/1 pages (1-1) of document data/train/poersch_gewerkschaftsbewegung_1897_0018_008.lstmf
^C # this works, stopped
aaa@host .../DATA/DeepLearning/ocrd-train $ # attempt 4
aaa@host .../DATA/DeepLearning/ocrd-train $ lstmtraining \
> --continue_from extracted/eng.lstm \
> --old_traineddata data/prova/prova.traineddata \
> --traineddata extracted/eng.traineddata \
> --model_output data/checkpoints/prova \
> --learning_rate 20e-4 \
> --train_listfile data/list.train \
> --eval_listfile data/list.eval \
> --max_iterations 10000
Loaded file data/checkpoints/prova_checkpoint, unpacking...
Code range changed from 60 to 111!
Must supply the old traineddata for code conversion!
Loaded file extracted/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 111 to 111!
Num (Extended) outputs,weights in Series:
  1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  Lfys64:64, 20736
  Lfx96:96, 61824
  Lrx96:96, 74112
  Lfx512:512, 1247232
  Fc111:111, 56943
Total weights = 1461007
Previous null char=110 mapped to 110
Continuing from extracted/eng.lstm
Loaded 1/1 pages (1-1) of document data/train/hoffmann_nachtstuecke01_1817_0204_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/spielhagen_problematische02_1861_0151_022.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0127_011.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0040_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/heine_reisebilder02_1827_0056_003.lstmf
Loaded 1/1 pages (1-1) of document data/train/wienbarg_feldzuege_1834_0175_017.lstmf
Loaded 1/1 pages (1-1) of document data/train/bismarck_erinnerungen02_1898_0150_012.lstmf
Loaded 1/1 pages (1-1) of document data/train/keller_sinngedicht_1882_0304_002.lstmf
Loaded 1/1 pages (1-1) of document data/train/wackenroder_herzensergiessungen_1797_0051_001.lstmf
Loaded 1/1 pages (1-1) of document data/train/paul_flegeljahre01_1804_0142_005.lstmf
Loaded 1/1 pages (1-1) of document data/train/alexis_ruhe01_1852_0311_011.lstmf
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
Segmentation fault
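Every run in the log above starts with "Loaded file data/checkpoints/prova_checkpoint, unpacking...", even when --continue_from points at extracted/eng.lstm, and the third attempt reports "Successfully restored trainer from data/checkpoints/prova_checkpoint". If lstmtraining really does try the checkpoint for the --model_output prefix first, a quick check before each run would show which model training will actually start from. This is a minimal POSIX shell sketch; the function name checkpoint_status is hypothetical, and the lookup order it assumes is inferred from the log, not taken from the Tesseract documentation.

```shell
#!/bin/sh
# Sketch: report whether a checkpoint already exists for a given
# --model_output prefix before starting lstmtraining.
# Assumption (inferred from the log, not from the Tesseract docs):
# lstmtraining first tries to load "<model_output>_checkpoint" and only
# falls back to --continue_from when that load fails.
checkpoint_status() {
  # e.g. data/checkpoints/prova -> data/checkpoints/prova_checkpoint
  ckpt="${1}_checkpoint"
  if [ -f "$ckpt" ]; then
    printf 'found %s\n' "$ckpt"
  else
    printf 'none\n'
  fi
}
```

For example, "checkpoint_status data/checkpoints/prova" would be expected to print "found data/checkpoints/prova_checkpoint" in the setup above; under the stated assumption, a run with that --model_output would then resume from the old prova checkpoint rather than from eng.lstm.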

