<https://stackoverflow.com/posts/79803449/timeline>
I need to continue training the default eng model so that it can also recognize new characters. I created the box files and lstmf files, and then ran this command:

    lstmtraining \
      --model_output output/eng_latin \
      --continue_from "/c/Program Files/Tesseract-OCR/tessdata/eng.lstm" \
      --append_index 5 \
      --net_spec "[Lfx192 O1c129]" \
      --traineddata "/c/Program Files/Tesseract-OCR/tessdata/eng.traineddata" \
      --train_listfile training/training_files.txt \
      --max_iterations 400

It fails with the following output:

    Loaded file C:/Program Files/Tesseract-OCR/tessdata/eng.lstm, unpacking...
    Warning: LSTMTrainer deserialized an LSTMRecognizer!
    Continuing from C:/Program Files/Tesseract-OCR/tessdata/eng.lstm
    Appending a new network to an old one!!Warning: given outputs 129 not equal to unicharset of 111.
    Num outputs,weights in Series:
      Lfx192:192, 221952
      Fc111:111, 21423
    Total weights = 243375
    Built network:[1,36,0,1[C3,3Ft16]Mp3,3TxyLfys64Lfx96RxLrx96Lfx192Fc111] from request [Lfx192 O1c129]
    Training parameters:
      Debug interval = 0, weights = 0.1, learning rate = 0.001, momentum=0.5
    null char=110
    Deserialize header failed: 1.lstmf
    Deserialize header failed: 2.lstmf
    Deserialize header failed: 3.lstmf
    Deserialize header failed: 4.lstmf
    Deserialize header failed: 5.lstmf
    Load of page 0 failed!
    Load of images failed!!

Files data: https://wormhole.app/X6mPda#lT3aG2Jm9u2QquNRyIruMA

Note: I am on Windows.
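The log prints bare names such as 1.lstmf, so one thing that may be worth ruling out is whether the entries in training/training_files.txt resolve to real, non-empty files from the directory where lstmtraining is started. The following is only a minimal Python sketch of such a check, not a diagnosis: it assumes the list file holds one path per line, and the function name check_listfile and its base_dir parameter are made up for illustration. It does not validate the .lstmf contents themselves.

```python
from pathlib import Path


def check_listfile(listfile, base_dir="."):
    """Return (entry, problem) pairs for list entries that look unusable.

    Hypothetical helper: assumes one path per line in the list file.
    Relative entries are resolved against base_dir.
    """
    problems = []
    # utf-8-sig also accepts a leading BOM, which some Windows editors add.
    text = Path(listfile).read_text(encoding="utf-8-sig")
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        path = Path(entry)
        if not path.is_absolute():
            path = Path(base_dir) / path
        if not path.is_file():
            problems.append((entry, "not found"))
        elif path.stat().st_size == 0:
            problems.append((entry, "empty file"))
    return problems
```

Calling check_listfile("training/training_files.txt") from the directory where the command is run returns an empty list when every entry points at an existing, non-empty file. Any pair it does return names an entry that would need a corrected path or a regenerated file.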
<<attachment: training.zip>>

