Good morning everyone,

First of all, I found a similar problem in this post, although the solutions there didn't seem to help me: https://groups.google.com/forum/#!msg/tesseract-ocr/O8EEFSSj7_I/aRCIzGbvAgAJ
So the question is: after various iterations on hundreds of pages, shouldn't the output traineddata size be different from the input? Mine is always the same.

I'm training using my own set of images. Here's what I'm doing:

1 - Create box files
2 - Create lstm models
3 - Start lstm training using:

    lstmtraining \
      --model_output output/por \
      --continue_from por.lstm \
      --traineddata tesseract/tessdata/por.traineddata \
      --max_iterations 400 \
      --train_listfile train/por.training_files.txt

4 - After training is complete:

    lstmtraining \
      --stop_training \
      --continue_from output/por_checkpoint \
      --traineddata tesseract/tessdata/por.traineddata \
      --model_output por_NEW.trainneddata

Am I doing something wrong? Or should the traineddata files (input and result) really have EXACTLY the same size?

Thanks in advance
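
For reference, equal sizes alone would not show whether the contents changed, so a byte-level comparison seems like a more direct check. Below is a minimal sketch of how the two files could be compared; `compare_files` is a hypothetical helper name (not part of Tesseract), and it only relies on the standard `wc` and `cmp` utilities.

```shell
#!/bin/sh
# compare_files A B
# Hypothetical helper: prints the size of each file in bytes, then a final
# line saying whether the two files are byte-identical.
compare_files() {
    a=$1
    b=$2

    # "wc -c < file" prints only the byte count, without the file name.
    size_a=$(wc -c < "$a")
    size_b=$(wc -c < "$b")
    echo "size of $a: $((size_a)) bytes"
    echo "size of $b: $((size_b)) bytes"

    # "cmp -s" is silent and exits with 0 only if every byte matches.
    if cmp -s "$a" "$b"; then
        echo "identical"
    else
        echo "different"
    fi
}
```

With the paths from the commands above, it would be called as `compare_files tesseract/tessdata/por.traineddata por_NEW.trainneddata` (the second name is the output file exactly as written in step 4). If the last line printed is "different" while the sizes match, the training did change the file contents even though the size stayed the same.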