Hello all, I want to train the Tesseract 4.0 LSTM engine on receipts. My questions relate to:
1. Training based on images
2. Image processing
3. Adding new words to the dictionary

- I have read the documentation, and I think the best option is *fine-tuning*, so I need to provide box/tiff files before training.

- As I understand it, the command below creates the box files in a directory under /tmp. Should I edit the box files there, or should I edit them and then pass them to the command? In the latter case, how do I provide them to the command?

    training/tesstrain.sh \
      --fonts_dir /usr/share/fonts \
      --training_text ../langdata/ara/ara.training_text \
      --langdata_dir ../langdata \
      --tessdata_dir ./tessdata \
      --lang ara \
      --linedata_only \
      --noextract_font_properties \
      --exposures "0" \
      --fontlist "Arial" \
      --output_dir ~/tesstutorial/aratest

- For image processing, I am using the libraries listed in the documentation: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality. If there are other options for image processing, please tell me.

- To add new words to the dictionary, should I add them directly to ara.wordlist?

Any help is appreciated. Thank you.

--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/07d8a690-7837-40f7-8d7b-92651518ec8a%40googlegroups.com.
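For the third question (adding words to ara.wordlist), here is a minimal sketch of one way to merge new words into a word list. It assumes the file is plain UTF-8 text with one word per line, as the langdata word lists are. The function name merge_words and the file new_words.txt are hypothetical, and it is only an assumption that tesstrain.sh rebuilds the dictionary from ../langdata/ara/ara.wordlist on its next run. The sketch keeps the existing order, appends only words that are not already present, and skips blank lines. The demo runs on throwaway files in a temporary directory, not on the real langdata file.

```shell
#!/bin/sh
set -eu

# merge_words WORDLIST NEW_WORDS  (hypothetical helper)
# Appends words from NEW_WORDS (one per line) to WORDLIST, keeping the
# original order, dropping duplicates and blank/whitespace-only lines.
merge_words() {
  wordlist=$1
  new_words=$2
  tmp=$(mktemp)
  # NF skips empty lines; !seen[$0]++ keeps only the first occurrence of each word
  cat "$wordlist" "$new_words" | awk 'NF && !seen[$0]++' > "$tmp"
  # Copy back instead of mv so the original file keeps its permissions
  cat "$tmp" > "$wordlist"
  rm -f "$tmp"
}

# Demo on throwaway files (the words are illustrative receipt terms)
demo=$(mktemp -d)
printf 'فاتورة\nمجموع\n' > "$demo/ara.wordlist"
printf 'ضريبة\nمجموع\n\n' > "$demo/new_words.txt"
merge_words "$demo/ara.wordlist" "$demo/new_words.txt"
cat "$demo/ara.wordlist"
```

To apply it to the real list, one would call merge_words ../langdata/ara/ara.wordlist new_words.txt after keeping a backup of the original file.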

