Hello All,

I want to train the Tesseract 4.0 LSTM engine on receipts. My questions 
relate to:

   1. Training based on image
   2. Image processing
   3. Add new words to the dictionary 
   

   - I have read the documentation and I think the best option is 
   *fine-tuning*, so I need to provide box/tiff pairs before training. 
   

   - I know the command below creates the box files in a directory under 
   /tmp. Should I edit the box files there, or edit them separately and 
   pass them to this command? In the second case, how do I pass them in? 




training/tesstrain.sh \
 --fonts_dir /usr/share/fonts \
 --training_text ../langdata/ara/ara.training_text \
 --langdata_dir ../langdata \
 --tessdata_dir ./tessdata \
 --lang ara \
 --linedata_only \
 --noextract_font_properties \
 --exposures "0" \
 --fontlist "Arial" \
 --output_dir ~/tesstutorial/aratest
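
To make the box-file question concrete, this is roughly how I would look for the generated files. It is only a sketch: the helper name find_box_files is hypothetical, and I am assuming the files are named <lang>.<font>.exp<N>.box and that the temporary directory still exists after the script finishes.

```shell
# Hypothetical helper: list box files for one language under a directory.
# $1 = directory to search (e.g. /tmp), $2 = language code (e.g. ara)
find_box_files() {
  # Assumes the <lang>.<font>.exp<N>.box naming pattern; unreadable
  # directories are skipped silently.
  find "$1" -type f -name "$2.*.box" 2>/dev/null | sort
}
```

I would call it as find_box_files /tmp ara and then open the listed files in a box editor.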



   - For image processing, I am using the libraries suggested in the 
   documentation: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality 
   If there are other options for image processing, please tell me.
   - For adding new words to the dictionary, should I add them directly to 
   ara.wordlist?
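
By "add them directly" I mean a plain file-level merge like the sketch below. The helper name merge_wordlist and the file new_words.txt are hypothetical, and I am assuming one word per line. Whether editing the file alone is enough for the LSTM model to pick up the words is exactly what I am unsure about.

```shell
# Hypothetical helper: append new words to an existing wordlist.
# $1 = existing wordlist (e.g. ara.wordlist), $2 = file with new words
merge_wordlist() {
  tmp=$(mktemp) || return 1
  # Keep the original order, skip blank lines, and drop duplicates,
  # then replace the wordlist with the merged result.
  awk 'NF && !seen[$0]++' "$1" "$2" > "$tmp" && mv "$tmp" "$1"
}
```

For example: merge_wordlist ../langdata/ara/ara.wordlist new_words.txt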

Any help is appreciated. Thank you.

