[tesseract-ocr] train tesseract to improve the half-width Japanese(Katakana) recognition.

Li Xianglei Wed, 08 Nov 2017 09:02:53 -0800

Hi all,
    
      I'm trying to use Tesseract to recognize Japanese text in images.
      I found that it gets poor accuracy on half-width Japanese (Katakana).
      I am trying to improve the accuracy by fine-tuning.
      I have tried both [Fine Tuning for ± a few characters] and
      [Training Just a Few Layers].
      Both seem to improve the accuracy for half-width Japanese, but they do
      a lot of harm to the recognition of normal Japanese.
      Here is how I do the fine-tuning:


   1 Add half-width Japanese to lang/jpn/jpn.training_text (cloned from
     tesseract-ocr/langdata, which seems to be training data for v3).
   2 Create the training data with tesstrain.sh.
   3 Extract the LSTM model from the installed jpn.traineddata
     (which is best/jpn.traineddata):

       combine_tessdata -e /usr/local/tesseract/share/tessdata/jpn.traineddata \
                        trainhalfwidth/jpn.lstm

   4 Run the fine-tuning:

       lstmtraining --model_output trainhalfwidth/jpnhw \
                    --continue_from trainhalfwidth/jpn.lstm \
                    --traineddata trainhalfwidth/jpn/jpn.traineddata \
                    --old_traineddata /usr/local/tesseract/share/tessdata/jpn.traineddata \
                    --train_listfile trainhalfwidth/jpn.training_files.txt \
                    --max_iterations 3600 &> trainhalfwidth/basetrain.log
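
A quick sanity check for step 1 could look like the sketch below. It is not part of the steps above. It only counts how many half-width characters the training text actually contains, so one can see whether they are a tiny fraction of the file. The function name count_halfwidth_katakana is hypothetical, and the sketch assumes a UTF-8 file, bash, and a grep that supports -a and -o (GNU or BSD grep).

```shell
#!/usr/bin/env bash
# Count half-width katakana and the related half-width punctuation
# (U+FF61..U+FF9F) in a UTF-8 text file.
#
# In UTF-8 these code points are exactly the byte sequences
#   EF BD A1..BF   (U+FF61..U+FF7F)
#   EF BE 80..9F   (U+FF80..U+FF9F)
# so matching on raw bytes avoids depending on the user's locale.
count_halfwidth_katakana() {
    local file="$1"
    # LC_ALL=C : treat the input as bytes, not locale-dependent characters.
    # -a       : never treat the file as binary.
    # -o       : print every match on its own line, so wc -l counts matches.
    LC_ALL=C grep -a -o \
        -e $'\xef\xbd[\xa1-\xbf]' \
        -e $'\xef\xbe[\x80-\x9f]' \
        -- "$file" | wc -l | tr -d ' '
}

# Example usage (path taken from step 1):
#   count_halfwidth_katakana lang/jpn/jpn.training_text
```

Comparing this number with the total character count of the file (for example from wc -m) gives a rough idea of how strongly the new characters are represented in the training text.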

  Any advice? Thank you

   P.S. It seems Ray is working on the training data for LSTM. Any news so far?

