Yes, I added half-width characters to the provided jpn.training_text and used the result as the new jpn.training_text.
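For illustration, here is one way half-width Katakana lines could be generated from existing full-width text before appending them to jpn.training_text. This is only a minimal sketch, not necessarily how the training text was actually prepared: `to_halfwidth` is a hypothetical helper (it is not part of Tesseract), and it relies on the Unicode compatibility mappings exposed by Python's `unicodedata` module.

```python
import unicodedata

# Reverse lookup built from Unicode compatibility mappings: NFKC turns each
# half-width form (U+FF61..U+FF9F) into its full-width counterpart, so
# inverting that gives a full-width -> half-width table.
_FULL_TO_HALF = {
    unicodedata.normalize("NFKC", chr(cp)): chr(cp)
    for cp in range(0xFF61, 0xFFA0)
}


def to_halfwidth(text: str) -> str:
    """Hypothetical helper: convert full-width Katakana to half-width.

    Characters outside U+30A1..U+30FC (Hiragana, Kanji, ASCII, ...) are
    left untouched. Katakana with no half-width form (e.g. ヵ, ヶ) is kept.
    """
    out = []
    for ch in text:
        if "\u30a1" <= ch <= "\u30fc":
            # NFD splits voiced kana such as ガ into base kana + combining
            # mark; each part is then mapped to its half-width form.
            for part in unicodedata.normalize("NFD", ch):
                out.append(_FULL_TO_HALF.get(part, part))
        else:
            out.append(ch)
    return "".join(out)


# Example: keep the original line and add a half-width variant of it.
line = "カタカナとガラス"
augmented = [line, to_halfwidth(line)]
```

Keeping both the original and the converted line means the training text still covers normal Japanese as well as the half-width forms.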
On Thursday, November 9, 2017 at 1:21:45 AM UTC+8, shree wrote:
>
> Does your training text include both half-width and normal Japanese?
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Wed, Nov 8, 2017 at 4:01 PM, Li Xianglei wrote:
>
>> Hi all,
>>
>> I'm trying to use Tesseract to recognize Japanese text in images.
>> I found that it gets poor accuracy on half-width Japanese (Katakana).
>> I'm trying to improve the accuracy by fine-tuning. I have tried both
>> [Fine Tuning for ± a few characters] and [Training Just a Few Layers].
>> This seems to improve the accuracy on half-width Japanese, but it does
>> a lot of harm to normal Japanese recognition.
>> Here is how I do the fine-tuning:
>>
>> 1 Add half-width Japanese to lang/jpn/jpn.training_text (cloned from
>>   tesseract-ocr/langdata, which seems to be training data for v3)
>> 2 Create the training data with tesstrain.sh
>> 3 combine_tessdata -e /usr/local/tesseract/share/tessdata/jpn.traineddata trainhalfwidth/jpn.lstm
>>   (the source jpn.traineddata is best/jpn.traineddata)
>> 4 lstmtraining --model_output trainhalfwidth/jpnhw \
>>     --continue_from trainhalfwidth/jpn.lstm \
>>     --traineddata trainhalfwidth/jpn/jpn.traineddata \
>>     --old_traineddata /usr/local/tesseract/share/tessdata/jpn.traineddata \
>>     --train_listfile trainhalfwidth/jpn.training_files.txt \
>>     --max_iterations 3600 &> trainhalfwidth/basetrain.log
>>
>> Any advice? Thank you.
>>
>> # It seems Ray is working on the training data for LSTM. Any news so far?

