Does your training text include both half-width and normal (full-width) Japanese?

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Wed, Nov 8, 2017 at 4:01 PM, Li Xianglei <[email protected]> wrote:
> Hi all,
>
> I'm trying to use Tesseract to recognize Japanese text in images.
> I found that it gets poor accuracy on half-width Japanese (Katakana).
> I'm trying to improve the accuracy by fine-tuning. I have tried both
> [Fine Tuning for ± a few characters] and [Training Just a Few Layers].
> This seems to improve the accuracy on half-width Japanese, but it does a
> lot of harm to the recognition of normal Japanese.
> Here is how I do the fine-tuning:
>
> 1. Add half-width Japanese to lang/jpn/jpn.training_text (cloned from
>    tesseract-ocr/langdata, which seems to be training data for v3).
> 2. Create the training data with tesstrain.sh.
> 3. combine_tessdata -e /usr/local/tesseract/share/tessdata/jpn.traineddata trainhalfwidth/jpn.lstm
>    (this jpn.traineddata is best/jpn.traineddata)
> 4. lstmtraining --model_output trainhalfwidth/jpnhw \
>      --continue_from trainhalfwidth/jpn.lstm \
>      --traineddata trainhalfwidth/jpn/jpn.traineddata \
>      --old_traineddata /usr/local/tesseract/share/tessdata/jpn.traineddata \
>      --train_listfile trainhalfwidth/jpn.training_files.txt \
>      --max_iterations 3600 &> trainhalfwidth/basetrain.log
>
> Any advice? Thank you.
>
> # It seems Ray is working on the training data for LSTM; any news so far?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/604e4981-9ca4-48be-980d-999df93f73ed%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
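
The question at the top of this thread (whether the training text contains both half-width and normal Japanese) can be checked with a short script. The following is only a rough sketch, not part of the Tesseract tooling: the function names are made up for illustration, and the code point ranges are the standard Unicode blocks for half-width Katakana (U+FF66 to U+FF9F), full-width Katakana (U+30A0 to U+30FF), Hiragana (U+3040 to U+309F) and the main CJK ideograph block (U+4E00 to U+9FFF). It assumes the training text is UTF-8.

```python
import sys
from collections import Counter

# Unicode ranges (end-exclusive) used to bucket characters.
RANGES = {
    "halfwidth_katakana": range(0xFF66, 0xFFA0),
    "fullwidth_katakana": range(0x30A0, 0x3100),
    "hiragana": range(0x3040, 0x30A0),
    "kanji": range(0x4E00, 0xA000),
}


def classify(ch):
    """Return the bucket name for one character, or 'other'."""
    cp = ord(ch)
    for name, cp_range in RANGES.items():
        if cp in cp_range:
            return name
    return "other"


def script_counts(text):
    """Count non-whitespace characters per bucket."""
    return Counter(classify(ch) for ch in text if not ch.isspace())


if __name__ == "__main__":
    # Hypothetical usage: python count_scripts.py lang/jpn/jpn.training_text
    with open(sys.argv[1], encoding="utf-8") as f:
        counts = script_counts(f.read())
    for name, n in counts.most_common():
        print(f"{name}\t{n}")
```

If the half-width count is large compared with the other buckets (or the other buckets are close to zero), the fine-tuning data may be skewed toward half-width text, which would be one possible explanation for the drop in accuracy on normal Japanese.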

