Correction: I meant that I modified tesstrain_utils.sh and *removed* the
--max_pages=3 option.

On Friday, November 10, 2017 at 10:29:21 AM UTC+8, Li Xianglei wrote:
>
> Recently I modified the tesstrain_utils.sh and --max_pages=3 option
> for the text2image command.
> It seems that normal Japanese now works well, but the half-width
> characters still have poor accuracy.
> Now I wonder how many characters I should add to jpn.training_text.
> The wiki [Fine Tuning for ± a few characters] says it should be
> 20 repeats of the ±, but I tried about 20 repeats of every half-width
> character and it seems to make no difference.
> When the repeat count reached 30 it seemed to get better, but not
> good enough.
> Then I tried 150 repeats and the results got worse.
>
> On Thursday, November 9, 2017 at 8:35:50 AM UTC+8, Li Xianglei wrote:
>>
>> Yes, I added half-width characters to the given jpn.training_text and
>> used the result as the new jpn.training_text.
>>
>> On Thursday, November 9, 2017 at 1:21:45 AM UTC+8, shree wrote:
>>>
>>> Does your training text include both half-width and normal Japanese?
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Wed, Nov 8, 2017 at 4:01 PM, Li Xianglei <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm trying to use tesseract to recognize Japanese in images.
>>>> I found that it gets poor accuracy on half-width Japanese
>>>> (Katakana).
>>>> I'm trying to improve the accuracy by fine-tuning. I have tried
>>>> both [Fine Tuning for ± a few characters] and [Training Just a
>>>> Few Layers].
>>>> It seems this may improve the accuracy on half-width Japanese, but
>>>> it does a lot of harm to normal Japanese recognition.
>>>> Here is how I do the fine-tuning.
>>>>
>>>> 1. Add half-width Japanese to lang/jpn/jpn.training_text (cloned
>>>>    from tesseract-ocr/langdata, which seems to be training data
>>>>    for v3).
>>>> 2. Create the training data with tesstrain.sh.
>>>> 3. Extract the LSTM model from
>>>>    /usr/local/tesseract/share/tessdata/jpn.traineddata (which is
>>>>    best/jpn.traineddata):
>>>>    combine_tessdata -e /usr/local/tesseract/share/tessdata/jpn.traineddata trainhalfwidth/jpn.lstm
>>>> 4. lstmtraining --model_output trainhalfwidth/jpnhw \
>>>>      --continue_from trainhalfwidth/jpn.lstm \
>>>>      --traineddata trainhalfwidth/jpn/jpn.traineddata \
>>>>      --old_traineddata /usr/local/tesseract/share/tessdata/jpn.traineddata \
>>>>      --train_listfile trainhalfwidth/jpn.training_files.txt \
>>>>      --max_iterations 3600 &> trainhalfwidth/basetrain.log
>>>>
>>>> Any advice? Thank you.
>>>>
>>>> # It seems Ray is working on the training data for LSTM. Any news so far?
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/604e4981-9ca4-48be-980d-999df93f73ed%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
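
For anyone repeating the 20 / 30 / 150 repeat experiment described above, here is a minimal sketch of one way to generate the extra lines so that every half-width character really does occur the same number of times. It is not part of the original posts. The names build_extra_lines and count_chars are hypothetical helpers, the character range (U+FF66 to U+FF9D, leaving out the half-width voiced sound marks and punctuation) is my own assumption about what "half-width characters" covers, and the thread does not establish whether shuffled strings like these train as well as half-width characters embedded in real sentences.

```python
import random
from collections import Counter

# Assumption: "half-width characters" means the half-width katakana letters
# U+FF66 through U+FF9D. The voiced sound marks (U+FF9E, U+FF9F) and the
# half-width punctuation are left out of this sketch.
HALFWIDTH_KATAKANA = [chr(cp) for cp in range(0xFF66, 0xFF9E)]


def build_extra_lines(chars, repeats, per_line=20, seed=0):
    """Return lines in which every character in `chars` occurs exactly `repeats` times."""
    pool = [c for c in chars for _ in range(repeats)]
    # A fixed seed keeps the generated text reproducible between runs.
    random.Random(seed).shuffle(pool)
    return ["".join(pool[i:i + per_line]) for i in range(0, len(pool), per_line)]


def count_chars(text, chars):
    """Count how often each character of interest appears in `text`."""
    wanted = set(chars)
    return Counter(c for c in text if c in wanted)


if __name__ == "__main__":
    lines = build_extra_lines(HALFWIDTH_KATAKANA, repeats=20)
    counts = count_chars("\n".join(lines), HALFWIDTH_KATAKANA)
    print(len(lines), "lines;", min(counts.values()), "to",
          max(counts.values()), "occurrences per character")
```

The generated lines would be appended to a copy of jpn.training_text before running tesstrain.sh. Running count_chars on the final file is a quick check that the repeat count you think you are testing is the one actually present in the training text.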

