Yes, I added half-width characters to the given jpn.training_text and used 
the result as the new jpn.training_text.
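
A minimal sketch of one way such a mixed training text could be built, assuming Python 3 and only the standard `unicodedata` module. The helper names (`to_half_width`, `augment_lines`) are hypothetical and not part of Tesseract, and this is not necessarily how the text in this thread was produced. It keeps every original line and adds a half-width Katakana copy of each line that contains Katakana, so both forms stay represented.

```python
import unicodedata

# Half-width Japanese punctuation and Katakana live in U+FF61..U+FF9F.
HALF_WIDTH_RANGE = range(0xFF61, 0xFFA0)


def build_full_to_half():
    """Invert NFKC normalization: full-width code point -> half-width form."""
    table = {}
    for cp in HALF_WIDTH_RANGE:
        half = chr(cp)
        full = unicodedata.normalize("NFKC", half)
        if len(full) == 1:
            table[full] = half
    return table


FULL_TO_HALF = build_full_to_half()


def is_katakana(ch):
    """True for code points in the Katakana block (U+30A0..U+30FF)."""
    return 0x30A0 <= ord(ch) <= 0x30FF


def to_half_width(text):
    """Convert full-width Katakana to half-width; leave everything else alone."""
    out = []
    for ch in text:
        if is_katakana(ch):
            # NFD splits voiced kana into base + combining mark,
            # which map to a half-width base plus a half-width sound mark.
            parts = unicodedata.normalize("NFD", ch)
            if all(p in FULL_TO_HALF for p in parts):
                out.extend(FULL_TO_HALF[p] for p in parts)
                continue
        out.append(ch)
    return "".join(out)


def augment_lines(lines):
    """Keep each line and append a half-width variant when it differs."""
    result = []
    for line in lines:
        result.append(line)
        half = to_half_width(line)
        if half != line:
            result.append(half)
    return result
```

For example, `to_half_width("テスト")` returns `"ﾃｽﾄ"`, while Hiragana and Kanji pass through unchanged. The augmented lines could then be written back to the training text file before running tesstrain.sh.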

On Thursday, November 9, 2017 at 1:21:45 AM UTC+8, shree wrote:
>
> Does your training text include both half-width and normal Japanese?
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Wed, Nov 8, 2017 at 4:01 PM, Li Xianglei <[email protected]> wrote:
>
>> Hi all,
>>     
>>       I'm trying to use Tesseract to recognize Japanese text in images.
>>       I found that it has poor accuracy on half-width Japanese (Katakana).
>>       I'm trying to improve the accuracy by fine-tuning.
>>       I have tried both [Fine Tuning for ± a few characters] and
>>       [Training Just a Few Layers].
>>       Both seem to improve the accuracy on half-width Japanese, but they do
>>       a lot of harm to the recognition of normal Japanese.
>>       Here is how I do the fine-tuning:
>>
>>    1 Add half-width Japanese to lang/jpn/jpn.training_text (cloned from
>>      tesseract-ocr/langdata, which seems to be training data for v3)
>>    2 Create training data with tesstrain.sh
>>    3 combine_tessdata -e /usr/local/tesseract/share/tessdata/jpn.traineddata trainhalfwidth/jpn.lstm
>>      (this jpn.traineddata is best/jpn.traineddata)
>>    4 lstmtraining --model_output trainhalfwidth/jpnhw \
>>                   --continue_from trainhalfwidth/jpn.lstm \
>>                   --traineddata trainhalfwidth/jpn/jpn.traineddata \
>>                   --old_traineddata /usr/local/tesseract/share/tessdata/jpn.traineddata \
>>                   --train_listfile trainhalfwidth/jpn.training_files.txt \
>>                   --max_iterations 3600 &> trainhalfwidth/basetrain.log
>>
>>   Any advice? Thank you.
>>
>>    # It seems Ray is working on the training data for LSTM. Any news so far?
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/604e4981-9ca4-48be-980d-999df93f73ed%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
