Hi zdenop,
The original output of chcp on my machine is "936".
As you said, I think this is a console encoding problem, but I don't 
know how to fix it.
In the end I worked around the problem another way: I used a tool called 
"FontCreator" to change the font names to English.

On Friday, November 9, 2018 at 4:44:41 PM UTC+8, zdenop wrote:
>
> I wanted to know the original output of chcp ;-)
>
> I think there are (at least) 2 issues:
>
>    1. console encoding problem (Windows only; on Linux it is correct)
>    2. font-related issue (at the moment I am not sure whether it is the 
>    font itself, pango, or text2image)
>
> Regarding 1.: 
> When I run:
>  text2image.exe --fonts_dir=i1252 --fontconfig_tmpdir=%temp% 
> --list_available_fonts
> I got output:
>   0: ĺ­tčż?ĺ'ŚéćĄ
>   1: 庞中华行äą| Light
>
> When I set chcp 65001 the result is still wrong:
>   0: ĺ­™čż ĺ’Śé…·ćĄ·
>   1: 庞中华行书 Light
>
> When the output is redirected to a file (text2image.exe --fonts_dir=i1252 
> --fontconfig_tmpdir=%temp% --list_available_fonts >font_list.txt), the font 
> names are correct:
>   0: 孙运和酷楷
>   1: 庞中华行书 Light
>
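The garbled name in the chcp 65001 case above is consistent with UTF-8 bytes being displayed through a single-byte code page. A minimal Python sketch of that effect follows. cp1250 is an assumption (the thread does not say which code page the console fell back to), and `as_seen_in_console` is a hypothetical helper, not part of text2image.

```python
def as_seen_in_console(text: str, console_codepage: str = "cp1250") -> str:
    """Encode text as UTF-8, then decode those bytes the way a legacy
    single-byte console code page would (assumed here: cp1250)."""
    raw = text.encode("utf-8")
    # Bytes with no mapping in the code page are shown as U+FFFD.
    return raw.decode(console_codepage, errors="replace")


font_name = "孙运和酷楷"
garbled = as_seen_in_console(font_name)
print(garbled)
```

Writing to a file skips this decode step, which would fit the observation that the redirected font names are correct.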
> When I use the "wrong console output", text2image is able to find and use the font:
> text2image.exe --fonts_dir=i1252  --fontconfig_tmpdir=%temp% --text 
> i1252/chi_sim_test.txt --outputbase=chi_sim.test.exp0 --font="ĺ­™čż 
> 和酷楷", but it crash the same way as on linux (issue 2) as described 
> in issue 1252:
> ERROR: Illegal UTF8 encountered.
> Index 0 char = 0xffffffa2
> Index 1 char = 0xffffffd2
> Index 2 char = 0xffffffd4
> Index 3 char = 0xd
> Index 4 char = 0xa
> WARNING: Illegal UTF8 encountered
>
> ** (text2image.exe:22496): WARNING **: 09:33:51.804: Invalid UTF-8 string 
> passed to pango_layout_set_text()
> **
> ERROR:c:\users\zdeno\.cppan\storage\src\81\8f\8aa5\pango\pango-glyph-item.c:319:pango_glyph_item_iter_next_cluster:
>  
> assertion failed: (iter->start_char < iter->end_char
>
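The `0xffffffa2`-style values in the log above look like the bytes 0xA2, 0xD2, 0xD4 printed as sign-extended 32-bit values, followed by CR LF. The Python sketch below, written under that assumption, checks a byte string for valid UTF-8 and reproduces the sign-extended form. Both helper names are hypothetical and are not taken from tesseract.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def as_sign_extended_hex(byte: int) -> str:
    """Show a byte the way a signed 8-bit char prints as a 32-bit hex value."""
    signed = byte - 256 if byte >= 0x80 else byte
    return hex(signed & 0xFFFFFFFF)


# Bytes reconstructed from the "Index N char = ..." lines in the log.
sample = bytes([0xA2, 0xD2, 0xD4, 0x0D, 0x0A])
print(first_invalid_utf8_offset(sample))
print([as_sign_extended_hex(b) for b in sample])
```

The same check could be pointed at the bytes of the training text file to see whether it is really saved as UTF-8, which this thread does not confirm either way.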
> So one task is to fix the Windows issue so that console input/output is 
> handled correctly (BTW, is it UTF-8 or UTF-16?), but that will not solve the 
> issue that these fonts are still not usable in text2image.
>
>  Zdenko
>
>
> On Fri, Nov 9, 2018 at 7:33, bruce <[email protected]> wrote:
>
>> Hi Zdenko,
>>    I tried the command under two cmd window code pages (chcp 65001 
>> and chcp 936) and got the same failure in both.
>>    The results are as follows:
>> [image: chcp936.png]
>> [image: chcp65001.png]   
>>    
>>
>> On Friday, November 9, 2018 at 5:03:00 AM UTC+8, zdenop wrote:
>>>
>>> What is the output of the command "chcp" (on the command line)?
>>>  
>>> Zdenko
>>>
>>>
>>> On Wed, Nov 7, 2018 at 2:55, bruce <[email protected]> wrote:
>>>
>>>> Hi zdenop, thank you for your reply.
>>>> My environment is:
>>>>   OS: Windows 7 Professional 64-bit
>>>>   tesseract version:
>>>> https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v4.0.0.20181030.exe
>>>>
>>>> Test training text:
>>>> https://drive.google.com/open?id=1BfURsI_HdwaKeowZP0sa8L6GKWgIVDWJ
>>>>
>>>> Test fonts:
>>>> https://drive.google.com/open?id=1YZObeYWOzNZbkMTcrCNw3KVlYT7hn1Q6
>>>> https://drive.google.com/open?id=15C-v4ped8ssFGXW0pSKw6CMSQgW2s0WV
>>>>
>>>> I tried all of the fonts with Chinese names, and every one gave the same 
>>>> error message. The links above are just two of these fonts, so you can test them.
>>>> I guess the --font parameter doesn't support Chinese characters?
>>>>
>>>> On Tuesday, November 6, 2018 at 6:11:00 PM UTC+8, zdenop wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> Please see the bug report and the suggested solution:
>>>>> https://github.com/tesseract-ocr/tesseract/issues/1252
>>>>>
>>>>> I guess the problem is in pango, but we would like to test it. Are you 
>>>>> able to create a simple test case for this issue (provide a small 
>>>>> chi_sim.txt and share the font if possible)?
>>>>>
>>>>> Zdenko
>>>>>
>>>>>
>>>>> On Tue, Nov 6, 2018 at 10:56, bruce <[email protected]> wrote:
>>>>>
>>>>>> I used the following command to find the fonts I can use to train my 
>>>>>> language:
>>>>>> *text2image.exe --text=chi_sim.txt --outputbase=chi_sim.庞中华行书.exp0 
>>>>>> --fints_dir=C:\Windows\Fonts --find_fonts*
>>>>>> and I got the following result:
>>>>>>   Font MStiffHeiPRC failed with 414359 hits = 100.00%
>>>>>>   Font MStiffHeiPRC failed with 414359 hits = 100.00%
>>>>>>   Font MStiffHeiPRC failed with 414359 hits = 100.00%
>>>>>>   Font MStiffHeiPRC failed with 414359 hits = 100.00%
>>>>>>   Font MStream PRC failed with 414359 hits = 100.00%
>>>>>>   Font MSung PRC failed with 414359 hits = 100.00%
>>>>>>   Font MSung PRC failed with 414359 hits = 100.00%
>>>>>>   庞中华行书 Light : 414361 hits = 100.00%, raw = 3440 = 100.00%
>>>>>>   Font 剑客毛笔行书 failed with 414357 hits = 100.00%
>>>>>>   Font 可可漫雪体 failed with 414360 hits = 100.00%
>>>>>>   Font 多米手写体 failed with 414253 hits = 99.97%
>>>>>>   Font 字体中国-锐博体V1 failed with 414359 hits = 100.00%
>>>>>>   Font 孙运和酷楷 failed with 414359 hits = 100.00%
>>>>>>   Font 建刚静心楷 failed with 414359 hits = 100.00%
>>>>>>   Font 张维镜手写楷书 Medium failed with 410014 hits = 98.95%
>>>>>>   Font 徐金如硬笔行楷X failed with 413042 hits = 99.68%
>>>>>>
>>>>>>
>>>>>>
>>>>>> Then I ran this command:
>>>>>> *text2image.exe --text=chi_sim.txt --outputbase=chi_sim.庞中华行书.exp0 
>>>>>> --ptsize 36 --font "庞中华行书" --fonts_dir C:\Windows\Fonts*
>>>>>> and got the following error:
>>>>>>   Could not find font named '庞中华行书'.
>>>>>>   Pango suggested font 'MingLiU'.
>>>>>>   Please correct --font arg.
>>>>>>
>>>>>> Does text2image not support fonts with Chinese names? How can I use 
>>>>>> these fonts?
>>>>>>
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to [email protected].
>>>>>> To post to this group, send email to [email protected].
>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/a9a31397-9196-4923-aa79-43d151d534a1%40googlegroups.com
>>>>>>  
>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/a9a31397-9196-4923-aa79-43d151d534a1%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>
>
