What is the output of the "chcp" command (run from the command line)?

Zdenko
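For anyone who prefers to collect this from a script, here is a minimal Python sketch that runs "chcp" and extracts the code page number. The helper names parse_chcp_output and active_code_page are made up for illustration and are not part of Tesseract. For reference, 936 is the Simplified Chinese (GBK) code page and 65001 is UTF-8. Whether the console code page is actually related to the font-name error is only an assumption at this point.

```python
import re
import subprocess
import sys


def parse_chcp_output(text):
    """Return the first number found in chcp's output, or None.

    chcp prints something like "Active code page: 936"; the label is
    localized, so only the digits are matched.
    """
    match = re.search(r"(\d+)", text)
    return int(match.group(1)) if match else None


def active_code_page():
    """Run chcp through cmd.exe (Windows only) and return the code page."""
    result = subprocess.run(
        ["cmd", "/c", "chcp"], capture_output=True, text=True
    )
    return parse_chcp_output(result.stdout)


if __name__ == "__main__":
    if sys.platform == "win32":
        print("Active code page:", active_code_page())
    else:
        print("chcp is only available on Windows")
```

Simply typing chcp at the prompt and pasting the result works just as well.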
On Wed, 7 Nov 2018 at 2:55, bruce <[email protected]> wrote:

> Hi zdenop, thank you for your reply.
> My environment is:
> Windows 7 Professional 64-bit
> Tesseract version:
> https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v4.0.0.20181030.exe
>
> Test training text:
> https://drive.google.com/open?id=1BfURsI_HdwaKeowZP0sa8L6GKWgIVDWJ
>
> Test fonts:
> https://drive.google.com/open?id=1YZObeYWOzNZbkMTcrCNw3KVlYT7hn1Q6
> https://drive.google.com/open?id=15C-v4ped8ssFGXW0pSKw6CMSQgW2s0WV
>
> I tried all of the fonts with Chinese names, and every one gave the same
> error message. The links above are just two of those fonts; you can test
> with them.
> I guess the --font parameter doesn't support Chinese characters?
>
> On Tuesday, 6 November 2018 at 6:11:00 PM UTC+8, zdenop wrote:
>>
>> Hello,
>>
>> Please see the bug report and suggested solution:
>> https://github.com/tesseract-ocr/tesseract/issues/1252
>>
>> I guess the problem is in Pango, but we would like to test it. Are you
>> able to create a simple test case (provide a small chi_sim.txt and share
>> the font, if possible) for this issue?
>>
>> Zdenko
>>
>> On Tue, 6 Nov 2018 at 10:56, bruce <[email protected]> wrote:
>>
>>> I use the following command to find the fonts I can use to train my
>>> language:
>>> *text2image.exe --text=chi_sim.txt --outputbase=chi_sim.庞中华行书.exp0 --fints_dir=C:\Windows\Fonts --find_fonts*
>>> and I got the following result:
>>> Font MStiffHeiPRC failed with 414359 hits = 100.00%
>>> Font MStiffHeiPRC failed with 414359 hits = 100.00%
>>> Font MStiffHeiPRC failed with 414359 hits = 100.00%
>>> Font MStiffHeiPRC failed with 414359 hits = 100.00%
>>> Font MStream PRC failed with 414359 hits = 100.00%
>>> Font MSung PRC failed with 414359 hits = 100.00%
>>> Font MSung PRC failed with 414359 hits = 100.00%
>>> 庞中华行书 Light : 414361 hits = 100.00%, raw = 3440 = 100.00%
>>> Font 剑客毛笔行书 failed with 414357 hits = 100.00%
>>> Font 可可漫雪体 failed with 414360 hits = 100.00%
>>> Font 多米手写体 failed with 414253 hits = 99.97%
>>> Font 字体中国-锐博体V1 failed with 414359 hits = 100.00%
>>> Font 孙运和酷楷 failed with 414359 hits = 100.00%
>>> Font 建刚静心楷 failed with 414359 hits = 100.00%
>>> Font 张维镜手写楷书 Medium failed with 410014 hits = 98.95%
>>> Font 徐金如硬笔行楷X failed with 413042 hits = 99.68%
>>>
>>> Then I used this command:
>>> *text2image.exe --text=chi_sim.txt --outputbase=chi_sim.庞中华行书.exp0 --ptsize 36 --font "庞中华行书" --fonts_dir C:\Windows\Fonts*
>>> and got the following error:
>>> Could not find font named '庞中华行书'.
>>> Pango suggested font 'MingLiU'.
>>> Please correct --font arg.
>>>
>>> Does text2image not support fonts with Chinese names? How can I use
>>> these fonts?
-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/tesseract-ocr. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8y0_gfGDRbdsEmDgbeAN_f1sKKMZa76REx7vGt38JteqA%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.

