Are you using jpn_vert rather than jpn for the vertical text? I have trained a jpn_vert model: https://github.com/zodiac3539/jpn_vert
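In case it helps, here is a rough sketch of how the language could be switched per snippet from Python by calling the tesseract command line. The helper names (build_tesseract_args, ocr_snippet) and the file name are made up for illustration. Using --psm 5 (a single uniform block of vertically aligned text) for vertical snippets is an assumption worth testing, not something I have verified on your images; psm 8 for horizontal words is what you already use.

```python
import shutil
import subprocess


def build_tesseract_args(image_path, vertical=True, psm=None):
    """Build the tesseract command line for one snippet (hypothetical helper)."""
    # jpn_vert is the vertical-text Japanese model; jpn is the horizontal one.
    lang = "jpn_vert" if vertical else "jpn"
    if psm is None:
        # Assumption: psm 5 for vertical text, psm 8 (single word) otherwise.
        psm = 5 if vertical else 8
    # "stdout" as the output base makes tesseract print the text instead of
    # writing a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def ocr_snippet(image_path, vertical=True, psm=None):
    """Run tesseract on a snippet and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_args(image_path, vertical, psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

For example, ocr_snippet("nih-vertical.jpg", vertical=True) would run the vertical model on that crop, provided jpn_vert.traineddata is in your tessdata directory. Since tesseract 4 was trained on line images, it may also be worth passing a slightly larger crop than a single word.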
On Mon, Jun 3, 2019 at 11:31 AM Shree Devi Kumar <[email protected]> wrote:

> tesseract 4 has been trained on line images and hence gives better results
> for lines, as far as I have seen.
>
> On Sun, Jun 2, 2019 at 2:52 PM Jorge Castrillo <[email protected]>
> wrote:
>
>> Hi everyone. I'm making a program that uses tesseract to get a word
>> from a manga with a snipping-tool like program, and translates that word
>> with JMdict.
>> The thing is, tesseract gives weird values for vertical, small selections.
>> I'm going to explain it in more detail:
>>
>> Say I get a full horizontal line in Japanese, like the following one:
>>
>> [image: horizontal_full.jpg]
>> The output "元来日本語は漢文に倣い、文字を上" is perfect.
>>
>> Getting a full vertical line gives no problems either:
>>
>> [image: vertical_full.jpg]
>>
>> Gives the same correct output. Now if I want to get only words, when
>> examining horizontal text there are no problems, while with the vertical
>> text the output is almost always (except when examining a Kanji alone)
>> wrong, like this:
>>
>> [image: nih-horizontal.jpg]
>>
>> [image: nih-vertical.jpg]
>>
>> The first one returns 日本語 while the second one returns 髑升田.
>> They are both from the same file, same size, same font, yet the results
>> vary greatly.
>>
>> Another example, this time from a manga:
>>
>> [image: ej2full.jpg]
>>
>> The output is 今日の勝敗よりも, again, correct.
>> But going word by word we start to have errors:
>>
>> [image: eje2-word1.jpg]
>> Output 由」〉
>>
>> and
>>
>> [image: ej2-word.jpg]
>> Output 健雛
>>
>> Why is it that it can examine the full line without problem, but has so
>> much trouble getting vertical words? I am using psm 8 for words, but it
>> only seems to work with horizontal ones, and I can't get my head around it.
>> I've been trying to find a solution to this all day, but without success.
>> I'm not an expert programmer by any means, this is more of a college
>> project, but any insight would be really, really appreciated. Thank you for
>> reading.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/71b34e0f-5713-42d3-9ba0-4926291758cb%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/71b34e0f-5713-42d3-9ba0-4926291758cb%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
> --
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

