Are you using jpn_vert instead of jpn?
I have trained a jpn_vert model:

https://github.com/zodiac3539/jpn_vert
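
For what it's worth, here is a minimal sketch of how the two cases could be called from Python through the tesseract command line. It assumes tesseract 4 is on the PATH and that a jpn_vert.traineddata file is installed in the tessdata directory. The function names are hypothetical, and pairing jpn_vert with --psm 5 (a single uniform block of vertically aligned text) is my assumption rather than something tested in this thread, so treat it as a starting point to experiment with.

```python
import subprocess


def build_tesseract_command(image_path, vertical):
    """Build the tesseract CLI arguments for a small Japanese snippet.

    Hypothetical helper: the language/PSM pairing below is an assumption
    to experiment with, not a verified fix.
    """
    if vertical:
        # jpn_vert model; PSM 5 = single uniform block of vertically aligned text
        lang, psm = "jpn_vert", "5"
    else:
        # jpn model; PSM 8 = treat the image as a single word
        lang, psm = "jpn", "8"
    # "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", psm]


def recognize(image_path, vertical):
    """Run tesseract on the image and return the recognized text."""
    result = subprocess.run(
        build_tesseract_command(image_path, vertical),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout.strip()


# Example call (requires tesseract and the image file to exist):
# print(recognize("nih-vertical.jpg", vertical=True))
```

If the vertical result is still wrong, it may be worth comparing --psm 5 against --psm 8 on the same crop with only the language changed, to see which of the two settings makes the difference.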


On Mon, Jun 3, 2019 at 11:31 AM Shree Devi Kumar <[email protected]>
wrote:

> Tesseract 4 was trained on line images, so as far as I have seen it gives
> better results for full lines.
>
> On Sun, Jun 2, 2019 at 2:52 PM Jorge Castrillo <[email protected]>
> wrote:
>
>> Hi everyone. I'm making a program that uses tesseract to read a word
>> from a manga with a snipping-tool-like selection, and then translates that
>> word with JMdict.
>> The problem is that tesseract gives wrong output for small vertical
>> selections. Here it is in more detail:
>>
>>
>> Say I get a full horizontal line in Japanese, like the following one:
>>
>> [image: horizontal_full.jpg]
>> The output "元来日本語は漢文に倣い、文字を上" is perfect.
>>
>> Getting a full vertical line gives no problems either:
>>
>> [image: vertical_full.jpg]
>>
>> This gives the same correct output. Now, if I want to get only single
>> words, horizontal text causes no problems, but for vertical text the
>> output is almost always wrong (except when examining a single kanji),
>> like this:
>>
>> [image: nih-horizontal.jpg]
>>
>>
>> [image: nih-vertical.jpg]
>>
>>
>> The first one returns 日本語 while the second one returns 髑升田.
>> They are both from the same file, with the same size and the same font,
>> yet the results vary greatly.
>>
>>
>> Another example, this time from a manga:
>>
>> [image: ej2full.jpg]
>>
>> The output is 今日の勝敗よりも, which is again correct.
>> But going word by word, the errors start:
>>
>> [image: eje2-word1.jpg]
>> Output 由」〉
>>
>> and
>>
>> [image: ej2-word.jpg]
>> Output 健雛
>>
>> Why can it read the full line without problems, but has so much trouble
>> with vertical words? I am using psm 8 for words, but it only seems to work
>> with horizontal ones, and I can't get my head around it.
>> I've been trying to find a solution to this all day, without success.
>> I'm not an expert programmer by any means, and this is more of a college
>> project, but any insight would be really appreciated. Thank you for
>> reading.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/71b34e0f-5713-42d3-9ba0-4926291758cb%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/71b34e0f-5713-42d3-9ba0-4926291758cb%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> --
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
