Tesseract 4 was trained on images of text lines, so in my experience it
gives better results on full lines than on isolated words.
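
As a rough sketch only (not tested on the images in this thread), one way to
act on that is to stop using psm 8 for small crops and let tesseract treat
them as a line or a vertical block instead. The helper names
build_tesseract_args and ocr_crop below are made up for illustration, and
the choice of jpn_vert with psm 5 for vertical crops and jpn with psm 7 for
horizontal ones is an assumption to experiment with, not a confirmed fix.
It also assumes the tesseract binary and the jpn / jpn_vert traineddata
files are installed.

```python
import subprocess


def build_tesseract_args(image_path, vertical):
    """Build a tesseract command line for a small Japanese crop.

    Hypothetical helper: instead of psm 8 (single word), it asks for
    psm 5 (single uniform block of vertically aligned text) or
    psm 7 (single text line), which is closer to line-level input.
    """
    if vertical:
        lang, psm = "jpn_vert", "5"
    else:
        lang, psm = "jpn", "7"
    # "stdout" as the output base makes tesseract print the text
    # instead of writing it to a file.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", psm]


def ocr_crop(image_path, vertical):
    """Run tesseract on the crop and return the recognized text."""
    result = subprocess.run(
        build_tesseract_args(image_path, vertical),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

For example, ocr_crop("nih-vertical.jpg", vertical=True) would run tesseract
with -l jpn_vert --psm 5 on the vertical crop. If the crop is very tight,
adding a white margin around it before OCR may also be worth trying.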

On Sun, Jun 2, 2019 at 2:52 PM Jorge Castrillo <jorgemcastri...@gmail.com>
wrote:

> Hi everyone. I'm making a program that uses tesseract to get a word
> from a manga with a snipping-tool-like program, and translates that word
> with JMdict.
> The thing is tesseract gives weird values for vertical, small selections.
> I'm going to explain it in more detail:
>
>
> Say I get a full horizontal line in Japanese, like  the following one:
>
> [image: horizontal_full.jpg]
> The output "元来日本語は漢文に倣い、文字を上" is perfect
>
> Getting a full vertical line gives no problems either:
>
> [image: vertical_full.jpg]
>
> Gives the same correct output. Now if I want to get only words, when
> examining horizontal text there are no problems, while with the vertical
> text the output is almost always (except when examining a Kanji alone)
> wrong, like this:
>
> [image: nih-horizontal.jpg]
>
>
> [image: nih-vertical.jpg]
>
>
> The first one returns 日本語 while the second one returns 髑升田.
> They are both from the same file, same size, same font, yet the results
> vary greatly.
>
>
> Another example, this time from a manga:
>
> [image: ej2full.jpg]
>
> The output is 今日の勝敗よりも, again, correct.
> But going word by word we start to have errors:
>
> [image: eje2-word1.jpg]
> Output 由」〉
>
> and
>
> [image: ej2-word.jpg]
> Output 健雛
>
> Why is it that it can examine the full line without problem, but has so
> much trouble getting vertical words? I am using psm 8 for words, but it
> only seems to work with horizontal ones, and I can't get my head around it.
> I've been trying to find a solution to this all day, but without success.
> I'm not an expert programmer by any means, this is more of a college
> project, but any insight would be really, really appreciated. Thank you for
> reading.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/71b34e0f-5713-42d3-9ba0-4926291758cb%40googlegroups.com
> .
> For more options, visit https://groups.google.com/d/optout.
>


-- 

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

