Hello zodiac,

I'm trying to train Tesseract on vertical Japanese, but the documentation 
for vertical scripts is thin.
Could you briefly describe the steps you took?
Did you train on line images paired with text files? Were the line images 
vertical or horizontal?

Thank you! :)
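
For anyone who lands on this thread with the word-level problem quoted below, here is a rough, untested sketch of what switching between jpn and jpn_vert could look like from a script. It only uses the standard tesseract command line (`-l`, `--psm`, and `stdout` as the output base). PSM 5 is the documented mode for a single uniform block of vertically aligned text. That jpn_vert with PSM 5 actually fixes short vertical crops is an assumption nobody in this thread has confirmed, and `looks_vertical` is a hypothetical heuristic, not something from the original program.

```python
import subprocess

# Page segmentation modes as listed by `tesseract --help-psm`.
PSM_VERTICAL_BLOCK = 5  # single uniform block of vertically aligned text
PSM_SINGLE_WORD = 8     # treat the image as a single word


def build_tesseract_cmd(image_path, vertical):
    """Return the tesseract CLI arguments for one cropped selection.

    Assumption (not confirmed in this thread): vertical crops do better
    with the jpn_vert model and PSM 5 than with jpn and PSM 8.
    """
    lang = "jpn_vert" if vertical else "jpn"
    psm = PSM_VERTICAL_BLOCK if vertical else PSM_SINGLE_WORD
    # "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def looks_vertical(width, height):
    """Hypothetical heuristic: a crop taller than it is wide is vertical."""
    return height > width


def ocr_selection(image_path, width, height):
    """Run tesseract on a crop and return the recognized text.

    Needs the tesseract binary plus the jpn and jpn_vert models installed.
    """
    cmd = build_tesseract_cmd(image_path, looks_vertical(width, height))
    result = subprocess.run(
        cmd, capture_output=True, text=True, encoding="utf-8", check=True
    )
    return result.stdout.strip()
```

Calling `ocr_selection("nih-vertical.jpg", 40, 120)` would run `tesseract nih-vertical.jpg stdout -l jpn_vert --psm 5`; the pixel sizes here are made up for illustration.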

On Monday, June 3, 2019 at 4:28:29 PM UTC-4 [email protected] wrote:

> Are you using jpn_vert instead of jpn?
> I have trained jpn_vert 
>
> https://github.com/zodiac3539/jpn_vert  
>
>
> On Mon, Jun 3, 2019 at 11:31 AM Shree Devi Kumar <[email protected]> 
> wrote:
>
>> Tesseract 4 was trained on line images, so as far as I have seen it 
>> gives better results on full lines.
>>
>> On Sun, Jun 2, 2019 at 2:52 PM Jorge Castrillo <[email protected]> 
>> wrote:
>>
>>> Hi everyone. I'm making a program that uses Tesseract to grab a word 
>>> from a manga with a snipping-tool-like selection and translate that word 
>>> with JMdict.
>>> The problem is that Tesseract gives wrong output for small vertical 
>>> selections. I'll explain in more detail:
>>>
>>>
>>> Say I get a full horizontal line in Japanese, like the following one:
>>>
>>> [image: horizontal_full.jpg]
>>> The output "元来日本語は漢文に倣い、文字を上" is perfect
>>>
>>> Getting a full vertical line gives no problems either:
>>>
>>> [image: vertical_full.jpg]
>>>
>>> This gives the same correct output. If I select only single words, 
>>> horizontal text works without problems, but with vertical text the 
>>> output is almost always wrong (except when the selection is a single 
>>> kanji), like this:
>>>
>>> [image: nih-horizontal.jpg]
>>>
>>>
>>> [image: nih-vertical.jpg]
>>>
>>>
>>> The first one returns 日本語 while the second one returns 髑升田.
>>> They are both from the same file, same size, same font, yet the results 
>>> vary greatly.
>>>
>>>
>>> Another example, this time from a manga:
>>>
>>> [image: ej2full.jpg]
>>>
>>> The output is 今日の勝敗よりも, again, correct.
>>> But going word by word we start to have errors:
>>>
>>> [image: eje2-word1.jpg]
>>> Output 由」〉
>>>
>>> and
>>>
>>> [image: ej2-word.jpg]
>>> Output 健雛
>>>
>>> Why can it read the full line without problems but has so much 
>>> trouble with vertical words? I am using psm 8 for words, but it 
>>> only seems to work with horizontal ones, and I can't get my head around it. 
>>> I've been trying to find a solution to this all day, without success. 
>>> I'm not an expert programmer by any means, this is more of a college 
>>> project, but any insight would be really, really appreciated. Thank you for 
>>> reading.
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/tesseract-ocr/71b34e0f-5713-42d3-9ba0-4926291758cb%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/tesseract-ocr/71b34e0f-5713-42d3-9ba0-4926291758cb%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>

