See https://groups.google.com/g/tesseract-ocr/c/GFHIZ8hO3c4/m/ieYUckMvBgAJ

On Friday, August 7, 2020 at 10:21:11 AM UTC+5:30 ultra wrote:

> Hello zodiac,
>
> I'm trying to train vertical Japanese, but the documentation for vertical 
> text is sparse.
> Could you briefly describe the steps you took?
> Did you train on line images paired with text files? Were the line images 
> vertical or horizontal?
>
> Thank you! :)
>
> On Monday, June 3, 2019 at 4:28:29 PM UTC-4 [email protected] wrote:
>
>> Are you using jpn_vert instead of jpn?
>> I have trained a jpn_vert model:
>>
>> https://github.com/zodiac3539/jpn_vert  
>>
>>
>> On Mon, Jun 3, 2019 at 11:31 AM Shree Devi Kumar <[email protected]> 
>> wrote:
>>
>>> Tesseract 4 was trained on line images and therefore gives better 
>>> results on full lines, as far as I have seen.
>>>
>>> On Sun, Jun 2, 2019 at 2:52 PM Jorge Castrillo <[email protected]> 
>>> wrote:
>>>
>>>> Hi everyone. I'm making a program that uses Tesseract to capture a word 
>>>> from a manga with a snipping-tool-like selection and translates that word 
>>>> with JMdict.
>>>> The problem is that Tesseract gives wrong output for small vertical 
>>>> selections. I'll explain in more detail:
>>>>
>>>>
>>>> Say I capture a full horizontal line in Japanese, like the following one:
>>>>
>>>> [image: horizontal_full.jpg]
>>>> The output, "元来日本語は漢文に倣い、文字を上", is perfect.
>>>>
>>>> Getting a full vertical line gives no problems either:
>>>>
>>>> [image: vertical_full.jpg]
>>>>
>>>> This gives the same correct output. Now, if I want to recognize only 
>>>> single words, horizontal text causes no problems, while for vertical text 
>>>> the output is almost always wrong (except when the selection is a single 
>>>> kanji), like this:
>>>>
>>>> [image: nih-horizontal.jpg]
>>>>
>>>>
>>>> [image: nih-vertical.jpg]
>>>>
>>>>
>>>> The first one returns 日本語, while the second one returns 髑升田.
>>>> They are both from the same file, with the same size and the same font, 
>>>> yet the results vary greatly.
>>>>
>>>>
>>>> Another example, this time from a manga:
>>>>
>>>> [image: ej2full.jpg]
>>>>
>>>> The output is 今日の勝敗よりも, which is again correct.
>>>> But going word by word, errors start to appear:
>>>>
>>>> [image: eje2-word1.jpg]
>>>> Output 由」〉
>>>>
>>>> and
>>>>
>>>> [image: ej2-word.jpg]
>>>> Output 健雛
>>>>
>>>> Why can it read the full line without problems, yet have so much trouble 
>>>> with vertical words? I am using psm 8 for words, but it only seems to 
>>>> work with horizontal ones, and I can't get my head around it. 
>>>> I've been trying to find a solution to this all day, without success. 
>>>> I'm not an expert programmer by any means, and this is more of a college 
>>>> project, but any insight would be really appreciated. Thank you for 
>>>> reading.
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/71b34e0f-5713-42d3-9ba0-4926291758cb%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/71b34e0f-5713-42d3-9ba0-4926291758cb%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>> -- 
>>>
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>

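The suggestion in the thread (use jpn_vert rather than jpn for vertical text) can be sketched as a small wrapper around the tesseract command line. This is only an illustration under stated assumptions: the helper names `build_tesseract_command` and `recognize` are hypothetical, it assumes the `tesseract` executable and the `jpn` and `jpn_vert` traineddata files are installed, and the default of psm 5 (a single uniform block of vertically aligned text) for vertical selections is a guess that nobody in the thread confirmed. The original poster used psm 8, which can still be passed explicitly for comparison.

```python
import shutil
import subprocess
from typing import List, Optional

# Page segmentation modes, as listed by `tesseract --help-psm`.
PSM_VERTICAL_BLOCK = 5  # single uniform block of vertically aligned text
PSM_SINGLE_LINE = 7     # single text line
PSM_SINGLE_WORD = 8     # single word (the mode used in the original post)


def build_tesseract_command(image_path: str, vertical: bool,
                            psm: Optional[int] = None) -> List[str]:
    """Build the tesseract CLI arguments for one cropped selection.

    Assumption: vertical selections use the jpn_vert model and default to
    psm 5; horizontal selections use jpn and default to psm 7.
    """
    lang = "jpn_vert" if vertical else "jpn"
    if psm is None:
        psm = PSM_VERTICAL_BLOCK if vertical else PSM_SINGLE_LINE
    # "stdout" as the output base makes tesseract print the recognized text.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def recognize(image_path: str, vertical: bool,
              psm: Optional[int] = None) -> str:
    """Run tesseract on the image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, vertical, psm),
        capture_output=True, encoding="utf-8", check=True,
    )
    return result.stdout.strip()
```

For example, `recognize("nih-vertical.jpg", vertical=True)` would try the vertical model on the failing crop from the original post, and `recognize("nih-vertical.jpg", vertical=True, psm=PSM_SINGLE_WORD)` would repeat the psm 8 attempt with jpn_vert. Whether either improves the short vertical selections has not been tested here.
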
