I'm not specifying a page segmentation mode (psm) explicitly, so it must be 
the default, 3: fully automatic page segmentation, but no OSD.
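
For reference, here is a minimal Python sketch of how the mode could be set 
explicitly instead of relying on the default. It only builds the command 
line and does not run Tesseract. build_tesseract_cmd and the file names are 
hypothetical, and the flag spelling is an assumption about the installed 
version (3.x builds accept -psm, newer releases spell it --psm).

```python
import shlex

# Page segmentation modes as listed in the tesseract(1) man page (subset).
PSM_AUTO = 3          # fully automatic page segmentation, but no OSD (default)
PSM_SINGLE_BLOCK = 6  # assume a single uniform block of text
PSM_SINGLE_LINE = 7   # treat the image as a single text line


def build_tesseract_cmd(image, outbase, psm=PSM_AUTO, flag="-psm"):
    """Return the argv list for a tesseract run with an explicit PSM.

    Hypothetical helper: `flag` is "-psm" for 3.x builds; pass "--psm"
    for newer releases.
    """
    return ["tesseract", image, outbase, flag, str(psm)]


# Example: ask for a single uniform block instead of the default mode 3.
cmd = build_tesseract_cmd("page.png", "page", psm=PSM_SINGLE_BLOCK)
print(" ".join(shlex.quote(part) for part in cmd))
```

Leaving psm at its default reproduces what I am doing now (mode 3); the 
other constants are simply candidates to try.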

On Tuesday, December 30, 2014 11:10:05 PM UTC-5, shree wrote:
>
> What page segmentation mode are you using?
>
> https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Wed, Dec 31, 2014 at 6:18 AM, Dan Vanderkam <[email protected]> wrote:
>
>> More context here 
>> <http://stackoverflow.com/questions/27592430/how-can-i-tell-tesseract-that-my-font-has-a-particular-size>.
>>  
>> I'm trying to get Tesseract to split some of its detected boxes into 
>> halves or thirds.
>>
>> My approach has been to draw white vertical lines through the joined 
>> letters, so from before:
>>
>>
>> to after:
>>
>>
>> (http://i.imgur.com/TPcCsi0.png)
>> If you can't see the lines, here they are in red:
>>
>> (http://i.imgur.com/MjSa0FS.png)
>>
>> I would have expected that drawing the white lines would split these 
>> boxes apart. It does that, but it also has a side effect: it joins the "9" 
>> on the first line with the "s" below it on the next line:
>>
>> Even if I draw a white line below the "9" and the "0", this still 
>> happens. As you might expect, these tall letters wreak havoc on the 
>> resulting OCR'd text.
>>
>> I'm baffled why this is happening. Based on this SO answer 
>> <http://stackoverflow.com/a/27605797/388951>, my understanding was that 
>> Tesseract looked at connected components to find boxes, so I would have 
>> expected the white lines to force apart two components.
>>
>> Is it possible to give Tesseract an explicit list of boxes? If not, is 
>> there a more effective way to force apart two letters than what I'm doing?
>>
>> Thanks!
>>   - Dan
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/CAGiBXrzXUU9tC6MaKz89pugooXq31iDLQP1E3qr7d3s1CVgoxQ%40mail.gmail.com
>>  
>> <https://groups.google.com/d/msgid/tesseract-ocr/CAGiBXrzXUU9tC6MaKz89pugooXq31iDLQP1E3qr7d3s1CVgoxQ%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
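
To make the expectation in the quoted message concrete, here is a minimal 
sketch of the connected-components step only, on a toy bitmap. It is not 
Tesseract's actual layout analysis, which does more than this. The function 
names, the '#'/'.' pixel encoding, and the choice of 4-connectivity are all 
assumptions made for illustration.

```python
from collections import deque


def count_components(grid):
    """Count 4-connected components of ink ('#') pixels in a list of strings."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "#" or seen[r][c]:
                continue
            # New, unvisited ink pixel: flood-fill its whole component.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == "#" and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def draw_white_column(grid, x):
    """Return a copy of grid with column x cleared to background ('.')."""
    return [row[:x] + "." + row[x + 1:] for row in grid]


# Two toy "letters" joined by a one-pixel bridge in the middle row.
joined = [
    "###.###",
    "#######",
    "###.###",
]
split = draw_white_column(joined, 3)

print(count_components(joined))  # 1
print(count_components(split))   # 2
```

In this toy case the white column does separate the two blobs, which is the 
behaviour I expected. It says nothing about why the "9" and the "s" end up 
merged across lines.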

