I'm not specifying psm explicitly, so it must be the default, 3: "Fully automatic page segmentation, but no OSD."
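In case it helps with comparing modes, here is a minimal sketch of passing the mode explicitly instead of relying on the default. The helper name `build_tesseract_command` and the file names are hypothetical. The single-dash `-psm` spelling is what the 3.x command line accepts; newer releases spell it `--psm`, so the flag is switchable.

```python
import shlex

# Mode numbers as listed in the tesseract(1) man page.
PSM_AUTO = 3          # Fully automatic page segmentation, but no OSD (default)
PSM_SINGLE_BLOCK = 6  # Assume a single uniform block of text


def build_tesseract_command(image_path, output_base, psm=PSM_AUTO, legacy_flag=True):
    """Return the argv list for a tesseract run with an explicit psm.

    legacy_flag=True emits the single-dash "-psm" spelling used by
    Tesseract 3.x; pass False for the "--psm" spelling of later releases.
    """
    flag = "-psm" if legacy_flag else "--psm"
    return ["tesseract", image_path, output_base, flag, str(psm)]


if __name__ == "__main__":
    # "page.png" and "out" are placeholder names, not files from this thread.
    cmd = build_tesseract_command("page.png", "out", psm=PSM_SINGLE_BLOCK)
    print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be handed to `subprocess.run` as-is; trying mode 6 against the default 3 would at least show whether the automatic layout analysis is what merges the two lines.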

On Tuesday, December 30, 2014 11:10:05 PM UTC-5, shree wrote:
>
> what page segmentation mode are you using?
>
> https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html
>
> ShreeDevi
>
> On Wed, Dec 31, 2014 at 6:18 AM, Dan Vanderkam wrote:
>
>> More context here
>> <http://stackoverflow.com/questions/27592430/how-can-i-tell-tesseract-that-my-font-has-a-particular-size>.
>>
>> I'm trying to get Tesseract to split some of its detected boxes in half or
>> thirds.
>>
>> My approach has been to draw white vertical lines through the joined
>> letters, so from before:
>>
>> to after:
>>
>> (http://i.imgur.com/TPcCsi0.png)
>>
>> If you can't see the lines, here they are in red:
>>
>> (http://i.imgur.com/MjSa0FS.png)
>>
>> I would have expected that drawing the white lines would split these
>> boxes apart. It does that, but it also has a side effect: it joins the "9"
>> on the first line with the "s" below it on the next line.
>>
>> Even if I draw a white line below the "9" and the "0", this still
>> happens. As you might expect, these tall letters wreak havoc on the
>> resulting OCR'd text.
>>
>> I'm baffled why this is happening. Based on this SO answer
>> <http://stackoverflow.com/a/27605797/388951>, my understanding was that
>> Tesseract looked at connected components to find boxes, so I would have
>> expected the white lines to force apart two components.
>>
>> Is it possible to give Tesseract an explicit list of boxes? If not, is
>> there a more effective way to force apart two letters than what I'm doing?
>>
>> Thanks!
>> - Dan

