Hmm:

1. It would make little sense for Tesseract to get the orientation wrong because of one short line of text oriented the other way, while all the rest of the text points in one direction (although Tess certainly does stupid things sometimes).
2. At least within ScanBizCards (running Tess 3.01), DetectOS DOES work properly; test it for yourself on Android or iPhone.

On Jun 23, 12:35 am, Dmitri Silaev <[email protected]> wrote:
> This is an interesting case. If you take a closer look at the image
> you've shown us, you can notice a small text line at the top of the
> fax page - a fax header line - which is upright in contrast to other
> text in this document. This very text line fools Tesseract's
> orientation detection algo. If you crop the image to exclude this
> line, everything goes alright.
>
> I used the following command line:
> tesseract test_osd_cr.tif test_osd -psm 1
>
> "-psm 1" stands for "Use automatic page segmentation with orientation
> and script detection. (OSD)"
>
> I used a copy of "eng.traineddata" as "osd.traineddata"
>
> HTH
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
> On Wed, Jun 22, 2011 at 9:05 AM, ogorman <[email protected]> wrote:
> > On Jun 22, 6:48 am, patrickq <[email protected]> wrote:
> >> I tested it via ScanBizCards and indeed OSD has no issues whatsoever
> >> getting it right - there is 10 times the amount of text it needs and
> >> the image is very sharp, it's guaranteed to get it right. I am not
> >> familiar with the command-line tools however so I can't help, I'll
> >> just say that it should be very easy to write your own little utility
> >> making a call to DetectOS.
> >
> >> Another easy solution: why don't you run Tesseract twice, first on the
> >> original image then on the image rotated 180 degrees? I assume you only
> >> need these two possibilities because it's a FAX hence page size is
> >> taller than it is wide. Then pick the one that yields the most
> >> sensible text and the least gibberish characters.
> >
> > That is my current method. It just has produced some edge cases where
> > there isn't text, like a graph per se, and either side produces the same
> > amount of false positive noise. In those cases I just keep it the
> > same way it came in. But was hoping for a more efficient method. I
> > am glad the software works though I guess I might need to invest time
> > in building a tool to detect orientation using tesseract.
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en
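
For anyone wanting to try the "run it twice and keep the less gibberish result" fallback described in the thread, here is a rough Python sketch of just the comparison step. It assumes you already have the two OCR text outputs (original and rotated 180 degrees) as strings from however you invoke Tesseract. The names text_score and pick_orientation, the "three or more letters counts as a word" rule, and the 0.1 margin are all made up for illustration; none of this is part of Tesseract.

```python
import re

# Assumed heuristic: a run of three or more ASCII letters is "plausible text".
_WORD = re.compile(r"[A-Za-z]{3,}")


def text_score(text: str) -> float:
    """Fraction of non-space characters that sit inside plausible words."""
    non_space = sum(1 for ch in text if not ch.isspace())
    if non_space == 0:
        return 0.0
    in_words = sum(len(m.group()) for m in _WORD.finditer(text))
    return in_words / non_space


def pick_orientation(text_upright: str, text_rotated: str,
                     margin: float = 0.1) -> int:
    """Return 0 to keep the page as it came in, or 180 to rotate it.

    Near ties (score difference within `margin`) keep the incoming
    orientation, matching the "keep it the same way it came in" rule
    for pages such as graphs where both passes yield only noise.
    """
    s_upright = text_score(text_upright)
    s_rotated = text_score(text_rotated)
    return 180 if s_rotated - s_upright > margin else 0
```

The margin only makes the "both sides are noise" case explicit; it does nothing about the cost of running OCR twice, so DetectOS (or "-psm 1" on a cropped image) is still the more efficient route when it works.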

