Patrick,

Here you are confusing the "DetectOS" function with the processing pipeline invoked from the command line. In fact, "DetectOS" is *not* (!) called when OSD is requested from the command line; it is only an API wrapper with its own logic. The command-line OSD logic differs somewhat from DetectOS's, hence the discrepancies under seemingly identical conditions.
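As for the run-twice fallback quoted below (OCR the page as scanned and again rotated 180 degrees, then keep the more sensible result), the selection step could look roughly like the following Python sketch. Nothing here comes from Tesseract itself: the names sensible_ratio and pick_orientation, the "runs of three or more letters" heuristic, and the 0.05 margin are all assumptions made for illustration. The sketch expects the two OCR outputs as plain strings and, when neither side is clearly better, keeps the page the way it came in.

```python
import re

# Hypothetical heuristic: a run of 3+ ASCII letters counts as "sensible" text.
WORD_RE = re.compile(r"[A-Za-z]{3,}")


def sensible_ratio(text):
    """Fraction of non-space characters that sit inside runs of 3+ letters."""
    non_space = sum(1 for ch in text if not ch.isspace())
    if non_space == 0:
        return 0.0
    wordy = sum(len(m.group(0)) for m in WORD_RE.finditer(text))
    return wordy / non_space


def pick_orientation(text_as_scanned, text_rotated_180, margin=0.05):
    """Return 180 if the rotated OCR output is clearly more word-like, else 0.

    The margin (an assumed value) keeps the page as it came in when both
    outputs are about equally noisy, e.g. a page holding only a graph.
    """
    score_as_scanned = sensible_ratio(text_as_scanned)
    score_rotated = sensible_ratio(text_rotated_180)
    if score_rotated - score_as_scanned > margin:
        return 180
    return 0
```

The two strings would come from two ordinary Tesseract runs, one per orientation; how the image gets rotated is left to whatever imaging tool is already in use.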
Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Thu, Jun 23, 2011 at 3:47 AM, patrickq <[email protected]> wrote:
> Hummm:
> 1. It would make little sense for Tesseract to get it wrong because of
> so little text oriented wrongly, while all the rest of the text points
> in another direction (although Tess certainly does stupid things
> sometimes)
> 2. At least within ScanBizCards (running Tess 3.01), DetectOS DOES
> work properly, test for yourself on Android or iPhone
>
> On Jun 23, 12:35 am, Dmitri Silaev <[email protected]> wrote:
>> This is an interesting case. If you take a closer look at the image
>> you've shown us, you can notice a small text line at the top of the
>> fax page - a fax header line - which is upright in contrast to other
>> text in this document. This very text line fools Tesseract's
>> orientation detection algo. If you crop the image to exclude this
>> line, everything goes alright.
>>
>> I used the following command line:
>> tesseract test_osd_cr.tif test_osd -psm 1
>>
>> "-psm 1" stands for "Use automatic page segmentation with orientation
>> and script detection. (OSD)"
>>
>> I used a copy of "eng.traineddata" as "osd.traineddata"
>>
>> HTH
>>
>> Warm regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>> On Wed, Jun 22, 2011 at 9:05 AM, ogorman <[email protected]> wrote:
>> > On Jun 22, 6:48 am, patrickq <[email protected]> wrote:
>> >> I tested it via ScanBizCards and indeed OSD has no issues whatsoever
>> >> getting it right - there is 10 times the amount of text it needs and
>> >> the image is very sharp, it's guaranteed to get it right. I am not
>> >> familiar with the command-line tools however so I can't help, I'll
>> >> just say that it should be very easy to write your own little utility
>> >> making a call to DetectOS.
>> >>
>> >> Another easy solution: why don't you run Tesseract twice, first on the
>> >> original image then on the image rotated 180 degrees? I assume you only
>> >> need these two possibilities because it's a FAX hence page size is
>> >> taller than it is wide. Then pick the one that yields the most
>> >> sensible text and the least gibberish characters.
>>
>> > That is my current method. It just has produced some edge cases where
>> > there isn't text like a graph per se and either side produces same
>> > amount of false positive noise. In those cases I just keep it the
>> > same way it came in. But was hoping for a more efficient method. I
>> > am glad the software works though I guess I might need to invest time
>> > in building a tool to detect orientation using tesseract.
>>
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "tesseract-ocr" group.
>> > To post to this group, send email to [email protected]
>> > To unsubscribe from this group, send email to
>> > [email protected]
>> > For more options, visit this group at
>> > http://groups.google.com/group/tesseract-ocr?hl=en

