Patrick,

Here you are confusing the "DetectOS" function with the processing
pipeline invoked from the command line. "DetectOS" is *not* called
when OSD is requested from the command line; it is only an API wrapper
with its own logic. The command-line OSD logic is somewhat different
from DetectOS's, hence the discrepancies under seemingly identical
conditions.
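
As a rough illustration of the two-pass fallback discussed earlier in
this thread (OCR the page as it came in and again rotated 180 degrees,
then keep the more sensible result), here is a minimal Python sketch.
The scoring rule and the names sensible_score and pick_rotation are
assumptions made for illustration, not anything Tesseract provides, and
the two input strings are assumed to be OCR output already produced for
each orientation.

```python
import re

# Hypothetical heuristic: a token is "word-like" if it is purely
# alphabetic, at least two letters long, with optional trailing punctuation.
WORD_RE = re.compile(r"^[A-Za-z]{2,}[.,;:!?]?$")


def sensible_score(text):
    """Return the count of word-like tokens minus the count of other tokens."""
    tokens = text.split()
    good = sum(1 for t in tokens if WORD_RE.match(t))
    return good - (len(tokens) - good)


def pick_rotation(text_0, text_180):
    """Return 0 or 180 (degrees); a tie keeps the page as it came in."""
    if sensible_score(text_180) > sensible_score(text_0):
        return 180
    return 0
```

Ties, including two empty results from a page with no text, return 0,
which corresponds to leaving the page the way it arrived.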

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Thu, Jun 23, 2011 at 3:47 AM, patrickq <[email protected]> wrote:
> Hummm:
> 1. It would make little sense for Tesseract to get it wrong because of
> so little text oriented wrongly, while all the rest of the text points
> in another direction (although Tess certainly does stupid things
> sometimes)
> 2. At least within ScanBizCards (running Tess 3.01), DetectOS DOES
> work properly, test for yourself on Android or iPhone
>
> On Jun 23, 12:35 am, Dmitri Silaev <[email protected]> wrote:
>> This is an interesting case. If you take a closer look at the image
>> you've shown us, you will notice a small text line at the top of the
>> fax page - a fax header line - which is upright, in contrast to the
>> other text in this document. This very text line fools Tesseract's
>> orientation detection algorithm. If you crop the image to exclude
>> this line, everything works fine.
>>
>> I used the following command line:
>> tesseract test_osd_cr.tif test_osd -psm 1
>>
>> "-psm 1" stands for "Use automatic page segmentation with orientation
>> and script detection. (OSD)"
>>
>> I used a copy of "eng.traineddata" as "osd.traineddata"
>>
>> HTH
>>
>> Warm regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>> On Wed, Jun 22, 2011 at 9:05 AM, ogorman <[email protected]> wrote:
>> > On Jun 22, 6:48 am, patrickq <[email protected]> wrote:
>> >> I tested it via ScanBizCards and indeed OSD has no issues whatsoever
>> >> getting it right - there is 10 times the amount of text it needs and
>> >> the image is very sharp, so it's guaranteed to get it right. I am not
>> >> familiar with the command-line tools, however, so I can't help. I'll
>> >> just say that it should be very easy to write your own little utility
>> >> making a call to DetectOS.
>>
>> >> Another easy solution: why don't you run Tesseract twice, first on the
>> >> original image, then on the image rotated 180 degrees? I assume you only
>> >> need these two possibilities because it's a fax, hence the page is
>> >> taller than it is wide. Then pick the one that yields the most
>> >> sensible text and the fewest gibberish characters.
>>
>> > That is my current method. It has just produced some edge cases where
>> > there isn't any text, such as a graph, and both orientations produce
>> > the same amount of false-positive noise. In those cases I just keep
>> > the page the way it came in. But I was hoping for a more efficient
>> > method. I am glad the software works, though. I guess I might need to
>> > invest time in building a tool to detect orientation using Tesseract.
>>
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "tesseract-ocr" group.
>> > To post to this group, send email to [email protected]
>> > To unsubscribe from this group, send email to
>> > [email protected]
>> > For more options, visit this group at
>> > http://groups.google.com/group/tesseract-ocr?hl=en
>
