With -psm 3, the output files (test_osd.txt) are non-empty, whereas they
were empty with -psm 0. This holds both with and without the -l option.

However, the result of detectOS is the same for -psm 0 and -psm 3, with or
without the -l option.

Please note that I have modified the code slightly to call detectOS
separately. It does a good job of orientation detection when the script is
given. What I am struggling with is detecting the script of the input document.
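
For anyone who wants to reproduce the comparison, here is a minimal Python
sketch that builds the four command lines (-psm 0 and -psm 3, each with and
without -l) and can report whether each output text file is empty. The
helper names build_commands and run_and_report, and the scheme for naming
the output bases, are my own illustration and not part of Tesseract. The
sketch relies only on the -l and -psm flags shown in the usage text quoted
below, and assumes tesseract.exe is on the PATH.

```python
import os
import subprocess
from itertools import product


def build_commands(image, outputbase, lang, exe="tesseract.exe"):
    """Return four argv lists: -psm 0 and -psm 3, each with and without -l."""
    commands = []
    for psm, use_lang in product((0, 3), (True, False)):
        # Give every run its own output base so results are not overwritten.
        # (This naming scheme is hypothetical, chosen for illustration.)
        base = f"{outputbase}_psm{psm}_{lang if use_lang else 'nolang'}"
        argv = [exe, image, base]
        if use_lang:
            argv += ["-l", lang]
        argv += ["-psm", str(psm)]
        commands.append(argv)
    return commands


def run_and_report(commands):
    """Run each command and print whether its .txt output is non-empty."""
    for argv in commands:
        subprocess.run(argv, check=False)
        txt = argv[2] + ".txt"
        size = os.path.getsize(txt) if os.path.exists(txt) else 0
        print(" ".join(argv), "->", "non-empty" if size else "empty or missing")


if __name__ == "__main__":
    # Only print the command lines here; call run_and_report() to execute them.
    for argv in build_commands("japanese_doc.tif", "test_osd", "jpn"):
        print(" ".join(argv))
```

Calling run_and_report(build_commands("japanese_doc.tif", "test_osd", "jpn"))
would execute the four runs and show at a glance which combinations leave an
empty output file.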

Regards,
Chirag


On Wed, Mar 14, 2012 at 4:05 PM, Sriranga(78yrsold) <[email protected]> wrote:

> One more important point: please test again as follows:
> 1st test: tesseract.exe  japanese_doc.tif  test_osd -l jpn -psm 3
> 2nd test: tesseract.exe  japanese_doc.tif  test_osd         -psm 3
> Please check the output text files "test_osd"; you will find a difference
> in script between the two.
>
> On Wed, Mar 14, 2012 at 3:51 PM, Sriranga(78yrsold) <
> [email protected]> wrote:
>
>> I noticed that "-l lang" is missing before "-psm 0" in your command line.
>> In the absence of "-l lang", tesseract always assumes "-l eng".
>>
>> An extract of the help output is reproduced below:
>>
>> M:\>tesseract.exe -h
>> Usage:tesseract.exe imagename outputbase [-l lang] [-psm pagesegmode]
>> [configfile...]
>> pagesegmode values are:
>> 0 = Orientation and script detection (OSD) only.
>> 1 = Automatic page segmentation with OSD.
>> 2 = Automatic page segmentation, but no OSD, or OCR
>> 3 = Fully automatic page segmentation, but no OSD. (Default)
>> 4 = Assume a single column of text of variable sizes.
>> 5 = Assume a single uniform block of vertically aligned text.
>> 6 = Assume a single uniform block of text.
>> 7 = Treat the image as a single text line.
>> 8 = Treat the image as a single word.
>> 9 = Treat the image as a single word in a circle.
>> 10 = Treat the image as a single character.
>> -l lang and/or -psm pagesegmode must occur before anyconfigfile.
>>
>>
>>
>> On Wed, Mar 14, 2012 at 3:22 PM, Chirag <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I was able to successfully test orientation detection (after stepping
>>> through the code) for various scripts using the following commands:
>>>
>>> English: tesseract.exe  english_doc.tif  test_osd -l eng -psm 0
>>> Japanese: tesseract.exe  japanese_doc.tif  test_osd -l jpn -psm 0
>>> Korean: tesseract.exe  korean_doc.tif  test_osd -l kor -psm 0
>>>
>>> In these cases, the executable searches for eng.traineddata,
>>> jpn.traineddata and kor.traineddata respectively, along with osd.traineddata.
>>>
>>> The performance is really good.
>>>
>>>
>>> However, it seems that Tesseract detects orientation given the script.
>>>
>>>
>>> If I run the executable as follows:
>>>
>>> Japanese: tesseract.exe  japanese_doc.tif  test_osd  -psm 0
>>> Korean: tesseract.exe  korean_doc.tif  test_osd  -psm 0
>>>
>>> The results are not good. It seems like script detection is not robust.
>>>
>>> Am I missing some step? Kindly clarify.
>>>
>>>
>>> Regards,
>>> Chirag
>>>
>>>
>>> On Sat, Mar 3, 2012 at 7:12 PM, koray <[email protected]> wrote:
>>>
>>>>  OSD returned empty text when I tried it. Can anyone please clarify
>>>> whether this is a bug or I am doing something wrong?
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To post to this group, send email to [email protected]
>>>> To unsubscribe from this group, send email to
>>>> [email protected]
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>
>>>
>>
>>

