It would help if you stated the version of tesseract-ocr and the
operating system you are using.
Please also upload the image file you used so that the problem can be
reproduced.
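
For anyone trying to reproduce this, the comparison discussed in the quoted
thread below (the same image run with and without -l, at a given -psm value)
can be scripted. This is only a minimal sketch, assuming the 3.02 command-line
syntax shown in the quoted help text. The function name build_cmd and the
default executable name "tesseract" are illustrative assumptions, not part of
Tesseract.

```python
from typing import List, Optional


def build_cmd(image: str, outputbase: str, psm: int,
              lang: Optional[str] = None,
              exe: str = "tesseract") -> List[str]:
    """Build an argument list in the order given by the quoted usage text:
    imagename outputbase [-l lang] [-psm pagesegmode].

    build_cmd is a hypothetical helper for illustration only.
    """
    # The quoted help text lists pagesegmode values 0 through 10.
    if not 0 <= psm <= 10:
        raise ValueError("psm must be between 0 and 10")
    cmd = [exe, image, outputbase]
    if lang is not None:
        # Per the thread, omitting -l makes tesseract assume "eng".
        cmd += ["-l", lang]
    cmd += ["-psm", str(psm)]
    return cmd


if __name__ == "__main__":
    # Print the two command lines being compared; nothing is executed here.
    for lang in ("jpn", None):
        print(" ".join(build_cmd("japanese_doc.tif", "test_osd", 0, lang)))
```

The printed lines can be pasted into a shell, or each list passed to
subprocess.run, to compare the two outputs for the same image.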


On Thu, Mar 15, 2012 at 1:04 PM, Chirag <[email protected]> wrote:

> Thanks Sriranga for the response.
>
> I was able to perform automatic orientation detection along with page
> segmentation, but I had to supply the script information as an argument
> for non-English scripts.
>
> I still could not perform automatic script detection.
>
> Regards
> Chirag
>
>
> 2012/3/14 Sriranga(78yrsold) <[email protected]>
>
>> I tested using my own lang.tif as follows:
>>  1) with the -l option and -psm 3: please see the attached
>> testtif-osd.txt (non-English)
>>  2) without the -l option, with -psm 3: please see the attached
>> 2testtif-osd.txt (in English)
>> In both cases the output is *not empty*, but it is in a different language.
>>
>> An extract of the command output when -psm 0 is used is reproduced below:
>> M:\>tesseract.exe test.tif 2testtif-osd    -psm 0
>> Tesseract Open Source OCR Engine v3.02 with Leptonica
>> Error during processing.
>>
>> M:\>tesseract.exe test.tif 2testtif-osd -l k27   -psm 0
>> Tesseract Open Source OCR Engine v3.02 with Leptonica
>> Error during processing.
>>
>>
>>
>> On Wed, Mar 14, 2012 at 4:47 PM, Chirag <[email protected]> wrote:
>>
>>> With -psm 3, I got non-empty output files (test_osd.txt), which were empty
>>> with -psm 0. This is true both with and without the -l option.
>>>
>>> However, the result of detectOS is the same for -psm 0 and -psm 3,
>>> with or without the -l option.
>>>
>>> Please note that I have modified the code slightly to call detectOS
>>> separately, which has been doing a good job of orientation detection given
>>> the script. I am struggling to detect the script of the input document.
>>>
>>> Regards,
>>> Chirag
>>>
>>>
>>> On Wed, Mar 14, 2012 at 4:05 PM, Sriranga(78yrsold) <
>>> [email protected]> wrote:
>>>
>>>> One more important thing: please test again as follows:
>>>> 1st test: tesseract.exe  japanese_doc.tif  test_osd -l jpn -psm 3
>>>> 2nd test: tesseract.exe  japanese_doc.tif  test_osd         -psm 3
>>>> Please check the output text file "test_osd" after each run; you will
>>>> find a difference in script between the two.
>>>>
>>>> On Wed, Mar 14, 2012 at 3:51 PM, Sriranga(78yrsold) <
>>>> [email protected]> wrote:
>>>>
>>>>> I noticed that "-l lang" is missing before "-psm 0" in your command line.
>>>>> In the absence of "-l lang", tesseract will always assume "-l eng".
>>>>>
>>>>>
>>>>> An extract of the help output is reproduced below:
>>>>>
>>>>> M:\>tesseract.exe -h
>>>>> Usage:tesseract.exe imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]
>>>>> pagesegmode values are:
>>>>> 0 = Orientation and script detection (OSD) only.
>>>>> 1 = Automatic page segmentation with OSD.
>>>>> 2 = Automatic page segmentation, but no OSD, or OCR
>>>>> 3 = Fully automatic page segmentation, but no OSD. (Default)
>>>>> 4 = Assume a single column of text of variable sizes.
>>>>> 5 = Assume a single uniform block of vertically aligned text.
>>>>> 6 = Assume a single uniform block of text.
>>>>> 7 = Treat the image as a single text line.
>>>>> 8 = Treat the image as a single word.
>>>>> 9 = Treat the image as a single word in a circle.
>>>>> 10 = Treat the image as a single character.
>>>>> -l lang and/or -psm pagesegmode must occur before any configfile.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 14, 2012 at 3:22 PM, Chirag <[email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I was able to successfully test orientation detection (after stepping
>>>>>> through the code) for various scripts using the following commands:
>>>>>>
>>>>>> English: tesseract.exe  english_doc.tif  test_osd -l eng -psm 0
>>>>>> Japanese: tesseract.exe  japanese_doc.tif  test_osd -l jpn -psm 0
>>>>>> Korean: tesseract.exe  korean_doc.tif  test_osd -l kor -psm 0
>>>>>>
>>>>>> In these cases, the executable searches for eng.traineddata,
>>>>>> jpn.traineddata and kor.traineddata respectively, along with
>>>>>> osd.traineddata.
>>>>>>
>>>>>> The performance is really good.
>>>>>>
>>>>>>
>>>>>> However, it seems that Tesseract is detecting orientation given the
>>>>>> script.
>>>>>>
>>>>>>
>>>>>> If I run the executable as follows:
>>>>>>
>>>>>> Japanese: tesseract.exe  japanese_doc.tif  test_osd  -psm 0
>>>>>> Korean: tesseract.exe  korean_doc.tif  test_osd  -psm 0
>>>>>>
>>>>>> The results are not good. It seems like script detection is not
>>>>>> robust.
>>>>>>
>>>>>> Am I missing some step? Kindly clarify.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Chirag
>>>>>>
>>>>>>
>>>>>> On Sat, Mar 3, 2012 at 7:12 PM, koray <[email protected]> wrote:
>>>>>>
>>>>>>> OSD returns empty text when I tried it. Can anyone please clarify
>>>>>>> whether this is a bug or I am doing something wrong?
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To post to this group, send email to [email protected]
>>>>>>> To unsubscribe from this group, send email to
>>>>>>> [email protected]
>>>>>>> For more options, visit this group at
>>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>
>

