Hello, Chirag.

I am also trying to find a way to detect the script of the input document. 

Kindly let me know if you have made any progress.

Thanks and Regards, 
Alex
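
A minimal sketch for repeating the comparison in the quoted messages below
(the same image run with and without -l, for a given -psm value). The helper
name build_command and the use of Python are assumptions made for
illustration; only the -l and -psm flags, the argument order, and the file
names come from the thread.

```python
def build_command(image, output_base, lang=None, psm=0, exe="tesseract.exe"):
    """Return the argument list for one tesseract run.

    Per the quoted help text, -l and -psm are placed before any config file.
    When lang is None the -l option is omitted entirely.
    """
    cmd = [exe, image, output_base]
    if lang is not None:
        cmd += ["-l", lang]
    cmd += ["-psm", str(psm)]
    return cmd


if __name__ == "__main__":
    # Print the four variants discussed in the thread so they can be run
    # one after another and their outputs compared.
    for psm in (0, 3):
        for lang in ("jpn", None):
            print(" ".join(build_command("japanese_doc.tif", "test_osd",
                                         lang=lang, psm=psm)))
```

To execute a variant rather than print it, the returned list can be passed to
subprocess.run, assuming tesseract.exe is on the PATH.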



On Wednesday, March 14, 2012 at 7:17:29 PM UTC+8, Chirag Jain wrote:
>
> With -psm 3, the output file (test_osd.txt) is non-empty, whereas it was 
> empty with -psm 0. This holds both with and without the -l option.
>
> However, the result of detectOS is the same for -psm 0 and -psm 3, 
> with or without the -l option.
>
> Please note that I have modified the code slightly to call detectOS 
> separately, which does a good job of orientation detection when the script 
> is given. I am still struggling to detect the script of the input document. 
>
> Regards,
> Chirag
>
>
> On Wed, Mar 14, 2012 at 4:05 PM, Sriranga(78yrsold) <[email protected]> wrote:
>
>> One more important point: please test again as follows:
>> 1st test: tesseract.exe  japanese_doc.tif  test_osd -l jpn -psm 3
>> 2nd test: tesseract.exe  japanese_doc.tif  test_osd         -psm 3
>> Please check the output text files "test_osd"; you will find a difference 
>> in script between the two. 
>>
>> On Wed, Mar 14, 2012 at 3:51 PM, Sriranga(78yrsold) <[email protected]> wrote:
>>
>>> I noticed that "-l lang" before "-psm 0" is missing from your command line. 
>>> In the absence of "-l lang", tesseract always assumes "-l eng". 
>>>
>>> An extract of the help output is reproduced below:
>>>
>>> M:\>tesseract.exe -h
>>> Usage:tesseract.exe imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]
>>> pagesegmode values are:
>>> 0 = Orientation and script detection (OSD) only.
>>> 1 = Automatic page segmentation with OSD.
>>> 2 = Automatic page segmentation, but no OSD, or OCR
>>> 3 = Fully automatic page segmentation, but no OSD. (Default)
>>> 4 = Assume a single column of text of variable sizes.
>>> 5 = Assume a single uniform block of vertically aligned text.
>>> 6 = Assume a single uniform block of text.
>>> 7 = Treat the image as a single text line.
>>> 8 = Treat the image as a single word.
>>> 9 = Treat the image as a single word in a circle.
>>> 10 = Treat the image as a single character.
>>> -l lang and/or -psm pagesegmode must occur before anyconfigfile.
>>>
>>>
>>>
>>> On Wed, Mar 14, 2012 at 3:22 PM, Chirag <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I was able to successfully test orientation detection (after stepping 
>>>> through the code) for various scripts using the following commands:
>>>>
>>>> English: tesseract.exe  english_doc.tif  test_osd -l eng -psm 0 
>>>> Japanese: tesseract.exe  japanese_doc.tif  test_osd -l jpn -psm 0
>>>> Korean: tesseract.exe  korean_doc.tif  test_osd -l kor -psm 0
>>>>
>>>> In these cases, the executable searches for eng.traineddata, 
>>>> jpn.traineddata and kor.traineddata respectively, along with 
>>>> osd.traineddata.
>>>>
>>>> The performance is really good.
>>>>
>>>>
>>>> However, it seems that Tesseract is detecting orientation given the script.
>>>>
>>>>
>>>> If I run the executable as follows: 
>>>>
>>>> Japanese: tesseract.exe  japanese_doc.tif  test_osd  -psm 0
>>>> Korean: tesseract.exe  korean_doc.tif  test_osd  -psm 0
>>>>
>>>> The results are not good. It seems like script detection is not robust.
>>>>
>>>> Am I missing some step? Kindly clarify. 
>>>>
>>>>
>>>> Regards,
>>>> Chirag
>>>>
>>>>
>>>> On Sat, Mar 3, 2012 at 7:12 PM, koray <[email protected]> wrote:
>>>>
>>>>> OSD returns empty text when I try it. Can anyone please clarify whether
>>>>> this is a bug or I am doing something wrong?
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To post to this group, send email to [email protected]
>>>>> To unsubscribe from this group, send email to
>>>>> [email protected]
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>

