With -psm 3, I get non-empty output files (test_osd.txt), whereas they were empty with -psm 0. This holds both with and without the -l option.
However, the results of detectOS are the same for -psm 0 and -psm 3, again both with and without the -l option. Please note that I have modified the code slightly to call detectOS separately; it does a good job of orientation detection once the script is given. What I am struggling with is detecting the script of the input document.

Regards,
Chirag

On Wed, Mar 14, 2012 at 4:05 PM, Sriranga(78yrsold) <[email protected]> wrote:
> One more important thing: please test again as follows:
> 1st test: tesseract.exe japanese_doc.tif test_osd -l jpn -psm 3
> 2nd test: tesseract.exe japanese_doc.tif test_osd -psm 3
> Please check the output text files "test_osd"; you will find a difference
> in script between the two.
>
> On Wed, Mar 14, 2012 at 3:51 PM, Sriranga(78yrsold) <
> [email protected]> wrote:
>
>> I noticed that "-l lang" before "-psm 0" is missing in your command line.
>> In the absence of "-l lang", tesseract always assumes "-l eng".
>>
>> An extract of the help output is reproduced below:
>>
>> M:\>tesseract.exe -h
>> Usage:tesseract.exe imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]
>> pagesegmode values are:
>> 0 = Orientation and script detection (OSD) only.
>> 1 = Automatic page segmentation with OSD.
>> 2 = Automatic page segmentation, but no OSD, or OCR
>> 3 = Fully automatic page segmentation, but no OSD. (Default)
>> 4 = Assume a single column of text of variable sizes.
>> 5 = Assume a single uniform block of vertically aligned text.
>> 6 = Assume a single uniform block of text.
>> 7 = Treat the image as a single text line.
>> 8 = Treat the image as a single word.
>> 9 = Treat the image as a single word in a circle.
>> 10 = Treat the image as a single character.
>> -l lang and/or -psm pagesegmode must occur before any configfile.
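The usage line in the help output fixes the argument order: image, output base, then the optional -l and -psm switches, then any config files. A minimal Python sketch of that ordering follows. build_tesseract_argv is a hypothetical helper that only assembles the argument list (it does not run tesseract), and the mode table is copied from the -h output quoted in this thread rather than queried from the binary. If tesseract.exe is on the PATH, the resulting list could be handed to subprocess.run.

```python
from typing import List, Optional, Sequence

# Page segmentation modes as listed by `tesseract.exe -h` in this thread.
PSM_MODES = {
    0: "Orientation and script detection (OSD) only.",
    1: "Automatic page segmentation with OSD.",
    2: "Automatic page segmentation, but no OSD, or OCR",
    3: "Fully automatic page segmentation, but no OSD. (Default)",
    4: "Assume a single column of text of variable sizes.",
    5: "Assume a single uniform block of vertically aligned text.",
    6: "Assume a single uniform block of text.",
    7: "Treat the image as a single text line.",
    8: "Treat the image as a single word.",
    9: "Treat the image as a single word in a circle.",
    10: "Treat the image as a single character.",
}


def build_tesseract_argv(image: str, outputbase: str,
                         lang: Optional[str] = None,
                         psm: Optional[int] = None,
                         configfiles: Sequence[str] = (),
                         exe: str = "tesseract.exe") -> List[str]:
    """Hypothetical helper: assemble argv in the order of the usage line
    imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]
    """
    argv = [exe, image, outputbase]
    # -l and -psm must occur before any config file, so add them first.
    # Per the thread, leaving out -l makes tesseract assume "eng".
    if lang is not None:
        argv += ["-l", lang]
    if psm is not None:
        if psm not in PSM_MODES:
            raise ValueError(f"unknown page segmentation mode: {psm}")
        argv += ["-psm", str(psm)]
    # Config files, if any, always go last.
    argv += list(configfiles)
    return argv


if __name__ == "__main__":
    # Sriranga's two tests: identical except for the -l jpn switch.
    print(" ".join(build_tesseract_argv("japanese_doc.tif", "test_osd",
                                        lang="jpn", psm=3)))
    print(" ".join(build_tesseract_argv("japanese_doc.tif", "test_osd",
                                        psm=3)))
```

Running it prints Sriranga's 1st and 2nd test commands, which makes it easy to see that the only variable between the two runs is the language switch.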
>>
>> On Wed, Mar 14, 2012 at 3:22 PM, Chirag <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I was able to successfully test orientation detection (after stepping
>>> through the code) for various scripts using the following commands:
>>>
>>> English: tesseract.exe english_doc.tif test_osd -l eng -psm 0
>>> Japanese: tesseract.exe japanese_doc.tif test_osd -l jpn -psm 0
>>> Korean: tesseract.exe korean_doc.tif test_osd -l kor -psm 0
>>>
>>> In these cases, the executable searches for eng.traineddata,
>>> jpn.traineddata and kor.traineddata respectively, along with osd.traineddata.
>>>
>>> The performance is really good.
>>>
>>> However, it seems that Tesseract detects orientation for a given script.
>>>
>>> If I run the executable as follows:
>>>
>>> Japanese: tesseract.exe japanese_doc.tif test_osd -psm 0
>>> Korean: tesseract.exe korean_doc.tif test_osd -psm 0
>>>
>>> the results are not good. It seems that script detection is not robust.
>>>
>>> Am I missing some step? Kindly clarify.
>>>
>>> Regards,
>>> Chirag
>>>
>>> On Sat, Mar 3, 2012 at 7:12 PM, koray <[email protected]> wrote:
>>>
>>>> OSD returns empty text when I try it. Can anyone please clarify whether
>>>> this is a bug or I'm doing something wrong?
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To post to this group, send email to [email protected]
>>>> To unsubscribe from this group, send email to
>>>> [email protected]
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/tesseract-ocr?hl=en
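Chirag noticed that each -psm 0 run looked for <lang>.traineddata alongside osd.traineddata, and Sriranga pointed out that omitting -l makes tesseract fall back to eng. Together these suggest a quick sanity check before comparing runs: confirm which data files each run will actually use. The Python sketch below lists the expected files and reports which are missing from a tessdata directory. The function names are hypothetical, and treating modes 0 and 1 (the two modes the -h listing describes as doing OSD) as the ones that need osd.traineddata is an assumption, not something confirmed in the thread.

```python
import os
from typing import List, Optional

# Assumption: the modes the -h listing describes as performing OSD are the
# ones that load osd.traineddata in addition to the language file.
OSD_MODES = {0, 1}


def required_traineddata(lang: Optional[str], psm: int) -> List[str]:
    """Hypothetical helper: traineddata files a run is expected to look for.

    Per the thread, omitting -l means "eng", and the -psm 0 runs loaded
    <lang>.traineddata together with osd.traineddata.
    """
    files = [f"{lang or 'eng'}.traineddata"]
    if psm in OSD_MODES:
        files.append("osd.traineddata")
    return files


def missing_traineddata(tessdata_dir: str, lang: Optional[str],
                        psm: int) -> List[str]:
    """Hypothetical helper: expected files that are absent from tessdata_dir."""
    return [name for name in required_traineddata(lang, psm)
            if not os.path.isfile(os.path.join(tessdata_dir, name))]


if __name__ == "__main__":
    # Chirag's Japanese OSD run, with and without -l jpn.
    for lang in ("jpn", None):
        print(lang, required_traineddata(lang, 0))
```

The run without -l resolves to eng.traineddata, which is consistent with Sriranga's explanation of why the Japanese and Korean results degrade when the language switch is dropped.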

