It would be better to state the version of tesseract-ocr and the OS you are using. Please also upload the image file you used so that the problem can be reproduced.
On Thu, Mar 15, 2012 at 1:04 PM, Chirag <[email protected]> wrote:

> Thanks Sriranga for the response.
>
> I was able to perform automatic orientation detection along with page
> segmentation but I had to supply the script information in the argument for
> non-English scripts.
>
> I still could not perform automatic script detection.
>
> Regards
> Chirag
>
>
> 2012/3/14 Sriranga(78yrsold) <[email protected]>
>
>> I tested using my own lang.tif as follows:
>> 1) with the -l option and -psm 3 -> please see attached testtif-osd.txt
>> (non-English)
>> 2) without the -l option and -psm 3 -> please see attached
>> 2testtif-osd.txt (in English)
>> In both cases the output is *not empty*, but it is in a different language.
>>
>> Extract of cmd reproduced below, if used -psm 0
>> M:\>tesseract.exe test.tif 2testtif-osd -psm 0
>> Tesseract Open Source OCR Engine v3.02 with Leptonica
>> Error during processing.
>>
>> M:\>tesseract.exe test.tif 2testtif-osd -l k27 -psm 0
>> Tesseract Open Source OCR Engine v3.02 with Leptonica
>> Error during processing.
>>
>>
>>
>> On Wed, Mar 14, 2012 at 4:47 PM, Chirag <[email protected]> wrote:
>>
>>> With -psm 3, I got non-empty files (test_osd.txt) which were empty with
>>> -psm 0. This is true both with and without the -l option.
>>>
>>> However, the results of detectOS are the same for both -psm [0/3] options,
>>> with or without the -l option.
>>>
>>> Please note that I have modified the code slightly to call detectOS
>>> separately, which has been doing a good job for orientation detection given
>>> the script. I am struggling to detect the script of the input document.
>>>
>>> Regards,
>>> Chirag
>>>
>>>
>>> On Wed, Mar 14, 2012 at 4:05 PM, Sriranga(78yrsold) <
>>> [email protected]> wrote:
>>>
>>>> one more important - please test again as follows:
>>>> 1st test:tesseract.exe japanese_doc.tif test_osd -l jpn -psm 3
>>>> 2nd test:tesseract.exe japanese_doc.tif test_osd -psm 3
>>>> Please check the output text files "test_osd" - you will find
>>>> difference in script between two.
>>>>
>>>> On Wed, Mar 14, 2012 at 3:51 PM, Sriranga(78yrsold) <
>>>> [email protected]> wrote:
>>>>
>>>>> I noticed "-l lang" before "-psm 0" is missing in your commandline.
>>>>> In the absence of "-l lang" tesseract will always assume as "-l eng".
>>>>>
>>>>>
>>>>> extract of help is reproduced below:
>>>>>
>>>>> M:\>tesseract.exe -h
>>>>> Usage:tesseract.exe imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]
>>>>> pagesegmode values are:
>>>>> 0 = Orientation and script detection (OSD) only.
>>>>> 1 = Automatic page segmentation with OSD.
>>>>> 2 = Automatic page segmentation, but no OSD, or OCR
>>>>> 3 = Fully automatic page segmentation, but no OSD. (Default)
>>>>> 4 = Assume a single column of text of variable sizes.
>>>>> 5 = Assume a single uniform block of vertically aligned text.
>>>>> 6 = Assume a single uniform block of text.
>>>>> 7 = Treat the image as a single text line.
>>>>> 8 = Treat the image as a single word.
>>>>> 9 = Treat the image as a single word in a circle.
>>>>> 10 = Treat the image as a single character.
>>>>> -l lang and/or -psm pagesegmode must occur before anyconfigfile.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 14, 2012 at 3:22 PM, Chirag <[email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I was able to successfully test orientation detection (after stepping
>>>>>> through the code) for various scripts using the following commands:
>>>>>>
>>>>>> English: tesseract.exe english_doc.tif test_osd -l eng -psm 0
>>>>>> Japanese: tesseract.exe japanese_doc.tif test_osd -l jpn -psm 0
>>>>>> Korean: tesseract.exe korean_doc.tif test_osd -l kor -psm 0
>>>>>>
>>>>>> In these cases, the executable searches for eng.traineddata,
>>>>>> jpn.traineddata and kor.traineddata respectively, along with
>>>>>> osd.traineddata.
>>>>>>
>>>>>> The performance is really good.
>>>>>>
>>>>>>
>>>>>> However, it seems like Tesseract is detecting orientation given
>>>>>> the script.
>>>>>>
>>>>>>
>>>>>> If I run the executable as follows:
>>>>>>
>>>>>> Japanese: tesseract.exe japanese_doc.tif test_osd -psm 0
>>>>>> Korean: tesseract.exe korean_doc.tif test_osd -psm 0
>>>>>>
>>>>>> The results are not good. It seems like script detection is not
>>>>>> robust.
>>>>>>
>>>>>> Am I missing some step? Kindly clarify.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Chirag
>>>>>>
>>>>>>
>>>>>> On Sat, Mar 3, 2012 at 7:12 PM, koray <[email protected]> wrote:
>>>>>>
>>>>>>> OSD returned empty text when I tried it. Can anyone please clarify
>>>>>>> whether this is a bug or I'm doing something wrong?
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To post to this group, send email to [email protected]
>>>>>>> To unsubscribe from this group, send email to
>>>>>>> [email protected]
>>>>>>> For more options, visit this group at
>>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
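
For anyone scripting the comparisons discussed in this thread, the usage line from the quoted help text can be sketched as a small argument builder. This is only an illustration: build_tesseract_cmd is a hypothetical helper, not part of Tesseract, and the -psm spelling and the 0 to 10 range are taken from the v3.02 help quoted above, so other versions may differ. It only assembles the argument list and does not run the program.

```python
from typing import List, Optional, Sequence

# Page segmentation modes listed in the quoted v3.02 help text (0 through 10).
VALID_PSM = range(0, 11)


def build_tesseract_cmd(
    image: str,
    outputbase: str,
    lang: Optional[str] = None,
    psm: Optional[int] = None,
    configfiles: Sequence[str] = (),
    exe: str = "tesseract",
) -> List[str]:
    """Assemble an argument list following the quoted usage line.

    Hypothetical helper for illustration; it does not invoke Tesseract.
    """
    if psm is not None and psm not in VALID_PSM:
        raise ValueError(f"psm must be between 0 and 10, got {psm}")
    cmd = [exe, image, outputbase]
    if lang is not None:
        cmd += ["-l", lang]
    if psm is not None:
        cmd += ["-psm", str(psm)]
    # Config files go last: the help text says -l and -psm must occur before them.
    cmd += list(configfiles)
    return cmd


if __name__ == "__main__":
    # The two runs suggested for comparison in the thread: with and without -l.
    print(build_tesseract_cmd("japanese_doc.tif", "test_osd", lang="jpn", psm=3))
    print(build_tesseract_cmd("japanese_doc.tif", "test_osd", psm=3))
```

The returned list could be passed to subprocess.run on a machine where Tesseract is installed. Leaving lang as None omits -l entirely, which matches the case in the thread where the language defaults to eng.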

