psm = pagesegmode:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR.
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
It is implemented in tesseract version 3.01 (which will be released "soon")...

Zdenko

On Fri, Aug 19, 2011 at 2:45 PM, Andriy Malovanyy <[email protected]> wrote:
> Thanks guys for the help! Really appreciate it!
>
> To sriranga:
> I tried changing dpi (check the previous post). It doesn't work.
>
> To Dmitri:
> I tried manually removing greyness (check the previous post). It
> doesn't work. I think the major issue is the language file. The
> characters are probably too bold.
>
> To Andres:
> Which file did you try to recognize? The last one, manually edited?
> Can you try to recognize the other ones as well? JPG files work fine
> with Tesseract 3.0; some of the files I created with Photoshop have
> been recognized. Btw, can you try to recognize one of them (check
> first post)? Do you also get one full stop instead of two?
>
> To Zdenko:
> What does the -psm option do? I tried to google it and could not find
> an answer. When I try to run "tesseract.exe webcam4-2.jpg text -psm 7" I
> get:
> read_variables_files: Can't open psm
> read_variables_files: Can't open 7
>
> I googled that error as well and did not get much help either.
> Can you also try processing the other unedited images which I attached
> to the first post?
>
> On 19 Aug, 07:20, zdenko podobny <[email protected]> wrote:
> > I tried it with version 3.01 and:
> > tesseract download\webcam4-2.jpg webcam4-2
> > produced an empty page.
> >
> > BUT:
> > tesseract download\webcam4-2.jpg webcam4-2 -psm 7
> > produces the correct result...
> >
> > Zdenko
> >
> > On Fri, Aug 19, 2011 at 6:54 AM, Andres <[email protected]> wrote:
> > > Hi Andriy,
> > >
> > > I'm using Tesseract 2.04
> > >
> > > I don't remember if it works with jpg files, when I tried it with your
> > > image I obtained:
> > >
> > > Tesseract Open Source OCR Engine
> > > name_to_image_type:Error:Unrecognized image type:webcam.jpg
> > > IMAGE::read_header:Error:Can't read this image type:webcam.jpg
> > > tesseract:Error:Read of file failed:webcam.jpg
> > >
> > > So I opened your file with windows paint and saved it as webcam.bmp
> > >
> > > Then I executed:
> > >
> > > tesseract webcam.bmp output -l eng
> > >
> > > and I obtained the file "output.txt" with the correct text.
> > >
> > > Regards,
> > >
> > > Andres
> > >
> > > 2011/8/18 Andriy Malovanyy <[email protected]>:
> > > > Forgot to add attachment
> > > >
> > > > --
> > > > You received this message because you are subscribed to the Google
> > > > Groups "tesseract-ocr" group.
> > > > To post to this group, send email to [email protected]
> > > > To unsubscribe from this group, send email to
> > > > [email protected]
> > > > For more options, visit this group at
> > > > http://groups.google.com/group/tesseract-ocr?hl=en
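
For anyone scripting this, here is a minimal Python sketch of how the -psm option from the thread could be put on the command line. It only builds the argument list and does not run tesseract. The names PSM_MODES and build_tesseract_command are hypothetical helpers made up for illustration and are not part of Tesseract. The "-psm" spelling is the one used in this thread (3.01); as far as I know, later releases (4.0 and newer) spell it "--psm", so the flag is left as a parameter.

```python
# Page segmentation modes as listed in the thread (Tesseract 3.01).
PSM_MODES = {
    0: "Orientation and script detection (OSD) only",
    1: "Automatic page segmentation with OSD",
    2: "Automatic page segmentation, but no OSD, or OCR",
    3: "Fully automatic page segmentation, but no OSD (default)",
    4: "Single column of text of variable sizes",
    5: "Single uniform block of vertically aligned text",
    6: "Single uniform block of text",
    7: "Single text line",
    8: "Single word",
    9: "Single word in a circle",
    10: "Single character",
}


def build_tesseract_command(image, output_base, psm=3, flag="-psm"):
    """Return the argument list for a tesseract call with the given mode.

    Hypothetical helper: it validates the mode against PSM_MODES and
    places the option after the image and output base, as in the thread.
    """
    if psm not in PSM_MODES:
        raise ValueError(f"unknown page segmentation mode: {psm}")
    return ["tesseract", image, output_base, flag, str(psm)]


# The single-line case reported as working in the thread.
cmd = build_tesseract_command("webcam4-2.jpg", "webcam4-2", psm=7)
print(" ".join(cmd))  # prints: tesseract webcam4-2.jpg webcam4-2 -psm 7
```

The resulting list can be handed to subprocess.run(cmd) once a tesseract build that understands the option is installed. On 3.0 the same arguments are treated as config file names, which is where the "Can't open psm" messages quoted above come from.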

