Congratulations! You have succeeded in your efforts. It would be nice if you could post a sample TIFF, the command line used, and the output text, for the benefit of tesseract-ocr users.

Cheers,
-sriranga(79yrs)

On Tue, Feb 14, 2012 at 5:55 PM, John Williams <[email protected]> wrote:

> You're right... I've been testing out the psm flag in various situations
> this whole time, but last night when I was trying out all of your
> suggestions, it slipped my mind. The best solution I've found is to segment
> the columns into "rows" of 1 or 2 digits each and use the "-psm 7" switch.
> So far, it reads everything perfectly.
>
> On a semi-related note, I'm really impressed with Tesseract. In my
> preliminary OCR research I read many posts saying that Tesseract's
> recognition was fairly poor and that a different/commercial OCR package
> should be used. I think these people didn't know about or hadn't used the
> training feature of Tesseract, because it's working wonderfully for me,
> which is great considering I had almost no expectations coming in :)
>
> Thanks a lot to everyone for the help and to the developers who work on
> this tool.
>
> On Tue, Feb 14, 2012 at 1:33 AM, Dmitri Silaev <[email protected]> wrote:
>
>> Did you try the "psm" switch (look for it in the forum)? Your own
>> segmentation? Both combined?
>>
>> Warm regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>> On Tue, Feb 14, 2012 at 1:55 AM, John Williams <[email protected]> wrote:
>> > If I duplicate the column 9 times, so that there are ten columns with
>> > the same numbers, it reads it correctly. Running these results through
>> > the training tools didn't help it recognize the original image, though.
>> > Running tesseract on images with a single digit yielded nothing as well.
>> >
>> > In my program, do I have to programmatically duplicate my column of
>> > numbers several times and then figure out what the result was supposed
>> > to be... or can I train tesseract to recognize a single column? I
>> > suppose duplicating it will work, but it seems like a bad hack.
>> >
>> > On Mon, Feb 13, 2012 at 10:42 AM, Chris <[email protected]> wrote:
>> >>
>> >> I'd try segmenting the numbers out yourself and feeding them into
>> >> tesseract as individual characters. Might work better than feeding it
>> >> the whole image.
>> >>
>> >> Make sure you put some padding around each character.
>> >>
>> >> On Feb 13, 1:56 am, JD <[email protected]> wrote:
>> >> > I'm using v 3.01 on Windows 7 to perform OCR on another program. I
>> >> > don't have access to the fonts the program is using, so I trained
>> >> > tesseract using some screenshots, and so far the text recognition is
>> >> > far better than I expected. However, when I try to process a
>> >> > screenshot that contains only a few numbers, it doesn't match
>> >> > anything at all. If it was matching garbage, or the wrong numbers,
>> >> > then I'd just keep working on improving the training... but it
>> >> > doesn't find anything. Does anyone have a suggestion about what I
>> >> > should try?
>> >> >
>> >> > It doesn't look like I can attach a screenshot, but the numbers are
>> >> > in a column... something like this:
>> >> >
>> >> > 10
>> >> > 13
>> >> > 14
>> >> > 15
>> >> > 17
>> >> >
>> >> > I pre-process the screenshots so the text is black on white. I also
>> >> > zoom in on the images, so they're slightly blurred (only very
>> >> > slightly)... but the text recognition is near perfect, so I don't
>> >> > think that's an issue. Plus, it seems like it should find SOMETHING.
>> >>
>> >> --
>> >> You received this message because you are subscribed to the Google
>> >> Groups "tesseract-ocr" group.
>> >> To post to this group, send email to [email protected]
>> >> To unsubscribe from this group, send email to
>> >> [email protected]
>> >> For more options, visit this group at
>> >> http://groups.google.com/group/tesseract-ocr?hl=en
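
For anyone finding this thread later, here is a rough sketch of the approach described above: split the column into one-line "rows", pad each one, and call tesseract with "-psm 7" (treat the image as a single text line). It is not John's actual code. The function names and the 0/1 pixel-list representation are assumptions made for illustration, and loading and saving the image files is left out. The "-psm" spelling matches the 3.01 command line discussed here; later Tesseract releases spell it "--psm".

```python
import subprocess


def find_row_bands(pixels):
    """Return (top, bottom) pairs for each run of rows that contains ink.

    `pixels` is a list of rows, each a list of 0 (white) or 1 (black).
    `bottom` is exclusive, so pixels[top:bottom] is the band.
    """
    bands = []
    start = None
    for y, row in enumerate(pixels):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # first inked row of a new band
        elif not has_ink and start is not None:
            bands.append((start, y))       # blank row closes the band
            start = None
    if start is not None:                  # band runs to the bottom edge
        bands.append((start, len(pixels)))
    return bands


def crop_with_padding(pixels, band, pad):
    """Cut one band out and surround it with `pad` white pixels on all sides."""
    top, bottom = band
    width = len(pixels[0])
    blank = [0] * (width + 2 * pad)
    body = [[0] * pad + list(row) + [0] * pad for row in pixels[top:bottom]]
    above = [list(blank) for _ in range(pad)]
    below = [list(blank) for _ in range(pad)]
    return above + body + below


def tesseract_command(image_path, output_base, lang=None):
    """Build the argument list: tesseract image outputbase [-l lang] -psm 7."""
    cmd = ["tesseract", image_path, output_base]
    if lang:
        cmd += ["-l", lang]                # name of your own trained data
    cmd += ["-psm", "7"]                   # 7 = single text line
    return cmd


def run_tesseract(image_path, output_base, lang=None):
    """Run tesseract on one saved row image; text goes to output_base.txt."""
    subprocess.check_call(tesseract_command(image_path, output_base, lang))
```

Each padded row would be written to its own image file and passed to run_tesseract, and the resulting text files read back in order. The padding follows Chris's advice about leaving some margin around the characters.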

