Congratulations! You have succeeded in your efforts. It would be nice to
post a sample TIFF, the command line used, and the output text, for the
benefit of tesseract-ocr users.
Cheers,
-sriranga(79yrs)

On Tue, Feb 14, 2012 at 5:55 PM, John Williams <[email protected]> wrote:

> You're right... I've been testing out the psm flag in various situations
> this whole time, but last night when I was trying out all of your
> suggestions, it slipped my mind. The best solution I've found is to segment
> the columns into "rows" of 1 or 2 digits each and use the "-psm 7" switch.
> So far, it reads everything perfectly.
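
For anyone following along, here is a rough sketch of the idea above: find
the horizontal bands that contain ink, so each band can be saved as its own
image and passed to tesseract with "-psm 7" (treat the image as a single
text line). The function names and the list-of-lists bitmap (1 = ink,
0 = background) are my own assumptions for illustration, not John's actual
code. Newer Tesseract releases spell the flag "--psm".

```python
def split_rows(bitmap):
    """Return (top, bottom) pairs, bottom exclusive, one per run of inked rows."""
    bands = []
    start = None
    for y, row in enumerate(bitmap):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new text row begins here
        elif not has_ink and start is not None:
            bands.append((start, y))       # blank row closes the current band
            start = None
    if start is not None:                  # band that runs to the last row
        bands.append((start, len(bitmap)))
    return bands


def tesseract_cmd(image_path, out_base):
    """Build the command line for one cropped row (hypothetical file names)."""
    return ["tesseract", image_path, out_base, "-psm", "7"]


page = [
    [0, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
for top, bottom in split_rows(page):
    print(top, bottom)
```

Each command list can be handed to subprocess.run once the band has been
written out as an image file.
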
>
> On a semi-related note, I'm really impressed with Tesseract. In my
> preliminary OCR research I read many posts saying that Tesseract's
> recognition was fairly poor and that a different/commercial OCR package
> should be used. I think these people didn't know about or hadn't used the
> training feature of Tesseract, because it's working wonderfully for me,
> which is great considering I had almost no expectations coming in :)
>
> Thanks a lot to everyone for the help and to the developers who work on
> this tool.
>
> On Tue, Feb 14, 2012 at 1:33 AM, Dmitri Silaev <[email protected]> wrote:
>
>> Did you try the "psm" switch (look for it in the forum)? Your own
>> segmentation? Both combined?
>>
>> Warm regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>>
>>
>> On Tue, Feb 14, 2012 at 1:55 AM, John Williams <[email protected]>
>> wrote:
>> > If I duplicate the column 9 times, so that there's ten columns with the
>> > same numbers, it reads it correctly. Running these results through the
>> > training tools didn't help it recognize the original image, though.
>> > Running tesseract on images with a single digit yielded nothing as well.
>> >
>> > In my program, do I have to programmatically duplicate my column of
>> > numbers several times and then figure out what the result was supposed
>> > to be... or can I train tesseract to recognize a single column? I suppose
>> > duplicating it will work, but it seems like a bad hack.
>> >
>> > On Mon, Feb 13, 2012 at 10:42 AM, Chris <[email protected]> wrote:
>> >>
>> >> I'd try segmenting the numbers out yourself and feeding them into
>> >> tesseract as individual characters. Might work better than feeding it
>> >> the whole image.
>> >>
>> >> Make sure you put some padding around each character.
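
A small sketch of the padding step, assuming the cropped character is held
as a list of rows with 0 for background and 1 for ink. The function name
and the representation are assumptions for illustration only.

```python
def pad(bitmap, margin, fill=0):
    """Surround a bitmap with `margin` rows and columns of background."""
    width = len(bitmap[0]) + 2 * margin
    # Fresh lists for every border row so no two rows share storage.
    top = [[fill] * width for _ in range(margin)]
    bottom = [[fill] * width for _ in range(margin)]
    middle = [[fill] * margin + list(row) + [fill] * margin for row in bitmap]
    return top + middle + bottom


glyph = [[1, 1],
         [1, 0]]
for row in pad(glyph, 1):
    print(row)
```

How much margin helps is something to find by experiment; the thread does
not give a number.
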
>> >>
>> >> On Feb 13, 1:56 am, JD <[email protected]> wrote:
>> >> > I'm using v 3.01 on Windows 7 to perform OCR on another program. I
>> >> > don't have access to the fonts the program is using, so I trained
>> >> > tesseract using some screenshots, and so far the text recognition is
>> >> > far better than I expected. However, when I try to process a
>> >> > screenshot that contains only a few numbers, it doesn't match anything
>> >> > at all. If it was matching garbage, or the wrong numbers, then I'd just
>> >> > keep working on improving the training... but it doesn't find
>> >> > anything. Does anyone have a suggestion about what I should try?
>> >> >
>> >> > It doesn't look like I can attach a screenshot, but the numbers are in
>> >> > a column... something like this:
>> >> >
>> >> > 10
>> >> > 13
>> >> > 14
>> >> > 15
>> >> > 17
>> >> >
>> >> > I pre-process the screenshots so the text is black on white. I also
>> >> > zoom in on the images, so they're slightly blurred (only very
>> >> > slightly)... but the text recognition is near perfect, so I don't
>> >> > think that's an issue. Plus, it seems like it should find SOMETHING.
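
The pre-processing described above might look roughly like this, assuming
an 8-bit grayscale image stored as a list of rows with light text on a dark
background. The function names are made up for illustration. Note that this
sketch uses nearest-neighbour scaling, which does not blur, whereas the zoom
in the original post evidently used some smoothing.

```python
def invert(gray):
    """Flip light-on-dark to dark-on-light (values in 0..255)."""
    return [[255 - v for v in row] for row in gray]


def upscale(gray, factor):
    """Enlarge by an integer factor, repeating each pixel (nearest neighbour)."""
    out = []
    for row in gray:
        wide = [v for v in row for _ in range(factor)]   # stretch horizontally
        out.extend(list(wide) for _ in range(factor))    # then vertically
    return out


shot = [[0, 255],
        [255, 0]]
for row in upscale(invert(shot), 2):
    print(row)
```
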
>> >>
>> >> --
>> >> You received this message because you are subscribed to the Google
>> >> Groups "tesseract-ocr" group.
>> >> To post to this group, send email to [email protected]
>> >> To unsubscribe from this group, send email to
>> >> [email protected]
>> >> For more options, visit this group at
>> >> http://groups.google.com/group/tesseract-ocr?hl=en

