Hi Doug, sorry for the delay in my reply.

On Fri, Jan 10, 2014 at 05:40:57AM -0800, Doug . wrote:
> I am grabbing English text from a program (approx 8 pt bitmap font), writing
> it to an image file, and passing the image file through tesseract. Following
> the instructions I have scaled the image up. That gets me from about 5%
> accuracy (original size) to about 10-12% accuracy (scaled up). The characters
> are clear, distinct, and exactly the same every single time. I had expected
> tesseract would do a decent job of it, but this has not proven to be the case.

That is surprising. Do you mind sending along a sample image?

> Satisfied this was worth the effort I finished editing the .box file and
> proceeded to work through the steps again. I'll be damned if I can now get
> through the steps successfully. So I am stuck with a completed .tif/.box
> combination I spent a *lot* of time on that is doing nothing for me since I
> can't get tesseract to train on the blasted thing. Frustrating indeed.

Sorry it's been so difficult for you. There are quite a few steps,
and error messages are often unhelpful, but it shouldn't be as hard
as you're finding it. If it helps, you can see the script I use to
combine all my training materials together:

https://gitorious.org/ancient-greek-training-for-tesseract/tesstrainingtools/source/051333694288d7b6ae4e0b2d0cee23727e291ad0:combinetraining-v3.sh

But what would be really helpful for us would be if you could post
where you get stuck, and why (what error message appears, or what
doesn't get created that you're expecting).
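
In case it makes collecting that information easier, here is a minimal
sketch of a wrapper that runs one step and records its exit code, its
full output, and any files you expected but which were not created.
It is only an illustration, not part of Tesseract or of my training
tools: the names run_step and format_report, and the script name
report_step.py used below, are made up for this example, and it assumes
Python 3 is available.

```python
import subprocess
import sys
from pathlib import Path


def run_step(command, expected_outputs=(), cwd="."):
    """Run one command and record its exit code, output, and missing files."""
    try:
        result = subprocess.run(
            command,
            cwd=cwd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep error text in order with normal output
            universal_newlines=True,
        )
        returncode, output = result.returncode, result.stdout
    except OSError as exc:
        # The program could not be started at all
        # (for example, it is not installed or not on PATH).
        returncode, output = None, str(exc)
    # Note which of the files this step should have produced are absent.
    missing = [name for name in expected_outputs
               if not (Path(cwd) / name).exists()]
    return {"command": " ".join(command), "returncode": returncode,
            "output": output, "missing": missing}


def format_report(report):
    """Turn the result of run_step into text that can be pasted into an email."""
    lines = ["$ " + report["command"],
             "exit code: {}".format(report["returncode"]),
             "output:",
             report["output"].rstrip() or "(none)"]
    if report["missing"]:
        lines.append("expected but not created: " + ", ".join(report["missing"]))
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to run.
    print(format_report(run_step(sys.argv[1:])))
```

Saved as report_step.py, it could be run as, for example,
"python report_step.py unicharset_extractor eng.myfont.exp0.box"
(that box file name is hypothetical), and the printed report pasted
into a reply.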

> Side note: Contrary to the instructions in the "Run Tesseract for Training"
> section, unicharset_extractor would crash and burn every single time for me
> unless I edited the file and replaced UnknownFont with the name of the font.

That's odd, I've certainly never seen that. Are you using Tesseract
3.02? Can you send the exact error message / crash output, please?

I'm glad you're finally having some success, and sorry it's been
more arduous than it should be. Please do stick around and help me
understand what has been most needlessly difficult so I can improve
the documentation and tools.

Nick

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
