Re: unicharset script and metrics questions

Nick White Fri, 08 Jun 2012 04:19:12 -0700

Hi Steve,

I'll cut up your email to reply to bits.

> Zdeno, does your note on unicharset_extractor mean that the current
> codeline doesn't work properly?
> You mentioned a script to correct the information, is there any place that 
> documents how I can fix the file so that it works properly?
> 
> Nick, have you been able to train either 3.01 or 3.02/current codeline to 
> recognize a new language properly?

Yes, the training I'm doing is with the 3.02 trunk code, and is
working very well now. As Zdenko says, we're just keen to make it as
good as possible, hence looking into unicharset oddities. My
training is already above my expectations. So don't be put off!

On Thu, Jun 07, 2012 at 01:02:59PM -0700, steve8918 wrote:
> Thanks Zdeno and Nick.  Yes, I'm using the latest code of tesseract 
> (revision 729) because the 3.01 version doesn't appear to work well for me, 
> I'm getting "Couldn't find matching blob" for only one of my characters for 
> some reason.

I get that for several of my training images, for no obvious reason.
It doesn't seem to have a major impact on the training for me
though, so I wouldn't worry too much about it.

> After following your instructions, I was able to get 
> everything working without crashing or errors.  However, the training 
> didn't seem to work, because it's not recognizing anything properly.

That is surprising. Can you give more information about what
(mis)recognition is happening?

Nick

