Re: Newbie: Training tesseract

Nick White Mon, 23 Jul 2012 03:49:30 -0700

Hi,

On Sat, Jul 21, 2012 at 02:32:09AM -0700, Bulke wrote:
> - I managed to unpack eng.traineddata at the end. Problem was that I was in 
> wrong directory when I was executing command.
> 
> - Results are improved after inserting new lines in eng.unicharambigs.


Great!

> Now, how can I define *blank spaces* in eng.unicharambigs file?

I'm pretty sure you can't define spaces in the unicharambigs file,
as Tesseract treats characters and spaces quite differently.

> Since I have a problem with kV, it is recognized as W. But I cannot swap W 
> on each place with kV.
> So, how can I say for example:
> Everywhere where you see W after word Scan ("*Scan W*"), *swap it* with 
> "*Scan kV*".

A rule like this should do it (the fields are separated by tabs):
1       W       2       k V     0

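The zero at the end marks the rule as optional: it's just a
suggestion to Tesseract, so it won't always make the substitution (a
one there would make the replacement mandatory). In my experience
these 'suggestion' rules don't make a lot of difference, but they
should help somewhat. Note that the rule applies to any W, not only
one after "Scan"; as far as I know a rule can't depend on the
preceding word.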
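
For reference, here is a minimal Python sketch of how a rule line like
the one above is put together. It assumes the tab-separated layout
described in the Tesseract training documentation, and that every
character is a single unichar (which may not hold for ligatures or
other multi-character unichars). The function name ambig_rule is made
up for illustration and is not part of Tesseract.

```python
def ambig_rule(source, target, mandatory=False):
    """Format one unicharambigs line.

    Assumed layout (tab-separated): count of source unichars, the source
    unichars separated by spaces, count of target unichars, the target
    unichars separated by spaces, then 1 (mandatory) or 0 (optional).
    Each character is treated as one unichar, which is a simplification.
    """
    fields = [
        str(len(source)),
        " ".join(source),
        str(len(target)),
        " ".join(target),
        "1" if mandatory else "0",
    ]
    return "\t".join(fields)


# Builds the optional W -> kV rule discussed above.
print(ambig_rule("W", "kV"))
```

The resulting line would be added to eng.unicharambigs before
repacking the traineddata file.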

> And is there a way to insert *TAB* instead blanks space, or a character, or 
> whatever?

Not that I know of. I don't think Tesseract inserts multiple spaces
in cases of large gaps, either, so it may not be useful for
generating a TSV file anyway. Though you could try replacing the
spaces yourself as a test, and see if it comes out as you'd like. 
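
A quick way to run that test is to post-process the text Tesseract
writes out. The sketch below is plain Python and has nothing to do
with Tesseract itself; the function name spaces_to_tabs and the
sample text are made up for illustration. It turns every run of
spaces into a single tab, so a field that itself contains a space
would be split in two.

```python
import re


def spaces_to_tabs(text, min_spaces=1):
    """Replace each run of at least min_spaces spaces with a single tab.

    Only space characters are matched, so line breaks are left alone.
    """
    return re.sub(" {%d,}" % min_spaces, "\t", text)


# Hypothetical OCR output, one record per line.
sample = "Scan kV 120\nScan mA 200"
print(spaces_to_tabs(sample))
```

If the output ever does contain wider gaps between columns, raising
min_spaces to 2 would leave single spaces inside a field untouched.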

Nick

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en