Hi,
On Sat, Jul 21, 2012 at 02:32:09AM -0700, Bulke wrote:
> - I managed to unpack eng.traineddata in the end. The problem was that I was
> in the wrong directory when I was executing the command.
>
> - Results are improved after inserting new lines in eng.unicharambigs.
Great!
> Now, how can I define *blank spaces* in eng.unicharambigs file?
I'm pretty sure you can't define spaces in the unicharambigs file,
as Tesseract treats characters and spaces quite differently.
> I have a problem with kV: it is recognized as W. But I cannot swap W
> with kV everywhere.
> So, how can I say, for example:
> Everywhere you see W after the word Scan ("*Scan W*"), *swap it* with
> "*Scan kV*".
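A rule like this should do it (the fields are tab-separated in the
file):
1 W 2 k V 0
Note that it applies to any W, not only one after "Scan", since the
rules only know about characters. The zero at the end marks the
substitution as optional (a 1 would make it mandatory), so it's just a
suggestion to Tesseract and it won't always make the substitution. In
my experience these 'suggestion' rules don't make a lot of difference,
but they should help somewhat.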
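
If you end up adding several rules, a small script can help avoid
counting mistakes. This is only a sketch: the function name ambig_rule
is made up for illustration, and it assumes the v1 unicharambigs layout
(tab-separated fields, with the characters inside a field separated by
spaces). It also assumes each character of the strings is a single
unichar, which holds for plain ASCII like this.

```python
def ambig_rule(source, target, mandatory=False):
    """Build one line for a unicharambigs file (assumed v1 layout)."""
    fields = [
        str(len(source)),           # number of characters to match
        " ".join(source),           # those characters, space-separated
        str(len(target)),           # number of replacement characters
        " ".join(target),           # replacement characters, space-separated
        "1" if mandatory else "0",  # 1 = mandatory, 0 = optional suggestion
    ]
    return "\t".join(fields)


# The rule discussed above: W -> kV as an optional substitution.
rule = ambig_rule("W", "kV")
print(repr(rule))
```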
> And is there a way to insert a *TAB* instead of a blank space, or a
> character, or whatever?
Not that I know of. I don't think Tesseract inserts multiple spaces
in cases of large gaps, either, so it may not be useful for
generating a TSV file anyway. Though you could try replacing the
spaces yourself as a test, and see if it comes out as you'd like.
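
If you want to try that, something along these lines could be run over
Tesseract's text output afterwards. It's a rough sketch, not anything
Tesseract provides: the function names and the sample line are invented
for illustration. It also covers the "only after Scan" case, which the
unicharambigs file can't express.

```python
import re


def fix_scan_kv(text):
    # Replace a standalone "W" only when it directly follows the word "Scan".
    return re.sub(r"\bScan W\b", "Scan kV", text)


def spaces_to_tabs(text, min_spaces=1):
    # Replace each run of at least min_spaces spaces with a single tab.
    return re.sub(r" {%d,}" % min_spaces, "\t", text)


# Hypothetical line of OCR output, not real Tesseract output.
line = "Scan W 120"
fixed = spaces_to_tabs(fix_scan_kv(line))
print(repr(fixed))
```

Bear in mind that with single spaces every word becomes its own
column, so "Scan kV" ends up split across two fields.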
Nick