Re: Bad results using hebrew training files

Sven Pedersen Fri, 18 Jan 2013 10:15:08 -0800

Hi Arik,
Replies below...


On Fri, Jan 18, 2013 at 2:57 AM, אריק הלפרין <[email protected]> wrote:

> Hello,
>
> I have been using tesseract with hebrew text and encountered the following
> issues:
>
> 1) There is great variation between texts: on one text I got 80%, on
> another 60%, and the last one I checked gave a 20% success rate. All were
> printed fonts, using the latest training data file for Hebrew.
>
There is a community-made training file for Hebrew as well as the official
one. Accuracy often depends on the resolution of the image, specifically the
pixel height of the characters (see the FAQ for the recommended range). You
can also preprocess the image to eliminate noise. But we'll need to see an
example image to know what's happening.
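
As a rough illustration of the resolution point, here is a minimal Python
sketch that works out how much to rescale a scan so that its character height
falls inside a chosen range. The 20 to 40 pixel range and the function name
are placeholders of mine, not values taken from the FAQ; substitute whatever
range the FAQ gives.

```python
def scale_factor(char_height_px, min_px=20, max_px=40):
    """Return the resize factor that brings characters of height
    char_height_px inside [min_px, max_px].

    min_px and max_px are assumed placeholder values, not the FAQ's.
    """
    if char_height_px <= 0:
        raise ValueError("character height must be positive")
    if char_height_px < min_px:
        return min_px / char_height_px  # upscale small text
    if char_height_px > max_px:
        return max_px / char_height_px  # downscale oversized text
    return 1.0  # already in range, leave the image alone


if __name__ == "__main__":
    # Measured character heights (in pixels) from three hypothetical scans.
    for height in (10, 30, 80):
        print(height, "->", scale_factor(height))
```

The factor would then be passed to whatever image tool you use for resizing
before running the OCR.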

>
> 3) Results are highly influenced by the direction of the text. If the
> text is tilted a little (for instance, when you scan a book and part of the
> line is at an angle), you won't get a result.
>

There is a mode to auto-rotate -- check the command-line options.
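
For example, a small Python sketch that assembles such a command line. It
assumes the page segmentation mode option is the one meant here: mode 1 is
automatic page segmentation with orientation and script detection, spelled
`-psm` in the 3.x releases and `--psm` in later ones. The helper name and the
file names are made up, and orientation detection needs the `osd` traineddata
installed. I have not checked whether this corrects a slight tilt as opposed
to a page rotated by 90 degrees, so treat it as a starting point.

```python
def build_tesseract_command(image_path, output_base, lang="heb", psm=1, legacy=True):
    """Build the argument list for a tesseract run.

    Hypothetical helper: legacy=True uses the 3.x spelling "-psm",
    legacy=False uses the later "--psm" spelling.
    """
    psm_flag = "-psm" if legacy else "--psm"
    return ["tesseract", image_path, output_base, "-l", lang, psm_flag, str(psm)]


if __name__ == "__main__":
    cmd = build_tesseract_command("page.png", "page_out")
    print(" ".join(cmd))
    # To actually run it (requires tesseract on PATH):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```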

>
> Has anyone encountered these issues? Any idea how to solve them?
>
> Thanks in advance,
> Arik Halperin
>
>  --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>



-- 
``All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”

