I have opened this as an issue at
https://github.com/tesseract-ocr/tessdata/issues/77
You can provide additional feedback there.
@theraysmith is doing the training at Google. The examples you provide
will be helpful to him and improve future training.
ShreeDevi
spa and Latin in the best folder are more or less equivalent; there is no
significant difference. Both produce several failures, but they are quite
reasonable ones. The files that give really bad output are the official ones
that are installed automatically.
Do you need help training the data? (is a
> Btw, is there any way to tell tesseract that values are in a table, so
> that it will not make a mistake identifying lines with charts?
I don't think tesseract has that ability.
You will need to preprocess the image to remove lines. Leptonica has
functions to do that, as well as a table detector.
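
To illustrate the idea of removing lines before OCR, here is a minimal sketch
in plain Python. It is not Leptonica's method or API; it is a simple
run-length heuristic that assumes the image is already binarized into a list
of rows (1 = ink, 0 = background). The function name remove_long_runs and the
min_len threshold are hypothetical and chosen only for this example.

```python
def remove_long_runs(image, min_len):
    """Return a copy of a binary image with long straight ink runs cleared.

    image   -- list of rows, each a list of 0/1 values (1 = ink)
    min_len -- runs of at least this many pixels are treated as table lines
    """
    height = len(image)
    width = len(image[0]) if height else 0
    to_clear = [[False] * width for _ in range(height)]

    # Mark horizontal runs that are long enough to be a ruling line.
    for y in range(height):
        x = 0
        while x < width:
            if image[y][x]:
                start = x
                while x < width and image[y][x]:
                    x += 1
                if x - start >= min_len:
                    for i in range(start, x):
                        to_clear[y][i] = True
            else:
                x += 1

    # Mark vertical runs the same way, column by column.
    for x in range(width):
        y = 0
        while y < height:
            if image[y][x]:
                start = y
                while y < height and image[y][x]:
                    y += 1
                if y - start >= min_len:
                    for i in range(start, y):
                        to_clear[i][x] = True
            else:
                y += 1

    # Build the cleaned copy; unmarked pixels are left untouched.
    return [
        [0 if to_clear[y][x] else image[y][x] for x in range(width)]
        for y in range(height)
    ]
```

The threshold should be longer than any stroke in the text, otherwise parts of
characters are erased too. Pixels where a character touches a ruling line are
also lost, so a real preprocessing step would normally need something more
careful than this.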
I had not checked the list.
It should actually be Latin.traineddata for all languages written in Latin
script. Not Spanish, as I had written.
On 29-Aug-2017 3:54 AM, wrote:
So... I have installed the default tessdata used by the installer, which
seems to be this one:
https://github.com/tesseract-ocr/tessdata/blob/master/spa.traineddata
Following your comment, I have also installed this package:
https://github.com/tesseract-ocr/tessdata/blob/master/best/spa.traineddata
Have you tried with the 'best' traineddata files?
What about the results using best/Spanish vs best/spa?
I have opened this as an issue at
https://github.com/tesseract-ocr/tessdata/issues/77
You can provide additional feedback there.
ShreeDevi
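
One way to compare the default and 'best' traineddata files on the same image
is to run tesseract once per tessdata directory and compare the text output.
The sketch below assumes the tesseract command-line program is on the PATH and
uses its -l and --tessdata-dir options. The helper names and all file and
directory paths are hypothetical placeholders, not paths from this thread.

```python
import subprocess


def build_tesseract_command(image_path, output_base, lang, tessdata_dir):
    """Build the argument list for one tesseract run.

    tesseract writes its result to output_base + ".txt".
    """
    return [
        "tesseract",
        image_path,
        output_base,
        "-l", lang,
        "--tessdata-dir", tessdata_dir,
    ]


def run_ocr(image_path, output_base, lang, tessdata_dir):
    """Run tesseract and return the recognized text."""
    cmd = build_tesseract_command(image_path, output_base, lang, tessdata_dir)
    subprocess.run(cmd, check=True)
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read()


def compare_models(image_path, lang, default_dir, best_dir):
    """Return the text produced with the default and the best traineddata."""
    default_text = run_ocr(image_path, "out_default", lang, default_dir)
    best_text = run_ocr(image_path, "out_best", lang, best_dir)
    return default_text, best_text
```

For example, compare_models("tessinput.tif", "spa", "/path/to/tessdata",
"/path/to/tessdata/best") would return both outputs so they can be read side
by side. The language name must match a traineddata file that exists in the
given directory.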
So... after following the instructions on the quality improvement page:
https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
I got what I think is a clean image. I am attaching the tessinput.tif file I
received as output.
When I ran tesseract 4.0.0 on the image I found that actually the eng