> Btw, is there any way to tell tesseract that values are in a table, so
> that it will not mistake the table lines for characters?

I don't think tesseract has that ability.

You will need to preprocess the image to remove lines. Leptonica has
functions to do that, as well as a table detector.

See https://github.com/DanBloomberg/leptonica/commits/master
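
To make the line-removal idea concrete, here is a minimal sketch in plain
Python. It is not Leptonica's API and not what tesseract does internally. It
only illustrates one simple approach: in a binarized image, erase every
horizontal or vertical run of ink that is too long to belong to a character.
The function name remove_long_runs and the min_len parameter are made up for
this illustration, and the image is assumed to be a list of rows with 1 for
ink and 0 for background.

```python
def remove_long_runs(img, min_len):
    """Return a copy of a binary image (1 = ink, 0 = background) in which
    every horizontal or vertical ink run of at least min_len pixels is
    erased. Hypothetical helper for illustration, not a Leptonica call."""
    h = len(img)
    w = len(img[0]) if h else 0
    # Runs are measured on the original image, so a line that crosses
    # another line is still detected along its full length.
    erase = [[False] * w for _ in range(h)]

    # Mark long horizontal runs, row by row.
    for y in range(h):
        x = 0
        while x < w:
            if img[y][x]:
                start = x
                while x < w and img[y][x]:
                    x += 1
                if x - start >= min_len:
                    for i in range(start, x):
                        erase[y][i] = True
            else:
                x += 1

    # Mark long vertical runs, column by column.
    for x in range(w):
        y = 0
        while y < h:
            if img[y][x]:
                start = y
                while y < h and img[y][x]:
                    y += 1
                if y - start >= min_len:
                    for j in range(start, y):
                        erase[j][x] = True
            else:
                y += 1

    # Build the cleaned copy; the input image is left untouched.
    return [[0 if erase[y][x] else img[y][x] for x in range(w)]
            for y in range(h)]
```

In practice min_len would have to be chosen larger than the tallest and
widest character stroke on the page, otherwise parts of letters get erased
along with the table rules. On real scans the lines are rarely perfectly
straight, which is why a morphological approach like the one in Leptonica is
usually the better choice.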



ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Aug 29, 2017 at 6:47 AM, ShreeDevi Kumar <[email protected]>
wrote:

> I had not checked the list.
>
> It should actually be Latin.traineddata for all languages written in Latin
> script, not Spanish as I had written.
>
> On 29-Aug-2017 3:54 AM, <[email protected]> wrote:
>
>> So... I have installed the default tessdata used by the installer, which
>> seems to be this one:
>> https://github.com/tesseract-ocr/tessdata/blob/master/spa.traineddata
>>
>> Following your comment, I have installed this package:
>> https://github.com/tesseract-ocr/tessdata/blob/master/best/spa.traineddata
>>
>> But I have not found best/Spanish. Is it missing from the upload?
>>
>> best/spa is REALLY better, and its quality is comparable to English: they
>> have more or less the same level of errors.
>>
>> Where is best/Spanish? Given the improvement, I am really interested in
>> testing it.
>>
>> Btw, is there any way to tell tesseract that values are in a table, so
>> that it will not mistake the table lines for characters?
>>
>> On Monday, August 28, 2017 at 8:15:41 (UTC+2), shree wrote:
>>>
>>> Have you tried with the 'best' traineddatas?
>>>
>>> What about results using best/Spanish vs best/spa?
>>>
>>> I have opened this as an issue at
>>> https://github.com/tesseract-ocr/tessdata/issues/77
>>>
>>> You can provide additional feedback there.
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Mon, Aug 28, 2017 at 6:04 AM, <[email protected]> wrote:
>>>
>>>> So... after following the quality improvement instructions at
>>>> https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality I got
>>>> what I think is a clean image. I am attaching the tessinput.tif file I
>>>> received as output.
>>>>
>>>> When I ran tesseract 4.0.0 on the image, I found that the eng model
>>>> actually gives a better result than the Spanish one.
>>>>
>>>> What can I do? I actually have seen recurrent errors with the same
>>>> chart.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/b1efae89-d9d5-4970-9b3e-5e29f9dd6620%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>

