I have opened this as an issue at
https://github.com/tesseract-ocr/tessdata/issues/77

You can provide additional feedback there.

@theraysmith is doing the training at Google.  The examples you provide
will be helpful to him and improve future training.
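
For anyone collecting examples for that issue, here is a minimal sketch of one way to run the same image through several traineddata files and keep the outputs side by side. It assumes the tesseract executable is installed and on PATH; the function names and the directory layout in the usage note are hypothetical and only illustrate the idea.

```python
import subprocess


def build_command(image, lang, tessdata_dir):
    """Build a tesseract command line that prints the recognized text."""
    # Using "stdout" as the output base makes tesseract write the text to
    # standard output instead of creating a .txt file.
    return ["tesseract", str(image), "stdout",
            "--tessdata-dir", str(tessdata_dir), "-l", lang]


def run_ocr(image, lang, tessdata_dir):
    """Run tesseract (assumed to be on PATH) and return the recognized text."""
    result = subprocess.run(build_command(image, lang, tessdata_dir),
                            capture_output=True, text=True, check=True)
    return result.stdout


def compare_models(image, models):
    """models maps a label to a (lang, tessdata_dir) pair; returns label -> text."""
    return {label: run_ocr(image, lang, path)
            for label, (lang, path) in models.items()}
```

As a usage example, compare_models("tessinput.tif", {"default": ("spa", "tessdata"), "best": ("spa", "tessdata/best")}) would return one text per label, assuming those two directories hold the respective spa.traineddata files. Attaching both outputs together with the image makes a report easier to reproduce.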

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Aug 29, 2017 at 7:38 PM, <[email protected]> wrote:

> The spa and Latin models in the best folder are more or less equivalent;
> there is no significant difference. Although there are several failures,
> they are quite reasonable. The ones that produce really bad output are the
> official ones that are installed automatically.
>
> Do you need help training the data? (Is it a neural network?) I can provide
> examples.
>
> On Tuesday, 29 August 2017, 3:17:40 (UTC+2), shree wrote:
>>
>> I had not checked the list.
>>
>> It should actually be Latin.traineddata for all languages written in
>> Latin script. Not Spanish, as I had written.
>>
>> On 29-Aug-2017 3:54 AM, <[email protected]> wrote:
>>
>>> So... I have installed the default tessdata used by the installer, which
>>> seems to be this one:
>>> https://github.com/tesseract-ocr/tessdata/blob/master/spa.traineddata
>>>
>>> Following your comment, I installed this package:
>>> https://github.com/tesseract-ocr/tessdata/blob/master/best/spa.traineddata
>>>
>>> But I have not found best/Spanish. Is it missing from the upload?
>>>
>>> best/spa is REALLY better, and its quality is comparable to English:
>>> they have more or less the same level of errors.
>>>
>>> Where is best/Spanish? Given this improvement, I am really interested in
>>> testing it.
>>>
>>> Btw, is there any way to tell tesseract that values are in a table, so
>>> that it will not make a mistake identifying lines with charts?
>>>
>>> On Monday, 28 August 2017, 8:15:41 (UTC+2), shree wrote:
>>>>
>>>> Have you tried with the 'best' traineddatas?
>>>>
>>>> What about results using best/Spanish vs best/spa?
>>>>
>>>> I have opened this as an issue at
>>>> https://github.com/tesseract-ocr/tessdata/issues/77
>>>>
>>>> You can provide additional feedback there.
>>>>
>>>> ShreeDevi
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>
>>>> On Mon, Aug 28, 2017 at 6:04 AM, <[email protected]> wrote:
>>>>
>>>>> So... after following the quality improvement instructions at
>>>>> https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality I have
>>>>> what I think is a nice picture. I attach the tessinput.tif file I
>>>>> received as output.
>>>>>
>>>>> When I ran tesseract 4.0.0 on the image, I found that the eng version
>>>>> actually gives a better analysis than the Spanish version.
>>>>>
>>>>> What can I do? I actually have seen recurrent errors with the same
>>>>> chart.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/b1efae89-d9d5-4970-9b3e-5e29f9dd6620%40googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
