I have opened this as an issue at https://github.com/tesseract-ocr/tessdata/issues/77
You can provide additional feedback there. @theraysmith is doing the training at Google. The examples you provide will be helpful to him and will improve future training.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Aug 29, 2017 at 7:38 PM, <[email protected]> wrote:

> spa and Latin within the best folder are more or less equivalent; there is
> no significant difference, and although there are several failures, they
> are quite reasonable. The ones that give really bad output are the official
> ones that are automatically installed.
>
> Do you need help training the data? (Is it a neural network?) I can provide
> examples.
>
> On Tuesday, August 29, 2017, 3:17:40 (UTC+2), shree wrote:
>>
>> I had not checked the list.
>>
>> It should actually be Latin.traineddata for all languages written in
>> Latin script. Not Spanish, as I had written.
>>
>> On 29-Aug-2017 3:54 AM, <[email protected]> wrote:
>>
>>> So... I have installed the default tessdata used by the installer, which
>>> seems to be this one:
>>> https://github.com/tesseract-ocr/tessdata/blob/master/spa.traineddata
>>>
>>> Following your comment, I have installed this file:
>>> https://github.com/tesseract-ocr/tessdata/blob/master/best/spa.traineddata
>>>
>>> But I have not found best/Spanish. Is it missing from the upload?
>>>
>>> The best/spa model is REALLY better, and its quality is comparable to
>>> English; they have more or less the same level of errors.
>>>
>>> Where is best/Spanish? Given the improvement, I am really interested in
>>> testing it.
>>>
>>> Btw, is there any way to tell tesseract that values are in a table, so
>>> that it will not make a mistake identifying lines with charts?
>>>
>>> On Monday, August 28, 2017, 8:15:41 (UTC+2), shree wrote:
>>>>
>>>> Have you tried with the 'best' traineddatas?
>>>>
>>>> What about results using best/Spanish vs best/spa?
>>>>
>>>> I have opened this as an issue at
>>>> https://github.com/tesseract-ocr/tessdata/issues/77
>>>>
>>>> You can provide additional feedback there.
>>>>
>>>> ShreeDevi
>>>>
>>>> On Mon, Aug 28, 2017 at 6:04 AM, <[email protected]> wrote:
>>>>
>>>>> So... after following the quality improvement instructions at
>>>>> https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality I
>>>>> found what I think is a good picture. I attach the tessinput.tif file
>>>>> I received as output.
>>>>>
>>>>> When I ran tesseract 4.0.0 on the image, I found that the eng model
>>>>> actually gives a better result than the Spanish one.
>>>>>
>>>>> What can I do? I have seen recurrent errors with the same chart.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/b1efae89-d9d5-4970-9b3e-5e29f9dd6620%40googlegroups.com
>>>>> For more options, visit https://groups.google.com/d/optout.
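For anyone who wants to repeat the comparison discussed in this thread (default spa vs. best/spa vs. best/Latin on the same image), here is a minimal Python sketch. It only uses the tesseract command-line options `-l`, `--tessdata-dir` and `--psm`. The directory names `tessdata` and `tessdata/best` are assumptions about where the traineddata files were saved, and the helper names `build_tesseract_cmd` and `run_ocr` are hypothetical, not part of tesseract. Trying `--psm 6` (treat the page as a single uniform block of text) for the table question is only a suggestion to test, not a confirmed fix.

```python
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_cmd(image: str, lang: str,
                        tessdata_dir: Optional[str] = None,
                        psm: Optional[int] = None) -> List[str]:
    """Build a tesseract command line that prints recognized text to stdout."""
    # "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file.
    cmd = ["tesseract", image, "stdout", "-l", lang]
    if tessdata_dir is not None:
        # Directory that holds <lang>.traineddata (assumed local path).
        cmd += ["--tessdata-dir", tessdata_dir]
    if psm is not None:
        # Page segmentation mode, e.g. 6 = single uniform block of text.
        cmd += ["--psm", str(psm)]
    return cmd


def run_ocr(image: str, lang: str,
            tessdata_dir: Optional[str] = None,
            psm: Optional[int] = None) -> Optional[str]:
    """Run tesseract and return its text output, or None if it is not installed."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_cmd(image, lang, tessdata_dir, psm),
        capture_output=True, text=True, check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical layout: default models in ./tessdata, 'best' models in ./tessdata/best.
    candidates = [
        ("spa", "tessdata"),
        ("spa", "tessdata/best"),
        ("Latin", "tessdata/best"),
    ]
    for lang, data_dir in candidates:
        text = run_ocr("tessinput.tif", lang, data_dir)
        print(f"--- {lang} ({data_dir}) ---")
        print(text if text is not None else "tesseract not found on PATH")
```

Comparing the three outputs side by side, and attaching the differences to issue 77, is one way to provide the examples requested above.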

