>Btw, is there any way to tell tesseract that values are in a table, so that it will not make a mistake identifying lines with charts?
I don't think Tesseract has that ability. You will need to preprocess the image to remove the lines. Leptonica has functions to do that, as well as a table detector. See https://github.com/DanBloomberg/leptonica/commits/master

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Aug 29, 2017 at 6:47 AM, ShreeDevi Kumar <[email protected]> wrote:

> I had not checked the list.
>
> It should actually be Latin.traineddata for all languages written in Latin
> script. Not Spanish, as I had written.
>
> On 29-Aug-2017 3:54 AM, <[email protected]> wrote:
>
>> So... I have installed the default tessdata used by the installer, which
>> seems to be this one:
>> https://github.com/tesseract-ocr/tessdata/blob/master/spa.traineddata
>>
>> Following your comment, I have installed this package:
>> https://github.com/tesseract-ocr/tessdata/blob/master/best/spa.traineddata
>>
>> But I have not found best/Spanish. Is it missing from the upload?
>>
>> The best/spa is REALLY better, and its quality is comparable to English;
>> they have more or less the same level of errors.
>>
>> Where is best/Spanish? Given the improvement, I am really interested in
>> testing it.
>>
>> Btw, is there any way to tell tesseract that values are in a table, so
>> that it will not make a mistake identifying lines with charts?
>>
>> On Monday, 28 August 2017, 8:15:41 (UTC+2), shree wrote:
>>>
>>> Have you tried with the 'best' traineddata files?
>>>
>>> What about results using best/Spanish vs best/spa?
>>>
>>> I have opened this as an issue at
>>> https://github.com/tesseract-ocr/tessdata/issues/77
>>>
>>> You can provide additional feedback there.
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Mon, Aug 28, 2017 at 6:04 AM, <[email protected]> wrote:
>>>
>>>> So... after following the quality improvement instructions at
>>>> https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality I got
>>>> what I think is a clean image. I attach the tessinput.tif file I
>>>> received as output.
>>>>
>>>> When I ran tesseract 4.0.0 on the image, I found that the eng version
>>>> actually gives a better result than the Spanish version.
>>>>
>>>> What can I do? I actually have seen recurrent errors with the same
>>>> chart.
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduWPkn6kWe7pnQ7W3%2Bi542juyKECM08M_7mBp0R7ZPXzbA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
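
For anyone wondering what "preprocess the image to remove lines" can look like, here is a rough sketch of the idea in plain Python. It is not Leptonica's API and not what Tesseract does internally. It assumes the page has already been binarized into a rectangular grid of 0/1 values (1 = ink), and it simply erases any horizontal or vertical run of ink that is at least `min_len` pixels long. The names `remove_table_lines`, `_long_run_mask` and `min_len` are made up for this illustration.

```python
from typing import List

Image = List[List[int]]  # assumed representation: 1 = ink, 0 = background


def _long_run_mask(rows: Image, min_len: int) -> List[List[bool]]:
    """Mark pixels that belong to a horizontal ink run of at least min_len."""
    mask = [[False] * len(row) for row in rows]
    for y, row in enumerate(rows):
        start = None
        # A sentinel 0 closes a run that touches the right edge.
        for x, value in enumerate(row + [0]):
            if value and start is None:
                start = x
            elif not value and start is not None:
                if x - start >= min_len:
                    for i in range(start, x):
                        mask[y][i] = True
                start = None
    return mask


def _transpose(rows):
    """Swap rows and columns of a rectangular grid."""
    return [list(col) for col in zip(*rows)]


def remove_table_lines(image: Image, min_len: int) -> Image:
    """Return a copy of image with long horizontal and vertical runs erased."""
    horizontal = _long_run_mask(image, min_len)
    # Vertical runs are horizontal runs of the transposed image.
    vertical = _transpose(_long_run_mask(_transpose(image), min_len))
    return [
        [0 if horizontal[y][x] or vertical[y][x] else pixel
         for x, pixel in enumerate(row)]
        for y, row in enumerate(image)
    ]


# A tiny table cell: a border of ink around a small glyph.
cell = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]

for row in remove_table_lines(cell, min_len=5):
    print("".join("#" if pixel else "." for pixel in row))
```

With `min_len=5` the border is erased and the three glyph pixels are kept. On real scans the threshold has to be longer than the longest stroke of any character, and lines that are skewed or broken will not be caught by a strict run test like this one, which is why a library with morphological operations (such as Leptonica) is the better tool in practice.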

