If the invoices have a fixed layout, you can try uzn zone files, which tell Tesseract which regions of the page to read. See https://github.com/jsoma/tesseract-uzn and https://jsoma.github.io/kull/#/
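
A minimal sketch of how a zone file for a fixed invoice layout could be generated. It assumes the plain-text uzn layout of one zone per line, written as "left top width height label" in pixels, in a file that sits next to the image with the same base name. The helper name write_uzn, the field names, and the coordinates are made up for illustration; measure the real ones on your own scans.

```python
from pathlib import Path


def write_uzn(image_path, zones):
    """Write a .uzn zone file next to image_path and return its path.

    zones maps a field label to (left, top, width, height) in pixels.
    Hypothetical helper for illustration, not part of Tesseract.
    """
    lines = []
    for label, (left, top, width, height) in zones.items():
        if width <= 0 or height <= 0:
            raise ValueError(f"zone {label!r} must have a positive size")
        # The label is a single whitespace-delimited token, so replace spaces.
        safe_label = label.replace(" ", "_")
        lines.append(f"{left} {top} {width} {height} {safe_label}")
    # invoice.png -> invoice.uzn, in the same directory as the image.
    uzn_path = Path(image_path).with_suffix(".uzn")
    uzn_path.write_text("\n".join(lines) + "\n")
    return uzn_path


if __name__ == "__main__":
    # Made-up zones for an invoice scan; adjust to your layout.
    path = write_uzn(
        "invoice.png",
        {"invoice number": (50, 40, 300, 60), "total": (900, 1500, 250, 70)},
    )
    print(path.read_text())
```

As far as I know, Tesseract only picks up the zone file in page segmentation modes without automatic column finding, for example "tesseract invoice.png out --psm 4", so check that against the tesseract-uzn README before relying on it.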
Alternatively, check out OpenCV. See https://www.learnopencv.com/deep-learning-based-text-recognition-ocr-using-tesseract-and-opencv/

On Fri, Mar 22, 2019 at 9:35 PM yoganand <[email protected]> wrote:

> Hello,
>
> I'm building an OCR tool to read selected fields from invoices. I used
> Tesseract, and the problems I'm facing are:
> 1) I'm not able to get table structures as they are. I was expecting at
> least a pipe symbol, which would help in parsing the text.
> 2) A few of the characters were not extracted correctly. How can I improve
> quality? Does training Tesseract 4 help?
> 3) Why do you train Tesseract 4 additionally?
> 4) Is there any option I can use to keep the white space between words and
> the text alignment as they are in the image after converting?
>
> I have spent almost 1 month on this and could only build an OCR tool with
> 40% accuracy.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/8ea1b021-5e96-43f4-a862-07da94eae9e6%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

