Hello,

I'm building an OCR tool to read selected fields from invoices. I used Tesseract, and these are the problems I'm facing:

1) I'm not able to get table structures as they appear in the image. I was expecting at least a pipe symbol between columns, which would help in parsing the text.
2) A few characters were not extracted correctly. How can I improve the quality? Does training Tesseract 4 help?
3) Why would you train Tesseract 4 additionally?
4) Is there any option I can use to keep the white space between words and the text alignment as they are in the image after conversion?
I have spent almost a month on this and have only been able to build an OCR tool with 40% accuracy.

