[tesseract-ocr] pytesseract - how to improve quality of text

yoganand Fri, 22 Mar 2019 09:06:06 -0700

Hello,

I'm building an OCR tool to read selected fields from invoices. I used Tesseract, 
and the problems I'm facing are:
1) I'm not able to get table structures as they are. I was expecting at least a 
pipe symbol between columns, which would help in parsing the text.
2) A few characters were not extracted correctly. How can I improve the quality? 
Does training Tesseract 4 help?
3) Why would you train Tesseract 4 additionally?
4) Is there any option I can use to keep the white space between words and 
the text alignment as they are in the image after conversion?


I have spent almost a month on this and have only been able to build an OCR 
tool with 40% accuracy.
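
For points 1 and 4, here is a minimal sketch of one possible approach, not a 
confirmed fix. It assumes pytesseract and Pillow are installed and uses 
pytesseract.image_to_data to get one bounding box per word, then rebuilds each 
row and puts a pipe wherever the horizontal gap between two words is large. The 
names rows_with_pipes, ocr_rows and gap_factor are hypothetical, and the 
"--psm 6" page segmentation mode is only a guess at what suits an invoice. 
Tesseract also has a preserve_interword_spaces=1 config variable that may help 
when plain image_to_string output is enough.

```python
def rows_with_pipes(data, gap_factor=2.0):
    """Rebuild text rows from word boxes, marking wide gaps with ' | '.

    `data` is a dict shaped like pytesseract.image_to_data(...,
    output_type=Output.DICT): parallel lists under the keys
    'text', 'left', 'top', 'width' and 'height'.
    """
    words = []
    for i, text in enumerate(data["text"]):
        if not text.strip():
            continue  # skip the empty entries emitted for blocks and lines
        words.append({
            "text": text.strip(),
            "left": data["left"][i],
            "right": data["left"][i] + data["width"][i],
            "mid_y": data["top"][i] + data["height"][i] / 2,
            "height": data["height"][i],
        })

    # Group words into rows by vertical position, top to bottom.
    words.sort(key=lambda w: w["mid_y"])
    rows = []
    for w in words:
        if rows and abs(w["mid_y"] - rows[-1][-1]["mid_y"]) <= w["height"] / 2:
            rows[-1].append(w)
        else:
            rows.append([w])

    # Within each row, read left to right and mark column-sized gaps.
    lines = []
    for row in rows:
        row.sort(key=lambda w: w["left"])
        parts = [row[0]["text"]]
        for prev, cur in zip(row, row[1:]):
            gap = cur["left"] - prev["right"]
            sep = " | " if gap > gap_factor * cur["height"] else " "
            parts.append(sep + cur["text"])
        lines.append("".join(parts))
    return lines


def ocr_rows(image_path):
    """Run Tesseract on an image file and return pipe-separated rows."""
    # Imported here so rows_with_pipes can be used without Tesseract installed.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path),
        config="--psm 6",  # assumption: treat the page as one uniform text block
        output_type=pytesseract.Output.DICT,
    )
    return rows_with_pipes(data)
```

The gap_factor threshold is relative to the word height, so it would need 
tuning per invoice layout. Working from word boxes rather than the plain text 
output also keeps the positions available for picking out specific fields.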

