You should also consider using the PAGE format. You can use this tool for conversion: http://www.primaresearch.org/tools/TesseractOCRToPAGE
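If you stay with hOCR as mentioned in the question, here is a rough sketch of how word text and bounding boxes might be pulled out of it with only the Python standard library. It assumes the usual hOCR layout, where each word is a `span` with class `ocrx_word` and a `title` attribute carrying `bbox x0 y0 x1 y1`. The class name `HocrWordParser`, the helper `parse_hocr_words`, and the sample markup are made up for illustration, and mapping the boxes to actual form fields is left out.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collects (text, bbox) pairs for every ocrx_word span (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.words = []      # list of (text, (x0, y0, x1, y1))
        self._bbox = None    # bbox of the word span currently open, if any
        self._text = []      # text fragments seen inside that span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes word spans do not contain nested spans.
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def parse_hocr_words(hocr):
    """Return a list of (text, bbox) tuples from an hOCR string."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


# Made-up sample in the usual hOCR shape, not real Tesseract output.
SAMPLE = """<span class='ocr_line' title='bbox 36 92 580 122'>
<span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 90'>Total</span>
<span class='ocrx_word' id='word_1_2' title='bbox 109 92 230 116; x_wconf 88'>1234.50</span>
</span>"""

if __name__ == "__main__":
    for text, bbox in parse_hocr_words(SAMPLE):
        print(text, bbox)
```

From there, one possible approach for forms is to compare each word's box against known field regions on the page, but that depends on how consistent your scans are.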
On Monday, 13 July 2015 06:23:09 UTC+1, [email protected] wrote:
> I'm working on converting a large number of tax forms into structured
> data, is hOCR the best way to do this? maybe there are other ways? I would
> imagine this is a problem that is at least partially solved.
>
> Thanks in advance! Tesseract is awesome :)

