You should look at Tesseract's different page segmentation modes (PSM). The data you have is in a table, so you'll need to process it differently from ordinary running text. hOCR output is HTML, so it can't be used directly as CSV. It does include per-word confidence values (x_wconf), though, so if you want to evaluate those and produce CSV from the hOCR yourself, you can. --Sven
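To make the hOCR-to-CSV idea concrete, here is a rough sketch, not a tested recipe for your data. It assumes you already have hOCR text, e.g. from something like `tesseract table.png out --psm 6 hocr` (file names are placeholders, older 3.x releases spell the option `-psm`, and which mode works best for your table is something to try out). The `SAMPLE` string, the `HocrWords` class, the `hocr_to_csv` helper and the `min_conf` threshold are all made up for illustration. It uses only the Python standard library and writes one CSV row per recognized word, so turning words into table columns is still up to you.

```python
import csv
import io
import re
from html.parser import HTMLParser

# Hypothetical hOCR fragment standing in for real Tesseract output.
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 400 100'>
 <span class='ocr_line' title='bbox 10 10 300 30'>
  <span class='ocrx_word' title='bbox 10 10 60 30; x_wconf 96'>Item</span>
  <span class='ocrx_word' title='bbox 200 10 260 30; x_wconf 91'>Price</span>
 </span>
 <span class='ocr_line' title='bbox 10 40 300 60'>
  <span class='ocrx_word' title='bbox 10 40 70 60; x_wconf 88'>Apple</span>
  <span class='ocrx_word' title='bbox 200 40 250 60; x_wconf 42'>l.2O</span>
 </span>
</div>
"""


class HocrWords(HTMLParser):
    """Collect text, confidence and bounding box of every ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.line = -1          # index of the ocr_line currently being read
        self._current = None    # word being assembled, or None outside a word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        title = attrs.get("title") or ""
        if "ocr_line" in classes:
            self.line += 1
        elif "ocrx_word" in classes:
            bbox = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", title)
            conf = re.search(r"x_wconf (\d+(?:\.\d+)?)", title)
            self._current = {
                "line": self.line,
                "text": "",
                "conf": float(conf.group(1)) if conf else None,
                "bbox": tuple(map(int, bbox.groups())) if bbox else ("",) * 4,
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        # Words are <span> elements; nested <strong>/<em> end tags are ignored.
        if tag == "span" and self._current is not None:
            self.words.append(self._current)
            self._current = None


def hocr_to_csv(hocr, min_conf=0.0):
    """Return CSV text with one row per word whose confidence is >= min_conf."""
    parser = HocrWords()
    parser.feed(hocr)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["line", "text", "conf", "x0", "y0", "x1", "y1"])
    for w in parser.words:
        if w["conf"] is not None and w["conf"] < min_conf:
            continue  # drop words the engine itself was unsure about
        writer.writerow([w["line"], w["text"].strip(), w["conf"], *w["bbox"]])
    return out.getvalue()


if __name__ == "__main__":
    # min_conf=60 is an arbitrary cut-off chosen for this example.
    print(hocr_to_csv(SAMPLE, min_conf=60))
```

The bounding boxes are kept in the output on purpose: for a table you would probably group words into columns by their x coordinates, and the confidence column lets you flag or drop doubtful cells before trusting the CSV.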

