I'm trying to reproduce results achieved at the ICDAR page segmentation 
competitions [1,2] with Tesseract, but I'm struggling to get the tool to 
output the hOCR tags that I'm expecting for tables, figures, etc. [3]. At the 
moment I'm calling tesseract with pagesegmode 1. Should I be adding other 
options via a config file to get the full extent of Tesseract's 
segmentation and labelling ability? (I'm less interested in the character 
recognition element.)
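
One way to check which hOCR element classes actually appear in an output 
file is to tally the `class` attributes. The following is a minimal sketch 
using only the Python standard library. The names `HocrClassCounter`, 
`count_hocr_classes` and the `SAMPLE` markup are hypothetical and made up for 
illustration; `SAMPLE` is not real tesseract output. Classes from the hOCR 
spec that never show up in the tally (for example `ocr_table`) are the ones 
the tool did not emit.

```python
from collections import Counter
from html.parser import HTMLParser


class HocrClassCounter(HTMLParser):
    """Tally every class name found on the elements of an hOCR document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a class attribute may
        # hold several space-separated class names.
        for name, value in attrs:
            if name == "class" and value:
                self.counts.update(value.split())


def count_hocr_classes(hocr_text):
    """Return a Counter mapping hOCR class name -> number of elements."""
    parser = HocrClassCounter()
    parser.feed(hocr_text)
    parser.close()
    return parser.counts


# Hypothetical hand-written sample, only to exercise the function.
SAMPLE = """
<div class='ocr_page' id='page_1' title='bbox 0 0 100 100'>
 <div class='ocr_carea' id='block_1_1'>
  <p class='ocr_par' id='par_1_1'>
   <span class='ocr_line' id='line_1_1'>
    <span class='ocrx_word' id='word_1_1'>Hello</span>
    <span class='ocrx_word' id='word_1_2'>world</span>
   </span>
  </p>
 </div>
</div>
"""

if __name__ == "__main__":
    for cls, n in sorted(count_hocr_classes(SAMPLE).items()):
        print(cls, n)
```

To run this on real output, read the hOCR file produced by tesseract into a 
string and pass it to `count_hocr_classes` in place of `SAMPLE`.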

   1. Antonacopoulos (2013, ICDAR) ICDAR2013 Competition on Historical Book 
   Recognition – HBR2013
   2. Antonacopoulos (2013, ICDAR) ICDAR2013 Competition on Historical 
   Newspaper Layout Analysis – HNLA2013
   3. Breuel (2010) The hOCR Embedded OCR Workflow and Output Format


I've cross-posted this 
from https://github.com/tesseract-ocr/tesseract/issues/42 and will update 
both with responses. Which is the preferred place for Q&A?
