You can see how the hOCR file is built from lines like this one:
api/baseapi.cpp: hocr_str.add_str_int("\n <p class='ocr_par' dir='ltr' id='par_",
On a hunch, I grepped the source tree for ocr_float and got no hits, so
Tesseract does not appear to emit that class. A closer look at the code
might still turn up something, so it is worth checking for yourself.
What I see in api/baseapi.cpp is:
'ocr_page'
'ocr_carea'
'ocr_par'
'ocr_line'
'ocrx_word'
You can also look in api/renderer.cpp:
bool TessHOcrRenderer::BeginDocumentHandler() {
..
" <meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par"
" ocr_line ocrx_word");
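
If you would rather check what a particular run actually emitted than read
the source, something along these lines should do it. This is only a sketch:
CollectHocrClasses and HasHocrClass are names I made up (they are not part of
Tesseract), and it assumes the hOCR output is already loaded into a
std::string and that each class attribute holds a single class name.

```cpp
#include <regex>
#include <set>
#include <string>

// Hypothetical helper, not a Tesseract API: collect every distinct value of
// a class='...' or class="..." attribute in an hOCR document held in memory.
std::set<std::string> CollectHocrClasses(const std::string& hocr) {
  static const std::regex kClassAttr(R"re(class\s*=\s*['"]([^'"]+)['"])re");
  std::set<std::string> classes;
  for (std::sregex_iterator it(hocr.begin(), hocr.end(), kClassAttr), end;
       it != end; ++it) {
    classes.insert((*it)[1].str());  // capture group 1 is the attribute value
  }
  return classes;
}

// Hypothetical helper: true if the document uses the given class
// (for example "ocr_float") at least once.
bool HasHocrClass(const std::string& hocr, const std::string& name) {
  return CollectHocrClasses(hocr).count(name) > 0;
}
```

Comparing the returned set against the list in the ocr-capabilities meta tag
shows at a glance whether anything beyond those five classes appears in the
output.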
On Monday, July 6, 2015 at 6:52:14 AM UTC-4, James Owers wrote:
>
> I'm trying to reproduce results achieved at the ICDAR page segmentation
> competitions [1,2] with tesseract. I'm struggling to get the tool to output
> the hOCR tags that I'm expecting for tables and figures etc [3]. At the
> moment I'm calling tesseract with pagesegmode 1. Should I be adding other
> options via a config file to achieve the full extent of tesseract's
> segmentation and labelling ability? (I'm not as interested in the
> character recognition element.)
>
> 1. Antonacopoulos (2013, ICDAR) ICDAR2013 Competition on Historical
> Book Recognition – HBR2013
> 2. Antonacopoulos (2013, ICDAR) ICDAR2013 Competition on Historical
> Newspaper Layout Analysis – HNLA2013
> 3. Breuel (2010) The hOCR Embedded OCR Workflow and Output Format
>
>
> I've cross-posted this from
> https://github.com/tesseract-ocr/tesseract/issues/42 and will update both
> with responses. Which is the default Q&A place?
>