You will see how the hOCR file is built from lines like this one:
api/baseapi.cpp:        hocr_str.add_str_int("\n    <p class='ocr_par' dir='ltr' id='par_",

I grepped the source tree for ocr_float and got no hits, so as far as I 
can tell that class is never emitted. A closer look at the code might 
still turn up something, so it is worth checking yourself.

The only hOCR classes I see in api/baseapi.cpp are:
'ocr_page'
'ocr_carea'
'ocr_par'
'ocr_line'
'ocrx_word'
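
To confirm the same thing from the output side, you can scan an hOCR file 
you have already generated and list the classes it actually contains. This 
is only a minimal sketch: CollectHocrClasses is a hypothetical helper (not 
part of Tesseract), it assumes the hOCR text is already in a string, and 
the regex only handles class attributes that hold a single ocr_* or ocrx_* 
name.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>

// Hypothetical helper, not a Tesseract API.
// Returns the distinct values of class='...' / class="..." attributes
// whose value is a single name starting with "ocr_" or "ocrx_".
std::set<std::string> CollectHocrClasses(const std::string& hocr) {
  static const std::regex kClassAttr(
      R"re(class=['"](ocrx?_[A-Za-z_]+)['"])re");
  std::set<std::string> classes;
  for (std::sregex_iterator it(hocr.begin(), hocr.end(), kClassAttr), end;
       it != end; ++it) {
    classes.insert((*it)[1].str());  // capture group 1 is the class name
  }
  return classes;
}
```

If ocr_float (or any other layout class) is missing from the returned set 
for a page that clearly contains a figure or table, that is consistent with 
the grep result above.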

You can also look in api/renderer.cpp:

bool TessHOcrRenderer::BeginDocumentHandler() {
..
      "  <meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par"
      " ocr_line ocrx_word");
  

On Monday, July 6, 2015 at 6:52:14 AM UTC-4, James Owers wrote:
>
> I'm trying to reproduce results achieved at the ICDAR page segmentation 
> competitions [1,2] with tesseract. I'm struggling to get the tool to output 
> the hOCR tags that I'm expecting for tables and figures etc [3]. At the 
> moment I'm calling tesseract with pagesegmode 1. Should I be adding other 
> options via a config file to achieve the full extent of tesseract's 
> segmentation and labelling ability? (I'm not interested in the character 
> recognition element as much.)
>
>    1. Antonacopoulos (2013, ICDAR) ICDAR2013 Competition on Historical 
>    Book Recognition – HBR2013
>    2. Antonacopoulos (2013, ICDAR) ICDAR2013 Competition on Historical 
>    Newspaper Layout Analysis – HNLA2013
>    3. Breuel (2010) The hOCR Embedded OCR Workflow and Output Format
>
>
> I've cross-posted this from 
> https://github.com/tesseract-ocr/tesseract/issues/42 and will update both 
> with responses. Which is the default Q&A place?
>
