Thank you, Rick. A concise answer was posted on GitHub recently:

jimregan <https://github.com/jimregan> commented 2 days ago 
<https://github.com/tesseract-ocr/tesseract/issues/42#issuecomment-122577036>:

This issue is currently the top search result for 'ocr_float'; it lacks a 
simple summary: Tesseract (currently) does not support ocr_float.
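
For anyone who wants to check their own output rather than grep the source, 
here is a rough sketch that counts the hOCR class names actually present in 
a file. It uses only the Python standard library. The helper names 
(HocrClassCounter, hocr_classes) and the SAMPLE fragment are made up for 
illustration; the fragment only imitates the element classes Rick lists 
below and is not real Tesseract output.

```python
from collections import Counter
from html.parser import HTMLParser


class HocrClassCounter(HTMLParser):
    """Count every class name starting with 'ocr' on any start tag."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                for cls in value.split():
                    if cls.startswith("ocr"):
                        self.counts[cls] += 1


def hocr_classes(hocr_text):
    """Return a Counter mapping hOCR class name -> number of elements."""
    parser = HocrClassCounter()
    parser.feed(hocr_text)
    parser.close()
    return parser.counts


# Hypothetical fragment for illustration only (not real Tesseract output).
SAMPLE = """
<div class='ocr_page' id='page_1'>
 <div class='ocr_carea' id='block_1_1'>
  <p class='ocr_par' dir='ltr' id='par_1_1'>
   <span class='ocr_line' id='line_1_1'>
    <span class='ocrx_word' id='word_1_1'>Hello</span>
   </span>
  </p>
 </div>
</div>
"""

counts = hocr_classes(SAMPLE)
for cls in sorted(counts):
    print(cls, counts[cls])
print("ocr_float present:", "ocr_float" in counts)
```

To run it on a real file, read the .hocr (or .html) output into a string and 
pass it to hocr_classes() in place of SAMPLE. If ocr_float never shows up in 
the counts, that matches the summary above.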

On Monday, 6 July 2015 19:59:15 UTC+1, Rick Leir wrote:
>
> You will see how the hocr file is built with lines like this: 
> api/baseapi.cpp:        hocr_str.add_str_int("\n    <p class='ocr_par' dir='ltr' id='par_",
>
> Going out on a limb, I grepped the tree for ocr_float, and got no hits. A 
> closer look at the code might turn up something, so have a look.
>
> What I see in api/baseapi.cpp is:
> 'ocr_page'
> 'ocr_carea'
> 'ocr_par'
> 'ocr_line'
> 'ocrx_word'
>
> You can also look in api/renderer.cpp :
>
> bool TessHOcrRenderer::BeginDocumentHandler() {
> ..
>       "  <meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par"
>       " ocr_line ocrx_word");
>   
>
> On Monday, July 6, 2015 at 6:52:14 AM UTC-4, James Owers wrote:
>>
>> I'm trying to reproduce results achieved at the ICDAR page segmentation 
>> competitions [1,2] with Tesseract. I'm struggling to get the tool to output 
>> the hOCR tags that I'm expecting for tables, figures, etc. [3]. At the 
>> moment I'm calling tesseract with pagesegmode 1. Should I be adding other 
>> options via a config file to get the full extent of Tesseract's 
>> segmentation and labelling ability? (I'm not interested in the character 
>> recognition element as much.)
>>
>>    1. Antonacopoulos (2013, ICDAR) ICDAR2013 Competition on Historical 
>>    Book Recognition – HBR2013
>>    2. Antonacopoulos (2013, ICDAR) ICDAR2013 Competition on Historical 
>>    Newspaper Layout Analysis – HNLA2013
>>    3. Breuel (2010) The hOCR Embedded OCR Workflow and Output Format
>>
>>
>> I've cross-posted this from 
>> https://github.com/tesseract-ocr/tesseract/issues/42 and will update 
>> both with responses. Which is the default Q&A place?
>>
>
