>Example:
><span class='ocr_line' id='line_1' title="bbox 0 0 45 20"><span 
>class='ocr_xword' id='xword_1' title="bbox 0 0 20 20"><span class='ocr_cinfo' 
>title="x_bboxes b1x0 b1y0 b1x1 b1y1 b2x0 ...">hello</span></span><span> 
></span><span class='ocr_xword' id='xword_2' title="bbox 25 0 45 20"><span 
>class='ocr_cinfo' title="x_bboxes b1x0 b1y0 b1x1 b1y1 b2x0 
>...">world</span></span></span>
>(note the whitespace which is not part of any ocr_xword as cuneiform will 
>produce an incorrect bbox for it)

That looks much better than the current output or the pre-0.9 output.
However, I'm not sure if/why we need ocr_cinfo at all here. AFAIU, 
"x_bboxes" is analogous to "cuts" and "nlp" properties, which could be 
applied to any element (e.g. directly to an ocr_xword).
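
To illustrate what I mean, here is a rough sketch of the same line with 
"x_bboxes" placed directly in the title of each ocr_xword, separated from 
"bbox" by a semicolon as the hOCR specification does for multiple 
properties. That "x_bboxes" is allowed on ocr_xword is only my reading of 
the spec, not something it confirms, and the elided coordinates are kept 
exactly as in the quoted example:

```html
<!-- Assumption: x_bboxes may sit next to bbox on ocr_xword itself, -->
<!-- so the extra ocr_cinfo wrapper is dropped. -->
<span class='ocr_line' id='line_1' title="bbox 0 0 45 20"><span
class='ocr_xword' id='xword_1' title="bbox 0 0 20 20; x_bboxes b1x0 b1y0 b1x1 b1y1 b2x0 ...">hello</span><span>
</span><span class='ocr_xword' id='xword_2' title="bbox 25 0 45 20; x_bboxes b1x0 b1y0 b1x1 b1y1 b2x0 ...">world</span></span>
```

The whitespace between the two words stays outside any ocr_xword, as in 
the quoted example.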

Anyway, if there are any doubts about the interpretation of the hOCR 
specification (which is admittedly vague), it's better to ask at 
[email protected] than to guess.

-- 
Jakub Wilk

-- 
Font size not correct in merged sandvich PDF
https://bugs.launchpad.net/bugs/623438
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
