Hi,
Is there a way to convert the hOCR output that Tesseract generates into a
proper HTML5 page that preserves the text's positioning and font style?
Or is it best to just read the bbox coordinates as they are and write them
out to an HTML5 page myself? Here is a sample of the hOCR output:
<div class='ocr_carea' id='block_2_8' title="bbox 1165 1335 1644 1358">
  <p class='ocr_par' dir='ltr' id='par_2_8' title="bbox 1165 1335 1644 1358">
    <span class='ocr_line' id='line_2_21' title="bbox 1165 1335 1644 1358; baseline 0 -1">
      <span class='ocrx_word' id='word_2_122' title='bbox 1165 1335 1275 1358; x_wconf 98' lang='eng' dir='ltr'>TOTAL</span>
      <span class='ocrx_word' id='word_2_123' title='bbox 1302 1335 1412 1358; x_wconf 82' lang='eng' dir='ltr'>AMoUNT</span>
      <span class='ocrx_word' id='word_2_124' title='bbox 1439 1335 1644 1357; x_wconf 89' lang='eng' dir='ltr'>TAKEN</span>
    </span>
  </p>
</div>
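
To make the second option concrete, here is a rough sketch of reading the bbox
coordinates directly and writing absolutely positioned HTML5. It is only an
illustration, not an existing Tesseract feature: the names HocrWords and
hocr_to_html5 and the scale parameter are made up for this example, and since
the snippet above carries no font information, the font size is simply guessed
from the height of each word's box.

```python
import re
from html import escape
from html.parser import HTMLParser

# Matches the four integers of a "bbox x0 y0 x1 y1" entry in a title attribute.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class HocrWords(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word span (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "span" and "ocrx_word" in (a.get("class") or "").split():
            m = BBOX_RE.search(a.get("title") or "")
            if m:
                self._bbox = tuple(int(v) for v in m.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def hocr_to_html5(hocr, scale=1.0):
    """Return an HTML5 page with one absolutely positioned span per OCR word.

    scale converts image pixels to CSS pixels (assumption: 1.0 means 1:1).
    Font size is approximated by the bbox height, which is only a guess.
    """
    parser = HocrWords()
    parser.feed(hocr)
    parser.close()
    spans = []
    for text, (x0, y0, x1, y1) in parser.words:
        style = (
            f"left:{x0 * scale:.0f}px;top:{y0 * scale:.0f}px;"
            f"width:{(x1 - x0) * scale:.0f}px;"
            f"font-size:{(y1 - y0) * scale:.0f}px"
        )
        spans.append(f'<span style="{style}">{escape(text)}</span>')
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        "<style>span{position:absolute;white-space:nowrap;line-height:1}</style>"
        "</head><body>\n" + "\n".join(spans) + "\n</body></html>"
    )
```

Calling hocr_to_html5(open("page.hocr", encoding="utf-8").read()) on a file like
the one above (page.hocr is a placeholder name) should give one span per word,
placed at its bbox origin. Working per word keeps the layout faithful to the
scan; working per ocr_line instead would give more natural text flow at the
cost of less exact positioning.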