Any update on this?

On Sunday, August 30, 2015 at 11:30:12 AM UTC+3, gonx wrote:
>
> Hi,
>
> Is there a way to output the HOCR tesseract generates into a good HTML5
> page complete with the text's positioning and font style?
>
> Or best to just read the bbox coordinates as is and output to an HTML5?
>
> <div class='ocr_carea' id='block_2_8' title="bbox 1165 1335 1644 1358">
>   <p class='ocr_par' dir='ltr' id='par_2_8' title="bbox 1165 1335 1644 1358">
>     <span class='ocr_line' id='line_2_21' title="bbox 1165 1335 1644 1358; baseline 0 -1"><span class='ocrx_word' id='word_2_122' title='bbox 1165 1335 1275 1358; x_wconf 98' lang='eng' dir='ltr'>TOTAL</span> <span class='ocrx_word' id='word_2_123' title='bbox 1302 1335 1412 1358; x_wconf 82' lang='eng' dir='ltr'>AMoUNT</span> <span class='ocrx_word' id='word_2_124' title='bbox 1439 1335 1644 1357; x_wconf 89' lang='eng' dir='ltr'>TAKEN</span>
>     </span>
>   </p>
> </div>
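
For what it's worth, the second option (reading the bbox coordinates and emitting HTML directly) can be sketched roughly as below. This is only a minimal illustration using the Python standard library, not a Tesseract feature: `WordCollector` and `to_positioned_html` are made-up names, the bbox values are assumed to be image pixels measured from the top-left corner, and the font size is merely guessed from the bbox height, since the snippet above carries no font information.

```python
import re
from html import escape
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" part of an hOCR title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class WordCollector(HTMLParser):
    """Collects (text, (x0, y0, x1, y1)) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def to_positioned_html(words, scale=1.0):
    """Emit one absolutely positioned <span> per word.

    Assumption: font size is approximated by the bbox height.
    """
    spans = []
    for text, (x0, y0, x1, y1) in words:
        style = (
            f"position:absolute;left:{x0 * scale:g}px;top:{y0 * scale:g}px;"
            f"width:{(x1 - x0) * scale:g}px;font-size:{(y1 - y0) * scale:g}px;"
            "line-height:1;white-space:nowrap"
        )
        spans.append(f'<span style="{style}">{escape(text)}</span>')
    return '<div style="position:relative">\n' + "\n".join(spans) + "\n</div>"
```

Feeding the hOCR text to a `WordCollector` instance with `feed()` and passing its `words` list to `to_positioned_html` gives a fragment that can be dropped into an HTML5 page; `scale` shrinks the pixel coordinates if the scan resolution is larger than the page should be on screen.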

