Hello, I already asked this question on 24 April, but it does not seem to have reached the group, so I'm trying again.
I would like to retrieve the layout coordinates of text blocks in a document. This worked well with ocroscript in version 0.2, but in more recent versions the hOCR output has no coordinates. Example:

    ocropus book2pages _temp data/testimages/simple.png
    ocropus pages2lines _temp
    ocropus lines2fsts _temp
    ocropus fsts2text _temp
    ocropus buildhtml _temp

The output is:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html>
    <head>
    <meta name="ocr-capabilities" content="ocr_line ocr_page" />
    <meta name="ocr-langs" content="en" />
    <meta name="ocr-scripts" content="Latn" />
    <meta name="ocr-microformats" content="" />
    <meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
    <title>OCR Output</title>
    </head>
    <body>
    <div class="ocr_page">
    <span class="ocr_line"> This is a lot of 1 2 point text to test the</span>
    <span class="ocr_line"> ocr code and see if it works on all types</span>
    <span class="ocr_line"> of file format.</span>
    <span class="ocr_line"> The quick brown dog jumped over the</span>
    <span class="ocr_line"> lazy fox. The quick brown dog jumped</span>
    <span class="ocr_line"> over the lazy fox. The quick brown dog</span>
    <span class="ocr_line"> jumped over the lazy fox. The quick</span>
    <span class="ocr_line"> brown dog jumped over the lazy fox.</span>
    </div>
    </body>
    </html>

I was expecting each span element to have a title attribute like

    <span class="ocr_line" title="bbox 313 324 733 1922">...</span>

Is there any way to turn this on?

Thanks,
Wolfgang
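In hOCR, the `bbox x0 y0 x1 y1` value lives in the element's `title` attribute, where several properties may be separated by semicolons. As a minimal sketch, assuming the spans do carry such titles, the coordinates could be read back with Python's standard `html.parser` module. `HocrBoxParser` and `extract_bboxes` are hypothetical helper names for illustration, not part of OCRopus. Run against the output above, it would return an empty list, because none of the spans has a `title`.

```python
from html.parser import HTMLParser


class HocrBoxParser(HTMLParser):
    """Hypothetical helper: collect (class, bbox) pairs from hOCR title attributes."""

    def __init__(self):
        super().__init__()
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class") or ""
        title = attrs.get("title") or ""
        # hOCR packs properties into title, separated by ';',
        # e.g. title="bbox 313 324 733 1922" or "image p.png; bbox 0 0 10 20".
        for prop in title.split(";"):
            fields = prop.split()
            if len(fields) != 5 or fields[0] != "bbox":
                continue
            try:
                x0, y0, x1, y1 = (int(v) for v in fields[1:])
            except ValueError:
                continue  # skip malformed coordinates
            self.boxes.append((cls, (x0, y0, x1, y1)))


def extract_bboxes(hocr_text):
    """Return a list of (class, (x0, y0, x1, y1)) tuples found in hocr_text."""
    parser = HocrBoxParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.boxes


if __name__ == "__main__":
    expected = '<span class="ocr_line" title="bbox 313 324 733 1922">...</span>'
    actual = '<span class="ocr_line"> of file format.</span>'
    print(extract_bboxes(expected))  # [('ocr_line', (313, 324, 733, 1922))]
    print(extract_bboxes(actual))    # []
```

Parsing the attribute with an HTML parser rather than a regular expression keeps the sketch robust to attribute order and quoting, and splitting on `;` lets it skip other hOCR properties that may share the same `title`.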
