We're converting the top-level commands from C++ to Python.  Most of
the OCRopus C++ command-line programs now have Python command-line
equivalents.

There are two ways of generating hOCR output.  You can do the
traditional multi-step processing, in which case you use ocropus-hocr
to generate the final output, or you can use ocropus-pages, which
does all recognition in a single program.  Both output bounding boxes
for lines.  We'll be adding support for word and character bounding
boxes later as well.
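
Once the line bounding boxes are in the hOCR, reading them back out
could look roughly like the sketch below.  It assumes each line is a
<span class="ocr_line"> carrying a title of the form "bbox x0 y0 x1 y1"
(the format shown in the quoted message) and that line spans contain no
nested spans.  The names LineBoxParser and extract_line_boxes are made
up for this illustration; they are not part of OCRopus.

```python
import re
from html.parser import HTMLParser

# Matches the four integers of an hOCR "bbox x0 y0 x1 y1" property.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class LineBoxParser(HTMLParser):
    """Collect (bbox, text) pairs for every ocr_line span (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._bbox = None
        self._text = []
        self._in_line = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "ocr_line":
            match = BBOX_RE.search(attrs.get("title") or "")
            # bbox stays None when the span has no title, as in the quoted output.
            self._bbox = tuple(int(v) for v in match.groups()) if match else None
            self._text = []
            self._in_line = True

    def handle_data(self, data):
        if self._in_line:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes no nested spans inside a line span.
        if tag == "span" and self._in_line:
            self.lines.append((self._bbox, "".join(self._text).strip()))
            self._in_line = False


def extract_line_boxes(hocr):
    """Return a list of (bbox or None, line text) from an hOCR string."""
    parser = LineBoxParser()
    parser.feed(hocr)
    parser.close()
    return parser.lines
```

Lines without a bbox come back with None in place of the box, so the
same function can be used to check whether a given hOCR file has
coordinates at all.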

Tom

On May 3, 6:59 pm, Wolfgang Schwarz <[email protected]> wrote:
> Hello,
>
> I've already tried to ask this question on 24 April, but it seems to
> not have made it through to the group, so I'm trying again.
>
> I would like to retrieve the layout coordinates of text blocks in a
> document. This worked well with ocroscript in version 0.2, but in more
> recent versions the hocr output has no coordinates. Example:
>
> ocropus book2pages _temp data/testimages/simple.png
> ocropus pages2lines _temp
> ocropus lines2fsts _temp
> ocropus fsts2text _temp
> ocropus buildhtml _temp
>
> The output is:
>
> <!DOCTYPE html
>    PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html>
> <head>
> <meta name="ocr-capabilities" content="ocr_line ocr_page" />
> <meta name="ocr-langs" content="en" />
> <meta name="ocr-scripts" content="Latn" />
> <meta name="ocr-microformats" content="" />
> <meta http-equiv="Content-Type" content="text/html;charset=utf-8" 
> /><title>OCR Output</title>
>
> </head>
> <body>
> <div class="ocr_page">
> <span class="ocr_line">
> This is a lot of 1 2 point text to test the</span><span
> class="ocr_line">
> ocr code and see if it works on all types</span><span
> class="ocr_line">
> of file format.</span><span class="ocr_line">
> The quick brown dog jumped over the</span><span class="ocr_line">
> lazy fox. The quick brown dog jumped</span><span class="ocr_line">
> over the lazy fox. The quick brown dog</span><span class="ocr_line">
> jumped over the lazy fox. The quick</span><span class="ocr_line">
> brown dog jumped over the lazy fox.</span></div>
> </body>
> </html>
>
> I was expecting each span element to have a title attribute like
>
> <span class="ocr_line" title="bbox 313 324 733 1922">...</span>
>
> Is there any way to turn this on?
>
> Thanks,
> Wolfgang
>
> --
> You received this message because you are subscribed to the Google Groups 
> "ocropus" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/ocropus?hl=en.
