Not yet.  Paragraphs are actually tricky because there are many
different ways of indicating them.  We may not put paragraph detection
directly into OCRopus, but instead leave it to an hOCR-to-hOCR
transformer.

You could write a simple transformer of that kind and contribute it.
Basically, you need to look at all the ocr_line elements and their
corresponding bounding boxes.  A new paragraph starts either if the x0
of a line is substantially indented relative to both the previous and
the following line, or if there is substantially more vertical space
between a line and its previous line than there is on average between
consecutive lines.
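
A rough sketch of that heuristic in Python, assuming the lines are
already in reading order and that each ocr_line has a title attribute
containing "bbox x0 y0 x1 y1" with y increasing downward.  The function
names and the two thresholds (indent_frac, gap_factor) are made up for
illustration and would need tuning on real pages; this is not code from
OCRopus.

```python
import re

# An hOCR title attribute looks like "bbox 100 140 900 170" (possibly
# followed by other properties separated by ";").
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


def parse_bbox(title):
    """Extract (x0, y0, x1, y1) from an hOCR title attribute string."""
    match = BBOX_RE.search(title)
    if match is None:
        raise ValueError("no bbox in title: %r" % title)
    return tuple(int(g) for g in match.groups())


def paragraph_starts(boxes, indent_frac=0.03, gap_factor=1.5):
    """Return the indices of lines that appear to begin a paragraph.

    boxes       -- list of (x0, y0, x1, y1), one per line, top to bottom
    indent_frac -- hypothetical threshold: minimum indent as a fraction
                   of the text block width
    gap_factor  -- hypothetical threshold: how many times the average
                   inter-line gap counts as "substantially more space"
    """
    if not boxes:
        return []

    # Vertical space between the bottom of one line and the top of the next.
    gaps = [boxes[i][1] - boxes[i - 1][3] for i in range(1, len(boxes))]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0

    # Width of the whole text block, used to scale the indent threshold.
    width = max(b[2] for b in boxes) - min(b[0] for b in boxes)
    min_indent = indent_frac * width

    starts = [0]  # the first line always opens a paragraph
    for i in range(1, len(boxes)):
        x0 = boxes[i][0]
        prev_x0 = boxes[i - 1][0]
        # For the last line there is no following line; reuse the previous one.
        next_x0 = boxes[i + 1][0] if i + 1 < len(boxes) else prev_x0

        indented = (x0 - prev_x0 > min_indent) and (x0 - next_x0 > min_indent)
        big_gap = gaps[i - 1] > gap_factor * mean_gap

        if indented or big_gap:
            starts.append(i)
    return starts
```

The returned indices could then be used to wrap each run of ocr_line
elements in a paragraph element when writing the hOCR back out.  Note
that the average gap here includes the large gaps themselves, so a
median may work better on pages with many short paragraphs.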

Tom

On Fri, May 29, 2009 at 17:07, [email protected]
<[email protected]> wrote:
>
> Hello. I need paragraphs instead of lines, too.
>
> Did you make progress yet?
>
> Michael
>
> On 1 Apr., 02:56, Michael Moore <[email protected]> wrote:
>> On Tue, Mar 31, 2009 at 3:03 AM, Duncan McGregor
>>
>> <[email protected]> wrote:
>>
>> > Do you want to group text into paragraph units, or just lay it out
>> > more visually convincingly?
>>
>> I'd like to group it into paragraph units. The output will be
>> displayed in a browser for the user and will then be copied and pasted
>> into a word processor. The word processor will paste the content into
>> paragraphs if the HTML is marked up as paragraphs, otherwise it's just
>> one huge block of text.
>>
>> Being visually convincing is nice too, but not as important for
>> this workflow.
>>
>> Thank you,
>> Michael Moore
>>
>>
>>
>> > I don't know about the former, but for the latter I had some success
>> > in adding css to the divs to move their position to where the bbox
>> > said they should be.
>>
>> > Duncan McGregor
>> > www.VelOCRaptor.com
>>
>> > On Mon, Mar 30, 2009 at 11:08 PM, Michael Moore <[email protected]> 
>> > wrote:
>>
>> >> Are there any tools or options I can use to get my hocr output with
>> >> paragraph tags?
>>
>> >> Many thanks,
>> >> --
>> >> Michael Moore
>>
>> --
>> Michael Moore
>> -------------------------
>> Share your families' genealogy and family history books. It's easy and
>> free :http://bookscanned.com
> >
>
