Hi,
make_SegmentWords() in ocr-word-segmentation.h implements the ISegmentPage
interface to segment the given input binary image into words.
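
For anyone unfamiliar with image-based word segmentation, here is a minimal sketch of the general idea, written in Python so it runs without OCRopus. It is not the code behind make_SegmentWords(); the function name segment_words, the min_gap threshold, and the list-of-rows image layout are all assumptions made for illustration. It treats a binarized text line as a grid of 0/1 pixels and splits it wherever a run of empty columns is wide enough to count as a word gap.

```python
def segment_words(image, min_gap=3):
    """Return (start, end) column ranges, end exclusive, one per word.

    image:   list of rows, each a list of 0/1 values, where 1 is ink.
    min_gap: hypothetical threshold; a run of at least this many empty
             columns is treated as a word gap, a shorter run as an
             inter-character gap.
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection: True where a column contains any ink.
    has_ink = [any(row[x] for row in image) for x in range(width)]

    words = []
    start = None      # first column of the word being built
    last_ink = None   # most recent column that contained ink
    for x, ink in enumerate(has_ink):
        if not ink:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            # The empty run since the last ink column is a word gap.
            words.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        words.append((start, last_ink + 1))
    return words


# Tiny synthetic line: two "words" separated by a 4-column gap.
line = [
    [1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1],
]
print(segment_words(line))  # [(0, 4), (8, 11)]
```

A fixed gap threshold like this is exactly why the image-based approach is limited: it depends on the script having visible inter-word spacing and on the threshold matching the font size.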

Cheers,
Faisal

On Mon, Mar 2, 2009 at 6:25 PM, Thomas Breuel <[email protected]> wrote:

> On Mon, Mar 2, 2009 at 03:59, Leo <[email protected]> wrote:
>
>> I am looking for an algorithm in OCRopus that extracts words from an
>> image of a paragraph or line of text. At the moment I am using the
>> make_StandardGrouper() function with CurvedCut segmentation to
>> extract the character positions; however, it doesn't seem to work
>> very well.
>
>
> StandardGrouper + CurvedCut does not give you characters, it gives you a
> large collection of character hypotheses, most of which aren't characters.
>
> If you want characters, you need to store the character hypotheses in a
> lattice and then select the best path with a language model.
>
>
>> Is there any word segmentation algorithm currently implemented in
>> OCRopus that lets me extract, or find the position of, each word
>> within an image?
>
>
> There are two kinds of word segmentations: image-based and OCR output
> based.  Image-based word segmentation doesn't require OCR, but it is also
> not very accurate and only works for Latin scripts.  Output-based works for
> all languages.  In principle, you can do both with OCRopus.
>
> The way this is done is changing.  In the next release of OCRopus, you
> should be able to get the word bounding boxes from character bounding boxes
> in the hOCR output.  Right now, it's a little more complicated.  My
> suggestion would be to wait a couple of weeks.
>
> I believe Faisal wrote an image-based word segmenter; maybe he can answer
> about how/whether you can use that.
>
> Tom
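
Tom's point about deriving word bounding boxes from character bounding boxes can be sketched roughly as follows. This is only an illustration of the output-based idea, not the OCRopus or hOCR code path: the function name word_boxes and the input layout (a list of recognized characters, each paired with an (x0, y0, x1, y1) box, with recognized spaces marking word boundaries) are assumptions. Each word box is the union of the boxes of its characters.

```python
def word_boxes(chars):
    """Group per-character OCR output into words.

    chars: list of (character, (x0, y0, x1, y1)) in reading order.
           This layout is an assumption made for the example.
    Returns a list of (word_text, (x0, y0, x1, y1)).
    """
    words = []
    text, box = "", None
    for ch, (x0, y0, x1, y1) in chars:
        if ch.isspace():
            # A recognized space closes the current word, if any.
            if text:
                words.append((text, box))
            text, box = "", None
            continue
        text += ch
        if box is None:
            box = (x0, y0, x1, y1)
        else:
            # Grow the word box to cover the new character.
            box = (min(box[0], x0), min(box[1], y0),
                   max(box[2], x1), max(box[3], y1))
    if text:
        words.append((text, box))
    return words


# Made-up recognition result for the text "to go".
recognized = [
    ("t", (0, 2, 8, 20)),
    ("o", (10, 5, 18, 20)),
    (" ", (18, 0, 40, 20)),
    ("g", (40, 5, 48, 24)),
    ("o", (50, 5, 57, 20)),
]
print(word_boxes(recognized))
# [('to', (0, 2, 18, 20)), ('go', (40, 5, 57, 24))]
```

Because the word boundaries come from the recognizer's output rather than from pixel gaps, this approach does not depend on the script's spacing conventions.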
