Hi,

make_SegmentWords() in ocr-word-segmentation.h implements the ISegmentPage interface and segments the given binary input image into words.
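To make the image-based idea concrete, here is a minimal sketch of gap-based word segmentation on a single binarized text line. It is not the code behind make_SegmentWords() and does not use any OCRopus API. The names BinaryImage, WordSpan, segment_words_by_gaps and the min_gap parameter are hypothetical and exist only for this illustration. It assumes the line has already been extracted and that a run of ink-free columns at least min_gap wide marks a word boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for this sketch; these are not OCRopus classes.
// Pixels are indexed [row][col]; a nonzero value means ink.
using BinaryImage = std::vector<std::vector<unsigned char>>;

struct WordSpan {
    std::size_t x0;  // first ink column of the word
    std::size_t x1;  // one past the last ink column of the word
};

// Split one text line into words: a run of at least `min_gap`
// ink-free columns between two ink columns is treated as a word boundary.
std::vector<WordSpan> segment_words_by_gaps(const BinaryImage& line,
                                            std::size_t min_gap) {
    assert(min_gap > 0);
    std::vector<WordSpan> words;
    if (line.empty()) return words;
    const std::size_t width = line[0].size();

    bool in_word = false;
    std::size_t start = 0;         // first column of the current word
    std::size_t last_ink_end = 0;  // one past the most recent ink column
    for (std::size_t x = 0; x < width; ++x) {
        // A column counts as ink if any row has a nonzero pixel there.
        bool has_ink = false;
        for (const auto& row : line) {
            if (x < row.size() && row[x] != 0) { has_ink = true; break; }
        }
        if (!has_ink) continue;

        // Close the current word if the blank run before x is wide enough.
        if (in_word && x - last_ink_end >= min_gap) {
            words.push_back({start, last_ink_end});
            in_word = false;
        }
        if (!in_word) { start = x; in_word = true; }
        last_ink_end = x + 1;
    }
    if (in_word) words.push_back({start, last_ink_end});
    return words;
}
```

A fixed min_gap is the weak point of this approach: inter-character and inter-word gaps vary with font and size, so in practice the threshold would have to be estimated per line or per page.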

Cheers,
Faisal

On Mon, Mar 2, 2009 at 6:25 PM, Thomas Breuel <[email protected]> wrote:
> On Mon, Mar 2, 2009 at 03:59, Leo <[email protected]> wrote:
>
>> I am looking for an algorithm in OCRopus that allows word extraction
>> from an image of a paragraph or line of text. At the moment I am using
>> the make_StandardGrouper() function with CurvedCut segmentation for
>> extracting the character positions, but it doesn't seem to work
>> very well.
>
> StandardGrouper + CurvedCut does not give you characters; it gives you a
> large collection of character hypotheses, most of which aren't characters.
>
> If you want characters, you need to store the character hypotheses in a
> lattice and then select the best path with a language model.
>
>> Is there any word segmentation algorithm currently
>> implemented in OCRopus that allows me to extract or find out the
>> position of each word within an image?
>
> There are two kinds of word segmentation: image-based and OCR-output-based.
> Image-based word segmentation doesn't require OCR, but it is also
> not very accurate and only works for Latin scripts. Output-based works for
> all languages. In principle, you can do both with OCRopus.
>
> The way this is done is changing. In the next release of OCRopus, you
> should be able to get the word bounding boxes from character bounding boxes
> in the hOCR output. Right now, it's a little more complicated. My
> suggestion would be to wait a couple of weeks.
>
> I believe Faisal wrote an image-based word segmenter; maybe he can answer
> about how/whether you can use that.
>
> Tom
