Hi Leo,

Since you are assigning everything to the first column, you will get everything back as the first column :-)
You need to write an algorithm that groups zones into text columns. You can find such an algorithm in ocr-layout-rast/ocr-layout-rast.cc (sorry, the file is a big mess at the moment, but I am working on refactoring it). Please look at the method:

void SegmentPageByRAST::getCol(rectarray &columns, rectarray &paragraphs)

It takes a rectarray of paragraphs and groups them into text columns using their alignment and position on the page. You can treat the zones returned by x-y cut as paragraphs and try this algorithm. If this does not work, then you will have to write an algorithm of your own.

Cheers,
Faisal

2008/11/18 Leo <[EMAIL PROTECTED]>
>
> Hi Faisal
>
> I have made the change that you suggested, and it now passes the
> check_page_segmentation() function.
> However, when I try to use the RegionExtractor for text column/
> paragraph/line extraction, it still doesn't work well:
> it returns the whole image as one column when the image actually
> contains two columns. Is there any way to work around this?
> Thanks for the help.
>
> Cheers,
> Leo
>
> On Nov 18, 1:37 am, "Faisal Shafait" <[EMAIL PROTECTED]> wrote:
> > Hi Leo,
> > Thanks for reporting this bug.
> >
> > XYCUTS and Voronoi algorithms divide a page into several blocks. They do
> > not define the role of these blocks (whether a block contains text,
> > images, etc.) and provide no information about the columnar structure of
> > the document. Therefore their output does not pass the
> > check_page_segmentation() function, as the function checks for proper
> > encoding of text columns/paragraphs/lines etc.
> >
> > A work-around for the moment would be to assign all blocks to the first
> > column. I have changed that in the svn version.
> > You just need to replace:
> >     int color = i+1;
> > in the segment() method with:
> >     int color = (i+1) | (0x00010000);
> >
> > Cheers,
> > Faisal
> >
> > On Mon, Nov 17, 2008 at 10:38 AM, Leo <[EMAIL PROTECTED]> wrote:
> >
> > > Hi All,
> >
> > > I am currently using version 0.2 and trying out the new XYCUTS page
> > > segmentation algorithm, make_SegmentPageByXYCUTS(). However, I found
> > > that after segmentation the intarray does not pass the
> > > check_page_segmentation() function, so I cannot use it with
> > > RegionExtractor. Has anyone had a similar problem?
> >
> > > Cheers,
> > > Leo
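
For anyone reading the work-around quoted above: OR-ing 0x00010000 into the color appears to put a column number of 1 into bits 16 and up, while the block index (i+1) stays in the low bits. Here is a minimal sketch of that packing, assuming that layout. The helper names encode_block, column_of and block_of are made up for illustration and are not OCRopus functions.

```cpp
#include <cassert>

// Assumed layout, inferred only from the work-around in this thread:
// bits 16-23 carry the column number, the low 16 bits carry the block index.
constexpr int kColumnShift = 16;

// Tag the zero-based block index i as belonging to a 1-based column.
// For column == 1 this equals (i+1) | 0x00010000.
inline int encode_block(int i, int column) {
    assert(column >= 1 && column <= 0xFF);   // must fit in one byte
    assert(i >= 0 && i + 1 <= 0xFFFF);       // must not spill into the column bits
    return (i + 1) | (column << kColumnShift);
}

// Read the column number back out of a color value.
inline int column_of(int color) { return (color >> kColumnShift) & 0xFF; }

// Read the stored block index (i+1) back out of a color value.
inline int block_of(int color) { return color & 0xFFFF; }
```

With this, encode_block(0, 1) yields 0x00010001. Passing a different column number per block, instead of a constant 1, is what would let a later stage tell two columns apart.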

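
Faisal's reply at the top of this thread suggests grouping the x-y cut zones into text columns. As a rough illustration of that idea, here is a sketch that groups zones by horizontal overlap. It is not the getCol method and not the RAST algorithm; the Zone struct and assign_columns function are hypothetical names, and plain interval merging is swapped in as the simplest possible stand-in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical zone type: an axis-aligned box with x0 < x1.
struct Zone { int x0, y0, x1, y1; };

// Returns a 1-based column number for each zone. Zones whose horizontal
// extents overlap, directly or through a chain of zones, share a column.
std::vector<int> assign_columns(const std::vector<Zone>& zones) {
    // Visit zones from left to right without reordering the input.
    std::vector<std::size_t> order(zones.size());
    for (std::size_t k = 0; k < order.size(); ++k) order[k] = k;
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return zones[a].x0 < zones[b].x0;
    });

    std::vector<int> column(zones.size(), 0);
    int current = 0;      // column number being filled
    int right_edge = 0;   // right-most x covered by the current column
    for (std::size_t idx : order) {
        const Zone& z = zones[idx];
        assert(z.x0 < z.x1);
        if (current == 0 || z.x0 >= right_edge) {
            ++current;                 // gap found: start a new column
            right_edge = z.x1;
        } else {
            right_edge = std::max(right_edge, z.x1);  // extend current column
        }
        column[idx] = current;
    }
    return column;
}
```

Note that a zone spanning the full page width (a title, say) would merge everything into one column under this scheme, so such zones would need to be filtered out first. The resulting column numbers could then be written into the segmentation image in place of the constant first column.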