Hi Leo,

> However, I just tried image segmentation using the VORONOI algorithm, and
> it appears to me that the algorithm tends to erase some components
> after segmentation,
Voronoi may erase all components that it considers to be noise. Usually
these are very small components, but you can disable this with a
parameter (bool remove_noise).

> Also, is the current VORONOI algorithm similar to the
> XY-CUT algorithm, which is not yet able to provide information for each
> block it found?

Yes, Voronoi also does not provide information about columns in the page.

> I have uploaded "VORONOI.png" for your reference; the original image
> is "01_1_1_2.PNG".

There have been some bug fixes in both the Voronoi and x-y cut algorithms
after the 0.2 release. Please use the 0.3 release to get the latest version.
I tried this image and it looks fine to me (I am using the svn version, which
is the same as the 0.3 release as far as Voronoi and x-y cut are concerned),
though I needed to binarize the image before giving it to Voronoi.

Cheers,
Faisal

> Cheers,
> Leo
>
> On Nov 19, 12:25 am, "Faisal Shafait" <[EMAIL PROTECTED]> wrote:
> > Hi Leo,
> > Since you are assigning everything to the first column, you will get
> > everything as the first column :-)
> >
> > You need to write an algorithm that groups zones into text columns. You can
> > find such an algorithm in ocr-layout-rast/ocr-layout-rast.cc (sorry, the file
> > is a big mess at the moment, but I am working on refactoring it). Please
> > look at the method:
> >     void SegmentPageByRAST::getCol(rectarray &columns, rectarray &paragraphs)
> > that takes a rectarray of paragraphs and groups them into text columns using
> > their alignment and position on the page. You can treat the zones returned by
> > x-y cut as paragraphs and try this algorithm. If this does not work, then you
> > will have to write an algorithm of your own.
> >
> > Cheers,
> > Faisal
> >
> > 2008/11/18 Leo <[EMAIL PROTECTED]>
> >
> > > Hi Faisal,
> > >
> > > I have made the change that you suggested, and it now passes the
> > > check_page_segmentation() function.
> > > However, when I try to use RegionExtractor for text column/
> > > paragraph/line extraction, it still doesn't work quite well.
> > > It returns the whole image as one column when the image actually
> > > contains two columns. Is there any way to work around it?
> > > Thanks for the help.
> > >
> > > Cheers,
> > > Leo
> > >
> > > On Nov 18, 1:37 am, "Faisal Shafait" <[EMAIL PROTECTED]> wrote:
> > > > Hi Leo,
> > > > Thanks for reporting this bug.
> > > >
> > > > The XYCUTS and Voronoi algorithms divide a page into several blocks. They do
> > > > not define the role of these blocks (whether a block contains text, images,
> > > > etc.) and provide no information about the columnar structure of the document.
> > > > Therefore they do not pass the check_page_segmentation() function, as the
> > > > function checks for proper encoding of text columns/paragraphs/lines etc.
> > > >
> > > > A work-around for the moment would be to assign all blocks to the first
> > > > column. I have changed that in the svn version. You just need to replace:
> > > >     int color = i+1;
> > > > in the segment() method with:
> > > >     int color = (i+1) | (0x00010000);
> > > >
> > > > Cheers,
> > > > Faisal
> > > >
> > > > On Mon, Nov 17, 2008 at 10:38 AM, Leo <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I am currently using version 0.2 and trying out the new XYCUTS page
> > > > > segmentation algorithm, make_SegmentPageByXYCUTS(). However, I found
> > > > > that after segmentation the intarray didn't pass the
> > > > > check_page_segmentation() function, so I cannot use it with
> > > > > RegionExtractor. Has anyone had a similar problem?
> > > > >
> > > > > Cheers,
> > > > > Leo
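
For anyone following the thread, here is a minimal sketch of how the two suggestions above could fit together: group the zones into columns first, then put the column number into the color instead of hard-coding column 1. It is not the RAST getCol() method; it swaps in a much simpler rule (zones whose horizontal extents overlap share a column). The names Zone, assign_columns, encode_color and column_of are hypothetical and not part of OCRopus, and the bit layout (column number in bits 16-23) is an assumption inferred from the 0x00010000 work-around.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical zone type for this sketch; OCRopus has its own rectangle class.
struct Zone {
    int x0, x1;  // horizontal extent, with x0 < x1
    int y0;      // top coordinate, carried along but not used for grouping
};

// Give every zone a 1-based column number. Zones are visited left to right;
// a zone that starts at or beyond the right edge of the current column opens
// a new column, otherwise it joins the current one and may widen it.
std::vector<int> assign_columns(const std::vector<Zone> &zones) {
    std::vector<int> order(zones.size());
    for (std::size_t i = 0; i < order.size(); i++) order[i] = static_cast<int>(i);
    std::sort(order.begin(), order.end(),
              [&zones](int a, int b) { return zones[a].x0 < zones[b].x0; });

    std::vector<int> column(zones.size(), 0);
    int current = 0;     // number of columns opened so far
    int right_edge = 0;  // right edge of the current column
    for (int idx : order) {
        if (current == 0 || zones[idx].x0 >= right_edge) {
            current++;
            right_edge = zones[idx].x1;
        } else {
            right_edge = std::max(right_edge, zones[idx].x1);
        }
        column[idx] = current;
    }
    return column;
}

// Assumed layout: column number in bits 16-23, block number (i+1) in the
// low 16 bits. With column == 1 this equals (i+1) | 0x00010000.
int encode_color(int block_index, int column) {
    assert(column >= 1 && column <= 0xff);
    assert(block_index >= 0 && block_index < 0xffff);
    return (block_index + 1) | (column << 16);
}

// Read the column number back out of an encoded color.
int column_of(int color) {
    return (color >> 16) & 0xff;
}
```

In segment(), the idea would be to call assign_columns() once on the zones returned by x-y cut or Voronoi and then use encode_color(i, columns[i]) in place of the fixed (i+1) | (0x00010000). A single wide zone such as a page-spanning headline would merge all columns under this rule, so it is only a starting point.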
