Hi Leo,
Since you are assigning every zone to the first column, you will get
everything back as a single column :-)

You need an algorithm that groups zones into text columns. You can
find one in ocr-layout-rast/ocr-layout-rast.cc (sorry, the file
is a big mess at the moment, but I am working on refactoring it). Please
look at the method:
   void SegmentPageByRAST::getCol(rectarray &columns, rectarray &paragraphs)
which takes a rectarray of paragraphs and groups them into text columns using
their alignment and position on the page. You can treat the zones returned by
x-y cut as paragraphs and try this algorithm. If that does not work, you will
have to write your own algorithm.
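If getCol() does not fit your zones, one very simple starting point for your
own algorithm is to group zones by horizontal overlap. This is only a sketch,
not the RAST getCol() implementation: Zone, assignColumns and the minOverlap
threshold are hypothetical names chosen for illustration. It ignores
alignment entirely, so a full-width title zone would merge the columns it
spans; real pages need more care than this.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned zone box (not OCRopus's rectangle type).
struct Zone {
    int x0, y0, x1, y1;
};

// Returns a 1-based column number for each zone, numbered left to right.
// Zones are visited in order of their left edge; a zone joins the current
// column if its horizontal extent overlaps the column's running extent by
// at least minOverlap times the narrower of the two widths. Otherwise it
// starts a new column.
std::vector<int> assignColumns(const std::vector<Zone>& zones,
                               double minOverlap = 0.5) {
    assert(minOverlap > 0.0);
    std::vector<int> order(zones.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return zones[a].x0 < zones[b].x0; });

    std::vector<int> column(zones.size(), 0);
    int current = 0;          // number of columns opened so far
    int colX0 = 0, colX1 = 0; // horizontal extent of the current column
    for (int idx : order) {
        const Zone& z = zones[idx];
        bool joins = false;
        if (current > 0) {
            int overlap = std::min(colX1, z.x1) - std::max(colX0, z.x0);
            int narrower = std::min(colX1 - colX0, z.x1 - z.x0);
            joins = narrower > 0 && overlap >= minOverlap * narrower;
        }
        if (joins) {
            // Widen the current column to cover the new zone.
            colX0 = std::min(colX0, z.x0);
            colX1 = std::max(colX1, z.x1);
        } else {
            // No sufficient overlap: this zone opens a new column.
            ++current;
            colX0 = z.x0;
            colX1 = z.x1;
        }
        column[idx] = current;
    }
    return column;
}
```

For a two-column page, the zones in the left half get column 1 and those in
the right half get column 2. Those numbers could then replace the constant
first column in the segmentation colors.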

Cheers,
Faisal


2008/11/18 Leo <[EMAIL PROTECTED]>

>
> Hi Faisal
>
> I have made the change that you suggested, and the result now passes the
> check_page_segmentation() function.
> However, when I use RegionExtractor for text column/
> paragraph/line extraction, it still doesn't work well:
> it returns the whole image as one column when the image actually
> contains two columns. Is there any way to work around this?
> Thanks for the help.
>
> Cheers,
> Leo
>
> On Nov 18, 1:37 AM, "Faisal Shafait" <[EMAIL PROTECTED]> wrote:
> > Hi Leo,
> > Thanks for reporting this bug.
> >
> > The XYCUTS and Voronoi algorithms divide a page into several blocks. They
> > do not define the role of these blocks (whether a block contains text,
> > images, etc.) and provide no information about the columnar structure of
> > the document. Therefore their output does not pass the
> > check_page_segmentation() function, which checks for proper encoding of
> > text columns/paragraphs/lines, etc.
> >
> > A work-around for the moment would be to assign all blocks to the first
> > column. I have changed that in the svn version. You just need to replace:
> >             int color = i+1;
> > in the segment() method with:
> >             int color = (i+1) | (0x00010000);
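Once real column numbers are known, the same packing can be generalized with
a small pair of helpers. This is a sketch under stated assumptions: as the
replacement line implies, the column number is taken to sit in bits 16 and
up and the zone label in the low 16 bits; the 8-bit limit on the column field
and the names encodeZoneColor, columnOf and zoneLabelOf are hypothetical, not
OCRopus API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: packs a 1-based column number into bits 16 and up
// (assumed 8-bit field) and a 1-based zone label into the low 16 bits.
// encodeZoneColor(1, i) equals the workaround (i+1) | 0x00010000.
inline std::int32_t encodeZoneColor(int column, int zoneIndex) {
    assert(column >= 1 && column <= 0xff);
    assert(zoneIndex >= 0 && zoneIndex + 1 <= 0xffff);
    return (column << 16) | (zoneIndex + 1);
}

// Hypothetical decoders for the same assumed layout.
inline int columnOf(std::int32_t color) { return (color >> 16) & 0xff; }
inline int zoneLabelOf(std::int32_t color) { return color & 0xffff; }
```

With column fixed at 1, this reproduces the workaround exactly; passing the
actual column of each zone instead is what lets RegionExtractor see more than
one column.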
> >
> > Cheers,
> > Faisal
> >
> > On Mon, Nov 17, 2008 at 10:38 AM, Leo <[EMAIL PROTECTED]> wrote:
> >
> > > Hi All,
> >
> > > I am currently using version 0.2 and tried out the new XYCUTS page
> > > segmentation algorithm make_SegmentPageByXYCUTS(). However, I found
> > > that after segmentation the resulting intarray didn't pass the
> > > check_page_segmentation() function, so I cannot use it with
> > > RegionExtractor. Has anyone had a similar problem?
> >
> > > Cheers,
> > > Leo
> >
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"ocropus" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/ocropus?hl=en
-~----------~----~----~----~------~----~------~--~---
