Hi

Maybe you can implement the SegmentMergeFilter interface to filter records
during the segment merge.
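
As a rough sketch of the idea, the per-domain check such a filter would need could look like the code below. This is plain Java with no Nutch dependency. The names DomainKeepFilter and accept are hypothetical, not part of Nutch. It assumes the filter is handed each record's URL, and the wiring into SegmentMergeFilter and the plugin registration are not shown.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper (not a Nutch class): decides whether a URL belongs
// to the one domain that should survive a given merge run.
class DomainKeepFilter {
    private final String domain;

    DomainKeepFilter(String domain) {
        this.domain = domain.toLowerCase(Locale.ROOT);
    }

    // Returns true if the URL's host is the domain itself or one of its subdomains.
    boolean accept(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return false; // unparsable URL: drop the record
        }
        if (host == null) {
            return false; // no host component (e.g. a relative URL)
        }
        host = host.toLowerCase(Locale.ROOT);
        return host.equals(domain) || host.endsWith("." + domain);
    }

    public static void main(String[] args) {
        DomainKeepFilter keep = new DomainKeepFilter("example.org");
        System.out.println(keep.accept("http://www.example.org/a.html"));
        System.out.println(keep.accept("http://other.net/"));
    }
}
```

A filter built this way keeps only one domain per merge, so you would still need one merge run per domain.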


On Wed, Mar 6, 2013 at 6:02 AM, Markus Jelsma <[email protected]> wrote:

> Hi
>
> You can't do this with -slice but you can merge segments and filter them.
> This would mean you'd have to merge the segments for each domain. But
> that's far too much work. Why do you want to do this? There may be better
> ways of achieving your goal.
>
>
>
> -----Original message-----
> > From: Jason S <[email protected]>
> > Sent: Tue 05-Mar-2013 22:18
> > To: [email protected]
> > Subject: keep all pages from a domain in one slice
> >
> > Hello,
> >
> > I seem to remember seeing a discussion about this in the past but I
> can't seem to find it in the archives.
> >
> > When using mergesegs -slice, is it possible to keep all the pages from a
> domain in the same slice?  I have just been messing around with this
> functionality (Nutch 1.6), and it seems like the records are simply split
> after the counter has reached the slice size specified, sometimes splitting
> the records from a single domain over multiple slices.
> >
> > How can I segregate a domain to a single slice?
> >
> > Thanks in advance,
> >
> > ~Jason
>



-- 
Don't Grow Old, Grow Up... :-)
