Hi Jason,

I saw no mention of mergesegs or any mention of Nutch within the first post
of this thread.
It probably would be a useful feature for many Nutch users. I agree.
It seems the thread did not get much feedback, though, which is a shame.
If you want to discuss this more thoroughly, then please do. There will surely
be some who will participate.
Thanks, have a great weekend.
Lewis

On Wed, Mar 6, 2013 at 1:34 AM, Stubblefield Jason <
[email protected]> wrote:

> Well Lewis, I quite frankly disagree.
>
> I am asking how I can have more control over the slice process in the Nutch
> mergesegs operation.
>
> I think this could be a useful feature to many Nutch users.
>
> I can see that I won't get any more assistance here.
>
> Thanks,
>
> Jason
>
>
>
> On Mar 6, 2013, at 6:18 AM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
> > Hi Jason,
> > There is nothing I can see here which concerns Nutch.
> > Try solr lists please.
> > Thank you
> > Lewis
> >
> > On Tuesday, March 5, 2013, Stubblefield Jason <
> > [email protected]> wrote:
> >> I have several Solr 3.6 instances that, for various reasons, I don't want
> >> to upgrade to 4.0 yet.  My index is too big to fit on one machine.  I want
> >> to be able to slice the crawl so that I can have 1 slice per Solr shard,
> >> but also use the grouping feature in Solr.  From what I understand, Solr
> >> grouping doesn't work properly when pages from a domain are spread across
> >> Solr shards.
> >>
> >> Basically I'm after something like this:
> >>
> >> slice1 (apache.org, linux.org) -> solr1
> >>
> >> slice2 (stackoverflow.com, wikipedia.org) -> solr2
> >>
> >> etc...
> >>
> >> I could upgrade to SolrCloud, or possibly use Elasticsearch, but it would
> >> be a fair amount of re-coding.  I was just curious if I could manage the
> >> sharding manually.
> >>
> >> Suggestions would certainly be appreciated. It seems like I am faced with
> >> either a massive upgrade or breaking the grouping functionality.
> >>
> >> ~Jason
> >>
> >> On Mar 5, 2013, at 11:02 PM, Markus Jelsma <[email protected]> wrote:
> >>
> >>> Hi
> >>>
> >>> You can't do this with -slice, but you can merge segments and filter
> >>> them. This would mean you'd have to merge the segments for each domain,
> >>> but that's far too much work. Why do you want to do this? There may be
> >>> better ways of achieving your goal.
> >>>
> >>>
> >>>
> >>> -----Original message-----
> >>>> From:Jason S <[email protected]>
> >>>> Sent: Tue 05-Mar-2013 22:18
> >>>> To: [email protected]
> >>>> Subject: keep all pages from a domain in one slice
> >>>>
> >>>> Hello,
> >>>>
> >>>> I seem to remember seeing a discussion about this in the past but I
> >>>> can't seem to find it in the archives.
> >>>>
> >>>> When using mergesegs -slice, is it possible to keep all the pages from
> >>>> a domain in the same slice?  I have just been messing around with this
> >>>> functionality (Nutch 1.6), and it seems like the records are simply
> >>>> split after the counter has reached the slice size specified, sometimes
> >>>> splitting the records from a single domain over multiple slices.
> >>>>
> >>>> How can I segregate a domain to a single slice?
> >>>>
> >>>> Thanks in advance,
> >>>>
> >>>> ~Jason
> >>
> >>
> >
> > --
> > *Lewis*
>
>


-- 
*Lewis*
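
A note on the manual sharding Jason asks about in this thread: the following is a minimal sketch, not a Nutch or Solr feature, of how URLs could be routed so that all pages from one domain land on the same shard. The shard names, the function names (domain_of, shard_for), and the naive "last two host labels" domain rule are assumptions made for illustration only. As Markus notes, mergesegs -slice itself does not do this, so a step like this would have to live outside it.

```python
import hashlib
from urllib.parse import urlparse

# Hypothetical shard names; the thread only mentions "solr1", "solr2", etc.
SHARDS = ["solr1", "solr2"]


def domain_of(url: str) -> str:
    """Return a naive domain key: the last two labels of the host name.

    Assumption: good enough for hosts like www.apache.org. It is wrong for
    suffixes such as co.uk, where a public-suffix list would be needed.
    """
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def shard_for(url: str, shards=SHARDS) -> str:
    """Pick a shard from a stable hash of the domain.

    md5 is used only because it is stable across runs and machines
    (Python's built-in hash() is randomized per process), so every page
    of a given domain maps to the same shard each time.
    """
    digest = hashlib.md5(domain_of(url).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


if __name__ == "__main__":
    for u in ["http://www.apache.org/foundation/",
              "http://apache.org/licenses/",
              "http://en.wikipedia.org/wiki/Apache_Nutch"]:
        print(u, "->", shard_for(u))
```

Hashing spreads domains roughly evenly by count but not by page volume; an explicit domain-to-shard table, like the slice1/slice2 layout in Jason's message, may be the better fit when a few domains dominate the crawl.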
