Hi Jason,

I saw no mention of mergesegs, or of Nutch at all, within the first post of this thread. I agree that it would probably be a useful feature for many Nutch users. It is a shame that the thread did not get much feedback, though. If you want to discuss it more thoroughly, then please do; there will surely be some who will participate.

Thanks, have a great weekend.
Lewis

On Wed, Mar 6, 2013 at 1:34 AM, Stubblefield Jason <[email protected]> wrote:

> Well Lewis, I quite frankly disagree.
>
> I am asking how I can have more control for the slice process in the nutch mergesegs operation.
>
> I think this could be a useful feature to many Nutch users.
>
> I can see that I won't get any more assistance here.
>
> Thanks,
>
> Jason
>
> On Mar 6, 2013, at 6:18 AM, Lewis John Mcgibbney <[email protected]> wrote:
>
> > Hi Jason,
> > There is nothing I can see here which concerns Nutch.
> > Try solr lists please.
> > Thank you
> > Lewis
> >
> > On Tuesday, March 5, 2013, Stubblefield Jason <[email protected]> wrote:
> >> I have several Solr 3.6 instances that for various reasons, I don't want to upgrade to 4.0 yet. My index is too big to fit on one machine. I want to be able to slice the crawl so that I can have 1 slice per solr shard, but also use the grouping feature on solr. From what I understand, solr grouping doesn't work properly when pages from a domain are spread across solr shards.
> >>
> >> Basically I'm after something like this:
> >>
> >> slice1 (apache.org, linux.org) -> solr1
> >>
> >> slice2 (stackoverflow.com, wikipedia.org) -> solr2
> >>
> >> etc...
> >>
> >> I could upgrade to Solrcloud, or possibly use elasticsearch, but it would be a fair amount of re-coding. I was just curious if I could manage the sharding manually.
> >>
> >> Suggestions would certainly be appreciated, it seems like I am faced with a massive upgrade or to break the grouping functionality.
> >>
> >> ~Jason
> >>
> >> On Mar 5, 2013, at 11:02 PM, Markus Jelsma <[email protected]> wrote:
> >>
> >>> Hi
> >>>
> >>> You can't do this with -slice but you can merge segments and filter them. This would mean you'd have to merge the segments for each domain. But that's far too much work. Why do you want to do this? There may be better ways in achieving your goal.
> >>>
> >>> -----Original message-----
> >>>> From:Jason S <[email protected]>
> >>>> Sent: Tue 05-Mar-2013 22:18
> >>>> To: [email protected]
> >>>> Subject: keep all pages from a domain in one slice
> >>>>
> >>>> Hello,
> >>>>
> >>>> I seem to remember seeing a discussion about this in the past but I can't seem to find it in the archives.
> >>>>
> >>>> When using mergesegs -slice, is it possible to keep all the pages from a domain in the same slice? I have just been messing around with this functionality (Nutch 1.6), and it seems like the records are simply split after the counter has reached the slice size specified, sometimes splitting the records from a single domain over multiple slices.
> >>>>
> >>>> How can I segregate a domain to a single slice?
> >>>>
> >>>> Thanks in advance,
> >>>>
> >>>> ~Jason
> >>
> >>
> >
> > --
> > *Lewis*

--
*Lewis*
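
The manual routing Jason describes (slice1 (apache.org, linux.org) -> solr1, and so on) can be sketched outside Nutch as a small lookup from URL to shard. This is only an illustration of the idea, not a Nutch or Solr feature: the names `MANUAL_SHARDS`, `SHARDS`, `site_key`, `shard_for` and `group_by_shard` are hypothetical, and the "last two host labels" rule is a naive assumption that would mis-handle suffixes such as co.uk.

```python
from collections import defaultdict
from urllib.parse import urlsplit
import zlib

# Hypothetical manual assignment, mirroring the example in the thread.
MANUAL_SHARDS = {
    "apache.org": "solr1",
    "linux.org": "solr1",
    "stackoverflow.com": "solr2",
    "wikipedia.org": "solr2",
}
# Hypothetical shard names used for domains without a manual assignment.
SHARDS = ["solr1", "solr2"]


def site_key(url):
    """Return a naive domain key: the last two labels of the host name."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def shard_for(url):
    """Pick a shard so that every URL of one domain gets the same answer."""
    key = site_key(url)
    if key in MANUAL_SHARDS:
        return MANUAL_SHARDS[key]
    # Fallback: crc32 is stable across runs, unlike Python's built-in hash().
    return SHARDS[zlib.crc32(key.encode("utf-8")) % len(SHARDS)]


def group_by_shard(urls):
    """Group URLs by target shard, keeping each domain on a single shard."""
    groups = defaultdict(list)
    for url in urls:
        groups[shard_for(url)].append(url)
    return dict(groups)
```

Each group could then be handled separately, for example as the per-domain merge-and-filter step Markus mentions, or as a routing decision at indexing time. Because the key is the domain rather than a record counter, a domain is never split across shards.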

