I have several Solr 3.6 instances that, for various reasons, I don't want to 
upgrade to 4.0 yet.  My index is too big to fit on one machine.  I want to 
slice the crawl so that I have one slice per Solr shard, but still use 
Solr's grouping feature.  From what I understand, Solr grouping doesn't 
work properly when pages from a domain are spread across Solr shards.

Basically I'm after something like this:

slice1 (apache.org, linux.org) -> solr1

slice2 (stackoverflow.com, wikipedia.org) -> solr2

etc...

I could upgrade to SolrCloud, or possibly use Elasticsearch, but that would be 
a fair amount of re-coding.  I was just curious whether I could manage the 
sharding manually.
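
As a rough sketch of what managing the sharding manually could look like: route every document by a stable hash of its host, so all pages from one host land on the same Solr instance. This is only an illustration, not something Nutch or Solr provides out of the box. The shard URLs and the helper names `host_of` and `shard_for` are made up for the example, and it keys on the host rather than the registered domain, so subdomains of one site may end up on different shards.

```python
import hashlib
from urllib.parse import urlparse

# Hypothetical shard endpoints; replace with the real Solr 3.6 instances.
SHARDS = [
    "http://solr1:8983/solr",
    "http://solr2:8983/solr",
]

def host_of(url):
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def shard_for(url, shards=SHARDS):
    """Pick a shard from a stable hash of the host.

    The same host always maps to the same shard, as long as the
    shard list (and its order) does not change.
    """
    digest = hashlib.md5(host_of(url).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

if __name__ == "__main__":
    for url in ("http://apache.org/foundation/",
                "http://www.apache.org/licenses/",
                "http://stackoverflow.com/questions"):
        print(url, "->", shard_for(url))
```

The indexing step would then send each document to `shard_for(url)` instead of splitting by slice counter. Adding or removing a shard changes the mapping, so that would mean reindexing.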

Suggestions would certainly be appreciated.  It seems like I am faced with 
either a massive upgrade or breaking the grouping functionality.

~Jason

On Mar 5, 2013, at 11:02 PM, Markus Jelsma <[email protected]> wrote:

> Hi
> 
> You can't do this with -slice, but you can merge segments and filter them. 
> This would mean you'd have to merge the segments for each domain, but that's 
> far too much work. Why do you want to do this? There may be better ways of 
> achieving your goal.
> 
> 
> 
> -----Original message-----
>> From:Jason S <[email protected]>
>> Sent: Tue 05-Mar-2013 22:18
>> To: [email protected]
>> Subject: keep all pages from a domain in one slice
>> 
>> Hello,
>> 
>> I seem to remember seeing a discussion about this in the past but I can't 
>> seem to find it in the archives.
>> 
>> When using mergesegs -slice, is it possible to keep all the pages from a 
>> domain in the same slice?  I have just been messing around with this 
>> functionality (Nutch 1.6), and it seems like the records are simply split 
>> after the counter has reached the slice size specified, sometimes splitting 
>> the records from a single domain over multiple slices. 
>> 
>> How can I segregate a domain to a single slice?
>> 
>> Thanks in advance,
>> 
>> ~Jason
