I have several Solr 3.6 instances that, for various reasons, I don't want to upgrade to 4.0 yet. My index is too big to fit on one machine. I want to be able to slice the crawl so that I have one slice per Solr shard, while still using Solr's grouping feature. From what I understand, Solr grouping doesn't work properly when pages from a single domain are spread across shards.
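Something like the following host-based routing is what I'd need, if I could hook it into segment merging. This is a minimal sketch, not existing Nutch code; the HostSharder class and its method names are hypothetical, and it only shows the deterministic host-to-shard mapping, not the merge itself:

```java
import java.net.MalformedURLException;
import java.net.URL;

// Sketch: deterministic host -> shard routing so every page from a
// domain lands in the same slice (and therefore on the same Solr
// instance). Note this is NOT how mergesegs -slice behaves today; it
// simply counts records per slice.
public class HostSharder {
    private final int numShards;

    public HostSharder(int numShards) {
        this.numShards = numShards;
    }

    /** Return the shard index (0..numShards-1) for a URL, based only on its host. */
    public int shardFor(String url) throws MalformedURLException {
        String host = new URL(url).getHost().toLowerCase();
        // Mask to non-negative rather than Math.abs, which overflows
        // for Integer.MIN_VALUE.
        return (host.hashCode() & Integer.MAX_VALUE) % numShards;
    }

    public static void main(String[] args) throws Exception {
        HostSharder sharder = new HostSharder(2);
        // All pages from the same host map to the same shard:
        System.out.println(sharder.shardFor("http://apache.org/foo")
                == sharder.shardFor("http://apache.org/bar/baz")); // true
    }
}
```

In a Hadoop job this logic would sit in a custom Partitioner keyed on host, so that all records for a domain go to the same reducer and hence the same output slice.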
Basically I'm after something like this:

slice1 (apache.org, linux.org) -> solr1
slice2 (stackoverflow.com, wikipedia.org) -> solr2
etc.

I could upgrade to SolrCloud, or possibly use Elasticsearch, but either would mean a fair amount of re-coding. I was just curious whether I could manage the sharding manually. Suggestions would certainly be appreciated; as it stands, I seem to be faced with either a massive upgrade or broken grouping functionality.

~Jason

On Mar 5, 2013, at 11:02 PM, Markus Jelsma <[email protected]> wrote:

> Hi,
>
> You can't do this with -slice, but you can merge segments and filter them.
> This would mean you'd have to merge the segments for each domain, and that's
> far too much work. Why do you want to do this? There may be better ways of
> achieving your goal.
>
> -----Original message-----
>> From: Jason S <[email protected]>
>> Sent: Tue 05-Mar-2013 22:18
>> To: [email protected]
>> Subject: keep all pages from a domain in one slice
>>
>> Hello,
>>
>> I seem to remember seeing a discussion about this in the past, but I can't
>> seem to find it in the archives.
>>
>> When using mergesegs -slice, is it possible to keep all the pages from a
>> domain in the same slice? I have just been experimenting with this
>> functionality (Nutch 1.6), and it seems the records are simply split
>> once the counter reaches the specified slice size, sometimes spreading
>> the records from a single domain over multiple slices.
>>
>> How can I segregate a domain to a single slice?
>>
>> Thanks in advance,
>>
>> ~Jason

