Hi Jason,
There is nothing I can see here which concerns Nutch.
Please try the Solr lists.
Thank you
Lewis

On Tuesday, March 5, 2013, Stubblefield Jason <[email protected]> wrote:
> I have several Solr 3.6 instances that, for various reasons, I don't want
> to upgrade to 4.0 yet. My index is too big to fit on one machine. I want
> to slice the crawl so that I have one slice per Solr shard, but also use
> Solr's grouping feature. From what I understand, Solr grouping doesn't
> work properly when pages from a domain are spread across Solr shards.
>
> Basically I'm after something like this:
>
> slice1 (apache.org, linux.org) -> solr1
>
> slice2 (stackoverflow.com, wikipedia.org) -> solr2
>
> etc...
>
> I could upgrade to SolrCloud, or possibly use Elasticsearch, but it would
> be a fair amount of re-coding. I was just curious whether I could manage
> the sharding manually.
>
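
One way to manage the sharding manually is to route each URL by a stable
hash of its host, so that every page from a host lands on the same shard.
The sketch below is illustrative only and is not taken from Nutch or Solr:
the shard names and the helper functions host_key and shard_for are
hypothetical, and stripping "www." is a naive stand-in for real
registered-domain extraction.

```python
import hashlib
from urllib.parse import urlsplit

# Hypothetical shard names; substitute the real Solr instance URLs.
SHARDS = ["solr1", "solr2"]


def host_key(url):
    """Return the lowercased host without a leading 'www.' (naive domain key)."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def shard_for(url, shards=SHARDS):
    """Pick a shard from a stable hash of the host key.

    md5 is used only because it is stable across runs and machines,
    unlike Python's built-in hash() for strings.
    """
    digest = hashlib.md5(host_key(url).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


if __name__ == "__main__":
    for url in [
        "http://www.apache.org/foundation/",
        "http://apache.org/licenses/",
        "http://stackoverflow.com/questions",
    ]:
        print(shard_for(url), url)
```

Hashing does not give a hand-picked assignment such as the one listed
above; an explicit domain-to-shard lookup table would be needed for that.
Changing the number of shards also changes the mapping, which would mean
reindexing.
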
> Suggestions would certainly be appreciated; it seems I am faced with
> either a massive upgrade or breaking the grouping functionality.
>
> ~Jason
>
> On Mar 5, 2013, at 11:02 PM, Markus Jelsma <[email protected]> wrote:
>
>> Hi
>>
>> You can't do this with -slice, but you can merge segments and filter
>> them. This would mean you'd have to merge the segments for each domain,
>> but that's far too much work. Why do you want to do this? There may be
>> better ways to achieve your goal.
>>
>>
>>
>> -----Original message-----
>>> From: Jason S <[email protected]>
>>> Sent: Tue 05-Mar-2013 22:18
>>> To: [email protected]
>>> Subject: keep all pages from a domain in one slice
>>>
>>> Hello,
>>>
>>> I seem to remember seeing a discussion about this in the past but I
>>> can't seem to find it in the archives.
>>>
>>> When using mergesegs -slice, is it possible to keep all the pages from
>>> a domain in the same slice? I have just been messing around with this
>>> functionality (Nutch 1.6), and it seems the records are simply split
>>> once the counter reaches the specified slice size, sometimes splitting
>>> the records from a single domain over multiple slices.
>>>
>>> How can I segregate a domain to a single slice?
>>>
>>> Thanks in advance,
>>>
>>> ~Jason
>
>

-- 
*Lewis*
