Well, Lewis, I quite frankly disagree.

I am asking how I can get more control over the slicing done by the Nutch
mergesegs operation.

I think this could be a useful feature to many Nutch users.

I can see that I won't get any more assistance here.

Thanks,

Jason



On Mar 6, 2013, at 6:18 AM, Lewis John Mcgibbney <[email protected]> wrote:

> Hi Jason,
> There is nothing I can see here which concerns Nutch.
> Try the Solr lists, please.
> Thank you
> Lewis
> 
> On Tuesday, March 5, 2013, Stubblefield Jason <[email protected]> wrote:
>> I have several Solr 3.6 instances that, for various reasons, I don't want
>> to upgrade to 4.0 yet. My index is too big to fit on one machine. I want
>> to be able to slice the crawl so that I have one slice per Solr shard,
>> but still use Solr's grouping feature. From what I understand, Solr
>> grouping doesn't work properly when pages from a domain are spread across
>> Solr shards.
>> 
>> Basically, I'm after something like this:
>> 
>> slice1 (apache.org, linux.org) -> solr1
>> 
>> slice2 (stackoverflow.com, wikipedia.org) -> solr2
>> 
>> etc...
>> 
>> I could upgrade to SolrCloud, or possibly use Elasticsearch, but it would
>> be a fair amount of re-coding. I was just curious whether I could manage
>> the sharding manually.
>> 
>> Suggestions would certainly be appreciated; it seems I am faced with
>> either a massive upgrade or breaking the grouping functionality.
>> 
>> ~Jason
>> 
>> On Mar 5, 2013, at 11:02 PM, Markus Jelsma <[email protected]> wrote:
>> 
>>> Hi
>>> 
>>> You can't do this with -slice, but you can merge segments and filter
>>> them. This would mean you'd have to merge the segments once per domain,
>>> but that's far too much work. Why do you want to do this? There may be
>>> better ways of achieving your goal.
>>> 
>>> 
>>> 
>>> -----Original message-----
>>>> From: Jason S <[email protected]>
>>>> Sent: Tue 05-Mar-2013 22:18
>>>> To: [email protected]
>>>> Subject: keep all pages from a domain in one slice
>>>> 
>>>> Hello,
>>>> 
>>>> I seem to remember seeing a discussion about this in the past, but I
>>>> can't seem to find it in the archives.
>>>> 
>>>> When using mergesegs -slice, is it possible to keep all the pages from
>>>> a domain in the same slice? I have just been experimenting with this
>>>> functionality (Nutch 1.6), and it seems the records are simply split
>>>> once the counter reaches the specified slice size, sometimes splitting
>>>> the records from a single domain over multiple slices.
>>>> 
>>>> How can I segregate a domain to a single slice?
>>>> 
>>>> Thanks in advance,
>>>> 
>>>> ~Jason
>> 
>> 
> 
> -- 
> *Lewis*
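The per-domain slicing asked about in this thread (every page of a domain in one slice, e.g. apache.org and linux.org in slice 1, stackoverflow.com and wikipedia.org in slice 2) amounts to a deterministic domain-to-slice mapping instead of a record counter. The Python below is a minimal, hypothetical sketch of that mapping only; it is not part of Nutch or the mergesegs tool. The names naive_domain, slice_for, partition and the pinned map are invented for illustration, slices are numbered from 0, and the domain extraction naively keeps the last two host labels (no public-suffix handling, so hosts under co.uk would all collapse together).

```python
import zlib
from typing import Dict, List, Optional
from urllib.parse import urlsplit


def naive_domain(url: str) -> str:
    # Hypothetical helper: keep only the last two labels of the host.
    # Assumption: no public-suffix handling ("news.bbc.co.uk" -> "co.uk").
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def slice_for(url: str, num_slices: int,
              pinned: Optional[Dict[str, int]] = None) -> int:
    # Return a slice index that depends only on the URL's domain,
    # so every page of a domain lands in the same slice.
    domain = naive_domain(url)
    if pinned and domain in pinned:
        return pinned[domain]  # hand-picked placement, e.g. apache.org -> 0
    # zlib.crc32 is stable across runs, unlike str hash(), which Python
    # randomizes per process.
    return zlib.crc32(domain.encode("utf-8")) % num_slices


def partition(urls: List[str], num_slices: int,
              pinned: Optional[Dict[str, int]] = None) -> List[List[str]]:
    # Group URLs into num_slices lists, keeping input order within each slice.
    slices: List[List[str]] = [[] for _ in range(num_slices)]
    for url in urls:
        slices[slice_for(url, num_slices, pinned)].append(url)
    return slices


if __name__ == "__main__":
    # Layout from the thread: slice1 (apache.org, linux.org),
    # slice2 (stackoverflow.com, wikipedia.org), as 0-based indices.
    pinned = {"apache.org": 0, "linux.org": 0,
              "stackoverflow.com": 1, "wikipedia.org": 1}
    urls = ["http://www.apache.org/a", "https://nutch.apache.org/b",
            "http://linux.org/", "https://en.wikipedia.org/wiki/X",
            "https://stackoverflow.com/q/1"]
    for i, urls_in_slice in enumerate(partition(urls, 2, pinned)):
        print(f"slice{i + 1}: {urls_in_slice}")
```

Pinned domains reproduce a hand-picked layout; any other domain falls back to a stable CRC32 hash, so repeated runs always place it in the same slice. Wiring this into an actual Nutch merge (for example, via the merge-and-filter route Markus mentions) is not shown here and would need to be worked out separately.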
