Hello Alexis, see inline.

Regards,
Markus 
 
-----Original message-----
> From:IZaBEE_Keeper <ale...@dvynedesign.com>
> Sent: Wednesday 20th March 2019 1:28
> To: user@nutch.apache.org
> Subject: RE: Limiting Results From Single Domain
> 
> Markus Jelsma-2 wrote
> > Hello Alexis,
> > 
> > This is definitely a question for Solr. Regardless of that, your choice is
> > between Solr's Result Grouping component and the FieldCollapsing filter query
> > parser.
> > 
> > Regards,
> > Markus
> 
> Thank you..  
> 
> I kinda figured that I'd need to figure out how to use the FieldCollapsing
> query parser & figure out how to make it work on a per hostname basis from
> the hostname field.. I'm not too sure on how to write the function for it
> but I should be able to figure it out..

fq={!collapse field=host}

Keep in mind that for this to work, documents with the same host must be
indexed into the same shard.
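
A minimal sketch of how that filter could be sent from a client, in Python. The
Solr URL, the core name "nutch", and the helper names are assumptions for
illustration only. The routed_id helper assumes a SolrCloud collection using
the compositeId router, where an id prefix of the form "host!" is one way to
keep a host's documents on one shard; check this against your own setup.

```python
from urllib.parse import urlencode


def build_collapse_url(select_url, user_query, rows=10):
    """Build a select URL that collapses results to one document per host."""
    params = [
        ("q", user_query),
        # The filter from this thread: keep one document per value of "host".
        ("fq", "{!collapse field=host}"),
        ("rows", str(rows)),
    ]
    return select_url + "?" + urlencode(params)


def routed_id(host, url):
    """Hypothetical helper: prefix the id with the host as a routing key.

    Assumes the compositeId router, which hashes the part before "!" to
    pick the shard, so ids sharing a host prefix land on the same shard.
    """
    return f"{host}!{url}"


if __name__ == "__main__":
    # Assumed local Solr instance and core name.
    print(build_collapse_url("http://localhost:8983/solr/nutch/select", "bees"))
    print(routed_id("example.com", "http://example.com/page"))
```

The URL encoding is left to urlencode so the braces, "!" and "=" in the local
params survive the trip to Solr.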
 
> I'm hopeful though that nutch might solve some of this for me as it indexes
> another billion pages.. It seems to be less frequent with more pages added
> to the index from multiple domains..

Out of the box, Nutch can't solve this for you unless you crawl or index
less, or get rid of a decent number of duplicates, which are usually present
when you crawl a few billion pages.

> 
> Thanks again..  :)
> 
> 
> 
> 
> -----
> Bee Keeper at IZaBEE.com
> --
> Sent from: http://lucene.472066.n3.nabble.com/Nutch-User-f603147.html
> 
