Hello Alexis, see inline.

Regards,
Markus

-----Original message-----
> From: IZaBEE_Keeper <ale...@dvynedesign.com>
> Sent: Wednesday 20th March 2019 1:28
> To: user@nutch.apache.org
> Subject: RE: Limiting Results From Single Domain
>
> Markus Jelsma-2 wrote
> > Hello Alexis,
> >
> > This is definitely a question for Solr. Regardless of that, your choice is
> > between Solr's Result Grouping component, or FieldCollapsing filter query
> > parser.
> >
> > Regards,
> > Markus
>
> Thank you..
>
> I kinda figured that I'd need to figure out how to use the FieldCollapsing
> query parser & figure out how to make it work on a per hostname basis from
> the hostname field.. I'm not too sure on how to write the function for it
> but I should be able to figure it out..

fq={!collapse field=host}

Keep in mind that for this to work, documents with the same host must be indexed into the same shard.

> I'm hopeful though that nutch might solve some of this for me as it indexes
> another billion pages.. It seems to be less frequent with more pages added
> to the index from multiple domains..

Nutch, out of the box, can't solve this for you unless you crawl or index less, or get rid of a decent amount of duplicates, which are usually present if you crawl a few billion pages.

> Thanks again.. :)
>
> -----
> Bee Keeper at IZaBEE.com
> --
> Sent from: http://lucene.472066.n3.nabble.com/Nutch-User-f603147.html
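
For reference, here is a minimal sketch of how a client could send the collapse filter along with a search. Only the fq value and the host field come from this thread. The Solr address, the collection name "nutch", and the helper name build_collapse_query are assumptions for illustration, so adjust them to your setup.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical Solr endpoint; the host, port and collection name are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/nutch/select"


def build_collapse_query(user_query, host_field="host", rows=10):
    """Return a URL-encoded query string that collapses results on host_field."""
    params = {
        "q": user_query,
        # The collapse filter from this thread: keep one document per host value.
        "fq": "{!collapse field=%s}" % host_field,
        "rows": rows,
    }
    # urlencode escapes the braces, '!' and '=' so the filter survives the URL.
    return urlencode(params)


query_string = build_collapse_query("apache nutch")
print(SOLR_SELECT_URL + "?" + query_string)

# Decode again to confirm the filter arrives at Solr unchanged.
print(parse_qs(query_string)["fq"][0])
```

This only builds the request. It does not take care of the shard requirement: on a sharded index you still have to make sure all documents of one host end up on the same shard, for example through a host-based routing key if your collection uses the compositeId router.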