Re: Weight servers differently

Johan Svensson Wed, 31 Aug 2011 06:58:47 -0700

Thank you, Markus,

At this point, I just want this to work. I have no idea whether I want to
omitNorms or not; right now I don't think I do, anyway. More importantly,
I want to boost pages whose site field is www.example.com over those from
blog.example.com, but without dropping the hits from blog.example.com.


The query boost seems to filter out hits from blog.example.com completely,
so that is not what I want.

Abusing the boost field might be a nice idea. Can you please show me an
example, bearing in mind that I don't really understand how all the XML
files and binaries fit together, or even which of Solr and Nutch is
responsible for which task... :)
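
For illustration only, here is a rough sketch of what an index-time
document boost looks like once it reaches Solr. It is not Nutch plugin
code (Nutch filters are written in Java), and the SITE_BOOSTS table, the
function names and the sample URLs are all made up for the example. It
assumes a Solr version whose XML update format still accepts a boost
attribute on <doc>, as described in the SolrRelevancyFAQ entry quoted
further down. Both documents stay in the index; only the weighting differs.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-site boosts. Anything not listed keeps the neutral 1.0,
# which is the default boost value mentioned in the thread.
SITE_BOOSTS = {"www.example.com": 10.0}
DEFAULT_BOOST = 1.0


def boost_for_site(site):
    """Return the index-time boost for a given value of the 'site' field."""
    return SITE_BOOSTS.get(site, DEFAULT_BOOST)


def solr_add_xml(docs):
    """Build a Solr XML <add> message with a per-document boost attribute."""
    add = ET.Element("add")
    for fields in docs:
        boost = str(boost_for_site(fields["site"]))
        doc = ET.SubElement(add, "doc", boost=boost)
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")


# Made-up sample documents, one per site.
docs = [
    {"id": "http://www.example.com/page42.html", "site": "www.example.com"},
    {"id": "http://blog.example.com/post1.html", "site": "blog.example.com"},
]
xml_message = solr_add_xml(docs)
print(xml_message)
```

Note that a document boost like this is folded into the field norms, so it
is lost on fields that have omitNorms enabled.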

2011/8/31 Markus Jelsma <[email protected]>

> Index-time boosting is not very common, and it raises issues if you want
> to omitNorms in Solr.
>
> In Solr DisMax you can use a bq (boost query) to boost
> site:example.com^10. All results that match the boost query receive a
> ^10 boost. This is done purely on the client side, at query time.
>
> You can also abuse the boost field Nutch is writing. By default this is
> 1.0f. You can write a simple scoring filter, or even an indexing filter,
> that checks the site field for your site and sets the boost field
> accordingly.
>
> On Wednesday 31 August 2011 15:30:08 Johan Svensson wrote:
> > I guess this is the solution. However, I have been trying to implement
> > this the whole afternoon with no success. I have a field "site" in my
> > schema.xml, stored and indexed. I'm using nutch solrindex to tell Solr
> > to index what Nutch has crawled. How can I tell Nutch to tell Solr to
> > boost all documents with the value "www.example.com" in the "site"
> > field? An example would be perfect for a loser like myself. I've
> > googled all the Internets over and over.
> >
> > 2011/8/31 Gora Mohanty <[email protected]>
> >
> > > On Wed, Aug 31, 2011 at 2:51 PM, Johan Svensson
> > > <[email protected]> wrote:
> > > > Thank you! This looks interesting. However, I wonder if it can
> > > > really solve this problem. No part of the search query is
> > > > necessarily part of the domain name. Let's say, for example, that
> > > > we search for "foobar". This word is found on
> > > > www.example.com/page42.html, as well as on lots of pages with
> > > > different names at blog.example.com/. Can you apply boosting magic
> > > > to the hit at www.example.com even though the search term is not
> > > > part of the URL?
> > >
> > > Presumably, you know the domain name from which the
> > > document originates at indexing time. If so, you can use
> > > index-time boosting:
> > > http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts
> > > E.g., this can be used to boost all documents from www.example.com
> > > over those from blog.example.com.
> > >
> > > Regards,
> > > Gora
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>
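
As a minimal sketch of the bq suggestion quoted above, the request
parameters could be assembled like this. The Solr URL, the qf field list
and the function name are assumptions for the example, not taken from the
thread. The boost query uses the full value www.example.com, on the
assumption that the site field holds the exact host name. bq only adds an
optional scoring clause, so hits from blog.example.com are still returned,
just ranked lower.

```python
from urllib.parse import urlencode

# Assumed Solr endpoint; adjust to the actual installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_dismax_params(user_query, preferred_site, boost=10):
    """Return request parameters for a DisMax query with a boost query."""
    return {
        "defType": "dismax",
        "q": user_query,
        # Hypothetical field list; use the fields defined in your schema.
        "qf": "title content",
        # Optional clause: documents matching it score higher, but
        # documents that do not match are not filtered out.
        "bq": f"site:{preferred_site}^{boost}",
    }


params = build_dismax_params("foobar", "www.example.com")
query_string = urlencode(params)
print(SOLR_SELECT_URL + "?" + query_string)
```

If hits from other sites disappear entirely, the site restriction is
probably being applied in q or fq rather than in bq.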
