Re: Relevancy and random sorting

Alexandre Rocco Thu, 12 Jan 2012 05:39:44 -0800

Erick,

This document already has a field that indicates the source (site).
The issue we are trying to solve arises when we list all documents without any
specific criteria. Since we bring back the most recent ones and the ones that
contain images, we end up with a lot of listings from a single site,
because the documents are indexed in batches from the same site. At some
point we have several documents from the same site with the same date/time
and with images. I'm trying to add some randomness to this search so
that other documents can also appear in between that big block from the same
source.
Does the grouping help to achieve this?
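
For reference, a minimal sketch of what grouping by source could look like.
It assumes the source field is named "site" and that Solr listens on
localhost:8983; neither is stated in this thread, and both function names
are made up for illustration. The first helper only builds a request URL
using Solr's result grouping parameters (group, group.field, group.limit).
The second is a client-side fallback that round-robins already fetched
documents across sites. It is a swapped-in technique, not something Solr
does by itself.

```python
from itertools import zip_longest
from urllib.parse import urlencode


def grouped_query_url(base_url, query, group_field="site", per_group=2, rows=15):
    """Build a Solr select URL that groups results on one field.

    The field name "site" and the defaults are assumptions for illustration.
    """
    params = [
        ("q", query),
        ("rows", rows),              # with grouping, rows limits the number of groups
        ("group", "true"),           # turn result grouping on
        ("group.field", group_field),
        ("group.limit", per_group),  # maximum documents returned per group
        ("fl", "*,score"),
    ]
    return base_url + "?" + urlencode(params)


def interleave_by_site(docs, site_field="site"):
    """Round-robin documents across sites, keeping each site's own order."""
    buckets = {}
    for doc in docs:
        # dicts keep insertion order, so sites appear in first-seen order
        buckets.setdefault(doc[site_field], []).append(doc)
    mixed = []
    # take one document per site per pass; shorter buckets are padded with None
    for layer in zip_longest(*buckets.values()):
        mixed.extend(doc for doc in layer if doc is not None)
    return mixed
```

With five ranked documents where the first three come from one site, the
interleaving moves the top documents of the other sites up while each site
keeps its internal order. The trade-off is that strict score order is
given up in exchange for mixed sources on the page.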


Alexandre

On Thu, Jan 12, 2012 at 12:31 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> Alexandre:
>
> Have you thought about grouping? If you can analyze the incoming
> documents and include a field such that "similar" documents map
> to the same value, then group on that value, you'll get output that
> isn't dominated by repeated copies of the "similar" documents. It
> depends, though, on being able to do a suitable mapping.
>
> In your case, could the mapping just be the site from which you
> got the data?
>
> Best
> Erick
>
> On Wed, Jan 11, 2012 at 1:58 PM, Alexandre Rocco <alel...@gmail.com>
> wrote:
> > Erick,
> >
> > I probably wrote something silly. You are right: it is either sorting
> > by field or ranking.
> > I just need to change the ranking to shift things around, as you said.
> >
> > To clarify the use case:
> > We have a listing aggregator that gets product listings from a lot of
> > different sites, and since they are added in batches, you sometimes see a
> > lot of pages from the same source (site). We are working on some changes to
> > shift things around and reduce this "blocking" effect, so we can present
> > mixed sources on the result pages.
> >
> > I guess I will start with the document random field and later try to
> > develop a custom plugin to make things better.
> >
> > Thanks for the pointers.
> >
> > Regards,
> > Alexandre
> >
> > On Wed, Jan 11, 2012 at 1:58 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> I really don't understand what this means:
> >> "random sorting for the records but also preserving the ranking"
> >>
> >> Either you're sorting on rank or you're not. If you mean you're
> >> trying to shift things around just a little bit, *mostly* respecting
> >> relevance then I guess you can do what you're thinking.
> >>
> >> You could create your own function query to do the boosting, see:
> >> http://wiki.apache.org/solr/SolrPlugins#ValueSourceParser
> >>
> >> which would keep you from having to re-index your data to get
> >> a different "randomness".
> >>
> >> You could also consider external file fields, but I think your
> >> own function query would be cleaner. I don't think math.random
> >> is a supported function out of the box.
> >>
> >> Best
> >> Erick
> >>
> >>
> >> On Wed, Jan 11, 2012 at 8:29 AM, Alexandre Rocco <alel...@gmail.com>
> >> wrote:
> >> > Hello all,
> >> >
> >> > Recently I've been trying to tweak some aspects of relevancy in a
> >> > listing project.
> >> > I need to give a higher score to newer documents and also boost the
> >> > document based on a boolean field that indicates the listing has
> >> > pictures.
> >> > On top of that, in some situations we need random sorting of the
> >> > records while also preserving the ranking.
> >> >
> >> > I tried to combine some techniques described in the Solr Relevancy FAQ
> >> > wiki, but when I add the random sorting, the ranking gets messy (as
> >> > expected).
> >> >
> >> > This works well:
> >> >
> >> > http://localhost:18979/solr/select/?start=0&rows=15&q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%22haspicture%22&fl=*,score
> >> >
> >> > This does not work; it imposes a random order on what is already ranked:
> >> >
> >> > http://localhost:18979/solr/select/?start=0&rows=15&q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%22haspicture%22&fl=*,score&sort=random_1+desc
> >> >
> >> > The only way I see is to create another field in the schema containing a
> >> > random value and use it to boost the document the same way that was done
> >> > on the boolean field.
> >> > Has anyone tried something like this before and found a way to get it
> >> > working?
> >> >
> >> > Thanks,
> >> > Alexandre
> >>
>
