Sorry for the delayed response.

The limitation in our scenario is that we have 5 million indexed documents
from only about 1000 sites. If results are grouped by site, we will not be
able to show more than a couple of pages for a lot of search keywords.


Ex: A search for "Solr" has 1000 matches, but from only 20 sites.
Of these 20 sites:
10 sites are of sitetype A - boost 5
7 sites are of sitetype B - boost 2
3 sites are of sitetype C - boost 1

Limitation 1: If these are grouped by site, only 20 results would be
displayed, across 2 pages (10 per page).
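
To make the arithmetic concrete, here is a minimal sketch of the grouped
request and the resulting page count. The host, core name and helper names
are assumptions for illustration; the group.* parameters are the standard
Result Grouping parameters, with group.limit=1 returning one document per
site.

```python
import math
from urllib.parse import urlencode

# Hypothetical endpoint: host and core name are assumptions, not our real setup.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def grouped_query_url(keyword, page, per_page=10):
    """Build a select URL that collapses results to one document per site."""
    params = {
        "q": keyword,
        "group": "true",
        "group.field": "site",     # collapse on the site field
        "group.limit": 1,          # one document shown per site
        "group.ngroups": "true",   # ask Solr to report the number of groups
        "rows": per_page,          # with grouping, rows counts groups
        "start": page * per_page,
    }
    return SOLR_SELECT + "?" + urlencode(params)


def pages_when_grouped(matching_sites, per_page=10):
    """Number of result pages when each matching site yields a single entry."""
    return math.ceil(matching_sites / per_page)


print(grouped_query_url("Solr", page=0))
print(pages_when_grouped(20))
```

With 20 matching sites and 10 entries per page this gives 2 pages, no matter
how many of the 1000 documents matched.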

We still want to display all the results. For a better user experience we
would "ideally" like page 1 to have 10 results from 10 distinct sites of
sitetype A (which already has the highest boost) or, in a real-world
scenario, from 7-8 distinct sites. In our case we see about 7 matches on a
page from a single site.

Limitation 2: Inverse document frequency (IDF) would have helped here, but
in that case our preferential boost for sitetypes is ignored and some
results from sitetype C would come out on top due to the IDF boost.

What we want is some way to control the variety of sites displayed in
search results, with the preferential boost still in place.
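
As an illustration of the behaviour we are after (not something Solr does
for us, and not our current implementation), here is a minimal sketch of a
client-side re-ranking pass. It assumes the documents arrive already sorted
by boosted score and that enough of them have been fetched to fill several
pages; the function name, the "site" key and the max_per_site cap are
hypothetical.

```python
from collections import Counter


def diversify(docs, per_page=10, max_per_site=2):
    """Reorder score-sorted docs so that each page holds at most
    max_per_site docs from one site, whenever enough other sites remain.
    No document is dropped; capped docs are pushed to later pages."""
    remaining = list(docs)
    ordered = []
    while remaining:
        page, deferred, seen = [], [], Counter()
        for doc in remaining:
            if len(page) < per_page and seen[doc["site"]] < max_per_site:
                page.append(doc)
                seen[doc["site"]] += 1
            else:
                deferred.append(doc)
        if len(page) < per_page:
            # Too few distinct sites left: top up with deferred docs,
            # still in their original score order.
            fill = per_page - len(page)
            page.extend(deferred[:fill])
            deferred = deferred[fill:]
        ordered.extend(page)
        remaining = deferred
    return ordered


# Toy data: site "a" dominates the top of the score-sorted list.
docs = [{"id": i, "site": s, "score": 10 - i} for i, s in enumerate("aaaaabbc")]
for doc in diversify(docs, per_page=4, max_per_site=2):
    print(doc["id"], doc["site"], doc["score"])
```

Because the input order is preserved within each site, documents from the
higher-boosted sitetype still surface first; the cap only limits how many
of them a single site can place on one page.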

Thanks in advance




On Sun, Sep 8, 2013 at 6:36 AM, Furkan KAMACI <furkankam...@gmail.com> wrote:

> What do you mean by "these limitations"? Do you want to do multiple
> groupings at the same time?
>
>
> 2013/9/6 Sai Gadde <gadde....@gmail.com>
>
> > Thank you Jack for the suggestion.
> >
> > We can try grouping by site. But considering that the number of sites is
> > only about 1000 against an index size of 5 million, one can expect that
> > most of the hits would be hidden, and for certain specific keywords only
> > a handful of actual results could be displayed if results are grouped by
> > site.
> >
> > We already group on a signature field to identify duplicate content in
> > these 5 million+ docs. But here the number of duplicates is only about
> > 3-5% at most.
> >
> > Is there any workaround for these limitations with grouping?
> >
> > Thanks
> > Shyam
> >
> >
> >
> > On Thu, Sep 5, 2013 at 9:16 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> >
> > > The grouping (field collapsing) feature somewhat addresses this - group by
> > > a "site" field and then if more than one or a few top pages are from the
> > > same site they get grouped or collapsed so that you can see more sites in a
> > > few results.
> > >
> > > See:
> > > http://wiki.apache.org/solr/FieldCollapsing
> > > https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> > >
> > > -- Jack Krupansky
> > >
> > > -----Original Message----- From: Sai Gadde
> > > Sent: Thursday, September 05, 2013 2:27 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: Tweaking boosts for more search results variety
> > >
> > >
> > > Our index is aggregated content from various sites on the web. We want good
> > > user experience by showing multiple sites in the search results. In our
> > > setup we are seeing most of the results from same site on the top.
> > >
> > > Here is some information regarding queries and schema
> > >                site - String field. We have about 1000 sites in index
> > >                sitetype - String field.  we have 3 site types
> > > omitNorms="true" for both the fields
> > >
> > > Doc count varies largely based on site and sitetype by a factor of 10 -
> > > 1000 times
> > > Total index size is about 5 million docs.
> > > Solr Version: 4.0
> > >
> > > In our queries we have a fixed and preferential boost for certain sites.
> > > sitetype has different and fixed boosts for 3 possible values. We turned
> > > off Inverse Document Frequency (IDF) for these boosts to work properly.
> > > Other text fields are boosted based on search keywords only.
> > >
> > > With this setup we often see a bunch of hits from a single site followed by
> > > next etc.,
> > > Is there any solution to see results from variety of sites and still keep
> > > the preferential boosts in place?
> > >
> >
>
