Re: access matched token ids in the FacetComponent?

Dmitry Kan Mon, 21 Jan 2013 05:26:05 -0800

Mikhail,

the griddynamics blog link returns "Sorry, the page you were looking for
in this blog does not exist." Could you check whether it is still available?
It would be interesting to see the details!


Dmitry

On Mon, Jan 21, 2013 at 2:33 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Dmitry,
>
> First of all, FacetComponent is Solr's out-of-the-box functionality. It
> runs after the search is done and accesses the bitset of the found documents,
> i.e. there are no spans (matched term positions) there at all.
>
> StandardFacetsAccumulator sounds like the "brand new" Lucene faceting
> library; see http://shaierera.blogspot.com/. I don't think the spans are
> accessible there either, but I don't know for sure.
>
> Some time ago my team successfully prototyped a facet component backed by
> spans
> (blog.griddynamics.com/2011/10/solr-experience-search-parent-child.html), but
> I don't suggest you go this way.
> I can suggest you start from the following:
> - supply PostFilter/DelegatingCollector
> http://yonik.com/posts/advanced-filter-caching-in-solr/
> - the DelegatingCollector will accept the scorer instance
> - if this scorer is BooleanScorer2 (but not BooleanScorer!), you can access
> the SpanQueryScorer in one of the legs and try to access the matched spans
> - if you are in 3.x you'll have a problem with disjunction queries.
>
> it seems challenging, doesn't it?
>
> On 18.01.2013 17:40, "Dmitry Kan" <solrexp...@gmail.com> wrote:
>
> > Mikhail,
> >
> > Are you saying that it is not possible to access the matched term positions
> > in the FacetComponent? If that were possible (somewhere in the
> > StandardFacetsAccumulator class, where doc ids are available), then by
> > knowing the matched term positions I could do some simple school math to
> > calculate the sentence counts per doc id.
> >
> > Dmitry
> >
> > On Fri, Jan 18, 2013 at 2:45 PM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com> wrote:
> >
> > > Dmitry,
> > >
> > > It definitely seems like postprocessing the highlighter's output. Another
> > > approach is:
> > > - limit number of occurrences of a word in a sentence to 1
> > > - play with the facet-by-function patch
> > > https://issues.apache.org/jira/browse/SOLR-1581 together with the tf()
> > > function.
> > >
> > > It doesn't seem like much help.
> > >
> > > On Fri, Jan 18, 2013 at 12:42 PM, Dmitry Kan <solrexp...@gmail.com>
> > wrote:
> > >
> > > > that we actually require the count of the sentences inside
> > > > each document where the hits were found.
> > > >
> > >
> > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > Principal Engineer,
> > > Grid Dynamics
> > >
> > > <http://www.griddynamics.com>
> > >  <mkhlud...@griddynamics.com>
> > >
> >
>
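
For illustration only, the "simple school math" Dmitry mentions earlier in the thread could look roughly like the sketch below. It assumes the matched token positions have already been obtained somehow (for example from spans, which is the hard part discussed above) and that the token position at which each sentence starts is known for every document. Both inputs, and the function names, are hypothetical; nothing here is a Solr or Lucene API.

```python
from bisect import bisect_right


def count_matched_sentences(match_positions, sentence_starts):
    """Count distinct sentences that contain at least one matched token.

    match_positions: token positions of the hits in one document.
    sentence_starts: sorted token positions at which each sentence begins
                     (hypothetical input, assumed to be recorded at index time).
    """
    hit_sentences = set()
    for pos in match_positions:
        # Index of the last sentence whose start position is <= pos.
        sentence_idx = bisect_right(sentence_starts, pos) - 1
        if sentence_idx >= 0:
            hit_sentences.add(sentence_idx)
    return len(hit_sentences)


def sentence_counts_per_doc(matches_by_doc, starts_by_doc):
    """Map each doc id to the number of sentences that contain a hit."""
    return {
        doc_id: count_matched_sentences(positions, starts_by_doc[doc_id])
        for doc_id, positions in matches_by_doc.items()
    }
```

With sentence starts `[0, 5, 10]` and hits at positions `[1, 3, 12]`, the first two hits fall into the same sentence and the third into another, so the count for that document is 2.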
