Re: Using Sentence Information For Snippet Generation

Dmitry Kan Mon, 07 Apr 2014 03:14:26 -0700

Furkan,

I haven't worked with the boundary scanner before, but one thing I had to
tweak when using position increments was the highlighter component itself,
because it started to throw exceptions. The solution is described in this
thread (a conversation with myself :) ):


http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3CCAHUAEU_qjKcgzrxtM=x90_j8i5v0a5h0mtq4b0+0etxc7q0...@mail.gmail.com%3E

HTH,
Dmitry


On Sun, Apr 6, 2014 at 12:44 AM, Furkan KAMACI <furkankam...@gmail.com> wrote:

> Hi Dmitry;
>
> I think that kind of hack may reduce search speed. Shouldn't it be done
> with a boundary scanner instead? I think bs.type=LINE is what I am looking
> for. One more point: I want to do this for Turkish, so should I customize
> the scanner, or can I use the simple boundary scanner if I insert special
> characters to mark the boundaries?
>
> Thanks;
> Furkan KAMACI
>
>
>
> 2014-03-24 21:14 GMT+02:00 Dmitry Kan <solrexp...@gmail.com>:
>
> > Hi Furkan,
> >
> > I have done an implementation with a custom filler (special character)
> > sequence in between sentences. A better solution I landed on was
> > increasing the position of each sentence's first token by a large number,
> > like 10000 (perhaps a smaller number could be used too). Then a user
> > search can be conducted with a proximity query: "some tokens"~5000 (the
> > recently committed complexphrase parser supports rich phrase syntax, for
> > example). This of course assumes that a sentence fits within the
> > 5000-position window and that the total number of sentences in the field
> > times 10000 does not exceed Integer.MAX_VALUE. Then on the highlighter
> > side you'd get the hits within sentences naturally.
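
To make the position-gap idea concrete, here is a minimal sketch in plain
Python. It is not Lucene or Solr code: the function names, the whitespace
tokenizer, and the distance check are assumptions made for illustration, and
the check only approximates how a real sloppy phrase query scores matches.
The gap (10000) and the window (5000) are the numbers mentioned above.

```python
SENTENCE_GAP = 10_000        # position jump at each sentence start (from the thread)
JAVA_INT_MAX = 2**31 - 1     # Integer.MAX_VALUE


def assign_positions(sentences, gap=SENTENCE_GAP):
    """Return (token, position) pairs for a list of sentences.

    Tokens inside a sentence advance by 1; the first token of every
    sentence after the first one advances by `gap` instead.
    Tokenization is a naive lowercase whitespace split (an assumption).
    """
    positions = []
    pos = -1
    for i, sentence in enumerate(sentences):
        for j, token in enumerate(sentence.lower().split()):
            pos += gap if (j == 0 and i > 0) else 1
            positions.append((token, pos))
    return positions


def within_window(positions, term_a, term_b, window):
    """True if some occurrence of term_a and term_b lies within `window` positions.

    This is a rough stand-in for a proximity query, not Lucene's slop semantics.
    """
    pos_a = [p for t, p in positions if t == term_a]
    pos_b = [p for t, p in positions if t == term_b]
    return any(abs(a - b) <= window for a in pos_a for b in pos_b)


def max_sentences(gap=SENTENCE_GAP):
    """Upper bound on sentences per field before positions overflow a Java int."""
    return JAVA_INT_MAX // gap


if __name__ == "__main__":
    doc = ["the cat sat", "a dog barked"]
    positions = assign_positions(doc)
    print(positions)
    print(within_window(positions, "cat", "sat", 5000))   # same sentence
    print(within_window(positions, "sat", "dog", 5000))   # different sentences
    print(max_sentences())
```

Because the gap is larger than the window, two terms can only match when
they fall inside the same sentence, which is the property the highlighter
then benefits from.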
> >
> > Is this something you are looking for?
> >
> > Dmitry
> >
> >
> >
> > On Mon, Mar 24, 2014 at 5:43 PM, Furkan KAMACI <furkankam...@gmail.com> wrote:
> >
> > > Hi;
> > >
> > > When I generate a snippet with Solr, I do not want the beginning of any
> > > sentence to be cut off in the snippet, so I need sentence detection. I
> > > think I can do it before I send documents to Solr: I can insert special
> > > characters that mark the beginning or end of a sentence, and then use
> > > that information when generating the snippet. On the other hand, I
> > > should not show those special characters to the user.
> > >
> > > What do you think is the best way to do this, or do you have any other
> > > ideas for my purpose?
> > >
> > > PS: My sentences are not in English.
> > >
> > > Thanks;
> > > Furkan KAMACI
> > >
> >
> >
> >
> > --
> > Dmitry
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> >
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
