Using Sentence Information For Snippet Generation

Re: Using Sentence Information For Snippet Generation

2014-04-07, Dmitry Kan

Furkan,

I haven't worked with the boundary scanner before, but one thing I had to
tweak when using position increments was the highlighter component itself,
because it started to throw exceptions. The solution is described in this
thread (a conversation with myself :) ):

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3CCAHUAEU_qjKcgzrxtM=x90_j8i5v0a5h0mtq4b0+0etxc7q0...@mail.gmail.com%3E

HTH,
Dmitry

--
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan

Re: Using Sentence Information For Snippet Generation

2014-04-05, Furkan KAMACI

Hi Dmitry,

I think this kind of hack may reduce search speed. Shouldn't this be done
with a boundary scanner instead? Is bs.type=LINE what I am looking for?
There is one more point: I want to do this for Turkish, so I think I should
either customize the boundary scanner or, if I put special characters to
mark the boundaries, use the simple boundary scanner. Is that right?

Thanks,
Furkan KAMACI
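For reference: Solr's BreakIteratorBoundaryScanner (used by the FastVectorHighlighter) delegates to java.text.BreakIterator. As far as I know, hl.bs.type picks the iterator (CHARACTER, WORD, SENTENCE or LINE) and hl.bs.language / hl.bs.country pick the locale. A LINE iterator reports line-wrap opportunities (roughly, after each space), not sentence ends, so SENTENCE is probably the closer fit for this question. The JDK-only sketch below shows the difference on a Turkish sample; the class name and sample text are made up for illustration, and the code does not call Solr itself.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BoundaryDemo {

    // Collect the segments between consecutive boundaries reported by the iterator.
    static List<String> segments(BreakIterator it, String text) {
        List<String> out = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // Comparable in spirit to hl.bs.language=tr and hl.bs.country=TR.
        Locale turkish = Locale.forLanguageTag("tr-TR");
        String text = "Bugün hava güzel. Yarın yağmur yağacak mı? Bilmiyorum.";

        // SENTENCE: one segment per sentence.
        System.out.println("SENTENCE: " + segments(BreakIterator.getSentenceInstance(turkish), text));
        // LINE: one segment per line-wrap opportunity, i.e. roughly one per word.
        System.out.println("LINE:     " + segments(BreakIterator.getLineInstance(turkish), text));
    }
}
```

On this sample the SENTENCE iterator yields three segments, while LINE splits at nearly every space, which is why LINE alone would not keep sentence beginnings intact.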

Using Sentence Information For Snippet Generation

2014-03-24, Furkan KAMACI

Hi,

When I generate snippets via Solr, I do not want to cut off the beginning
of any sentence in a snippet, so I need to do sentence detection. I think I
can do it before I send documents to Solr: I can insert special characters
that mark the beginning or end of a sentence and then use that information
when generating snippets. On the other hand, I should not show those
special characters to the user.

How do you think I could do this, or do you have any other ideas for my
purpose?

PS: I am not doing this for English sentences.

Thanks,
Furkan KAMACI
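Here is a minimal sketch of the pre-indexing idea above, using only the JDK. It detects sentences with BreakIterator for the tr-TR locale and inserts a marker character at the start of each sentence. At snippet time it starts from the last marker before a hit and strips every marker before display. The marker (\u001E, the ASCII record separator), the class and method names, and the fixed-length cut are all hypothetical choices for illustration. In a real Solr setup you would still need to check that your analyzer ignores the marker and that highlighter offsets account for it.

```java
import java.text.BreakIterator;
import java.util.Locale;

class SentenceMarkers {

    // Hypothetical marker: ASCII "record separator", unlikely to occur in normal text.
    static final char MARK = '\u001E';

    // Pre-indexing step: prefix every sentence found by the JDK BreakIterator with MARK.
    static String markSentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        StringBuilder sb = new StringBuilder();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            sb.append(MARK).append(text, start, end);
        }
        return sb.toString();
    }

    // Snippet step: start at the beginning of the sentence containing the hit,
    // cut after at most maxChars characters, and remove markers before display.
    static String snippet(String marked, int hitOffset, int maxChars) {
        int start = marked.lastIndexOf(MARK, hitOffset) + 1; // 0 if no marker precedes the hit
        int end = Math.min(marked.length(), start + maxChars);
        return marked.substring(start, end).replace(String.valueOf(MARK), "").trim();
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        String marked = markSentences("Bugün hava güzel. Yarın yağmur yağacak mı? Bilmiyorum.", turkish);
        int hit = marked.indexOf("yağmur");
        System.out.println(snippet(marked, hit, 40));
    }
}
```

The snippet for the hit "yağmur" starts at "Yarın" rather than mid-sentence, and the markers never reach the output.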

Re: Using Sentence Information For Snippet Generation

2014-03-24, Dmitry Kan

Hi Furkan,

I have done an implementation with a custom filler (special character)
sequence between sentences. A better solution I landed on was increasing
the position of each sentence's first token by a large number, like 10000
(perhaps a smaller number could be used too). Then a user search can be
conducted with a proximity query such as "some tokens"~5000 (the recently
committed complexphrase parser supports rich phrase syntax, for example).
This of course assumes that each sentence fits within the 5000-position
window and that the number of sentences in the field multiplied by 10k does
not exceed Integer.MAX_VALUE. Then on the highlighter side you'd get the
hits within sentences naturally.

Is this something you are looking for?

Dmitry
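The position-gap technique above can be modelled without Lucene, as in the simplified sketch below. assignPositions mimics what a custom token filter would do: it adds an extra increment of 10000 on each sentence's first token. near approximates a sloppy phrase match as "some occurrences are at most 5000 positions apart"; Lucene's actual slop semantics (for example, for reversed term order) differ slightly. maxSentences is the Integer.MAX_VALUE bound from the email. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SentenceGapModel {

    static final int SENTENCE_GAP = 10_000; // extra position increment at each sentence start
    static final int SLOP = 5_000;          // proximity window used at query time

    static final class Token {
        final String term;
        final int position;
        Token(String term, int position) { this.term = term; this.position = position; }
    }

    // Increment 1 inside a sentence; SENTENCE_GAP on the first token of every later sentence.
    static List<Token> assignPositions(List<List<String>> sentences) {
        List<Token> tokens = new ArrayList<>();
        int pos = -1; // first token lands on position 0, as with a normal increment of 1
        for (int s = 0; s < sentences.size(); s++) {
            List<String> words = sentences.get(s);
            for (int w = 0; w < words.size(); w++) {
                pos += (w == 0 && s > 0) ? SENTENCE_GAP : 1;
                tokens.add(new Token(words.get(w), pos));
            }
        }
        return tokens;
    }

    // Simplified, order-insensitive proximity test: some a and some b within SLOP positions.
    static boolean near(List<Token> tokens, String a, String b) {
        for (Token x : tokens)
            for (Token y : tokens)
                if (x.term.equals(a) && y.term.equals(b)
                        && Math.abs(x.position - y.position) <= SLOP)
                    return true;
        return false;
    }

    // Rough capacity bound: sentences * gap must stay below Integer.MAX_VALUE.
    static int maxSentences() {
        return Integer.MAX_VALUE / SENTENCE_GAP;
    }

    public static void main(String[] args) {
        List<List<String>> doc = List.of(
                List.of("bugün", "hava", "güzel"),
                List.of("yarın", "yağmur", "yağacak"));
        List<Token> tokens = assignPositions(doc);
        System.out.println(near(tokens, "hava", "güzel"));  // true: same sentence
        System.out.println(near(tokens, "güzel", "yarın")); // false: adjacent words, different sentences
        System.out.println(maxSentences());                 // 214748
    }
}
```

Because the gap (10000) is larger than the slop (5000), even adjacent words on either side of a sentence boundary fall outside the window, so matches, and therefore highlights, stay within one sentence.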
