Re: Using Sentence Information For Snippet Generation

2014-04-07 Dmitry Kan
Furkan,

I haven't worked with the boundary scanner before, but one thing I had to
tweak when using position increments was the highlighter component itself,
because it started to throw exceptions. The solution is described in this
thread (a conversation with myself :) )

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3CCAHUAEU_qjKcgzrxtM=x90_j8i5v0a5h0mtq4b0+0etxc7q0...@mail.gmail.com%3E

HTH,
Dmitry


On Sun, Apr 6, 2014 at 12:44 AM, Furkan KAMACI furkankam...@gmail.com wrote:

 Hi Dmitry;

 I think that this kind of hack may reduce search speed. Shouldn't it be
 done with the boundary scanner instead? Is bs.type=LINE what I am looking
 for? There is one more point: I want to do this for Turkish, so I think I
 should either customize the boundary scanner, or, if I put special
 characters to mark the boundaries, use the simple boundary scanner. Is that
 right?

 Thanks;
 Furkan KAMACI


-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


Re: Using Sentence Information For Snippet Generation

2014-04-05 Furkan KAMACI
Hi Dmitry;

I think that this kind of hack may reduce search speed. Shouldn't it be
done with the boundary scanner instead? Is bs.type=LINE what I am looking
for? There is one more point: I want to do this for Turkish, so I think I
should either customize the boundary scanner, or, if I put special
characters to mark the boundaries, use the simple boundary scanner. Is that
right?
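For reference, Solr's boundary scanners belong to the FastVectorHighlighter (hl.useFastVectorHighlighter=true, which needs term vectors with positions and offsets on the field). hl.boundaryScanner=breakIterator selects a scanner built on java.text.BreakIterator and configured through hl.bs.type (CHARACTER, WORD, SENTENCE or LINE), hl.bs.language and hl.bs.country, while the simple scanner stops at the characters listed in hl.bs.chars. The JDK-only sketch below is not Solr code; it only shows how java.text.BreakIterator segments a made-up Turkish sample with the SENTENCE and LINE types. LINE marks line-wrapping opportunities (roughly every word), so SENTENCE with hl.bs.language=tr looks closer to the goal here, but that is worth verifying against the Solr version in use.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BoundaryTypes {

    // Cuts text into the segments reported by the given BreakIterator.
    static List<String> segments(BreakIterator it, String text) {
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        int end = it.next();
        while (end != BreakIterator.DONE) {
            out.add(text.substring(start, end));
            start = end;
            end = it.next();
        }
        return out;
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Made-up sample: two short Turkish sentences.
        String text = "Bu bir deneme. Bu ikinci deneme.";

        // Roughly the units that hl.bs.type=SENTENCE and hl.bs.type=LINE are based on.
        System.out.println("SENTENCE: " + segments(BreakIterator.getSentenceInstance(turkish), text));
        System.out.println("LINE:     " + segments(BreakIterator.getLineInstance(turkish), text));
    }
}
```

The JDK sentence rules are generic (for example, they can split after abbreviations), so Turkish text with many abbreviations might still call for a customized scanner.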

Thanks;
Furkan KAMACI



2014-03-24 21:14 GMT+02:00 Dmitry Kan solrexp...@gmail.com:

 Hi Furkan,

 I have done an implementation with a custom filler (special character)
 sequence in between sentences. A better solution I landed on was increasing
 the position of each sentence's first token by a large number, like 10000
 (perhaps a smaller number could be used too). Then a user search can be
 conducted with a proximity query: "some tokens"~5000 (the recently
 committed complexphrase parser supports rich phrase syntax, for example).
 This of course assumes that a sentence fits within the 5000 window and that
 the total number of sentences in the field * 10k does not exceed
 Integer.MAX_VALUE. Then on the highlighter side you'd get the hits within
 sentences naturally.

 Is this something you are looking for?

 Dmitry




Using Sentence Information For Snippet Generation

2014-03-24 Furkan KAMACI
Hi;

When I generate snippets via Solr, I do not want to cut off the beginning of
any sentence in a snippet, so I need to do sentence detection. I think I can
do it before I send documents to Solr: I can insert some special characters
that mark the beginning or end of each sentence, and then use that
information when generating snippets. On the other hand, I should not show
those special characters to the user.

How do you think I can do it, or do you have any other ideas for my
purpose?

PS: I am not doing this for English sentences.

Thanks;
Furkan KAMACI
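A minimal sketch of the pre-processing step described in this message, assuming sentence detection with the JDK's java.text.BreakIterator (Turkish locale) and a hypothetical marker character U+E000 from the Unicode private-use area. markSentences and stripMarkers are illustrative names, not part of Solr; how the marker is tokenized, and whether it should also go into hl.bs.chars for the simple boundary scanner, depends on the analyzer chain and is not shown.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class SentenceMarkers {

    // Hypothetical marker: a private-use character unlikely to occur in real text.
    static final char MARKER = '\uE000';

    // Inserts MARKER in front of every sentence found by the JDK sentence BreakIterator.
    static String markSentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        StringBuilder sb = new StringBuilder();
        int start = it.first();
        int end = it.next();
        while (end != BreakIterator.DONE) {
            sb.append(MARKER).append(text, start, end);
            start = end;
            end = it.next();
        }
        return sb.toString();
    }

    // Removes the markers again, e.g. from a highlighted snippet before it is shown to the user.
    static String stripMarkers(String snippet) {
        return snippet.replace(String.valueOf(MARKER), "");
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Made-up sample text.
        String text = "Bu bir deneme. Bu ikinci deneme.";
        String marked = markSentences(text, turkish);
        long count = marked.chars().filter(c -> c == MARKER).count();
        System.out.println(count + " sentences marked");
        System.out.println(stripMarkers(marked));
    }
}
```

Stripping the marker at display time, rather than at index time, keeps the sentence information in the stored text that the highlighter works on.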


Re: Using Sentence Information For Snippet Generation

2014-03-24 Dmitry Kan
Hi Furkan,

I have done an implementation with a custom filler (special character)
sequence in between sentences. A better solution I landed on was increasing
the position of each sentence's first token by a large number, like 10000
(perhaps a smaller number could be used too). Then a user search can be
conducted with a proximity query: "some tokens"~5000 (the recently
committed complexphrase parser supports rich phrase syntax, for example).
This of course assumes that a sentence fits within the 5000 window and that
the total number of sentences in the field * 10k does not exceed
Integer.MAX_VALUE. Then on the highlighter side you'd get the hits within
sentences naturally.
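The position-gap idea can be illustrated with a small JDK-only simulation. In a real Lucene analyzer this would typically be a custom TokenFilter calling PositionIncrementAttribute.setPositionIncrement on each sentence's first token; here positions are simply computed into a list, tokens are split on whitespace, and the proximity check is simplified to "two terms at most SLOP positions apart", which approximates but is not identical to Lucene's sloppy phrase matching. GAP = 10000 and SLOP = 5000 follow the numbers in the message; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SentencePositionGap {

    // Extra position increment before each new sentence (value from the thread).
    static final int GAP = 10_000;
    // Slop of the proximity query, as in "some tokens"~5000.
    static final int SLOP = 5_000;

    // A token together with the position it would get in the index.
    record Token(String term, int position) {}

    // Assigns positions: +1 inside a sentence, +GAP for the first token of every later sentence.
    static List<Token> index(List<String> sentences) {
        List<Token> tokens = new ArrayList<>();
        int pos = -1;
        for (int s = 0; s < sentences.size(); s++) {
            boolean first = true;
            for (String term : sentences.get(s).split("\\s+")) {
                if (term.isEmpty()) continue;
                pos += (first && s > 0) ? GAP : 1;
                first = false;
                tokens.add(new Token(term, pos));
            }
        }
        return tokens;
    }

    // Simplified proximity check: some occurrence of a and of b at most slop positions apart.
    static boolean withinSlop(List<Token> tokens, String a, String b, int slop) {
        for (Token ta : tokens) {
            if (!ta.term().equals(a)) continue;
            for (Token tb : tokens) {
                if (tb != ta && tb.term().equals(b)
                        && Math.abs(ta.position() - tb.position()) <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Token> tokens = index(List.of("Ankara is the capital", "Istanbul is the largest city"));
        System.out.println(withinSlop(tokens, "Ankara", "capital", SLOP));   // true: same sentence
        System.out.println(withinSlop(tokens, "capital", "Istanbul", SLOP)); // false: crosses a sentence gap
        System.out.println(Integer.MAX_VALUE / GAP);                         // 214748
    }
}
```

With a gap of 10000, a field can hold at most Integer.MAX_VALUE / 10000 = 214748 sentences before positions overflow, which is the limit mentioned in the message; a smaller gap raises that limit but also lowers the usable slop.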

Is this something you are looking for?

Dmitry



On Mon, Mar 24, 2014 at 5:43 PM, Furkan KAMACI furkankam...@gmail.com wrote:

 Hi;

 When I generate snippets via Solr, I do not want to cut off the beginning of
 any sentence in a snippet, so I need to do sentence detection. I think I can
 do it before I send documents to Solr: I can insert some special characters
 that mark the beginning or end of each sentence, and then use that
 information when generating snippets. On the other hand, I should not show
 those special characters to the user.

 How do you think I can do it, or do you have any other ideas for my
 purpose?

 PS: I am not doing this for English sentences.

 Thanks;
 Furkan KAMACI



