Furkan,

I haven't worked with the boundary scanner before, but one thing I had to tweak when using position increments was the highlighter component itself, because it started to throw exceptions. The solution is described in this thread (a conversation with myself :) ):
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3CCAHUAEU_qjKcgzrxtM=x90_j8i5v0a5h0mtq4b0+0etxc7q0...@mail.gmail.com%3E

HTH,
Dmitry

On Sun, Apr 6, 2014 at 12:44 AM, Furkan KAMACI <furkankam...@gmail.com> wrote:

> Hi Dmitry;
>
> I think that such kind of hacking may reduce the search speed. I think that
> it should be done with boundary scanner isn't it? I think that bs.type=LINE
> is what I am looking for? There is one more point. I want to do that for
> Turkish language and I think that I should customize it or if I put special
> characters to point boundaries I can use simple boundary scanner?
>
> Thanks;
> Furkan KAMACI
>
> 2014-03-24 21:14 GMT+02:00 Dmitry Kan <solrexp...@gmail.com>:
>
> > Hi Furkan,
> >
> > I have done an implementation with a custom filler (special character)
> > sequence in between sentences. A better solution I landed at was
> > increasing the position of each sentence's first token by a large
> > number, like 10000 (perhaps, a smaller number could be used too). Then a
> > user search can be conducted with a proximity query: "some tokens" ~5000
> > (the recently committed complexphrase parser supports rich phrase
> > syntax, for example). This of course expects that a sentence fits the
> > 5000 window size and the total number of sentences in the field * 10k
> > does not exceed Integer.MAX_VALUE. Then on the highlighter side you'd
> > get the hits within sentences naturally.
> >
> > Is this something you are looking for?
> >
> > Dmitry
> >
> > On Mon, Mar 24, 2014 at 5:43 PM, Furkan KAMACI <furkankam...@gmail.com>
> > wrote:
> >
> > > Hi;
> > >
> > > When I generate snippet via Solr I do not want to remove beginning of
> > > any sentence at the snippet. So I need to do a sentence detection. I
> > > think that I can do it before I send documents into Solr. I can put
> > > some special characters that signs beginning or end of a sentence.
> > > Then I can use that information when generating snippet. On the other
> > > hand I should not show that special character to the user.
> > >
> > > What do you think that how can I do it or do you have any other ideas
> > > for my purpose?
> > >
> > > PS: I do not do it for English sentences.
> > >
> > > Thanks;
> > > Furkan KAMACI
> >
> > --
> > Dmitry
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan

--
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
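
P.S. To illustrate the position-gap idea from the quoted message, here is a minimal sketch in plain Python. It is not Lucene or Solr code: the names assign_positions and within_slop, the whitespace tokenizer, the choice to leave the very first sentence unshifted, and the simplified distance check are all assumptions made up for illustration. Only the 10000 gap, the 5000 window, and the Integer.MAX_VALUE limit come from the thread.

```python
SENTENCE_GAP = 10000   # jump applied at each sentence's first token (from the thread)
SLOP = 5000            # proximity window, as in "some tokens" ~5000
INT_MAX = 2**31 - 1    # value of Java's Integer.MAX_VALUE


def assign_positions(sentences, gap=SENTENCE_GAP):
    """Return (token, position) pairs.

    Tokens inside a sentence get consecutive positions. The first token of
    every sentence after the first one jumps ahead by `gap` (assumption: the
    first sentence is left unshifted).
    """
    positions = []
    pos = -1
    for i, sentence in enumerate(sentences):
        for j, token in enumerate(sentence.lower().split()):
            pos += gap if (j == 0 and i > 0) else 1
            positions.append((token, pos))
    return positions


def within_slop(positions, term_a, term_b, slop=SLOP):
    """Simplified proximity check: some pair of occurrences lies within `slop`.

    A real sloppy phrase query is more involved; this only models the idea
    that terms from different sentences end up too far apart to match.
    """
    pos_a = [p for t, p in positions if t == term_a]
    pos_b = [p for t, p in positions if t == term_b]
    return any(abs(a - b) <= slop for a in pos_a for b in pos_b)


if __name__ == "__main__":
    doc = [
        "Solr highlights matching snippets",
        "Sentence detection runs before indexing",
    ]
    positions = assign_positions(doc)
    print(positions)
    print(within_slop(positions, "solr", "snippets"))      # same sentence
    print(within_slop(positions, "snippets", "sentence"))  # across the gap
    # Rough upper bound on sentences per field before positions overflow an int
    print(INT_MAX // SENTENCE_GAP)
```

With a gap twice the size of the window, two terms can only fall within the window if they share a sentence, provided no sentence is longer than the window. That is the property the highlighter then benefits from.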