Re: Using Sentence Information For Snippet Generation
Furkan,

I haven't worked with the boundary scanner before, but one thing I had to tweak when using position increments was the highlighter component itself, because it started to throw exceptions. The solution is described in this thread (a conversation with myself :) ):
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3CCAHUAEU_qjKcgzrxtM=x90_j8i5v0a5h0mtq4b0+0etxc7q0...@mail.gmail.com%3E

HTH,
Dmitry

On Sun, Apr 6, 2014 at 12:44 AM, Furkan KAMACI furkankam...@gmail.com wrote:
> [...]

-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
Re: Using Sentence Information For Snippet Generation
Hi Dmitry;

I think that this kind of hack may reduce search speed. Shouldn't it be done with a boundary scanner instead? Is bs.type=LINE what I am looking for?

There is one more point: I want to do this for Turkish. Should I customize the boundary scanner, or, if I insert special characters to mark the boundaries, can I use the simple boundary scanner?

Thanks;
Furkan KAMACI

2014-03-24 21:14 GMT+02:00 Dmitry Kan solrexp...@gmail.com:
> [...]
Using Sentence Information For Snippet Generation
Hi;

When I generate a snippet via Solr, I do not want the beginning of any sentence to be cut off in the snippet, so I need to do sentence detection. I think I can do it before I send documents to Solr: I can insert special characters that mark the beginning or end of each sentence and then use that information when generating the snippet. On the other hand, I should not show those special characters to the user.

How can I do this, or do you have any other ideas for my purpose?

PS: I am not doing this for English sentences.

Thanks;
Furkan KAMACI
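For the "sentence detection before sending documents to Solr" step mentioned above, here is a minimal sketch using the JDK's java.text.BreakIterator with a Turkish locale. It is only an illustration, not something proposed in the thread: the class name SentenceSplitDemo, the method splitSentences and the sample text are made up for this example, and how well the JDK's sentence rules handle real Turkish text (abbreviations, numbers and so on) is an assumption that would need checking on actual data.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: splits text into sentences before indexing.
class SentenceSplitDemo {

    // Returns the sentences found by the JDK sentence BreakIterator,
    // with surrounding whitespace trimmed and empty pieces dropped.
    static List<String> splitSentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Sample text chosen to be plain ASCII; real documents would not be.
        String text = "Kitap masada. Araba yolda. Ev burada.";
        for (String sentence : splitSentences(text, turkish)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

The resulting sentence list could then be used either to insert boundary markers or to compute per-sentence token positions at index time; which of the two is preferable is exactly what the replies discuss.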
Re: Using Sentence Information For Snippet Generation
Hi Furkan,

I have done an implementation with a custom filler (special character) sequence in between sentences. A better solution I arrived at was increasing the position of each sentence's first token by a large number, like 10000 (perhaps a smaller number could be used too). A user search can then be conducted with a proximity query: "some tokens"~5000 (the recently committed complexphrase parser supports rich phrase syntax, for example). This of course assumes that a sentence fits within the 5000 window and that the total number of sentences in the field * 10k does not exceed Integer.MAX_VALUE. On the highlighter side you would then get the hits within sentences naturally.

Is this something you are looking for?

Dmitry

On Mon, Mar 24, 2014 at 5:43 PM, Furkan KAMACI furkankam...@gmail.com wrote:
> [...]

-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
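To make the position-gap arithmetic in the message above concrete, here is a small plain-Java simulation. It does not use any Lucene or Solr classes; the names SentenceGapDemo, assignPositions, withinWindow and maxSentences are hypothetical, the gap of 10000 and the window of 5000 are taken from the message, and the distance check is a deliberate simplification of how a sloppy phrase query is actually matched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of the sentence position-gap idea (no Lucene involved).
class SentenceGapDemo {

    // Assumed gap added in front of every sentence after the first one.
    static final int SENTENCE_GAP = 10_000;

    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Each token advances the position by 1; the first token of every
    // sentence after the first advances it by an extra SENTENCE_GAP.
    static List<Token> assignPositions(List<List<String>> sentences) {
        List<Token> tokens = new ArrayList<>();
        int position = -1;
        for (int s = 0; s < sentences.size(); s++) {
            List<String> sentence = sentences.get(s);
            for (int t = 0; t < sentence.size(); t++) {
                int increment = 1;
                if (t == 0 && s > 0) {
                    increment += SENTENCE_GAP;
                }
                position += increment;
                tokens.add(new Token(sentence.get(t), position));
            }
        }
        return tokens;
    }

    // Simplified proximity test: true if two positions are at most
    // `window` apart. Real sloppy phrase matching is more involved.
    static boolean withinWindow(int positionA, int positionB, int window) {
        return Math.abs(positionA - positionB) <= window;
    }

    // Upper bound on sentences per field before positions overflow an int.
    static int maxSentences(int gap) {
        return Integer.MAX_VALUE / gap;
    }

    public static void main(String[] args) {
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("solr", "builds", "snippets"),
                Arrays.asList("the", "highlighter", "marks", "hits"));
        for (Token token : assignPositions(sentences)) {
            System.out.println(token.term + " @ " + token.position);
        }
        System.out.println("max sentences: " + maxSentences(SENTENCE_GAP));
    }
}
```

With a window of 5000 and a gap of 10000, two tokens from the same sentence pass the check as long as the sentence is shorter than the window, while tokens from neighbouring sentences are always at least 10001 positions apart and therefore fail it. In an actual analysis chain the same effect would presumably be achieved by setting the position increment on the first token of each sentence.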