Re: phrase matches returning near matches
Yep, that seems to be the answer. The highlighting is done separately by the Rails app, so I'll look into proper Solr highlighting.

Thanks a lot for the use of your ears, much improved understanding!

cheers,

Alistair

--
mov eax,1
mov ebx,0
int 80h

On 16/06/2015 16:33, Erick Erickson erickerick...@gmail.com wrote:

> Hmmm. First, highlighting should work here, if you have it configured to work on the dc.description field.
>
> As to whether the phrase "management changes" is near enough, I pretty much guarantee it is. This is where the admin/analysis page can answer this type of question authoritatively, since it's based exactly on your particular analysis chain.
>
> Best,
> Erick
Re: phrase matches returning near matches
I agree with Alessandro: the behavior you're describing is _not_ correct at all given your description. So either

1. There's something "interesting" about your configuration that doesn't seem important and that you haven't told us, although what it could be is a mystery to me too ;)

2. It's matching on something else. Note that the phrase has been stemmed, so something in there besides "management" might stem to "manag" and/or something other than "changes" might stem to "chang", and the two of _them_ happen to be next to each other. "are managers changing?", for instance. Or even something less likely. Perhaps turn on highlighting and see if it pops out?

3. You've uncovered a bug. Although I suspect others would have reported it and the unit tests would have barfed all over the place.

One other thing you can do: go to the admin/analysis page and turn on the "verbose" check box. Put "management is undergoing many changes" in both the query and index boxes. The result (it's kind of hard to read, I'll admit) will include the position of each token after all the analysis is done. Phrase queries (without slop) should only match adjacent positions, so the question is whether the position info looks correct.

Best,
Erick

On Tue, Jun 16, 2015 at 4:40 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote:

> According to your debug you are using a default Lucene Query Parser. This surprises me, as I would expect a match with distance 0 between the 2 terms for that query. Are you sure nothing else in that field matches the phrase query?
>
> From the documentation:
>
> "Lucene supports finding words are a within a specific distance away. To do a proximity search use the tilde, "~", symbol at the end of a Phrase. For example to search for a "apache" and "jakarta" within 10 words of each other in a document use the search: "jakarta apache"~10"
>
> Cheers
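The adjacency rule described in this message (a phrase query without slop only matches tokens at adjacent positions) can be sketched in a few lines of Python. This is a simplified illustration, not Lucene's implementation: the function name `phrase_match` is hypothetical, the token lists are hand-written stand-ins for what an analysis chain might emit, and slop is modelled only as "extra positions allowed between in-order terms", whereas Lucene's real slop also permits some reordering.

```python
from typing import Dict, List, Sequence


def phrase_match(doc_tokens: Sequence[str], phrase: Sequence[str], slop: int = 0) -> bool:
    """Return True if the phrase terms occur in order with at most
    `slop` extra positions (in total) between them."""
    if not phrase:
        return False

    # Record every position at which each analyzed token occurs.
    positions: Dict[str, List[int]] = {}
    for pos, token in enumerate(doc_tokens):
        positions.setdefault(token, []).append(pos)

    def extend(term_index: int, prev_pos: int, budget: int) -> bool:
        # All phrase terms placed: the phrase matches.
        if term_index == len(phrase):
            return True
        for pos in positions.get(phrase[term_index], []):
            gap = pos - prev_pos - 1  # 0 means directly adjacent
            if 0 <= gap <= budget and extend(term_index + 1, pos, budget - gap):
                return True
        return False

    return any(extend(1, start, slop) for start in positions.get(phrase[0], []))


# Hand-written tokens standing in for analyzed (stemmed) text.
adjacent = ["land", "manag", "chang"]
spread = ["manag", "is", "undergo", "mani", "chang"]
query = ["manag", "chang"]

print(phrase_match(adjacent, query))        # True: positions 1 and 2 are adjacent
print(phrase_match(spread, query))          # False: three tokens sit in between
print(phrase_match(spread, query, slop=3))  # True once the gap is allowed
```

Under these assumptions, a document is only returned for a slop-free phrase query if the stemmed terms really do sit next to each other somewhere in the field, which is what the position column on the analysis page lets you verify.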
Re: phrase matches returning near matches
Hmmm. First, highlighting should work here, if you have it configured to work on the dc.description field.

As to whether the phrase "management changes" is near enough, I pretty much guarantee it is. This is where the admin/analysis page can answer this type of question authoritatively, since it's based exactly on your particular analysis chain.

Best,
Erick

On Tue, Jun 16, 2015 at 8:25 AM, Alistair Young alistair.yo...@uhi.ac.uk wrote:

> yes prolly not a bug. The highlighting is on but nothing is highlighted. Perhaps this text is triggering it?
>
> 'consider the impacts of land management changes'
>
> that would seem reasonable. It's not a direct match so no highlighting (the highlighting does work on a direct match) but 'management changes' must be near enough 'manage change' to trigger a result.
>
> Alistair
Re: phrase matches returning near matches
yes prolly not a bug. The highlighting is on but nothing is highlighted. Perhaps this text is triggering it?

'consider the impacts of land management changes'

that would seem reasonable. It's not a direct match so no highlighting (the highlighting does work on a direct match) but 'management changes' must be near enough 'manage change' to trigger a result.

Alistair

--
mov eax,1
mov ebx,0
int 80h

On 16/06/2015 16:18, Erick Erickson erickerick...@gmail.com wrote:

> 2. It's matching on something else. Note that the phrase has been stemmed, so something in there besides "management" might stem to "manag" and/or something other than "changes" might stem to "chang", and the two of _them_ happen to be next to each other. "are managers changing?", for instance. Or even something less likely. Perhaps turn on highlighting and see if it pops out?
>
> 3. You've uncovered a bug. Although I suspect others would have reported it and the unit tests would have barfed all over the place.
Re: phrase matches returning near matches
This might be an issue with your stemmer: "management" being stemmed to "manage" and "changes" being stemmed to "change", so that the terms match. You can use the Solr admin UI to test your indexing and query analysis chains to see if this is happening.

On 6/16/2015 3:22 AM, Alistair Young wrote:

> Hiya,
>
> I've been looking for documentation that would point to where I could modify or explain why 'near neighbours' are returned from a phrase search. If I search for:
>
> "manage change"
>
> I get back a document that contains "this will help in your management of lots more words... changes". It's relevant but I'd like to understand why solr is returning it. Is it a combination of fuzzy/slop? The distance between the two variations of the two words in the document is quite large.
>
> thanks,
>
> Alistair
>
> --
> mov eax,1
> mov ebx,0
> int 80h
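To make the stemming point concrete, here is a minimal sketch of how different surface forms can collapse to the same indexed term. It is a deliberately crude suffix stripper written for this illustration, not the stemmer Solr actually uses, and the names `toy_stem`, `analyze` and `SUFFIXES` are hypothetical. The stems it produces for these four words ("manag", "chang") happen to agree with the ones shown in the debug output elsewhere in this thread.

```python
# Toy suffix stripper for illustration only; a real analysis chain would
# typically use a proper stemming filter with many more rules.
SUFFIXES = ("ement", "es", "e", "s")  # checked in this order, longest first


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list:
    """Split on whitespace and stem every token."""
    return [toy_stem(token) for token in text.split()]


print(analyze("manage change"))        # ['manag', 'chang']
print(analyze("management changes"))   # ['manag', 'chang']
```

Because the query and the document text pass through the same kind of analysis, the index never sees "management" or "changes" as such, only their stems, so a phrase query for the stems can match text that looks quite different from what was typed.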
Re: phrase matches returning near matches
Can you show us how the query is parsed? You didn't tell us anything about the query parser you are using. Enabling debugQuery=true will show you how the query is parsed, and this will be quite useful for us.

Cheers

2015-06-16 11:22 GMT+01:00 Alistair Young alistair.yo...@uhi.ac.uk:

> Hiya,
>
> I've been looking for documentation that would point to where I could modify or explain why 'near neighbours' are returned from a phrase search. If I search for:
>
> "manage change"
>
> I get back a document that contains "this will help in your management of lots more words... changes". It's relevant but I'd like to understand why solr is returning it. Is it a combination of fuzzy/slop? The distance between the two variations of the two words in the document is quite large.
>
> thanks,
>
> Alistair
>
> --
> mov eax,1
> mov ebx,0
> int 80h

--
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience - 1794 England
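For anyone following along, a request with debugging switched on might be assembled as below. This is only a sketch: the host, port and core name in `SELECT_URL` are assumptions made up for the example, and `build_debug_url` is a hypothetical helper. The `q` and `debugQuery=true` parameters are the ones discussed in this thread.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the host and core of your own installation.
SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_debug_url(select_url: str, field: str, phrase: str) -> str:
    """Build a select URL for a quoted phrase query with debug output enabled."""
    params = {
        "q": f'{field}:"{phrase}"',  # the double quotes make it a phrase query
        "debugQuery": "true",        # asks Solr to include the parsed query and explain
    }
    return f"{select_url}?{urlencode(params)}"


print(build_debug_url(SELECT_URL, "dc.description", "manage change"))
```

The response should then carry a debug section showing the raw query string, the parsed query and the per-document score explanation.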
Re: phrase matches returning near matches
It's a useful behaviour. I'd just like to understand where it's deciding the document is relevant.

debug output is:

    <lst name="debug">
      <str name="rawquerystring">dc.description:"manage change"</str>
      <str name="querystring">dc.description:"manage change"</str>
      <str name="parsedquery">PhraseQuery(dc.description:"manag chang")</str>
      <str name="parsedquery_toString">dc.description:"manag chang"</str>
      <lst name="explain">
        <str name="tst:test">
    1.2008798 = (MATCH) weight(dc.description:"manag chang" in 221) [DefaultSimilarity], result of:
      1.2008798 = fieldWeight in 221, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = phraseFreq=1.0
        9.6070385 = idf(), sum of:
          4.0365543 = idf(docFreq=101, maxDocs=2125)
          5.5704846 = idf(docFreq=21, maxDocs=2125)
        0.125 = fieldNorm(doc=221)
        </str>
      </lst>
      <str name="QParser">LuceneQParser</str>
      <lst name="timing">
        <double name="time">41.0</double>
        <lst name="prepare">
          <double name="time">3.0</double>
          <lst name="query"><double name="time">0.0</double></lst>
          <lst name="facet"><double name="time">0.0</double></lst>
          <lst name="mlt"><double name="time">0.0</double></lst>
          <lst name="highlight"><double name="time">0.0</double></lst>
          <lst name="stats"><double name="time">0.0</double></lst>
          <lst name="debug"><double name="time">0.0</double></lst>
        </lst>
        <lst name="process">
          <double name="time">35.0</double>
          <lst name="query"><double name="time">0.0</double></lst>
          <lst name="facet"><double name="time">0.0</double></lst>
          <lst name="mlt"><double name="time">0.0</double></lst>
          <lst name="highlight"><double name="time">0.0</double></lst>
          <lst name="stats"><double name="time">0.0</double></lst>
          <lst name="debug"><double name="time">35.0</double></lst>
        </lst>
      </lst>
    </lst>

thanks,

Alistair

--
mov eax,1
mov ebx,0
int 80h

On 16/06/2015 11:26, Alessandro Benedetti benedetti.ale...@gmail.com wrote:

> Can you show us how the query is parsed? You didn't tell us anything about the query parser you are using. Enabling debugQuery=true will show you how the query is parsed, and this will be quite useful for us.
>
> Cheers
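The numbers in the explain section of this debug output can be checked by hand. The sketch below assumes the classic TF-IDF formulas usually associated with DefaultSimilarity (tf as the square root of the frequency, idf as 1 + ln(maxDocs / (docFreq + 1))); the function names are hypothetical, and the fieldNorm value is simply copied from the output rather than derived.

```python
import math


def classic_tf(freq: float) -> float:
    """Assumed classic term-frequency factor: square root of the frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Assumed classic inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Values taken from the explain output in this message.
idf_sum = classic_idf(101, 2125) + classic_idf(21, 2125)
field_norm = 0.125
field_weight = classic_tf(1.0) * idf_sum * field_norm

print(f"idf sum      = {idf_sum:.5f}")
print(f"field weight = {field_weight:.5f}")
```

Under those assumptions the two printed values should land close to the 9.6070385 and 1.2008798 reported by Solr, with small differences expected because Solr computes in single precision. The part worth noticing is phraseFreq=1.0: the slop-free phrase really was found once in document 221, so the match comes from adjacent stemmed terms rather than from any fuzzy or proximity behaviour.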
Re: phrase matches returning near matches
According to your debug you are using a default Lucene Query Parser. This surprises me, as I would expect a match with distance 0 between the 2 terms for that query. Are you sure nothing else in that field matches the phrase query?

From the documentation:

"Lucene supports finding words are a within a specific distance away. To do a proximity search use the tilde, "~", symbol at the end of a Phrase. For example to search for a "apache" and "jakarta" within 10 words of each other in a document use the search:

"jakarta apache"~10"

Cheers

2015-06-16 11:33 GMT+01:00 Alistair Young alistair.yo...@uhi.ac.uk:

> It's a useful behaviour. I'd just like to understand where it's deciding the document is relevant.

--
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience - 1794 England