Re: bug in termfreq? was Re: is it possible to do a sort without query?
are you boosting your docs?

2011/8/8 Jason Toy jason...@gmail.com
I am trying to test out and compare different sorts and scoring. When I use dismax to search for indie music with: qf=all_lists_text&q=indie+music&defType=dismax&rows=100 I see some stuff that seems irrelevant, meaning in the top results I see only 1 or 2 mentions of indie music, but when I look further down the list I do see other docs that have more occurrences of indie music. So I want to test by comparing the different queries against a list of docs ranked specifically by the count of occurrences of the phrase indie music.

On Mon, Aug 8, 2011 at 2:19 PM, Markus Jelsma markus.jel...@openindex.io wrote:
Dismax queries can. But sort=termfreq(all_lists_text,'indie+music') is not using dismax. Apparently the termfreq function cannot? I am not familiar with the termfreq function.
It simply returns the TF of the given _term_ as it is indexed in the current document. Sorting on TF like this seems strange, as by default queries are already sorted that way, since TF plays a big role in the final score.
To understand why you'd need to reindex, you might want to read up on how Lucene actually works, to get a basic understanding of how different indexing choices affect what is possible at query time. Lucene in Action is a pretty good book.

On 8/8/2011 5:02 PM, Jason Toy wrote:
Aren't dismax queries able to search for phrases using the default index (which is what I am using)? If I can already do phrase searches, I don't understand why I would need to reindex to be able to access phrases from a function.

On Mon, Aug 8, 2011 at 1:49 PM, Markus Jelsma markus.jel...@openindex.io wrote:
Alexei, thank you, that does seem to work. My sort results seem to be totally wrong though; I'm not sure if it's because of my sort function or something else. My query consists of: sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100 And I get back 4571232 hits.
That's normal, you issue a catch-all query. Sorting should work, but...
None of the results have the phrase indie music anywhere in their data. Does termfreq not support phrases?
No, it is TERM frequency, and indie music is not one term. I don't know how this function parses your input, but it might not understand your + escape and think it's one term consisting of exactly that.
If not, how can I sort specifically by termfreq of a phrase?
You cannot. What you can do is index multiple terms as one term using the shingle filter. Take care: it can significantly increase your index size and number of unique terms.

On Mon, Aug 8, 2011 at 1:08 PM, Alexei Martchenko ale...@superdownloads.com.br wrote:
You can use the standard query parser and pass q=*:*

2011/8/8 Jason Toy jason...@gmail.com
I am trying to list some data based on a function I run, specifically termfreq(post_text,'indie music'), and I am unable to do it without passing in data to the q parameter. Is it possible to get a sorted list without searching for any terms?

-- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
-- - sent from my mobile 6176064373
bug in termfreq? was Re: is it possible to do a sort without query?
Alexei, thank you, that does seem to work. My sort results seem to be totally wrong though; I'm not sure if it's because of my sort function or something else. My query consists of: sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100 And I get back 4571232 hits. None of the results have the phrase indie music anywhere in their data. Does termfreq not support phrases? If not, how can I sort specifically by termfreq of a phrase?
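For reference, the request above with its `&` separators restored. This is a minimal Python sketch, assuming a hypothetical Solr endpoint at `http://localhost:8983/solr/select`; it sorts on the single term `indie`, since termfreq looks up one indexed term rather than a phrase.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical endpoint; adjust host, port and core to your own setup.
SOLR_SELECT = "http://localhost:8983/solr/select"

def build_match_all_sorted_by(function_sort: str, rows: int = 100) -> str:
    """Build a match-all request (q=*:*) whose results are ordered by a function."""
    params = {
        "q": "*:*",             # catch-all query: every document matches
        "sort": function_sort,  # e.g. "termfreq(all_lists_text,'indie') desc"
        "rows": rows,
    }
    # urlencode inserts the '&' separators and percent-encodes each value;
    # the space before "desc" becomes '+', which Solr decodes back to a space.
    return SOLR_SELECT + "?" + urlencode(params)

url = build_match_all_sorted_by("termfreq(all_lists_text,'indie') desc")
print(url)

# Round trip: decoding the query string gives back the original parameter values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["sort"])
```

Letting the library build the query string avoids exactly the kind of run-together parameters seen in the thread.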
Re: bug in termfreq? was Re: is it possible to do a sort without query?
On 8/8/2011 4:34 PM, Jason Toy wrote:
Alexei, thank you, that does seem to work. My sort results seem to be totally wrong though; I'm not sure if it's because of my sort function or something else. My query consists of: sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100 And I get back 4571232 hits.
That would be the total number of docs, I guess, since your query is *:*, i.e. find everything.
None of the results have the phrase indie music anywhere in their data.
You are only sorting on termfreq of indie music; you are not querying for documents that contain it.
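A toy Python model of the point above. It is not Solr code: the corpus, document ids and helper names (`termfreq`, `match_all_sorted`, `matching_sorted`) are made up for illustration. With a match-all query every document comes back and the sort only reorders them. Because `indie music` is not a single indexed term, its count is 0 in every document, so the sort has nothing to order by.

```python
import re
from collections import Counter
from typing import Dict, List

# Made-up toy corpus standing in for the indexed all_lists_text field.
DOCS: Dict[str, str] = {
    "d1": "jazz and blues records",
    "d2": "indie music blog, more indie music news",
    "d3": "classical music only",
    "d4": "indie music",
}

def tokenize(text: str) -> List[str]:
    """Rough stand-in for an analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

# Per-document counts of each indexed term.
TERM_COUNTS = {doc_id: Counter(tokenize(text)) for doc_id, text in DOCS.items()}

def termfreq(doc_id: str, term: str) -> int:
    """Count of one exact indexed term in one document; 0 if it is not a term there."""
    return TERM_COUNTS[doc_id][term]

def match_all_sorted(term: str) -> List[str]:
    """Like q=*:* plus a sort: every document comes back, only the order changes."""
    return sorted(DOCS, key=lambda d: termfreq(d, term), reverse=True)

def matching_sorted(term: str) -> List[str]:
    """Restrict to documents containing the term first, then sort by its count."""
    hits = [d for d in DOCS if termfreq(d, term) > 0]
    return sorted(hits, key=lambda d: termfreq(d, term), reverse=True)

print(match_all_sorted("indie music"))
print(matching_sorted("indie"))
```

Here `match_all_sorted("indie music")` returns all four ids in their original order, because Python's sort is stable and every key is 0. `matching_sorted("indie")` returns only `d2` and `d4`.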
Re: bug in termfreq? was Re: is it possible to do a sort without query?
Alexei, thank you, that does seem to work. My sort results seem to be totally wrong though; I'm not sure if it's because of my sort function or something else. My query consists of: sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100 And I get back 4571232 hits.
That's normal, you issue a catch-all query. Sorting should work, but...
None of the results have the phrase indie music anywhere in their data. Does termfreq not support phrases?
No, it is TERM frequency, and indie music is not one term. I don't know how this function parses your input, but it might not understand your + escape and think it's one term consisting of exactly that.
If not, how can I sort specifically by termfreq of a phrase?
You cannot. What you can do is index multiple terms as one term using the shingle filter. Take care: it can significantly increase your index size and number of unique terms.
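The shingle suggestion can be illustrated with a minimal Python sketch that only mimics the idea behind Lucene's ShingleFilter: adjacent word pairs are emitted as single tokens alongside the original words. It is not the filter itself, and the tokenizer and helper names are simplifications made up for illustration. In Solr the real filter is configured in a field type's analyzer (for example `solr.ShingleFilterFactory`, which has options such as `maxShingleSize` and `outputUnigrams`), and changing the analyzer means reindexing.

```python
import re
from collections import Counter
from typing import List

def tokenize(text: str) -> List[str]:
    """Rough stand-in for an analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def shingles(tokens: List[str], size: int = 2, output_unigrams: bool = True) -> List[str]:
    """Emit the original tokens plus every run of `size` adjacent tokens joined by a space."""
    out = list(tokens) if output_unigrams else []
    for i in range(len(tokens) - size + 1):
        out.append(" ".join(tokens[i:i + size]))
    return out

doc = "indie music blog, more indie music news"
plain = Counter(tokenize(doc))
shingled = Counter(shingles(tokenize(doc)))

print(plain["indie music"])       # the two-word string is not a term in the plain index
print(shingled["indie music"])    # it becomes one countable term once shingles are indexed
print(len(plain), len(shingled))  # unique terms before and after shingling
```

For this one short document the number of unique terms doubles, from 5 to 10. That is the index-growth cost Markus warns about.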
Re: bug in termfreq? was Re: is it possible to do a sort without query?
Aren't dismax queries able to search for phrases using the default index (which is what I am using)? If I can already do phrase searches, I don't understand why I would need to reindex to be able to access phrases from a function.
Re: bug in termfreq? was Re: is it possible to do a sort without query?
Dismax queries can. But sort=termfreq(all_lists_text,'indie+music') is not using dismax. Apparently the termfreq function cannot? I am not familiar with the termfreq function. To understand why you'd need to reindex, you might want to read up on how Lucene actually works, to get a basic understanding of how different indexing choices affect what is possible at query time. Lucene in Action is a pretty good book.
Re: bug in termfreq? was Re: is it possible to do a sort without query?
Aren't dismax queries able to search for phrases using the default index (which is what I am using)? If I can already do phrase searches, I don't understand why I would need to reindex to be able to access phrases from a function.
Executing a Lucene phrase query is not the same as term frequency (phrase != term). A phrase consists of multiple terms, and Lucene has an inverted term index, not an inverted phrase index (unless you index your data that way).
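A toy Python model of the distinction above. It loosely follows the spirit of Lucene's postings, which can record where each term occurs, but it is not Lucene's actual data structure or phrase-query implementation. The documents and helper names are made up. The index has keys only for single words, and a phrase is found at query time by checking that its words sit at consecutive positions.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Made-up toy documents.
DOCS: Dict[str, str] = {
    "d1": "indie music blog, more indie music news",
    "d2": "music for indie kids",
}

def tokenize(text: str) -> List[str]:
    """Rough stand-in for an analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def build_positional_index(docs: Dict[str, str]):
    """Toy inverted index: term -> doc id -> list of token positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index

def phrase_freq(index, phrase: str, doc_id: str) -> int:
    """Count phrase occurrences by checking that its terms sit at consecutive positions."""
    terms = tokenize(phrase)
    count = 0
    for start in index.get(terms[0], {}).get(doc_id, []):
        if all(start + offset in index.get(term, {}).get(doc_id, [])
               for offset, term in enumerate(terms)):
            count += 1
    return count

idx = build_positional_index(DOCS)
print("indie music" in idx)                   # only single words are keys
print(phrase_freq(idx, "indie music", "d1"))  # both occurrences found via positions
print(phrase_freq(idx, "indie music", "d2"))  # both words present, but not adjacent
```

This is why a phrase query works on the default index while a per-term lookup such as termfreq has no `indie music` entry to read.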
Re: bug in termfreq? was Re: is it possible to do a sort without query?
Dismax queries can. But sort=termfreq(all_lists_text,'indie+music') is not using dismax. Apparently the termfreq function cannot? I am not familiar with the termfreq function.
It simply returns the TF of the given _term_ as it is indexed in the current document. Sorting on TF like this seems strange, as by default queries are already sorted that way, since TF plays a big role in the final score.
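To make "TF plays a big role in the final score" concrete, here is a simplified Python sketch of the tf and idf formulas documented for Lucene 3.x DefaultSimilarity: tf = sqrt(freq) and idf = 1 + ln(numDocs / (docFreq + 1)), with the per-term contribution tf * idf². It deliberately leaves out queryNorm, boosts and length norms, and the document frequencies and counts are made up. For a single-term query over equally long documents, ordering by score then matches ordering by raw term frequency, although the score grows only with the square root of the count.

```python
import math

def tf(freq: int) -> float:
    """DefaultSimilarity-style tf: square root of the raw term frequency."""
    return math.sqrt(freq)

def idf(doc_freq: int, num_docs: int) -> float:
    """DefaultSimilarity-style idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def term_score(freq: int, doc_freq: int, num_docs: int) -> float:
    """Simplified single-term score tf * idf^2 (queryNorm, boosts and norms left out)."""
    return tf(freq) * idf(doc_freq, num_docs) ** 2

# Made-up raw counts of one term in three equally long documents.
freqs = {"d1": 1, "d2": 4, "d3": 9}
scores = {d: term_score(f, doc_freq=3, num_docs=1000) for d, f in freqs.items()}

by_score = sorted(scores, key=scores.get, reverse=True)
by_raw_tf = sorted(freqs, key=freqs.get, reverse=True)
print(by_score, by_raw_tf)
```

Nine mentions score only three times as high as one mention. The order is the same, but the gaps shrink.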
Re: bug in termfreq? was Re: is it possible to do a sort without query?
I am trying to test out and compare different sorts and scoring. When I use dismax to search for indie music with: qf=all_lists_text&q=indie+music&defType=dismax&rows=100 I see some stuff that seems irrelevant, meaning in the top results I see only 1 or 2 mentions of indie music, but when I look further down the list I do see other docs that have more occurrences of indie music. So I want to test by comparing the different queries against a list of docs ranked specifically by the count of occurrences of the phrase indie music.
Re: bug in termfreq? was Re: is it possible to do a sort without query?
If you want to understand and debug the scoring, you can use debugQuery=true to see how different documents score. Most of the time, docs with both terms are on top of the result set unless norms are interfering. To understand it, you should check the Solr relevancy wiki, but the Lucene docs are much better, although very low level.
http://wiki.apache.org/solr/SolrRelevancyCookbook
http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/search/Similarity.html
Your question is more a relevance question than about the termfreq function. To be short, don't use those kinds of functions if you don't yet understand similarity as described in the Lucene docs.
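Why norms can "interfere": below is a rough Python approximation of the shape of the Lucene 3.x DefaultSimilarity score. It uses coord, tf = sqrt(freq), idf² and lengthNorm = 1/sqrt(numTerms). It ignores queryNorm, boosts, the lossy one-byte norm encoding and the exact query structure dismax builds, and the idf values and field lengths are made up. A short field with one mention of each term can outscore a long field with three mentions of each. That matches the pattern described in the thread. In Lucene 3.x, index-time document and field boosts are also folded into the norm, which is one reason to check whether docs are boosted. `debugQuery=true` shows Solr's actual per-document explanation.

```python
import math
from typing import Dict, List

def tf(freq: int) -> float:
    """DefaultSimilarity-style tf: square root of the raw count."""
    return math.sqrt(freq)

def length_norm(num_terms: int) -> float:
    """DefaultSimilarity-style length norm: shorter fields get a larger factor."""
    return 1.0 / math.sqrt(num_terms)

def score(term_freqs: Dict[str, int], field_length: int,
          query: List[str], idf: Dict[str, float]) -> float:
    """Simplified score: coord * sum(tf * idf^2 * lengthNorm) over matched query terms."""
    matched = [t for t in query if term_freqs.get(t, 0) > 0]
    coord = len(matched) / len(query)  # fraction of query terms present in the doc
    norm = length_norm(field_length)
    return coord * sum(tf(term_freqs[t]) * idf[t] ** 2 * norm for t in matched)

query = ["indie", "music"]
idf = {"indie": 4.0, "music": 2.5}  # made-up idf values

short_doc = score({"indie": 1, "music": 1}, field_length=4, query=query, idf=idf)
long_doc = score({"indie": 3, "music": 3}, field_length=400, query=query, idf=idf)
print(short_doc, long_doc)
```

Here the 4-term field scores about 11.1 against roughly 1.9 for the 400-term field, despite having fewer mentions.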