Function query result as a filter query
Hi,

Is it possible to constrain a result set with a filter query so that it returns only the top 100 documents for a particular field? Say I have a field called 'hits' that holds the total number of hits for that item. I want to return only the documents with the 100 highest hit counts.

I want something like this:

    fq=ord(hits):[* TO 100]

But that does not appear to work; I don't think I can use a function query as the source of a query. I want it as a filter query so that I can also use it as a facet query.

Cheers,
Pete
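To make the ord/rord distinction concrete, here is a small Python model (not Solr code) of how the function queries are described in the FunctionQuery documentation as I understand it: ord(field) is the 1-based position of a document's value among the field's *distinct* indexed values in ascending order, and rord(field) counts from the highest value instead. Under that model, ord(hits) <= 100 would pick the *lowest* values, and "top 100 by rord" means the top 100 distinct values, so ties can return more than 100 documents. It also assumes a sortable numeric field type, since ordinals follow index (lexicographic) order. The 'hits' data below is made up.

```python
from typing import Dict, List


def ord_values(values: List[int]) -> Dict[int, int]:
    """Map each distinct value to its 1-based ascending ordinal (models ord())."""
    return {v: i + 1 for i, v in enumerate(sorted(set(values)))}


def rord_values(values: List[int]) -> Dict[int, int]:
    """Map each distinct value to its 1-based descending ordinal (models rord())."""
    return {v: i + 1 for i, v in enumerate(sorted(set(values), reverse=True))}


def top_by_rord(docs: Dict[str, int], n: int) -> List[str]:
    """Return ids whose value's reverse ordinal is <= n, highest value first.

    Ordinals belong to distinct values, so ties can yield more than n ids.
    """
    rord = rord_values(list(docs.values()))
    matched = [doc_id for doc_id, v in docs.items() if rord[v] <= n]
    return sorted(matched, key=lambda d: docs[d], reverse=True)


# Hypothetical 'hits' values for five documents ("b" and "c" tie at 90).
hits = {"a": 10, "b": 90, "c": 90, "d": 50, "e": 200}

print(ord_values(list(hits.values())))  # {10: 1, 50: 2, 90: 3, 200: 4}
print(top_by_rord(hits, 2))             # ['e', 'b', 'c']
```

If your Solr build has the function range query parser (which I believe arrived in Solr 1.4), something like `fq={!frange l=1 u=100}rord(hits)` may express this directly; that syntax is from memory and worth verifying against your version.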
Using a function in a filter query
I want to filter my result set before I search. I know the correct way to do this is with the filter query (fq) parameter. However, I want to filter based on the output of a function applied to a field.

I have a field 'rating' which is an integer in the range 1 to ~75000. The upper limit may change. I want to filter to the 500 items with the highest 'rating'. In SQL this would be something like:

    ... ORDER BY rating DESC LIMIT 500

I think I can get the documents in Solr ranked by rating descending by using the function rord(rating), so basically I would like to do:

    fq=rord(rating):[0 TO 500]

But that does not seem possible. Does anyone know what else I could do?

-- 
Pete Smith
Senior Developer
No.9 | 6 Portal Way | London | W3 6RU | T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111
LOVEFiLM.com
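One workaround that avoids a function in fq (a suggestion, not something confirmed in this thread) is a two-step lookup: first fetch the rating of the 500th document when sorted by rating descending (a request with sort=rating desc, start=499, rows=1), then filter with an ordinary range query on that value. The Python sketch below models both steps on an in-memory list; sort, start, rows, fl and fq are standard Solr request parameters, but the helper functions are hypothetical. Ties at the threshold can let a few more than 500 documents through, and the threshold has to be recomputed as ratings change.

```python
from typing import List
from urllib.parse import urlencode


def threshold_request(n: int) -> str:
    """Query string for step 1: fetch only the n-th highest rating."""
    return urlencode({"q": "*:*", "sort": "rating desc",
                      "start": n - 1, "rows": 1, "fl": "rating"})


def nth_highest(ratings: List[int], n: int) -> int:
    """In-memory stand-in for what the step-1 request would return."""
    return sorted(ratings, reverse=True)[n - 1]


def top_n_filter(threshold: int) -> str:
    """Step 2: an ordinary open-ended range filter on the threshold value."""
    return "rating:[%d TO *]" % threshold


ratings = [5, 75000, 320, 320, 12, 9000]    # hypothetical data
t = nth_highest(ratings, 3)
matched = [r for r in ratings if r >= t]    # what the range filter would keep
print(t, len(matched))                      # 320 4  (the tie lets a 4th through)
print(top_n_filter(t))                      # rating:[320 TO *]
print(threshold_request(500))
```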
using NGramTokenizerFactory for partial matching
Hi,

I want to use the NGramTokenizerFactory tokeniser to enable partial matching on a field in my index. For instance, for the field value:

    Lorem ipsum

I want it to match 'lor', 'lorem' and 'lorem i'. However, I am finding it matches the first two but not the third; the white space is causing problems. Here are the relevant parts of my config:

    <fieldType name="text_substring" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="title_partial" type="text_substring" indexed="true" stored="true" required="true"/>

I believe it is due to the minGramSize setting, which is being applied to each word. Can anyone tell me how I can support what I want to do?

Cheers,
Pete
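A rough Python model of what the index-time analyzer emits may help show where 'lorem i' goes missing. Assumption: as I understand the Lucene NGramTokenizer of this era, it treats the whole field value as one string, so grams can span the space; the model ignores token order, offsets and the tokenizer's input-length cap. At query time, though, a whitespace-splitting query parser turns 'lorem i' into 'lorem' and 'i', and the lone 'i' is shorter than minGramSize, which is one plausible reason the third case fails.

```python
from typing import Set


def ngrams(text: str, min_gram: int = 3, max_gram: int = 15) -> Set[str]:
    """All lowercased substrings of length min_gram..max_gram.

    Models NGramTokenizer + LowerCaseFilter over the whole field value,
    so grams may contain spaces.
    """
    s = text.lower()
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(0, len(s) - size + 1):
            grams.add(s[start:start + size])
    return grams


indexed = ngrams("Lorem ipsum")
for q in ["lor", "lorem", "lorem i"]:
    print(q, q in indexed)              # all three print True

# A whitespace-splitting query parser looks up each word separately instead:
print([w for w in "lorem i".split() if w not in indexed])   # ['i']
```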
RE: How to correctly boost results in Solr Dismax query
Thank you Dean. I thought I was on the right track with BQ but it was the skewing of results that was frustrating me. I'll try out your suggestion.

Cheers,
Pete

On Mon, 2009-03-16 at 10:29 +0800, Dean Missikowski (Consultant), CLSA wrote:
> Hi,
>
> My experience is that the BQ parameter can be used with any query type. You can define boosts on the query fields (qf) that are used with the query terms (q) in your query, AND you can define additional boosts for fields that are not used with the query terms through the bq or bf parameters.
>
> I think the relative weight that a particular BQ boost has on the overall score needs to take into account the other fields in your query. If you're searching on titles, you might want to consider setting omitNorms=true (don't store length normalization values) for title in your schema.xml and, if you're using Solr 1.4, omitTf=true (don't record term frequencies), so that results aren't skewed by short and long titles, or by titles that contain multiple occurrences of the same term (setting these requires you to reindex). I think this should make BQ boosts like bq=media:DVD^2&bq=media:BLU-RAY^1.5 more effective.
>
> -- Dean
>
> -----Original Message-----
> From: Pete Smith [mailto:pete.sm...@lovefilm.com]
> Sent: 13/03/2009 7:11 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to correctly boost results in Solr Dismax query
>
> Hi,
>
> On Fri, 2009-03-13 at 03:57 -0700, dabboo wrote:
> > bq works only with q.alt query and not with q queries. So, in your case you would be using the qf parameter for field boosting; you will have to give both fields in the qf parameter, i.e. both title and media.
> > Try this:
> > <str name="qf">media^1.0 title^100.0</str>
>
> But with that, how will it know to rank media:DVD higher than media:BLU-RAY?
>
> Cheers,
> Pete
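To see why Dean suggests dropping norms and term frequencies, here is a small Python model of the two Lucene DefaultSimilarity factors involved, as I understand them: tf(freq) = sqrt(freq) and lengthNorm = 1/sqrt(number of terms in the field). It deliberately ignores idf, queryNorm, coord, boosts and the lossy one-byte encoding of norms, so the numbers are illustrative only. The long title is from Pete's example; the short one is made up. (I believe the released Solr 1.4 schema attribute for the second setting is omitTermFreqAndPositions rather than omitTf; worth checking for your version.)

```python
import math
from typing import List


def title_weight(title_terms: List[str], term: str,
                 omit_norms: bool = False, omit_tf: bool = False) -> float:
    """tf * lengthNorm for one query term in one title (simplified model)."""
    freq = title_terms.count(term)
    if freq == 0:
        return 0.0
    tf = 1.0 if omit_tf else math.sqrt(freq)
    norm = 1.0 if omit_norms else 1.0 / math.sqrt(len(title_terms))
    return tf * norm


long_title = "indiana jones and the kingdom of the crystal skull".split()
short_title = "indiana jones".split()          # hypothetical shorter title

for omit in (False, True):
    print(omit,
          round(title_weight(long_title, "indiana", omit_norms=omit), 3),
          round(title_weight(short_title, "indiana", omit_norms=omit), 3))
# False 0.333 0.707
# True 1.0 1.0
```

With norms on, the shorter title's match is worth roughly twice as much, which can swamp a modest bq boost; with norms omitted, both titles contribute equally and the bq difference decides the order.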
Re: How to correctly boost results in Solr Dismax query
Hi Amit,

Thanks very much for your reply. What you said makes things a bit clearer but I am still a bit confused.

On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote:
> If you want to boost the records with their field value then you must use the q query parameter instead of q.alt. The 'q' parameter actually uses the qf parameters from solrConfig for field boosting.

From the documentation for Dismax queries, I thought that q is simply a keyword parameter. From http://wiki.apache.org/solr/DisMaxRequestHandler:

    q: The guts of the search defining the main query. This is designed to support raw input strings provided by users with no special escaping. '+' and '-' characters are treated as mandatory and prohibited modifiers for the subsequent terms. Text wrapped in balanced quote characters '"' are treated as phrases, any query containing an odd number of quote characters is evaluated as if there were no quote characters at all. Wildcards in this q parameter are not supported.

And I thought 'qf' is a list of fields and boost scores. From http://wiki.apache.org/solr/DisMaxRequestHandler:

    qf (Query Fields): List of fields and the boosts to associate with each of them when building DisjunctionMaxQueries from the user's query. The format supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that fieldOne has a boost of 2.3, fieldTwo has the default boost, and fieldThree has a boost of 0.4 ... this indicates that matches in fieldOne are much more significant than matches in fieldTwo, which are more significant than matches in fieldThree.

But if I want to, say, search for films with 'indiana' in the title, with media=DVD scoring higher than media=BLU-RAY, then do I need to do something like:

    solr/select?q=indiana

And in my config:

    <str name="qf">media^2</str>

But I don't see where the actual *contents* of the media field would determine the boost. Sorry if I have misunderstood what you mean.

Cheers,
Pete
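Pete's suspicion can be checked with a toy Python model of a DisjunctionMaxQuery for a single term: each field listed in qf gets a score (here simply its qf boost if the term occurs in it), and the document scores the best field score plus tie times the sum of the others. The scoring is heavily simplified (real Lucene scores also involve tf, idf and norms), and the helper function is hypothetical; only the max-plus-tie shape is meant to mirror DisMax. With q=indiana, the values DVD and BLU-RAY never contain the term, so media^2 contributes nothing; it only matters in dabboo's example, where the term itself appears in media.

```python
from typing import Dict


def dismax_score(doc: Dict[str, str], term: str,
                 qf: Dict[str, float], tie: float = 0.0) -> float:
    """Toy DisjunctionMax score for one query term across the qf fields."""
    field_scores = [
        boost for field, boost in qf.items()
        if term.lower() in doc.get(field, "").lower().split()
    ]
    if not field_scores:
        return 0.0                      # term matches no qf field
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


qf = {"media": 2.0, "title": 1.0}
title = "Indiana Jones and the Kingdom of the Crystal Skull"
blu_ray = {"media": "BLU-RAY", "title": title}
dvd = {"media": "DVD", "title": title}

# Both score 1.0: 'indiana' only matches title, so media^2 never applies.
print(dismax_score(blu_ray, "indiana", qf), dismax_score(dvd, "indiana", qf))

# dabboo's example: the term appears in media, so the media^2 boost wins.
print(dismax_score({"films": "Bond", "media": "Indiana"}, "indiana",
                   {"media": 2.0, "films": 1.0}))
```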
Re: How to correctly boost results in Solr Dismax query
Hi Amit,

Thanks again for your reply. I am understanding it a bit better but I think it would help if I posted an example. Say I have three records:

    <doc>
      <long name="id">1</long>
      <str name="media">BLU-RAY</str>
      <str name="title">Indiana Jones and the Kingdom of the Crystal Skull</str>
    </doc>
    <doc>
      <long name="id">2</long>
      <str name="media">DVD</str>
      <str name="title">Indiana Jones and the Kingdom of the Crystal Skull</str>
    </doc>
    <doc>
      <long name="id">3</long>
      <str name="media">DVD</str>
      <str name="title">Casino Royale</str>
    </doc>

Now, if I search for indiana:

    select?q=indiana

I want the first two rows to come back (not the third, as it does not contain 'indiana'). I would like record 2 to be scored higher than record 1 as its media type is DVD.

At the moment I have in my config:

    <str name="qf">title</str>

And I was trying to boost by media having a specific value by using 'bq', but from what you told me that is incorrect.

Cheers,
Pete

On Fri, 2009-03-13 at 03:21 -0700, dabboo wrote:
> Pete,
>
> Sorry if I wasn't clear. Here is the explanation. Suppose you have 2 records and they have films and media as 2 columns. Now the first record has values like films=Indiana and media=blue ray, and the 2nd record has values like films=Bond and media=Indiana.
>
> Values for qf parameters:
>
> <str name="qf">media^2.0 films^1.0</str>
>
> Now, search for q=Indiana. It should display both of the records, but record #2 will display above the 1st.
>
> Let me know if you still have questions.
>
> Cheers,
> amit
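For Pete's three-record example, a toy Python model of an optional boost query may help: q (through qf=title) decides which documents match, and a bq clause only adds to the score of documents that already match. Here bq is represented as a dict from media value to a fixed additive boost, and the base score of 1.0 is made up; real Lucene multiplies clause scores by boosts and also applies queryNorm and coord. One plausible reading of the "it all evens out" symptom (an assumption, not established in the thread) is visible in the last line: when every match receives some boost, only the gap between the boosts changes the order.

```python
from typing import Dict, List, Tuple

docs = [
    {"id": 1, "media": "BLU-RAY", "title": "Indiana Jones and the Kingdom of the Crystal Skull"},
    {"id": 2, "media": "DVD", "title": "Indiana Jones and the Kingdom of the Crystal Skull"},
    {"id": 3, "media": "DVD", "title": "Casino Royale"},
]


def search(term: str, bq: Dict[str, float]) -> List[Tuple[int, float]]:
    """Match on title only (like qf=title); add a boost per matching bq entry."""
    results = []
    for doc in docs:
        if term.lower() not in doc["title"].lower().split():
            continue                    # a boost never makes a document match
        score = 1.0                     # hypothetical base keyword score
        score += bq.get(doc["media"], 0.0)
        results.append((doc["id"], score))
    return sorted(results, key=lambda r: r[1], reverse=True)


print(search("indiana", {}))                             # [(1, 1.0), (2, 1.0)]
print(search("indiana", {"DVD": 2.0}))                   # [(2, 3.0), (1, 1.0)]
print(search("indiana", {"DVD": 2.0, "BLU-RAY": 1.5}))   # [(2, 3.0), (1, 2.5)]
```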
Re: How to correctly boost results in Solr Dismax query
Hi,

On Fri, 2009-03-13 at 03:57 -0700, dabboo wrote:
> bq works only with q.alt query and not with q queries. So, in your case you would be using the qf parameter for field boosting; you will have to give both fields in the qf parameter, i.e. both title and media.
> Try this:
> <str name="qf">media^1.0 title^100.0</str>

But with that, how will it know to rank media:DVD higher than media:BLU-RAY?

Cheers,
Pete
How to correctly boost results in Solr Dismax query
Hi,

I have managed to build an index in Solr which I can search on keyword, produce facets, query facets etc. This is all working great. I have implemented my search using a dismax query so it searches predetermined fields.

However, my results are coming back sorted by score, which appears to be calculated by keyword relevancy only. I would like to adjust the score where fields have predetermined values. I think I can do this with boost queries and boost functions, but the documentation here:

http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3

is not particularly helpful. I tried adding a bq argument to my search:

    bq=media:DVD^2

(yes, this is an index of films!) but when I start adding more and more:

    bq=media:DVD^2&bq=media:BLU-RAY^1.5

I find negative side effects, e.g. films that are DVD but not BLU-RAY get their scores negatively affected. In the end it all seems to even out, and my scores are as they were before I started boosting.

I must be doing this wrong and I wonder whether boost functions come in somewhere. Any ideas on how to correctly use boost?

Cheers,
Pete
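One detail worth checking in the request itself: repeated parameters such as bq must be separate key=value pairs joined by &, and characters like ^ and : are safest URL-encoded. A minimal Python sketch using only the standard library; the host, port and /solr/select path are assumptions for illustration, and it only builds the URL without sending it.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical Solr endpoint; adjust to your installation.
BASE = "http://localhost:8983/solr/select"

params = [
    ("q", "indiana"),
    ("bq", "media:DVD^2"),
    ("bq", "media:BLU-RAY^1.5"),   # second bq is its own pair, not appended text
]

url = BASE + "?" + urlencode(params)
print(url)
# http://localhost:8983/solr/select?q=indiana&bq=media%3ADVD%5E2&bq=media%3ABLU-RAY%5E1.5

# Round-trip check: both bq values survive as distinct parameters.
print(parse_qsl(url.split("?", 1)[1]))
```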