Function query result as a filter query

2009-09-22 Thread Pete Smith
Hi,

Is it possible to constrain a resultset using a filter query to only
return the top 100 documents for a particular field?

Say I have a field called 'hits' that has the total number of hits for
that item. I want to return only the documents that have the top 100
highest hits.

I want something like this:

fq=ord(hits):[* TO 100]

But that does not appear to work - I don't think I can use a function
query for the source of a query. I want it as a filter query so I can
also use it as a facet query.

Cheers,
Pete
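
One possible workaround, sketched here rather than taken from the thread: run a first query sorted by hits descending to find the value held by the 100th document, then filter with an ordinary range query on that value. The helper below only does the arithmetic and string building. Its name and the sample values are hypothetical, and it makes no call to Solr.

```python
def top_n_range_filter(field, values, n):
    """Build a range filter matching documents whose value is among the n highest."""
    if n <= 0 or not values:
        raise ValueError("need n > 0 and at least one value")
    ranked = sorted(values, reverse=True)
    # If fewer than n values exist, fall back to the lowest one.
    threshold = ranked[min(n, len(ranked)) - 1]
    return "%s:[%d TO *]" % (field, threshold)


# Hypothetical 'hits' values, as if read from a first query sorted by hits.
sample_hits = [5, 42, 17, 99, 3, 42, 8]
print(top_n_range_filter("hits", sample_hits, 3))
```

Because the result is a plain range query it can be used as both fq and facet.query. Ties at the threshold mean it can match slightly more than n documents, and the threshold has to be recomputed when the index changes.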



Using a function in a filter query

2009-04-20 Thread Pete Smith

I want to filter my result set before I search. I know the correct way
to do this is by using the filter query (fq) parameter. However, I want
to filter based on the output of a function performed on a field.

I have a field 'rating' which is an integer in the range of 1 to ~75000.
The upper limit may change. I want to filter to the top 500 items with
the highest 'rating'. In SQL this would be something like:

... ORDER BY rating DESC LIMIT 500

I think I can get the documents in Solr ranked by rating descending by
using the function rord(rating), so basically I would like to do:

fq=rord(rating):[0 TO 500]

But that does not seem possible. Does anyone know what else I could do?



-- 
Pete Smith
Senior Developer

No.9 | 6 Portal Way | London | W3 6RU |
T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111

LOVEFiLM.com
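
A hedged sketch of one way this might be expressed, assuming a Solr version that ships the function range query parser (frange) and a field type on which rord() is supported and ordered numerically. Neither assumption comes from the thread. The helper name is hypothetical, and the lower bound of 1 assumes ordinals start at 1.

```python
from urllib.parse import urlencode


def rord_frange_filter(field, n):
    """Return an fq value keeping documents whose reverse ordinal is within 1..n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # {!frange} filters on the value of a function; rord() is the reverse ordinal.
    return "{!frange l=1 u=%d}rord(%s)" % (n, field)


fq = rord_frange_filter("rating", 500)
print(fq)
# URL-encoded form, ready to append to a select request.
print(urlencode([("q", "*:*"), ("fq", fq)]))
```

Note that ordinals rank distinct field values across the whole index, not documents, so this selects the documents holding the 500 highest distinct ratings and may return more than 500 documents.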


using NGramTokenizerFactory for partial matching

2009-04-07 Thread Pete Smith
Hi,

I want to use the NGramTokenizerFactory tokeniser to enable partial
matching on a field in my index. For instance for the field:

Lorem ipsum

I want it to match "lor", "lorem" and "lorem i". However, I am finding it
matches the first two but not the third; the white space seems to be causing
problems. Here are the relevant parts of my config:

<fieldType name="text_substring" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory"
        minGramSize="3" maxGramSize="15" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_partial" type="text_substring" indexed="true"
    stored="true" required="true" />

I believe it is due to the minGramSize setting being applied to each
word. Can anyone tell me how I can support what I want to do?

Cheers,
Pete
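
To make the question concrete, here is a small plain-Python model of character n-grams with the same minimum and maximum sizes. It is not the Lucene tokenizer, only an illustration, and the function name is hypothetical. It suggests that an n-gram pass over the whole value "Lorem ipsum" does produce "lorem i", so the mismatch may come from the query side, for example if the query is split on whitespace before analysis. That is an assumption worth checking against how the query is actually analysed.

```python
def char_ngrams(text, min_size=3, max_size=15):
    """Return the set of lowercased character n-grams of text, whitespace included."""
    text = text.lower()
    grams = set()
    for size in range(min_size, max_size + 1):
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


indexed = char_ngrams("Lorem ipsum")
for query in ("lor", "lorem", "lorem i"):
    # Membership test: is the whole query string one of the indexed grams?
    print(repr(query), query in indexed)

# A lone "i" is shorter than min_size, so it yields no grams at all.
print(char_ngrams("i"))
```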



RE: How to correctly boost results in Solr Dismax query

2009-03-16 Thread Pete Smith
Thank you Dean. I thought I was on the right track with BQ but it was
the skewing of results that was frustrating me. I'll try out your
suggestion.

Cheers,
Pete

On Mon, 2009-03-16 at 10:29 +0800, Dean Missikowski (Consultant), CLSA
wrote:
> Hi,
>
> My experience is that the BQ parameter can be used with any query type.
> You can define boosts on the query fields (qf) that are used with the
> query terms (q) in your query, AND you can define additional boosts for
> fields that are not used with the query terms through the bq or bf
> parameters.
>
> I think the relative weight that assigning a particular boost to a field
> via BQ has on the overall scoring needs to take into consideration the
> other fields in your query. If you're searching on titles, you might
> want to consider setting omitNorms=true (means don't generate length
> normalization vectors) for title in your schema.xml, and if you're using
> Solr 1.4 omitTf=true (means don't generate term frequency vectors), so
> that results aren't skewed by short and long titles, or titles that
> contain multiple occurrences of the same term (setting these requires
> you to reindex). I think this should have the effect of making BQ boosts
> like bq=media:DVD^2&bq=media:BLU-RAY^1.5 more effective.
>
> -- Dean
 
Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread Pete Smith
Hi Amit,

Thanks very much for your reply. What you said makes things a bit
clearer but I am still a bit confused.

On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote:
> If you want to boost the records with their field value then you must use q
> query parameter instead of q.alt. 'q' parameter actually uses qf parameters
> from solrConfig for field boosting.

From the documentation for Dismax queries, I thought that q is simply
a keyword parameter:

From http://wiki.apache.org/solr/DisMaxRequestHandler:
q
The guts of the search defining the main query. This is designed to be
support raw input strings provided by users with no special escaping.
'+' and '-' characters are treated as mandatory and prohibited
modifiers for the subsequent terms. Text wrapped in balanced quote
characters '"' are treated as phrases, any query containing an odd
number of quote characters is evaluated as if there were no quote
characters at all. Wildcards in this q parameter are not supported. 

And I thought 'qf' is a list of fields and boost scores:

From http://wiki.apache.org/solr/DisMaxRequestHandler:
qf (Query Fields)
List of fields and the boosts to associate with each of them when
building DisjunctionMaxQueries from the user's query. The format
supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that
fieldOne has a boost of 2.3, fieldTwo has the default boost, and
fieldThree has a boost of 0.4 ... this indicates that matches in
fieldOne are much more significant than matches in fieldTwo, which are
more significant than matches in fieldThree. 

But if I want to, say, search for films with 'indiana' in the title,
with media=DVD scoring higher than media=BLU-RAY then do I need to do
something like:

solr/select?q=indiana

And in my config:

<str name="qf">media^2</str>

But I don't see where the actual *contents* of the media field would
determine the boost.

Sorry if I have misunderstood what you mean.

Cheers,
Pete



Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread Pete Smith
Hi Amit,

Thanks again for your reply. I am understanding it a bit better but I
think it would help if I posted an example. Say I have three records:

<doc>
  <long name="id">1</long>
  <str name="media">BLU-RAY</str>
  <str name="title">Indiana Jones and the Kingdom of the Crystal Skull</str>
</doc>
<doc>
  <long name="id">2</long>
  <str name="media">DVD</str>
  <str name="title">Indiana Jones and the Kingdom of the Crystal Skull</str>
</doc>
<doc>
  <long name="id">3</long>
  <str name="media">DVD</str>
  <str name="title">Casino Royale</str>
</doc>

Now, if I search for indiana: select?q=indiana

I want the first two rows to come back (not the third as it does not
contain 'indiana'). I would like record 2 to be scored higher than
record 1 as its media type is DVD.

At the moment I have in my config:

<str name="qf">title</str>

And I was trying to boost by media having a specific value by using 'bq'
but from what you told me that is incorrect.

Cheers,
Pete


On Fri, 2009-03-13 at 03:21 -0700, dabboo wrote:
> Pete,
>
> Sorry, if wasnt clear. Here is the explanation.
>
> Suppose you have 2 records and they have films and media as 2 columns.
>
> Now first record has values like films=Indiana and media=blue ray
> and 2nd record has values like films=Bond and media=Indiana
>
> Values for qf parameters
>
> <str name="qf">media^2.0 films^1.0</str>
>
> Now, search for q=Indiana .. it should display both of the records but
> record #2 will display above than the 1st.
>
> Let me know if you still have questions.
>
> Cheers,
> amit
 
 

Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread Pete Smith
Hi,

On Fri, 2009-03-13 at 03:57 -0700, dabboo wrote:
> bq works only with q.alt query and not with q queries. So, in your case you
> would be using qf parameter for field boosting, you will have to give both
> the fields in qf parameter i.e. both title and media.
>
> try this
>
> <str name="qf">media^1.0 title^100.0</str>

But with that, how will it know to rank media:DVD higher than
media:BLU-RAY?

Cheers,
Pete



How to correctly boost results in Solr Dismax query

2009-03-12 Thread Pete Smith
Hi,

I have managed to build an index in Solr which I can search on keyword,
produce facets, query facets etc. This is all working great. I have
implemented my search using a dismax query so it searches predetermined
fields.

However, my results are coming back sorted by score which appears to be
calculated by keyword relevancy only. I would like to adjust the score
where fields have pre-determined values. I think I can do this with
boost query and boost functions but the documentation here:

http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3

is not particularly helpful. I tried adding a bq argument to my
search:

bq=media:DVD^2

(yes, this is an index of films!) but when I start adding more
and more clauses:

bq=media:DVD^2&bq=media:BLU-RAY^1.5

I find that documents which do not match every clause are penalised,
e.g. films that are DVD but are not BLU-RAY get negatively affected in
their score. In the end it all seems to even out and my score is as it
was before I started boosting.

I must be doing this wrong and I wonder whether boost function comes
in somewhere. Any ideas on how to correctly use boost?

Cheers,
Pete
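
For reference, a minimal sketch of how the request above could be assembled so that each boost clause is sent as its own bq parameter. It assumes the select handler is configured for dismax and that qf is passed on the request rather than set in the config. The values are the ones from this message, and nothing is sent to Solr. It does not address why the scores even out.

```python
from urllib.parse import urlencode, parse_qs

# A list of pairs (not a dict) lets the bq key repeat, one boost clause each.
params = [
    ("q", "indiana"),
    ("qf", "title"),
    ("bq", "media:DVD^2"),
    ("bq", "media:BLU-RAY^1.5"),
]

query_string = urlencode(params)
print("solr/select?" + query_string)

# Decoding the string again shows two separate bq values.
print(parse_qs(query_string)["bq"])
```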
