Re: Weighted facet strings
First off, a terminology clarification: what you are describing has very little to do with facets. It's true that your category field is a facet of your documents, but in the context of your question you aren't asking about any facet-related features of Solr. What you are really asking about is specifying weighted importance on individual values indexed in the category field of your documents.

The suggestion in another reply to use multiple fields (cat_weight_1, cat_weight_2, etc...) and then boost those fields accordingly is a classic, easy-to-implement solution to this type of problem. It works really well when the cardinality of weights is low and fixed (in your case, 1-5).

Another way people have historically dealt with problems like this is to keyword-stuff the category field. So if a document has category weights foo=5, bar=3, yak=1, you index

  foo foo foo foo foo bar bar bar yak

in the category field. As long as you use a similarity that defines tf() as an identity function and doesn't use length norm, this also works really well. (There are also tricks you can do using custom update processors or tokenizers to let you send foo=5 over the wire and have it index the foo token with a termFreq of 5.)

Looking forward: the best way to solve this problem in theory is using Payloads, but there aren't a lot of options currently available for leveraging payloads in Solr's query APIs / parsers, so you'd probably have to write something custom.

How you actually execute the queries depends on the approach you take at indexing. Let's assume you do the keyword stuffing approach...
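As a rough illustration of the keyword stuffing step, here is a minimal client-side sketch in Python. It assumes the weights arrive as a dict of category name to integer weight, that category names contain no whitespace, and that the target field (called cat here) is whitespace-tokenized. The helper name stuff_categories is made up for this example and is not part of Solr or any Solr client.

```python
def stuff_categories(weights):
    """Repeat each category name as many times as its weight.

    `weights` maps category name -> integer weight (1-5 in this thread).
    The result is a whitespace-separated string intended for a tokenized
    category field, so that the term frequency of each name equals its weight.
    """
    tokens = []
    for name, weight in weights.items():
        if weight < 1:
            raise ValueError(f"weight for {name!r} must be >= 1")
        tokens.extend([name] * weight)
    return " ".join(tokens)


# Hypothetical document; only the "cat" value matters for this sketch.
doc = {
    "id": "doc1",
    "cat": stuff_categories({"foo": 5, "bar": 3, "yak": 1}),
}
```

The resulting dict can be sent to Solr with whatever client you already use; the stuffing itself does not depend on the client.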
: - filter: category=some_category_name, query: *:* - Results should be scored by
: the above mentioned weight

  q=cat:some_category_name
  sort=score desc

...with a simple tf() func the default score will do exactly what you want, or you could use the same {!boost} solution as below with *:*

: - filter: category=some_category_name, query: some_keyword - Results should be
: scored by a combination of the score of 'some_keyword' and the above mentioned
: weight

You just have to define what you mean by "combination" in terms of Solr query functions. The easiest is multiplicatively, with the {!boost} parser...

  q={!boost b=tf(cat,'some_category_name')}some_keyword
  fq=cat:some_category_name
  sort=score desc

: - filter: none, query: some_category_name - Documents with category
: 'some_category_name' should be found as well as documents which contain the term
: 'some_category_name'. Results should be scored by a combination of the score of
: 'some_keyword' and the above mentioned weight

...you could do this by including your category field in the qf of a dismax search.

Assuming you want a single solution that works for all of these, and your "query: some_keyword" example includes the possibility that some_keyword is also a category name (and you want its weight taken into account as well), then an all-inclusive solution would probably be something like...

  q={!boost b=tf(cat,'some_category_name') defType=}some_keyword
  qf=cat^10 otherfields^5
  fq=cat:some_category_name
  sort=score desc

-Hoss
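For anyone assembling these requests in code, the first two query shapes above can be sketched in Python as follows. This is only an illustration: the function name weighted_category_params is hypothetical, the field name cat is taken from the examples above, and no escaping of special query characters is attempted, so it assumes plain single-token category names and keywords.

```python
from urllib.parse import urlencode


def weighted_category_params(category, keyword=None, field="cat"):
    """Build Solr request parameters for the keyword stuffing approach.

    Without a keyword, documents are matched on the category term and ranked
    by the default score. With a keyword, the keyword score is multiplied by
    tf(field, category) through the {!boost} parser, and the category is
    applied as a filter query.
    """
    if keyword is None:
        return {"q": f"{field}:{category}", "sort": "score desc"}
    return {
        "q": f"{{!boost b=tf({field},'{category}')}}{keyword}",
        "fq": f"{field}:{category}",
        "sort": "score desc",
    }


params = weighted_category_params("some_category_name", "some_keyword")
query_string = urlencode(params)  # ready to append to a /select URL
```

Keeping the parameters in a dict until the last moment leaves the URL encoding to urlencode, which matters here because the {!boost} local params contain braces, quotes, and spaces.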
Re: Weighted facet strings
One kind of hacky way to accomplish some of those tasks involves creating a lot more Solr fields. (This kind of 'de-normalization' is often the answer to how to make Solr do something.)

Facet fields are ordinarily not tokenized or normalized at all, but that doesn't work very well for matching query terms. So if you want actual queries to match on these categories, you probably want an additional field that is tokenized/analyzed. If you want to boost different category assignments differently, you probably want _multiple_ additional tokenized/analyzed fields.

So for instance, create a separate analyzed field for each category 'weight', perhaps using the default 'text' analysis type:

  category_text_weight_1
  category_text_weight_2
  etc.

Then use dismax to query, include all those category_text_* fields in the 'qf', and boost the higher-weight ones more than the lower-weight ones.

That will handle a number of your use cases, but not all of them. Your first two cases are the most problematic:

: filter: category=some_category_name, query: *:* - Results should be scored by
: the above mentioned weight

So Solr doesn't really work like that. Normally a filter does not affect the scoring of the actual results _at all_. But if you change the query to:

  fq=category:some_category
  q=some_category
  defType=dismax
  qf=category_text_weight_1 category_text_weight_2^10 category_text_weight_3^20

THEN, with the multiple analyzed category_text_weight_* fields as described above, I think it should do what you want. You may have to play with exactly what boost to give to each field.

But your second use case is still tricky. Solr doesn't really do exactly what you ask, but by using this method I think you can figure out hacky ways to accomplish it. I'm not sure if it will solve all of your use cases, but maybe this will give you a start to figuring it out.

On 8/5/2011 6:55 AM, Michael Lorz wrote:

: Hi all,
:
: I have documents which are (manually) tagged with categories. Each
: category-document relation has a weight between 1 and 5:
:
:   5: document fits perfectly in this category
:   ...
:   1: document may be considered as belonging to this category
:
: I would now like to use this information with Solr. At the moment, I don't use
: the weight at all:
:
:   <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
:
: Both the category as well as the document body are specified as query fields
: (<str name="qf"> in solrconfig.xml).
:
: What I would like is the following:
:
: - filter: category=some_category_name, query: *:* - Results should be scored by
:   the above mentioned weight
: - filter: category=some_category_name, query: some_keyword - Results should be
:   scored by a combination of the score of 'some_keyword' and the above mentioned
:   weight
: - filter: none, query: some_category_name - Documents with category
:   'some_category_name' should be found as well as documents which contain the
:   term 'some_category_name'. Results should be scored by a combination of the
:   score of 'some_keyword' and the above mentioned weight
:
: Do you have any ideas how this could be done?
:
: Thanks in advance
: Michi
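The per-weight field approach from this reply can be sketched in Python along the following lines. This is an assumption-laden illustration, not a tested recipe: the helper names split_by_weight and build_qf are hypothetical, the field prefix follows the category_text_weight_N naming used above, and the boost values are only examples that would need tuning.

```python
def split_by_weight(weights, prefix="category_text_weight_"):
    """Group category names into one multi-valued field per weight.

    `weights` maps category name -> integer weight (1-5 in this thread).
    """
    fields = {}
    for name, weight in weights.items():
        fields.setdefault(f"{prefix}{weight}", []).append(name)
    return fields


def build_qf(boosts, prefix="category_text_weight_"):
    """Build a whitespace-separated dismax qf value from weight -> boost."""
    return " ".join(f"{prefix}{w}^{b}" for w, b in sorted(boosts.items()))


# Index time: field values to merge into the document being sent to Solr.
doc_fields = split_by_weight({"foo": 5, "bar": 3, "yak": 1})

# Query time: example boosts only; higher weights get higher boosts.
qf = build_qf({1: 1, 2: 10, 3: 20})
```

Generating both the field names and the qf string from the same prefix keeps the index-time and query-time sides from drifting apart when a weight level is added.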
Weighted facet strings
Hi all,

I have documents which are (manually) tagged with categories. Each category-document relation has a weight between 1 and 5:

  5: document fits perfectly in this category
  ...
  1: document may be considered as belonging to this category

I would now like to use this information with Solr. At the moment, I don't use the weight at all:

  <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>

Both the category as well as the document body are specified as query fields (<str name="qf"> in solrconfig.xml).

What I would like is the following:

- filter: category=some_category_name, query: *:* - Results should be scored by the above mentioned weight
- filter: category=some_category_name, query: some_keyword - Results should be scored by a combination of the score of 'some_keyword' and the above mentioned weight
- filter: none, query: some_category_name - Documents with category 'some_category_name' should be found as well as documents which contain the term 'some_category_name'. Results should be scored by a combination of the score of 'some_keyword' and the above mentioned weight

Do you have any ideas how this could be done?

Thanks in advance
Michi