Re: Weighted facet strings

2011-08-09 Thread Chris Hostetter

: Subject: Weighted facet strings

First off: a terminology clarification. What you are describing has very
little to do with facets. It's true that your category field is a
facet of your documents, but in the context of your question, you aren't
asking about any facet-related features of Solr.
 
What you are really asking about is specifying weighted importance on
individual values indexed in the category field of your documents.

The suggestion in another reply to use multiple fields (cat_weight_1,
cat_weight_2, etc.) and then boost those fields accordingly is a
classic, easy-to-implement solution to this type of problem that works
really well when the cardinality of weights is low and fixed (in your
case 1-5).
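A minimal sketch of the multiple-fields idea, assuming hypothetical fields named cat_weight_1 through cat_weight_5 and a boost that simply equals the weight (the real boost values are a tuning choice, not something prescribed here). It splits a document's weighted categories into per-weight fields at index time and builds the matching dismax qf string for query time.

```python
from collections import defaultdict

# Hypothetical field naming scheme following the cat_weight_1 ... cat_weight_5 pattern.
FIELD = "cat_weight_{}"


def split_by_weight(category_weights):
    """Map {category: weight} to {"cat_weight_N": [categories with weight N]}."""
    fields = defaultdict(list)
    for category, weight in category_weights.items():
        if not 1 <= weight <= 5:
            raise ValueError(f"weight must be 1-5, got {weight} for {category!r}")
        fields[FIELD.format(weight)].append(category)
    return dict(fields)


def weighted_qf(weights=range(1, 6)):
    """Build a dismax qf string; assumes boost == weight (linear boosting)."""
    return " ".join(f"{FIELD.format(w)}^{w}" for w in weights)


if __name__ == "__main__":
    print(split_by_weight({"foo": 5, "bar": 3, "yak": 1}))
    print(weighted_qf())
```

Because the weights are a small fixed set, the qf string never changes between requests; only the per-document field assignment does.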

Another way people have historically dealt with problems like this is to
keyword-stuff the category field: if a document has the category
weights foo=5, bar=3, yak=1, you index "foo foo foo foo foo bar bar bar
yak" in the category field. As long as you use a similarity that defines
tf() as an identity function and doesn't use length norms, this also works
really well. (There are also tricks you can do using custom update
processors or tokenizers to let you send foo=5 over the wire and have it
index the foo token with a termFreq of 5.)
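A rough sketch of why the similarity matters for keyword stuffing. The `stuff()` helper builds the stuffed field value; `toy_score()` is a deliberately simplified single-term score (it ignores idf, boosts, queryNorm and coord) that contrasts an identity tf() without length norms against the sqrt(freq) / sqrt(numTerms) shape of Lucene's classic DefaultSimilarity tf() and lengthNorm. Both function names are hypothetical.

```python
import math


def stuff(category_weights):
    """Keyword stuffing: repeat each category token `weight` times."""
    tokens = []
    for category, weight in category_weights.items():
        tokens.extend([category] * weight)
    return " ".join(tokens)


def toy_score(tokens, term, identity_tf=True):
    """Simplified single-term field score (ignores idf, boosts, queryNorm, coord).

    identity_tf=True:  score = raw term frequency, no length norm.
    identity_tf=False: sqrt(freq) / sqrt(len(tokens)), the shape of Lucene's
    classic DefaultSimilarity tf() and lengthNorm (without norm byte-encoding).
    """
    freq = tokens.count(term)
    if identity_tf:
        return float(freq)
    return math.sqrt(freq) / math.sqrt(len(tokens))


if __name__ == "__main__":
    # Doc A: foo has weight 5, but the doc also carries three other weight-5 categories.
    doc_a = stuff({"foo": 5, "bar": 5, "baz": 5, "qux": 5}).split()
    # Doc B: foo has weight 3 and nothing else.
    doc_b = stuff({"foo": 3}).split()
    print(toy_score(doc_a, "foo"), toy_score(doc_b, "foo"))  # 5.0 3.0
    print(toy_score(doc_a, "foo", False), toy_score(doc_b, "foo", False))
```

With identity tf and no norms, doc A (weight 5) outranks doc B (weight 3). With the sqrt/length-norm shape, doc A scores about 0.5 and doc B scores 1.0, so the ordering flips purely because doc A has more categories.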

Looking forward: the best way to solve this problem in theory is using
payloads, but there aren't a lot of options currently available for
leveraging payloads in Solr's query APIs / parsers, so you'd probably have
to write something custom.


How you actually execute the queries depends on the approach you take at
indexing time. Let's assume you do the keyword-stuffing approach...

: - filter: category=some_category_name, query: *.* - Results should be scored by
: the above mentioned weight

q=cat:some_category_name 
 sort=score desc

...with a simple tf() function, the default score will do exactly what you want.

Or you could use the same {!boost} solution as below with *:*.

: - filter: category=some_category_name, query: some_keyword - Results should be
: scored by a combination of the score of 'some_keyword' and the above mentioned
: weight

You just have to define what you mean by "combination" in terms of Solr
query functions. The easiest is multiplicatively, with the {!boost} parser...

q={!boost b=tf(cat,'some_category_name')}some_keyword
 fq=cat:some_category_name
 sort=score desc
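A small sketch that builds these request parameters in Python for both of the first two cases: with a keyword it produces the {!boost} query above, and without one it falls back to *:* so the score comes from the tf() multiplier alone. It assumes the cat field was keyword-stuffed and that category names contain no whitespace, quotes or braces (no escaping is attempted); the helper name is hypothetical.

```python
from urllib.parse import urlencode


def boost_by_category_params(category, keyword=None, field="cat"):
    """Params that multiply the main query's score by tf(field, category).

    Assumes the category field was keyword-stuffed, so tf() equals the weight,
    and that category names contain no whitespace, quotes, or braces.
    """
    if any(ch.isspace() or ch in "'\"{}" for ch in category):
        raise ValueError(f"unsupported characters in category name: {category!r}")
    # With no keyword, fall back to *:* so the score comes from the boost alone.
    main_query = keyword if keyword else "*:*"
    return {
        "q": f"{{!boost b=tf({field},'{category}')}}{main_query}",
        "fq": f"{field}:{category}",
        "sort": "score desc",
    }


if __name__ == "__main__":
    print(urlencode(boost_by_category_params("some_category_name", "some_keyword")))
    print(urlencode(boost_by_category_params("some_category_name")))
```

The fq is kept in both cases so that documents without the category are excluded rather than merely scored at zero.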

: - filter: none, query: some_category_name - Documents with category
: 'some_category_name' should be found as well as documents which contain the term
: 'some_category_name'. Results should be scored by a combination of the score of
: 'some_keyword' and the above mentioned weight

...you could do this by including your category field in the qf of a 
dismax search.

Assuming you want a single solution that works for all of these, and your
"query: some_keyword" example includes the possibility that some_keyword
is also a category name (and you want its weight taken into account as
well), then an all-inclusive solution would probably be something like...

q={!boost b=tf(cat,'some_category_name') defType=}some_keyword
 qf=cat^10 otherfields^5
 fq=cat:some_category_name
 sort=score desc
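A sketch of the all-inclusive request as a Python parameter dict. It assumes the empty defType= above is meant to select dismax (qf is a dismax parameter, and a defType local param on {!boost} controls how the main query is parsed); "otherfields" is a placeholder for your real text fields, and the helper name is hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def all_in_one_params(category, keyword, qf="cat^10 otherfields^5", field="cat"):
    """Combine the {!boost} multiplier with a dismax main query.

    Assumption: defType=dismax fills the empty defType= from the reply;
    "otherfields" is a placeholder field name.
    """
    return {
        "q": f"{{!boost b=tf({field},'{category}') defType=dismax}}{keyword}",
        "qf": qf,
        "fq": f"{field}:{category}",
        "sort": "score desc",
    }


if __name__ == "__main__":
    params = all_in_one_params("some_category_name", "some_keyword")
    encoded = urlencode(params)
    # Round-trip check: decoding gives back exactly the original values.
    assert {k: v[0] for k, v in parse_qs(encoded).items()} == params
    print(encoded)
```

Keeping the boost outside dismax means the category weight multiplies whatever dismax scores across cat and the other fields.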




-Hoss


Re: Weighted facet strings

2011-08-08 Thread Jonathan Rochkind
One kind of hacky way to accomplish some of those tasks involves 
creating a lot more Solr fields. (This kind of 'de-normalization' is 
often the answer to how to make Solr do something).


So facet fields are ordinarily not tokenized or normalized at all. But 
that doesn't work very well for matching query terms.  So if you want 
actual queries to match on these categories, you probably want an 
additional field that is tokenized/analyzed.  If you want to boost 
different category assignments differently, you probably want _multiple_ 
additional tokenized/analyzed fields.


So for instance, create separate analyzed fields for each category 
'weight', perhaps using the default 'text' analysis type.


category_text_weight_1
category_text_weight_2
etc.

Then use dismax to query, include all those category_text_* fields in 
the 'qf', and boost the higher weight ones more than the lower weight ones.


That will handle a number of your use cases, but not all of them.

Your first two cases are the most problematic:

filter: category=some_category_name, query: *.* - Results should be
scored by the above mentioned weight


So Solr doesn't really work like that. Normally a filter does not affect
the scoring of the actual results _at all_. But if you change the query to:


fq=category:some_category
q=some_category
defType=dismax
qf=category_text_weight_1 category_text_weight_2^10 category_text_weight_3^20


THEN, with the multiple analyzed category_text_weight_* fields, as 
described above, I think it should do what you want. You may have to 
play with exactly what boost to give to each field.
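A minimal sketch of building that request in Python. The endpoint URL is hypothetical (adjust host, port and core), the field names follow the category_text_weight_N pattern above, and the boosts (none, 10, 20) are the starting values from this reply. Note that dismax expects qf entries separated by whitespace, not commas.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical Solr endpoint; adjust host, port and core to your setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def category_query_url(category, boosts=((1, 1), (2, 10), (3, 20))):
    """Filter on the raw category field; score via dismax over per-weight text fields."""
    # A boost of 1 is the default, so it is left off the field name.
    qf = " ".join(
        f"category_text_weight_{w}" + (f"^{b}" if b != 1 else "") for w, b in boosts
    )
    params = [
        ("fq", f"category:{category}"),
        ("q", category),
        ("defType", "dismax"),
        ("qf", qf),
    ]
    return f"{SOLR_SELECT}?{urlencode(params)}"


if __name__ == "__main__":
    url = category_query_url("some_category")
    print(url)
    print(parse_qs(urlparse(url).query)["qf"])
```

Tuning then comes down to changing the `boosts` tuple and comparing result orderings.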


But your second use case is still tricky.

Solr doesn't really do exactly what you ask, but by using this method I 
think you can figure out hacky ways to accomplish it.  I'm not sure if 
it will solve all of your use cases, but maybe this will give you a 
start to figuring it out.



On 8/5/2011 6:55 AM, Michael Lorz wrote:

Hi all,

I have documents which are (manually) tagged with categories. Each
category-document relation has a weight between 1 and 5:

5: document fits perfectly in this category,
.
.
1: document may be considered as belonging to this category.


I would now like to use this information with Solr. At the moment, I don't use
the weight at all:

<field name="category" type="string" indexed="true" stored="true" multiValued="true"/>

Both the category as well as the document body are specified as query fields
(<str name="qf"> in solrconfig.xml).


What I would like is the following:

- filter: category=some_category_name, query: *.* - Results should be scored by
the above mentioned weight
- filter: category=some_category_name, query: some_keyword - Results should be
scored by a combination of the score of 'some_keyword' and the above mentioned
weight
- filter: none, query: some_category_name - Documents with category
'some_category_name' should be found as well as documents which contain the term
'some_category_name'. Results should be scored by a combination of the score of
'some_keyword' and the above mentioned weight


Do you have any ideas how this could be done?

Thanks in advance
Michi


Weighted facet strings

2011-08-05 Thread Michael Lorz
Hi all,

I have documents which are (manually) tagged with categories. Each
category-document relation has a weight between 1 and 5: 

5: document fits perfectly in this category,
.
. 
1: document may be considered as belonging to this category. 


I would now like to use this information with Solr. At the moment, I don't use
the weight at all:

<field name="category" type="string" indexed="true" stored="true" multiValued="true"/>

Both the category as well as the document body are specified as query fields 
(<str name="qf"> in solrconfig.xml).


What I would like is the following:

- filter: category=some_category_name, query: *.* - Results should be scored by
the above mentioned weight
- filter: category=some_category_name, query: some_keyword - Results should be 
scored by a combination of the score of 'some_keyword' and the above mentioned 
weight
- filter: none, query: some_category_name - Documents with category 
'some_category_name' should be found as well as documents which contain the term
'some_category_name'. Results should be scored by a combination of the score of
'some_keyword' and the above mentioned weight


Do you have any ideas how this could be done?

Thanks in advance
Michi