: I have a question around boosting. I wanted to use the &boost= to write a
: nested query that will boost a document based on categorical preferences.

You have no idea how stoked I am to see you working on this in a 
real-world application.

: Currently I have the weights set to the z-score equivalent of a user's
: preference for that category which is simply how many standard deviations
: above the global average is this user's preference for that movie category.
: 
: My question though is basically whether or not semantically the equation
: query(category:Drama)*<some weight> + query(category:Comedy)*<some weight>
: + query(category:Action)*<some weight> makes sense?

My gut says that your approach makes sense -- but if i'm 
understanding you correctly, i think that you need to add "1" to 
all your weights: the "boost" is a multiplier, so if someone's rating for 
every category is 0 std devs above the average rating (ie: the most 
average person imaginable), you don't want to give every movie in every 
category a score of 0.
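
To make the "add 1" point concrete, here is a rough Python sketch of the 
arithmetic for a single document. The function name, the "offset" 
parameter and the sample scores are all made up for illustration; the 
per-category scores just stand in for whatever query(category:...) would 
return for that document.

```python
def boost_value(category_scores, z_scores, offset=1.0):
    """Weighted sum used as a multiplicative boost for one document.

    category_scores: {category: score of query(category:X) for this doc}
    z_scores: {category: user's std devs above the global average}
    offset: value added to every weight (hypothetical parameter)
    """
    return sum(score * (z_scores.get(cat, 0.0) + offset)
               for cat, score in category_scores.items())


# A document matching Drama and Comedy, with made-up sub-query scores.
doc_scores = {"Drama": 0.8, "Comedy": 0.5}
# The "most average person imaginable": 0 std devs everywhere.
average_user = {"Drama": 0.0, "Comedy": 0.0}

raw = boost_value(doc_scores, average_user, offset=0.0)      # weights as-is
shifted = boost_value(doc_scores, average_user, offset=1.0)  # weights + 1
print(raw, shifted)
```

With the raw weights the multiplier collapses to 0 for the average user; 
with the offset it is just the plain sum of the sub-query scores.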

Are you picking the "top 3" categories the user prefers as a cut off, or 
are you arbitrarily using N category boosts for however many N categories 
the user is above the global average in their pref for that category?

Are your preferences coming from explicit user feedback on the categories 
(ie: "rate how much you like comedies on a scale of 1-5") or are you 
inferring it from user ratings of the movies themselves? (ie: "rate this 
movie, which happens to be a scifi,action,comedy, on a scale of 1-5") ... 
because if it's the latter you probably want to be careful to also 
normalize based on how many categories the movie is in.
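
One way that normalization might look, as a Python sketch: each movie 
rating is spread across its categories with weight 1/N, so a movie in 
three categories says less about any one of them than a single-category 
movie does. The 1/N weighting and the function name are assumptions on my 
part, not the only reasonable choice.

```python
from collections import defaultdict


def category_averages(ratings):
    """Per-category average rating for one user.

    ratings: list of (rating, [categories]) tuples.
    Each rating counts toward a category with weight 1/len(categories)
    (an assumed weighting scheme).
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for rating, categories in ratings:
        if not categories:
            continue  # nothing to attribute this rating to
        w = 1.0 / len(categories)
        for cat in categories:
            weighted_sum[cat] += rating * w
            weight_total[cat] += w
    return {cat: weighted_sum[cat] / weight_total[cat] for cat in weighted_sum}


ratings = [
    (5, ["scifi", "action", "comedy"]),  # counts 1/3 toward each category
    (1, ["comedy"]),                     # counts fully toward comedy
]
print(category_averages(ratings))
```

In this made-up example the lone comedy rating pulls the comedy average 
down harder than the three-category movie pulls it up, where a naive 
average would treat both ratings equally.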

the other thing to consider is whether you want to include "negative 
preferences" (ie: weights less than 1) based on how many std devs the user's 
average is *below* the global average for a category .. in this case i 
*think* you'd want to divide -1 by the raw value to get a useful 
multiplier (eg: a raw value of -2 becomes -1/-2 = 0.5).
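
A small Python sketch of that mapping, combined with the "add 1" rule for 
the non-negative case. The function name is hypothetical, and this is 
only one reading of the idea, not something i've tested against real 
data.

```python
def weight_from_zscore(z):
    """Turn a z-score into a multiplicative weight.

    z >= 0: 1 + z   (an average preference gives a neutral weight of 1)
    z <  0: -1 / z  (e.g. -2 std devs gives 0.5, -4 std devs gives 0.25)
    """
    if z >= 0:
        return 1.0 + z
    return -1.0 / z


for z in (1.5, 0.0, -0.5, -2.0, -4.0):
    print(z, weight_from_zscore(z))
```

Note that a raw value between -1 and 0 comes out greater than 1 under 
this formula (-0.5 becomes 2.0), so mildly negative preferences would 
need some separate handling if they are supposed to act as a penalty.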

Alternatively: you could experiment with using the weights as exponents 
instead of multipliers...

b=sum(pow(query($cat1),1.482),pow(query($cat2),0.1199),pow(query($cat3),1.448))

...that would simplify the math you'd have to worry about both for the 
"totally boring average user" (x**0 = 1) and for the categories users hate 
(x**-5 = some positive fraction that will act as a penalty) ... but you'd 
definitely need to run some tests to see if it "over boosts" as the std 
dev variations get really high (might want to take a root first before 
using them as the exponent)
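
If it helps, here is a rough Python sketch of building that function 
string from a user's z-scores, with an optional root taken first. The 
helper names, the sign-preserving square root, and the mapping of 
$cat1..$catN to category:Drama etc. are all assumptions for illustration; 
only sum, pow and query come from the example above.

```python
import math


def dampen(z):
    """Sign-preserving square root (one hypothetical choice of "root")."""
    return math.copysign(math.sqrt(abs(z)), z)


def exponent_boost(z_scores, dampened=False):
    """Return (b, params): the function string and its $catN parameters.

    z_scores: {category: z-score}, used directly as exponents unless
    dampened=True, in which case dampen() is applied first.
    """
    params = {}
    terms = []
    for i, (category, z) in enumerate(z_scores.items(), start=1):
        name = f"cat{i}"
        params[name] = f"category:{category}"  # assumed field name
        exponent = dampen(z) if dampened else z
        terms.append(f"pow(query(${name}),{exponent:g})")
    return "sum(" + ",".join(terms) + ")", params


b, params = exponent_boost({"Drama": 1.482, "Comedy": 0.1199, "Action": 1.448})
print("b=" + b)
print(params)
```

One more thing to check in those tests: a negative exponent only yields 
a fraction when the sub-query score is above 1 (2**-5 is 0.03125, but 
0.5**-5 is 32), so the range of scores that query() actually returns for 
your categories matters here.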



-Hoss
