: I have a question around boosting. I wanted to use the &boost= to write a
: nested query that will boost a document based on categorical preferences.

You have no idea how stoked I am to see you working on this in a real world
application.

: Currently I have the weights set to the z-score equivalent of a user's
: preference for that category which is simply how many standard deviations
: above the global average is this user's preference for that movie category.
:
: My question though is basically whether or not semantically the equation
: query(category:Drama)*<some weight> + query(category:Comedy)*<some weight>
: + query(category:Action)*<some weight> makes sense?

My gut says that your approach makes sense -- but if I'm understanding you
correctly, I think that you need to add "1" to all your weights: the "boost"
is a multiplier, so if someone's rating for every category is 0 std devs
above the average rating (ie: the most average person imaginable), you don't
want to give every movie in every category a score of 0.

Are you picking the "top 3" categories the user prefers as a cut off, or are
you arbitrarily using N category boosts for however many N categories the
user is above the global average in their pref for that category?

Are your preferences coming from explicit user feedback on the categories
(ie: "rate how much you like comedies on a scale of 1-5") or are you
inferring them from user ratings of the movies themselves? (ie: "rate this
movie, which happens to be a scifi,action,comedy, on a scale of 1-5") ...
because if it's the latter you probably want to be careful to also normalize
based on how many categories the movie is in.

The other thing to consider is whether you want to include "negative
preferences" (ie: weights less than 1) based on how many std devs the user's
average is *below* the global average for a category ... in this case I
*think* you'd want to divide -1 by the raw value to get a useful multiplier
(eg: -2 std devs becomes a multiplier of 0.5).

Alternatively: you could experiment with using the weights as exponents
instead of multipliers...

b=sum(pow(query($cat1),1.482),pow(query($cat2),0.1199),pow(query($cat3),1.448))

...that would simplify the math you'd have to worry about both for the
"totally boring average user" (x**0 = 1) and for the categories users hate
(x**-5 = some positive fraction that will act as a penalty) ... but you'd
definitely need to run some tests to see if it "over boosts" as the std dev
variations get really high (might want to take a root first before using
them as the exponent)

-Hoss
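
For anyone who wants to play with the two options above, here is a rough
Python sketch of the weight math and of building the function strings. The
helper names (zscore_to_multiplier, multiplier_boost, exponent_boost) and
the $cat1..$cat3 parameter names are made up for illustration. Clamping
small negative z-scores to 1.0 is an assumption of this sketch, not
something settled in the thread, so treat it as a starting point for
experiments rather than a recommendation.

```python
def zscore_to_multiplier(z):
    """Turn a z-score into a multiplicative weight.

    z >= 0: add 1, so the perfectly average user (z == 0) gets 1.0.
    z <  0: divide -1 by z, so -2 std devs becomes 0.5.
    ASSUMPTION: for -1 < z < 0 the raw -1/z is above 1 (it would boost),
    so this sketch clamps the result to at most 1.0.
    """
    if z >= 0:
        return 1.0 + z
    return min(1.0, -1.0 / z)


def multiplier_boost(weights):
    """Build sum(product(query($name),w),...) from {param name: weight}."""
    terms = [f"product(query(${name}),{w:g})" for name, w in weights.items()]
    return "sum(" + ",".join(terms) + ")"


def exponent_boost(weights):
    """Build sum(pow(query($name),w),...), using the weights as exponents."""
    terms = [f"pow(query(${name}),{w:g})" for name, w in weights.items()]
    return "sum(" + ",".join(terms) + ")"


if __name__ == "__main__":
    # Hypothetical per-user z-scores, keyed by request parameter name.
    zscores = {"cat1": 1.482, "cat2": 0.1199, "cat3": 1.448}

    multipliers = {k: zscore_to_multiplier(z) for k, z in zscores.items()}
    print(multiplier_boost(multipliers))
    print(exponent_boost(zscores))
```

Each $catN would still need to be passed as its own request parameter
holding the category query (eg: cat1=category:Drama), and either string
would need testing against real data before trusting the scores it
produces.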