Re: Boosting documents by categorical preferences
Chris,

Sounds good! Thanks for the tips. I'll be glad to submit my talk to this, as I have a writeup pretty much ready to go.

Cheers
Amit

On Tue, Jan 28, 2014 at 11:24 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

> : The initial results seem to be kinda promising... of course there are many
> : more optimizations I could do like decay user ratings over time to indicate
> : that preferences decay over time so a 5 rating a year ago doesn't count as
> : much as a 5 rating today.
> :
> : Hope this helps others. I'll open source what I have soon and post back. If
> : there is feedback or other thoughts let me know!
>
> Hey Amit,
>
> Glad to hear your user based boosting experiments are paying off. I would definitely love to see a more detailed writeup down the road showing off how it affects your final user metrics -- or perhaps even give a session on your technique at ApacheCon?
>
> http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp
>
> -Hoss
> http://www.lucidworks.com/
Re: Boosting documents by categorical preferences
: The initial results seem to be kinda promising... of course there are many
: more optimizations I could do like decay user ratings over time to indicate
: that preferences decay over time so a 5 rating a year ago doesn't count as
: much as a 5 rating today.
:
: Hope this helps others. I'll open source what I have soon and post back. If
: there is feedback or other thoughts let me know!

Hey Amit,

Glad to hear your user based boosting experiments are paying off. I would definitely love to see a more detailed writeup down the road showing off how it affects your final user metrics -- or perhaps even give a session on your technique at ApacheCon?

http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp

-Hoss
http://www.lucidworks.com/
Re: Boosting documents by categorical preferences
Hi Chris (and others interested in this),

Sorry for dropping off.. I got sidetracked with other work, came back to this, and finally got a V1 implemented. The final process is as follows:

1) Pre-compute the global categorical num_ratings/average/std-dev (so for Action the average rating may be 3.49 with a std-dev of .99).
2) For a given user, retrieve the last X (X for me is 10) ratings and compute the user's categorical affinities: take the user's average rating for all movies in a particular category (Action), subtract the global category average, and divide by the category std-dev. Then multiply this by the fraction of the user's total ratings that fall in that category.
   - For example, if a user's last 10 ratings consisted of 9/10 Drama and 1/10 Thriller, the z-score of Thriller should be discounted relative to that of Drama, so that the user's preference (either positive or negative) for Drama is more prominent.
3) Sort by the absolute value of the z-score (Thanks Hossman.. great thought).
4) Return the top 3 (arbitrary number).
5) Modify the query to look like the following:

  qq=tom hanks
  &q={!boost b=$b defType=edismax v=$qq}
  &cat1=category:Children
  &cat2=category:Fantasy
  &cat3=category:Animation
  &b=sum(1,sum(product(query($cat1),0.22267872),product(query($cat2),0.21630952),product(query($cat3),0.21120241)))

basically b = 1 + (pref1*query(category:something1) + pref2*query(category:something2) + pref3*query(category:something3))

The initial results seem to be kinda promising... of course there are many more optimizations I could do, like decaying user ratings over time to reflect that preferences decay, so a 5 rating a year ago doesn't count as much as a 5 rating today.

Hope this helps others. I'll open source what I have soon and post back. If there is feedback or other thoughts let me know!

Cheers
Amit

On Fri, Nov 22, 2013 at 11:38 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

> : I thought about that but my concern/question was how. If I used the pow
> : function then I'm still boosting the bad categories by a small
> : amount.. alternatively I could multiply by a negative number but does that
> : work as expected?
>
> I'm not sure i understand your concern: negative powers would give you values less than 1, positive powers would give you values greater than 1, and then you'd use those values as multiplicative boosts -- so the values less than 1 would penalize the scores of existing matching docs in the categories the user dislikes.
>
> Oh wait ... i see, in your original email (and in my subsequent suggested tweak to use pow()) you were talking about sum()ing up these 3 category boosts (and i cut/pasted sum() in my example as well) ... yeah, using multiplication there would make more sense if you wanted to do the negative preferences as well, because then the score of any matching doc will be reduced if it matches on an undesired category -- and the amount it will be reduced will be determined by how strongly it matches on that category (ie: the base score returned by the nested query() func) and how negative the undesired preference value (ie: the pow() exponent) is
>
>   qq=...
>   q={!boost b=$b v=$qq}
>   b=product(pow(query($cat1),$cat1z),pow(query($cat2),$cat2z),pow(query($cat3),$cat3z))
>   cat1=...action...
>   cat1z=1.48
>   cat2=...comedy...
>   cat2z=1.33
>   cat3=...kids...
>   cat3z=-1.7
>
> -Hoss
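Steps 1 through 4 of the process Amit describes in this message could be sketched roughly as follows. This is only an illustration under stated assumptions, not Amit's actual implementation: the function name `category_affinities`, the input shapes, and the Drama/Thriller statistics are hypothetical (only the Action figures of 3.49 and 0.99 come from the message), and a movie that belongs to several categories is assumed to contribute one `(category, rating)` pair per category.

```python
from collections import defaultdict
from statistics import mean


def category_affinities(recent_ratings, global_stats, top_n=3):
    """Return the top_n (category, weighted z-score) pairs by absolute value.

    recent_ratings: list of (category, rating) pairs for the user's last X ratings.
    global_stats:   {category: (global_mean, global_std_dev)}, pre-computed (step 1).
    """
    ratings_by_category = defaultdict(list)
    for category, rating in recent_ratings:
        ratings_by_category[category].append(rating)

    total = len(recent_ratings)
    weighted = {}
    for category, ratings in ratings_by_category.items():
        global_mean, global_std = global_stats[category]
        # Step 2: z-score of the user's average rating for this category...
        z = (mean(ratings) - global_mean) / global_std
        # ...discounted by the share of the user's ratings in this category.
        weighted[category] = z * (len(ratings) / total)

    # Steps 3 and 4: rank by absolute value so strong dislikes also surface.
    ranked = sorted(weighted.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]


# Hypothetical statistics; only the Action numbers appear in the thread.
stats = {"Action": (3.49, 0.99), "Drama": (3.6, 1.0), "Thriller": (3.5, 1.0)}
last_ten = [("Drama", 5)] * 9 + [("Thriller", 5)]
print(category_affinities(last_ten, stats))
```

With nine Drama ratings and one Thriller rating, the Thriller z-score is scaled by 1/10 and so ranks below Drama even though its raw z-score is slightly higher, which is the discounting effect the example in step 2 is after.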
Re: Boosting documents by categorical preferences
: I thought about that but my concern/question was how. If I used the pow
: function then I'm still boosting the bad categories by a small
: amount.. alternatively I could multiply by a negative number but does that
: work as expected?

I'm not sure i understand your concern: negative powers would give you values less than 1, positive powers would give you values greater than 1, and then you'd use those values as multiplicative boosts -- so the values less than 1 would penalize the scores of existing matching docs in the categories the user dislikes.

Oh wait ... i see, in your original email (and in my subsequent suggested tweak to use pow()) you were talking about sum()ing up these 3 category boosts (and i cut/pasted sum() in my example as well) ... yeah, using multiplication there would make more sense if you wanted to do the negative preferences as well, because then the score of any matching doc will be reduced if it matches on an undesired category -- and the amount it will be reduced will be determined by how strongly it matches on that category (ie: the base score returned by the nested query() func) and how negative the undesired preference value (ie: the pow() exponent) is

  qq=...
  q={!boost b=$b v=$qq}
  b=product(pow(query($cat1),$cat1z),pow(query($cat2),$cat2z),pow(query($cat3),$cat3z))
  cat1=...action...
  cat1z=1.48
  cat2=...comedy...
  cat2z=1.33
  cat3=...kids...
  cat3z=-1.7

-Hoss
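The arithmetic behind multiplying pow() terms can be checked outside Solr with a few lines of Python. This is a sketch of the math only; `multiplicative_boost` is a hypothetical helper, and the scores are made-up stand-ins for what the nested query() calls would return. One thing it makes visible: a negative exponent only penalizes when the base score is greater than 1. For a base between 0 and 1 the direction flips, and a base of exactly 0 (which, as far as I understand, is what query() yields for a non-matching document unless a default is supplied) cannot be raised to a negative power at all. Both points are worth verifying against real scores before relying on this form.

```python
import math


def multiplicative_boost(category_scores, z_scores):
    """Multiply score ** z over all categories (the product-of-pow form)."""
    return math.prod(score ** z for score, z in zip(category_scores, z_scores))


# A liked category (z > 0) with a base score above 1 raises the boost.
liked = multiplicative_boost([2.0], [1.48])

# A disliked category (z < 0) with a base score above 1 lowers it.
disliked = multiplicative_boost([2.0], [-1.7])

# Caveat: with a base score below 1, the negative exponent boosts instead.
flipped = multiplicative_boost([0.5], [-1.7])

# The perfectly average user (all z-scores 0) gets a neutral boost of 1.
neutral = multiplicative_boost([3.0, 2.0], [0, 0])

print(liked, disliked, flipped, neutral)
```

If category scores can fall below 1 or be 0, supplying a default of 1 for non-matching documents, or clamping the base to a minimum of 1, are possible ways to keep the penalty pointing in the intended direction.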
Re: Boosting documents by categorical preferences
I thought about that but my concern/question was how. If I used the pow function then I'm still boosting the bad categories by a small amount.. alternatively I could multiply by a negative number but does that work as expected? I haven't done much with negative boosting except for the sledgehammer approach of category exclusion through filters.

Thanks
Amit

On Nov 19, 2013 8:51 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

> : My approach was something like:
> : 1) Look at the categories that the user has preferred and compute the
> : z-score
> : 2) Pick the top 3 among those
> : 3) Use those to boost search results.
>
> I think that totally makes sense ... the additional bit i was suggesting that you consider is that instead of picking the highest 3 z-scores, pick the z-scores with the greatest absolute value ... that way if someone is a very boring person and their positive interests are all basically exactly the same as the mean for everyone else, but they have some very strong dis-interests, you don't bother boosting on those minuscule interests and instead you negatively boost on the things they are antagonistic against.
>
> -Hoss
Re: Boosting documents by categorical preferences
: My approach was something like:
: 1) Look at the categories that the user has preferred and compute the
: z-score
: 2) Pick the top 3 among those
: 3) Use those to boost search results.

I think that totally makes sense ... the additional bit i was suggesting that you consider is that instead of picking the highest 3 z-scores, pick the z-scores with the greatest absolute value ... that way if someone is a very boring person and their positive interests are all basically exactly the same as the mean for everyone else, but they have some very strong dis-interests, you don't bother boosting on those minuscule interests and instead you negatively boost on the things they are antagonistic against.

-Hoss
Re: Boosting documents by categorical preferences
Hey Chris,

Sorry for the delay and thanks for your response. This was inspired by your talk on boosting and biasing that you presented way back when at a meetup. I'm glad that my general approach seems to make sense.

My approach was something like:
1) Look at the categories that the user has preferred and compute the z-score
2) Pick the top 3 among those
3) Use those to boost search results.

I'll look at using the boosts as an exponent instead of a multiplier as I think that would make sense.. also as it handles the 0 case.

This is for a prototype I am doing but I'll share the results one day in a meetup as I think it'll be kinda interesting.

Thanks again
Amit

On Thu, Nov 14, 2013 at 11:11 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

> : I have a question around boosting. I wanted to use the boost= to write a
> : nested query that will boost a document based on categorical preferences.
>
> You have no idea how stoked I am to see you working on this in a real world application.
>
> : Currently I have the weights set to the z-score equivalent of a user's
> : preference for that category which is simply how many standard deviations
> : above the global average is this user's preference for that movie category.
> :
> : My question though is basically whether or not semantically the equation
> : query(category:Drama)*some weight + query(category:Comedy)*some weight
> : + query(category:Action)*some weight makes sense?
>
> My gut says that your approach makes sense -- but if i'm understanding you correctly, i think that you need to add 1 to all your weights: the boost is a multiplier, so if someone's rating for every category is 0 std devs above the average rating (ie: the most average person imaginable), you don't want to give every movie in every category a score of 0.
>
> Are you picking the top 3 categories the user prefers as a cut off, or are you arbitrarily using N category boosts for however many N categories the user is above the global average in their pref for that category?
>
> Are your preferences coming from explicit user feedback on the categories (ie: rate how much you like comedies on a scale of 1-5) or are you inferring it from user ratings of the movies themselves? (ie: rate this movie, which happens to be a scifi,action,comedy, on a scale of 1-5) ... because if it's the latter you probably want to be careful to also normalize based on how many categories the movie is in.
>
> the other thing to consider is whether you want to include negative preferences (ie: weights less than 1) based on how many std dev the user's average is *below* the global average for a category .. in this case i *think* you'd want to divide the raw value from -1 to get a useful multiplier.
>
> Alternatively: you could experiment with using the weights as exponents instead of multipliers...
>
>   b=sum(pow(query($cat1),1.482),pow(query($cat2),0.1199),pow(query($cat3),1.448))
>
> ...that would simplify the math you'd have to worry about both for the totally boring average user (x**0 = 1) and for the categories users hate (x**-5 = some positive fraction that will act as a penalty) ... but you'd definitely need to run some tests to see if it over boosts as the std dev variations get really high (might want to take a root first before using them as the exponent)
>
> -Hoss
Re: Boosting documents by categorical preferences
: I have a question around boosting. I wanted to use the boost= to write a
: nested query that will boost a document based on categorical preferences.

You have no idea how stoked I am to see you working on this in a real world application.

: Currently I have the weights set to the z-score equivalent of a user's
: preference for that category which is simply how many standard deviations
: above the global average is this user's preference for that movie category.
:
: My question though is basically whether or not semantically the equation
: query(category:Drama)*some weight + query(category:Comedy)*some weight
: + query(category:Action)*some weight makes sense?

My gut says that your approach makes sense -- but if i'm understanding you correctly, i think that you need to add 1 to all your weights: the boost is a multiplier, so if someone's rating for every category is 0 std devs above the average rating (ie: the most average person imaginable), you don't want to give every movie in every category a score of 0.

Are you picking the top 3 categories the user prefers as a cut off, or are you arbitrarily using N category boosts for however many N categories the user is above the global average in their pref for that category?

Are your preferences coming from explicit user feedback on the categories (ie: rate how much you like comedies on a scale of 1-5) or are you inferring it from user ratings of the movies themselves? (ie: rate this movie, which happens to be a scifi,action,comedy, on a scale of 1-5) ... because if it's the latter you probably want to be careful to also normalize based on how many categories the movie is in.

the other thing to consider is whether you want to include negative preferences (ie: weights less than 1) based on how many std dev the user's average is *below* the global average for a category .. in this case i *think* you'd want to divide the raw value from -1 to get a useful multiplier.

Alternatively: you could experiment with using the weights as exponents instead of multipliers...

  b=sum(pow(query($cat1),1.482),pow(query($cat2),0.1199),pow(query($cat3),1.448))

...that would simplify the math you'd have to worry about both for the totally boring average user (x**0 = 1) and for the categories users hate (x**-5 = some positive fraction that will act as a penalty) ... but you'd definitely need to run some tests to see if it over boosts as the std dev variations get really high (might want to take a root first before using them as the exponent)

-Hoss
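The zero-boost problem Hoss raises for the weighted-sum form is easy to reproduce numerically. The sketch below is plain Python rather than Solr; `additive_boost` is a hypothetical helper and the category scores are made up. Note that it applies a single constant offset to the whole sum, which is one way to read the "add 1" advice, not necessarily the only one.

```python
def additive_boost(category_scores, weights, offset=0.0):
    """Weighted sum of category scores, plus an optional constant offset."""
    return offset + sum(score * weight for score, weight in zip(category_scores, weights))


# Made-up scores standing in for what the nested category queries return.
scores = [2.0, 1.5, 0.7]

# The most average user imaginable: every z-score weight is 0.
average_user = [0, 0, 0]

# Without an offset the multiplicative boost is 0, which zeroes every score.
print(additive_boost(scores, average_user))

# With an offset of 1 the boost is neutral and the base ranking is unchanged.
print(additive_boost(scores, average_user, offset=1.0))
```

The offset also keeps documents that match none of the boosted categories from being multiplied by zero.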
Boosting documents by categorical preferences
Hi all,

I have a question around boosting. I wanted to use the boost= to write a nested query that will boost a document based on categorical preferences.

For a movie search for example, say that a user likes drama, comedy, and action. I could use things like

  qq=
  &q={!boost%20b=$b%20defType=edismax%20v=$qq}
  &b=sum(product(query($cat1),1.482),product(query($cat2),0.1199),product(query($cat3),1.448))
  &cat1=category:Drama
  &cat2=category:Comedy
  &cat3=category:Action

where cat1=Drama cat2=Comedy cat3=Action

Currently I have the weights set to the z-score equivalent of a user's preference for that category, which is simply how many standard deviations above the global average this user's preference for that movie category is.

My question though is basically whether or not semantically the equation query(category:Drama)*some weight + query(category:Comedy)*some weight + query(category:Action)*some weight makes sense?

What are some techniques people use to boost documents based on discrete things like category, manufacturer, genre etc?

Thanks!
Amit
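Assembling these parameters by hand is error-prone, since the spaces, braces, and `$` references all need URL encoding. A minimal sketch of one way to do it in Python with the standard library follows. The helper name `build_query_string` is hypothetical, "tom hanks" is just an example user query, and the parameter names simply mirror the ones in the message above.

```python
from urllib.parse import parse_qs, urlencode


def build_query_string(user_query, category_weights):
    """URL-encode the boost request for a list of (category, weight) pairs."""
    if not category_weights:
        raise ValueError("at least one (category, weight) pair is required")

    params = [
        ("qq", user_query),
        ("q", "{!boost b=$b defType=edismax v=$qq}"),
    ]
    terms = []
    for index, (category, weight) in enumerate(category_weights, start=1):
        params.append((f"cat{index}", f"category:{category}"))
        terms.append(f"product(query($cat{index}),{weight})")
    params.append(("b", "sum(" + ",".join(terms) + ")"))
    return urlencode(params)


query_string = build_query_string(
    "tom hanks", [("Drama", 1.482), ("Comedy", 0.1199), ("Action", 1.448)]
)
print(query_string)

# Decoding shows the parameters survive the round trip unchanged.
print(parse_qs(query_string)["b"][0])
```

Keeping each category in its own `catN` parameter and referencing it with `$catN` inside `b` means only the weights and category names change from user to user.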