Re: Boosting documents by categorical preferences

2014-01-30 Thread Amit Nithian
Chris,

Sounds good! Thanks for the tips. I'll be glad to submit my talk there,
as I have a writeup pretty much ready to go.

Cheers
Amit



Re: Boosting documents by categorical preferences

2014-01-28 Thread Chris Hostetter

: The initial results seem to be kinda promising... of course there are many
: more optimizations I could do like decay user ratings over time to indicate
: that preferences decay over time so a 5 rating a year ago doesn't count as
: much as a 5 rating today.
: 
: Hope this helps others. I'll open source what I have soon and post back. If
: there is feedback or other thoughts let me know!

Hey Amit,

Glad to hear your user-based boosting experiments are paying off.  I would 
definitely love to see a more detailed writeup down the road showing off 
how it affects your final user metrics -- or perhaps you could even give a 
session on your technique at ApacheCon?

http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp


-Hoss
http://www.lucidworks.com/


Re: Boosting documents by categorical preferences

2014-01-27 Thread Amit Nithian
Hi Chris (and others interested in this),

Sorry for dropping off. I got sidetracked with other work, but I came back to
this and finally got a V1 implemented.

The final process is as follows:
1) Pre-compute the global per-category num_ratings/average/std-dev (so for
Action the average rating may be 3.49 with a std-dev of 0.99).
2) For a given user, retrieve the last X ratings (X for me is 10) and
compute the user's categorical affinities: take the user's average rating for
all movies in a particular category (e.g. Action), subtract the global
category average, and divide by the category std-dev. Then multiply this
z-score by the fraction of the user's retrieved ratings in that category.
   - For example, if a user's last 10 ratings consisted of 9/10 Drama and
1/10 Thriller, the z-score for Thriller should be discounted relative to
that for Drama, so that the user's preference (either positive or negative)
for Drama is more prominent.
3) Sort by the absolute value of the z-score (thanks Hossman, great
thought).
4) Return the top 3 (an arbitrary number).
5) Modify the query to look like the following:

qq=tom hanks
q={!boost b=$b defType=edismax v=$qq}
cat1=category:Children
cat2=category:Fantasy
cat3=category:Animation
b=sum(1,sum(product(query($cat1),0.22267872),product(query($cat2),0.21630952),product(query($cat3),0.21120241)))

basically b = 1+(pref1*query(category:something1) +
pref2*query(category:something2) + pref3*query(category:something3))
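
A minimal Python sketch of steps 1 through 5, assuming ratings arrive as (category, rating) pairs and the global stats are already pre-computed. The function names (`category_affinities`, `top_preferences`, `boost_params`) and the input shapes are hypothetical and not taken from the actual implementation. The generated `b` uses a flat sum(1, ...), which should be arithmetically equivalent to the nested sum.

```python
from collections import defaultdict


def category_affinities(recent_ratings, global_stats):
    """Discounted z-score per category (step 2).

    recent_ratings: list of (category, rating) for the user's last X ratings.
    global_stats:   {category: (mean, std_dev)}, pre-computed (step 1).
    """
    by_cat = defaultdict(list)
    for category, rating in recent_ratings:
        by_cat[category].append(rating)

    total = len(recent_ratings)
    affinities = {}
    for category, ratings in by_cat.items():
        mean, std_dev = global_stats[category]
        z = (sum(ratings) / len(ratings) - mean) / std_dev
        # Discount by the share of the user's recent ratings in this category.
        affinities[category] = z * len(ratings) / total
    return affinities


def top_preferences(affinities, n=3):
    """Steps 3 and 4: strongest preferences first, positive or negative."""
    ranked = sorted(affinities.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:n]


def boost_params(prefs):
    """Step 5: build the cat1..catN and b request parameters."""
    params = {}
    terms = []
    for i, (category, weight) in enumerate(prefs, start=1):
        params[f"cat{i}"] = f"category:{category}"
        terms.append(f"product(query($cat{i}),{weight:.4f})")
    # With no preferences, fall back to a neutral boost of 1.
    params["b"] = "sum(1," + ",".join(terms) + ")" if terms else "1"
    return params
```

The returned dictionary would be merged with qq and q before sending the request; how that request is issued is left out here.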

The initial results seem promising. Of course, there are many more
optimizations I could do, such as decaying user ratings over time so that
a 5 rating from a year ago doesn't count as much as a 5 rating today.
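
One possible way to sketch the time-decay idea in Python is an exponentially weighted average, where each rating's weight halves after a fixed number of days. The half-life of 180 days, the function name `decayed_average`, and the (rating, age_in_days) input shape are all assumptions for illustration; the thread does not specify a decay scheme.

```python
def decayed_average(ratings, half_life_days=180.0):
    """Weighted mean of ratings where older ratings count less.

    ratings: non-empty list of (rating, age_in_days).
    A rating that is half_life_days old gets half the weight of a fresh one.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    weights = [0.5 ** (age / half_life_days) for _, age in ratings]
    weighted_sum = sum(r * w for (r, _), w in zip(ratings, weights))
    return weighted_sum / sum(weights)
```

The decayed average could then stand in for the plain per-category average before the z-score is computed.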

Hope this helps others. I'll open source what I have soon and post back. If
there is feedback or other thoughts let me know!

Cheers
Amit



Re: Boosting documents by categorical preferences

2013-11-22 Thread Chris Hostetter

: I thought about that but my concern/question was how. If I used the pow
: function then I'm still boosting the bad categories by a small
: amount..alternatively I could multiply by a negative number but does that
: work as expected?

I'm not sure I understand your concern: negative powers would give you 
values less than 1, positive powers would give you values greater than 1, 
and then you'd use those values as multiplicative boosts -- so the values 
less than 1 would penalize the scores of existing matching docs in the 
categories the user dislikes.

Oh wait ... I see, in your original email (and in my subsequent suggested 
tweak to use pow()) you were talking about sum()ing up these 3 category 
boosts (and I cut/pasted sum() in my example as well) ... yeah, 
using multiplication there would make more sense if you wanted to do the 
negative preferences as well, because then the score of any matching doc 
will be reduced if it matches on an undesired category -- and the 
amount it will be reduced will be determined by how strongly it 
matches on that category (ie: the base score returned by the nested 
query() func) and how negative the undesired preference value (ie: 
the pow() exponent) is.


qq=...
q={!boost b=$b v=$qq}
b=product(pow(query($cat1),$cat1z),pow(query($cat2),$cat2z),pow(query($cat3),$cat3z))
cat1=...action...
cat1z=1.48
cat2=...comedy...
cat2z=1.33
cat3=...kids...
cat3z=-1.7
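
A small Python sketch of the arithmetic behind this multiplicative form, to make the penalty effect concrete. It assumes that a doc which does not match a category contributes a neutral factor of 1 (in Solr this would correspond to giving query() a default of 1, e.g. query($cat1,1)); that default and the name `multiplicative_boost` are assumptions, not part of the example above. Note also that the "negative exponent penalizes" reading only holds when the category score is greater than 1; for scores between 0 and 1 the direction flips.

```python
import math


def multiplicative_boost(category_scores, exponents):
    """Product of score**z over the categories.

    category_scores: per-category score, or None when the doc does not match.
    exponents:       the user's z-score for each category.
    """
    boost = 1.0
    for score, z in zip(category_scores, exponents):
        if score is None:
            continue  # assumed neutral factor for non-matching categories
        boost *= math.pow(score, z)
    return boost
```

With exponents 1.48, 1.33 and -1.7, a doc scoring 2.0 only in the first and third categories ends up with a boost below 1, because the disliked category outweighs the liked one.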


-Hoss


Re: Boosting documents by categorical preferences

2013-11-20 Thread Amit Nithian
I thought about that but my concern/question was how. If I used the pow
function then I'm still boosting the bad categories by a small
amount..alternatively I could multiply by a negative number but does that
work as expected?

I haven't done much with negative boosting except for the sledgehammer
approach of category exclusion through filters.

Thanks
Amit

Re: Boosting documents by categorical preferences

2013-11-19 Thread Chris Hostetter
: My approach was something like:
: 1) Look at the categories that the user has preferred and compute the
: z-score
: 2) Pick the top 3 among those
: 3) Use those to boost search results.

I think that totally makes sense ... the additional bit I was suggesting 
that you consider is that instead of picking the highest 3 z-scores, 
pick the z-scores with the greatest absolute value ... that way if someone 
is a very boring person and their positive interests are all basically 
exactly the same as the mean for everyone else, but they have some very 
strong dis-interests, you don't bother boosting on those minuscule 
interests and instead you negatively boost on the things they are 
antagonistic toward.
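
To illustrate the difference, here is a short Python comparison of the two selection rules on made-up z-scores for a user with weak likes and strong dislikes. The category names, values, and function names are hypothetical.

```python
def pick_by_value(z_scores, n=3):
    """Highest z-scores first: only ever finds positive interests."""
    return sorted(z_scores, key=lambda c: z_scores[c], reverse=True)[:n]


def pick_by_magnitude(z_scores, n=3):
    """Largest |z| first: finds strong dislikes as well as strong likes."""
    return sorted(z_scores, key=lambda c: abs(z_scores[c]), reverse=True)[:n]


z_scores = {
    "Action": 0.05,
    "Comedy": 0.02,
    "Drama": 0.01,
    "Horror": -2.1,
    "Kids": -1.7,
}
by_value = pick_by_value(z_scores)          # weak likes only
by_magnitude = pick_by_magnitude(z_scores)  # strong dislikes come first
```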


-Hoss


Re: Boosting documents by categorical preferences

2013-11-18 Thread Amit Nithian
Hey Chris,

Sorry for the delay and thanks for your response. This was inspired by your
talk on boosting and biasing that you presented way back when at a meetup.
I'm glad that my general approach seems to make sense.

My approach was something like:
1) Look at the categories that the user has preferred and compute the
z-score
2) Pick the top 3 among those
3) Use those to boost search results.

I'll look at using the boosts as an exponent instead of a multiplier, as I
think that would make sense, especially since it also handles the 0 case.

This is for a prototype I am doing but I'll share the results one day in a
meetup as I think it'll be kinda interesting.

Thanks again
Amit



Re: Boosting documents by categorical preferences

2013-11-14 Thread Chris Hostetter

: I have a question around boosting. I wanted to use the boost= to write a
: nested query that will boost a document based on categorical preferences.

You have no idea how stoked I am to see you working on this in a real 
world application.

: Currently I have the weights set to the z-score equivalent of a user's
: preference for that category which is simply how many standard deviations
: above the global average is this user's preference for that movie category.
: 
: My question though is basically whether or not semantically the equation
: query(category:Drama)*some weight + query(category:Comedy)*some weight
: + query(category:Action)*some weight makes sense?

My gut says that your approach makes sense -- but if I'm 
understanding you correctly, I think that you need to add 1 to 
all your weights: the boost is a multiplier, so if someone's rating for 
every category is 0 std devs above the average rating (ie: the most 
average person imaginable), you don't want to give every movie in every 
category a score of 0.
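
A tiny Python illustration of the zero-boost problem, assuming the final score is the base relevance score multiplied by the boost. The function name `weighted_boost` and the sample numbers are made up for the example, and the offset is applied once to the whole sum, which is one reading of "add 1".

```python
def weighted_boost(category_scores, weights, add_one=True):
    """Sum of score * weight over categories, optionally offset by 1."""
    total = sum(s * w for s, w in zip(category_scores, weights))
    return 1.0 + total if add_one else total


base_score = 4.2                  # hypothetical relevance score of a doc
category_scores = [1.0, 1.0, 1.0]
average_user = [0.0, 0.0, 0.0]    # 0 std devs from the mean everywhere

zeroed = base_score * weighted_boost(category_scores, average_user, add_one=False)
kept = base_score * weighted_boost(category_scores, average_user)
```

Without the offset every doc scores 0 for the perfectly average user; with it the original ranking is left untouched.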

Are you picking the top 3 categories the user prefers as a cut off, or 
are you arbitrarily using N category boosts for however many N categories 
the user is above the global average in their pref for that category?

Are your preferences coming from explicit user feedback on the categories 
(ie: rate how much you like comedies on a scale of 1-5) or are you 
inferring them from user ratings of the movies themselves? (ie: rate this 
movie, which happens to be a scifi,action,comedy, on a scale of 1-5) ... 
because if it's the latter you probably want to be careful to also 
normalize based on how many categories the movie is in.
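
One way to read "normalize based on how many categories the movie is in" is to split each movie rating evenly across its categories, so a three-category movie counts one third toward each. The Python sketch below takes that reading; it is an assumption, as are the function name `per_category_means` and the input shape.

```python
from collections import defaultdict


def per_category_means(movie_ratings):
    """Weighted mean rating per category.

    movie_ratings: list of (rating, [categories]); each rating contributes
    a weight of 1/len(categories) to every category the movie belongs to.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for rating, categories in movie_ratings:
        share = 1.0 / len(categories)
        for category in categories:
            weighted_sum[category] += rating * share
            weight_total[category] += share
    return {c: weighted_sum[c] / weight_total[c] for c in weighted_sum}
```

A 5 for a scifi,action,comedy movie and a 2 for a pure comedy then give a comedy mean closer to 2 than a plain average would.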

The other thing to consider is whether you want to include negative 
preferences (ie: weights less than 1) based on how many std devs the user's 
average is *below* the global average for a category .. in this case I 
*think* you'd want to divide -1 by the raw value to get a useful 
multiplier.

Alternatively: you could experiment with using the weights as exponents 
instead of multipliers...

b=sum(pow(query($cat1),1.482),pow(query($cat2),0.1199),pow(query($cat3),1.448))

...that would simplify the math you'd have to worry about both for the 
totally boring average user (x**0 = 1) and for the categories users hate 
(x**-5 = some positive fraction that will act as a penalty) ... but you'd 
definitely need to run some tests to see if it over-boosts as the std 
dev variations get really high (might want to take a root first before 
using them as the exponent)
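
A short Python sketch of the exponent idea, including one possible form of "take a root first": a signed square root that keeps the sign of the z-score but shrinks large magnitudes. The choice of square root and the names `damped_exponent` and `exponent_boost` are assumptions for illustration.

```python
import math


def damped_exponent(z):
    """Signed square root of z: same sign, smaller magnitude for |z| > 1."""
    return math.copysign(math.sqrt(abs(z)), z)


def exponent_boost(score, z, damp=False):
    """score raised to the user's z-score (optionally damped first)."""
    return math.pow(score, damped_exponent(z) if damp else z)
```

For a category score of 2.0, a z-score of 0 gives a neutral 1.0, a z-score of -5 gives 2**-5, and damping a z-score of 4 down to 2 reduces the boost from 16.0 to 4.0.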



-Hoss


Boosting documents by categorical preferences

2013-11-12 Thread Amit Nithian
Hi all,

I have a question around boosting. I wanted to use the boost= to write a
nested query that will boost a document based on categorical preferences.

For a movie search for example, say that a user likes drama, comedy, and
action. I could use things like

qq=
q={!boost b=$b defType=edismax v=$qq}
b=sum(product(query($cat1),1.482),product(query($cat2),0.1199),product(query($cat3),1.448))
cat1=category:Drama
cat2=category:Comedy
cat3=category:Action

where cat1=Drama cat2=Comedy cat3=Action

Currently I have the weights set to the z-score equivalent of a user's
preference for that category, which is simply how many standard deviations
above the global average this user's preference for that movie category is.

My question though is basically whether or not semantically the equation
query(category:Drama)*some weight + query(category:Comedy)*some weight
+ query(category:Action)*some weight makes sense?

What are some techniques people use to boost documents based on discrete
things like category, manufacturer, genre etc?

Thanks!
Amit