Re: Realtime update of similarity matrices

James Donnelly Mon, 22 Jun 2015 09:00:55 -0700

Ted, thanks for the video - enjoyable and insightful.

Gustavo, a good read, and a reminder of how far I have to go.  More maths
later - fun!

Pat, I need to read more and take my time understanding how cut-offs in
LLR-derived co-occurrence can be exploited in practice.  I accept that useful
real-time model updates are an edge case, but I may have to face edge cases.
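
To check my own understanding of the LLR cut-off, here is a minimal sketch of
the log-likelihood ratio score on a 2x2 co-occurrence table, as I understand
the usual formulation.  The function names and the threshold value are my own
assumptions for illustration, not Mahout's API or defaults.

```python
import math


def x_log_x(x):
    """x * ln(x), with the convention that 0 * ln(0) is 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalised entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """LLR score for a 2x2 table of counts.

    k11: users who interacted with both A and B
    k12: users who interacted with A but not B
    k21: users who interacted with B but not A
    k22: users who interacted with neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def keep_indicator(score, threshold=10.0):
    """Hypothetical cut-off: keep a pair only if its score clears the threshold."""
    return score >= threshold
```

A table where A and B are independent scores close to zero and is dropped by
the cut-off; a table where they almost always occur together scores high and
is kept.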

I mentioned the framework I'm putting together - I didn't mention that
we're a SaaS business.  The product will serve multiple use cases.

The cold start capabilities of the multi-modal approach are appealing.  I
can see content recommendations filling the gap while we build the
user-item model - this won't work for all product types of course.

There are clients whose 'products' are fairly short-lived, where the initial
burst of user-item interactions would definitely be useful.  I take your
point that small increment sets might not impact the model in most cases.

My takeaway from the responses so far is that the real-time question can
wait until phase n of the project without sacrificing much value.  I'm
looking forward to learning what is possible - I see what you are saying
about the mutable vectors.
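
As I read Pat's description (quoted below), the state to keep is a mutable
cooccurrence matrix plus two count vectors.  A minimal in-memory sketch of
that idea follows; the class name is hypothetical, and the per-user history
set is my own assumption, added so the update knows which pairs to increment.

```python
from collections import Counter, defaultdict


class IncrementalCooccurrence:
    """Toy in-memory store; a real system would persist this elsewhere."""

    def __init__(self):
        self.cooc = defaultdict(Counter)   # item -> {other item -> count}
        self.item_counts = Counter()       # item/column interaction counts
        self.user_counts = Counter()       # user/row interaction counts
        self.history = defaultdict(set)    # assumption: items seen per user

    def add(self, user, item):
        """Record one interaction; return False if it was already known."""
        seen = self.history[user]
        if item in seen:
            return False
        # Every item already in this user's history now co-occurs with `item`.
        for other in seen:
            self.cooc[item][other] += 1
            self.cooc[other][item] += 1
        seen.add(item)
        self.item_counts[item] += 1
        self.user_counts[user] += 1
        return True
```

The counts held here are the inputs an LLR-style score needs, so in principle
only the rows touched by a new interaction would have to be re-scored.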

The great thing now for me is that I can do an end-to-end proof of concept
mostly by doing framework plumbing.  Maybe I'll look into doing multiple
cross-cooccurrence indicators in one pass via the ItemSimilarityDriver, but
once we get the basics functioning, we'll probably be looking to engage a
Ted or a Pat if we can afford them :D
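
For the plumbing side, this is roughly how I picture the single query across
several indicator fields that Pat describes below.  It is only a sketch that
builds an Elasticsearch-style query body as a plain dict; the function name
and the "<action>_indicators" field naming are my own assumptions, and
nothing is sent to a search engine here.

```python
def build_rec_query(history_by_action, size=10):
    """Build one bool/should query from a user's history.

    history_by_action maps an action name (e.g. "purchase", "view") to the
    item ids the user has interacted with through that action.
    """
    should = []
    for action, item_ids in history_by_action.items():
        if not item_ids:
            continue  # no history for this action, so no clause
        # Hypothetical field name: one indexed field per indicator matrix.
        field = f"{action}_indicators"
        should.append({"terms": {field: list(item_ids)}})
    return {"size": size, "query": {"bool": {"should": should}}}
```

Each clause matches the user's history for one action against the field that
holds the corresponding indicator matrix, so items matching on more fields
should rank higher.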

There is one final challenge for today I have not figured out though.
Let's say I have a new client (client #2), who sells shoes.  Let's say I
have an existing client (client #1), for whom we have captured a million
user-view/purchase interactions.  How can I recommend to client #2 based on
the model built from client #1?

The items in their respective inventories are similar by content, but not
identical.  So I need to map the content similarities across the product
data sets, then via that mapping, apply pseudo-collaborative filtering to
client #2's customers.
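
To make the question concrete, here is a naive sketch of what I mean.  It
assumes item descriptions are plain text, uses Jaccard overlap of tokens as a
stand-in for real content similarity, and assumes client #1's indicators are
available as a dict of item id to similar item ids.  All names and the 0.3
threshold are hypothetical.

```python
from collections import Counter, defaultdict


def tokens(text):
    return set(text.lower().split())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_catalog(source_items, target_items, min_sim=0.3):
    """Map each target (client #2) item to its most similar source (client #1) item."""
    mapping = {}
    for t_id, t_text in target_items.items():
        best_id, best_sim = None, 0.0
        for s_id, s_text in source_items.items():
            sim = jaccard(tokens(t_text), tokens(s_text))
            if sim > best_sim:
                best_id, best_sim = s_id, sim
        if best_id is not None and best_sim >= min_sim:
            mapping[t_id] = best_id  # items with no good match stay unmapped
    return mapping


def recommend(history, mapping, source_indicators, top_n=3):
    """Recommend client #2 items via client #1's item-item indicators."""
    reverse = defaultdict(list)  # client #1 item -> client #2 items mapped to it
    for t_id, s_id in mapping.items():
        reverse[s_id].append(t_id)
    scores = Counter()
    for t_id in history:
        s_id = mapping.get(t_id)
        if s_id is None:
            continue
        for similar in source_indicators.get(s_id, []):
            for candidate in reverse.get(similar, []):
                if candidate not in history:
                    scores[candidate] += 1
    return [item for item, _ in scores.most_common(top_n)]
```

Items in client #2's inventory with no close counterpart are simply never
recommended by this route, which is one of the gaps I am unsure how to handle.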

Thoughts?

Many thanks for your time once again.

On 22 June 2015 at 01:32, Pat Ferrel <[email protected]> wrote:

> Actually Mahout’s item and row similarity calculate the cooccurrence and
> cross-cooccurrence matrices; a search engine performs the kNN calc to
> return an ordered list of recs. The search query is the user's history, and
> the search engine finds the most similar items from the cooccurrence and
> cross-cooccurrence matrices, which are kept in different fields. This means
> there is only one query across several matrices. Solr and Elasticsearch are
> well known for speed and scalability in serving these queries.
>
> In a hypothetical incremental model we might use the search engine as
> matrix storage since an incremental update to the matrix would be indexed
> in realtime by the engine. The update method Ted mentions is relatively
> simple and only requires that the cooccurrence matrices be mutable and two
> mutable vectors be kept in memory (item/column and user/row interaction
> counts).
>
> On Jun 19, 2015, at 6:47 PM, Gustavo Frederico <
> [email protected]> wrote:
>
> James,
>
>   From my days at the university I remember reinforcement learning (
> https://en.wikipedia.org/wiki/Reinforcement_learning )
>  I suspect reinforcement learning is interesting to explore in the problem
> of e-commerce recommendation. My academic stuff is really rusty, but it's
> one of the few models that represent well the synchronous/asynchronous
> problem that we see in e-commerce systems...
>  The models I'm seeing with Mahout + Solr (by MapR et al.) have Solr do
> the work to calculate the co-occurrence indicators. So the fact Solr is
> indexing this 'from scratch' during offline learning 'throws the whole
> model into the garbage soon' and doesn't leave room for the
> optimization/reward step of reinforcement learning. I don't know if someone
> could go on the theoretical side and tell us if perhaps there's a 'mapping'
> between the reinforcement learning model and the traditional off-line
> training + on-line testing. Maybe there's a way to shorten the Solr
> indexing cycle, but I'm not sure how to 'inject' the reward in the model...
> just some thoughts...
>
> cheers
>
> Gustavo
>
>
>
> On Fri, Jun 19, 2015 at 5:35 AM, James Donnelly <[email protected]>
> wrote:
>
> > Hi,
> >
> > First of all, a big thanks to Ted and Pat, and all the authors and
> > developers around Mahout.
> >
> > I'm putting together an eCommerce recommendation framework, and have a
> > couple of questions from using the latest tools in Mahout 1.0.
> >
> > I've seen it hinted by Pat that real-time updates (incremental learning)
> > are made possible with the latest Mahout tools here:
> >
> > http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
> >
> > But once I have gone through the first phase of data processing, I'm not
> > clear on the basic direction for maintaining the generated data, e.g. with
> > added products and incremental user behaviour data.
> >
> > The only way I can see is to update my input data, then re-run the entire
> > process of generating the similarity matrices using the itemSimilarity and
> > rowSimilarity jobs.  Is there a better way?
> >
> > James
> >
>
>
