Agreed. Downsampling was ignored in several places, and with it a great deal of the 
input becomes a no-op. Without downsampling, too many things need to change per interaction.

Also, everything depends on this rather vague step: "determine if the new 
interaction element cross-occurs with A and if so calculate the llr score", 
which needs a lot more explanation. Whether to use Mahout in-memory objects or 
to reimplement some of them in high speed data structures is a big question.

The good thing I noticed in writing this is that model updates and real time 
queries can be arbitrarily far apart, i.e. the system degrades gracefully. During 
high load the model may fall behind, but as long as user behavior is up-to-date 
and persisted (it will be), we are still in pretty good shape.


On Mar 26, 2017, at 6:26 PM, Ted Dunning <tdunn...@mapr.com> wrote:


I think that this analysis omits the fact that one user interaction causes many 
cooccurrences to change.

This becomes feasible if you include the effect of down-sampling, but that has 
to be in the algorithm.


From: Pat Ferrel <p...@occamsmachete.com>
Sent: Saturday, March 25, 2017 12:01:00 PM
To: Trevor Grant; user@mahout.apache.org
Cc: Ted Dunning; s...@apache.org
Subject: Lambda and Kappa CCO
 
This is an overview and proposal for turning the multi-modal Correlated 
Cross-Occurrence (CCO) recommender from Lambda-style into an online streaming 
incrementally updated Kappa-style learner.

# The CCO Recommender: Lambda-style

We have largely solved the problems of calculating the multi-modal Correlated 
Cross-Occurrence models and serving recommendations in real time from real time 
user behavior. The model sits in Lucene (Elasticsearch or Solr) in a scalable 
way, and the typical query, which produces personalized recommendations from 
real time user behavior, completes with roughly 25 ms latency.

# CCO Algorithm

A = rows are users, columns are items they have “converted” on (purchase, read, 
watch). A represents the conversion event—the interaction that you want to 
recommend.
B = rows are users, columns are items that the user has shown some preference 
for, but not necessarily the same items as in A. B represents a different 
interaction type than A. B might be a preference for some category, brand, genre, or 
just a detailed item page view—or all of these in B, C, D, etc.
h_a = a particular user’s history of A type interactions, a vector of items 
that our user converted on.
h_b = a particular user’s history of B type interactions, a vector of items 
that our user had B type interactions with.

CCO says:

[A’A]h_a + [A’B]h_b + [A’C]h_c = r; where r is the weighted items from A that 
represent personalized recommendations for our particular user.
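
As a toy illustration of the formula above (all data invented purely to show 
the shapes), here is the raw cooccurrence/cross-occurrence math in plain Scala; 
the real Lambda implementation of course uses Mahout's distributed matrices on 
Spark, not arrays:

    // Toy sketch of r = [A'A]h_a + [A'B]h_b for one user; values are made up.
    object CcoToySketch extends App {
      // 3 users x 3 conversion items (A) and 3 users x 2 secondary items (B)
      val A = Array(Array(1.0, 0.0, 1.0), Array(1.0, 1.0, 0.0), Array(0.0, 1.0, 1.0))
      val B = Array(Array(1.0, 0.0), Array(1.0, 1.0), Array(0.0, 1.0))

      // t(X) * Y: cooccurrence (A'A) or cross-occurrence (A'B) counts
      def atb(x: Array[Array[Double]], y: Array[Array[Double]]): Array[Array[Double]] =
        Array.tabulate(x(0).length, y(0).length) { (i, j) =>
          x.indices.map(u => x(u)(i) * y(u)(j)).sum
        }

      def matVec(m: Array[Array[Double]], v: Array[Double]): Array[Double] =
        m.map(row => row.zip(v).map { case (a, b) => a * b }.sum)

      val hA = Array(1.0, 0.0, 0.0) // our user converted on A-item 0
      val hB = Array(0.0, 1.0)      // and had a B interaction with B-item 1

      val r = matVec(atb(A, A), hA).zip(matVec(atb(A, B), hB)).map { case (a, b) => a + b }
      println(r.mkString("r = [", ", ", "]")) // weighted A items for this user
    }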

The innovation here is that A, B, C, … represent multi-modal data: interactions 
of all types, on item-sets of arbitrary types. In other words we can look at 
virtually any action or possible indicator of user preference or taste. We 
strengthen the raw cooccurrence and cross-occurrence formula above by 
performing:

[llr(A’A)]h_a + [llr(A’B)]h_b + … = r, adding llr (log-likelihood ratio) 
correlation scoring to filter out coincidental cross-occurrences.
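
The llr test itself only needs four counts per candidate pair, which is what 
makes the streaming idea later in this note plausible. A minimal sketch, 
assuming the LogLikelihood helper in mahout-math; the threshold shown is purely 
illustrative:

    import org.apache.mahout.math.stats.LogLikelihood

    // Standard 2x2 contingency counts for a candidate pair (a from A, b from B):
    //   k11 = users who interacted with both a and b
    //   k12 = users who interacted with a but not b
    //   k21 = users who interacted with b but not a
    //   k22 = users who interacted with neither
    def llr(cooc: Long, countA: Long, countB: Long, total: Long): Double =
      LogLikelihood.logLikelihoodRatio(
        cooc,                           // k11
        countA - cooc,                  // k12
        countB - cooc,                  // k21
        total - countA - countB + cooc) // k22

    // keep the pair in the model only if it clears a threshold (illustrative)
    val interesting = llr(cooc = 20, countA = 100, countB = 50, total = 10000) > 10.0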

The model becomes [llr(A’A)], [llr(A’B)], …; each has items from A in rows and 
items from A, B, … in columns. This sits in Lucene as one document per item in 
A, with a field for each of A, B, C holding the items whose user interactions most 
strongly correlate with the conversion event on the row item. Put another way, 
the model holds, for each item in A, the items from A, B, C… that have the most 
similar user interactions.

To calculate r we need to find the items in the model most similar to the 
history, or behavior, of our example user. Since Lucene is basically a K-Nearest 
Neighbors engine that is particularly well tuned to work with sparse data (our 
model is typically quite sparse), all we need to do is segment the user history 
into h_a, h_b, … and use it as a multi-field query on the model. This performs 
the equivalent of:

[llr(A’A)]h_a + [llr(A’B)]h_b + … = r, where we substitute, for the tensor math, 
the cosine similarity of h_a to every row of [llr(A’A)] (and likewise for the 
other segments). Further, Lucene sorts by score and returns only the top ranking 
items. Even further, we note that since we have performed a multi-field query, 
Lucene does the entire multi-field similarity calculation and vector segment 
addition before doing the sort. Lucene does this in a very performant manner, so 
the entire query, including fetching user history, forming the Lucene query, and 
executing it, takes something like 25 ms and is indefinitely scalable to any 
number of simultaneous queries.
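
To make the query side concrete, this is roughly the shape of the multi-field 
query sent to Elasticsearch; the field names ("purchase", "view"), item ids, and 
index layout below are hypothetical, and the real Universal Recommender query 
carries more options (boosts, filters, business rules):

    // h_a and h_b become terms against their corresponding model fields in one
    // bool/should query; Lucene's similarity scoring and top-k sort do the rest.
    val esQueryBody =
      """{
        "size": 20,
        "query": {
          "bool": {
            "should": [
              { "terms": { "purchase": ["iphone-7", "pixel"] } },
              { "terms": { "view":     ["case-123", "charger-9"] } }
            ]
          }
        }
      }"""
    // POST <model-index>/_search with this body returns the top-ranked A items.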

Problem solved?

Well, yes and no. The above method is what I’ve labeled a Lambda-style 
recommender. It uses real time user history and makes recommendations in real 
time, but it can only recommend items that were in A when the model was last 
calculated. So if A is changing rapidly, as when items have short lifetimes 
(newsy items or social media things like tweets), then A can get out of date in 
hours or minutes. The other downside of Lambda CCO is that the entirety of the 
data in A, B, C … has to be re-examined every time a new model is calculated. 
With data on the order of a terabyte or more, this is quite a cost in compute 
resources.

It is true that in most typical cases the items in A will not change often. 
Think of the typical e-commerce case where A represents items in a catalog, 
which changes only infrequently. But the resources required to recalculate the 
model remain high, even if recalculation is only needed once per week. We have 
seen the existing Lambda-style system take many AWS x1.32xlarge instance hours 
to recalculate the model. This translates into a significant cost for model 
calculation alone.

# Streaming Online CCO Kappa-style

Since we have a good solution for serving query results from the model we might 
solve the Lambda CCO issues by converting the model re-calc into a streaming 
online incrementally updated process. In other words instead of replacing the 
model with a new one periodically let’s see if we can update it with each new 
interaction in real time.

Stripping away the query part of the CCO algorithm the question is can we 
update:

[llr(A’A)], [llr(A’B)], …

in real time.

Caveats: updates may be micro-batched rather than applied with each new event; 
“real time” really means “near real time”; and the update frequency will be 
allowed to vary with the input frequency.

We note a few things about the problem:

1) A, B, … are sparse matrices keyed by user-id. They can be modeled as 
in-memory hashmaps or as persistent, distributed, indexed datasets in some fast 
NoSQL DB.
2) If we need to update the model we also need to update the data that produces 
the model, A, B, C, …, in a persistent way.
3) A single interaction will only affect one segment of the model (A’A or A’B 
but not both) and one input matrix.
4) The log-likelihood ratio needs to know (a) whether there is a cross-occurrence 
and (b) the counts of interactions from several perspectives, corresponding to 
the row, column, and total density of the input matrices.
5) Because of #4, the important data can be stored and updated as single 
elements of vectors, without re-examining the entirety of the input (see the 
count-keeping sketch after this list).
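
A minimal sketch of the single-element count keeping that points 4 and 5 
describe; names and types are illustrative, and whether these tallies live in 
plain Scala maps, Mahout vectors, or something faster is exactly the open 
question:

    import scala.collection.mutable

    // Per interaction type (A, B, ...): a tally per item plus a grand total.
    // One new interaction touches exactly one item tally and the total; we never
    // have to re-count a matrix.
    class InteractionCounts {
      private val itemCount = mutable.HashMap.empty[String, Long]
      var total: Long = 0L
      def record(itemId: String): Unit = {
        itemCount(itemId) = itemCount.getOrElse(itemId, 0L) + 1
        total += 1
      }
      def count(itemId: String): Long = itemCount.getOrElse(itemId, 0L)
    }

    // Cross-occurrence tallies for (A-item, other-item) pairs, also updatable
    // one element at a time as interactions arrive.
    val coocCount = mutable.HashMap.empty[(String, String), Long]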

We can make several decisions now based on system requirements.

1) How important is it to persist A, B, C, etc.? If we save interaction logs we 
can always re-create them when the system is taken down and brought back up. This 
implies that in-memory hashmaps of sparse vectors are sufficient for the 
solution, even though a “reboot” may take some time to perform. If reboot 
time is critical we can represent A, B, ... in a mutable DB.
2) The serving layer persists A’A, … and is highly performant and scalable with 
one of the distributed versions of Lucene, so we leave that in place. The caveat 
is that we now add new load to Lucene: we must update the model in place rather 
than recalculating it en masse periodically. This has the benefit of also 
persisting the model, so it is “reboot” resilient.
3) If we are persisting the user interaction histories h_a, h_b, … in a DB then 
we have the makings of queries (a small sketch follows this list).
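
As a small sketch of point 3, fetching the persisted histories and segmenting 
them into the h_a, h_b, … fields of the model query sketched earlier; the 
accessor and field names here are hypothetical:

    // Persisted per-type user history becomes the per-field terms of the query.
    case class UserHistory(conversions: Seq[String], views: Seq[String])

    // hypothetical accessor over whatever DB the Lambda implementation already
    // uses to persist interactions
    def fetchHistory(userId: String): UserHistory = ???

    def queryFields(userId: String): Map[String, Seq[String]] = {
      val h = fetchHistory(userId)
      Map(
        "purchase" -> h.conversions, // h_a terms, matched against the A'A field
        "view"     -> h.views        // h_b terms, matched against the A'B field
      )
    }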

This leaves only the collected interactions in A, B, … that must be accessed 
and updated in real time, but if they were to disappear all we would lose is 
real time model updating; the queries would still work and the system as a 
whole would continue to operate with very little notice of the loss. Therefore 
using in-memory data structures (which give the ultimate in speed) should be 
sufficient for the Kappa-style solution. They will be kept in separate hashmaps 
with a sparse vector per key (a sketch follows), remembering that the key is an 
item from A and the vector is items from A, B, C ...
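
A minimal sketch of “separate hashmaps with a sparse vector per key”, here using 
Mahout’s in-memory RandomAccessSparseVector (whether Mahout’s objects are fast 
enough or need to be replaced with purpose-built structures is an open 
question); ids and cardinalities are illustrative:

    import scala.collection.mutable
    import org.apache.mahout.math.RandomAccessSparseVector

    // One hashmap per matrix; each key maps to a sparse vector of item indices.
    // The same shape works whether the key is a user id (for the input matrices
    // A, B, ...) or an item from A (for the model side).
    val maxItems = 1 << 20 // illustrative vector cardinality

    val A = mutable.HashMap.empty[Int, RandomAccessSparseVector]
    val B = mutable.HashMap.empty[Int, RandomAccessSparseVector]

    def add(m: mutable.HashMap[Int, RandomAccessSparseVector], key: Int, item: Int): Unit = {
      val row = m.getOrElseUpdate(key, new RandomAccessSparseVector(maxItems))
      row.setQuick(item, 1.0) // CCO cares about presence, not interaction strength
    }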

# Kappa algorithm

Transforming Lambda to Kappa requires:

- receive a new interaction and store it in a persistent DB, as in the Lambda 
implementation, which we know is fast
- insert it into the hashmap that should contain it based on the interaction 
type (A, B, …)
- recalculate the affected elements of the row and column non-zero-count vectors 
and update them in memory. These feed the llr calculation and can be updated 
without re-counting matrix elements
- determine if the new interaction element cross-occurs with A and, if so, 
calculate the llr score
- fetch the model field for this cross-occurrence and add the new interaction 
item if it scores high enough to meet the same threshold used in the 
Lambda-style calculation (several options are available here; a sketch of the 
whole loop follows)
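
Pulling those steps together, a hedged end-to-end sketch of the per-event (or 
per-micro-batch) update; every structure and name is illustrative, the 
persistence and Lucene calls are stubbed, and the threshold is just one of the 
“several options”:

    import scala.collection.mutable
    import org.apache.mahout.math.stats.LogLikelihood

    object KappaUpdateSketch {
      // input matrices keyed by user id: the item ids each user touched, per type
      val byType: Map[String, mutable.HashMap[Int, mutable.Set[Int]]] =
        Map("A" -> mutable.HashMap.empty, "B" -> mutable.HashMap.empty)
      // per-(type, item) tallies plus a seen-user total, all single-element updatable
      val itemCount = mutable.HashMap.empty[(String, Int), Long]
      val seenUsers = mutable.Set.empty[Int]
      // cross-occurrence tallies for (A item, other item) pairs
      val cooc = mutable.HashMap.empty[(Int, Int), Long]
      val llrThreshold = 10.0 // illustrative

      def onInteraction(userId: Int, kind: String, item: Int): Unit = {
        // 1) persist the raw event -- stub; the Lambda event store already does this
        // 2) insert into the in-memory matrix for this interaction type
        byType(kind).getOrElseUpdate(userId, mutable.Set.empty[Int]) += item
        // 3) bump only the affected tallies used by llr
        itemCount((kind, item)) = itemCount.getOrElse((kind, item), 0L) + 1
        seenUsers += userId
        // 4) the new item now cross-occurs with every A item this user converted on
        for (aItem <- byType("A").getOrElse(userId, mutable.Set.empty[Int])) {
          val pair = (aItem, item)
          cooc(pair) = cooc.getOrElse(pair, 0L) + 1
          val k11 = cooc(pair)
          val k12 = itemCount.getOrElse(("A", aItem), 0L) - k11
          val k21 = itemCount.getOrElse((kind, item), 0L) - k11
          val k22 = seenUsers.size.toLong - k11 - k12 - k21
          val score = LogLikelihood.logLikelihoodRatio(k11, k12, k21, k22)
          // 5) only touch the Lucene doc for aItem if the pair clears the threshold
          if (score > llrThreshold) { /* updateModelField(aItem, kind, item) -- stub */ }
        }
      }
    }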

Using the existing performant infrastructure of the Lambda recommender for 
persistence, we have reduced CCO model updates to in-memory hashmaps and vectors 
plus, potentially, one update of a single doc field in Lucene. “Potentially” is 
an important qualifier because very often an update to the persistent model will 
not be required.
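
For that single-field model update, Elasticsearch’s partial-update API is one 
plausible mechanism: fetch the field for the row item, add the newly correlated 
item if it qualifies, and write back just that field. The index, type, doc id, 
field name, and item ids below are purely illustrative:

    // Partial update of one model document: only the "view" field (the A'B
    // cross-occurrence indicators for this A item) is replaced; the rest of the
    // doc is untouched.
    val updateBody =
      """{ "doc": { "view": ["case-123", "charger-9", "newly-correlated-item"] } }"""
    // POST /model-index/items/iphone-7/_update  with the body above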

At any time we can recreate the live system by rebooting, but even if the model 
update mechanism stops, queries continue to be served with real time data. 
Disregarding DB or Lucene failure the system reverts from Kappa to Lambda 
gracefully and then back to Kappa with a restart of the system managing 
in-memory data.

Many diagrams and illustrations showing actual component choices to be 
inserted. Benchmarks for update timing also to be added.



Any thoughts? The Kappa-style implementation has not been started, so suggestions 
are welcome. The Lambda-style version is operational as The Universal Recommender, 
built on PredictionIO, HBase, Elasticsearch, Spark, and Mahout. It is 
documented here: http://actionml.com/docs/ur and 
here: http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
