I just finished watching a great presentation from a recent Spark Summit on
real-time movie recommendations using Spark:
https://spark-summit.org/east-2015/talk/real-time-recommendations-using-spark
For the purposes of this email I am going to really simplify what they did.
In general, their real-time system takes in data about what all users are
watching and calculates the most popular/trending shows. The results are
stored in a database. When an individual user goes to "movie guide", they
read the top 10 recommendations from the database.
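To make that concrete, here is a toy sketch of the batch side of the idea in plain Python (not Spark). The event schema and names are made up: count watch events per show and keep the most popular.

```python
from collections import Counter

def top_shows(watch_events, n=10):
    """Count watch events per show and return the n most popular.

    watch_events: iterable of (user_id, show) pairs -- hypothetical schema.
    """
    counts = Counter(show for _user, show in watch_events)
    return [show for show, _count in counts.most_common(n)]

# Toy example: three users watching two shows.
events = [("u1", "show-a"), ("u2", "show-a"), ("u3", "show-b")]
print(top_shows(events, n=2))  # -> ['show-a', 'show-b']
```

In the real system this computation would run continuously over the event stream and write its output to the database that the user-facing tier reads from.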

My guess is that the part of their system that serves up recommendations to
users in real time is not implemented using Spark. It's probably a bunch of
REST servers sitting behind proxy servers and load balancers. The REST
servers read the recommendations calculated using Spark Streaming.

This got me thinking. So in general we have Spark handling batch processing
and ingestion of real-time data, but not the part of the system that
delivers the real-time user experience. Ideally I would like one unified
platform.

Using Spark Streaming with a small window size of, say, 100 ms would meet my
SLA. Each window is going to contain many unrelated requests. In the
recommender system example, map() would look up the user-specific
recommendations for each request. The trick is how to return each response
to the correct "client". I could publish the response to some other system
(Kafka? Or a custom proxy?) that can actually return the data to the
client. Is this a good idea? What do people do in practice?
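One way to make the routing problem concrete is a correlation-id pattern: every request carries an id, the micro-batch produces (request_id, response) pairs, and the front-end tier that still holds the client connection uses the id to hand the response back. Here is a plain-Python simulation of that idea (no actual Spark or Kafka; all names are made up):

```python
recommendations = {"alice": ["show-a", "show-b"]}  # hypothetical lookup table

def process_batch(requests):
    """What map() would do inside one micro-batch: look up recommendations
    for each request and tag the result with the request's correlation id."""
    return [(req["request_id"], recommendations.get(req["user"], []))
            for req in requests]

class FrontEnd:
    """Stands in for the proxy/REST tier that holds open client connections."""
    def __init__(self):
        self.pending = {}    # request_id -> callback that reaches the client
        self.delivered = {}  # what each client ended up receiving

    def register(self, request_id, callback):
        self.pending[request_id] = callback

    def deliver(self, results):
        # Route each (request_id, response) pair back to the right client.
        for request_id, response in results:
            callback = self.pending.pop(request_id, None)
            if callback:
                callback(response)

front = FrontEnd()
front.register("req-1", lambda resp: front.delivered.update({"req-1": resp}))
batch = [{"request_id": "req-1", "user": "alice"}]
front.deliver(process_batch(batch))
print(front.delivered)  # -> {'req-1': ['show-a', 'show-b']}
```

In a real deployment the `deliver` step is exactly the piece being asked about: something (a Kafka topic the front-end subscribes to, or a custom proxy) has to carry the tagged responses from the Spark executors back to whichever server holds the client connection.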

Also, I assume I would have to use rdd.foreach() to somehow cause the
response data to be sent to the correct client.
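Sketching that foreach step in plain Python (again simulated, with a list standing in for a Kafka topic): in real Spark one would typically use foreachPartition() rather than foreach(), so each partition opens one producer connection instead of one per record.

```python
response_topic = []  # stands in for a Kafka topic of (request_id, response)

def publish(record):
    """Side-effecting send, as the foreach body would do on an executor."""
    response_topic.append(record)

def foreach_partition(partition):
    # producer = connect()  # hypothetical: one connection per partition
    for record in partition:
        publish(record)

# Two partitions, each holding tagged responses from the micro-batch.
partitions = [[("req-1", ["show-a"])], [("req-2", ["show-b"])]]
for part in partitions:
    foreach_partition(part)
print(response_topic)  # -> [('req-1', ['show-a']), ('req-2', ['show-b'])]
```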

Comments and suggestions appreciated.

Kind regards

Andy

