On Wed, Jun 24, 2015 at 12:02 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
> Oryx does almost the same but Oryx1 kept all user and item vectors in memory
> (though I am not sure about whether Oryx2 still stores all user and item
> vectors in memory or partitions in some way).

(Yes, this is a weakness, but it makes things fast and easy to manage. My rule of thumb is that 1M user/item vectors need about 1GB of RAM, comfortably, even with the necessary ancillary structures. If you can afford N serving machines with a bunch of RAM, you can get away with this for a long while, but that's an "if".)

Scoring in memory is just the first step if it needs to be real-time. Scoring probably also needs to be sub-linear in the number of items (i.e. don't even score all items), but this is a tangent relative to the Spark-related question.
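
To make the sizing rule of thumb and the "don't score all items" point concrete, here is a rough sketch in plain Python. The rank of 50 and 4-byte floats are assumptions of mine, not figures from this thread, and both function names are hypothetical; this is not how Oryx does it. The first function counts only the raw factor values, and the second shows the brute-force linear scan that a sub-linear approach would need to avoid.

```python
import heapq

BYTES_PER_FLOAT32 = 4  # assumption: factors stored as 4-byte floats


def raw_vector_bytes(num_vectors, rank, bytes_per_value=BYTES_PER_FLOAT32):
    """Bytes for the factor values alone.

    Excludes IDs, hash maps, object headers and any other ancillary structures.
    """
    return num_vectors * rank * bytes_per_value


def top_n_brute_force(user_vector, item_vectors, n):
    """Score every item by dot product with the user vector; keep the n best.

    This is the linear scan: one dot product per item, so cost grows with
    the number of items.
    """
    def score(item_id):
        return sum(u * v for u, v in zip(user_vector, item_vectors[item_id]))

    return heapq.nlargest(n, item_vectors, key=score)


# Hypothetical sizing: 1M vectors at an assumed rank of 50.
raw = raw_vector_bytes(1_000_000, 50)
print(f"raw factor data: {raw / 1e6:.0f} MB")  # prints "raw factor data: 200 MB"
```

Under those assumed values the raw factors take well under 1GB, and the remainder is headroom for whatever the serving layer keeps alongside them. A different rank or a double-precision representation changes the number proportionally.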