On Fri, Mar 11, 2016 at 12:18 PM, Nick Pentreath
<nick.pentre...@gmail.com> wrote:
> In general, for serving situations MF models are stored in some other
> serving system, so that system may be better suited to do the actual
> fold-in. Sean's Oryx project does that, though I'm not sure offhand if that
> part is done in Spark or not.

(No, this part isn't Spark; it's just manipulating arrays in memory.
Making the model is done in Spark, as is marshalling the input from a
Kafka topic.)
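
(For illustration only: a minimal sketch of what an in-memory fold-in
can look like, assuming explicit ratings and a plain regularized
least-squares solve against fixed item factors. This is not Oryx's
code, and an implicit-feedback model would weight the terms
differently. The names solve, fold_in_user, predict and the reg
parameter are all hypothetical.)

```python
# Illustrative fold-in sketch; hypothetical names, not taken from Oryx.

def solve(a, b):
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fold_in_user(item_factors, ratings, reg=0.1):
    """Compute a factor vector for a new user while the item factors stay fixed.

    item_factors: dict item_id -> list of k floats (from the trained model)
    ratings:      dict item_id -> observed value for the new user
    Solves (Y^T Y + reg * I) x = Y^T r over the items the user interacted with.
    """
    k = len(next(iter(item_factors.values())))
    a = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for item, r in ratings.items():
        y = item_factors[item]
        for i in range(k):
            b[i] += y[i] * r
            for j in range(k):
                a[i][j] += y[i] * y[j]
    return solve(a, b)


def predict(user_vec, item_vec):
    """Predicted strength is the dot product of the two factor vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))
```

(The point is only that the fold-in is one small k x k solve per new
user, which is why it can live in the serving layer rather than in a
Spark job.)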


> I know Sean's old Myrrix project also used to support computing ALS with an
> initial set of input factors, so you could in theory incrementally compute
> on new data. I'm not sure if the newer Oryx project supports it though.

(Yes, exactly the same thing exists in Oryx.)


> @Sean, what are your thoughts on supporting an initial model (factors) in
> ALS? I personally have always just recomputed the model, but for very large
> scale stuff it can make a lot of sense obviously. What I'm not sure on is
> whether it gives good solutions (relative to recomputing) - I'd imagine it
> will tend to find a slightly better local minimum given a previous local
> minimum starting point... with the advantage that new users / items are
> incorporated. But of course users can do a full recompute periodically.

I'd prefer to be able to specify an initial model, since a model
built from scratch typically takes 20-40 iterations to converge to a
reasonable state, whereas one started from the previous factors only
needs a few more iterations to converge to the same threshold given a
relatively small number of additional inputs. The difference can be a
lot of compute time.

This is one of the few things that got worse when I moved to Spark,
since this capability was lost.

I had been too lazy to actually implement it though. But that'd be cool.
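
(To make the idea concrete, here is a rough sketch of ALS that accepts
initial item factors, assuming explicit ratings and unweighted L2
regularization. It is plain Python for readability, not Spark or Oryx
code, and every name in it (als, update_side, loss, init_items, tol)
is hypothetical. Items missing from init_items get random factors, so
new items are still incorporated.)

```python
# Warm-start ALS sketch; hypothetical names, not the Spark MLlib or Oryx API.
import random


def solve(a, b):
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def update_side(obs_by_row, other, k, reg):
    """One ALS half-step: re-solve each row's factors with the other side held fixed."""
    out = {}
    for row, obs in obs_by_row.items():
        a = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
        b = [0.0] * k
        for col, r in obs.items():
            y = other[col]
            for i in range(k):
                b[i] += y[i] * r
                for j in range(k):
                    a[i][j] += y[i] * y[j]
        out[row] = solve(a, b)
    return out


def loss(ratings, users, items, reg):
    """Regularized squared error that each half-step minimizes for its own side."""
    err = sum((r - sum(p * q for p, q in zip(users[u], items[i]))) ** 2
              for (u, i), r in ratings.items())
    norm = sum(v * v for f in list(users.values()) + list(items.values()) for v in f)
    return err + reg * norm


def als(ratings, k=2, reg=0.1, max_iter=50, tol=1e-6, init_items=None, seed=0):
    """ratings: dict (user, item) -> value. Returns (users, items, sweeps_run)."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for (u, i), r in ratings.items():
        by_user.setdefault(u, {})[i] = r
        by_item.setdefault(i, {})[u] = r
    init_items = init_items or {}
    # Warm start: reuse previous factors where available, random otherwise.
    items = {i: list(init_items[i]) if i in init_items
             else [rng.random() for _ in range(k)] for i in by_item}
    users = update_side(by_user, items, k, reg)
    prev = loss(ratings, users, items, reg)
    for sweep in range(1, max_iter + 1):
        items = update_side(by_item, users, k, reg)
        users = update_side(by_user, items, k, reg)
        cur = loss(ratings, users, items, reg)
        if prev - cur <= tol * max(prev, 1e-12):  # relative improvement below threshold
            return users, items, sweep
        prev = cur
    return users, items, max_iter
```

(Usage would be something like: run als(ratings) once, then later call
als(ratings_plus_new, init_items=previous_items) and compare the
number of sweeps each run reports. How much it saves depends on the
data and the threshold.)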
