GitHub user lxmly commented on the issue:
https://github.com/apache/spark/pull/18337
I think this is parallel batch gradient descent, so the summed gradients
need to be averaged.
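
As a rough illustration of the averaging point, here is a minimal sketch in plain Python rather than GraphX code. The names `averaged_batch_gradient`, `grad_fn`, and `partitions` are hypothetical, and the squared-error gradient is only an assumed example loss:

```python
def averaged_batch_gradient(partitions, grad_fn, w):
    """Sum per-example gradients over all partitions, then divide by the count."""
    total = [0.0] * len(w)
    count = 0
    for part in partitions:  # each partition could be processed in parallel
        for example in part:
            g = grad_fn(w, example)
            total = [t + gi for t, gi in zip(total, g)]
            count += 1
    # Without this division the step size would grow with the batch size.
    return [t / count for t in total]


def squared_error_grad(w, example):
    """Gradient of 0.5 * (w . x - y)^2 with respect to w (assumed example loss)."""
    x, y = example
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


parts = [[([1.0], 2.0)], [([1.0], 4.0)]]
avg = averaged_batch_gradient(parts, squared_error_grad, [0.0])
```

With the two single-example partitions above, the per-example gradients are -2 and -4, so the averaged gradient is -3 rather than the raw sum of -6.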
I also think the update of the y_j factors is incorrect. For each rating
by a user u, the y_j factors of all items in the user's action history
N(u) should be updated, but GraphX SVD++ only updates the y_j factor of
the item being rated.
$$
\forall j \in N(u):
y_j \leftarrow y_j + \gamma \cdot (e_{ui}\cdot|N(u)|^{-\frac{1}{2}}\cdot
q_i - \lambda \cdot y_j)
$$
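
To make the rule above concrete, here is a minimal sketch in plain Python. It is not the GraphX implementation; the function name `update_implicit_factors` and the dict-of-lists layout for `y` are assumptions made only for illustration:

```python
def update_implicit_factors(y, q_i, n_u, e_ui, gamma, lam):
    """Apply the y_j update to every item j in N(u), in place.

    y     : dict mapping item id -> list of floats (the y_j vectors)
    q_i   : item factor vector of the rated item i
    n_u   : item ids in the user's action history N(u)
    e_ui  : prediction error for the rating (u, i)
    gamma : learning rate
    lam   : regularization strength
    """
    norm = len(n_u) ** -0.5  # |N(u)|^(-1/2)
    for j in n_u:  # every item in N(u), not only the rated item i
        y[j] = [
            yj + gamma * (e_ui * norm * qi - lam * yj)
            for yj, qi in zip(y[j], q_i)
        ]
    return y


y = {0: [0.0, 0.0], 1: [1.0, 1.0], 2: [0.0, 0.0], 3: [0.0, 0.0], 4: [5.0, 5.0]}
update_implicit_factors(y, q_i=[1.0, 2.0], n_u=[0, 1, 2, 3],
                        e_ui=2.0, gamma=0.1, lam=0.5)
```

In this example item 4 is outside N(u), so its vector is left unchanged, while all four items in N(u) are moved by the same rating.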
@srowen