Thank you for the quick reply. As far as I understand, the update does not require explicit negative observations, because the update rule

    X_u = (Y^T C^u Y + λI)^{-1} Y^T C^u p(u)

can be simplified by exploiting its algebraic structure, so the negative observations never need to be materialized. That was my reading the first time I went through the paper.

What confuses me is that later, in the Discussion section, the paper says:

"Unlike explicit datasets, here *the model should take all user-item preferences as an input, including those which are not related to any input observation (thus hinting to a zero preference).* This is crucial, as the given observations are inherently biased towards a positive preference, and thus do not reflect well the user profile. However, taking all user-item values as an input to the model raises serious scalability issues – the number of all those pairs tends to significantly exceed the input size since a typical user would provide feedback only on a small fraction of the available items. We address this by exploiting the algebraic structure of the model, leading to an algorithm that scales linearly with the input size *while addressing the full scope of user-item pairs* without resorting to any sub-sampling."

If my understanding is right, this seems to say that we need the negative observations as input, even though we don't use them during the update. That seems strange to me, because enumerating all user-item pairs would produce far too many pairs to be feasible.

Thanks for the confirmation. I will read the ALS implementation for more details.

Hao

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/implicit-ALS-dataSet-tp7067p7086.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
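P.S. To make my reading of the trick concrete, here is a rough NumPy sketch of the per-user update as I understand it from the paper (my own illustration, not Spark's actual code; all names are made up). The identity Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y means only the observed items enter the loop, while the zero-preference pairs are covered implicitly through the precomputed Y^T Y term:

```python
import numpy as np

def update_user_factor(Y, observed_items, confidences, lam):
    """Solve x_u = (Y^T C^u Y + lam*I)^{-1} Y^T C^u p(u) for one user,
    touching only that user's observed items.

    Uses the identity  Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y,
    where (C^u - I) is nonzero only at the observed items, so the
    implicit zero-preference pairs never have to be materialized.
    """
    k = Y.shape[1]
    YtY = Y.T @ Y                        # in practice precomputed once per sweep
    A = YtY + lam * np.eye(k)
    b = np.zeros(k)
    for i, c in zip(observed_items, confidences):
        y = Y[i]
        A += (c - 1.0) * np.outer(y, y)  # the Y^T (C^u - I) Y correction
        b += c * y                       # Y^T C^u p(u); p_ui = 1 for observed items
    return np.linalg.solve(A, b)
```

So the cost per user is proportional to the number of observed items (plus a k x k solve), even though the solution is identical to the one computed over all user-item pairs.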