I'm finding it quite hard to grasp the mathematical concepts behind ALS (alternating least squares) recommenders, and I would be grateful if someone could help me work out the details. This is my current understanding:
1. The item-feature matrix M is initialized with the average ratings and random values (explicit-feedback case).
2. The user-feature matrix U is solved for by setting the partial derivative of the error function with respect to u_i (the feature vector of user i, i.e. the i-th column of U) to zero.

Suppose we use as many features as there are items, and the error function has no regularization term. Would U then be solved within the first iteration? If not, I do not understand why more than one iteration is needed.

Furthermore, I believe I have understood that when fewer features than items are used and regularization is applied, U cannot be solved in a way that meets the stopping criterion after only one iteration. Iteration is therefore required to converge gradually toward the stopping criterion.

I hope I have explained my problems clearly enough.
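
For reference, the partial-derivative step (step 2 above) can be written out as below. This assumes, beyond what the question states, that the feature vectors u_i and m_j are the columns of U and M, that I_i is the set of items rated by user i, that E is the identity matrix, and that the error function is the squared error over observed ratings plus a plain penalty λ (λ = 0 being the unregularized case; some ALS formulations scale λ per user):

$$
\begin{aligned}
f(U, M) &= \sum_{(i,j)\ \text{observed}} \left(r_{ij} - u_i^\top m_j\right)^2 + \lambda\Big(\sum_i \lVert u_i\rVert^2 + \sum_j \lVert m_j\rVert^2\Big)\\
\frac{\partial f}{\partial u_i} &= -2\sum_{j \in I_i}\left(r_{ij} - u_i^\top m_j\right) m_j + 2\lambda u_i = 0\\
\Longrightarrow\quad u_i &= \Big(\sum_{j\in I_i} m_j m_j^\top + \lambda E\Big)^{-1} \sum_{j\in I_i} r_{ij}\, m_j
\end{aligned}
$$

With M held fixed, this is an ordinary (ridge) least-squares problem for each user separately. When λ = 0, the matrix $\sum_{j\in I_i} m_j m_j^\top$ is invertible only if user i has rated at least as many items as there are features and those items' feature vectors are linearly independent.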

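A minimal pure-Python sketch of the procedure described in the question, for experimenting with the number of iterations. Assumptions not taken from the question: ratings are (user, item, rating) triples; feature vectors are plain lists keyed by user or item; the error is the squared error on observed ratings plus λ times the squared norms of all feature vectors (`lam=0.0` gives the unregularized case); the random initial values are drawn from the range 0 to 0.1; and all function names and the toy data are hypothetical. The objective is recorded after every half-step, so the two cases discussed (as many features as items without regularization, and fewer features with regularization) can be run side by side.

```python
import random


def solve(A, b):
    """Solve the small k x k system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(A)
    aug = [list(row) + [b[r]] for r, row in enumerate(A)]  # augmented copy [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # treat a (near-)zero pivot as singular
            raise ValueError("singular normal equations")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ls_update(fixed, observed, k, lam):
    """Hold the 'fixed' vectors constant and solve, for every key in 'observed',
    (sum_j v_j v_j^T + lam * I) x = sum_j r_j v_j  (zero gradient of the error)."""
    solved = {}
    for key, obs in observed.items():
        A = [[lam if p == q else 0.0 for q in range(k)] for p in range(k)]
        b = [0.0] * k
        for other, r in obs:
            v = fixed[other]
            for p in range(k):
                b[p] += r * v[p]
                for q in range(k):
                    A[p][q] += v[p] * v[q]
        solved[key] = solve(A, b)
    return solved


def objective(U, M, triples, lam):
    """Squared error on the observed ratings plus lam * (sum ||u||^2 + sum ||m||^2)."""
    sse = sum((r - sum(a * b for a, b in zip(U[u], M[i]))) ** 2 for u, i, r in triples)
    reg = sum(x * x for vecs in (U, M) for vec in vecs.values() for x in vec)
    return sse + lam * reg


def als(triples, k, lam, iters, seed=0):
    """Alternate between solving U (M fixed) and M (U fixed); return U, M and the
    objective value recorded after every half-step."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in triples:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    # Step 1: first feature of each item = its average rating, the rest small random values
    M = {i: [sum(r for _, r in obs) / len(obs)] + [rng.uniform(0.0, 0.1) for _ in range(k - 1)]
         for i, obs in by_item.items()}
    U, history = {}, []
    for _ in range(iters):
        U = ls_update(M, by_user, k, lam)  # Step 2: solve every u_i with M held fixed
        history.append(objective(U, M, triples, lam))
        M = ls_update(U, by_item, k, lam)  # Step 3: solve every m_j with U held fixed
        history.append(objective(U, M, triples, lam))
    return U, M, history


# Hypothetical toy data: (user, item, rating), every user has rated every item.
ratings = [
    (0, 0, 5), (0, 1, 3), (0, 2, 1),
    (1, 0, 4), (1, 1, 2), (1, 2, 1),
    (2, 0, 1), (2, 1, 1), (2, 2, 5),
    (3, 0, 2), (3, 1, 1), (3, 2, 4),
]

if __name__ == "__main__":
    # Fewer features (2) than items (3), with regularization.
    print(als(ratings, k=2, lam=0.1, iters=5)[2])
    # As many features as items (3), no regularization.
    print(als(ratings, k=3, lam=0.0, iters=3)[2])
```

The hand-written solver only keeps the sketch free of dependencies; with NumPy one would normally call `numpy.linalg.solve` instead. With λ = 0, the normal equations for a user are singular whenever that user has rated fewer items than there are features, which is why every user in the toy data rates every item.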