RMSE would have the same potential issue. ALS-WR will prefer to shrink some errors at the expense of letting others get much larger, because its loss weights elements differently, whereas RMSE weights every element's error equally. It's an indirect issue here at best -- a moderate mismatch between the metric and the nature of the algorithm.
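For concreteness, here's the ALS-WR objective from the Zhou et al. paper (which is, as far as I recall, the formulation Mahout's factorizer follows) next to plain MAE, roughly in LaTeX notation; the per-user and per-item rating counts on the regularization terms are the "weighted" part, while MAE averages every held-out error with equal weight:

  \min_{U,M} \; \sum_{(i,j) \in I} \bigl( r_{ij} - u_i^{\top} m_j \bigr)^2
      + \lambda \Bigl( \sum_i n_{u_i} \, \lVert u_i \rVert^2
                     + \sum_j n_{m_j} \, \lVert m_j \rVert^2 \Bigr)

  \mathrm{MAE} = \frac{1}{\lvert T \rvert} \sum_{(i,j) \in T} \bigl\lvert r_{ij} - u_i^{\top} m_j \bigr\rvert

Here I is the set of observed training ratings, T the held-out test pairs, n_{u_i} the number of ratings by user i, and n_{m_j} the number of ratings on item j. (The implicit-feedback variant additionally weights each squared-error term by a confidence c_{ij} = 1 + \alpha r_{ij}, which is where the alpha I mentioned comes in.)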
I think most of the explanation is simply overfitting then, as this is test set error. I still think it is weird that the lowest MAE occurs at f=1; maybe there's a good simple reason for that I'm missing off the top of my head. FWIW, when I tune for best parameters on this data set, according to a mean average precision metric, I end up with an optimum more like 15 features and lambda=0.05 (although, note, I'm using a different default alpha, 1, and a somewhat different definition of lambda). (I've appended a rough sketch of that kind of evaluation loop below the quoted thread.)

On Thu, May 9, 2013 at 2:11 PM, Gabor Bernat <[email protected]> wrote:
> I know, but the same is true for the RMSE.
>
> This is based on the MovieLens 100k dataset, using the framework's
> (random) sampling to split it into a training and an evaluation set
> (the RMSRecommenderEvaluator or AverageAbsoluteDifferenceRecommenderEvaluator
> parameters: evaluation 1.0, training 0.75).
>
> Bernát GÁBOR
>
>
> On Thu, May 9, 2013 at 3:05 PM, Sean Owen <[email protected]> wrote:
>
>> (The MAE metric may also be a complicating issue... it's measuring
>> average error where all elements are equally weighted, but as the "WR"
>> suggests in ALS-WR, the loss function being minimized weights
>> different elements differently.)
>>
>> This is based on a test set, right, separate from the training set?
>> If you are able, measure the MAE on your training set too. If
>> overfitting is the issue, you should see low error on the training
>> set and higher error on the test set when f is high and lambda is low.
>>
>> On Thu, May 9, 2013 at 1:49 PM, Gabor Bernat <[email protected]> wrote:
>> > Hello,
>> >
>> > Here it is: http://i.imgur.com/3e1eTE5.png
>> > I've used 75% for training and 25% for evaluation.
>> >
>> > Well, a reasonable lambda gives results that are close, but not better.
>> >
>> > Thanks,
>> >
>> >
>> > Bernát GÁBOR
>> >
>> >
>> > On Thu, May 9, 2013 at 2:46 PM, Sean Owen <[email protected]> wrote:
>> >
>> >> This sounds like overfitting. More features let you fit your training
>> >> set better, but at some point, fitting too well means you fit other
>> >> test data less well. Lambda resists overfitting, so setting it too low
>> >> increases the overfitting problem.
>> >>
>> >> I assume you still get better test set results with a reasonable lambda?
>> >>
>> >> On Thu, May 9, 2013 at 1:38 PM, Gabor Bernat <[email protected]> wrote:
>> >> > Hello,
>> >> >
>> >> > So I've been testing ALS-WR with the MovieLens 100k dataset, and
>> >> > I've run into some strange results. You can see an example in the
>> >> > attached picture.
>> >> >
>> >> > I've used feature counts of 1, 2, 4, 8, 16, and 32, the same values
>> >> > for the iteration count, and summed up the results in a table. For a
>> >> > lambda higher than 0.07, the more important factor seems to be the
>> >> > iteration count, while increasing the feature count may improve the
>> >> > result, though not by much. That is what one could expect from the
>> >> > algorithm, so that's okay.
>> >> >
>> >> > The strange part comes for lambdas smaller than 0.075. In this case
>> >> > the more important factor becomes the feature count, however less,
>> >> > not more, is better. Similarly for the iteration count. Essentially
>> >> > the best score is achieved with a really small lambda and a single
>> >> > feature and iteration. How is this possible? Am I missing something?
>> >> >
>> >> >
>> >> > Bernát GÁBOR
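In case it helps anyone reproduce the grid above, here is roughly the kind of evaluation loop being discussed, written against the Taste API from memory -- treat it as a sketch, not gospel: the data file path and the parameter grid are placeholders, and you should double-check the ALSWRFactorizer constructor (numFeatures, lambda, numIterations) against your Mahout version.

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.svd.ALSWRFactorizer;
import org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.common.RandomUtils;

public class AlsWrGridEval {

  public static void main(String[] args) throws Exception {
    RandomUtils.useTestSeed();  // fixed seed, so every (f, lambda) cell sees the same 75/25 split

    // adjust the path to wherever your copy of the MovieLens 100k u.data file lives
    DataModel model = new FileDataModel(new File("ml-100k/u.data"));
    RecommenderEvaluator mae = new AverageAbsoluteDifferenceRecommenderEvaluator();

    final int numIterations = 16;
    for (final int numFeatures : new int[] {1, 2, 4, 8, 16, 32}) {
      for (final double lambda : new double[] {0.05, 0.065, 0.075, 0.1}) {
        RecommenderBuilder builder = new RecommenderBuilder() {
          @Override
          public Recommender buildRecommender(DataModel trainingModel) throws TasteException {
            // factorize only the training split handed in by the evaluator
            ALSWRFactorizer factorizer =
                new ALSWRFactorizer(trainingModel, numFeatures, lambda, numIterations);
            return new SVDRecommender(trainingModel, factorizer);
          }
        };
        // train on 75% of the data, score MAE on the held-out 25% (evaluation 1.0, training 0.75)
        double score = mae.evaluate(builder, null, model, 0.75, 1.0);
        System.out.printf("f=%d lambda=%.3f MAE=%.4f%n", numFeatures, lambda, score);
      }
    }
  }
}

Note the evaluator only scores the held-out portion; to get the training-set MAE suggested earlier in the thread you'd have to call estimatePreference() against the training ratings yourself and average the absolute differences.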
