mob-ai commented on issue #26124: [SPARK-29224][ML]Implement Factorization Machines as a ml-pipeline component
URL: https://github.com/apache/spark/pull/26124#issuecomment-561467902

> > I still doubt whether existing testsuite is enough.
> > 1, I suggest to add several testsuites to check whether FMClassifier and FMRegressor can learn the intercept correctly.
> Like '// Test if we can correctly learn Y = 0.1 + 1.2X1 - 1.3X2 + 20X1X2' in https://github.com/apache/spark/pull/5591/files
>
> For example, we generate Y = 0.1 + 1.2 X1 - 1.3 X2 + 20 X1 X2, and we can check whether 20 * X1 * X2 can be learned by checking dot(V1, V2) ~== 20.
>
> 2, I suggest to add some testcases with fitBias==false or/and fitLinear==false

For FMRegressor: I have already added test cases that cover the situations above. The `generateFactorInteractionInput` function randomly generates model weights and then generates X accordingly; `FMRegressor` needs to fit those weights correctly (including the fitBias==false and/or fitLinear==false cases).

For FMClassifier: It is difficult to generate a dataset that FMClassifier fits perfectly. I generate the weights randomly and pass the raw prediction through the sigmoid function, so the output is never exactly 0 or 1. To get exact 0/1 labels, the rawPrediction has to be very large or very small (sigmoid close to 1 or 0), but then FMClassifier can fit the dataset with different model weights (the logloss is still close to 0). So for FMClassifier I only check that the logloss is correct.
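
To make the two points above concrete, here is a minimal plain-Python sketch. It is not the Spark test code and does not use `generateFactorInteractionInput`; the helper names `fm_raw`, `logloss`, and `scaled`, the factor values, and the sample points are all assumptions chosen for illustration. It shows (a) that the reviewer's target Y = 0.1 + 1.2 X1 - 1.3 X2 + 20 X1 X2 is a factorization machine whose factors satisfy dot(V1, V2) = 20, and (b) that two different weight sets can both drive the logloss on hard 0/1 labels close to 0, which is why recovering the exact weights is not a reliable check for a classifier.

```python
import math

def fm_raw(x, bias, linear, factors):
    """FM raw prediction: bias + <w, x> + sum over i<j of dot(V_i, V_j) * x_i * x_j."""
    out = bias + sum(w * xi for w, xi in zip(linear, x))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(factors[i], factors[j]))
            out += dot * x[i] * x[j]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logloss(labels, raws, eps=1e-15):
    """Mean binary cross-entropy, with probabilities clipped to [eps, 1 - eps]."""
    total = 0.0
    for y, z in zip(labels, raws):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

# (a) Regression target from the review comment, written as an FM.
# The factor values are illustrative; any V1, V2 with dot(V1, V2) = 20 would do.
bias, linear = 0.1, [1.2, -1.3]
factors = [[4.0, 2.0], [4.0, 2.0]]  # dot = 4*4 + 2*2 = 20
x = [0.5, -2.0]
y_fm = fm_raw(x, bias, linear, factors)
y_direct = 0.1 + 1.2 * x[0] - 1.3 * x[1] + 20 * x[0] * x[1]

# (b) Classification: scaling every term of the raw prediction by k keeps its sign,
# so the hard 0/1 labels are unchanged and the logloss stays near 0.
def scaled(k):
    s = math.sqrt(k)  # scaling the factors by sqrt(k) scales dot(V_i, V_j) by k
    return bias * k, [w * k for w in linear], [[v * s for v in row] for row in factors]

points = [[0.5, -2.0], [1.0, 1.0], [-1.0, 1.0], [2.0, 0.5]]
labels = [1 if fm_raw(p, bias, linear, factors) > 0 else 0 for p in points]
loss_a = logloss(labels, [fm_raw(p, *scaled(1.0)) for p in points])
loss_b = logloss(labels, [fm_raw(p, *scaled(2.0)) for p in points])

print(y_fm, y_direct)
print(loss_a, loss_b)
```

In this sketch both weight sets give a logloss far below 1e-6 even though one set is twice the other, so comparing fitted weights against the generating weights would fail while a logloss check would pass.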
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
