[GitHub] [spark] mob-ai commented on issue #26124: [SPARK-29224][ML]Implement Factorization Machines as a ml-pipeline component

GitBox Tue, 03 Dec 2019 20:10:16 -0800

mob-ai commented on issue #26124: [SPARK-29224][ML]Implement Factorization 
Machines as a ml-pipeline component 
URL: https://github.com/apache/spark/pull/26124#issuecomment-561467902
 
 
   > > I still doubt whether the existing test suite is enough.
   > 
   > 1. I suggest adding several test suites to check whether FMClassifier and 
FMRegressor can learn the intercept correctly,
   > like '// Test if we can correctly learn Y = 0.1 + 1.2X1 - 1.3X2 + 20X1X2' 
in https://github.com/apache/spark/pull/5591/files
   > 
   > For example, we generate Y = 0.1 + 1.2 X1 - 1.3 X2 + 20 X1 X2, and we 
can check whether 20 * X1 * X2 has been learned by checking dot(V1, V2) ~== 20.
   > 
   > 2. I suggest adding some test cases with fitBias==false and/or 
fitLinear==false
   
   For FMRegressor:
   I have already added test cases that cover the situations above. The 
`generateFactorInteractionInput` function randomly generates model weights and 
then generates X accordingly; `FMRegressor` must fit the weights correctly 
(including the cases fitBias==false and/or fitLinear==false).
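
To make the idea concrete, here is a minimal sketch in plain Python of how such synthetic data can be generated. It is not the code from this PR: `fm_raw_prediction` and `generate_fm_data` are hypothetical names, and the uniform sampling ranges are assumptions chosen only for illustration. The model output is the usual factorization machine form, bias + dot(w, x) + sum over i<j of dot(V_i, V_j) * x_i * x_j.

```python
import random


def fm_raw_prediction(x, bias, linear, factors, fit_bias=True, fit_linear=True):
    """Factorization machine output for one feature vector x.

    factors[i] is the latent vector V_i of feature i.
    """
    y = bias if fit_bias else 0.0
    if fit_linear:
        y += sum(w_i * x_i for w_i, x_i in zip(linear, x))
    n = len(x)
    # Pairwise term: sum over i < j of dot(V_i, V_j) * x_i * x_j.
    for i in range(n):
        for j in range(i + 1, n):
            dot_ij = sum(a * b for a, b in zip(factors[i], factors[j]))
            y += dot_ij * x[i] * x[j]
    return y


def generate_fm_data(num_rows, num_features, factor_size,
                     fit_bias=True, fit_linear=True, seed=0):
    """Draw random weights (hypothetical ranges), then rows (x, y) from them."""
    rng = random.Random(seed)
    bias = rng.uniform(-1.0, 1.0)
    linear = [rng.uniform(-1.0, 1.0) for _ in range(num_features)]
    factors = [[rng.uniform(-1.0, 1.0) for _ in range(factor_size)]
               for _ in range(num_features)]
    rows = []
    for _ in range(num_rows):
        x = [rng.uniform(-1.0, 1.0) for _ in range(num_features)]
        y = fm_raw_prediction(x, bias, linear, factors, fit_bias, fit_linear)
        rows.append((x, y))
    return bias, linear, factors, rows


# The reviewer's example: Y = 0.1 + 1.2*X1 - 1.3*X2 + 20*X1*X2,
# using V1 = V2 = [4, 2] so that dot(V1, V2) = 20.
example = fm_raw_prediction([1.0, 2.0], 0.1, [1.2, -1.3], [[4.0, 2.0], [4.0, 2.0]])
```

Note that the latent vectors themselves are not unique (V1 = V2 = [4, 2] is only one of many choices with dot product 20), which is why a test would compare dot(V1, V2) rather than the individual factors.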
   
   For FMClassifier:
   It is difficult to generate a dataset that FMClassifier can fit perfectly. 
If I generate weights randomly and pass the rawPrediction through the sigmoid 
function, the output is never exactly 0 or 1. To get labels that are exactly 
0/1, the rawPrediction has to be very large or very small (so that the sigmoid 
is close to 1 or 0), but then FMClassifier fits the dataset with different 
model weights (the logloss is still close to 0). So for FMClassifier I only 
check that the logloss is correct.
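
A small sketch of why the weights are not identifiable in the classification case, in plain Python. The scores and scale factors are made-up illustration values, and `log_loss` is a hypothetical helper, not the loss implementation in Spark. Once labels are hard 0/1 values, any raw prediction with the right sign and a large enough magnitude drives the logloss toward 0, so very different weights fit the data equally well.

```python
import math


def log_loss(raw_scores, labels):
    """Mean binary log loss computed from raw (pre-sigmoid) scores."""
    total = 0.0
    for s, y in zip(raw_scores, labels):
        margin = s if y == 1 else -s
        # -log(sigmoid(margin)) = log(1 + exp(-margin)), written to avoid overflow.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(labels)


# Made-up raw predictions from some "true" model, and hard labels derived from them.
raw = [-2.0, -0.5, 0.5, 2.0]
labels = [1 if s > 0 else 0 for s in raw]

# Scaling the raw predictions stands in for a model with different weights.
losses = [log_loss([c * s for s in raw], labels) for c in (1.0, 10.0, 100.0)]
```

The loss keeps shrinking as the scale grows, so an optimizer has no reason to stop at the weights that generated the labels; checking the logloss itself is the quantity that stays well defined.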


