huaxingao opened a new pull request #29060:
URL: https://github.com/apache/spark/pull/29060
# What changes were proposed in this pull request?
Current problems:
```
import tempfile
from pyspark.ml.classification import MultilayerPerceptronClassifier, \
    MultilayerPerceptronClassificationModel

# (snippet from a test case: df and self come from the surrounding test context)
mlp = MultilayerPerceptronClassifier(layers=[2, 2, 2], seed=123)
model = mlp.fit(df)
path = tempfile.mkdtemp()
model_path = path + "/mlp"
model.save(model_path)
model2 = MultilayerPerceptronClassificationModel.load(model_path)
self.assertEqual(model2.getSolver(), "l-bfgs")
# this fails because model2.getSolver() returns 'auto'

model2.transform(df)
# this fails with:
# pyspark.sql.utils.IllegalArgumentException: MultilayerPerceptronClassifier_dec859ed24ec
#   parameter solver given invalid value auto.
```
FMClassifier/Regression and GeneralizedLinearRegression have the same
problems.
Here are the root causes of the problems:
1. In HasSolver, both Scala and Python default solver to 'auto'.
2. On the Scala side, mlp overrides the default solver to 'l-bfgs', FMClassifier/Regression overrides it to 'adamW', and glr overrides it to 'irls'.
3. On the Scala side, mlp overrides the default solver in MultilayerPerceptronClassificationParams, so both MultilayerPerceptronClassifier and MultilayerPerceptronClassificationModel have 'l-bfgs' as the default.
4. On the Python side, mlp overrides the default solver only in MultilayerPerceptronClassifier, so the estimator has 'l-bfgs' as the default, but MultilayerPerceptronClassificationModel doesn't override it, so it gets the default from HasSolver, which is 'auto'. In theory, we don't care about the solver value or any other param values for MultilayerPerceptronClassificationModel, because we already have the fitted model. That's why, on the Python side, we never set default values for any of the XXXModels (see the simplified sketch below).
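To make 1-4 concrete, here is a simplified, plain-Python sketch of the Python-side layout (illustrative only; the real pyspark classes use Param objects and _setDefault, and the class names below are stand-ins):
```
# Stand-ins for the real pyspark classes; only the default lookup is modeled.
class HasSolverMixin(object):
    """Plays the role of HasSolver: defines the generic default."""
    default_solver = "auto"

class Estimator(HasSolverMixin):
    """Plays the role of MultilayerPerceptronClassifier: overrides the default."""
    default_solver = "l-bfgs"

class Model(HasSolverMixin):
    """Plays the role of MultilayerPerceptronClassificationModel: no override."""
    pass

print(Estimator.default_solver)  # 'l-bfgs'
print(Model.default_solver)      # 'auto'  <- what the loaded Python model falls back to
```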
5. When calling getSolver on the loaded mlp model, the following code runs underneath:
```
def _transfer_params_from_java(self):
    """
    Transforms the embedded params from the companion Java object.
    """
    ......
    # SPARK-14931: Only check set params back to avoid default params mismatch.
    if self._java_obj.isSet(java_param):
        value = _java2py(sc, self._java_obj.getOrDefault(java_param))
        self._set(**{param.name: value})
    ......
```
That's why model2.getSolver() returns 'auto': since solver was never explicitly set on the Java side, the code above does not copy the Scala default (in this case 'l-bfgs') onto the Python param, so the Python-side default (in this case 'auto') is used instead.
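The mismatch can also be seen by peeking at the wrapped Java object (for demonstration only; this uses the private _java_obj handle and assumes the Java-side model keeps its Scala default as described in 3):
```
model2 = MultilayerPerceptronClassificationModel.load(model_path)
print(model2._java_obj.getSolver())  # 'l-bfgs' -- the Scala-side default
print(model2.getSolver())            # 'auto'   -- the Python-side default from HasSolver
```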
6. When calling model2.transform(df), the following code runs underneath:
```
def _transfer_params_to_java(self):
    """
    Transforms the embedded params to the companion Java object.
    """
    ......
    if self.hasDefault(param):
        pair = self._make_java_param_pair(param, self._defaultParamMap[param])
        pair_defaults.append(pair)
    ......
```
Again, it picks up the Python default solver, which is 'auto', and pushing that value to the Java side causes the exception above.
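As a plain-Python analogy of the two transfer methods (illustrative only, not pyspark code): defaults never flow from Java to Python for unset params (step 5), but Python defaults do flow back to Java (step 6), so the Python 'auto' wins:
```
# Toy model of the two param stores involved in the save/load round trip.
java_side = {"set": {}, "defaults": {"solver": "l-bfgs"}}   # Scala model after load()
python_side = {"set": {}, "defaults": {"solver": "auto"}}   # Python wrapper

# Step 5 (_transfer_params_from_java): only explicitly-set params are copied back.
python_side["set"].update(java_side["set"])                 # nothing copied

# Step 6 (_transfer_params_to_java): Python defaults are pushed to the Java side.
java_side["defaults"].update(python_side["defaults"])       # Java default becomes 'auto'
# The real Scala validator rejects 'auto' for mlp's solver, hence the exception above.
print(python_side["defaults"]["solver"], java_side["defaults"]["solver"])  # auto auto
```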
7. Currently, on the Scala side, for some of the algorithms we set default values in the shared XXXParams, so both the estimator and the transformer get the default value. However, for other algorithms we only set defaults in the estimators, and the XXXModel doesn't get the default value. On the Python side, we never set defaults for the XXXModels. This causes the default-value inconsistency.
8. My proposed solution: set the default params in the shared XXXParams for both Scala and Python, so the estimator and the transformer have the same default values on both the Scala and the Python side (see the sketch below). In this PR I only changed solver. If everyone is OK with the fix, I will change all the other params as well.
I hope my explanation makes sense to you folks :)
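Below is a minimal Python-side sketch of the idea in 8 (illustrative only, not the exact patch; _ExampleMLPParams is a made-up name standing in for the shared params class of an estimator/model pair):
```
from pyspark.ml.param.shared import HasSolver

class _ExampleMLPParams(HasSolver):
    """Hypothetical shared params class extended by both the estimator and the model."""
    def __init__(self):
        super(_ExampleMLPParams, self).__init__()
        # Override HasSolver's generic 'auto' once, here, so both sides agree.
        self._setDefault(solver="l-bfgs")

print(_ExampleMLPParams().getSolver())  # 'l-bfgs' for the estimator and the model alike
```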
### Why are the changes needed?
Fix bug
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
existing and new tests
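For example, a new test along these lines would cover the reported failure (a sketch only; the actual tests added may differ):
```
import tempfile

mlp = MultilayerPerceptronClassifier(layers=[2, 2, 2], seed=123)
model = mlp.fit(df)
model_path = tempfile.mkdtemp() + "/mlp"
model.save(model_path)
model2 = MultilayerPerceptronClassificationModel.load(model_path)
assert model2.getSolver() == "l-bfgs"  # now matches the estimator's default
model2.transform(df)                   # no longer raises IllegalArgumentException
```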