GitHub user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/17849
Thanks @holdenk for the review! I think I wrote the description a little
too hastily, so let me clarify a bit...
The temporary "fix" just creates empty params in the model for any params that
exist in the Java model but not in the Python one. There should be no risk in
adding these to the Python model, since they are empty when created and not yet
defined with a value. These params will be set in 2 ways: 1) after the model is
fit, in the call to `_copyValues`, where the value is copied from the estimator
for any matching params; 2) when the model is loaded, in the call to
`_transfer_params_from_java`, which copies the value if the Java param has been
explicitly set. (I think I need to add something here for the case where the
Java model has a default value but the Python model doesn't.)
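A minimal sketch of the mechanism described above, written in plain Python so it
runs without Spark. `Param`, `Params`, `JavaModelStub`, `create_missing_params`,
`copy_values` and `transfer_params_from_java` are all hypothetical, simplified
stand-ins loosely named after the PySpark methods; they are not the real
pyspark.ml.param implementation. The point is only that a param can be declared
without being defined, and that it gets a value through one of the two paths.

```python
# Hypothetical, simplified stand-ins; NOT the real pyspark.ml.param API.
# They only model "declared param" vs. "param that has a value".


class Param:
    """A declared parameter: it carries a name and doc, but no value."""

    def __init__(self, name, doc=""):
        self.name = name
        self.doc = doc


class Params:
    def __init__(self, *names):
        self.params = {n: Param(n) for n in names}  # declared params
        self.param_map = {}    # explicitly set values
        self.default_map = {}  # default values

    def has_param(self, name):
        return name in self.params

    def is_defined(self, name):
        return name in self.param_map or name in self.default_map

    def get_or_default(self, name):
        if name in self.param_map:
            return self.param_map[name]
        return self.default_map[name]


class JavaModelStub:
    """Stand-in for the JVM model: its param names plus any values it holds."""

    def __init__(self, names, explicit=None, defaults=None):
        self.names = set(names)
        self.explicit = dict(explicit or {})  # explicitly set on the Java side
        self.defaults = dict(defaults or {})  # Java-side defaults


def create_missing_params(py_model, java_model):
    """Temporary fix: declare empty Params for names only the Java model has."""
    added = []
    for name in sorted(java_model.names):
        if not py_model.has_param(name):
            py_model.params[name] = Param(name, doc="(added from Java model)")
            added.append(name)
    return added


def copy_values(estimator, model):
    """Fit path: copy defaults and set values for params both objects share."""
    for name in estimator.params:
        if not model.has_param(name):
            continue
        if name in estimator.default_map:
            model.default_map[name] = estimator.default_map[name]
        if name in estimator.param_map:
            model.param_map[name] = estimator.param_map[name]


def transfer_params_from_java(py_model, java_model):
    """Load path: copy a value only if the Java param was explicitly set.
    Java-side defaults are deliberately NOT transferred here."""
    for name, value in java_model.explicit.items():
        if py_model.has_param(name):
            py_model.param_map[name] = value
```

In this sketch, a param whose only value is a Java-side default (for example a
`regParam` default held only by `JavaModelStub`) stays undefined after loading,
which is the gap mentioned in the parenthetical above.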
To reach parity with the Scala API, I think the best way forward is to
organize a JIRA with subtasks to update the Python ML class hierarchies to
match the Scala ones, so that the Params are defined there with proper "get"
and "set" methods too. It might also be good to have a Python test that checks
for matching params in Java for both the estimators and models. It could be
skipped by default and then enabled during the QA period. The temporary fix
here would continue to work without interfering while the params are being
added, and it could be removed once we feel that most of the params have been
properly added and the Python API is close to matching the Scala one.
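A rough sketch of such an opt-in parity test, assuming plain `unittest` and a
hypothetical environment variable `PYSPARK_ML_PARAM_PARITY` as the switch. The
`STAGES` fixture and the `missing_python_params` helper are also hypothetical:
a real test would collect the Python param names from each estimator/model and
the Java names from its JVM counterpart instead of hard-coding them.

```python
import os
import unittest

# Hypothetical opt-in switch: the parity check is skipped unless this is set,
# e.g. during the QA period.
PARITY_ENV_VAR = "PYSPARK_ML_PARAM_PARITY"


def missing_python_params(python_names, java_names):
    """Return the Java param names that have no Python counterpart, sorted."""
    return sorted(set(java_names) - set(python_names))


# Hypothetical fixture: (stage name, Python param names, Java param names).
STAGES = [
    ("ExampleEstimator", ["maxIter", "regParam"], ["maxIter", "regParam"]),
    ("ExampleModel", ["maxIter"], ["maxIter", "regParam"]),
]


@unittest.skipUnless(os.environ.get(PARITY_ENV_VAR), "param parity check disabled")
class ParamParityTest(unittest.TestCase):
    def test_params_match_java(self):
        for stage, py_names, java_names in STAGES:
            with self.subTest(stage=stage):
                self.assertEqual(missing_python_params(py_names, java_names), [])
```

It could be run with something like
`PYSPARK_ML_PARAM_PARITY=1 python -m unittest <module>`; without the variable
the whole class is skipped. With this made-up fixture, the `ExampleModel` entry
would fail once enabled, reporting `regParam` as missing on the Python side.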