dongjoon-hyun commented on a change in pull request #26527: [SPARK-29691] ensure Param objects are valid in fit, transform
URL: https://github.com/apache/spark/pull/26527#discussion_r346674023
##########
File path: .github/PULL_REQUEST_[SPARK-29691]
##########
@@ -0,0 +1,37 @@
+### What changes were proposed in this pull request?
+
+Estimator.fit() and Model.transform() accept a dictionary of extra parameters whose values are used to
+overwrite those supplied at initialization or by default.
+The keys are presumed to be valid Param objects.
+It is proposed to extend the API to allow strings as keys when they can be mapped to a valid parameter
+belonging to the target object, and otherwise to check that only Param objects are supplied as keys.
+
+### Why are the changes needed?
+
+Param objects are created by and bound to an instance of Params (Estimator, Model, or Transformer).
+They may be obtained from their parent as attributes, or by name through getParam.
+
+The documentation does not state that keys must be valid Param objects, nor describe how one may be
+obtained. The current behavior is to silently ignore keys which are not valid Param objects.
+
+### Does this PR introduce any user-facing change?
+
+Example:
+```
+extra = {"featuresCol": "features1"}
+lr = LogisticRegression()
+lr.fit(data, params=extra)
+```
+will now be equivalent to
+```
+lr = LogisticRegression(**extra)
+lr.fit(data)
+```
+Unrecognized parameters will now raise ValueError.
+
+(Note also that invalid parameters added to ParamGridBuilder which might have been ignored could now
+cause errors, if eventually the bad key arrives in a call to a fit method.)
+
+### How was this patch tested?
+
+Added method test_copy_param_extras_check to test_param.py.
Review comment:
@JohnHBauer. Please remove this file. You already describe this
information correctly.
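
For readers following the quoted description, the key handling it proposes can be sketched in plain Python without a Spark installation. This is only an illustration under stated assumptions: the `Param` and `Params` classes below are hypothetical stand-ins, not the pyspark.ml classes, and `resolve_extra` and `effective_values` are made-up helper names, not methods from the patch. The sketch accepts string keys that name a parameter of the target object, accepts Param objects bound to that object, and raises ValueError for anything else.

```python
class Param:
    """Hypothetical stand-in for a Spark ML Param: a name bound to its parent."""

    def __init__(self, parent, name):
        self.parent = parent
        self.name = name


class Params:
    """Hypothetical stand-in for an Estimator, Model, or Transformer."""

    def __init__(self, **defaults):
        # Each Param is created by, and bound to, this instance.
        self._params = {name: Param(self, name) for name in defaults}
        self._values = dict(defaults)

    def getParam(self, name):
        """Look up a Param by name; unknown names raise ValueError in this sketch."""
        if name not in self._params:
            raise ValueError(f"unknown parameter name: {name!r}")
        return self._params[name]

    def resolve_extra(self, extra=None):
        """Map every key of `extra` to a Param owned by self, or raise ValueError."""
        resolved = {}
        for key, value in (extra or {}).items():
            if isinstance(key, str):
                # Proposed extension: a string key is mapped to the matching Param.
                param = self.getParam(key)
            elif isinstance(key, Param) and self._params.get(key.name) is key:
                # A Param object is accepted only if it belongs to this instance.
                param = key
            else:
                # Invalid keys are rejected instead of being silently ignored.
                raise ValueError(f"invalid parameter key: {key!r}")
            resolved[param] = value
        return resolved

    def effective_values(self, extra=None):
        """Values by name after the extra parameters overwrite the stored ones."""
        merged = dict(self._values)
        for param, value in self.resolve_extra(extra).items():
            merged[param.name] = value
        return merged


# Usage: a string key overrides the stored value, mirroring the quoted example.
lr = Params(featuresCol="features", maxIter=100)
print(lr.effective_values({"featuresCol": "features1"}))
```

In this sketch a Param taken from a different instance is rejected as well, since the description says Param objects are bound to the instance that created them; whether the actual patch draws the line in the same place is not stated in the quoted text.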