GitHub user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17849#discussion_r132530521
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -417,6 +417,54 @@ def test_logistic_regression_check_thresholds(self):
                 LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
             )
     
    +    @staticmethod
    +    def check_params(test_self, py_stage, check_params_exist=True):
    +        """
    +        Checks common requirements for Params.params:
    +          - set of params exist in Java and Python and are ordered by names
    +          - param parent has the same UID as the object's UID
    +          - default param value from Java matches value in Python
    +          - optionally check if all params from Java also exist in Python
    +        """
    +        py_stage_str = "%s %s" % (type(py_stage), py_stage)
    +        if not hasattr(py_stage, "_to_java"):
    +            return
    +        java_stage = py_stage._to_java()
    +        if java_stage is None:
    +            return
    +        test_self.assertEqual(py_stage.uid, java_stage.uid(), msg=py_stage_str)
    +        if check_params_exist:
    +            param_names = [p.name for p in py_stage.params]
    +            java_params = list(java_stage.params())
    +            java_param_names = [jp.name() for jp in java_params]
    +            test_self.assertEqual(
    +                param_names, sorted(java_param_names),
    +                "Param list in Python does not match Java for %s:\nJava = %s\nPython = %s"
    +                % (py_stage_str, java_param_names, param_names))
    --- End diff --
    
    I also changed the `return` to `continue` on line 454. This loop checks all
    params, so it was meant to skip over random seed params, not break out of
    the loop entirely (this is why that default value for MLP was missed). I
    also cleaned up the NaN checks: before, they only covered Imputer params,
    but the same handling should apply to any param with NaN as its default
    value. These are lines 460-462.
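
A minimal, hypothetical sketch of the two fixes, using plain dicts as stand-ins for the real Java and Python stages. The helper `find_default_mismatches`, the `skip` parameter, and the sample values are illustrative only and are not the code in this PR. It shows that `continue` skips just the seed param while later params are still compared, and that NaN defaults need `math.isnan` because `float("nan") == float("nan")` is `False`, so a plain equality assertion would always fail on them.

```python
import math


def is_nan(value):
    """True only for float NaN values."""
    return isinstance(value, float) and math.isnan(value)


def find_default_mismatches(py_defaults, java_defaults, skip=("seed",)):
    """Hypothetical helper: names of params whose Python default differs from Java.

    Both arguments are plain dicts mapping param name -> default value;
    they stand in for the default values read from the Python and Java stages.
    """
    mismatches = []
    for name, java_value in java_defaults.items():
        if name in skip:
            # Skip only this param; a `return` here would end the whole check
            # and every later param would silently go untested.
            continue
        py_value = py_defaults.get(name)
        if is_nan(java_value) or is_nan(py_value):
            # NaN != NaN, so plain equality always fails; treat the defaults
            # as matching only when both sides are NaN.
            if not (is_nan(java_value) and is_nan(py_value)):
                mismatches.append(name)
        elif py_value != java_value:
            mismatches.append(name)
    return mismatches


# Illustrative values: the seeds differ but are skipped, and both NaNs match.
java = {"missingValue": float("nan"), "seed": 42, "stepSize": 0.03}
python = {"missingValue": float("nan"), "seed": 7, "stepSize": 0.03}
print(find_default_mismatches(python, java))  # []
```

Skipping via `continue` keeps the seed exemption local to one iteration, which is what lets a mismatch in a param that appears after the seed still be reported.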

