[GitHub] spark pull request #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet st...

2018-03-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18982


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet st...

2018-03-23 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18982#discussion_r176828963
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -118,11 +118,18 @@ def _transfer_params_to_java(self):
         """
         Transforms the embedded params to the companion Java object.
         """
-        paramMap = self.extractParamMap()
+        pair_defaults = []
         for param in self.params:
-            if param in paramMap:
-                pair = self._make_java_param_pair(param, paramMap[param])
+            if self.isSet(param):
+                pair = self._make_java_param_pair(param, self._paramMap[param])
                 self._java_obj.set(pair)
+            if self.hasDefault(param):
+                pair = self._make_java_param_pair(param, self._defaultParamMap[param])
+                pair_defaults.append(pair)
+        if len(pair_defaults) > 0:
+            sc = SparkContext._active_spark_context
+            pair_defaults_seq = sc._jvm.PythonUtils.toSeq(pair_defaults)
+            self._java_obj.setDefault(pair_defaults_seq)
--- End diff --

I think this is reasonable; a few extra lines to avoid potentially surprising users is worth it.


---




[GitHub] spark pull request #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet st...

2018-03-23 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/18982#discussion_r176825209
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -118,11 +118,18 @@ def _transfer_params_to_java(self):
         """
         Transforms the embedded params to the companion Java object.
         """
-        paramMap = self.extractParamMap()
+        pair_defaults = []
         for param in self.params:
-            if param in paramMap:
-                pair = self._make_java_param_pair(param, paramMap[param])
+            if self.isSet(param):
+                pair = self._make_java_param_pair(param, self._paramMap[param])
                 self._java_obj.set(pair)
+            if self.hasDefault(param):
+                pair = self._make_java_param_pair(param, self._defaultParamMap[param])
+                pair_defaults.append(pair)
+        if len(pair_defaults) > 0:
+            sc = SparkContext._active_spark_context
+            pair_defaults_seq = sc._jvm.PythonUtils.toSeq(pair_defaults)
+            self._java_obj.setDefault(pair_defaults_seq)
--- End diff --

My take is that while they should be the same, it's still possible they might not be: the user could extend their own classes, and it's quite easy to change in Python. Although we don't really support this, if there were a mismatch the user would probably just get bad results, and it would be really hard to figure out why. From the Python API it would look like one value was in effect while Scala was actually using another.

If you all think it's overly cautious to do this, I can take it out. I just thought it would be cheap insurance to set these values regardless.
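The divergence scenario described above can be sketched without Spark. The classes below are simplified, illustrative stand-ins for the PySpark `Params` machinery (none of this is the real API): a user subclass overrides a default on the Python side, so the Python default no longer matches what the JVM companion object would assume.

```python
class Params:
    """Illustrative stand-in for pyspark.ml.param.Params (not the real class)."""

    def __init__(self):
        self._paramMap = {}          # values the user explicitly set
        self._defaultParamMap = {}   # values carrying only a default

    def _setDefault(self, **kwargs):
        self._defaultParamMap.update(kwargs)

    def set(self, name, value):
        self._paramMap[name] = value

    def isSet(self, name):
        # True only for explicitly set params, never for bare defaults
        return name in self._paramMap

    def getOrDefault(self, name):
        return self._paramMap.get(name, self._defaultParamMap[name])


class Binarizer(Params):
    """Stand-in estimator whose built-in default matches the JVM side."""

    def __init__(self):
        super().__init__()
        self._setDefault(threshold=0.0)


class MyBinarizer(Binarizer):
    """User subclass that silently changes the Python-side default."""

    def __init__(self):
        super().__init__()
        self._setDefault(threshold=0.5)  # diverges from the assumed JVM default of 0.0


m = MyBinarizer()
print(m.isSet("threshold"))         # False: it is still only a default
print(m.getOrDefault("threshold"))  # 0.5 in Python; the JVM side would still use 0.0
```

If defaults were never transferred, the JVM object would compute with 0.0 while the Python API reports 0.5, which is exactly the hard-to-debug mismatch described above.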


---




[GitHub] spark pull request #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet st...

2018-03-21 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18982#discussion_r176316173
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -118,11 +118,18 @@ def _transfer_params_to_java(self):
         """
         Transforms the embedded params to the companion Java object.
         """
-        paramMap = self.extractParamMap()
+        pair_defaults = []
         for param in self.params:
-            if param in paramMap:
-                pair = self._make_java_param_pair(param, paramMap[param])
+            if self.isSet(param):
+                pair = self._make_java_param_pair(param, self._paramMap[param])
                 self._java_obj.set(pair)
+            if self.hasDefault(param):
+                pair = self._make_java_param_pair(param, self._defaultParamMap[param])
+                pair_defaults.append(pair)
+        if len(pair_defaults) > 0:
+            sc = SparkContext._active_spark_context
+            pair_defaults_seq = sc._jvm.PythonUtils.toSeq(pair_defaults)
+            self._java_obj.setDefault(pair_defaults_seq)
--- End diff --

If the default params on the Java side and the Python side are the same, do we still need to set default params on the Java object? Aren't they already set if they are default params?


---




[GitHub] spark pull request #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet st...

2017-09-06 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18982#discussion_r137360294
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -118,11 +118,13 @@ def _transfer_params_to_java(self):
         """
         Transforms the embedded params to the companion Java object.
         """
-        paramMap = self.extractParamMap()
         for param in self.params:
-            if param in paramMap:
-                pair = self._make_java_param_pair(param, paramMap[param])
+            if param in self._paramMap:
+                pair = self._make_java_param_pair(param, self._paramMap[param])
                 self._java_obj.set(pair)
+            if param in self._defaultParamMap:
--- End diff --

Sounds reasonable.


---




[GitHub] spark pull request #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet st...

2017-09-06 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18982#discussion_r137360175
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -455,6 +455,14 @@ def test_logistic_regression_check_thresholds(self):
             LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
         )
 
+    def test_preserve_set_state(self):
+        model = Binarizer()
+        self.assertFalse(model.isSet("threshold"))
+        model._transfer_params_to_java()
--- End diff --

I think that would be a reasonable thing to do; the slight increase in testing overhead is probably worth it. It keeps us from being too closely tied to the implementation details, and we already use `SparkSessionTestCase` in a lot of places.


---




[GitHub] spark pull request #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet st...

2017-08-21 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/18982#discussion_r134380704
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -118,11 +118,13 @@ def _transfer_params_to_java(self):
         """
         Transforms the embedded params to the companion Java object.
         """
-        paramMap = self.extractParamMap()
         for param in self.params:
-            if param in paramMap:
-                pair = self._make_java_param_pair(param, paramMap[param])
+            if param in self._paramMap:
+                pair = self._make_java_param_pair(param, self._paramMap[param])
                 self._java_obj.set(pair)
+            if param in self._defaultParamMap:
--- End diff --

We usually make the assumption that Python defines the same default values as Java, in Spark ML at least, but given the circumstances of the JIRA - they defined their own Model - it's still possible for `hasDefault` or the default value to return something different than Python would. So I'm just being overly cautious here, but it's pretty cheap to just transfer the default values anyway, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark pull request #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet st...

2017-08-21 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/18982#discussion_r134380293
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -455,6 +455,14 @@ def test_logistic_regression_check_thresholds(self):
             LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
         )
 
+    def test_preserve_set_state(self):
+        model = Binarizer()
+        self.assertFalse(model.isSet("threshold"))
+        model._transfer_params_to_java()
--- End diff --

yeah, it would be a little better to call the actual `transform`, but we would still need to call `_transfer_params_from_java` or check `isSet` with a direct call to Java via py4j. I was going to do this, but the `ParamTests` class doesn't already create a `SparkSession` - I'm sure it's just a small amount of overhead, but that's why I thought to just use `_transfer_params_to_java`.

Do you think it would be worth it to change `ParamTests` to inherit from `SparkSessionTestCase` so a session is created and I could make a `DataFrame` to transform?


---




[GitHub] spark pull request #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet st...

2017-08-18 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18982#discussion_r134075412
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -118,11 +118,13 @@ def _transfer_params_to_java(self):
         """
         Transforms the embedded params to the companion Java object.
         """
-        paramMap = self.extractParamMap()
         for param in self.params:
-            if param in paramMap:
-                pair = self._make_java_param_pair(param, paramMap[param])
+            if param in self._paramMap:
+                pair = self._make_java_param_pair(param, self._paramMap[param])
                 self._java_obj.set(pair)
+            if param in self._defaultParamMap:
--- End diff --

Should this be an else if? No need to transfer the default value if we've 
explicitly set it to another value.


---




[GitHub] spark pull request #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet st...

2017-08-18 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18982#discussion_r134075445
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -455,6 +455,14 @@ def test_logistic_regression_check_thresholds(self):
 LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
 )
 
+def test_preserve_set_state(self):
+model = Binarizer()
+self.assertFalse(model.isSet("threshold"))
+model._transfer_params_to_java()
--- End diff --

Would it make sense to do an actual transform here instead of the two inner 
parts of the transform?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark pull request #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet st...

2017-08-17 Thread BryanCutler
GitHub user BryanCutler opened a pull request:

https://github.com/apache/spark/pull/18982

[SPARK-21685][PYTHON][ML] PySpark Params isSet state should not change 
after transform

## What changes were proposed in this pull request?

Currently when a PySpark Model is transformed, default params that have not been explicitly set are then set on the Java side by the call to `wrapper._transfer_params_to_java`. This incorrectly changes the state of the Param, which should still be marked as a default value only.

## How was this patch tested?

Added a new test to verify that when transferring Params to Java, default 
params have their state preserved.
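The bug can be illustrated without a JVM using minimal stand-ins (all names below are illustrative, not the real PySpark or Scala classes): merging defaults into the param map before transfer, as `extractParamMap()` does, promotes every default to an explicitly set value on the Java side.

```python
class JavaParams:
    """Illustrative stand-in for the JVM companion object (not the real class)."""

    def __init__(self):
        self._set = {}       # params marked as explicitly set
        self._defaults = {}  # params carrying only a default value

    def set(self, name, value):
        self._set[name] = value

    def setDefault(self, name, value):
        self._defaults[name] = value

    def isSet(self, name):
        return name in self._set


def transfer_old(py_set, py_defaults, java_obj):
    # Old behavior: defaults and set values are merged first (as
    # extractParamMap() does), so defaults get set() on the Java side
    # and are incorrectly promoted to "explicitly set".
    merged = {**py_defaults, **py_set}
    for name, value in merged.items():
        java_obj.set(name, value)


def transfer_new(py_set, py_defaults, java_obj):
    # Fixed behavior: only explicitly set params go through set();
    # defaults go through setDefault() and keep their default-only state.
    for name, value in py_set.items():
        java_obj.set(name, value)
    for name, value in py_defaults.items():
        java_obj.setDefault(name, value)


defaults, explicit = {"threshold": 0.0}, {}

old = JavaParams()
transfer_old(explicit, defaults, old)
print(old.isSet("threshold"))  # True: default incorrectly marked as set

new = JavaParams()
transfer_new(explicit, defaults, new)
print(new.isSet("threshold"))  # False: default-only state is preserved
```

This is the state change the regression test checks: `isSet("threshold")` must still be `False` after a round trip to Java.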


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BryanCutler/spark pyspark-ml-param-to-java-defaults-SPARK-21685

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18982.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18982


commit bbe3ef70d2acb3647171621b02b4f73c001d5caf
Author: Bryan Cutler 
Date:   2017-08-17T23:59:43Z

added regression test for preserving Param set state when transferring to 
Java

commit 12030569df3aa92a227d6e7d06ceeb5121d853f9
Author: Bryan Cutler 
Date:   2017-08-17T23:59:54Z

changed _transfer_params_to_java to not set param in Java unless explicitly 
set




---
