[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21413


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-29 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21413#discussion_r191611779
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -619,6 +627,22 @@ def getSubsamplingRate(self):
         """
         return self.getOrDefault(self.subsamplingRate)
 
+    @since("1.4.0")
+    def setFeatureSubsetStrategy(self, value):
+        """
+        Sets the value of :py:attr:`featureSubsetStrategy`.
+
+        .. note:: Deprecated in 2.1.0 and will be removed in 2.4.0.
--- End diff --

Sorry.  Fixed. 





[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-29 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/21413#discussion_r191609540
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -619,6 +627,22 @@ def getSubsamplingRate(self):
         """
         return self.getOrDefault(self.subsamplingRate)
 
+    @since("1.4.0")
+    def setFeatureSubsetStrategy(self, value):
+        """
+        Sets the value of :py:attr:`featureSubsetStrategy`.
+
+        .. note:: Deprecated in 2.1.0 and will be removed in 2.4.0.
--- End diff --

Sorry, this should be
`.. note:: Deprecated in 2.4.0 and will be removed in 3.0.0.`





[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-29 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21413#discussion_r191602398
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -619,6 +627,22 @@ def getSubsamplingRate(self):
         """
         return self.getOrDefault(self.subsamplingRate)
 
+    @since("1.4.0")
+    def setFeatureSubsetStrategy(self, value):
+        """
+        Sets the value of :py:attr:`featureSubsetStrategy`.
+
+        .. note:: Deprecated in 2.1.0 and will be removed in 3.0.0.
--- End diff --

Fixed. Thanks!





[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-29 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/21413#discussion_r191581932
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -619,6 +627,22 @@ def getSubsamplingRate(self):
         """
         return self.getOrDefault(self.subsamplingRate)
 
+    @since("1.4.0")
+    def setFeatureSubsetStrategy(self, value):
+        """
+        Sets the value of :py:attr:`featureSubsetStrategy`.
+
+        .. note:: Deprecated in 2.1.0 and will be removed in 3.0.0.
--- End diff --

This should technically be marked as deprecated in 2.4.0, even though the 
Scala version was deprecated earlier.





[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-24 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21413#discussion_r190659883
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -619,6 +627,20 @@ def getSubsamplingRate(self):
         """
         return self.getOrDefault(self.subsamplingRate)
 
+    @since("1.4.0")
+    def setFeatureSubsetStrategy(self, value):
--- End diff --

Got it. Thanks a lot!





[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-24 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/21413#discussion_r190650505
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -619,6 +627,20 @@ def getSubsamplingRate(self):
         """
         return self.getOrDefault(self.subsamplingRate)
 
+    @since("1.4.0")
+    def setFeatureSubsetStrategy(self, value):
--- End diff --

`setFeatureSubsetStrategy` should only be in the GBT/RF estimators, while 
`getFeatureSubsetStrategy` can be in `TreeEnsembleParams` so it is inherited by 
both the estimators and models.  It's because we don't want methods to set 
training params in the Model classes. 
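
The layout described above can be sketched in plain Python, without Spark. This is only an illustration: the class names `TreeEnsembleParamsSketch`, `EstimatorSketch`, and `ModelSketch` and the `_param_map` dictionary are hypothetical stand-ins, not the actual PySpark classes or their internals.

```python
class TreeEnsembleParamsSketch:
    """Hypothetical shared params mixin: it carries the getter only."""

    def __init__(self):
        # Hypothetical storage; the real Params machinery is more involved.
        self._param_map = {"featureSubsetStrategy": "auto"}

    def getFeatureSubsetStrategy(self):
        return self._param_map["featureSubsetStrategy"]


class EstimatorSketch(TreeEnsembleParamsSketch):
    """Hypothetical estimator: the only place the setter is defined."""

    def setFeatureSubsetStrategy(self, value):
        self._param_map["featureSubsetStrategy"] = value
        return self  # return self so setter calls can be chained

    def fit(self):
        # The fitted model receives a copy of the training params.
        model = ModelSketch()
        model._param_map = dict(self._param_map)
        return model


class ModelSketch(TreeEnsembleParamsSketch):
    """Hypothetical fitted model: inherits the getter, has no setter."""


estimator = EstimatorSketch().setFeatureSubsetStrategy("sqrt")
model = estimator.fit()
print(model.getFeatureSubsetStrategy())
```

Because the setter lives only on the estimator, a fitted model can report the value it was trained with but offers no method to change it.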





[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-24 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21413#discussion_r190638735
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -619,6 +627,20 @@ def getSubsamplingRate(self):
         """
         return self.getOrDefault(self.subsamplingRate)
 
+    @since("1.4.0")
+    def setFeatureSubsetStrategy(self, value):
--- End diff --

@BryanCutler Thanks for your review. I will modify the code. One question: 
should I put only `setFeatureSubsetStrategy` in GBT/RandomForest, or both 
the setter and the getter? I looked at the ML Python code, and it seems to me 
that the getter and setter always come in pairs. 





[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-23 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/21413#discussion_r190415802
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -619,6 +627,20 @@ def getSubsamplingRate(self):
         """
         return self.getOrDefault(self.subsamplingRate)
 
+    @since("1.4.0")
+    def setFeatureSubsetStrategy(self, value):
--- End diff --

This method should be in the GBT/RandomForest estimator classes 
(classification and regression), and the old method in RandomForestParams 
should be deprecated. Would you mind doing this?
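
A minimal sketch of what deprecating the old setter might look like, in plain Python. The class name `RandomForestParamsSketch` and the `_strategy` attribute are hypothetical, and the runtime `warnings.warn` call is an assumption added for illustration: the diff under review only shows a `.. note::` line in the docstring. The version strings are the ones settled on later in this review.

```python
import warnings


class RandomForestParamsSketch:
    """Hypothetical stand-in for the params class holding the old setter."""

    def __init__(self):
        self._strategy = "auto"  # hypothetical storage for the param value

    def setFeatureSubsetStrategy(self, value):
        """
        Sets the value of :py:attr:`featureSubsetStrategy`.

        .. note:: Deprecated in 2.4.0 and will be removed in 3.0.0.
        """
        # Assumption: also warn at call time, not only in the docstring.
        warnings.warn(
            "Deprecated in 2.4.0 and will be removed in 3.0.0.",
            DeprecationWarning,
            stacklevel=2,
        )
        self._strategy = value
        return self
```

The old method keeps working, so existing callers are not broken, while the docstring note tells readers where the API is headed.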





[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-23 Thread huaxingao
GitHub user huaxingao opened a pull request:

https://github.com/apache/spark/pull/21413

[SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBTClassifier

## What changes were proposed in this pull request?

Add featureSubsetStrategy to GBTClassifier and GBTRegressor.  Also make 
GBTClassificationModel inherit from JavaClassificationModel instead of the 
prediction model, so that it has numClasses.

## How was this patch tested?

Added tests as doctests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huaxingao/spark spark-23161

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21413.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21413


commit 16d19f4017bbcade79c59798052b0efacc59ea8b
Author: Huaxin Gao 
Date:   2018-05-23T17:36:35Z

Add missing APIs to Python GBTClassifier



