[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21413

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21413#discussion_r191611779

    --- Diff: python/pyspark/ml/regression.py ---
    @@ -619,6 +627,22 @@ def getSubsamplingRate(self):
             """
             return self.getOrDefault(self.subsamplingRate)

    +    @since("1.4.0")
    +    def setFeatureSubsetStrategy(self, value):
    +        """
    +        Sets the value of :py:attr:`featureSubsetStrategy`.
    +
    +        .. note:: Deprecated in 2.1.0 and will be removed in 2.4.0.
    --- End diff --

    Sorry. Fixed.

---
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21413#discussion_r191609540

    --- Diff: python/pyspark/ml/regression.py ---
    @@ -619,6 +627,22 @@ def getSubsamplingRate(self):
             """
             return self.getOrDefault(self.subsamplingRate)

    +    @since("1.4.0")
    +    def setFeatureSubsetStrategy(self, value):
    +        """
    +        Sets the value of :py:attr:`featureSubsetStrategy`.
    +
    +        .. note:: Deprecated in 2.1.0 and will be removed in 2.4.0.
    --- End diff --

    Sorry, this should be `.. note:: Deprecated in 2.4.0 and will be removed in 3.0.0.`

---
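For readers following the deprecation discussion, here is a minimal sketch of what a deprecated setter carrying the corrected note could look like. It uses a plain-Python stand-in rather than the real pyspark classes: `RandomForestParamsSketch` and the `"auto"` default are hypothetical, and the runtime `warnings.warn` call is an assumption added for illustration (the diff under review only adds the docstring note).

```python
import warnings


class RandomForestParamsSketch:
    """Hypothetical stand-in for the params mixin that keeps the old setter."""

    def __init__(self):
        # Placeholder default chosen for this sketch only.
        self._featureSubsetStrategy = "auto"

    def setFeatureSubsetStrategy(self, value):
        """
        Sets the value of featureSubsetStrategy.

        .. note:: Deprecated in 2.4.0 and will be removed in 3.0.0.
        """
        # Assumption: emitting a runtime warning is an addition of this sketch,
        # not something the reviewed diff does.
        warnings.warn(
            "setFeatureSubsetStrategy is deprecated in 2.4.0 "
            "and will be removed in 3.0.0.",
            DeprecationWarning,
            stacklevel=2,
        )
        self._featureSubsetStrategy = value
        return self


# Record warnings so the deprecation can be observed explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    params = RandomForestParamsSketch().setFeatureSubsetStrategy("sqrt")

print(caught[0].category.__name__)  # prints "DeprecationWarning"
```

The version numbers in the note describe when the Python method itself becomes deprecated, which is why the review asks for 2.4.0 rather than the release in which the Scala counterpart was deprecated.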
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21413#discussion_r191602398

    --- Diff: python/pyspark/ml/regression.py ---
    @@ -619,6 +627,22 @@ def getSubsamplingRate(self):
             """
             return self.getOrDefault(self.subsamplingRate)

    +    @since("1.4.0")
    +    def setFeatureSubsetStrategy(self, value):
    +        """
    +        Sets the value of :py:attr:`featureSubsetStrategy`.
    +
    +        .. note:: Deprecated in 2.1.0 and will be removed in 3.0.0.
    --- End diff --

    Fixed. Thanks!

---
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21413#discussion_r191581932

    --- Diff: python/pyspark/ml/regression.py ---
    @@ -619,6 +627,22 @@ def getSubsamplingRate(self):
             """
             return self.getOrDefault(self.subsamplingRate)

    +    @since("1.4.0")
    +    def setFeatureSubsetStrategy(self, value):
    +        """
    +        Sets the value of :py:attr:`featureSubsetStrategy`.
    +
    +        .. note:: Deprecated in 2.1.0 and will be removed in 3.0.0.
    --- End diff --

    This should technically be marked as deprecated in 2.4.0, even though the Scala version was deprecated earlier.

---
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21413#discussion_r190659883

    --- Diff: python/pyspark/ml/regression.py ---
    @@ -619,6 +627,20 @@ def getSubsamplingRate(self):
             """
             return self.getOrDefault(self.subsamplingRate)

    +    @since("1.4.0")
    +    def setFeatureSubsetStrategy(self, value):
    --- End diff --

    Got it. Thanks a lot!

---
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21413#discussion_r190650505

    --- Diff: python/pyspark/ml/regression.py ---
    @@ -619,6 +627,20 @@ def getSubsamplingRate(self):
             """
             return self.getOrDefault(self.subsamplingRate)

    +    @since("1.4.0")
    +    def setFeatureSubsetStrategy(self, value):
    --- End diff --

    `setFeatureSubsetStrategy` should only be in the GBT/RF estimators, while `getFeatureSubsetStrategy` can be in `TreeEnsembleParams` so it is inherited by both the estimators and models. It's because we don't want methods to set training params in the Model classes.

---
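The layout described above (getter on the shared params mixin, setter only on the estimator) can be sketched roughly as follows. These are plain-Python stand-ins, not the real pyspark classes: every class name ending in `Sketch`, the `_featureSubsetStrategy` attribute, the `"auto"` default, and the data-free `fit()` are assumptions made for illustration.

```python
# Illustrative sketch only: plain-Python stand-ins, not the real pyspark classes.


class TreeEnsembleParamsSketch:
    """Hypothetical shared mixin: holds the param and exposes only the getter."""

    def __init__(self):
        # Placeholder default chosen for this sketch only.
        self._featureSubsetStrategy = "auto"

    def getFeatureSubsetStrategy(self):
        return self._featureSubsetStrategy


class GBTRegressionModelSketch(TreeEnsembleParamsSketch):
    """Hypothetical fitted model: inherits the getter and defines no setter."""


class GBTRegressorSketch(TreeEnsembleParamsSketch):
    """Hypothetical estimator: the only place the setter is defined."""

    def setFeatureSubsetStrategy(self, value):
        self._featureSubsetStrategy = value
        return self  # returning self allows chained setter calls

    def fit(self):
        # A real fit would train on data; here the param is only copied over.
        model = GBTRegressionModelSketch()
        model._featureSubsetStrategy = self._featureSubsetStrategy
        return model


model = GBTRegressorSketch().setFeatureSubsetStrategy("sqrt").fit()
print(model.getFeatureSubsetStrategy())            # prints "sqrt"
print(hasattr(model, "setFeatureSubsetStrategy"))  # prints "False"
```

With this split the fitted model can still report which strategy it was trained with, but offers no way to change a training parameter after the fact.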
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21413#discussion_r190638735

    --- Diff: python/pyspark/ml/regression.py ---
    @@ -619,6 +627,20 @@ def getSubsamplingRate(self):
             """
             return self.getOrDefault(self.subsamplingRate)

    +    @since("1.4.0")
    +    def setFeatureSubsetStrategy(self, value):
    --- End diff --

    @BryanCutler Thanks for your review. I will modify the code. One question: should I put only `setFeatureSubsetStrategy` in GBT/RandomForest, or both the setter and the getter? I looked at the ml Python code, and it seems to me that the getter and setter always come in pairs.

---
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21413#discussion_r190415802

    --- Diff: python/pyspark/ml/regression.py ---
    @@ -619,6 +627,20 @@ def getSubsamplingRate(self):
             """
             return self.getOrDefault(self.subsamplingRate)

    +    @since("1.4.0")
    +    def setFeatureSubsetStrategy(self, value):
    --- End diff --

    This method should be in the GBT/RandomForest estimator classes (classification and regression), and the old method in RandomForestParams should be deprecated. Would you mind doing this?

---
GitHub user huaxingao opened a pull request:

    https://github.com/apache/spark/pull/21413

    [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBTClassifier

    ## What changes were proposed in this pull request?

    Add featureSubsetStrategy in GBTClassifier and GBTRegressor. Also make GBTClassificationModel inherit from JavaClassificationModel instead of prediction model so it will have numClasses.

    ## How was this patch tested?

    Add tests in doctest

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huaxingao/spark spark-23161

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21413.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21413

commit 16d19f4017bbcade79c59798052b0efacc59ea8b
Author: Huaxin Gao
Date:   2018-05-23T17:36:35Z

    Add missing APIs to Python GBTClassifier

---