[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @MLnick @WeichenXu123 @sethah Thanks for your help throughout the process. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @MLnick Please find some time to review it and let me know if we can proceed with this. Thanks
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @MLnick Thanks for reviewing the code. I have made the changes as suggested. Please proceed if it's good to go. Thanks
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r149886343
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala ---
@@ -166,6 +166,40 @@ class GBTRegressorSuite extends SparkFunSuite with MLlibTestSparkContext
   }
   /
+  // Tests of feature subset strategy
+  /
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTRegressor()
+      .setMaxDepth(3)
+      .setMaxIter(5)
+      .setSubsamplingRate(1.0)
+      .setStepSize(0.5)
+      .setSeed(123)
+      .setFeatureSubsetStrategy("all")
+
+    // In this data, feature 1 is very important.
+    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
+    val categoricalFeatures = Map.empty[Int, Int]
+    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
+
+    val importances = gbt.fit(df).featureImportances
+    val mostImportantFeature = importances.argmax
+    assert(mostImportantFeature === 1)
+    assert(importances.toArray.sum === 1.0)
--- End diff --
done
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r149886323
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GBTRegressor.scala ---
@@ -173,6 +178,10 @@ object GBTRegressor extends DefaultParamsReadable[GBTRegressor] {
   @Since("2.0.0")
   override def load(path: String): GBTRegressor = super.load(path)
+
+  @Since("2.3.0")
--- End diff --
done
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r149886357
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala ---
@@ -166,6 +166,40 @@ class GBTRegressorSuite extends SparkFunSuite with MLlibTestSparkContext
   }
   /
+  // Tests of feature subset strategy
+  /
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTRegressor()
+      .setMaxDepth(3)
+      .setMaxIter(5)
+      .setSubsamplingRate(1.0)
+      .setStepSize(0.5)
+      .setSeed(123)
+      .setFeatureSubsetStrategy("all")
+
+    // In this data, feature 1 is very important.
+    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
+    val categoricalFeatures = Map.empty[Int, Int]
+    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
+
+    val importances = gbt.fit(df).featureImportances
+    val mostImportantFeature = importances.argmax
+    assert(mostImportantFeature === 1)
+    assert(importances.toArray.sum === 1.0)
+    assert(importances.toArray.forall(_ >= 0.0))
+
+    // GBT with different featureSubsetStrategy
+    val gbtWithFeatureSubset = gbt.setFeatureSubsetStrategy("1")
+    val importanceFeatures = gbtWithFeatureSubset.fit(df).featureImportances
+    val mostIF = importanceFeatures.argmax
+    assert(!(mostImportantFeature === mostIF))
+    assert(importanceFeatures.toArray.sum === 1.0)
+    assert(importanceFeatures.toArray.forall(_ >= 0.0))
+    assert(!(importanceFeatures.toDense.values.deep === importances.toDense.values.deep))
--- End diff --
done
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 Ping @MLnick @jkbradley. @sethah has given an LGTM.
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @MLnick @jkbradley Please find some time to review it. @sethah has given an LGTM.
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @sethah The build has passed :) I have made the changes as suggested (setting maxIter and maxDepth). Ping @MLnick or @jkbradley so we can move ahead with it.
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 Jenkins test this please
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @sethah It's still failing, and I don't think the issue is on my side. Please help.
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @sethah I have made the changes as suggested, but the build is failing with this error: Step 'Publish JUnit test result report' failed: No test report files were found. Configuration error? https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83354/ Please help with this.
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @sethah Please find some time to look into the changes. Please let me know if further changes are required.
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @sethah Thanks for reviewing the code. I have made all the changes you suggested. Please review them and let me know if further changes are required.
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r148197373
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
@@ -354,6 +356,41 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
   }
   /
+  // Tests of feature subset strategy
+  /
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTClassifier()
--- End diff --
Removed stepSize, impurity and other parameters
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r148197388
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
@@ -354,6 +356,41 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
   }
   /
+  // Tests of feature subset strategy
+  /
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTClassifier()
+      .setImpurity("Gini")
+      .setMaxDepth(3)
+      .setMaxIter(5)
+      .setSubsamplingRate(1.0)
+      .setStepSize(0.5)
+      .setSeed(123)
+      .setFeatureSubsetStrategy("all")
+
+    // In this data, feature 1 is very important.
+    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
+    val categoricalFeatures = Map.empty[Int, Int]
+    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
+
+    val importances = gbt.fit(df).featureImportances
+    val mostImportantFeature = importances.argmax
+    assert(mostImportantFeature === 1)
+    assert(importances.toArray.sum === 1.0)
+    assert(importances.toArray.forall(_ >= 0.0))
+
+    // GBT with different featureSubsetStrategy
+    val gbtWithFeatureSubset = gbt.setFeatureSubsetStrategy("1")
+    val importanceFeatures = gbtWithFeatureSubset.fit(df).featureImportances
+    val mostIF = importanceFeatures.argmax
+    assert(!(mostImportantFeature === mostIF))
+    assert(importanceFeatures.toArray.sum === 1.0)
--- End diff --
done
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r148197380
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
@@ -354,6 +356,41 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
   }
   /
+  // Tests of feature subset strategy
+  /
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTClassifier()
+      .setImpurity("Gini")
+      .setMaxDepth(3)
+      .setMaxIter(5)
+      .setSubsamplingRate(1.0)
+      .setStepSize(0.5)
+      .setSeed(123)
+      .setFeatureSubsetStrategy("all")
+
+    // In this data, feature 1 is very important.
+    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
+    val categoricalFeatures = Map.empty[Int, Int]
+    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
+
+    val importances = gbt.fit(df).featureImportances
+    val mostImportantFeature = importances.argmax
+    assert(mostImportantFeature === 1)
+    assert(importances.toArray.sum === 1.0)
--- End diff --
done
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r148197272
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala ---
@@ -118,11 +119,12 @@ class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: S
   /** (private[ml]) Train a decision tree on an RDD */
   private[ml] def train(data: RDD[LabeledPoint],
-      oldStrategy: OldStrategy): DecisionTreeRegressionModel = {
+      oldStrategy: OldStrategy, featureSubsetStrategy: String): DecisionTreeRegressionModel = {
--- End diff --
done
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r148197248
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -192,6 +197,10 @@ object GBTClassifier extends DefaultParamsReadable[GBTClassifier] {
   @Since("2.0.0")
   override def load(path: String): GBTClassifier = super.load(path)
+
+  @Since("2.3.0")
+  final val supportedFeatureSubsetStrategies: Array[String] =
--- End diff --
done
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r148197285
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala ---
@@ -118,11 +119,12 @@ class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: S
   /** (private[ml]) Train a decision tree on an RDD */
   private[ml] def train(data: RDD[LabeledPoint],
-      oldStrategy: OldStrategy): DecisionTreeRegressionModel = {
+      oldStrategy: OldStrategy, featureSubsetStrategy: String): DecisionTreeRegressionModel = {
     val instr = Instrumentation.create(this, data)
     instr.logParams(params: _*)
-    val trees = RandomForest.run(data, oldStrategy, numTrees = 1, featureSubsetStrategy = "all",
+    val trees = RandomForest.run(data, oldStrategy, numTrees = 1,
+      featureSubsetStrategy,
--- End diff --
done
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r148197257
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala ---
@@ -108,7 +108,8 @@ class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: S
     val instr = Instrumentation.create(this, oldDataset)
     instr.logParams(params: _*)
-    val trees = RandomForest.run(oldDataset, strategy, numTrees = 1, featureSubsetStrategy = "all",
+    val trees = RandomForest.run(oldDataset, strategy, numTrees = 1,
--- End diff --
done
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @MLnick @sethah please find some time to look into this
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @MLnick @sethah It has been more than a couple of months since the code changes were made as suggested. It would be really great if you could find some time to review the pull request.
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @MLnick @sethah please find some time to look into this --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @MLnick @sethah Please find some time to look into this. It would be really great if we could include this feature in Spark 2.3.
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 ping @MLnick @sethah
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @MLnick @sethah Please let me know if you are OK with the changes so that we can move forward. Thanks for your help :)
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 ping @sethah @MLnick
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r125484321
--- Diff: project/MimaExcludes.scala ---
@@ -196,7 +196,10 @@ object MimaExcludes {
     ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.streaming.StreamingQueryException.startOffset"),
     ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.streaming.StreamingQueryException.endOffset"),
     ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.streaming.StreamingQueryException.this"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryException.query")
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryException.query"),
+
+    // [SPARK-20199][MLLIB] Add featureSubSet to GradientBoostedTrees
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.tree.GradientBoostedTrees.this")
--- End diff --
@MLnick Yes, you are correct. I have removed it. Thanks
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r125484168
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala ---
@@ -166,6 +166,45 @@ class GBTRegressorSuite extends SparkFunSuite with MLlibTestSparkContext
   }
   /
+  // Tests of feature subset strategy
+  /
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTRegressor()
+      .setMaxDepth(3)
+      .setMaxIter(5)
+      .setSubsamplingRate(1.0)
+      .setStepSize(0.5)
+      .setSeed(123)
+      .setFeatureSubsetStrategy("all")
+
+    // In this data, feature 1 is very important.
+    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
+    val categoricalFeatures = Map.empty[Int, Int]
+    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
+
+    val importances = gbt.fit(df).featureImportances
+    val mostImportantFeature = importances.argmax
+    assert(mostImportantFeature === 1)
+    assert(importances.toArray.sum === 1.0)
+    assert(importances.toArray.forall(_ >= 0.0))
+
+    val gbtWithFeatureSubset = new GBTRegressor()
--- End diff --
done
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r125484140
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
@@ -354,6 +356,47 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
   }
   /
+  // Tests of feature subset strategy
+  /
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTClassifier()
+      .setImpurity("Gini")
+      .setMaxDepth(3)
+      .setMaxIter(5)
+      .setSubsamplingRate(1.0)
+      .setStepSize(0.5)
+      .setSeed(123)
+      .setFeatureSubsetStrategy("all")
+
+    // In this data, feature 1 is very important.
+    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
+    val categoricalFeatures = Map.empty[Int, Int]
+    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
+
+    val importances = gbt.fit(df).featureImportances
+    val mostImportantFeature = importances.argmax
+    assert(mostImportantFeature === 1)
+    assert(importances.toArray.sum === 1.0)
+    assert(importances.toArray.forall(_ >= 0.0))
+
+    val gbtWithFeatureSubset = new GBTClassifier()
--- End diff --
done
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @MLnick Thanks for reviewing. I have made all the changes you suggested. Please review.
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 ping @sethah @MLnick
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 ping @sethah, please let me know if there is any update on this. Thanks
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @sethah Thanks for the comment. I have changed the PR description as suggested. I'll wait for your review comments :)
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r123422892
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -192,6 +196,9 @@ object GBTClassifier extends DefaultParamsReadable[GBTClassifier] {
   @Since("2.0.0")
   override def load(path: String): GBTClassifier = super.load(path)
+
+  final val supportedFeatureSubsetStrategies: Array[String] =
--- End diff --
@sethah Done
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @MLnick Thanks for reviewing. I have added comments. Please review them.
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r123269246
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
@@ -359,38 +365,6 @@ private[ml] trait TreeEnsembleParams extends DecisionTreeParams {
       oldImpurity: OldImpurity): OldStrategy = {
     super.getOldStrategy(categoricalFeatures, numClasses, oldAlgo, oldImpurity, getSubsamplingRate)
   }
-}
-
-/**
- * Parameters for Random Forest algorithms.
- */
-private[ml] trait RandomForestParams extends TreeEnsembleParams {
--- End diff --
@MLnick Earlier, featureSubsetStrategy, setFeatureSubsetStrategy and getFeatureSubsetStrategy were part of RandomForestParams. I moved them into TreeEnsembleParams so that both random forest and GBT can access them. I have not moved anything else.
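The move described above can be pictured with a stripped-down trait hierarchy. This is only a sketch: the `*Sketch` traits and `ParamsMoveDemo` are hypothetical stand-ins, they skip Spark ML's `Params` machinery entirely, and the default value shown is arbitrary rather than Spark's.

```scala
// Hypothetical, simplified stand-ins for the traits in treeParams.scala.
// Spark's real traits are built on Param objects; plain vars are used here.
trait DecisionTreeParamsSketch {
  var maxDepth: Int = 5
}

// After the move, the shared ensemble trait owns the feature-subset parameter.
trait TreeEnsembleParamsSketch extends DecisionTreeParamsSketch {
  // Arbitrary default for the sketch; not a statement about Spark's default.
  private var featureSubsetStrategy: String = "all"

  def getFeatureSubsetStrategy: String = featureSubsetStrategy

  def setFeatureSubsetStrategy(value: String): this.type = {
    featureSubsetStrategy = value
    this
  }
}

// Random forest keeps access through inheritance.
trait RandomForestParamsSketch extends TreeEnsembleParamsSketch {
  var numTrees: Int = 20
}

// GBT gains access through the same parent trait.
trait GBTParamsSketch extends TreeEnsembleParamsSketch {
  var maxIter: Int = 20
}

object ParamsMoveDemo {
  def main(args: Array[String]): Unit = {
    val rf = new RandomForestParamsSketch {}
    val gbt = new GBTParamsSketch {}
    rf.setFeatureSubsetStrategy("sqrt")
    gbt.setFeatureSubsetStrategy("1")
    println(s"rf=${rf.getFeatureSubsetStrategy}, gbt=${gbt.getFeatureSubsetStrategy}")
  }
}
```

Because both child traits inherit the getter and setter from the common parent, nothing specific to random forest has to be duplicated for GBT.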
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118
@sethah Thanks for reviewing the pull request.
- Change the title to obey the proper format [SPARK-20199][ML] ...
  Response: Done
- Change title to reflect that both GBTClassifier and GBTRegressor are changed
  Response: Done
- Please remove all the text you did not write from the PR description
  Response: Done
- Add a test to check that the default values are correct for GBTClassifier/Regressor. See the test in logistic regression titled: "logistic regression: default params" for reference
  Response: Done
- I'd like to test that this change takes effect. One way might be to construct a small dataset where one feature is highly predictive and other features are less so, train with featureSubsetStrategy = "all" and with featureSubsetStrategy = "1" and they should not produce the same tree. I'm open to other, simpler ways to test it if you can think of some.
  Response: Added a test case for the featureSubsetStrategy parameter. It trains two GBT models, one with subset strategy "all" and the other with "1", and compares their most important feature and their feature importance vectors to make sure the trees are different.
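To see why "all" and "1" should yield different trees, it helps to look at how a strategy string could translate into the number of candidate features per split. The sketch below is a simplified model written for this thread, not Spark's code: `FeatureSubsetSketch` and `featuresPerNode` are hypothetical names, the rounding rules are assumptions, and the "auto" strategy is left out.

```scala
import scala.util.Try

// Hypothetical helper: maps a featureSubsetStrategy string to the number of
// features examined at each split. The rounding choices are assumptions.
object FeatureSubsetSketch {
  def featuresPerNode(strategy: String, numFeatures: Int): Int = strategy match {
    case "all" => numFeatures
    case "sqrt" => math.ceil(math.sqrt(numFeatures)).toInt
    case "log2" => math.max(1, math.ceil(math.log(numFeatures) / math.log(2)).toInt)
    case "onethird" => math.ceil(numFeatures / 3.0).toInt
    case other =>
      // A positive integer is an absolute count, capped at numFeatures.
      val asCount = Try(other.toInt).toOption.filter(_ > 0)
        .map(n => math.min(n, numFeatures))
      // A value in (0, 1] is a fraction of the features.
      val asFraction = Try(other.toDouble).toOption.filter(f => f > 0.0 && f <= 1.0)
        .map(f => math.ceil(f * numFeatures).toInt)
      asCount.orElse(asFraction).getOrElse(
        throw new IllegalArgumentException(s"Unsupported strategy: $other"))
  }

  def main(args: Array[String]): Unit = {
    Seq("all", "sqrt", "log2", "onethird", "1", "0.5").foreach { s =>
      println(s"$s -> ${featuresPerNode(s, 10)}")
    }
  }
}
```

Under this model, "1" restricts every split to a single sampled feature, so the dominant feature cannot be picked at every node; that is the effect the test relies on when it expects the importance vectors to differ from the "all" run.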
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r123265283 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -192,6 +196,9 @@ object GBTClassifier extends DefaultParamsReadable[GBTClassifier] { @Since("2.0.0") override def load(path: String): GBTClassifier = super.load(path) + + final val supportedFeatureSubsetStrategies: Array[String] = --- End diff -- Done. I will add this to GBTRegressor in the next pull request (I forgot to add it in this one).
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r123264432 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -136,6 +136,10 @@ class GBTClassifier @Since("1.4.0") ( @Since("1.4.0") override def setStepSize(value: Double): this.type = set(stepSize, value) + /** @group setParam */ + override def setFeatureSubsetStrategy(value: String): this.type = --- End diff -- done
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r123264336 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala --- @@ -319,8 +327,10 @@ private[spark] object GradientBoostedTrees extends Logging { logDebug("###") logDebug("Gradient boosting tree iteration " + m) logDebug("###") + val dt = new DecisionTreeRegressor().setSeed(seed + m) - val model = dt.train(data, treeStrategy) + --- End diff -- done
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r123264390 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala --- @@ -49,14 +49,16 @@ import org.apache.spark.rdd.RDD @Since("1.2.0") class GradientBoostedTrees private[spark] ( private val boostingStrategy: BoostingStrategy, -private val seed: Int) +private val seed: Int, +private val featureSubsetStrategy: String) --- End diff -- done
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r123264302 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala --- @@ -284,11 +290,13 @@ private[spark] object GradientBoostedTrees extends Logging { logDebug("##") logDebug("Building tree 0") logDebug("##") +logDebug("Featuer Subset Strategy " + featureSubsetStrategy) // Initialize tree timer.start("building tree 0") val firstTree = new DecisionTreeRegressor().setSeed(seed) -val firstTreeModel = firstTree.train(input, treeStrategy) + --- End diff -- done
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r123264215 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala --- @@ -73,19 +75,21 @@ private[spark] object GradientBoostedTrees extends Logging { input: RDD[LabeledPoint], validationInput: RDD[LabeledPoint], boostingStrategy: OldBoostingStrategy, - seed: Long): (Array[DecisionTreeRegressionModel], Array[Double]) = { + seed: Long, + featureSubsetStrategy: String): (Array[DecisionTreeRegressionModel], Array[Double]) = { --- End diff -- @sethah I tried to make it similar to RandomForest.scala, which takes strategy and featureSubsetStrategy as separate parameters.
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r123264263 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala --- @@ -284,11 +290,13 @@ private[spark] object GradientBoostedTrees extends Logging { logDebug("##") logDebug("Building tree 0") logDebug("##") +logDebug("Featuer Subset Strategy " + featureSubsetStrategy) --- End diff -- done
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @MLnick @srowen Thanks for your comments. I will wait for someone to review. :)
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 Ping @mpjlu. Please review the pull request.
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @sethah I agree with you. Sorry if I bothered anyone unnecessarily; I was eager to get reviews on the pull request. Thanks for the suggestion, I will keep it in mind.
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @mpjlu Please find some time to review the code.
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 I have made the changes suggested by @mpjlu. Please find some time to review the pull request.
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @jkbradley Please review the pull request
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 Can one of the admins please review the pull request?
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 Can one of the admins review the pull request?
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 12d83aa is successful. Please review the pull request. @MLnick @sethah @mpjlu @srowen
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @mpjlu Please review the changes.
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @mpjlu Thanks for reviewing the code. I have made the code changes as suggested. The build passes with all test cases. Please review and let me know if further changes are required.
[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r119365434 --- Diff: project/MimaExcludes.scala --- @@ -37,11 +37,15 @@ object MimaExcludes { // Exclude rules for 2.3.x lazy val v23excludes = v22excludes ++ Seq( // [SPARK-20495][SQL] Add StorageLevel to cacheTable API - ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.cacheTable") + ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.cacheTable"), + +// [SPARK-20199][MLLIB] Add featureSubSet to GradientBoostedTrees + ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.tree.GradientBoostedTrees.this") ) --- End diff -- I put it in the V21 excludes. Please let me know if you were expecting something else.
[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r119364985 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala --- @@ -420,18 +394,18 @@ private[ml] trait RandomForestParams extends TreeEnsembleParams { */ final val featureSubsetStrategy: Param[String] = new Param[String](this, "featureSubsetStrategy", "The number of features to consider for splits at each tree node." + - s" Supported options: ${RandomForestParams.supportedFeatureSubsetStrategies.mkString(", ")}" + + s" Supported options: ${TreeEnsembleParams.supportedFeatureSubsetStrategies.mkString(", ")}" + s", (0.0-1.0], [1-n].", (value: String) => - RandomForestParams.supportedFeatureSubsetStrategies.contains( + TreeEnsembleParams.supportedFeatureSubsetStrategies.contains( value.toLowerCase(Locale.ROOT)) - || Try(value.toInt).filter(_ > 0).isSuccess - || Try(value.toDouble).filter(_ > 0).filter(_ <= 1.0).isSuccess) +|| Try(value.toInt).filter(_ > 0).isSuccess +|| Try(value.toDouble).filter(_ > 0).filter(_ <= 1.0).isSuccess) --- End diff -- done
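The validation predicate in the diff above can be tried in isolation. The following is a minimal sketch that lifts the same three checks into a standalone object; `StrategyValidatorSketch`, `supported`, and `isValid` are hypothetical names, and the list of named options is copied from the supportedFeatureSubsetStrategies array quoted elsewhere in this thread.

```scala
import java.util.Locale
import scala.util.Try

// Hypothetical standalone version of the validator shown in the diff.
object StrategyValidatorSketch {
  // Named options; matching is case-insensitive.
  val supported: Array[String] = Array("auto", "all", "onethird", "sqrt", "log2")

  def isValid(value: String): Boolean =
    supported.contains(value.toLowerCase(Locale.ROOT)) ||
      // Any positive integer is accepted as an absolute feature count.
      Try(value.toInt).filter(_ > 0).isSuccess ||
      // Any fraction in (0.0, 1.0] is accepted as a share of the features.
      Try(value.toDouble).filter(_ > 0).filter(_ <= 1.0).isSuccess

  def main(args: Array[String]): Unit =
    Seq("SQRT", "3", "0.5", "0", "1.5", "foo").foreach { s =>
      println(s"$s -> ${isValid(s)}")
    }
}
```

Note that this predicate only checks that an integer is positive; the upper bound n in the documented range [1-n] depends on the dataset and cannot be checked when the parameter is set.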
[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r119365001 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala --- @@ -441,12 +415,44 @@ private[ml] trait RandomForestParams extends TreeEnsembleParams { final def getFeatureSubsetStrategy: String = $(featureSubsetStrategy).toLowerCase(Locale.ROOT) } -private[spark] object RandomForestParams { - // These options should be lowercase. - final val supportedFeatureSubsetStrategies: Array[String] = -Array("auto", "all", "onethird", "sqrt", "log2").map(_.toLowerCase(Locale.ROOT)) + + +/** + * Parameters for Random Forest algorithms. + */ +private[ml] trait RandomForestParams extends TreeEnsembleParams { + + /** + * Number of trees to train (>= 1). + * If 1, then no bootstrapping is used. If > 1, then bootstrapping is done. + * TODO: Change to always do bootstrapping (simpler). SPARK-7130 + * (default = 20) + * + * Note: The reason that we cannot add this to both GBT and RF (i.e. in TreeEnsembleParams) + * is the param `maxIter` controls how many trees a GBT has. The semantics in the algorithms + * are a bit different. + * @group param + */ + final val numTrees: IntParam = new IntParam(this, "numTrees", "Number of trees to train (>= 1)", +ParamValidators.gtEq(1)) + + setDefault(numTrees -> 20) + + /** + * @deprecated This method is deprecated and will be removed in 3.0.0. + * @group setParam + */ + @deprecated("This method is deprecated and will be removed in 3.0.0.", "2.1.0") + def setNumTrees(value: Int): this.type = set(numTrees, value) + + /** @group getParam */ + final def getNumTrees: Int = $(numTrees) + + } + + --- End diff -- done
[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r119364950 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala --- @@ -305,7 +305,7 @@ private[ml] object TreeRegressorParams { } private[ml] trait DecisionTreeRegressorParams extends DecisionTreeParams - with TreeRegressorParams with HasVarianceCol { + with TreeRegressorParams with HasVarianceCol { --- End diff -- done
[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r119364792 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -136,12 +136,20 @@ class GBTClassifier @Since("1.4.0") ( @Since("1.4.0") override def setStepSize(value: Double): this.type = set(stepSize, value) + /** @group setParam */ + override def setFeatureSubsetStrategy(value: String): this.type = +set(featureSubsetStrategy, value) + // Parameters from GBTClassifierParams: /** @group setParam */ @Since("1.4.0") def setLossType(value: String): this.type = set(lossType, value) + + + + --- End diff -- done
[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r119364846 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala --- @@ -99,6 +99,8 @@ class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: S @Since("2.0.0") def setVarianceCol(value: String): this.type = set(varianceCol, value) + + --- End diff -- done
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 Can one of the admins please review the pull request? It would be really helpful. Thanks
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 16ccbdf is successful. Please review the pull request. @MLnick @sethah @mpjlu @srowen
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 Can one of the admins please verify this patch?
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 [~arushkharbanda][~peng.m...@intel.com][~facai] [~srowen] Please review the pull request and the approach.
[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...
GitHub user pralabhkumar opened a pull request: https://github.com/apache/spark/pull/18118 SPARK-20199 : Provided featureSubsetStrategy to GBTClassifier
## What changes were proposed in this pull request?
Provided featureSubsetStrategy to GBTClassifier:
a) Moved featureSubsetStrategy to TreeEnsembleParams.
b) Changed GBTClassifier to pass featureSubsetStrategy: val firstTreeModel = firstTree.train(input, treeStrategy, featureSubsetStrategy)
## How was this patch tested?
a) Tested GradientBoostedTreeClassifierExample by adding .setFeatureSubsetStrategy with GBTClassifier.
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/pralabhkumar/spark develop
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18118.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18118
commit b0444fa75f4cc33a0c35cf88664a89a1c425e7a1 Author: Pralabh Kumar <pralabhku...@gmail.com> Date: 2017-05-26T07:16:32Z SPARK-20199 : Provided featureSubsetStrategy to GBTClassifier