[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-10 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@MLnick @WeichenXu123 @sethah Thanks for your help throughout the process.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-09 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@MLnick Please find some time to review it and let me know if we can 
proceed with this. Thanks


---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-08 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@MLnick Thanks for reviewing the code. I have made the changes as suggested.

Please proceed if it's good to go.

Thanks


---




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-11-08 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r149886343
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala ---
@@ -166,6 +166,40 @@ class GBTRegressorSuite extends SparkFunSuite with MLlibTestSparkContext
   }
 
   /////////////////////////////////////////////////////////////////////////////
+  // Tests of feature subset strategy
+  /////////////////////////////////////////////////////////////////////////////
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTRegressor()
+      .setMaxDepth(3)
+      .setMaxIter(5)
+      .setSubsamplingRate(1.0)
+      .setStepSize(0.5)
+      .setSeed(123)
+      .setFeatureSubsetStrategy("all")
+
+    // In this data, feature 1 is very important.
+    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
+    val categoricalFeatures = Map.empty[Int, Int]
+    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
+
+    val importances = gbt.fit(df).featureImportances
+    val mostImportantFeature = importances.argmax
+    assert(mostImportantFeature === 1)
+    assert(importances.toArray.sum === 1.0)
--- End diff --

done


---




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-11-08 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r149886323
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GBTRegressor.scala ---
@@ -173,6 +178,10 @@ object GBTRegressor extends DefaultParamsReadable[GBTRegressor] {
 
   @Since("2.0.0")
   override def load(path: String): GBTRegressor = super.load(path)
+
+  @Since("2.3.0")
--- End diff --

done


---




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-11-08 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r149886357
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala ---
@@ -166,6 +166,40 @@ class GBTRegressorSuite extends SparkFunSuite with MLlibTestSparkContext
   }
 
   /////////////////////////////////////////////////////////////////////////////
+  // Tests of feature subset strategy
+  /////////////////////////////////////////////////////////////////////////////
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTRegressor()
+      .setMaxDepth(3)
+      .setMaxIter(5)
+      .setSubsamplingRate(1.0)
+      .setStepSize(0.5)
+      .setSeed(123)
+      .setFeatureSubsetStrategy("all")
+
+    // In this data, feature 1 is very important.
+    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
+    val categoricalFeatures = Map.empty[Int, Int]
+    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
+
+    val importances = gbt.fit(df).featureImportances
+    val mostImportantFeature = importances.argmax
+    assert(mostImportantFeature === 1)
+    assert(importances.toArray.sum === 1.0)
+    assert(importances.toArray.forall(_ >= 0.0))
+
+    // GBT with different featureSubsetStrategy
+    val gbtWithFeatureSubset = gbt.setFeatureSubsetStrategy("1")
+    val importanceFeatures = gbtWithFeatureSubset.fit(df).featureImportances
+    val mostIF = importanceFeatures.argmax
+    assert(!(mostImportantFeature === mostIF))
+    assert(importanceFeatures.toArray.sum === 1.0)
+    assert(importanceFeatures.toArray.forall(_ >= 0.0))
+    assert(!(importanceFeatures.toDense.values.deep === importances.toDense.values.deep))
--- End diff --

done


---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-07 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
Ping @MLnick @jkbradley. @sethah has given LGTM.


---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-05 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@MLnick @jkbradley Please find some time to review it. @sethah has given
LGTM.


---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-02 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@sethah The build has passed :). I have made the changes as suggested (setting
maxIter and maxDepth).

Ping @MLnick or @jkbradley so we can move ahead with it.



---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-02 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
Jenkins test this please


---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-02 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@sethah It's still failing, and I don't think the issue is on my side.
Please help.


---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-02 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@sethah I have made the changes as suggested, but the build is failing with
this error:
Step 'Publish JUnit test result report' failed: No test report files were
found. Configuration error?

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83354/

Please help with this.





---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-01 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@sethah Please find some time to look into the changes.

Please let me know if further changes are required.


---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-01 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@sethah Thanks for reviewing the code. I have made all the changes you
suggested.

Please review them and let me know if further changes are required.


---




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-11-01 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r148197373
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
@@ -354,6 +356,41 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
   }
 
   /////////////////////////////////////////////////////////////////////////////
+  // Tests of feature subset strategy
+  /////////////////////////////////////////////////////////////////////////////
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTClassifier()
--- End diff --

Removed stepSize, impurity, and other parameters.


---




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-11-01 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r148197388
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
@@ -354,6 +356,41 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
   }
 
   /////////////////////////////////////////////////////////////////////////////
+  // Tests of feature subset strategy
+  /////////////////////////////////////////////////////////////////////////////
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTClassifier()
+      .setImpurity("Gini")
+      .setMaxDepth(3)
+      .setMaxIter(5)
+      .setSubsamplingRate(1.0)
+      .setStepSize(0.5)
+      .setSeed(123)
+      .setFeatureSubsetStrategy("all")
+
+    // In this data, feature 1 is very important.
+    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
+    val categoricalFeatures = Map.empty[Int, Int]
+    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
+
+    val importances = gbt.fit(df).featureImportances
+    val mostImportantFeature = importances.argmax
+    assert(mostImportantFeature === 1)
+    assert(importances.toArray.sum === 1.0)
+    assert(importances.toArray.forall(_ >= 0.0))
+
+    // GBT with different featureSubsetStrategy
+    val gbtWithFeatureSubset = gbt.setFeatureSubsetStrategy("1")
+    val importanceFeatures = gbtWithFeatureSubset.fit(df).featureImportances
+    val mostIF = importanceFeatures.argmax
+    assert(!(mostImportantFeature === mostIF))
+    assert(importanceFeatures.toArray.sum === 1.0)
--- End diff --

done


---




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-11-01 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r148197380
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
@@ -354,6 +356,41 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
   }
 
   /////////////////////////////////////////////////////////////////////////////
+  // Tests of feature subset strategy
+  /////////////////////////////////////////////////////////////////////////////
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTClassifier()
+      .setImpurity("Gini")
+      .setMaxDepth(3)
+      .setMaxIter(5)
+      .setSubsamplingRate(1.0)
+      .setStepSize(0.5)
+      .setSeed(123)
+      .setFeatureSubsetStrategy("all")
+
+    // In this data, feature 1 is very important.
+    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
+    val categoricalFeatures = Map.empty[Int, Int]
+    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
+
+    val importances = gbt.fit(df).featureImportances
+    val mostImportantFeature = importances.argmax
+    assert(mostImportantFeature === 1)
+    assert(importances.toArray.sum === 1.0)
--- End diff --

done


---




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-11-01 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r148197272
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala ---
@@ -118,11 +119,12 @@ class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: S
 
   /** (private[ml]) Train a decision tree on an RDD */
   private[ml] def train(data: RDD[LabeledPoint],
-      oldStrategy: OldStrategy): DecisionTreeRegressionModel = {
+      oldStrategy: OldStrategy, featureSubsetStrategy: String): DecisionTreeRegressionModel = {
--- End diff --

done


---




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-11-01 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r148197248
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -192,6 +197,10 @@ object GBTClassifier extends DefaultParamsReadable[GBTClassifier] {
 
   @Since("2.0.0")
   override def load(path: String): GBTClassifier = super.load(path)
+
+  @Since("2.3.0")
+  final val supportedFeatureSubsetStrategies: Array[String] =
--- End diff --

done


---




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-11-01 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r148197285
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala ---
@@ -118,11 +119,12 @@ class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: S
 
   /** (private[ml]) Train a decision tree on an RDD */
   private[ml] def train(data: RDD[LabeledPoint],
-      oldStrategy: OldStrategy): DecisionTreeRegressionModel = {
+      oldStrategy: OldStrategy, featureSubsetStrategy: String): DecisionTreeRegressionModel = {
     val instr = Instrumentation.create(this, data)
     instr.logParams(params: _*)
 
-    val trees = RandomForest.run(data, oldStrategy, numTrees = 1, featureSubsetStrategy = "all",
+    val trees = RandomForest.run(data, oldStrategy, numTrees = 1,
+      featureSubsetStrategy,
--- End diff --

done


---




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-11-01 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r148197257
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala ---
@@ -108,7 +108,8 @@ class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: S
     val instr = Instrumentation.create(this, oldDataset)
     instr.logParams(params: _*)
 
-    val trees = RandomForest.run(oldDataset, strategy, numTrees = 1, featureSubsetStrategy = "all",
+    val trees = RandomForest.run(oldDataset, strategy, numTrees = 1,
--- End diff --

done


---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-10-31 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@MLnick  @sethah  please find some time to look into this


---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-10-03 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@MLnick @sethah It's been more than a couple of months since the code changes
were made as suggested. It would be really great if you could find some time
to review the pull request.


---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-08-26 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@MLnick @sethah please find some time to look into this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-08-14 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@MLnick @sethah Please find some time to look into this. It would be really
great if we could include this feature in Spark 2.3.


---



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-07-28 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
ping @MLnick @sethah


---



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-07-17 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@MLnick @sethah Please let me know if you are OK with the changes, so
that we can proceed. Thanks for your help :)


---



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-07-10 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
ping @sethah  @MLnick 


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-07-04 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r125484321
  
--- Diff: project/MimaExcludes.scala ---
@@ -196,7 +196,10 @@ object MimaExcludes {
       ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.streaming.StreamingQueryException.startOffset"),
       ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.streaming.StreamingQueryException.endOffset"),
       ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.streaming.StreamingQueryException.this"),
-      ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryException.query")
+      ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryException.query"),
+
+     // [SPARK-20199][MLLIB] Add featureSubSet to GradientBoostedTrees
+     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.tree.GradientBoostedTrees.this")
--- End diff --

@MLnick Yes, you are correct. I have removed it. Thanks.


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-07-04 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r125484168
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala ---
@@ -166,6 +166,45 @@ class GBTRegressorSuite extends SparkFunSuite with MLlibTestSparkContext
   }
 
   /////////////////////////////////////////////////////////////////////////////
+  // Tests of feature subset strategy
+  /////////////////////////////////////////////////////////////////////////////
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTRegressor()
+      .setMaxDepth(3)
+      .setMaxIter(5)
+      .setSubsamplingRate(1.0)
+      .setStepSize(0.5)
+      .setSeed(123)
+      .setFeatureSubsetStrategy("all")
+
+    // In this data, feature 1 is very important.
+    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
+    val categoricalFeatures = Map.empty[Int, Int]
+    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
+
+    val importances = gbt.fit(df).featureImportances
+    val mostImportantFeature = importances.argmax
+    assert(mostImportantFeature === 1)
+    assert(importances.toArray.sum === 1.0)
+    assert(importances.toArray.forall(_ >= 0.0))
+
+    val gbtWithFeatureSubset = new GBTRegressor()
--- End diff --

done


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-07-04 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r125484140
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
@@ -354,6 +356,47 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
   }
 
   /////////////////////////////////////////////////////////////////////////////
+  // Tests of feature subset strategy
+  /////////////////////////////////////////////////////////////////////////////
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTClassifier()
+      .setImpurity("Gini")
+      .setMaxDepth(3)
+      .setMaxIter(5)
+      .setSubsamplingRate(1.0)
+      .setStepSize(0.5)
+      .setSeed(123)
+      .setFeatureSubsetStrategy("all")
+
+    // In this data, feature 1 is very important.
+    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
+    val categoricalFeatures = Map.empty[Int, Int]
+    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
+
+    val importances = gbt.fit(df).featureImportances
+    val mostImportantFeature = importances.argmax
+    assert(mostImportantFeature === 1)
+    assert(importances.toArray.sum === 1.0)
+    assert(importances.toArray.forall(_ >= 0.0))
+
+    val gbtWithFeatureSubset = new GBTClassifier()
--- End diff --

done


---



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-07-04 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@MLnick Thanks for reviewing. I have made all the changes you suggested.
Please review.


---



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-07-03 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
ping @sethah @MLnick 


---



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-06-29 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
Ping @sethah, please let me know if there is any update on this. Thanks.


---



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-06-22 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@sethah Thanks for the comment. I have made the changes as suggested in the PR
description.

I'll wait for the review comments from your side :)


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-06-21 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r123422892
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -192,6 +196,9 @@ object GBTClassifier extends DefaultParamsReadable[GBTClassifier] {
 
   @Since("2.0.0")
   override def load(path: String): GBTClassifier = super.load(path)
+
+  final val supportedFeatureSubsetStrategies: Array[String] =
--- End diff --

@sethah 
Done 


---



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-06-21 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@MLnick
Thanks for reviewing. I have added comments.

Please review them.


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-06-21 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r123269246
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
@@ -359,38 +365,6 @@ private[ml] trait TreeEnsembleParams extends DecisionTreeParams {
       oldImpurity: OldImpurity): OldStrategy = {
     super.getOldStrategy(categoricalFeatures, numClasses, oldAlgo, oldImpurity, getSubsamplingRate)
   }
-}
-
-/**
- * Parameters for Random Forest algorithms.
- */
-private[ml] trait RandomForestParams extends TreeEnsembleParams {
--- End diff --

@MLnick

Earlier, featureSubsetStrategy, setFeatureSubsetStrategy and
getFeatureSubsetStrategy were part of RandomForestParams. I moved them to
TreeEnsembleParams so that they can be accessed by both Random Forest
and GBT. I have not moved anything apart from that.
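
The move described here can be pictured with a small dependency-free sketch. The trait names mirror the ones in the diff, but the bodies are simplified stand-ins (a plain `var` instead of Spark's `Param[String]`), the "auto" default is an assumption made only for illustration, and the two `*Sketch` classes are hypothetical. It shows the inheritance idea, not the actual Spark code.

```scala
object SharedParamSketch {
  // Simplified stand-in for TreeEnsembleParams: the shared home of the setting.
  trait TreeEnsembleParams {
    // Placeholder storage; Spark uses a validated Param[String] instead.
    private var strategy: String = "auto" // assumed default, for illustration only
    def setFeatureSubsetStrategy(value: String): this.type = { strategy = value; this }
    def getFeatureSubsetStrategy: String = strategy.toLowerCase(java.util.Locale.ROOT)
  }

  // Settings that only make sense for random forests stay in their own trait
  // (numTrees with default 20, as in the diff).
  trait RandomForestParams extends TreeEnsembleParams {
    private var trees: Int = 20
    def setNumTrees(value: Int): this.type = { require(value >= 1); trees = value; this }
    def getNumTrees: Int = trees
  }

  // Hypothetical minimal estimators: both see the shared setter and getter.
  class RandomForestSketch extends RandomForestParams
  class GBTSketch extends TreeEnsembleParams
}
```

Because the setter lives in the common parent, `new SharedParamSketch.GBTSketch().setFeatureSubsetStrategy("sqrt")` compiles just like the random forest equivalent, while `setNumTrees` remains available only on the random forest side.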


---



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-06-21 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@sethah

Thanks for reviewing the pull request.

- Change the title to obey the proper format [SPARK-20199][ML] ...

- Response: Done

- Change the title to reflect that both GBTClassifier and GBTRegressor are
changed

- Response: Done

- Please remove all the text you did not write from the PR description

- Response: Done

- Add a test to check that the default values are correct for
GBTClassifier/Regressor. See the test in logistic regression titled: "logistic
regression: default params" for reference

- Response: Done

- I'd like to test that this change takes effect. One way might be to
construct a small dataset where one feature is highly predictive and other
features are less so, train with featureSubsetStrategy = "all" and with
featureSubsetStrategy = "1" and they should not produce the same tree. I'm open
to other, simpler ways to test it if you can think of some.

- Response: Added a test case to check the featureSubsetStrategy parameter.
It creates two GBT models, one with subset strategy "all" and the other with "1",
and compares their most important feature and feature importance vectors to make
sure the trees are different.
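
To see why the two settings in this test should give different trees, it helps to look at how many candidate features each split gets to consider. The helper below is a hypothetical, dependency-free sketch. Its resolution rules are assumptions modelled on the options listed in the param documentation (the named strategies, an integer in [1-n], or a fraction in (0.0-1.0]) and may not match Spark's internal code exactly. "auto" is left out because its meaning depends on the algorithm.

```scala
import scala.util.Try

object FeatureSubsetSketch {
  /** Number of features considered per split for a strategy string (assumed rules). */
  def featuresPerNode(strategy: String, numFeatures: Int): Int = {
    require(numFeatures > 0, "numFeatures must be positive")
    strategy.toLowerCase(java.util.Locale.ROOT) match {
      case "all"      => numFeatures
      case "sqrt"     => math.ceil(math.sqrt(numFeatures.toDouble)).toInt
      case "log2"     => math.max(1, math.ceil(math.log(numFeatures.toDouble) / math.log(2.0)).toInt)
      case "onethird" => math.ceil(numFeatures / 3.0).toInt
      case other =>
        // A positive integer is an absolute count, capped at the number of features.
        Try(other.toInt).filter(_ > 0).map(n => math.min(n, numFeatures))
          // Otherwise accept a fraction in (0.0, 1.0] of the features, rounded up.
          .orElse(Try(other.toDouble).filter(f => f > 0 && f <= 1.0)
            .map(f => math.ceil(f * numFeatures).toInt))
          .getOrElse(throw new IllegalArgumentException(s"Unsupported strategy: $strategy"))
    }
  }
}
```

Under these assumed rules, with ten features "all" lets every split choose among all ten, while "1" restricts each split to a single randomly chosen feature. The dominant feature therefore cannot win every split, which is why the trees and their feature importances are expected to differ.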


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-06-21 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r123265283
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -192,6 +196,9 @@ object GBTClassifier extends 
DefaultParamsReadable[GBTClassifier] {
 
   @Since("2.0.0")
   override def load(path: String): GBTClassifier = super.load(path)
+
+  final val supportedFeatureSubsetStrategies: Array[String] =
--- End diff --

Done. I will add this to GBTRegressor in the next pull request (I forgot to add
it in this one).


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-06-21 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r123264432
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -136,6 +136,10 @@ class GBTClassifier @Since("1.4.0") (
   @Since("1.4.0")
   override def setStepSize(value: Double): this.type = set(stepSize, value)
 
+  /** @group setParam */
+  override def setFeatureSubsetStrategy(value: String): this.type =
--- End diff --

done


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-06-21 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r123264336
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala 
---
@@ -319,8 +327,10 @@ private[spark] object GradientBoostedTrees extends 
Logging {
   logDebug("###")
   logDebug("Gradient boosting tree iteration " + m)
   logDebug("###")
+
   val dt = new DecisionTreeRegressor().setSeed(seed + m)
-  val model = dt.train(data, treeStrategy)
+
--- End diff --

done


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-06-21 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r123264390
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala ---
@@ -49,14 +49,16 @@ import org.apache.spark.rdd.RDD
 @Since("1.2.0")
 class GradientBoostedTrees private[spark] (
 private val boostingStrategy: BoostingStrategy,
-private val seed: Int)
+private val seed: Int,
+private val featureSubsetStrategy: String)
--- End diff --

done


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-06-21 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r123264302
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala 
---
@@ -284,11 +290,13 @@ private[spark] object GradientBoostedTrees extends 
Logging {
 logDebug("##")
 logDebug("Building tree 0")
 logDebug("##")
+logDebug("Featuer Subset Strategy " + featureSubsetStrategy)
 
 // Initialize tree
 timer.start("building tree 0")
 val firstTree = new DecisionTreeRegressor().setSeed(seed)
-val firstTreeModel = firstTree.train(input, treeStrategy)
+
--- End diff --

done


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-06-21 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r123264215
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala 
---
@@ -73,19 +75,21 @@ private[spark] object GradientBoostedTrees extends 
Logging {
   input: RDD[LabeledPoint],
   validationInput: RDD[LabeledPoint],
   boostingStrategy: OldBoostingStrategy,
-  seed: Long): (Array[DecisionTreeRegressionModel], Array[Double]) = {
+  seed: Long,
+  featureSubsetStrategy: String): (Array[DecisionTreeRegressionModel], 
Array[Double]) = {
--- End diff --

@sethah I tried to make it similar to RandomForest.scala, which takes strategy
and featureSubsetStrategy as separate parameters.
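
A minimal sketch of that design choice, assuming nothing about Spark's real classes: `TreeStrategy`, `TrainedTree`, `trainTree` and `boost` are hypothetical stand-ins. The only details taken from the diffs are that the feature subset strategy is passed alongside the tree strategy and that iteration m trains its tree with `seed + m`.

```scala
object BoostingLoopSketch {
  // Hypothetical stand-ins; none of these types are Spark classes.
  final case class TreeStrategy(maxDepth: Int)
  final case class TrainedTree(seed: Long, maxDepth: Int, featureSubsetStrategy: String)

  // Stand-in for training one tree: it only records how it was configured.
  def trainTree(seed: Long, strategy: TreeStrategy, featureSubsetStrategy: String): TrainedTree =
    TrainedTree(seed, strategy.maxDepth, featureSubsetStrategy)

  // The feature subset strategy travels next to the tree strategy rather than
  // inside it, and boosting iteration m uses seed + m.
  def boost(numIterations: Int, seed: Long, strategy: TreeStrategy,
            featureSubsetStrategy: String): Seq[TrainedTree] =
    (0 until numIterations).map(m => trainTree(seed + m, strategy, featureSubsetStrategy))
}
```

Keeping the two arguments separate means the existing strategy object does not need a new field, at the cost of one more parameter on each training call.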


---



[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-06-21 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r123264263
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala 
---
@@ -284,11 +290,13 @@ private[spark] object GradientBoostedTrees extends 
Logging {
 logDebug("##")
 logDebug("Building tree 0")
 logDebug("##")
+logDebug("Featuer Subset Strategy " + featureSubsetStrategy)
--- End diff --

done


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

2017-06-12 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@MLnick @srowen
Thanks for your comments. I will wait for someone to review. :)


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

2017-06-12 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
ping @mpjlu. Please review the pull request.


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

2017-06-07 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@sethah I agree with you. Sorry if I bothered you unnecessarily; I was eager to
get reviews on the pull request.
Thanks for the suggestion, I will keep it in mind.


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

2017-06-07 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@mpjlu
Please find some time to review the code.


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

2017-06-06 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
I have made the changes suggested by @mpjlu.

Please find some time to review the pull request.



---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

2017-06-05 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@jkbradley Please review the pull request


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

2017-06-05 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
Could one of the admins please review the pull request?


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

2017-06-02 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
Could one of the admins review the pull request?


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

2017-06-02 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
12d83aa is successful. Please review the pull request.
@MLnick @sethah @mpjlu @srowen 


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

2017-06-01 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@mpjlu: Please review the changes.


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

2017-05-31 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@mpjlu Thanks for reviewing the code. I have made the code changes as
suggested.
The build passed with all test cases.

Please review and let me know if further changes are required.


---



[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

2017-05-31 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r119365434
  
--- Diff: project/MimaExcludes.scala ---
@@ -37,11 +37,15 @@ object MimaExcludes {
   // Exclude rules for 2.3.x
   lazy val v23excludes = v22excludes ++ Seq(
 // [SPARK-20495][SQL] Add StorageLevel to cacheTable API
-
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.cacheTable")
+
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.cacheTable"),
+   
+// [SPARK-20199][MLLIB] Add featureSubSet to GradientBoostedTrees
+
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.tree.GradientBoostedTrees.this")
   )
--- End diff --

I put it in the V21 excludes. Please let me know if you are expecting
something else.


---



[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

2017-05-31 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r119364985
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
@@ -420,18 +394,18 @@ private[ml] trait RandomForestParams extends 
TreeEnsembleParams {
*/
   final val featureSubsetStrategy: Param[String] = new Param[String](this, 
"featureSubsetStrategy",
 "The number of features to consider for splits at each tree node." +
-  s" Supported options: 
${RandomForestParams.supportedFeatureSubsetStrategies.mkString(", ")}" +
+  s" Supported options: 
${TreeEnsembleParams.supportedFeatureSubsetStrategies.mkString(", ")}" +
   s", (0.0-1.0], [1-n].",
 (value: String) =>
-  RandomForestParams.supportedFeatureSubsetStrategies.contains(
+  TreeEnsembleParams.supportedFeatureSubsetStrategies.contains(
 value.toLowerCase(Locale.ROOT))
-  || Try(value.toInt).filter(_ > 0).isSuccess
-  || Try(value.toDouble).filter(_ > 0).filter(_ <= 1.0).isSuccess)
+|| Try(value.toInt).filter(_ > 0).isSuccess
+|| Try(value.toDouble).filter(_ > 0).filter(_ <= 1.0).isSuccess)
--- End diff --

done
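
For readers who want to try the validation rule from this hunk outside Spark, here is a self-contained sketch. The list of named options and the three checks are copied from the diff; the object name `StrategyValidatorSketch` and the method name `isValid` are hypothetical and exist only for this illustration.

```scala
import java.util.Locale
import scala.util.Try

object StrategyValidatorSketch {
  // Named options, as listed in the diff (kept lowercase).
  val supportedFeatureSubsetStrategies: Array[String] =
    Array("auto", "all", "onethird", "sqrt", "log2")

  // True for a named option (case-insensitive), a positive integer,
  // or a fraction in (0.0, 1.0].
  def isValid(value: String): Boolean =
    supportedFeatureSubsetStrategies.contains(value.toLowerCase(Locale.ROOT)) ||
      Try(value.toInt).filter(_ > 0).isSuccess ||
      Try(value.toDouble).filter(_ > 0).filter(_ <= 1.0).isSuccess
}
```

With this rule, values such as "Sqrt", "3" and "0.25" are accepted, while "0", "1.5" and unknown names are rejected.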


---



[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

2017-05-31 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r119365001
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
@@ -441,12 +415,44 @@ private[ml] trait RandomForestParams extends 
TreeEnsembleParams {
   final def getFeatureSubsetStrategy: String = 
$(featureSubsetStrategy).toLowerCase(Locale.ROOT)
 }
 
-private[spark] object RandomForestParams {
-  // These options should be lowercase.
-  final val supportedFeatureSubsetStrategies: Array[String] =
-Array("auto", "all", "onethird", "sqrt", 
"log2").map(_.toLowerCase(Locale.ROOT))
+
+
+/**
+ * Parameters for Random Forest algorithms.
+ */
+private[ml] trait RandomForestParams extends TreeEnsembleParams {
+
+  /**
+   * Number of trees to train (>= 1).
+   * If 1, then no bootstrapping is used.  If > 1, then bootstrapping is 
done.
+   * TODO: Change to always do bootstrapping (simpler).  SPARK-7130
+   * (default = 20)
+   *
+   * Note: The reason that we cannot add this to both GBT and RF (i.e. in 
TreeEnsembleParams)
+   * is the param `maxIter` controls how many trees a GBT has. The 
semantics in the algorithms
+   * are a bit different.
+   * @group param
+   */
+  final val numTrees: IntParam = new IntParam(this, "numTrees", "Number of 
trees to train (>= 1)",
+ParamValidators.gtEq(1))
+
+  setDefault(numTrees -> 20)
+
+  /**
+   * @deprecated This method is deprecated and will be removed in 3.0.0.
+   * @group setParam
+   */
+  @deprecated("This method is deprecated and will be removed in 3.0.0.", 
"2.1.0")
+  def setNumTrees(value: Int): this.type = set(numTrees, value)
+
+  /** @group getParam */
+  final def getNumTrees: Int = $(numTrees)
+
+
 }
 
+
+
--- End diff --

done


---



[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

2017-05-31 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r119364950
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
@@ -305,7 +305,7 @@ private[ml] object TreeRegressorParams {
 }
 
 private[ml] trait DecisionTreeRegressorParams extends DecisionTreeParams
-  with TreeRegressorParams with HasVarianceCol {
+  with TreeRegressorParams with HasVarianceCol  {
--- End diff --

done


---



[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

2017-05-31 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r119364792
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -136,12 +136,20 @@ class GBTClassifier @Since("1.4.0") (
   @Since("1.4.0")
   override def setStepSize(value: Double): this.type = set(stepSize, value)
 
+  /** @group setParam */
+  override def setFeatureSubsetStrategy(value: String): this.type =
+set(featureSubsetStrategy, value)
+
   // Parameters from GBTClassifierParams:
 
   /** @group setParam */
   @Since("1.4.0")
   def setLossType(value: String): this.type = set(lossType, value)
 
+
+
+
+
--- End diff --

done


---



[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

2017-05-31 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r119364846
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala 
---
@@ -99,6 +99,8 @@ class DecisionTreeRegressor @Since("1.4.0") 
(@Since("1.4.0") override val uid: S
   @Since("2.0.0")
   def setVarianceCol(value: String): this.type = set(varianceCol, value)
 
+
+
--- End diff --

done


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

2017-05-30 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
Could one of the admins please review the pull request? It would be really
helpful. Thanks.


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

2017-05-29 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
16ccbdf is successful. Please review the pull request.
@MLnick @sethah @mpjlu @srowen 


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

2017-05-29 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
Can one of the admins please verify this patch?


---



[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...

2017-05-28 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
[~arushkharbanda] [~peng.m...@intel.com] [~facai] [~srowen]

Please review the pull request/approach.


---



[GitHub] spark pull request #18118: SPARK-20199 : Provided featureSubsetStrategy to G...

2017-05-26 Thread pralabhkumar
GitHub user pralabhkumar opened a pull request:

https://github.com/apache/spark/pull/18118

SPARK-20199 : Provided featureSubsetStrategy to GBTClassifier

## What changes were proposed in this pull request?

Provided featureSubsetStrategy to GBTClassifier:
a) Moved featureSubsetStrategy to TreeEnsembleParams
b) Changed GBTClassifier to pass featureSubsetStrategy:
val firstTreeModel = firstTree.train(input, treeStrategy,
featureSubsetStrategy)

## How was this patch tested?
a) Tested GradientBoostedTreeClassifierExample by adding 
.setFeatureSubsetStrategy with GBTClassifier
(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pralabhkumar/spark develop

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18118.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18118


commit b0444fa75f4cc33a0c35cf88664a89a1c425e7a1
Author: Pralabh Kumar <pralabhku...@gmail.com>
Date:   2017-05-26T07:16:32Z

SPARK-20199 : Provided featureSubsetStrategy to GBTClassifier




---