[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user epahomov closed the pull request at: https://github.com/apache/spark/pull/2394 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user epahomov commented on the pull request: https://github.com/apache/spark/pull/2394#issuecomment-57077001 Sorry for such a messy pull request; I didn't review my student's code closely enough. I'll try my best next time. We'll fix everything by the middle of the week.
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2394#issuecomment-57106342 @epahomov Hi, just making sure you saw the [comment in the JIRA](https://issues.apache.org/jira/browse/SPARK-3525) about overlapping JIRAs and PRs in preparation for gradient boosting. It would be great to get your student's input on [the other GBT JIRA](https://issues.apache.org/jira/browse/SPARK-1547) and the linked design doc. Thank you both!
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2394#issuecomment-5782 Jenkins, add to whitelist.
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2394#issuecomment-57000202 @jkbradley @manishamde Could you help review this PR? Thanks!
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2394#issuecomment-57000221 this is ok to test
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2394#issuecomment-57001565 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20864/
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user Ishiihara commented on a diff in the pull request: https://github.com/apache/spark/pull/2394#discussion_r18105434

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StochasticGradientBoosting.scala ---
@@ -0,0 +1,173 @@
[...]
+  def run(
+      input : RDD[LabeledPoint],
+      leaningRate : Double,
+      countOfTrees : Int,
+      samplingSizeRatio : Double,
+      strategy: Strategy): StochasticGradientBoostingModel = {
+
+    val featureDimension = input.count()
+    val mean = new DoubleRDDFunctions(input.map(l => l.label)).mean()
--- End diff --

First import org.apache.spark.SparkContext._, then you can use input.map(l => l.label).mean() directly.
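For reference, the diff appears to implement least-squares stochastic gradient boosting: the model is seeded with the label mean (the `mean` value above), each tree is trained on the current residuals over a random subsample, and the tree's output is added with the learning rate. This is a reading of the code, not a statement from the PR. Here $y_i$ and $x_i$ are the label and features of point $i$, $\eta$ is `leaningRate`, $M$ is `countOfTrees`, $S_m$ is the random sample drawn for tree $m$, and $h_m$ is the tree returned by `DecisionTree.train`. The residual is the negative gradient of the squared loss $\tfrac{1}{2}(y - F)^2$ with respect to $F$.

$$
F_0 = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
r_i^{(m)} = y_i - F_{m-1}(x_i), \qquad
h_m = \mathrm{Tree}\bigl(\{(x_i, r_i^{(m)}) : i \in S_m\}\bigr), \qquad
F_m(x) = F_{m-1}(x) + \eta\, h_m(x), \quad m = 1, \dots, M.
$$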
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user Ishiihara commented on a diff in the pull request: https://github.com/apache/spark/pull/2394#discussion_r18105459

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StochasticGradientBoosting.scala ---
@@ -0,0 +1,173 @@
+package org.apache.spark.mllib.regression
+import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.mllib.tree.DecisionTree
+import org.apache.spark.mllib.tree.configuration.Algo.Algo
+import org.apache.spark.mllib.tree.configuration.Strategy
+import org.apache.spark.mllib.tree.impurity.Impurity
+import org.apache.spark.mllib.tree.model.DecisionTreeModel
+import org.apache.spark.rdd.{DoubleRDDFunctions, RDD}
--- End diff --

DoubleRDDFunctions is not needed here.
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user Ishiihara commented on a diff in the pull request: https://github.com/apache/spark/pull/2394#discussion_r18105497

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StochasticGradientBoosting.scala ---
@@ -0,0 +1,173 @@
[...]
+  def run(
+      input : RDD[LabeledPoint],
+      leaningRate : Double,
+      countOfTrees : Int,
--- End diff --

How about TreeCount?
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user Ishiihara commented on a diff in the pull request: https://github.com/apache/spark/pull/2394#discussion_r18106160

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StochasticGradientBoosting.scala ---
@@ -0,0 +1,173 @@
+package org.apache.spark.mllib.regression
+import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.mllib.tree.DecisionTree
+import org.apache.spark.mllib.tree.configuration.Algo.Algo
+import org.apache.spark.mllib.tree.configuration.Strategy
+import org.apache.spark.mllib.tree.impurity.Impurity
+import org.apache.spark.mllib.tree.model.DecisionTreeModel
+import org.apache.spark.rdd.{DoubleRDDFunctions, RDD}
+import scala.util.Random
+
+/**
+ *
+ * Read about the algorithm Gradient boosting here:
+ * http://www.montefiore.ulg.ac.be/services/stochastic/pubs/2007/GWD07/geurts-icml2007.pdf
+ *
+ * Libraries that implement the algorithm Gradient boosting similar way
+ * https://code.google.com/p/jforests/
+ * https://code.google.com/p/jsgbm/
+ *
+ */
+class StochasticGradientBoosting {
+
+  /**
+   * Train a Gradient Boosting model given an RDD of (label, features) pairs.
+   *
+   * @param input Training dataset: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]].
+   * @param leaningRate Learning rate
+   * @param countOfTrees Number of trees.
+   * @param samplingSizeRatio Size of random sample, percent of ${input} size.
+   * @param strategy The configuration parameters for the tree algorithm which specify the type
+   *                 of algorithm (classification, regression, etc.), feature type (continuous,
+   *                 categorical), depth of the tree, quantile calculation strategy, etc.
+   * @return StochasticGradientBoostingModel that can be used for prediction
+   */
+  def run(
+      input : RDD[LabeledPoint],
+      leaningRate : Double,
+      countOfTrees : Int,
+      samplingSizeRatio : Double,
+      strategy: Strategy): StochasticGradientBoostingModel = {
+
+    val featureDimension = input.count()
+    val mean = new DoubleRDDFunctions(input.map(l => l.label)).mean()
+    val boostingModel = new StochasticGradientBoostingModel(countOfTrees, mean, leaningRate)
+
+    for (i <- 0 to countOfTrees - 1) {
+      val gradient = input.map(l => l.label - boostingModel.computeValue(l.features))
+
+      val newInput: RDD[LabeledPoint] = input
+        .zip(gradient)
+        .map{case(inputVal, gradientVal) => new LabeledPoint(gradientVal, inputVal.features)}
+
+      val randomSample = newInput.sample(
+        false,
+        (samplingSizeRatio * featureDimension).asInstanceOf[Int],
+        Random.nextInt()
+      )
+
+      val model = DecisionTree.train(randomSample, strategy)
+      boostingModel.addTree(model)
+    }
+    boostingModel
+  }
+}
+
+/**
+ * Model that can be used for prediction.
+ *
+ * @param countOfTrees Number of trees.
+ * @param initValue Initialize model with this value.
+ * @param learningRate Learning rate.
+ */
+class StochasticGradientBoostingModel (
+    private val countOfTrees: Int,
+    private var initValue: Double,
+    private val learningRate: Double) extends Serializable with RegressionModel {
+
+  val trees: Array[DecisionTreeModel] = new Array[DecisionTreeModel](countOfTrees)
+  var index: Int = 0
+
+  def this(countOfTrees:Int, learning_rate: Double) = {
+    this(countOfTrees, 0, learning_rate)
+  }
+
+  def computeValue(feature_x: Vector): Double = {
+    var re_res = initValue
+
+    if (index == 0) {
+      return re_res
+    }
+    for (i <- 0 to index - 1) {
+      re_res += learningRate * trees(i).predict(feature_x)
+    }
+    re_res
+  }
+
+  def addTree(tree : DecisionTreeModel) = {
+    trees(index) = tree
+    index += 1
+  }
+
+  def setInitValue (value : Double) = {
+    initValue = value
+  }
--- End diff --

Return this.
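A minimal sketch of what "return this" could look like, assuming the goal is chainable setters. `BoostedModelSketch` and `setLearningRate` are hypothetical (the class in the diff keeps `learningRate` as a `val`, and its trees are left out here so the sketch runs without Spark). Declaring the result type as `this.type` rather than the class name keeps chaining type-correct in subclasses.

```scala
// Hypothetical, stripped-down model that keeps only the two scalar settings
// from the diff; the trees are omitted so the sketch runs without Spark.
class BoostedModelSketch(private var initValue: Double, private var learningRate: Double) {
  def getInitValue: Double = initValue
  def getLearningRate: Double = learningRate

  // Returning this.type instead of Unit lets callers chain setters.
  def setInitValue(value: Double): this.type = {
    initValue = value
    this
  }

  // Hypothetical second setter, added only to show chaining.
  def setLearningRate(value: Double): this.type = {
    learningRate = value
    this
  }
}

// Each setter returns the same instance, so the calls can be chained.
val configured = new BoostedModelSketch(0.0, 1.0).setInitValue(2.5).setLearningRate(0.1)
```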
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user Ishiihara commented on a diff in the pull request: https://github.com/apache/spark/pull/2394#discussion_r18106192

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StochasticGradientBoosting.scala ---
@@ -0,0 +1,173 @@
[...]
+  def computeValue(feature_x: Vector): Double = {
+    var re_res = initValue
+
+    if (index == 0) {
+      return re_res
+    }
+    for (i <- 0 to index - 1) {
+      re_res += learningRate * trees(i).predict(feature_x)
+    }
+    re_res
+  }
+
+  def addTree(tree : DecisionTreeModel) = {
--- End diff --

Check whether index is out of bounds.
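One way to act on the bounds-check comment, sketched with a hypothetical generic `FixedEnsemble` in place of the `trees`/`index` pair (the type parameter stands in for `DecisionTreeModel`). `require` throws an `IllegalArgumentException` with a descriptive message, instead of letting the array write fail with a bare `ArrayIndexOutOfBoundsException`.

```scala
import scala.reflect.ClassTag

// Hypothetical fixed-capacity container mirroring the trees array and index
// counter in the diff; T stands in for DecisionTreeModel.
class FixedEnsemble[T: ClassTag](capacity: Int) {
  private val trees = new Array[T](capacity)
  private var index = 0

  // Number of trees added so far.
  def size: Int = index

  def addTree(tree: T): this.type = {
    // Fail fast with a clear message once the preallocated array is full.
    require(index < trees.length,
      s"cannot add tree: all ${trees.length} slots are already filled")
    trees(index) = tree
    index += 1
    this
  }
}
```

Usage: `new FixedEnsemble[String](2).addTree("t1").addTree("t2")` fills both slots, and a third `addTree` call is rejected without changing `size`.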
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user Ishiihara commented on a diff in the pull request: https://github.com/apache/spark/pull/2394#discussion_r18106266

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StochasticGradientBoosting.scala ---
@@ -0,0 +1,173 @@
[...]
+      strategy: Strategy): StochasticGradientBoostingModel = {
+
+    val featureDimension = input.count()
+    val mean = new DoubleRDDFunctions(input.map(l => l.label)).mean()
+    val boostingModel = new StochasticGradientBoostingModel(countOfTrees, mean, leaningRate)
+
+    for (i <- 0 to countOfTrees - 1) {
--- End diff --

Use while instead of for; while is faster in Scala.
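A sketch of the prediction loop from `computeValue` rewritten with `while`, assuming trees can be modelled as plain `Double => Double` functions (the object `WhileLoopPrediction` and that simplification are hypothetical; the real code calls `DecisionTreeModel.predict` on a `Vector`). A `for` over a `Range` desugars to `foreach` with a closure; how much `while` actually saves depends on the Scala version and the JIT, so it is worth measuring in the hot path rather than assuming.

```scala
object WhileLoopPrediction {
  // Hypothetical helper mirroring computeValue: the initial value plus
  // learningRate times the output of each of the first `count` trees.
  def predict(initValue: Double, learningRate: Double,
              trees: Array[Double => Double], count: Int, x: Double): Double = {
    var result = initValue
    var i = 0
    while (i < count) {
      result += learningRate * trees(i)(x)
      i += 1
    }
    // No early return is needed: when count == 0 the loop body never runs.
    result
  }
}
```

As a side effect of the rewrite, the `if (index == 0) return re_res` guard in the original becomes unnecessary.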
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user Ishiihara commented on a diff in the pull request: https://github.com/apache/spark/pull/2394#discussion_r18106416

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StochasticGradientBoosting.scala ---
@@ -0,0 +1,173 @@
[...]
+    val boostingModel = new StochasticGradientBoostingModel(countOfTrees, mean, leaningRate)
+
+    for (i <- 0 to countOfTrees - 1) {
+      val gradient = input.map(l => l.label - boostingModel.computeValue(l.features))
--- End diff --

@mengxr Would it be better to cache input explicitly, since it is used many times inside this function?
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user Ishiihara commented on a diff in the pull request: https://github.com/apache/spark/pull/2394#discussion_r18106472

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StochasticGradientBoosting.scala ---
@@ -0,0 +1,173 @@
[...]
+      val newInput: RDD[LabeledPoint] = input
+        .zip(gradient)
+        .map{case(inputVal, gradientVal) => new LabeledPoint(gradientVal, inputVal.features)}
+
+      val randomSample = newInput.sample(
+        false,
+        (samplingSizeRatio * featureDimension).asInstanceOf[Int],
--- End diff --

Change asInstanceOf[Int] to toInt.
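A small sketch of the `toInt` conversion inside a hypothetical `SampleSize.of` helper, assuming the ratio is a fraction in [0, 1]. `toInt` truncates toward zero, like the cast, but reads as a numeric conversion rather than a type cast. One caveat worth checking against the RDD API: `RDD.sample(withReplacement, fraction, seed)` takes a fraction, not a count, so the ratio itself may be what belongs in that call; a count fits `takeSample(withReplacement, num, seed)`, which returns an array to the driver.

```scala
object SampleSize {
  // Hypothetical helper: turns a sampling ratio and a record count into a
  // number of records to draw.
  def of(samplingSizeRatio: Double, recordCount: Long): Int = {
    require(samplingSizeRatio >= 0.0 && samplingSizeRatio <= 1.0,
      s"samplingSizeRatio must be in [0, 1], got $samplingSizeRatio")
    // .toInt truncates toward zero, e.g. 1.5 becomes 1.
    (samplingSizeRatio * recordCount).toInt
  }
}
```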
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user Ishiihara commented on a diff in the pull request: https://github.com/apache/spark/pull/2394#discussion_r18106585

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StochasticGradientBoosting.scala ---
@@ -0,0 +1,173 @@
[...]
+class StochasticGradientBoosting {
+
+  /**
+   * Train a Gradient Boosting model given an RDD of (label, features) pairs.
+   *
+   * @param input Training dataset: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]].
+   * @param leaningRate Learning rate
+   * @param countOfTrees Number of trees.
+   * @param samplingSizeRatio Size of random sample, percent of ${input} size.
+   * @param strategy The configuration parameters for the tree algorithm which specify the type
+   *                 of algorithm (classification, regression, etc.), feature type (continuous,
+   *                 categorical), depth of the tree, quantile calculation strategy, etc.
+   * @return StochasticGradientBoostingModel that can be used for prediction
+   */
+  def run(
--- End diff --

Maybe put the run method in an object StochasticGradientBoosting; the StochasticGradientBoosting class does not have any state.
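A self-contained sketch of the reviewer's suggestion. Training is a stateless function, so it can live in an object, and the fitted model is the only thing that carries state. Everything here is hypothetical and local: plain `Seq`s instead of RDDs, one-dimensional features, and single-split regression stumps standing in for `DecisionTree.train`. It is meant to show the structure of the boosting loop from the diff, not to reproduce MLlib's API.

```scala
import scala.util.Random

// Hypothetical local stand-ins for LabeledPoint and DecisionTreeModel.
final case class Point(label: Double, x: Double)

final class Stump(threshold: Double, left: Double, right: Double) {
  def predict(x: Double): Double = if (x <= threshold) left else right
}

// The fitted model is the only stateful piece; adding a tree returns a new model.
final class BoostedStumps(initValue: Double, learningRate: Double, trees: List[Stump]) {
  def predict(x: Double): Double =
    initValue + trees.map(t => learningRate * t.predict(x)).sum
  def withTree(tree: Stump): BoostedStumps =
    new BoostedStumps(initValue, learningRate, trees :+ tree)
  def treeCount: Int = trees.size
}

// Stateless training entry point, placed in an object as the review suggests.
object StochasticGradientBoostingSketch {
  private def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  private def squaredError(xs: Seq[Double], center: Double): Double =
    xs.map(v => (v - center) * (v - center)).sum

  // Fits a single-split regression stump to (x, residual) pairs by trying
  // every distinct x as a threshold and keeping the lowest squared error.
  private def fitStump(data: Seq[(Double, Double)]): Stump = {
    val thresholds = data.map(_._1).distinct.sorted
    val fits = thresholds.map { t =>
      val (l, r) = data.partition(_._1 <= t)
      val leftMean = mean(l.map(_._2))
      val rightMean = if (r.isEmpty) leftMean else mean(r.map(_._2))
      val err = squaredError(l.map(_._2), leftMean) + squaredError(r.map(_._2), rightMean)
      (err, new Stump(t, leftMean, rightMean))
    }
    fits.minBy(_._1)._2
  }

  def train(input: Seq[Point], learningRate: Double, treeCount: Int,
            samplingSizeRatio: Double, seed: Long = 42L): BoostedStumps = {
    require(input.nonEmpty, "input must not be empty")
    val rng = new Random(seed)
    val sampleSize = math.max(1, (samplingSizeRatio * input.size).toInt)
    // Seed the model with the label mean, as the diff does.
    var model = new BoostedStumps(mean(input.map(_.label)), learningRate, Nil)
    var i = 0
    while (i < treeCount) {
      // For squared loss, the negative gradient is the residual y - F(x).
      val residuals = input.map(p => (p.x, p.label - model.predict(p.x)))
      // Draw a random subsample without replacement and fit one stump to it.
      val sample = rng.shuffle(residuals).take(sampleSize)
      model = model.withTree(fitStump(sample))
      i += 1
    }
    model
  }
}
```

Keeping `train` in an object also means callers never construct a throwaway trainer instance; the design choice mirrors how the reviewer describes the class as stateless.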
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user Ishiihara commented on a diff in the pull request:

https://github.com/apache/spark/pull/2394#discussion_r18106962

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StochasticGradientBoosting.scala ---
@@ -0,0 +1,173 @@
+package org.apache.spark.mllib.regression
+
+import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.mllib.tree.DecisionTree
+import org.apache.spark.mllib.tree.configuration.Algo.Algo
+import org.apache.spark.mllib.tree.configuration.Strategy
+import org.apache.spark.mllib.tree.impurity.Impurity
+import org.apache.spark.mllib.tree.model.DecisionTreeModel
+import org.apache.spark.rdd.{DoubleRDDFunctions, RDD}
+import scala.util.Random
+
+/**
+ *
+ * Read about the algorithm Gradient boosting here:
+ * http://www.montefiore.ulg.ac.be/services/stochastic/pubs/2007/GWD07/geurts-icml2007.pdf
+ *
+ * Libraries that implement the algorithm Gradient boosting similar way
+ * https://code.google.com/p/jforests/
+ * https://code.google.com/p/jsgbm/
+ *
+ */
+class StochasticGradientBoosting {
+
+  /**
+   * Train a Gradient Boosting model given an RDD of (label, features) pairs.
+   *
+   * @param input Training dataset: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]].
+   * @param leaningRate Learning rate
+   * @param countOfTrees Number of trees.
+   * @param samplingSizeRatio Size of random sample, percent of ${input} size.
+   * @param strategy The configuration parameters for the tree algorithm which specify the type
+   *                 of algorithm (classification, regression, etc.), feature type (continuous,
+   *                 categorical), depth of the tree, quantile calculation strategy, etc.
+   * @return StochasticGradientBoostingModel that can be used for prediction
+   */
+  def run(
+      input : RDD[LabeledPoint],
+      leaningRate : Double,
+      countOfTrees : Int,
+      samplingSizeRatio : Double,
+      strategy: Strategy): StochasticGradientBoostingModel = {
+
+    val featureDimension = input.count()
--- End diff --

This is not the feature dimension: input.count() is the number of LabeledPoints in your RDD. If you would like to compute the feature dimension, please use input.first().features.size.
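The distinction the reviewer draws (number of examples versus number of features per example) can be shown without a Spark cluster. The sketch below is illustrative only: `Point` is a hypothetical stand-in for MLlib's `LabeledPoint`, and a plain `Seq` stands in for the RDD, so `numExamples` plays the role of `input.count()` and `featureDimension` the role of `input.first().features.size`.

```scala
// Hypothetical stand-in for MLlib's LabeledPoint: a label plus a dense feature array.
final case class Point(label: Double, features: Array[Double])

object CountVsDimension {
  // Number of training examples (what input.count() returns on an RDD).
  def numExamples(data: Seq[Point]): Long = data.size.toLong

  // Number of features per example, read from the first point.
  // Assumes data is non-empty and every point has the same dimension.
  def featureDimension(data: Seq[Point]): Int = data.head.features.length

  def main(args: Array[String]): Unit = {
    val data = Seq(
      Point(1.0, Array(0.5, 2.0, 3.0)),
      Point(0.0, Array(1.5, 0.0, 4.0))
    )
    println(s"examples = ${numExamples(data)}, features = ${featureDimension(data)}")
    // prints "examples = 2, features = 3"
  }
}
```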
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user Ishiihara commented on a diff in the pull request:

https://github.com/apache/spark/pull/2394#discussion_r18107064

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StochasticGradientBoosting.scala ---
@@ -0,0 +1,173 @@
+package org.apache.spark.mllib.regression
+
+import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.mllib.tree.DecisionTree
+import org.apache.spark.mllib.tree.configuration.Algo.Algo
+import org.apache.spark.mllib.tree.configuration.Strategy
+import org.apache.spark.mllib.tree.impurity.Impurity
+import org.apache.spark.mllib.tree.model.DecisionTreeModel
+import org.apache.spark.rdd.{DoubleRDDFunctions, RDD}
+import scala.util.Random
+
+/**
+ *
+ * Read about the algorithm Gradient boosting here:
+ * http://www.montefiore.ulg.ac.be/services/stochastic/pubs/2007/GWD07/geurts-icml2007.pdf
+ *
+ * Libraries that implement the algorithm Gradient boosting similar way
+ * https://code.google.com/p/jforests/
+ * https://code.google.com/p/jsgbm/
+ *
+ */
+class StochasticGradientBoosting {
+
+  /**
+   * Train a Gradient Boosting model given an RDD of (label, features) pairs.
+   *
+   * @param input Training dataset: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]].
+   * @param leaningRate Learning rate
+   * @param countOfTrees Number of trees.
+   * @param samplingSizeRatio Size of random sample, percent of ${input} size.
+   * @param strategy The configuration parameters for the tree algorithm which specify the type
+   *                 of algorithm (classification, regression, etc.), feature type (continuous,
+   *                 categorical), depth of the tree, quantile calculation strategy, etc.
+   * @return StochasticGradientBoostingModel that can be used for prediction
+   */
+  def run(
+      input : RDD[LabeledPoint],
+      leaningRate : Double,
+      countOfTrees : Int,
+      samplingSizeRatio : Double,
+      strategy: Strategy): StochasticGradientBoostingModel = {
+
+    val featureDimension = input.count()
+    val mean = new DoubleRDDFunctions(input.map(l => l.label)).mean()
+    val boostingModel = new StochasticGradientBoostingModel(countOfTrees, mean, leaningRate)
+
+    for (i <- 0 to countOfTrees - 1) {
+      val gradient = input.map(l => l.label - boostingModel.computeValue(l.features))
+
+      val newInput: RDD[LabeledPoint] = input
+        .zip(gradient)
+        .map{case(inputVal, gradientVal) => new LabeledPoint(gradientVal, inputVal.features)}
+
+      val randomSample = newInput.sample(
+        false,
+        (samplingSizeRatio * featureDimension).asInstanceOf[Int],
--- End diff --

Is featureDimension the number of instances? If so, it probably needs a better name.
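In the quoted hunk, the second argument to `newInput.sample` is `(samplingSizeRatio * featureDimension).asInstanceOf[Int]`, which is a count of examples. In Spark's RDD API the second parameter of `sample(withReplacement, fraction, seed)` is a fraction of the data, so `samplingSizeRatio` itself (assumed to lie in [0, 1]) is the value that call expects. The plain-Scala sketch below illustrates fraction-based sampling without replacement; `bernoulliSample` is a hypothetical helper, not a Spark API, and the size of its result is only approximately `fraction * n`.

```scala
import scala.util.Random

object FractionSampling {
  // Sampling without replacement by fraction: each element is kept independently
  // with probability `fraction`, so the result size varies around fraction * data.size.
  def bernoulliSample[T](data: Seq[T], fraction: Double, seed: Long): Seq[T] = {
    require(fraction >= 0.0 && fraction <= 1.0, s"fraction must be in [0, 1], got $fraction")
    val rng = new Random(seed)
    data.filter(_ => rng.nextDouble() < fraction)
  }

  def main(args: Array[String]): Unit = {
    val data = 1 to 10000
    val samplingSizeRatio = 0.3 // pass the ratio itself, not ratio * count
    val sample = bernoulliSample(data, samplingSizeRatio, seed = 42L)
    println(s"sampled ${sample.size} of ${data.size}")
  }
}
```

Because the kept size is random, code that needs an exact sample size would have to use a different scheme; a fraction is the natural fit for the `samplingSizeRatio` parameter described in the Scaladoc.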
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user Ishiihara commented on a diff in the pull request:

https://github.com/apache/spark/pull/2394#discussion_r18107119

--- Diff: mllib/src/test/scala/org/apache/spark/mllib/regression/StochasticGradientBoostingSuite.scala ---
@@ -0,0 +1,44 @@
+package org.apache.spark.mllib.regression
+
+import org.apache.spark.mllib.tree.configuration.Algo
+import org.apache.spark.mllib.tree.impurity.Variance
+import org.apache.spark.mllib.util.{LinearDataGenerator, LocalSparkContext}
+import org.apache.spark.rdd.{RDD, DoubleRDDFunctions}
--- End diff --

DoubleRDDFunctions is not needed.
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user Ishiihara commented on a diff in the pull request:

https://github.com/apache/spark/pull/2394#discussion_r18107150

--- Diff: mllib/src/test/scala/org/apache/spark/mllib/regression/StochasticGradientBoostingSuite.scala ---
@@ -0,0 +1,44 @@
+package org.apache.spark.mllib.regression
+
+import org.apache.spark.mllib.tree.configuration.Algo
+import org.apache.spark.mllib.tree.impurity.Variance
+import org.apache.spark.mllib.util.{LinearDataGenerator, LocalSparkContext}
+import org.apache.spark.rdd.{RDD, DoubleRDDFunctions}
+import org.apache.spark.util.Utils
+import org.scalatest.FunSuite
+
+class StochasticGradientBoostingSuite extends FunSuite with LocalSparkContext {
+
+  /**
+   * Test if we can correctly learn on random data
+   */
+  test("stochastic gradient boosting") {
+    val parsedData = randomLabeledPoints()
+    val model = StochasticGradientBoosting.train(parsedData, Algo.Regression, Variance, 3)
+    checkModel(parsedData, model)
+  }
+
+  test("test serialization") {
+    val parsedData = randomLabeledPoints()
+    val model = StochasticGradientBoosting.train(parsedData, Algo.Regression, Variance, 3)
+    checkModel(parsedData, Utils.deserialize[StochasticGradientBoostingModel](Utils.serialize(model)))
+  }
+
+  def checkModel(parsedData: RDD[LabeledPoint], model: RegressionModel) {
+    val valuesAndPredictions = parsedData.map { point =>
+      val prediction = model.predict(point.features)
+      (point.label, prediction)
+    }
+    val actualValues = parsedData.map(l => l.label)
+    val mean = new DoubleRDDFunctions(actualValues).mean()
--- End diff --

Call mean() directly on actualValues instead of wrapping it in DoubleRDDFunctions.
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user Ishiihara commented on a diff in the pull request:

https://github.com/apache/spark/pull/2394#discussion_r18107249

--- Diff: mllib/src/test/scala/org/apache/spark/mllib/regression/StochasticGradientBoostingSuite.scala ---
@@ -0,0 +1,44 @@
+package org.apache.spark.mllib.regression
+
+import org.apache.spark.mllib.tree.configuration.Algo
+import org.apache.spark.mllib.tree.impurity.Variance
+import org.apache.spark.mllib.util.{LinearDataGenerator, LocalSparkContext}
+import org.apache.spark.rdd.{RDD, DoubleRDDFunctions}
+import org.apache.spark.util.Utils
+import org.scalatest.FunSuite
+
+class StochasticGradientBoostingSuite extends FunSuite with LocalSparkContext {
+
+  /**
+   * Test if we can correctly learn on random data
+   */
+  test("stochastic gradient boosting") {
+    val parsedData = randomLabeledPoints()
+    val model = StochasticGradientBoosting.train(parsedData, Algo.Regression, Variance, 3)
+    checkModel(parsedData, model)
+  }
+
+  test("test serialization") {
+    val parsedData = randomLabeledPoints()
+    val model = StochasticGradientBoosting.train(parsedData, Algo.Regression, Variance, 3)
+    checkModel(parsedData, Utils.deserialize[StochasticGradientBoostingModel](Utils.serialize(model)))
+  }
+
+  def checkModel(parsedData: RDD[LabeledPoint], model: RegressionModel) {
+    val valuesAndPredictions = parsedData.map { point =>
+      val prediction = model.predict(point.features)
+      (point.label, prediction)
+    }
+    val actualValues = parsedData.map(l => l.label)
+    val mean = new DoubleRDDFunctions(actualValues).mean()
+    val meanError = new DoubleRDDFunctions(actualValues.map(i => math.pow(i - mean, 2))).mean()
+    val MSE = valuesAndPredictions.map { case (v, p) => math.pow(v - p, 2)}
+    val error = new DoubleRDDFunctions(MSE).mean()
--- End diff --

Same as above: call mean() directly here too.
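checkModel computes two mean squared errors: meanError, the error of always predicting the mean label (that is, the variance of the labels), and error, the MSE of the model's predictions. The assertion that follows is not quoted in the diff; a natural check is that the model beats this mean baseline. The sketch below reproduces both quantities on plain Scala collections, with a local `mean` helper playing the role of `RDD[Double].mean()`; the object and method names are hypothetical.

```scala
object RegressionCheck {
  // Mean of a non-empty sequence (plain-Scala analogue of RDD[Double].mean()).
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Baseline: MSE of always predicting the mean label, i.e. the label variance.
  def baselineMse(labels: Seq[Double]): Double = {
    val m = mean(labels)
    mean(labels.map(y => math.pow(y - m, 2)))
  }

  // MSE of the model's predictions against the true labels.
  def modelMse(labelsAndPredictions: Seq[(Double, Double)]): Double =
    mean(labelsAndPredictions.map { case (y, p) => math.pow(y - p, 2) })

  def main(args: Array[String]): Unit = {
    val pairs = Seq((1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8))
    val baseline = baselineMse(pairs.map(_._1))
    val error = modelMse(pairs)
    println(s"baseline = $baseline, model = $error, better = ${error < baseline}")
  }
}
```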
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user Ishiihara commented on a diff in the pull request:

https://github.com/apache/spark/pull/2394#discussion_r18107318

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StochasticGradientBoosting.scala ---
@@ -0,0 +1,173 @@
+package org.apache.spark.mllib.regression
+
+import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.mllib.tree.DecisionTree
+import org.apache.spark.mllib.tree.configuration.Algo.Algo
+import org.apache.spark.mllib.tree.configuration.Strategy
+import org.apache.spark.mllib.tree.impurity.Impurity
+import org.apache.spark.mllib.tree.model.DecisionTreeModel
+import org.apache.spark.rdd.{DoubleRDDFunctions, RDD}
+import scala.util.Random
+
+/**
+ *
+ * Read about the algorithm Gradient boosting here:
+ * http://www.montefiore.ulg.ac.be/services/stochastic/pubs/2007/GWD07/geurts-icml2007.pdf
+ *
+ * Libraries that implement the algorithm Gradient boosting similar way
+ * https://code.google.com/p/jforests/
+ * https://code.google.com/p/jsgbm/
+ *
+ */
+class StochasticGradientBoosting {
+
+  /**
+   * Train a Gradient Boosting model given an RDD of (label, features) pairs.
+   *
+   * @param input Training dataset: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]].
+   * @param leaningRate Learning rate
+   * @param countOfTrees Number of trees.
+   * @param samplingSizeRatio Size of random sample, percent of ${input} size.
+   * @param strategy The configuration parameters for the tree algorithm which specify the type
+   *                 of algorithm (classification, regression, etc.), feature type (continuous,
+   *                 categorical), depth of the tree, quantile calculation strategy, etc.
+   * @return StochasticGradientBoostingModel that can be used for prediction
+   */
+  def run(
+      input : RDD[LabeledPoint],
+      leaningRate : Double,
+      countOfTrees : Int,
+      samplingSizeRatio : Double,
+      strategy: Strategy): StochasticGradientBoostingModel = {
+
+    val featureDimension = input.count()
+    val mean = new DoubleRDDFunctions(input.map(l => l.label)).mean()
+    val boostingModel = new StochasticGradientBoostingModel(countOfTrees, mean, leaningRate)
+
+    for (i <- 0 to countOfTrees - 1) {
+      val gradient = input.map(l => l.label - boostingModel.computeValue(l.features))
+
+      val newInput: RDD[LabeledPoint] = input
+        .zip(gradient)
+        .map{case(inputVal, gradientVal) => new LabeledPoint(gradientVal, inputVal.features)}
+
+      val randomSample = newInput.sample(
+        false,
+        (samplingSizeRatio * featureDimension).asInstanceOf[Int],
+        Random.nextInt()
+      )
+
+      val model = DecisionTree.train(randomSample, strategy)
+      boostingModel.addTree(model)
+    }
+    boostingModel
+  }
+}
+
+/**
+ * Model that can be used for prediction.
+ *
+ * @param countOfTrees Number of trees.
+ * @param initValue Initialize model with this value.
+ * @param learningRate Learning rate.
+ */
+class StochasticGradientBoostingModel (
+    private val countOfTrees: Int,
+    private var initValue: Double,
+    private val learningRate: Double) extends Serializable with RegressionModel {
+
+  val trees: Array[DecisionTreeModel] = new Array[DecisionTreeModel](countOfTrees)
+  var index: Int = 0
--- End diff --

Make this private.
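Following the reviewer's suggestion, the model's mutable state (`trees` and `index`) can be hidden behind a small API so callers can only add trees and predict. The sketch below is hypothetical: a tree is modelled as a plain function `Array[Double] => Double` instead of `DecisionTreeModel`, and `predict` assumes the usual shrinkage form initValue + learningRate * (sum of tree outputs), since the PR's `computeValue` is not quoted in the diff.

```scala
// Hypothetical sketch: mutable state is private; only addTree/predict are exposed.
class BoostingModelSketch(countOfTrees: Int, initValue: Double, learningRate: Double) {

  // Fitted trees, stored in insertion order; slots past `index` are still empty.
  private val trees = new Array[Array[Double] => Double](countOfTrees)
  private var index = 0

  def numTrees: Int = index

  def addTree(tree: Array[Double] => Double): Unit = {
    require(index < countOfTrees, s"model already holds $countOfTrees trees")
    trees(index) = tree
    index += 1
  }

  // F(x) = initValue + learningRate * sum of the outputs of the trees added so far.
  def predict(features: Array[Double]): Double =
    initValue + learningRate * trees.iterator.take(index).map(t => t(features)).sum
}

object BoostingModelSketchDemo {
  def main(args: Array[String]): Unit = {
    val model = new BoostingModelSketch(countOfTrees = 2, initValue = 1.0, learningRate = 0.5)
    model.addTree(x => x(0))
    println(model.predict(Array(3.0))) // prints 2.5
  }
}
```

With the fields private, code outside the class can no longer overwrite `trees` or reset `index`, which is the invariant `addTree` relies on.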
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user Ishiihara commented on a diff in the pull request:

https://github.com/apache/spark/pull/2394#discussion_r18107306

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StochasticGradientBoosting.scala ---
@@ -0,0 +1,173 @@
+package org.apache.spark.mllib.regression
+
+import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.mllib.tree.DecisionTree
+import org.apache.spark.mllib.tree.configuration.Algo.Algo
+import org.apache.spark.mllib.tree.configuration.Strategy
+import org.apache.spark.mllib.tree.impurity.Impurity
+import org.apache.spark.mllib.tree.model.DecisionTreeModel
+import org.apache.spark.rdd.{DoubleRDDFunctions, RDD}
+import scala.util.Random
+
+/**
+ *
+ * Read about the algorithm Gradient boosting here:
+ * http://www.montefiore.ulg.ac.be/services/stochastic/pubs/2007/GWD07/geurts-icml2007.pdf
+ *
+ * Libraries that implement the algorithm Gradient boosting similar way
+ * https://code.google.com/p/jforests/
+ * https://code.google.com/p/jsgbm/
+ *
+ */
+class StochasticGradientBoosting {
+
+  /**
+   * Train a Gradient Boosting model given an RDD of (label, features) pairs.
+   *
+   * @param input Training dataset: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]].
+   * @param leaningRate Learning rate
+   * @param countOfTrees Number of trees.
+   * @param samplingSizeRatio Size of random sample, percent of ${input} size.
+   * @param strategy The configuration parameters for the tree algorithm which specify the type
+   *                 of algorithm (classification, regression, etc.), feature type (continuous,
+   *                 categorical), depth of the tree, quantile calculation strategy, etc.
+   * @return StochasticGradientBoostingModel that can be used for prediction
+   */
+  def run(
+      input : RDD[LabeledPoint],
+      leaningRate : Double,
+      countOfTrees : Int,
+      samplingSizeRatio : Double,
+      strategy: Strategy): StochasticGradientBoostingModel = {
+
+    val featureDimension = input.count()
+    val mean = new DoubleRDDFunctions(input.map(l => l.label)).mean()
+    val boostingModel = new StochasticGradientBoostingModel(countOfTrees, mean, leaningRate)
+
+    for (i <- 0 to countOfTrees - 1) {
+      val gradient = input.map(l => l.label - boostingModel.computeValue(l.features))
+
+      val newInput: RDD[LabeledPoint] = input
+        .zip(gradient)
+        .map{case(inputVal, gradientVal) => new LabeledPoint(gradientVal, inputVal.features)}
+
+      val randomSample = newInput.sample(
+        false,
+        (samplingSizeRatio * featureDimension).asInstanceOf[Int],
+        Random.nextInt()
+      )
+
+      val model = DecisionTree.train(randomSample, strategy)
+      boostingModel.addTree(model)
+    }
+    boostingModel
+  }
+}
+
+/**
+ * Model that can be used for prediction.
+ *
+ * @param countOfTrees Number of trees.
+ * @param initValue Initialize model with this value.
+ * @param learningRate Learning rate.
+ */
+class StochasticGradientBoostingModel (
+    private val countOfTrees: Int,
+    private var initValue: Double,
+    private val learningRate: Double) extends Serializable with RegressionModel {
+
+  val trees: Array[DecisionTreeModel] = new Array[DecisionTreeModel](countOfTrees)
--- End diff --

Make this private.
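The run method quoted in this hunk follows the stochastic gradient boosting recipe for squared loss: start from the mean label, then repeatedly compute residuals (label minus current prediction, i.e. the negative gradient), draw a random subsample, fit a tree to it, and add that tree to the model. A self-contained sketch of the same loop on plain Scala collections follows. It is an illustration under stated assumptions, not the PR's code: a depth-1 regression stump (`fitStump`) replaces `DecisionTree.train`, a Bernoulli subsample with probability `samplingSizeRatio` replaces `RDD.sample`, and all names are hypothetical.

```scala
import scala.util.Random

object SgbSketch {
  // Hypothetical stand-in for MLlib's LabeledPoint.
  final case class Point(label: Double, features: Array[Double])
  type Tree = Array[Double] => Double

  private def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Weak learner (assumption): a depth-1 regression stump chosen by minimum squared error.
  def fitStump(data: Seq[Point]): Tree = {
    val fallback = mean(data.map(_.label))
    var bestSse = Double.PositiveInfinity
    var best: Tree = _ => fallback
    for (j <- data.head.features.indices; t <- data.map(_.features(j)).distinct) {
      val (left, right) = data.partition(_.features(j) <= t)
      if (left.nonEmpty && right.nonEmpty) {
        val lMean = mean(left.map(_.label))
        val rMean = mean(right.map(_.label))
        val sse = left.map(p => math.pow(p.label - lMean, 2)).sum +
          right.map(p => math.pow(p.label - rMean, 2)).sum
        if (sse < bestSse) {
          bestSse = sse
          best = x => if (x(j) <= t) lMean else rMean
        }
      }
    }
    best
  }

  def train(data: Seq[Point], learningRate: Double, numTrees: Int,
            samplingSizeRatio: Double, seed: Long): Tree = {
    val rng = new Random(seed)
    val init = mean(data.map(_.label)) // start from the mean label, as the PR does
    var trees = Vector.empty[Tree]
    def predict(x: Array[Double]): Double = init + learningRate * trees.map(t => t(x)).sum
    for (_ <- 0 until numTrees) {
      // For squared loss the negative gradient is the residual label - F(x).
      val residuals = data.map(p => Point(p.label - predict(p.features), p.features))
      // Stochastic step: keep each residual with probability samplingSizeRatio.
      val sample = residuals.filter(_ => rng.nextDouble() < samplingSizeRatio)
      if (sample.nonEmpty) trees :+= fitStump(sample)
    }
    val fitted = trees
    x => init + learningRate * fitted.map(t => t(x)).sum
  }
}
```

The learning rate shrinks each tree's contribution, so more trees are needed, but each step only partially corrects the residuals; the random subsample is what makes the procedure "stochastic" rather than plain gradient boosting.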
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user Ishiihara commented on the pull request:

https://github.com/apache/spark/pull/2394#issuecomment-57006487

@mengxr @epahomov I added some comments after a quick pass through the code. I will take a deeper look at the algorithm later.
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
GitHub user epahomov opened a pull request:

https://github.com/apache/spark/pull/2394

[Spark-3525] Adding gradient boosting

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/epahomov/spark SPARK-3525

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2394.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2394

commit d0dfb7b632715c60ef78964ea4d20aaa7712d2e2
Author: olgaoskina <olgaosk...@yandex-team.ru>
Date: 2014-09-04T06:51:45Z

Added stochastic gradient boosting algorithm

commit 11c247a72e1681661cef4314fec5d1b4283b087f
Author: olgaoskina <olgaosk...@yandex-team.ru>
Date: 2014-09-04T06:52:05Z

Added stochastic gradient boosting algorithm

commit fdfc88e046a29202058b8f45168d624ed91f6d16
Author: olgaoskina <olgaosk...@yandex-team.ru>
Date: 2014-09-05T12:25:41Z

Code refactor

commit b91b372c951db8bd1be6bd4d2308bc509bc1b44f
Author: olgaoskina <olgaosk...@yandex-team.ru>
Date: 2014-09-06T09:02:51Z

Added test 'StochasticGradientBoostingSuite'

commit 223f0907b6accaa0bf08c7948b2e6c1d728dab18
Author: olgaoskina <olgaosk...@yandex-team.ru>
Date: 2014-09-10T08:08:30Z

Added new test

commit da13706bd8101ec8a2b648ce6ddc9777516e121f
Author: olgaoskina <olgaosk...@yandex-team.ru>
Date: 2014-09-14T15:33:52Z

Refactor tests

commit eafa0b75785b2ac570ddbc26a80b08b328f7b29c
Author: Egor Pakhomov <pahomov.e...@gmail.com>
Date: 2014-09-15T07:42:53Z

Merge branch 'gradient_boosting' of https://github.com/olgaoskina/spark into olgaoskina-gradient_boosting

commit 3c56f4ef65fb0df80804b0f4b9436f0623582be7
Author: Egor Pakhomov <pahomov.e...@gmail.com>
Date: 2014-09-15T08:46:43Z

Merge branch 'olgaoskina-gradient_boosting' into SPARK-3525

commit ce1934a329783629a12f615cbeac3d7e1a05a791
Author: Egor Pakhomov <pahomov.e...@gmail.com>
Date: 2014-09-15T08:32:48Z

[SPARK-3525] Fixing GradientBoostingSuite
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2394#issuecomment-55565637

Can one of the admins verify this patch?