[GitHub] spark pull request: [SPARK-10064] [ML] Parallelize decision tree b...

2015-09-10 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/8246#issuecomment-139331864 @NathanHowell @jkbradley We should consider making bins per feature and sample sizes configurable to avoid the side-effects mentioned above. Did

[GitHub] spark pull request: [SPARK-10064] [ML] Parallelize decision tree b...

2015-08-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/8246#discussion_r38249715 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -1056,6 +988,70 @@ object DecisionTree extends Serializable

[GitHub] spark pull request: [SPARK-10064] [ML] Parallelize decision tree b...

2015-08-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/8246#discussion_r38249912 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -1056,6 +988,70 @@ object DecisionTree extends Serializable

[GitHub] spark pull request: [SPARK-10064] [ML] Parallelize decision tree b...

2015-08-28 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/8246#issuecomment-135866073 Thanks @NathanHowell Sorry for not responding earlier. Will try to review soon. --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark pull request: [SPARK-8924] [MLLIB, DOCUMENTATION] Added @sin...

2015-08-17 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/7380#issuecomment-131967469 @mengxr Sorry, did not get a chance to review this so far. Will try to do it today.

[GitHub] spark pull request: [SPARK-8924] [MLLIB, DOCUMENTATION] Added @sin...

2015-07-17 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/7380#issuecomment-122480109 @mengxr Sure. I can take a look. How do we handle API modifications? Any change moves the tag to the newest version?

[GitHub] spark pull request: [SPARK-7131] [ml] Copy Decision Tree, Random F...

2015-07-15 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/7294#issuecomment-121840575 @jkbradley It looks good to me. It might be a good idea to run the spark.mllib and spark.ml models on a couple of datasets to ensure there are no regressions

[GitHub] spark pull request: [SPARK-6113] [mllib] Stabilize DecisionTree an...

2015-03-27 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/5009#issuecomment-87079012 @jkbradley Apologies for not reviewing earlier. I hope to make one pass over the weekend. I have one quick question -- what's the rationale for abbreviating

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-17 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-74733642 Thanks @MechCoder @jkbradley

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-17 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-74722368 @MechCoder Sorry I didn't see the message earlier. I am sure @jkbradley must have done a thorough review but please let me know if you need me to take a look

[GitHub] spark pull request: [SPARK-4580] [SPARK-4610] [mllib] Documentatio...

2014-12-01 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3461#discussion_r21133672 --- Diff: docs/mllib-decision-tree.md --- @@ -103,36 +106,73 @@ and the resulting `$M-1$` split candidates are considered. ### Stopping rule

[GitHub] spark pull request: [SPARK-4580] [SPARK-4610] [mllib] Documentatio...

2014-12-01 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3461#issuecomment-65167028 @jkbradley The GBDT section looks good to me but the subsection on Comparison with RFs could possibly be moved towards the end. It breaks the flow in my opinion

[GitHub] spark pull request: [SPARK-4580] [SPARK-4610] [mllib] Documentatio...

2014-11-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3461#discussion_r21067775 --- Diff: docs/mllib-decision-tree.md --- @@ -103,36 +106,73 @@ and the resulting `$M-1$` split candidates are considered. ### Stopping rule

[GitHub] spark pull request: [SPARK-4580] [SPARK-4610] [mllib] Documentatio...

2014-11-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3461#discussion_r21067802 --- Diff: docs/mllib-decision-tree.md --- @@ -103,36 +106,73 @@ and the resulting `$M-1$` split candidates are considered. ### Stopping rule

[GitHub] spark pull request: [SPARK-4580] [SPARK-4610] [mllib] Documentatio...

2014-11-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3461#discussion_r21067657 --- Diff: docs/mllib-decision-tree.md --- @@ -103,36 +106,73 @@ and the resulting `$M-1$` split candidates are considered. ### Stopping rule

[GitHub] spark pull request: [SPARK-4580] [SPARK-4610] [mllib] Documentatio...

2014-11-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3461#discussion_r21067826 --- Diff: docs/mllib-decision-tree.md --- @@ -103,36 +106,73 @@ and the resulting `$M-1$` split candidates are considered. ### Stopping rule

[GitHub] spark pull request: [SPARK-4580] [SPARK-4610] [mllib] Documentatio...

2014-11-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3461#discussion_r21067860 --- Diff: docs/mllib-decision-tree.md --- @@ -103,36 +106,73 @@ and the resulting `$M-1$` split candidates are considered. ### Stopping rule

[GitHub] spark pull request: [SPARK-4580] [SPARK-4610] [mllib] Documentatio...

2014-11-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3461#discussion_r21067979 --- Diff: docs/mllib-decision-tree.md --- @@ -103,36 +106,73 @@ and the resulting `$M-1$` split candidates are considered. ### Stopping rule

[GitHub] spark pull request: [SPARK-4580] [SPARK-4610] [mllib] Documentatio...

2014-11-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3461#discussion_r21068084 --- Diff: docs/mllib-gbt.md --- @@ -0,0 +1,308 @@ +--- +layout: global +title: Gradient-Boosted Trees - MLlib +displayTitle: a href=mllib

[GitHub] spark pull request: [SPARK-4580] [SPARK-4610] [mllib] Documentatio...

2014-11-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3461#discussion_r21068117 --- Diff: docs/mllib-gbt.md --- @@ -0,0 +1,308 @@ +--- +layout: global +title: Gradient-Boosted Trees - MLlib +displayTitle: a href=mllib

[GitHub] spark pull request: [SPARK-4580] [SPARK-4610] [mllib] Documentatio...

2014-11-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3461#discussion_r21068173 --- Diff: docs/mllib-gbt.md --- @@ -0,0 +1,308 @@ +--- +layout: global +title: Gradient-Boosted Trees - MLlib +displayTitle: a href=mllib

[GitHub] spark pull request: [SPARK-4583] [mllib] LogLoss for GradientBoost...

2014-11-25 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3439#issuecomment-64497207 @jkbradley I am trying to find my reference for the LogLoss calculations.

[GitHub] spark pull request: [SPARK-4583] [mllib] LogLoss for GradientBoost...

2014-11-25 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3439#issuecomment-64498949 @jkbradley LGTM. Thanks for the documentation too -- it is really helpful.

[GitHub] spark pull request: [SPARK-4439] [MLlib] add python api for random...

2014-11-20 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3320#issuecomment-63900346 Thanks a lot @davies

[GitHub] spark pull request: [SPARK-4439] [MLlib] add python api for random...

2014-11-19 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3320#discussion_r20619737 --- Diff: python/pyspark/mllib/tree.py --- @@ -181,8 +180,191 @@ def trainRegressor(data, categoricalFeaturesInfo, model.predict(rdd).collect

[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...

2014-11-19 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3374#issuecomment-63744101 Will we have to rename ```GradientBoostedTrees``` back to ```GradientBoosting``` when we add generic weak learner support? I think we should not modify the name

[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...

2014-11-19 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3374#discussion_r20621257 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala --- @@ -45,146 +43,92 @@ import

[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...

2014-11-19 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3374#issuecomment-63744889 @mengxr The plan to move to mllib.ensemble namespace with a new class sounds good to me.

[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...

2014-11-19 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3374#issuecomment-63746922 Should the ```trainClassifier``` and ```trainRegressor``` methods from the ```DecisionTree``` and ```RandomForest``` classes also be deprecated?

[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...

2014-11-19 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3374#discussion_r20622307 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala --- @@ -45,146 +43,92 @@ import

[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...

2014-11-19 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3374#discussion_r20622629 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala --- @@ -45,146 +43,92 @@ import

[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...

2014-11-19 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3374#discussion_r20622816 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/TreeEnsembleModel.scala --- @@ -0,0 +1,182 @@ +/* --- End diff

[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...

2014-11-19 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3374#discussion_r20623463 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/TreeEnsembleModel.scala --- @@ -0,0 +1,182 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...

2014-11-19 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3374#discussion_r20623750 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/GradientBoostedTreesSuite.scala --- @@ -23,104 +23,95 @@ import

[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...

2014-11-19 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3374#discussion_r20624623 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/GradientBoostedTreesSuite.scala --- @@ -23,104 +23,95 @@ import

[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...

2014-11-19 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3374#issuecomment-63763093 Completed my pass. LGTM! :+1:

[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...

2014-11-19 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3374#discussion_r20628996 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala --- @@ -45,146 +43,92 @@ import

[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...

2014-11-19 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3374#discussion_r20629031 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala --- @@ -40,151 +39,98 @@ import

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-17 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-63375342 @avulanov Thanks for conducting the experiments. Could you plot graphs for the experiments that you conducted with changing number of features and number of machines

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-17 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-63375573 I found this reference recently about Netflix's distributed implementation of neural nets that could be relevant for MLlib. http://techblog.netflix.com/2014/02

[GitHub] spark pull request: [SPARK-4439] [MLlib] add python api for random...

2014-11-17 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3320#discussion_r20479768 --- Diff: python/pyspark/mllib/tree.py --- @@ -181,8 +180,191 @@ def trainRegressor(data, categoricalFeaturesInfo, model.predict(rdd).collect

[GitHub] spark pull request: [SPARK-4439] [MLlib] add python api for random...

2014-11-17 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3320#discussion_r20481177 --- Diff: python/pyspark/mllib/tree.py --- @@ -181,8 +180,191 @@ def trainRegressor(data, categoricalFeaturesInfo, model.predict(rdd).collect

[GitHub] spark pull request: [MLLIB] SPARK-4347: Reducing GradientBoostingS...

2014-11-11 Thread manishamde
GitHub user manishamde opened a pull request: https://github.com/apache/spark/pull/3214 [MLLIB] SPARK-4347: Reducing GradientBoostingSuite run time. Before: [info] GradientBoostingSuite: [info] - Regression with continuous features: SquaredError (22 seconds, 115 milliseconds) [info

[GitHub] spark pull request: [SPARK-4197] [mllib] GradientBoosting API clea...

2014-11-05 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3094#issuecomment-61849672 @codedeft Thanks for creating the JIRA and informing us.

[GitHub] spark pull request: [WIP][SPARK-3530][MLLIB] pipeline and paramete...

2014-11-05 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3099#issuecomment-61916011 I have a few comments based upon the API: 1. Like @jkbradley, I prefer ```lr.setMaxIter(50)``` over ```lr.set(lr.maxIter, 50)```. Also, prefer to avoid

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-05 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/1290#discussion_r19922521 --- Diff: docs/mllib-ann.md --- @@ -0,0 +1,223 @@ +--- +layout: global +title: Artificial Neural Networks - MLlib +displayTitle: a href

[GitHub] spark pull request: [mllib] GradientBoosting API cleanup and examp...

2014-11-04 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3094#issuecomment-61721420 @jkbradley Thanks! I will take a look and get back.

[GitHub] spark pull request: [mllib] GradientBoosting API cleanup and examp...

2014-11-04 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3094#issuecomment-61756916 @codedeft Not yet. I was planning to but forgot to do so. Feel free to create one or I can create it if you prefer. You are correct. We need to add (possibly

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-11-03 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-61494531 @0asa Yes. PRs for these will be great. Could you check if there are already existing JIRA tickets for these -- if not, you could create them. Also, please

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-11-03 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-61549273 @0asa Thanks. Looks good. Let's move the conversation to the JIRA.

[GitHub] spark pull request: [FIX][MLLIB] fix seed in BaggedPointSuite

2014-11-03 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3084#issuecomment-61577226 @mengxr Sorry, it's my fault. It looks good to me.

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/1290#discussion_r19717255 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala --- @@ -0,0 +1,528 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-61433892 @bgreeven I haven't studied the implementation details yet but I had a question about the API. I realize that RDD[(Vector, Vector)] is a more general structure

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/1290#discussion_r19717451 --- Diff: docs/mllib-ann.md --- @@ -0,0 +1,223 @@ +--- +layout: global +title: Artificial Neural Networks - MLlib +displayTitle: a href

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/1290#discussion_r19717692 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala --- @@ -0,0 +1,528 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/1290#discussion_r19717699 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala --- @@ -0,0 +1,528 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-61436212 @bgreeven Another general suggestion: consider adding logging to the code. It goes a long way in debugging errors and getting statuses on long-running jobs. Check

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/1290#discussion_r19717937 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala --- @@ -0,0 +1,528 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/1290#discussion_r19717985 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala --- @@ -0,0 +1,528 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/1290#discussion_r19720701 --- Diff: docs/mllib-ann.md --- @@ -0,0 +1,223 @@ +--- +layout: global +title: Artificial Neural Networks - MLlib +displayTitle: a href

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-11-01 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-61361549 @tgaloppo Thanks for the PR and congratulations on the first contribution. Apologies for the lack of feedback thus far -- I guess everyone is busy with the 1.2

[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...

2014-11-01 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-61375098 @codeleft I am so sorry.

[GitHub] spark pull request: [SPARK-4081] [mllib] DatasetIndexer

2014-10-31 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3000#issuecomment-61335029 Cool. I will make another pass shortly.

[GitHub] spark pull request: [MLLIB] SPARK-1547: Add Gradient Boosting to M...

2014-10-31 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-61345885 @mengxr Could we get this merged? :-)

[GitHub] spark pull request: [SPARK-4081] [mllib] DatasetIndexer

2014-10-31 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3000#issuecomment-61351601 LGTM.

[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...

2014-10-31 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-61359008 @codeleft I agree that local training should be a high priority. Just curious -- what's the depth of the tree in the failing case? I vote for merging this PR

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-30 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-61069736 @jkbradley @codedeft I think I have implemented all the suggestions on the PR except for 1) public API and 2) warning when using non SquaredError loss functions. I

[GitHub] spark pull request: [MLLIB] SPARK-1547: Add Gradient Boosting to M...

2014-10-30 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-61155798 @jkbradley I agree with protection against driver failure for long sequential operations. However, in this case we will just be checkpointing partial models rather

[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...

2014-10-30 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-61156969 @codedeft @jkbradley I have not followed the discussion very closely (apologies!) but at a high level, could we add local training support along with this PR

[GitHub] spark pull request: [SPARK-4081] [mllib] DatasetIndexer

2014-10-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3000#discussion_r19633564 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/DatasetIndexer.scala --- @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4081] [mllib] DatasetIndexer

2014-10-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3000#discussion_r19634083 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/DatasetIndexer.scala --- @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4081] [mllib] DatasetIndexer

2014-10-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3000#discussion_r19634947 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/DatasetIndexer.scala --- @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4081] [mllib] DatasetIndexer

2014-10-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3000#discussion_r19635024 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/DatasetIndexer.scala --- @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4081] [mllib] DatasetIndexer

2014-10-30 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3000#issuecomment-61166762 How about the transformation for labels? This will help with transformations for classification especially from +1/-1 to 0/1 labeling for binary classification
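
The label transformation raised in the comment above (from +1/-1 to 0/1 labels for binary classification) can be illustrated with a minimal sketch. This is plain Python and not part of the DatasetIndexer pull request; the function name `to_zero_one` and the sample labels are hypothetical, chosen only for illustration.

```python
def to_zero_one(label: float) -> float:
    """Map a binary label in {-1, +1} to {0, 1} (hypothetical helper)."""
    if label == 1.0:
        return 1.0
    if label == -1.0:
        return 0.0
    # Reject anything else rather than silently guessing a class.
    raise ValueError(f"expected -1 or +1, got {label!r}")


# Illustrative labels only, not data from the pull request.
labels = [1.0, -1.0, -1.0, 1.0]
converted = [to_zero_one(y) for y in labels]
```

Raising on unexpected values is one possible design choice; a real transformer might instead learn the label set from the data.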

[GitHub] spark pull request: [SPARK-4081] [mllib] DatasetIndexer

2014-10-30 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/3000#issuecomment-61168991 Agree.

[GitHub] spark pull request: [SPARK-4081] [mllib] DatasetIndexer

2014-10-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3000#discussion_r19636313 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/DatasetIndexer.scala --- @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Add Gradient Boosting to M...

2014-10-30 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-61174268 @jkbradley Thanks for the confirmation! I will now proceed to finish the rest of the tasks -- should be straightforward.

[GitHub] spark pull request: [MLLIB] SPARK-1547: Add Gradient Boosting to M...

2014-10-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19638832 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/WeightedEnsembleModel.scala --- @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Add Gradient Boosting to M...

2014-10-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19638821 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/WeightedEnsembleModel.scala --- @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Add Gradient Boosting to M...

2014-10-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19638835 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/EnsembleTestHelper.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Add Gradient Boosting to M...

2014-10-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19639714 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,412 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Add Gradient Boosting to M...

2014-10-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19645493 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,412 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4081] [mllib] DatasetIndexer

2014-10-30 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/3000#discussion_r19645862 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/DatasetIndexer.scala --- @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Add Gradient Boosting to M...

2014-10-30 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-61221341 @jkbradley I cleaned up the public API based on our discussion. Going with a nested structure where we have to specify the weak learner parameters separately

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19569087 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19569553 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -26,7 +26,7 @@ import

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19570259 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19570610 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -26,7 +26,7 @@ import

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-61024859 @jkbradley I originally used checkpointing instead of simply caching in memory. There are trade-offs going with one versus the other. I will study what @codedeft

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60814290 @jkbradley Your API suggestions sound reasonable. Let me work on simplifying the API. I had originally started with something similar to what you suggested so I

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496069 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/SquaredError.scala --- @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496095 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala --- @@ -46,20 +47,63 @@ private[tree] object BaggedPoint

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496113 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala --- @@ -70,7 +71,8 @@ class Strategy ( val

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496210 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/EnsembleCombiningStrategy.scala --- @@ -0,0 +1,30 @@ +/* + * Licensed

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496224 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/impl/BaggedPointSuite.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496241 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/impl/BaggedPointSuite.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496253 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/GradientBoostingSuite.scala --- @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496266 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/WeightedEnsembleModel.scala --- @@ -0,0 +1,177 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496447 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/WeightedEnsembleModel.scala --- @@ -0,0 +1,177 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496561 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation
