[GitHub] spark pull request #17407: [SPARK-20043][ML][WIP] DecisionTreeModel can't re...

2017-03-23 Thread facaiy
GitHub user facaiy opened a pull request: https://github.com/apache/spark/pull/17407 [SPARK-20043][ML][WIP] DecisionTreeModel can't recongnize Impurity "Gini" when loading Fix bug: DecisionTreeModel can't recognize Impurity "Gini" when loading. TODO

[GitHub] spark issue #17407: [SPARK-20043][ML] DecisionTreeModel: ImpurityCalculator ...

2017-03-26 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17407 @jkbradley Hi, the tests passed. Is it good enough to merge? By the way, String Params are fragile when saving/loading a model, because the setParam and getParam methods have no effect in that case

[GitHub] spark issue #17407: [SPARK-20043][ML] DecisionTreeModel can't recongnize Imp...

2017-03-25 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17407 @jkbradley Thanks. I agree with your advice. Modifying the value is a little aggressive, while changing ImpurityCalculator.getCalculator is more moderate. However, I'm afraid that similar bugs
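The "moderate" fix discussed above amounts to making the impurity lookup tolerant of the casing stored in saved metadata ("Gini" vs "gini"). The sketch below illustrates that idea in plain Python; `get_impurity` and the registry are hypothetical names, not Spark's `ImpurityCalculator.getCalculator`.

```python
import math


def gini(counts):
    """Gini impurity 1 - sum(p_k^2) computed from per-class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Entropy -sum(p_k * log2(p_k)) computed from per-class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Hypothetical registry: keys are stored in lower case only.
_IMPURITIES = {"gini": gini, "entropy": entropy}


def get_impurity(name):
    """Resolve an impurity by name, ignoring case ("Gini", "gini", "GINI" all match)."""
    try:
        return _IMPURITIES[name.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown impurity {name!r}; supported: {sorted(_IMPURITIES)}"
        ) from None


# Metadata of a saved model may carry "Gini" while the registry uses "gini".
print(get_impurity("Gini")([5, 5]))  # 0.5
```

Normalizing at lookup time leaves the stored param value untouched, which matches the less aggressive option preferred in the comment.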

[GitHub] spark pull request #17383: [SPARK-3165][MLlib][WIP] DecisionTree does not us...

2017-03-27 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17383#discussion_r108318833 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala --- @@ -301,7 +302,7 @@ private[tree] class LearningNode( * group of nodes

[GitHub] spark pull request #17383: [SPARK-3165][MLlib][WIP] DecisionTree does not us...

2017-03-27 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17383#discussion_r108318997 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala --- @@ -301,7 +302,7 @@ private[tree] class LearningNode( * group of nodes

[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel: ImpurityCalc...

2017-03-27 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17407#discussion_r108320537 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -385,6 +385,22 @@ class

[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel: ImpurityCalc...

2017-03-27 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17407#discussion_r108320481 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala --- @@ -178,6 +178,22 @@ class DecisionTreeRegressorSuite

[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel: ImpurityCalc...

2017-03-27 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17407#discussion_r108320525 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -385,6 +385,22 @@ class

[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel can't recongn...

2017-03-25 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17407#discussion_r108050101 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -385,6 +385,20 @@ class

[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel can't recongn...

2017-03-25 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17407#discussion_r108050103 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -385,6 +385,20 @@ class

[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel can't recongn...

2017-03-25 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17407#discussion_r108050099 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -385,6 +385,20 @@ class

[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel can't recongn...

2017-03-25 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17407#discussion_r108050110 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -385,6 +385,20 @@ class

[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel can't recongn...

2017-03-25 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17407#discussion_r108050269 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -385,6 +385,22 @@ class

[GitHub] spark pull request #14547: [SPARK-16718][MLlib] gbm-style treeboost

2017-03-20 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/14547#discussion_r107077783 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impurity/ApproxBernoulliImpurity.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #17383: [SPARK-3165][MLlib][WIP] DecisionTree does not us...

2017-03-22 Thread facaiy
GitHub user facaiy opened a pull request: https://github.com/apache/spark/pull/17383 [SPARK-3165][MLlib][WIP] DecisionTree does not use sparsity in data ## What changes were proposed in this pull request? DecisionTree should take advantage of sparse feature vectors
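One way to take advantage of sparse feature vectors when building split statistics is to visit only the stored (non-zero) entries of each row and add all implicit zeros to their bin in a single step. This is a minimal sketch, assuming rows are given as `{feature_index: value}` dicts and each feature has sorted thresholds. It illustrates the idea only and is not the approach taken in the PR.

```python
import bisect


def sparse_histograms(rows, splits_per_feature):
    """Per-feature bin counts. Bin i holds values v with splits[i-1] < v <= splits[i];
    the last bin holds v > splits[-1]."""
    hist = [[0] * (len(s) + 1) for s in splits_per_feature]
    nonzero = [0] * len(splits_per_feature)
    for row in rows:
        for f, v in row.items():  # visit stored entries only
            if v != 0.0:
                hist[f][bisect.bisect_left(splits_per_feature[f], v)] += 1
                nonzero[f] += 1
    # Every row that did not store a non-zero value for feature f has an implicit 0.0.
    for f, splits in enumerate(splits_per_feature):
        hist[f][bisect.bisect_left(splits, 0.0)] += len(rows) - nonzero[f]
    return hist


rows = [{0: 2.0}, {}, {1: -1.0}, {0: 0.5, 1: 3.0}]
print(sparse_histograms(rows, [[1.0], [0.0]]))  # [[3, 1], [3, 1]]
```

The cost of the first loop grows with the number of stored entries rather than rows times features, which is the benefit sparse inputs could offer.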

[GitHub] spark issue #17503: [SPARK-3159][MLlib] Check for reducible DecisionTree

2017-04-05 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17503 @jkbradley @hhbyyh Could you review the PR? Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request #17503: [SPARK-3159][MLlib] Check for reducible DecisionT...

2017-04-01 Thread facaiy
GitHub user facaiy opened a pull request: https://github.com/apache/spark/pull/17503 [SPARK-3159][MLlib] Check for reducible DecisionTree Add canMergeChildren param: find pairs of leaves with the same parent that output the same prediction, and merge them. ## How
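The merge described above can be sketched as a bottom-up pass over a binary tree: when both children of a node are leaves with the same prediction, the node is replaced by a single leaf, and the check then repeats at the parent. The `Leaf`/`Internal` classes below are hypothetical stand-ins for Spark's Scala node classes, and the bottom-up order is a design assumption, not necessarily what the PR does.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    prediction: float


@dataclass(frozen=True)
class Internal:
    feature: int
    threshold: float
    left: Node
    right: Node


Node = Union[Leaf, Internal]


def merge_children(node: Node) -> Node:
    """Collapse internal nodes whose two children are leaves with equal predictions."""
    if isinstance(node, Leaf):
        return node
    left = merge_children(node.left)
    right = merge_children(node.right)
    if isinstance(left, Leaf) and isinstance(right, Leaf) and left.prediction == right.prediction:
        return Leaf(left.prediction)
    return Internal(node.feature, node.threshold, left, right)


def predict(node: Node, x) -> float:
    """Go left when x[feature] <= threshold, otherwise right."""
    while isinstance(node, Internal):
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


tree = Internal(0, 0.5, Internal(1, 2.0, Leaf(1.0), Leaf(1.0)), Leaf(0.0))
print(merge_children(tree))
```

Predictions are unchanged by the merge; only the number of nodes visited at prediction time shrinks.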

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-07 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r110385244 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1009,10 +1009,24 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-07 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r110385132 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1009,10 +1009,24 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-07 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r110385526 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1009,10 +1009,24 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-07 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r110386358 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -996,7 +996,7 @@ private[spark] object RandomForest extends Logging

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-10 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 Is there something wrong with Spark CI?

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-10 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 ``` Test Result (1 failure / +1) org.apache.spark.storage.TopologyAwareBlockReplicationPolicyBehavior.Peers in 2 racks ``` Does anyone know what this is?

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-10 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 @srowen Hi, I forgot the unit tests in Python and R. Where can I find documentation on setting up a development environment? Thanks.

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-12 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 I have run all the MLlib unit tests in Python. However, I am not familiar with R, and I don't want to spend too much time setting up an R environment. Could CI retest the PR? We can

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-14 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r111656240 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -126,9 +138,10 @@ class RandomForestSuite extends SparkFunSuite

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-14 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r111656235 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -112,9 +124,9 @@ class RandomForestSuite extends SparkFunSuite

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-14 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r111656245 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -104,6 +104,18 @@ class RandomForestSuite extends SparkFunSuite

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-14 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 @sethah Perhaps it's hard to compare R's behavior with Spark's, since many factors are involved. I'd like to read R GBM's code and verify that both sides' split criteria are designed consistently. Is it OK

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-13 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 Many thanks, @srowen

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-23 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 I scanned the split criteria of sklearn and xgboost. 1. sklearn counts all continuous values and splits at the mean value. commit 5147fd09c6a063188efde444f47bd006fa5f95f0
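Read as "consider every distinct value and split at the mean of neighbours", the sklearn behaviour can be sketched as an exhaustive list of midpoints between adjacent distinct sorted values. This is a simplified sketch under that assumption. sklearn's real splitter adds details (for example, skipping near-equal values) that are omitted here.

```python
def midpoint_thresholds(values):
    """Exhaustive candidates: the midpoint of every pair of adjacent distinct values."""
    distinct = sorted(set(values))
    return [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]


print(midpoint_thresholds([3.0, 1.0, 2.0, 2.0]))  # [1.5, 2.5]
```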

[GitHub] spark issue #17503: [SPARK-3159][MLlib] Check for reducible DecisionTree

2017-04-24 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17503 @srowen Hi, could you review the PR? The change is simple, though a lot of unit-test code is added. Thanks.

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-22 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 Hi, I have checked R GBM's code and found that R's gbm uses the mean value $(x + y) / 2$, not the weighted mean $(c_x \cdot x + c_y \cdot y) / (c_x + c_y)$ described in [JIRA SPARK-16957](https
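The two formulas differ only when the neighbouring values occur with different counts. The functions below evaluate both for one adjacent pair. The counts $c_x = 3$ and $c_y = 1$ are made up for illustration.

```python
def mean_split(x, y):
    """Unweighted midpoint (x + y) / 2, the rule reported for R's gbm."""
    return (x + y) / 2.0


def weighted_split(x, c_x, y, c_y):
    """Weighted midpoint (c_x * x + c_y * y) / (c_x + c_y), as described in SPARK-16957."""
    return (c_x * x + c_y * y) / (c_x + c_y)


# x = 1.0 occurs 3 times and y = 2.0 occurs once (made-up counts).
print(mean_split(1.0, 2.0))            # 1.5
print(weighted_split(1.0, 3, 2.0, 1))  # 1.25
```

The weighted version pulls the threshold toward the more frequent value. With equal counts the two coincide.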

[GitHub] spark pull request #14547: [SPARK-16718][MLlib] gbm-style treeboost

2017-03-12 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/14547#discussion_r105576961 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impurity/ApproxBernoulliImpurity.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14547: [SPARK-16718][MLlib] gbm-style treeboost

2017-03-13 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/14547#discussion_r105814881 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impurity/ApproxBernoulliImpurity.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-06 Thread facaiy
GitHub user facaiy opened a pull request: https://github.com/apache/spark/pull/17556 [SPARK-16957][MLlib] Use weighted midpoints for split values. ## What changes were proposed in this pull request? Use weighted midpoints for split values. ## How was this patch

[GitHub] spark issue #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest should suppo...

2017-07-31 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18764 Thanks, @yanboliang. Could you give a hand, @srowen?

[GitHub] spark issue #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest should suppo...

2017-08-03 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18764 Tests fail in pyspark.ml.tests with Python 2.6, but I don't have that environment.

[GitHub] spark issue #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest should suppo...

2017-08-04 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18764 @yanboliang Thanks, Yanbo. I am not familiar with Python 2.6, which is quite outdated.

[GitHub] spark issue #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest should suppo...

2017-08-04 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18764 Tests fail in pyspark.ml.tests with Python 2.6, but I don't have that environment.

[GitHub] spark issue #18736: [SPARK-21481][ML] Add indexOf method for ml.feature.Hash...

2017-08-15 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18736 Sure, @yanboliang. Thanks for your suggestion. I'll work on it later, perhaps next week. Is that OK?

[GitHub] spark pull request #18736: [SPARK-21481][ML] Add indexOf method for ml.featu...

2017-08-10 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18736#discussion_r132618802 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala --- @@ -80,20 +82,31 @@ class HashingTF @Since("1.4.0") (@Si

[GitHub] spark issue #18736: [SPARK-21481][ML] Add indexOf method for ml.feature.Hash...

2017-08-13 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18736 @yanboliang Hi, Yanbo. Could you help review the PR? Thanks.

[GitHub] spark pull request #18554: [SPARK-21306][ML] OneVsRest should cache weightCo...

2017-07-11 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18554#discussion_r126863072 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -317,7 +318,12 @@ final class OneVsRest @Since("

[GitHub] spark pull request #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucket...

2017-07-11 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18582#discussion_r126646511 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -36,7 +36,8 @@ import org.apache.spark.util.collection.OpenHashMap

[GitHub] spark pull request #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucket...

2017-07-11 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18582#discussion_r126645714 --- Diff: python/pyspark/ml/feature.py --- @@ -3058,26 +3035,37 @@ class RFormula(JavaEstimator, HasFeaturesCol, HasLabelCol, JavaMLReadable, JavaM

[GitHub] spark pull request #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucket...

2017-07-11 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18582#discussion_r126643882 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -460,16 +460,16 @@ object LinearRegression extends

[GitHub] spark pull request #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucket...

2017-07-11 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18582#discussion_r126642928 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -36,7 +36,8 @@ import org.apache.spark.sql.types.{DoubleType

[GitHub] spark issue #18554: [SPARK-21306][ML] OneVsRest should cache weightCol if ne...

2017-07-11 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18554 @srowen @yanboliang Could you help review the PR? Thanks.

[GitHub] spark pull request #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat...

2017-07-06 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18556#discussion_r126026388 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala --- @@ -89,18 +93,17 @@ private[libsvm] class LibSVMFileFormat extends

[GitHub] spark issue #18554: [SPARK-21306][ML] OneVsRest should cache weightCol if ne...

2017-07-06 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18554 @lins05 Thanks, that's a reasonable suggestion; I will fix it later.

[GitHub] spark pull request #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat...

2017-07-06 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18556#discussion_r126023986 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala --- @@ -89,18 +93,17 @@ private[libsvm] class LibSVMFileFormat extends

[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...

2017-07-17 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r127873828 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -598,8 +598,23 @@ class LogisticRegression @Since("

[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...

2017-07-17 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r127874833 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala --- @@ -32,40 +34,45 @@ private[ml] trait

[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...

2017-07-18 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r127972263 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -598,8 +598,23 @@ class LogisticRegression @Since("

[GitHub] spark pull request #18554: [SPARK-21306][ML] OneVsRest should support setWei...

2017-07-18 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18554#discussion_r128158473 --- Diff: python/pyspark/ml/tests.py --- @@ -1255,6 +1255,17 @@ def test_output_columns(self): output = model.transform(df

[GitHub] spark pull request #18554: [SPARK-21306][ML] OneVsRest should support setWei...

2017-07-26 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18554#discussion_r129562237 --- Diff: python/pyspark/ml/tests.py --- @@ -1255,6 +1255,24 @@ def test_output_columns(self): output = model.transform(df

[GitHub] spark pull request #18554: [SPARK-21306][ML] OneVsRest should support setWei...

2017-07-26 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18554#discussion_r129562189 --- Diff: python/pyspark/ml/classification.py --- @@ -1517,20 +1517,22 @@ class OneVsRest(Estimator, OneVsRestParams, MLReadable, MLWritable

[GitHub] spark issue #18554: [SPARK-21306][ML] OneVsRest should support setWeightCol

2017-07-26 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18554 ping @holdenk @yanboliang

[GitHub] spark pull request #18736: [SPARK-21481][ML] Add indexOf method for ml.featu...

2017-07-26 Thread facaiy
GitHub user facaiy opened a pull request: https://github.com/apache/spark/pull/18736 [SPARK-21481][ML] Add indexOf method for ml.feature.HashingTF ## What changes were proposed in this pull request? Add indexOf method for ml.feature.HashingTF. The PR is a hotfix
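An `indexOf` for a hashing term-frequency transformer maps a term to its column as a non-negative hash modulo the number of features. The sketch below shows that mapping under stated assumptions. It uses `zlib.crc32` as a stand-in hash, so its indices do not match Spark's `HashingTF`, which uses MurmurHash3. `index_of` and `transform` are hypothetical names. The default of 2^18 features mirrors `HashingTF`'s default `numFeatures`.

```python
import zlib
from collections import Counter


def index_of(term, num_features=1 << 18):
    """Column index of `term`: hash(term) mod num_features.
    zlib.crc32 is a stand-in hash; Spark's HashingTF uses MurmurHash3."""
    h = zlib.crc32(str(term).encode("utf-8"))  # unsigned 32-bit value
    return h % num_features  # Python's % is non-negative for a positive modulus


def transform(terms, num_features=1 << 18):
    """Sparse term-frequency vector as {index: count}; colliding terms share a slot."""
    return dict(Counter(index_of(t, num_features) for t in terms))


tf = transform(["spark", "ml", "spark"], num_features=1024)
print(tf[index_of("spark", 1024)])
```

Exposing `index_of` lets a user look up which column a given term landed in, since the hashed vector itself carries no vocabulary.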

[GitHub] spark pull request #18523: [SPARK-21285][ML] VectorAssembler reports the col...

2017-07-03 Thread facaiy
GitHub user facaiy opened a pull request: https://github.com/apache/spark/pull/18523 [SPARK-21285][ML] VectorAssembler reports the column name of unsupported data type ## What changes were proposed in this pull request? Add the column name to the exception that is raised
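Naming the column in the error turns "unsupported type" into something a user can act on. This plain-Python sketch of an assembler shows where the column name enters the message. The row-as-dict representation and the exact wording are illustrative assumptions, not Spark's `VectorAssembler` code.

```python
def assemble(row, input_cols):
    """Concatenate numeric, boolean and vector-like columns into one list of floats.
    An unsupported value raises an error that names the offending column."""
    out = []
    for col in input_cols:
        value = row[col]
        if isinstance(value, (bool, int, float)):
            out.append(float(value))
        elif isinstance(value, (list, tuple)):
            out.extend(float(v) for v in value)
        else:
            raise TypeError(
                f"Data type {type(value).__name__} of column {col} is not supported."
            )
    return out


print(assemble({"a": 1, "b": True, "v": [2.0, 3.5]}, ["a", "b", "v"]))  # [1.0, 1.0, 2.0, 3.5]
```

A unit test for such a change can simply feed an unsupported column and assert that its name appears in the message.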

[GitHub] spark issue #17503: [SPARK-3159][MLlib] Check for reducible DecisionTree

2017-07-04 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17503 @jkbradley Do you have time to review the PR? I believe it will be a small improvement for prediction. Thanks.

[GitHub] spark pull request #18523: [SPARK-21285][ML] VectorAssembler reports the col...

2017-07-04 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18523#discussion_r125398010 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -113,12 +113,12 @@ class VectorAssembler @Since("1.4.0"

[GitHub] spark issue #18523: [SPARK-21285][ML] VectorAssembler reports the column nam...

2017-07-05 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18523 I don't know how to write a unit test for the PR. Is it necessary?

[GitHub] spark pull request #17383: [SPARK-3165][MLlib][WIP] DecisionTree does not us...

2017-07-04 Thread facaiy
Github user facaiy closed the pull request at: https://github.com/apache/spark/pull/17383

[GitHub] spark issue #18523: [SPARK-21285][ML] VectorAssembler reports the column nam...

2017-07-05 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18523 Good idea!

[GitHub] spark pull request #17383: [SPARK-3165][MLlib][WIP] DecisionTree does not us...

2017-07-05 Thread facaiy
GitHub user facaiy reopened a pull request: https://github.com/apache/spark/pull/17383 [SPARK-3165][MLlib][WIP] DecisionTree does not use sparsity in data ## What changes were proposed in this pull request? DecisionTree should take advantage of sparse feature vectors

[GitHub] spark pull request #18523: [SPARK-21285][ML] VectorAssembler reports the col...

2017-07-05 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18523#discussion_r125584572 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -113,12 +113,12 @@ class VectorAssembler @Since("1.4.0"

[GitHub] spark pull request #18554: [SPARK-21306][ML] OneVsRest should cache weightCo...

2017-07-06 Thread facaiy
GitHub user facaiy opened a pull request: https://github.com/apache/spark/pull/18554 [SPARK-21306][ML] OneVsRest should cache weightCol if necessary ## What changes were proposed in this pull request? Cache weightCol if the classifier inherits the HasWeightCol trait
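The rule "cache weightCol only if the classifier has one" reduces to choosing which columns to keep before caching the training data. This sketch assumes a hypothetical Python `HasWeightCol` mix-in standing in for Spark's Scala trait. The column names are illustrative defaults.

```python
class HasWeightCol:
    """Hypothetical stand-in for Spark's HasWeightCol trait."""

    def __init__(self, weight_col=None):
        self.weight_col = weight_col


class WeightedClassifier(HasWeightCol):
    pass


class PlainClassifier:
    pass


def columns_to_cache(classifier, features_col="features", label_col="label"):
    """Columns to select before caching: features and label, plus the weight
    column when the classifier supports weights and one is set."""
    cols = [features_col, label_col]
    if isinstance(classifier, HasWeightCol) and classifier.weight_col:
        cols.append(classifier.weight_col)
    return cols


print(columns_to_cache(WeightedClassifier("w")))  # ['features', 'label', 'w']
print(columns_to_cache(PlainClassifier()))        # ['features', 'label']
```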

[GitHub] spark pull request #18523: [SPARK-21285][ML] VectorAssembler reports the col...

2017-07-06 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18523#discussion_r125860650 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -113,12 +113,15 @@ class VectorAssembler @Since("1.4.0"

[GitHub] spark issue #18523: [SPARK-21285][ML] VectorAssembler reports the column nam...

2017-07-05 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18523 @SparkQA Jenkins, run tests again, please.

[GitHub] spark issue #18523: [SPARK-21285][ML] VectorAssembler reports the column nam...

2017-07-06 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18523 @SparkQA test again, please.

[GitHub] spark issue #18554: [SPARK-21306][ML] OneVsRest should cache weightCol if ne...

2017-07-06 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18554 I'm not familiar with R; I used grep to search for "OneVsRest" and found nothing. Hence it seems that nothing needs to be done for the R part.

[GitHub] spark pull request #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat...

2017-07-06 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18556#discussion_r126050849 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala --- @@ -89,18 +93,17 @@ private[libsvm] class LibSVMFileFormat extends

[GitHub] spark pull request #18523: [SPARK-21285][ML] VectorAssembler reports the col...

2017-07-05 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18523#discussion_r125763918 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -113,12 +113,15 @@ class VectorAssembler @Since("1.4.0"

[GitHub] spark pull request #18523: [SPARK-21285][ML] VectorAssembler reports the col...

2017-07-04 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18523#discussion_r125539040 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -113,12 +113,12 @@ class VectorAssembler @Since("1.4.0"

[GitHub] spark issue #17503: [SPARK-3159][MLlib] Check for reducible DecisionTree

2017-04-25 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17503 @srowen I am not sure whether I understand your question correctly. RandomForest uses LearningNode to construct the tree model during training, and converts the nodes to Leaf or InternalNode at the end. Hence, all

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-25 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 Fixed the failing case; please retest.

[GitHub] spark pull request #17503: [SPARK-3159][MLlib] Check for reducible DecisionT...

2017-04-25 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17503#discussion_r113360409 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala --- @@ -61,6 +61,8 @@ import org.apache.spark.mllib.tree.impurity

[GitHub] spark issue #17503: [SPARK-3159][MLlib] Check for reducible DecisionTree

2017-04-27 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17503 I have the same question as you. I guess that impurity info is useful for debugging and analyzing the tree model. However, since the tree is grown from root to leaf during training, it seems needless to merge

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-28 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r114043563 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1009,10 +1009,24 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-28 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r114043568 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -138,9 +169,10 @@ class RandomForestSuite extends SparkFunSuite

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-28 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r114043558 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1037,7 +1051,10 @@ private[spark] object RandomForest extends

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-28 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 For a (training) sample of a continuous feature, say {x0, x1, x2, x3, ..., x100}, Spark currently selects quantiles as split points. Suppose 10-quantiles are used, x2 is the 1st quantile, and x10 is the 2nd
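The idea under discussion is to place each split threshold between a chosen quantile value and the next distinct value, rather than at the sample value itself. A simplified sketch in plain Python, assuming a stride-over-counts rule for picking approximate quantiles; `midpoint_splits` is a hypothetical name and this is not Spark's own split-finding code.

```python
from collections import Counter
from typing import List


def midpoint_splits(sample: List[float], num_splits: int) -> List[float]:
    """Pick about `num_splits` thresholds at evenly spaced ranks of the
    sample, placing each one halfway between two adjacent distinct values."""
    counts = sorted(Counter(sample).items())  # [(value, count)], ascending
    if len(counts) <= num_splits + 1:
        # Few distinct values: every gap between neighbours is a candidate.
        return [(a + b) / 2.0 for (a, _), (b, _) in zip(counts, counts[1:])]
    stride = len(sample) / (num_splits + 1)
    splits = []
    target = stride  # rank the next threshold should sit closest to
    cumulative = counts[0][1]  # number of samples visited so far
    for i in range(1, len(counts)):
        previous = cumulative
        cumulative += counts[i][1]
        # If adding this value moves us further from the target rank,
        # the split belongs between the previous value and this one.
        if abs(previous - target) < abs(cumulative - target):
            splits.append((counts[i - 1][0] + counts[i][0]) / 2.0)
            target += stride
    return splits
```

For example, `midpoint_splits(list(range(10)), 1)` yields the threshold 4.5 rather than the sample value 4.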

[GitHub] spark pull request #17556: [SPARK-16957][MLlib] Use weighted midpoints for s...

2017-04-28 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17556#discussion_r114043439 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -112,9 +138,11 @@ class RandomForestSuite extends SparkFunSuite

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-28 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 By the way, it's safe to use the mean value, as it matches other libraries. If requested, I'd be happy to modify the PR.
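To make the difference concrete, here is a small sketch contrasting a count-weighted midpoint with the plain mean of two adjacent distinct values. The weighting by occurrence counts is an assumption about what "weighted midpoint" means here, and both function names are hypothetical.

```python
def weighted_midpoint(left_value: float, left_count: int,
                      right_value: float, right_count: int) -> float:
    # Assumed scheme: pull the threshold toward the more frequent value.
    return (left_value * left_count + right_value * right_count) / (left_count + right_count)


def plain_midpoint(left_value: float, right_value: float) -> float:
    # Unweighted mean of the two neighbouring values.
    return (left_value + right_value) / 2.0
```

With value 1.0 seen three times and 2.0 seen once, the weighted threshold is 1.25 while the plain mean is 1.5; the plain mean does not depend on how the sample happens to be distributed between the two values.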

[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-30 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 OK, the weight has been removed from the calculation.

[GitHub] spark pull request #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest shoul...

2017-08-05 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18764#discussion_r131529693 --- Diff: python/pyspark/ml/classification.py --- @@ -1344,7 +1346,19 @@ def _fit(self, dataset): numClasses = int(dataset.agg({labelCol

[GitHub] spark pull request #18763: [SPARK-21306][ML] For branch 2.1, OneVsRest shoul...

2017-08-05 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18763#discussion_r131529768 --- Diff: python/pyspark/ml/classification.py --- @@ -1423,7 +1425,18 @@ def _fit(self, dataset): numClasses = int(dataset.agg({labelCol

[GitHub] spark pull request #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest shoul...

2017-07-28 Thread facaiy
GitHub user facaiy opened a pull request: https://github.com/apache/spark/pull/18764 [SPARK-21306][ML] For branch 2.0, OneVsRest should support setWeightCol This PR is related to #18554 and is adapted for branch 2.0. ## What changes were proposed in this pull request

[GitHub] spark pull request #18763: [SPARK-21306][ML] OneVsRest should support setWei...

2017-07-28 Thread facaiy
GitHub user facaiy opened a pull request: https://github.com/apache/spark/pull/18763 [SPARK-21306][ML] OneVsRest should support setWeightCol for branch-2.1 This PR is related to #18554 and is adapted for branch 2.1. ## What changes were proposed in this pull request

[GitHub] spark pull request #18763: [SPARK-21306][ML] For branch 2.1, OneVsRest shoul...

2017-07-28 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18763#discussion_r130202540 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -158,7 +158,7 @@ class OneVsRestSuite extends SparkFunSuite

[GitHub] spark pull request #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest shoul...

2017-07-28 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18764#discussion_r130200288 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -33,6 +33,7 @@ import

[GitHub] spark pull request #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest shoul...

2017-07-28 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18764#discussion_r130200379 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -143,6 +144,16 @@ class OneVsRestSuite extends SparkFunSuite

[GitHub] spark pull request #18763: [SPARK-21306][ML] For branch-2.1, OneVsRest shoul...

2017-07-28 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18763#discussion_r130200461 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -157,6 +157,16 @@ class OneVsRestSuite extends SparkFunSuite

[GitHub] spark pull request #18763: [SPARK-21306][ML] For branch 2.1, OneVsRest shoul...

2017-07-28 Thread facaiy
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18763#discussion_r130213337 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -158,7 +158,7 @@ class OneVsRestSuite extends SparkFunSuite

[GitHub] spark issue #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest should suppo...

2017-08-08 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18764 Sure, thanks, @yanboliang !

[GitHub] spark pull request #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest shoul...

2017-08-08 Thread facaiy
Github user facaiy closed the pull request at: https://github.com/apache/spark/pull/18764

[GitHub] spark pull request #18763: [SPARK-21306][ML] For branch 2.1, OneVsRest shoul...

2017-08-08 Thread facaiy
Github user facaiy closed the pull request at: https://github.com/apache/spark/pull/18763

[GitHub] spark issue #18763: [SPARK-21306][ML] For branch 2.1, OneVsRest should suppo...

2017-08-08 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18763 Thanks, all.
