[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

2015-02-19 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/4610#discussion_r2537 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala --- @@ -66,9 +66,17 @@ private[sql] class DefaultSource mode

[GitHub] spark pull request: [SPARK-6291] [MLLIB] GLM toString toDebugStr...

2015-03-18 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/5038#issuecomment-82787376 @srowen, sorry for the misunderstanding. I have changed it to use string interpolation and deleted ```toDebugString```, which may be useless. --- If your project is set up

[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

2015-03-19 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/4610#issuecomment-83545467 @yhuai Thank you for your comments on this PR. @liancheng If this LGTM, can I open another PR for ```ParquetRelation2``` and work on it?

[GitHub] spark pull request: [SPARK-6095] [MLLIB] Support model save/load i...

2015-03-20 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/5016#issuecomment-83940710 @mengxr Thank you for your comments.

[GitHub] spark pull request: [SPARK-6096] [MLLIB] Support model save/load i...

2015-03-20 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/5106 [SPARK-6096] [MLLIB] Support model save/load in Python's Naive Bayes Model save/load in Python's Naive Bayes, which reuses the Scala save/load implementation. You can merge this pull request

[GitHub] spark pull request: [SPARK-5821] [SQL] ParquetRelation2 CTAS shoul...

2015-03-20 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/5107 [SPARK-5821] [SQL] ParquetRelation2 CTAS should check if delete is successful Do the same check as #4610 for ParquetRelation2.

[GitHub] spark pull request: [SPARK-6256] [MLlib] MLlib Python API parity c...

2015-03-20 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/4997#issuecomment-84051013 @jkbradley @mengxr Can you review this patch?

[GitHub] spark pull request: [SPARK-6096] [MLLIB] Support model save/load i...

2015-03-20 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/5106#issuecomment-84254673 @mengxr Sorry, I missed that check. I will close this PR, thank you.

[GitHub] spark pull request: [SPARK-6096] [MLLIB] Support model save/load i...

2015-03-20 Thread yanboliang
Github user yanboliang closed the pull request at: https://github.com/apache/spark/pull/5106

[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

2015-03-20 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/4610#issuecomment-84065407 @liancheng Please review my fix for ParquetRelation2 in #5107

[GitHub] spark pull request: [SPARK-6291] [MLLIB] GLM toString toDebugStr...

2015-03-16 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/5038 [SPARK-6291] [MLLIB] GLM toString toDebugString GLM toString prints out intercept and numFeatures. For LogisticRegression and SVM models, toString also prints out numClasses and threshold. GLM
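The summary format described in this PR (intercept and numFeatures for every GLM, plus numClasses and threshold for classifiers) can be sketched with string interpolation. The classes below (`GLMSketch`, `ClassifierSketch`) are hypothetical Python stand-ins, not MLlib's Scala `GeneralizedLinearModel` or `LogisticRegressionModel`, and the exact field labels are our assumption.

```python
# Hypothetical classes illustrating the toString layout described in the PR;
# they are NOT the MLlib model classes.
class GLMSketch:
    def __init__(self, weights, intercept):
        self.weights = list(weights)
        self.intercept = intercept

    @property
    def num_features(self):
        # The feature dimension is simply the number of weights.
        return len(self.weights)

    def __str__(self):
        # String interpolation (an f-string) instead of manual concatenation.
        return (f"{type(self).__name__}: intercept = {self.intercept}, "
                f"numFeatures = {self.num_features}")


class ClassifierSketch(GLMSketch):
    def __init__(self, weights, intercept, num_classes=2, threshold=0.5):
        super().__init__(weights, intercept)
        self.num_classes = num_classes
        self.threshold = threshold

    def __str__(self):
        # Extend the base summary with the classifier-specific fields.
        return (f"{super().__str__()}, numClasses = {self.num_classes}, "
                f"threshold = {self.threshold}")
```

For example, `str(ClassifierSketch([0.1, 0.2, 0.3], 1.5))` yields `ClassifierSketch: intercept = 1.5, numFeatures = 3, numClasses = 2, threshold = 0.5`. Reusing the parent's summary keeps the two formats from drifting apart.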

[GitHub] spark pull request: [SPARK-6095] [MLLIB] Support model save/load i...

2015-03-09 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/4911#issuecomment-77854374 @mengxr Yes, it makes sense. I will try to implement the save/load operation in Python so that it does the same thing as in Scala.

[GitHub] spark pull request: [SPARK-6256] [MLlib] MLlib Python API parity c...

2015-03-12 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/4997 [SPARK-6256] [MLlib] MLlib Python API parity check for regression MLlib Python API parity check for regression; the major disparities are listed below: LinearRegressionWithSGD

[GitHub] spark pull request: [SPARK-6095] [MLLIB] Support model save/load i...

2015-03-06 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/4911#issuecomment-77572142 @mengxr Yes, it makes sense. After looking through the code, I found we have two alternatives: 1. Implement a new PythonMLLibAPI that looks like this: def

[GitHub] spark pull request: [SPARK-6095] [MLLIB] Support model save/load i...

2015-03-13 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/5016#issuecomment-79633944 retest, please.

[GitHub] spark pull request: [SPARK-6095] [MLLIB] Support model save/load i...

2015-03-13 Thread yanboliang
Github user yanboliang closed the pull request at: https://github.com/apache/spark/pull/4911

[GitHub] spark pull request: Support model save/load in Python's linear mod...

2015-03-13 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/5016 Support model save/load in Python's linear models You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanboliang/spark spark-6095 Alternatively

[GitHub] spark pull request: [SPARK-6095] [MLLIB] Support model save/load i...

2015-03-13 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/4911#issuecomment-79110275 Sorry, I closed this PR accidentally; I opened another pull request, #5016, for this issue. I have implemented the save/load operation in Python for classification model

[GitHub] spark pull request: [SPARK-6256] [MLlib] MLlib Python API parity c...

2015-03-24 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/4997#discussion_r27030181 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -111,9 +111,11 @@ private[python] class PythonMLLibAPI

[GitHub] spark pull request: LogisticRegressionWithLBFGS.run(input, initial...

2015-03-24 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/5167 LogisticRegressionWithLBFGS.run(input, initialWeights) should initialize numFeatures LogisticRegressionWithLBFGS.run(input, initialWeights) should initialize numFeatures
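As a hedged sketch of the idea in this PR: a trainer that infers the feature dimension in only one entry point leaves it unset when callers supply initial weights. `GLMTrainerSketch` and its `run` method are hypothetical Python stand-ins, not MLlib code; we assume `-1` marks an unset `num_features`, and the optimizer itself is omitted.

```python
class GLMTrainerSketch:
    """Hypothetical trainer; num_features < 0 means 'not yet known'."""

    def __init__(self):
        self.num_features = -1

    def _ensure_num_features(self, data):
        # Infer the dimension from the first (label, features) point if unset.
        if self.num_features < 0:
            self.num_features = len(data[0][1])

    def run(self, data, initial_weights=None):
        # Initialize num_features on every path, including the one where the
        # caller supplies initial weights.
        self._ensure_num_features(data)
        if initial_weights is None:
            initial_weights = [0.0] * self.num_features
        if len(initial_weights) != self.num_features:
            raise ValueError("initial weights do not match num_features")
        # Placeholder: a real trainer would run an optimizer here.
        return list(initial_weights)
```

Routing both call styles through one helper means the dimension check cannot be skipped by whichever overload the caller happens to use.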

[GitHub] spark pull request: [SPARK-6255] [MLLIB] Support multiclass classi...

2015-03-24 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/5137#issuecomment-85571917 @jkbradley @mengxr

[GitHub] spark pull request: [SPARK-5990] [MLLIB] Model import/export for I...

2015-03-30 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/5270 [SPARK-5990] [MLLIB] Model import/export for IsotonicRegression Model import/export for IsotonicRegression

[GitHub] spark pull request: [SPARK-6580] [MLLIB] Optimize LogisticRegressi...

2015-03-31 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/5249#discussion_r27545988 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -145,13 +135,20 @@ class LogisticRegressionModel

[GitHub] spark pull request: [SPARK-6580] [MLLIB] Optimize LogisticRegressi...

2015-03-31 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/5249#discussion_r27500087 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -106,7 +118,6 @@ class LogisticRegressionModel

[GitHub] spark pull request: [SPARK-6580] [MLLIB] Optimize LogisticRegressi...

2015-03-31 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/5249#discussion_r27499535 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -145,13 +135,20 @@ class LogisticRegressionModel

[GitHub] spark pull request: [SPARK-6580] [MLLIB] Optimize LogisticRegressi...

2015-03-31 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/5249#discussion_r27497189 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -106,7 +118,6 @@ class LogisticRegressionModel

[GitHub] spark pull request: [SPARK-6580] [MLLIB] Optimize LogisticRegressi...

2015-03-31 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/5249#discussion_r27499263 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -145,13 +135,20 @@ class LogisticRegressionModel

[GitHub] spark pull request: [SPARK-6580] [MLLIB] Optimize LogisticRegressi...

2015-03-31 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/5249#discussion_r27498701 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -145,13 +135,20 @@ class LogisticRegressionModel

[GitHub] spark pull request: [SPARK-6580] [MLLIB] Optimize LogisticRegressi...

2015-03-31 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/5249#discussion_r27498752 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -145,13 +135,20 @@ class LogisticRegressionModel

[GitHub] spark pull request: [SPARK-6580] [MLLIB] Optimize LogisticRegressi...

2015-03-31 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/5249#issuecomment-88176235 @srowen Thank you for your comments. I have updated the commits.

[GitHub] spark pull request: [SPARK-6255] [MLLIB] Support multiclass classi...

2015-03-31 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/5137#issuecomment-88176441 @jkbradley Thank you for your comments.

[GitHub] spark pull request: [SPARK-6580] [MLLIB] Optimize LogisticRegressi...

2015-03-29 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/5249#issuecomment-87369690 retest please

[GitHub] spark pull request: [SPARK-6580] [MLLIB] Optimize LogisticRegressi...

2015-03-29 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/5249 [SPARK-6580] [MLLIB] Optimize LogisticRegressionModel.predictPoint https://issues.apache.org/jira/browse/SPARK-6580
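The rows above do not say what PR #5249 changed inside `predictPoint`. For context only, here is a plain-Python sketch of what logistic-regression prediction for a single point computes, under our assumptions: binary models take the sigmoid of the margin and apply an optional threshold, and multinomial models treat class 0 as a pivot with margin 0 and pick the class with the largest margin. `predict_point` is a hypothetical name, intercepts are ignored in the multinomial branch, and this is not the MLlib implementation.

```python
import math


def predict_point(weights, intercept, features, num_classes=2, threshold=0.5):
    """Hypothetical sketch of logistic-regression prediction for one point."""
    if num_classes == 2:
        margin = sum(w * x for w, x in zip(weights, features)) + intercept
        # Numerically stable sigmoid: avoid exp() overflow for large |margin|.
        if margin >= 0:
            score = 1.0 / (1.0 + math.exp(-margin))
        else:
            e = math.exp(margin)
            score = e / (1.0 + e)
        if threshold is None:
            return score  # raw probability when no threshold is set
        return 1.0 if score > threshold else 0.0
    # Multinomial: class 0 is the pivot with margin 0; weights hold
    # (num_classes - 1) consecutive blocks of len(features) coefficients.
    d = len(features)
    best_class, best_margin = 0, 0.0
    for k in range(num_classes - 1):
        block = weights[k * d:(k + 1) * d]
        margin = sum(w * x for w, x in zip(block, features))
        if margin > best_margin:
            best_class, best_margin = k + 1, margin
    return float(best_class)
```

Both branches are a single pass over the weights, so per-point cost is linear in the number of coefficients.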

[GitHub] spark pull request: [SPARK-6255] [MLLIB] Python API parity check f...

2015-03-23 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/5137 [SPARK-6255] [MLLIB] Python API parity check for classification Python API parity check for classification. Support multiclass classification in PySpark.

[GitHub] spark pull request: [WIP] [SPARK-6255] [MLLIB] Python API parity c...

2015-03-23 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/5137#issuecomment-85059666 This PR is a work in progress; I still need to make LogisticRegressionModel.predict handle multiclass classification.

[GitHub] spark pull request: correct LogisticRegressionWithLBFGS regType pa...

2015-02-28 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/4831 correct LogisticRegressionWithLBFGS regType parameter for pyspark Currently LogisticRegressionWithLBFGS in python/pyspark/mllib/classification.py invokes callMLlibFunc with an incorrect regType
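The snippet is cut off before it explains what is wrong with `regType`. Assuming (our assumption, not stated in the row) that the bug is the common Python pitfall of stringifying `None`, so the JVM side receives the text `"None"` instead of a null value, the difference looks like this. `reg_type_buggy`, `reg_type_fixed`, and `VALID_REG_TYPES` are hypothetical names, and the accepted values are assumed.

```python
# Hypothetical set of accepted values; None means "no regularization".
VALID_REG_TYPES = {"l1", "l2", None}


def reg_type_buggy(reg_type):
    # str(None) is the four-character string "None", not a null value,
    # so a JVM-side parser would see an unknown regularization name.
    return str(reg_type)


def reg_type_fixed(reg_type):
    # Validate, then pass None through unchanged so a Python-to-JVM bridge
    # can translate it to null.
    if reg_type not in VALID_REG_TYPES:
        raise ValueError(f"unsupported regType: {reg_type!r}")
    return reg_type
```

Validating on the Python side also turns a typo such as `"l3"` into an immediate error instead of a failure inside the JVM call.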

[GitHub] spark pull request: [SPARK-6080] [PySpark] correct LogisticRegress...

2015-03-01 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/4831#discussion_r25582380 --- Diff: python/pyspark/mllib/classification.py --- @@ -207,7 +207,7 @@ def train(cls, data, iterations=100, initialWeights=None, regParam=0.01, regType

[GitHub] spark pull request: [SPARK-6095] [MLLIB] Support model save/load i...

2015-03-05 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/4911 [SPARK-6095] [MLLIB] Support model save/load in Python's linear models Linear models can be stored in Python, which currently differs from other models, so we leverage pickle

[GitHub] spark pull request: [SPARK-5926] [SQL] make DataFrame.explain leve...

2015-02-22 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/4707#issuecomment-75437798 test please

[GitHub] spark pull request: [SPARK-5926] [SQL] make DataFrame.explain leve...

2015-02-23 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/4707#issuecomment-75536583 retest please

[GitHub] spark pull request: [SPARK-5926] [SQL] make DataFrame.explain leve...

2015-02-22 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/4707#issuecomment-75439264 retest please

[GitHub] spark pull request: [SPARK-5926] [SQL] make DataFrame.explain leve...

2015-02-20 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/4707 [SPARK-5926] [SQL] make DataFrame.explain leverage queryExecution.logical DataFrame.explain returns a wrong result when the query is a DDL command. For example, the following two queries

[GitHub] spark pull request: [SPARK-5926] [SQL] make DataFrame.explain leve...

2015-02-20 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/4707#discussion_r25091776 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala --- @@ -123,7 +123,7 @@ class DataFrame protected[sql

[GitHub] spark pull request: [SPARK-6264] [MLLIB] Support FPGrowth algorith...

2015-03-26 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/5213 [SPARK-6264] [MLLIB] Support FPGrowth algorithm in Python API Support FPGrowth algorithm in Python API

[GitHub] spark pull request: [SPARK-6827] [mllib] Wrap FPGrowthModel.freqIt...

2015-04-21 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/5614 [SPARK-6827] [mllib] Wrap FPGrowthModel.freqItemsets and make it consistent with Java API

[GitHub] spark pull request: [SPARK-5738] [SQL] Reuse mutable row for each ...

2015-04-24 Thread yanboliang
Github user yanboliang closed the pull request at: https://github.com/apache/spark/pull/4527

[GitHub] spark pull request: [SPARK-6827] [mllib] Wrap FPGrowthModel.freqIt...

2015-04-21 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/5614#issuecomment-95011704 @mengxr Thank you for your comments and help. I have merged your PR into this PR. I will investigate the pickle/unpickle problems and file another PR to resolve

[GitHub] spark pull request: [SPARK-6267] [MLLIB] Python API for IsotonicRe...

2015-05-04 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/5890 [SPARK-6267] [MLLIB] Python API for IsotonicRegression https://issues.apache.org/jira/browse/SPARK-6267

[GitHub] spark pull request: [SPARK-6093] [MLLIB] Add RegressionMetrics in ...

2015-05-06 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/5941 [SPARK-6093] [MLLIB] Add RegressionMetrics in PySpark/MLlib https://issues.apache.org/jira/browse/SPARK-6093
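For reference, the standard regression metrics over `(prediction, observation)` pairs can be computed in plain Python as below. This is a hedged sketch of the formulas, not the PySpark `RegressionMetrics` class; the function name `regression_metrics`, the dictionary keys, and returning NaN for R² when all observations are equal are our own choices.

```python
import math


def regression_metrics(pairs):
    """Hypothetical helper; pairs is an iterable of (prediction, observation)."""
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one (prediction, observation) pair")
    mean_obs = sum(obs for _, obs in pairs) / n
    # Residual and total sums of squares.
    sse = sum((obs - pred) ** 2 for pred, obs in pairs)
    sst = sum((obs - mean_obs) ** 2 for _, obs in pairs)
    mse = sse / n
    return {
        "meanSquaredError": mse,
        "rootMeanSquaredError": math.sqrt(mse),
        "meanAbsoluteError": sum(abs(obs - pred) for pred, obs in pairs) / n,
        # R² is undefined when every observation is identical (sst == 0).
        "r2": 1.0 - sse / sst if sst else float("nan"),
    }
```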

[GitHub] spark pull request: [SPARK-5913] [MLLIB] Python API for ChiSqSelec...

2015-05-06 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/5939 [SPARK-5913] [MLLIB] Python API for ChiSqSelector https://issues.apache.org/jira/browse/SPARK-5913

[GitHub] spark pull request: [SPARK-6093] [MLLIB] Add RegressionMetrics in ...

2015-05-07 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/5941#issuecomment-100102143 @mengxr OK, I will take them.

[GitHub] spark pull request: [SPARK-5913] [MLLIB] Python API for ChiSqSelec...

2015-05-08 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/5939#issuecomment-100115666 @jkbradley I think the Python docs for ChiSqSelector and ChiSqSelectorModel have reached parity with the Scala ones. Please correct me if I misunderstand. Yes, I

[GitHub] spark pull request: [SPARK-6093] [MLLIB] Add RegressionMetrics in ...

2015-05-07 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/5941#discussion_r29867547 --- Diff: python/pyspark/mllib/evaluation.py --- @@ -67,6 +67,73 @@ def unpersist(self): self.call(unpersist) +class

[GitHub] spark pull request: [SPARK-5913] [MLLIB] Python API for ChiSqSelec...

2015-05-07 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/5939#issuecomment-99925969 @mengxr , @jkbradley

[GitHub] spark pull request: [SPARK-6091] [MLLIB] Add MulticlassMetrics in ...

2015-05-08 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/6011 [SPARK-6091] [MLLIB] Add MulticlassMetrics in PySpark/MLlib https://issues.apache.org/jira/browse/SPARK-6091
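Per-label precision and recall, plus overall accuracy, are core quantities a multiclass evaluator reports. The sketch below computes them from `(prediction, label)` pairs in plain Python; `multiclass_summary` is a hypothetical helper, not the `MulticlassMetrics` API, and reporting 0.0 for a label that is never predicted (or never observed) is our own convention.

```python
from collections import Counter


def multiclass_summary(pairs):
    """Hypothetical helper; pairs is an iterable of (prediction, label)."""
    pairs = list(pairs)
    tp, predicted, actual = Counter(), Counter(), Counter()
    for pred, label in pairs:
        predicted[pred] += 1
        actual[label] += 1
        if pred == label:
            tp[label] += 1  # true positive for this label
    labels = sorted(set(actual) | set(predicted))
    precision = {lab: tp[lab] / predicted[lab] if predicted[lab] else 0.0
                 for lab in labels}
    recall = {lab: tp[lab] / actual[lab] if actual[lab] else 0.0
              for lab in labels}
    accuracy = sum(tp.values()) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy
```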

[GitHub] spark pull request: [SPARK-6092] [MLLIB] Add RankingMetrics in PyS...

2015-05-11 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/6044#issuecomment-100792972 It looks like py4j cannot recognize [T: ClassTag] for RankingMetrics. If I remove [T: ClassTag] and specify the type (for example, Int) of RankingMetrics, py4j

[GitHub] spark pull request: [SPARK-6092] [MLLIB] Add RankingMetrics in PyS...

2015-05-10 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/6044#issuecomment-100752796 Jenkins, retest this please

[GitHub] spark pull request: [SPARK-6092] [MLLIB] Add RankingMetrics in PyS...

2015-05-11 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/6044#issuecomment-100907856 Jenkins, retest this please

[GitHub] spark pull request: [SPARK-6092] [MLLIB] Add RankingMetrics in PyS...

2015-05-10 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/6044 [SPARK-6092] [MLLIB] Add RankingMetrics in PySpark/MLlib You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanboliang/spark spark-6092
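As a hedged illustration of one ranking metric, here is precision at k computed from `(predicted ranking, relevant items)` pairs. We assume the convention that a query returning fewer than k items is still divided by k and that a query with no relevant items contributes 0; `precision_at` is a hypothetical name, not the PySpark `RankingMetrics` method.

```python
def precision_at(pairs, k):
    """Hypothetical helper; pairs holds (predicted ranking, relevant items)."""
    if k <= 0:
        raise ValueError("k must be positive")
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one query")
    total = 0.0
    for predicted, relevant in pairs:
        relevant = set(relevant)
        if not relevant:
            continue  # a query with no relevant items contributes 0
        hits = sum(1 for item in predicted[:k] if item in relevant)
        total += hits / k  # divide by k even if fewer than k items returned
    return total / len(pairs)
```

Because the items are compared with plain set membership, the sketch works for any hashable item type; the generic `[T: ClassTag]` question above is a py4j bridging issue rather than a property of the metric.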

[GitHub] spark pull request: [SPARK-6091] [MLLIB] Add MulticlassMetrics in ...

2015-05-09 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/6011#issuecomment-100512819 Retest please

[GitHub] spark pull request: [SPARK-6258] [MLLIB] GaussianMixture Python AP...

2015-05-13 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/6087#discussion_r30215924 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -345,28 +345,39 @@ private[python] class PythonMLLibAPI

[GitHub] spark pull request: [SPARK-6258] [MLLIB] GaussianMixture Python AP...

2015-05-13 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/6087#discussion_r3021 --- Diff: python/pyspark/mllib/clustering.py --- @@ -166,12 +166,39 @@ class GaussianMixtureModel(object): True labels[3]==labels[4

[GitHub] spark pull request: [SPARK-6258] [MLLIB] GaussianMixture Python AP...

2015-05-13 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/6087#issuecomment-101624047 @FlytxtRnD In Scala, we use three separate parameters to construct ```GaussianMixtureModel```. So if we pass a single compact list, we need to unfold
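The unfolding step mentioned here can be sketched as turning one compact list of `(weight, mu, sigma)` triples into the three parallel lists a three-argument constructor expects. `unfold_gmm_params` is a hypothetical helper, the triple layout is our assumption about what the compact list holds, and the weight-sum check is an extra sanity check we add.

```python
def unfold_gmm_params(components):
    """Hypothetical helper: split (weight, mu, sigma) triples into three lists."""
    if not components:
        raise ValueError("need at least one Gaussian component")
    # zip(*...) transposes the triples into three parallel tuples.
    weights, mus, sigmas = (list(t) for t in zip(*components))
    # Extra sanity check (our addition): mixture weights should sum to 1.
    if abs(sum(weights) - 1.0) > 1e-8:
        raise ValueError("mixture weights must sum to 1")
    return weights, mus, sigmas
```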

[GitHub] spark pull request: [SPARK-6258] [MLLIB] GaussianMixture Python AP...

2015-05-12 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/6087 [SPARK-6258] [MLLIB] GaussianMixture Python API parity check You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanboliang/spark spark-6258

[GitHub] spark pull request: [SPARK-6258] [MLLIB] GaussianMixture Python AP...

2015-05-14 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/6087#issuecomment-101942746 gassian[i]

[GitHub] spark pull request: [SPARK-6258] [MLLIB] GaussianMixture Python AP...

2015-05-14 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/6087#issuecomment-101948297 weightsi

[GitHub] spark pull request: [SPARK-6258] [MLLIB] GaussianMixture Python AP...

2015-05-14 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/6087#issuecomment-101949379 gaussians[i]()

[GitHub] spark pull request: [SPARK-6258] [MLLIB] GaussianMixture Python AP...

2015-05-14 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/6087#issuecomment-101949093 gaussians[i]

[GitHub] spark pull request: [SPARK-6258] [MLLIB] GaussianMixture Python AP...

2015-05-14 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/6087#discussion_r30382722 --- Diff: python/pyspark/mllib/clustering.py --- @@ -166,12 +168,38 @@ class GaussianMixtureModel(object): True labels[3]==labels[4

[GitHub] spark pull request: [SPARK-6094] [MLlib] Add MultilabelMetrics in ...

2015-05-19 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/6276 [SPARK-6094] [MLlib] Add MultilabelMetrics in PySpark/MLlib Add MultilabelMetrics in PySpark/MLlib

[GitHub] spark pull request: [SPARK-7604] [MLlib] Python API for PCA and PC...

2015-05-21 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/6315 [SPARK-7604] [MLlib] Python API for PCA and PCAModel Python API for PCA and PCAModel

[GitHub] spark pull request: [Minor] [MLLib] rename some functions of Pytho...

2015-06-25 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/7011 [Minor] [MLLib] rename some functions of PythonMLLibAPI Keep the same naming conventions for PythonMLLibAPI. Only the following three functions are different from the others: ```scala

[GitHub] spark pull request: [SPARK-5962] [MLlib] Python support for Power ...

2015-06-25 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/6992#issuecomment-115470960 @jkbradley @mengxr

[GitHub] spark pull request: [SPARK-5962] [MLlib] Python support for Power ...

2015-06-24 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/6991 [SPARK-5962] [MLlib] Python support for Power Iteration Clustering Python support for Power Iteration Clustering https://issues.apache.org/jira/browse/SPARK-5962

[GitHub] spark pull request: [SPARK-5962] [MLlib] Python support for Power ...

2015-06-24 Thread yanboliang
Github user yanboliang closed the pull request at: https://github.com/apache/spark/pull/6991

[GitHub] spark pull request: [SPARK-5962] [MLlib] Python support for Power ...

2015-06-24 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/6992 [SPARK-5962] [MLlib] Python support for Power Iteration Clustering Python support for Power Iteration Clustering https://issues.apache.org/jira/browse/SPARK-5962

[GitHub] spark pull request: [MLlib] [SPARK-7667] MLlib Python API consiste...

2015-06-23 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/6856#issuecomment-114429515 @jkbradley ChiSqSelectorModel was not inheriting correctly, so I changed it back to the original style. I have checked the merge issue, and it can be merged cleanly if I

[GitHub] spark pull request: [SPARK-7604] [MLlib] Python API for PCA and PC...

2015-06-16 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/6315#issuecomment-112646149 @jkbradley @mengxr

[GitHub] spark pull request: [MLlib] [SPARK-7667] MLlib Python API consiste...

2015-06-17 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/6856#discussion_r32611917 --- Diff: python/pyspark/mllib/feature.py --- @@ -123,20 +132,6 @@ class StandardScalerModel(JavaVectorTransformer): Represents

[GitHub] spark pull request: [MLlib] [SPARK-7667] MLlib Python API consiste...

2015-06-17 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/6856 [MLlib] [SPARK-7667] MLlib Python API consistency check MLlib Python API consistency check

[GitHub] spark pull request: [MLlib] [SPARK-7667] MLlib Python API consiste...

2015-06-17 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/6856#discussion_r32611974 --- Diff: python/pyspark/mllib/feature.py --- @@ -205,14 +200,6 @@ class ChiSqSelectorModel(JavaVectorTransformer): Represents a Chi Squared

[GitHub] spark pull request: [SPARK-7916] [MLlib] MLlib Python doc parity c...

2015-06-15 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/6460#issuecomment-112258204 retest

[GitHub] spark pull request: [SPARK-7916] [MLlib] MLlib Python doc parity c...

2015-05-30 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/6460#issuecomment-107009193 The PEP 8 limit for docstring lines is 72 characters, so I truncated lines that were too long.
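PEP 8 asks that docstrings and comments be limited to 72 characters per line. A small checker built on the standard `ast` module can flag offending docstring lines; `long_docstring_lines` is a hypothetical helper, and it measures the docstring text (including its original indentation) rather than exact physical source lines, so the opening quotes are not counted.

```python
import ast

MAX_DOC_LINE = 72  # PEP 8 limit for docstrings and comments


def long_docstring_lines(source, limit=MAX_DOC_LINE):
    """Return (object name, line) pairs for docstring lines over limit."""
    tree = ast.parse(source)
    offenders = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.Module, ast.ClassDef,
                                 ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # clean=False keeps the original indentation of continuation lines.
        doc = ast.get_docstring(node, clean=False)
        if doc is None:
            continue
        name = getattr(node, "name", "<module>")
        for line in doc.splitlines():
            if len(line) > limit:
                offenders.append((name, line))
    return offenders
```

Running it over a module's source before review lists every docstring line that still needs wrapping.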

[GitHub] spark pull request: [SPARK-7918] [MLlib] MLlib Python doc parity c...

2015-05-30 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/6461#discussion_r31378624 --- Diff: python/pyspark/mllib/evaluation.py --- @@ -27,6 +27,8 @@ class BinaryClassificationMetrics(JavaModelWrapper): Evaluator

[GitHub] spark pull request: [SPARK-7918] [MLlib] MLlib Python doc parity c...

2015-05-28 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/6461 [SPARK-7918] [MLlib] MLlib Python doc parity check for evaluation and feature Check the MLlib Python evaluation and feature docs and make them as complete as the Scala docs.

[GitHub] spark pull request: [SPARK-7916] [MLlib] [Doc] MLlib Python doc pa...

2015-05-28 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/6460 [SPARK-7916] [MLlib] [Doc] MLlib Python doc parity check for classification and regression Check the MLlib Python classification and regression docs and make them as complete as the Scala docs

[GitHub] spark pull request: [SPARK-8758] [MLlib] Add Python user guide for...

2015-07-01 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/7155 [SPARK-8758] [MLlib] Add Python user guide for PowerIterationClustering Add Python user guide for PowerIterationClustering

[GitHub] spark pull request: [SPARK-8765] [MLlib] Fix PySpark PowerIteratio...

2015-07-01 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/7177#discussion_r33747859 --- Diff: python/pyspark/mllib/clustering.py --- @@ -282,18 +282,30 @@ class PowerIterationClusteringModel(JavaModelWrapper, JavaSaveable, JavaLoader

[GitHub] spark pull request: [SPARK-8765] [MLlib] Fix PySpark PowerIteratio...

2015-07-01 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/7177 [SPARK-8765] [MLlib] Fix PySpark PowerIterationClustering test issue PySpark PowerIterationClustering test failure due to bad ...

[GitHub] spark pull request: [SPARK-8765] [MLLIB] [PYTHON] removed flaky py...

2015-07-01 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/7164#issuecomment-117900730 @mengxr @jkbradley I have found that the test failure is caused by the small dataset. If the dataset is small, PowerIterationClustering behaves nondeterministically. I have

[GitHub] spark pull request: [SPARK-8765] [MLlib] Fix PySpark PowerIteratio...

2015-07-03 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/7177#issuecomment-118370370 @mengxr @jkbradley Yes, the cluster assignments are deterministic, subject to numerical differences. The current tests are deterministic now, just like the test

[GitHub] spark pull request: [SPARK-8788] [ML] Add Java unit test for PCA t...

2015-07-03 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/7184#discussion_r33868622 --- Diff: mllib/src/test/java/org/apache/spark/ml/feature/JavaPCASuite.java --- @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-8788] [ML] Add Java unit test for PCA t...

2015-07-02 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/7184 [SPARK-8788] [ML] Add Java unit test for PCA transformer Add Java unit test for PCA transformer

[GitHub] spark pull request: [SPARK-8792] [ML] Add Python API for PCA trans...

2015-07-02 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/7190 [SPARK-8792] [ML] Add Python API for PCA transformer Add Python API for PCA transformer

[GitHub] spark pull request: [SPARK-5962] [MLlib] Python support for Power ...

2015-06-28 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/6992#discussion_r33420166 --- Diff: python/pyspark/mllib/clustering.py --- @@ -466,7 +557,8 @@ def predictOnValues(self, dstream): def _test(): import doctest

[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-28 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/7065 [SPARK-8664] [ML] Add PCA transformer Add PCA transformer for ML pipeline

[GitHub] spark pull request: [MLlib] [SPARK-7667] MLlib Python API consiste...

2015-06-28 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/6856#issuecomment-116222651 @jkbradley

[GitHub] spark pull request: [SPARK-8664] [ML] Add PCA transformer

2015-06-28 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/7065#discussion_r33420498 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -68,7 +68,7 @@ class PCA(val k: Int) { * @param k number of principal

[GitHub] spark pull request: [SPARK-7604] [MLlib] Python API for PCA and PC...

2015-05-25 Thread yanboliang
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/6315#issuecomment-105149865 @mengxr @jkbradley

[GitHub] spark pull request: [SPARK-5133] [ml] Added featureImportance to R...

2015-08-02 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/7838#discussion_r36054229 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/RandomForestRegressor.scala --- @@ -30,7 +30,7 @@ import org.apache.spark.mllib.tree.model
