spark git commit: [SPARK-16155][DOC] remove package grouping in Java docs

2016-06-22 Thread meng
ala`. I didn't find anyone complaining about missing groups since 1.5.0 on Google. Manually checked the generated Java API docs and confirmed that they are the same as in master. Author: Xiangrui Meng <m...@databricks.com> Closes #13856 from mengxr/SPARK-16155. Project: http:

spark git commit: [SPARK-16153][MLLIB] switch to multi-line doc to avoid a genjavadoc bug

2016-06-22 Thread meng
re.ChiSqSelectorModel setLabelCol (java.lang.String value) { throw new RuntimeException(); } ~~~ Switching to multiline is a workaround. Author: Xiangrui Meng <m...@databricks.com> Closes #13855 from mengxr/SPARK-16153. (cherry picked from commit 00cc5cca4522297b63b1522a2b8643b1a098e2b3) Signed-off

spark git commit: [SPARK-16153][MLLIB] switch to multi-line doc to avoid a genjavadoc bug

2016-06-22 Thread meng
re.ChiSqSelectorModel setLabelCol (java.lang.String value) { throw new RuntimeException(); } ~~~ Switching to multiline is a workaround. Author: Xiangrui Meng <m...@databricks.com> Closes #13855 from mengxr/SPARK-16153. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-w

spark git commit: [MINOR][MLLIB] DefaultParamsReadable/Writable should be DeveloperApi

2016-06-22 Thread meng
ent `Transformer/Estimator` would use it. So this PR changes the annotation to `DeveloperApi`. Author: Xiangrui Meng <m...@databricks.com> Closes #13828 from mengxr/default-readable-should-be-developer-api. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.

spark git commit: [MINOR][MLLIB] DefaultParamsReadable/Writable should be DeveloperApi

2016-06-22 Thread meng
ent `Transformer/Estimator` would use it. So this PR changes the annotation to `DeveloperApi`. Author: Xiangrui Meng <m...@databricks.com> Closes #13828 from mengxr/default-readable-should-be-developer-api. (cherry picked from commit 6a6010f0015542dc2753b2cb12fdd1204db63ea6) Signed-off-by: Xian

spark git commit: [SPARK-16127][ML][PYPSARK] Audit @Since annotations related to ml.linalg

2016-06-22 Thread meng
alg-since. (cherry picked from commit 18faa588ca11190890d2eb569d7497fbb25eee5c) Signed-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0cde3ad6 Tree: http://git-wip-us.apache.org/

spark git commit: [SPARK-16127][ML][PYPSARK] Audit @Since annotations related to ml.linalg

2016-06-22 Thread meng
er Commit: 18faa588ca11190890d2eb569d7497fbb25eee5c Parents: ea3a12b Author: Nick Pentreath <ni...@za.ibm.com> Authored: Wed Jun 22 10:05:25 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Wed Jun 2

spark git commit: [SPARK-16107][R] group glm methods in documentation

2016-06-22 Thread meng
(cherry picked from commit ea3a12b0147821960f8dabdc58d726f07f1f0e52) Signed-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1cfdd25f Tree: http://git-wip-us.apache.org/repos/as

spark git commit: [SPARK-16118][MLLIB] add getDropLast to OneHotEncoder

2016-06-21 Thread meng
hor: Xiangrui Meng <m...@databricks.com> Closes #13821 from mengxr/SPARK-16118. (cherry picked from commit 9493b079a0050f0a6f4936c17622b96fb185b67f) Signed-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.a

spark git commit: [SPARK-16118][MLLIB] add getDropLast to OneHotEncoder

2016-06-21 Thread meng
hor: Xiangrui Meng <m...@databricks.com> Closes #13821 from mengxr/SPARK-16118. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9493b079 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9493b079 Diff: http:

spark git commit: [MINOR][MLLIB] move setCheckpointInterval to non-expert setters

2016-06-21 Thread meng
oup. Author: Xiangrui Meng <m...@databricks.com> Closes #13813 from mengxr/checkpoint-non-expert. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/918c9195 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/918c9195 D

spark git commit: [MINOR][MLLIB] move setCheckpointInterval to non-expert setters

2016-06-21 Thread meng
oup. Author: Xiangrui Meng <m...@databricks.com> Closes #13813 from mengxr/checkpoint-non-expert. (cherry picked from commit 918c91954fb46400ce2c5ab066d2ec0ae48dda4a) Signed-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-15741][PYSPARK][ML] Pyspark cleanup of set default seed to None

2016-06-21 Thread meng
igned-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f805b989 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f805b989 Diff: http://git-wip-us.apache.org/repos/asf/spar

spark git commit: [SPARK-15741][PYSPARK][ML] Pyspark cleanup of set default seed to None

2016-06-21 Thread meng
ler <cutl...@gmail.com> Authored: Tue Jun 21 11:43:25 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Tue Jun 21 11:43:25 2016 -0700 -- python/pyspark/ml/classification.py | 4 ++-- python/pyspark/ml/

spark git commit: [SPARK-15177][.1][R] make SparkR model params and default values consistent with MLlib

2016-06-21 Thread meng
axIter -> 20, default initMode -> "k-means||" * `spark.naiveBayes`: laplace -> smoothing, default 1.0 ## How was this patch tested? Existing unit tests. Author: Xiangrui Meng <m...@databricks.com> Closes #13801 from mengxr/SPARK-15177.1. (cherry picked from commit 4f8

spark git commit: [SPARK-15177][.1][R] make SparkR model params and default values consistent with MLlib

2016-06-21 Thread meng
> 20, default initMode -> "k-means||" * `spark.naiveBayes`: laplace -> smoothing, default 1.0 ## How was this patch tested? Existing unit tests. Author: Xiangrui Meng <m...@databricks.com> Closes #13801 from mengxr/SPARK-15177.1. Project: http://git-wip-us.apache.org/re

spark git commit: [SPARK-16045][ML][DOC] Spark 2.0 ML.feature: doc update for stopwords and binarizer

2016-06-21 Thread meng
Yuhao Yang <hhb...@gmail.com> Authored: Tue Jun 21 00:47:36 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Tue Jun 21 00:47:36 2016 -0700 -- docs/ml-features.md | 16 ++-- 1 file changed, 10

spark git commit: [SPARK-16045][ML][DOC] Spark 2.0 ML.feature: doc update for stopwords and binarizer

2016-06-21 Thread meng
be4) Signed-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0499ed96 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0499ed96 Diff: http://git-wip-us.apache.org/repos/as

spark git commit: [SPARK-10258][DOC][ML] Add @Since annotations to ml.feature

2016-06-21 Thread meng
pache.org/repos/asf/spark/diff/37494a18 Branch: refs/heads/master Commit: 37494a18e8d6e22113338523d6498e00ac9725ea Parents: ce49bfc Author: Nick Pentreath <ni...@za.ibm.com> Authored: Tue Jun 21 00:39:47 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Tue Jun 2

spark git commit: [SPARK-10258][DOC][ML] Add @Since annotations to ml.feature

2016-06-21 Thread meng
ath <ni...@za.ibm.com> Closes #13641 from MLnick/add-since-annotations. (cherry picked from commit 37494a18e8d6e22113338523d6498e00ac9725ea) Signed-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apach

spark git commit: Revert "[SPARK-16086] [SQL] fix Python UDF without arguments (for 1.6)"

2016-06-21 Thread meng
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/37d05ec9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/37d05ec9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/37d05ec9 Branch: refs/heads/branch-2.0 Commit: 37d05ec9e96c0da786ee26b5c25216bf98f239c0 Parents: 34feea3 Author:

spark git commit: Revert "[SPARK-16086] [SQL] fix Python UDF without arguments (for 1.6)"

2016-06-21 Thread meng
t: http://git-wip-us.apache.org/repos/asf/spark/commit/ce49bfc2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ce49bfc2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ce49bfc2 Branch: refs/heads/master Commit: ce49bfc2550ba8f5a33235c7fc3b88201d63c276 Parents: 843a1eb Author: Xia

spark git commit: [SPARK-16079][PYSPARK][ML] Added missing import for DecisionTreeRegressionModel used in GBTClassificationModel

2016-06-20 Thread meng
4) Signed-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c7006538 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c7006538 Diff: http://git-wip-us.apache.org/repos

spark git commit: [SPARK-16079][PYSPARK][ML] Added missing import for DecisionTreeRegressionModel used in GBTClassificationModel

2016-06-20 Thread meng
ler <cutl...@gmail.com> Authored: Mon Jun 20 16:28:11 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Mon Jun 20 16:28:11 2016 -0700 -- python/pyspark/ml/classification.py | 6 -- python/pys

spark git commit: [SPARK-16035][PYSPARK] Fix SparseVector parser assertion for end parenthesis

2016-06-17 Thread meng
Jun 17 22:41:05 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Fri Jun 17 22:41:05 2016 -0700 -- python/pyspark/mllib/linalg/__init__.py | 2 +- 1 file changed, 1 ins

spark git commit: [SPARK-16035][PYSPARK] Fix SparseVector parser assertion for end parenthesis

2016-06-17 Thread meng
ong variable. I corrected that. ## How was this patch tested? Manual test Author: andreapasqua <and...@radius.com> Closes #13750 from andreapasqua/sparse-vector-parser-assertion-fix. (cherry picked from commit 4c64e88d5ba4c36cbdbc903376492f0f43401e4e) Signed-off-by: Xiangrui Meng <m...@da

spark git commit: [SPARK-15129][R][DOC] R API changes in ML

2016-06-17 Thread meng
PIs Author: GayathriMurali <gayathr...@intel.com> Closes #13285 from GayathriMurali/SPARK-15129. (cherry picked from commit af2a4b0826b2358c0fe75c3e4d7fd8f7bccdd8e5) Signed-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-15129][R][DOC] R API changes in ML

2016-06-17 Thread meng
//git-wip-us.apache.org/repos/asf/spark/diff/af2a4b08 Branch: refs/heads/master Commit: af2a4b0826b2358c0fe75c3e4d7fd8f7bccdd8e5 Parents: 10b6714 Author: GayathriMurali <gayathr...@intel.com> Authored: Fri Jun 17 21:10:29 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Commit

spark git commit: [SPARK-15892][ML] Backport correctly merging AFTAggregators to branch 1.6

2016-06-17 Thread meng
4:24 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Fri Jun 17 21:04:24 2016 -0700 -- .../ml/regression/AFTSurvivalRegression.scala | 2 +- .../ml/regression/AFTSurvivalRegressionSuite.scala | 17 +

spark git commit: [SPARK-16008][ML] Remove unnecessary serialization in logistic regression

2016-06-17 Thread meng
picked from commit 1f0a46958ef51a01560ada23665dccde89696e12) Signed-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/269b715e Tree: http://git-wip-us.apache.org/repos/asf/spark/

spark git commit: [SPARK-16008][ML] Remove unnecessary serialization in logistic regression

2016-06-17 Thread meng
aster Commit: 1f0a46958ef51a01560ada23665dccde89696e12 Parents: 34d6c4c Author: sethah <seth.hendrickso...@gmail.com> Authored: Fri Jun 17 09:58:49 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Fri Jun 17 09:58:49 2016 -0700 -

spark git commit: [SPARK-15364][ML][PYSPARK] Implement PySpark picklers for ml.Vector and ml.Matrix under spark.ml.python

2016-06-13 Thread meng
bm.com> Closes #13219 from viirya/pyspark-pickler-ml. (cherry picked from commit baa3e633e18c47b12e79fe3ddc01fc8ec010f096) Signed-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/

spark git commit: [SPARK-15364][ML][PYSPARK] Implement PySpark picklers for ml.Vector and ml.Matrix under spark.ml.python

2016-06-13 Thread meng
park/diff/baa3e633 Branch: refs/heads/master Commit: baa3e633e18c47b12e79fe3ddc01fc8ec010f096 Parents: 5827b65 Author: Liang-Chi Hsieh <sim...@tw.ibm.com> Authored: Mon Jun 13 19:59:53 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Mon Jun 1

svn commit: r1745389 - in /spark: research.md site/research.html

2016-05-24 Thread meng
Author: meng Date: Tue May 24 18:30:06 2016 New Revision: 1745389 URL: http://svn.apache.org/viewvc?rev=1745389&view=rev Log: fix typo Modified: spark/research.md spark/site/research.html Modified: spark/research.md URL: http://svn.apache.org/viewvc/spark/research.md?rev=1745389&r1=1745388

svn commit: r1745381 - in /spark: documentation.md site/documentation.html

2016-05-24 Thread meng
Author: meng Date: Tue May 24 17:41:08 2016 New Revision: 1745381 URL: http://svn.apache.org/viewvc?rev=1745381&view=rev Log: list papers only on the research page Modified: spark/documentation.md spark/site/documentation.html Modified: spark/documentation.md URL: http://svn.apache.org

svn commit: r1745380 - in /spark: research.md site/research.html

2016-05-24 Thread meng
Author: meng Date: Tue May 24 17:33:48 2016 New Revision: 1745380 URL: http://svn.apache.org/viewvc?rev=1745380&view=rev Log: add MLlib and SparkR papers Modified: spark/research.md spark/site/research.html Modified: spark/research.md URL: http://svn.apache.org/viewvc/spark/research.md?rev

spark git commit: [SPARK-15222][SPARKR][ML] SparkR ML examples update in 2.0

2016-05-20 Thread meng
glm * spark.survreg * spark.naiveBayes * spark.kmeans ## How was this patch tested? Offline test. Author: Yanbo Liang <yblia...@gmail.com> Closes #13000 from yanboliang/spark-15222. (cherry picked from commit 9a9c6f5c22248c5a891e9d3b788ff12b6b4718b2) Signed-off-by: Xiangrui Meng <m...@da

spark git commit: [SPARK-15222][SPARKR][ML] SparkR ML examples update in 2.0

2016-05-20 Thread meng
May 20 09:30:20 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Fri May 20 09:30:20 2016 -0700 -- examples/src/main/r/ml.R | 129 -- 1 file changed, 112 insert

spark git commit: [SPARK-15339][ML] ML 2.0 QA: Scala APIs and code audit for regression

2016-05-20 Thread meng
//git-wip-us.apache.org/repos/asf/spark/diff/c94b34eb Branch: refs/heads/master Commit: c94b34ebbf4c6ce353c899c571beb34e8db98917 Parents: 5e20350 Author: Yanbo Liang <yblia...@gmail.com> Authored: Thu May 19 23:35:20 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Thu

spark git commit: [SPARK-15339][ML] ML 2.0 QA: Scala APIs and code audit for regression

2016-05-20 Thread meng
sts. Author: Yanbo Liang <yblia...@gmail.com> Closes #13129 from yanboliang/spark-15339. (cherry picked from commit c94b34ebbf4c6ce353c899c571beb34e8db98917) Signed-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-15394][ML][DOCS] User guide typos and grammar audit

2016-05-20 Thread meng
ges only. Note that many of these changes were identified by whomfire01 Author: sethah <seth.hendrickso...@gmail.com> Closes #13180 from sethah/ml_guide_audit. (cherry picked from commit 5e203505f1a092e5849ebd01d9ff9e4fc6cdc34a) Signed-off-by: Xiangrui Meng <m...@databricks.com> Proj

spark git commit: [SPARK-15394][ML][DOCS] User guide typos and grammar audit

2016-05-20 Thread meng
9:37 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Thu May 19 23:29:37 2016 -0700 -- docs/ml-classification-regression.md | 28 +- docs/ml-clustering.md| 2 +- docs/ml-colla

spark git commit: [SPARK-15398][ML] Update the warning message to recommend ML usage

2016-05-20 Thread meng
mit: 47a2940da97caa55bbb8bb8ec1d51c9f6d5041c6 Parents: 4c7a6b3 Author: Zheng RuiFeng <ruife...@foxmail.com> Authored: Thu May 19 23:26:11 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Thu May 19 23:26:11 2016 -0700 --

spark git commit: [SPARK-15398][ML] Update the warning message to recommend ML usage

2016-05-20 Thread meng
ses #13190 from zhengruifeng/update_recd. (cherry picked from commit 47a2940da97caa55bbb8bb8ec1d51c9f6d5041c6) Signed-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/539dfa20 Tree: http://git-wip-us.apache

spark git commit: [SPARK-15363][ML][EXAMPLE] Example code shouldn't use VectorImplicits._, asML/fromML

2016-05-20 Thread meng
wm...@hotmail.com <wm...@hotmail.com> Closes #13213 from wangmiao1981/ml. (cherry picked from commit 4c7a6b385c79f4de07a89495afce4f8e73b06086) Signed-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/

spark git commit: [SPARK-15363][ML][EXAMPLE] Example code shouldn't use VectorImplicits._, asML/fromML

2016-05-20 Thread meng
org/repos/asf/spark/diff/4c7a6b38 Branch: refs/heads/master Commit: 4c7a6b385c79f4de07a89495afce4f8e73b06086 Parents: 09a0051 Author: wm...@hotmail.com <wm...@hotmail.com> Authored: Thu May 19 23:21:17 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Thu May 1

spark git commit: [SPARK-15172][ML] Explicitly tell user initial coefficients is ignored when size mismatch happened in LogisticRegression

2016-05-20 Thread meng
igned-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8fb08777 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8fb08777 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8

spark git commit: Closes #11915 Closes #8648 Closes #13089

2016-05-19 Thread meng
asf/spark/tree/66ec2494 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/66ec2494 Branch: refs/heads/master Commit: 66ec2494938e2f9f4403285713d3cba7a1495cf1 Parents: 01cf649 Author: Xiangrui Meng <m...@databricks.com> Authored: Thu May 19 20:40:17 2016 -0700 Committer: Xiangru

[1/2] spark git commit: [SPARK-15296][MLLIB] Refactor All Java Tests that use SparkSession

2016-05-19 Thread meng
Repository: spark Updated Branches: refs/heads/branch-2.0 7e25131a9 -> 5fa23956b http://git-wip-us.apache.org/repos/asf/spark/blob/5fa23956/mllib/src/test/java/org/apache/spark/mllib/fpm/JavaPrefixSpanSuite.java -- diff --git

[2/2] spark git commit: [SPARK-15296][MLLIB] Refactor All Java Tests that use SparkSession

2016-05-19 Thread meng
Closes #13101 from techaddict/SPARK-15296. (cherry picked from commit 01cf649c4f96f64fb4bd09e0e1811cabcc5ead2e) Signed-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5fa2395

[2/2] spark git commit: [SPARK-15296][MLLIB] Refactor All Java Tests that use SparkSession

2016-05-19 Thread meng
cf649c Branch: refs/heads/master Commit: 01cf649c4f96f64fb4bd09e0e1811cabcc5ead2e Parents: 16ba71a Author: Sandeep Singh <sand...@techaddict.me> Authored: Thu May 19 20:38:44 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Thu May 19 2

[1/2] spark git commit: [SPARK-15296][MLLIB] Refactor All Java Tests that use SparkSession

2016-05-19 Thread meng
Repository: spark Updated Branches: refs/heads/master 16ba71aba -> 01cf649c4 http://git-wip-us.apache.org/repos/asf/spark/blob/01cf649c/mllib/src/test/java/org/apache/spark/mllib/fpm/JavaPrefixSpanSuite.java -- diff --git

spark git commit: [MINOR][ML][PYSPARK] ml.evaluation Scala and Python API sync

2016-05-19 Thread meng
6778 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/66436778 Branch: refs/heads/master Commit: 664367781786df7ec52e39950dccd5a09681602c Parents: f8107c7 Author: Yanbo Liang <yblia...@gmail.com> Authored: Thu May 19 17:56:21 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com>

spark git commit: [MINOR][ML][PYSPARK] ml.evaluation Scala and Python API sync

2016-05-19 Thread meng
nge, no new tests. Author: Yanbo Liang <yblia...@gmail.com> Closes #13195 from yanboliang/evaluation-doc. (cherry picked from commit 664367781786df7ec52e39950dccd5a09681602c) Signed-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/re

spark git commit: [SPARK-15341][DOC][ML] Add documentation for "model.write" to clarify "summary" was not saved

2016-05-19 Thread meng
nch: refs/heads/master Commit: f8107c7846c9fcabbe2579867574305c7f2028e7 Parents: dcf407d Author: Yanbo Liang <yblia...@gmail.com> Authored: Thu May 19 17:54:18 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com>

spark git commit: [SPARK-15341][DOC][ML] Add documentation for "model.write" to clarify "summary" was not saved

2016-05-19 Thread meng
#13131 from yanboliang/spark-15341. (cherry picked from commit f8107c7846c9fcabbe2579867574305c7f2028e7) Signed-off-by: Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b0aff55d Tree:

spark git commit: [SPARK-15414][MLLIB] Make the mllib, ml linalg type conversion APIs public

2016-05-19 Thread meng
echaddict.me> Authored: Thu May 19 17:24:42 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Thu May 19 17:24:42 2016 -0700 -- .../apache/spark/mllib/linalg/Matrices.scala| 30 ++--

spark git commit: [SPARK-15414][MLLIB] Make the mllib, ml linalg type conversion APIs public

2016-05-19 Thread meng
Xiangrui Meng <m...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/758253f7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/758253f7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/758253f

[3/4] spark git commit: [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based algorithms

2016-05-17 Thread meng
http://git-wip-us.apache.org/repos/asf/spark/blob/ff1cfce1/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala

[4/4] spark git commit: [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based algorithms

2016-05-17 Thread meng
was this patch tested? Unit tests Author: DB Tsai <d...@netflix.com> Author: Liang-Chi Hsieh <sim...@tw.ibm.com> Author: Xiangrui Meng <m...@databricks.com> Closes #12627 from dbtsai/SPARK-14615-NewML. (cherry picked from commit e2efe0529acd748f26dbaa41331d1733ed256237) Signed-off-by

[2/4] spark git commit: [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based algorithms

2016-05-17 Thread meng
http://git-wip-us.apache.org/repos/asf/spark/blob/ff1cfce1/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala -- diff --git

[3/4] spark git commit: [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based algorithms

2016-05-17 Thread meng
http://git-wip-us.apache.org/repos/asf/spark/blob/e2efe052/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala

[1/4] spark git commit: [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based algorithms

2016-05-17 Thread meng
Repository: spark Updated Branches: refs/heads/master 9f176dd39 -> e2efe0529 http://git-wip-us.apache.org/repos/asf/spark/blob/e2efe052/mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala -- diff --git

[4/4] spark git commit: [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based algorithms

2016-05-17 Thread meng
was this patch tested? Unit tests Author: DB Tsai <d...@netflix.com> Author: Liang-Chi Hsieh <sim...@tw.ibm.com> Author: Xiangrui Meng <m...@databricks.com> Closes #12627 from dbtsai/SPARK-14615-NewML. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.ap

[2/4] spark git commit: [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based algorithms

2016-05-17 Thread meng
http://git-wip-us.apache.org/repos/asf/spark/blob/e2efe052/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala -- diff --git

spark git commit: [SPARK-14906][ML] Copy linalg in PySpark to new ML package

2016-05-17 Thread meng
How was this patch tested? Existing tests. Author: Xiangrui Meng <m...@databricks.com> Author: Liang-Chi Hsieh <sim...@tw.ibm.com> Author: Liang-Chi Hsieh <vii...@gmail.com> Closes #13099 from viirya/move-pyspark-vector-matrix-udt4. (cherry picked from commit 8ad9f08c94e98317a90

spark git commit: [SPARK-15268][SQL] Make JavaTypeInference work with UDTRegistration

2016-05-11 Thread meng
.0 Commit: 0858a82c141fe9b2d2c94a62c16657dcd6c3ec8b Parents: 749c29b Author: Liang-Chi Hsieh <sim...@tw.ibm.com> Authored: Wed May 11 09:31:22 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Wed May 1

spark git commit: [SPARK-14050][ML] Add multiple languages support and additional methods for Stop Words Remover

2016-05-06 Thread meng
ing changes: * load English stopwords as default * convert stopwords to list in Python * update some tests and doc ## How was this patch tested? Unit tests. Closes #11871 cc: burakkose srowen Author: Burak Köse <burakk...@gmail.com> Author: Xiangrui Meng <m...@databricks.com> Author: Bur

spark git commit: [SPARK-14050][ML] Add multiple languages support and additional methods for Stop Words Remover

2016-05-06 Thread meng
ges: * load English stopwords as default * convert stopwords to list in Python * update some tests and doc ## How was this patch tested? Unit tests. Closes #11871 cc: burakkose srowen Author: Burak Köse <burakk...@gmail.com> Author: Xiangrui Meng <m...@databricks.com> Author: Bur

spark git commit: [SPARK-6717][ML] Clear shuffle files after checkpointing in ALS

2016-05-03 Thread meng
iff: http://git-wip-us.apache.org/repos/asf/spark/diff/27efd92e Branch: refs/heads/branch-2.0 Commit: 27efd92e3683f88233ebe755855dac337069246f Parents: 5230810 Author: Holden Karau <hol...@us.ibm.com> Authored: Tue May 3 00:18:10 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com>

spark git commit: [SPARK-6717][ML] Clear shuffle files after checkpointing in ALS

2016-05-03 Thread meng
iff: http://git-wip-us.apache.org/repos/asf/spark/diff/f10ae4b1 Branch: refs/heads/master Commit: f10ae4b1e169495af11b8e8123c60dd96174477e Parents: d8f528c Author: Holden Karau <hol...@us.ibm.com> Authored: Tue May 3 00:18:10 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed

spark git commit: [SPARK-15030][ML][SPARKR] Support formula in spark.kmeans in SparkR

2016-04-30 Thread meng
Authored: Sat Apr 30 08:37:56 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Sat Apr 30 08:37:56 2016 -0700 -- R/pkg/R/generics.R | 2 +- R/pkg/R/mllib.R

spark git commit: [SPARK-14653][ML] Remove json4s from mllib-local

2016-04-30 Thread meng
ncy minimal. The json encoding is used by Params. So we still need this feature in SPARK-14615, where we will switch to ml.linalg in spark.ml APIs. ## How was this patch tested? Copied existing unit tests over. cc: dbtsai Author: Xiangrui Meng <m...@databricks.com> Closes #12802 from men

spark git commit: [SPARK-14831][.2][ML][R] rename ml.save/ml.load to write.ml/read.ml

2016-04-30 Thread meng
ore consistent with read.df/write.df and other methods in SparkR. I didn't rename `data` to `df` because we still use `predict` for prediction, which uses `newData` to match the signature in R. ## How was this patch tested? Existing unit tests. cc: yanboliang thunterdb Author: Xiangrui Meng

spark git commit: [SPARK-14412][.2][ML] rename *RDDStorageLevel to *StorageLevel in ml.ALS

2016-04-30 Thread meng
vel -> intermediateStorageLevel * finalRDDStorageLevel -> finalStorageLevel The argument name in `ALS.train` will be addressed in SPARK-15027. ## How was this patch tested? Existing unit tests. Author: Xiangrui Meng <m...@databricks.com> Closes #12803 from mengxr/SPARK-14412. Project: http://git-wi

spark git commit: [SPARK-14533][MLLIB] RowMatrix.computeCovariance inaccurate when values are very large (partial fix)

2016-04-30 Thread meng
nch: refs/heads/master Commit: 5886b6217b7ac783ec605e38f5d960048d448976 Parents: f86f717 Author: Sean Owen <so...@cloudera.com> Authored: Sat Apr 30 00:15:41 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Sat Apr 3

spark git commit: [SPARK-14850][.2][ML] use UnsafeArrayData.fromPrimitiveArray in ml.VectorUDT/MatrixUDT

2016-04-30 Thread meng
ent `ml.VectorUDT/MatrixUDT` to avoid boxing/unboxing. ## How was this patch tested? Existing unit tests. cc: cloud-fan Author: Xiangrui Meng <m...@databricks.com> Closes #12805 from mengxr/SPARK-14850. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/

spark git commit: [SPARK-14831][SPARKR] Make the SparkR MLlib API more consistent with Spark

2016-04-30 Thread meng
Authored: Fri Apr 29 23:13:03 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Fri Apr 29 23:13:03 2016 -0700 -- R/pkg/NAMESPACE | 7 +- R/pkg/R/generics.R

spark git commit: [SPARK-14850][ML] convert primitive array from/to unsafe array directly in VectorUDT/MatrixUDT

2016-04-30 Thread meng
: 4bac703 Author: Wenchen Fan <wenc...@databricks.com> Authored: Fri Apr 29 23:04:51 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Fri Apr 29 23:04:51 2016 -0700 -- .../apache/spark/mllib/linalg/M

spark git commit: [SPARK-14412][ML][PYSPARK] Add StorageLevel params to ALS

2016-04-29 Thread meng
er Commit: 90fa2c6e7f4893af51e0cfb6dc162b828ea55995 Parents: d7755cf Author: Nick Pentreath <ni...@za.ibm.com> Authored: Fri Apr 29 22:01:41 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Fri Apr 29 22:01:41 2016 -0700 --

spark git commit: [SPARK-13786][ML][PYTHON] Removed save/load for python tuning

2016-04-29 Thread meng
-0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Fri Apr 29 20:51:24 2016 -0700 -- python/pyspark/ml/tests.py | 39 +++ python/pyspark/ml/tuning.py | 244 +--

spark git commit: [SPARK-14314][SPARK-14315][ML][SPARKR] Model persistence in SparkR (glm & kmeans)

2016-04-29 Thread meng
/tree/87ac84d4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/87ac84d4 Branch: refs/heads/master Commit: 87ac84d43729c54be100bb9ad7dc6e8fa14b8805 Parents: a7d0fed Author: Yanbo Liang <yblia...@gmail.com> Authored: Fri Apr 29 09:42:54 2016 -0700 Committer: Xiangrui Meng <m...@da

spark git commit: [SPARK-7264][ML] Parallel lapply for sparkR

2016-04-28 Thread meng
Commit: 769a909d1357766a441ff69e6e98c22c51b12c93 Parents: 4607f6e Author: Timothy Hunter <timhun...@databricks.com> Authored: Thu Apr 28 22:42:48 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Thu Apr 28 22:42:48 2016 -0700 ---

spark git commit: [SPARK-14487][SQL] User Defined Type registration without SQLUserDefinedType annotation

2016-04-28 Thread meng
-0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Thu Apr 28 01:14:49 2016 -0700 -- .../org/apache/spark/ml/linalg/MatrixUDT.scala | 111 +++ .../org/apache/spark/ml/linalg/V

spark git commit: [SPARK-14313][ML][SPARKR] AFTSurvivalRegression model persistence in SparkR

2016-04-26 Thread meng
park/tree/92f66331 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/92f66331 Branch: refs/heads/master Commit: 92f66331b4ba3634f54f57ddb5e7962b14aa4ca1 Parents: 162cf02 Author: Yanbo Liang <yblia...@gmail.com> Authored: Tue Apr 26 10:30:24 2016 -0700 Committer: Xiangrui Meng <m

spark git commit: [SPARK-14312][ML][SPARKR] NaiveBayes model persistence in SparkR

2016-04-25 Thread meng
eads/master Commit: 9cb3ba1013a7eae11be8a00fa4a9c5308bb20195 Parents: 0c47e27 Author: Yanbo Liang <yblia...@gmail.com> Authored: Mon Apr 25 14:08:41 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Mon Apr 25 14:08:41 2016 -0700 -

spark git commit: [SPARK-14479][ML] GLM supports output link prediction

2016-04-21 Thread meng
asf/spark/diff/4e726227 Branch: refs/heads/master Commit: 4e726227a3e68c776ea30b78b7db8d01d00b44d6 Parents: f25a3ea Author: Yanbo Liang <yblia...@gmail.com> Authored: Thu Apr 21 17:31:33 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Thu Apr 2

spark git commit: [SPARK-14299][EXAMPLES] Remove duplications for scala.examples.ml

2016-04-18 Thread meng
http://git-wip-us.apache.org/repos/asf/spark/diff/8c62edb7 Branch: refs/heads/master Commit: 8c62edb70fdeedf0ca5a7fc154698aea96184cc6 Parents: f31a62d Author: Xusen Yin <yinxu...@gmail.com> Authored: Mon Apr 18 13:34:36 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committe

spark git commit: [SPARK-14440][PYSPARK] Remove pipeline specific reader and writer

2016-04-18 Thread meng
..@gmail.com> Authored: Mon Apr 18 13:31:48 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Mon Apr 18 13:31:48 2016 -0700 -- python/pyspark/ml/pipeline.py | 53 +---

spark git commit: [SPARK-13925][ML][SPARKR] Expose R-like summary statistics in SparkR::glm for more family and link functions

2016-04-15 Thread meng
ache.org/repos/asf/spark/tree/83af297a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/83af297a Branch: refs/heads/master Commit: 83af297ac42546580983f91079f74e3a4cf25050 Parents: 06b9d62 Author: Yanbo Liang <yblia...@gmail.com> Authored: Fri Apr 15 08:23:51 2016 -0700 Committer: Xian

[1/2] spark git commit: [SPARK-14549][ML] Copy the Vector and Matrix classes from mllib to ml in mllib-local

2016-04-15 Thread meng
Repository: spark Updated Branches: refs/heads/master a9324a06e -> 96534aa47 http://git-wip-us.apache.org/repos/asf/spark/blob/96534aa4/mllib-local/src/test/scala/org/apache/spark/ml/SparkMLFunSuite.scala -- diff --git

[2/2] spark git commit: [SPARK-14549][ML] Copy the Vector and Matrix classes from mllib to ml in mllib-local

2016-04-15 Thread meng
ip-us.apache.org/repos/asf/spark/tree/96534aa4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/96534aa4 Branch: refs/heads/master Commit: 96534aa47c39e0ec40bc38be566455d11e21adb2 Parents: a9324a0 Author: DB Tsai <d...@netflix.com> Authored: Fri Apr 15 01:17:03 2016 -0700 Committer: Xi

spark git commit: [SPARK-14374][ML][PYSPARK] PySpark ml GBTClassifier, Regressor support export/import

2016-04-14 Thread meng
repos/asf/spark/tree/b9613239 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b9613239 Branch: refs/heads/master Commit: b9613239d303bc0f451233852c1eb1219a69875e Parents: 297ba3f Author: Yanbo Liang <yblia...@gmail.com> Authored: Thu Apr 14 21:36:03 2016 -0700 Committer: X

spark git commit: [SPARK-12869] Implemented an improved version of the toIndexedRowMatrix

2016-04-14 Thread meng
/asf/spark/tree/c80586d9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c80586d9 Branch: refs/heads/master Commit: c80586d9e820d19fc328b3e4c6f1c1439f5583a7 Parents: 01dd1f5 Author: Fokko Driesprong <f.driespr...@catawiki.nl> Authored: Thu Apr 14 17:32:20 2016 -0700 Commit

spark git commit: [SPARK-14565][ML] RandomForest should use parseInt and parseDouble for feature subset size instead of regexes

2016-04-14 Thread meng
028475cd3 Parents: d7e124e Author: Yong Tang <yong.tang.git...@outlook.com> Authored: Thu Apr 14 17:23:16 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Thu Apr 14 17:23:16 2016 -0700 -- .

spark git commit: Revert "[SPARK-14154][MLLIB] Simplify the implementation for Kolmogorov–Smirnov test"

2016-04-13 Thread meng
d11e40 Author: Xiangrui Meng <m...@databricks.com> Authored: Wed Apr 13 09:17:46 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Wed Apr 13 09:17:46 2016 -0700 -- .../mllib/stat/test/Kolmogo

spark git commit: [SPARK-14147][ML][SPARKR] SparkR predict should not output feature column

2016-04-12 Thread meng
4:40 2016 -0700 Committer: Xiangrui Meng <m...@databricks.com> Committed: Tue Apr 12 11:34:40 2016 -0700 -- .../org/apache/spark/ml/r/AFTSurvivalRegressionWrapper.scala | 2 +- .../src/main/scala/org/apache/spark/ml/r/NaiveB

spark git commit: [SPARK-14563][ML] use a random table name instead of __THIS__ in SQLTransformer

2016-04-12 Thread meng
est for `transformSchema`. The problems of using `__THIS__` are: * It doesn't work under HiveContext (in Spark 1.6) * Race conditions ## How was this patch tested? * Manual test with HiveContext. * Added a unit test for `transformSchema` to improve coverage. cc: yhuai Author: Xiangrui Meng

spark git commit: [SPARK-14563][ML] use a random table name instead of __THIS__ in SQLTransformer

2016-04-12 Thread meng
est for `transformSchema`. The problems of using `__THIS__` are: * It doesn't work under HiveContext (in Spark 1.6) * Race conditions ## How was this patch tested? * Manual test with HiveContext. * Added a unit test for `transformSchema` to improve coverage. cc: yhuai Author: Xiangrui Meng
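
The commit message above names two problems with a fixed `__THIS__` table name: it does not work under HiveContext in Spark 1.6, and it allows race conditions. A minimal sketch of the general idea follows, in plain Python without Spark. It is not the code from the patch; `random_table_name`, `bind_statement`, and the `sql_transformer_` prefix are hypothetical names chosen for illustration. Each call substitutes a freshly generated table name for the placeholder, so two concurrent callers never share one name.

```python
import uuid

# Placeholder that users write in their SQL statement.
PLACEHOLDER = "__THIS__"


def random_table_name(prefix="sql_transformer_"):
    """Return a table name that is unique per call (hypothetical helper)."""
    # uuid4().hex is 32 hexadecimal characters, safe to use in an identifier.
    return prefix + uuid.uuid4().hex


def bind_statement(statement, table_name=None):
    """Replace the placeholder with a concrete table name.

    Returns the chosen name together with the rewritten statement, so the
    caller can register the input under that name and drop it afterwards.
    """
    name = table_name if table_name is not None else random_table_name()
    return name, statement.replace(PLACEHOLDER, name)


if __name__ == "__main__":
    name, sql = bind_statement("SELECT *, (v1 + v2) AS v3 FROM __THIS__")
    print(name)
    print(sql)
```

In a real transformer the caller would register the input data under the returned name, run the rewritten statement, and then remove the temporary table; those steps are left out here because they depend on the Spark version in use.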
