spark git commit: [SPARK-24785][SHELL] Making sure REPL prints Spark UI info and then Welcome message

2018-08-22 Thread dbtsai
e variable is private, so reflection has to be used which is not desirable. We can use this PR to brainstorm how to handle it properly and how Scala can change their APIs to fit our need. ## How was this patch tested? Existing test Closes #21749 from dbtsai/repl-followup. Authored-by: DB Tsa

spark git commit: [SPARK-23042][ML] Use OneHotEncoderModel to encode labels in MultilayerPerceptronClassifier

2018-08-17 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 162326c0e -> 8b0e94d89 [SPARK-23042][ML] Use OneHotEncoderModel to encode labels in MultilayerPerceptronClassifier ## What changes were proposed in this pull request? In MultilayerPerceptronClassifier, we use RDD operation to encode label

spark git commit: [SPARK-25115][CORE] Eliminate extra memory copy done when a ByteBuf is used that is backed by > 1 ByteBuffer.

2018-08-14 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 3c614d056 -> 92fd7f321 [SPARK-25115][CORE] Eliminate extra memory copy done when a ByteBuf is used that is backed by > 1 ByteBuffer. ## What changes were proposed in this pull request? Check how many ByteBuffer ar

spark git commit: [SPARK-22974][ML] Attach attributes to output column of CountVectorModel

2018-08-13 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master ab197308a -> 3eb52092b [SPARK-22974][ML] Attach attributes to output column of CountVectorModel ## What changes were proposed in this pull request? The output column from `CountVectorModel` lacks attribute. So a later transformer like `In

spark git commit: [SPARK-25104][SQL] Avro: Validate user specified output schema

2018-08-13 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master c220cc42a -> ab197308a [SPARK-25104][SQL] Avro: Validate user specified output schema ## What changes were proposed in this pull request? With code changes in https://github.com/apache/spark/pull/21847 , Spark can write out to Avro file a

spark git commit: [SPARK-24420][BUILD][FOLLOW-UP] Upgrade ASM6 APIs

2018-08-12 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 5d6abad36 -> a9928277d [SPARK-24420][BUILD][FOLLOW-UP] Upgrade ASM6 APIs ## What changes were proposed in this pull request? Use ASM 6 APIs now that we have upgraded to ASM 6. ## How was this patch tested? N/A Closes #22082 from gatorsmile/asm

spark git commit: [SPARK-24855][SQL][EXTERNAL] Built-in AVRO support should support specified schema on write

2018-08-09 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master bdd27961c -> 0cea9e3cd [SPARK-24855][SQL][EXTERNAL] Built-in AVRO support should support specified schema on write ## What changes were proposed in this pull request? Allows `avroSchema` option to be specified on write, allowing a user to

spark git commit: [SPARK-24993][SQL] Make Avro Fast Again

2018-08-03 Thread dbtsai
) = (t2 - t1) / 1000.0 i += 1 } spark.sparkContext.parallelize(writeTimes.slice(50, 150)).toDF("writeTimes").describe("writeTimes").show() spark.sparkContext.parallelize(readTimes.slice(50, 150)).toDF("readTimes").describe("readTimes").show() ``` #

spark git commit: [SPARK-25009][CORE] Standalone Cluster mode application submit is not working

2018-08-03 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master ebf33a333 -> 53ca9755d [SPARK-25009][CORE] Standalone Cluster mode application submit is not working ## What changes were proposed in this pull request? It seems 'doRunMain()' has been removed accidentally by other PR and due to that the

spark git commit: [SPARK-24908][R][STYLE] removing spaces to make lintr happy

2018-07-24 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master fc21f192a -> 3efdf3532 [SPARK-24908][R][STYLE] removing spaces to make lintr happy ## What changes were proposed in this pull request? during my travails in porting spark builds to run on our centos worker, i managed to recreate (as best

spark git commit: [SPARK-24402][SQL] Optimize `In` expression when only one element in the collection or collection is empty

2018-07-17 Thread dbtsai
) null else false)`** can be optimized to false. ## How was this patch tested? Couple new tests are added. Author: DB Tsai Closes #21797 from dbtsai/optimize-in. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/681845fd Tree: ht
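The rewrite named in the title can be illustrated outside Spark. The sketch below is plain Python, not Spark's optimizer code: it models SQL NULL as `None`, and the helper names `sql_eq`, `sql_in`, and `filter_keeps` are hypothetical. Under those assumptions it shows that `IN` over a one-element collection agrees with an equality test, and that `IN` over an empty collection yields NULL for a NULL input and false otherwise (the `null else false` form quoted in the snippet), which a filter treats the same as false.

```python
from typing import Optional, Sequence


def sql_eq(a, b) -> Optional[bool]:
    """Three-valued equality: None stands in for SQL NULL."""
    if a is None or b is None:
        return None
    return a == b


def sql_in(value, items: Sequence) -> Optional[bool]:
    """Three-valued IN: returns True, False, or None (NULL)."""
    if value is None:
        return None
    saw_null = False
    for item in items:
        if item is None:
            saw_null = True
        elif item == value:
            return True
    # No match: the result is NULL if a NULL item was seen, else False.
    return None if saw_null else False


def filter_keeps(predicate_result: Optional[bool]) -> bool:
    """A filter keeps a row only when the predicate is exactly True."""
    return predicate_result is True


# One element: IN agrees with equality for every combination tried here.
for a in (None, 1, 2):
    for b in (None, 1):
        assert sql_in(a, [b]) == sql_eq(a, b)

# Empty collection: NULL input gives NULL, anything else gives False,
# and a filter drops the row in both cases.
assert sql_in(None, []) is None
assert sql_in(3, []) is False
assert not filter_keeps(sql_in(None, []))
```

Because a filter discards both NULL and false results, the empty-collection case can be replaced by a constant false in that position without changing which rows survive.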

spark git commit: [SPARK-24420][BUILD] Upgrade ASM to 6.1 to support JDK9+

2018-07-03 Thread dbtsai
sai Closes #21459 from dbtsai/asm. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5585c576 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5585c576 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5585c576 Bra

spark git commit: [SPARK-24412][SQL] Adding docs about automagical type casting in `isin` and `isInCollection` APIs

2018-06-08 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master f433ef786 -> 36a340913 [SPARK-24412][SQL] Adding docs about automagical type casting in `isin` and `isInCollection` APIs ## What changes were proposed in this pull request? Update documentation for `isInCollection` API to clearly explain th

spark git commit: [SPARK-24419][BUILD] Upgrade SBT to 0.13.17 with Scala 2.10.7 for JDK9+

2018-05-30 Thread dbtsai
ing tests Author: DB Tsai Closes #21458 from dbtsai/sbt. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9e7bad0e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9e7bad0e Diff: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-24371][SQL] Added isInCollection in DataFrame API for Scala and Java.

2018-05-29 Thread dbtsai
;"".stripMargin ``` ## How was this patch tested? Several unit tests are added. Author: DB Tsai Closes #21416 from dbtsai/optimize-set. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/900bc1f7 Tree: http:/

spark git commit: [SPARK-24181][SQL] Better error message for writing sorted data

2018-05-09 Thread dbtsai
nstead. ## How was this patch tested? More tests in `DataFrameReaderWriterSuite.scala` Author: DB Tsai Closes #21235 from dbtsai/fixException. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6ea582e3 Tree: http://git-wip-u

spark git commit: [SPARK-11237][ML] Add pmml export for k-means in Spark ML

2018-04-23 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 770add81c -> e82cb6834 [SPARK-11237][ML] Add pmml export for k-means in Spark ML ## What changes were proposed in this pull request? Adding PMML export to Spark ML's KMeans Model. ## How was this patch tested? New unit test for Spark ML

spark git commit: [Spark-24024][ML] Fix poisson deviance calculations in GLM to handle y = 0

2018-04-23 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master afbdf4273 -> 293a0f29e [Spark-24024][ML] Fix poisson deviance calculations in GLM to handle y = 0 ## What changes were proposed in this pull request? It is reported by Spark users that the deviance calculation for poisson regression does
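For context on why y = 0 needs special handling: the standard Poisson unit deviance is 2 * (y * log(y / mu) - (y - mu)), and the term y * log(y / mu) is conventionally taken as its limit, 0, when y = 0. The Python sketch below illustrates that convention only. It is not the code from the patch, and the function name is hypothetical.

```python
import math


def poisson_unit_deviance(y: float, mu: float) -> float:
    """Poisson unit deviance for one observation y with fitted mean mu > 0."""
    # y * log(y / mu) tends to 0 as y tends to 0, so use 0 when y == 0
    # instead of evaluating log(0).
    y_log_term = 0.0 if y == 0 else y * math.log(y / mu)
    return 2.0 * (y_log_term - (y - mu))


print(poisson_unit_deviance(0.0, 2.0))  # 4.0
print(poisson_unit_deviance(3.0, 3.0))  # 0.0
```

With y = 0 the deviance reduces to 2 * mu, which is finite for any positive fitted mean.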

spark-website git commit: Update committer page

2018-04-13 Thread dbtsai
Repository: spark-website Updated Branches: refs/heads/asf-site 658467248 -> 69b595481 Update committer page Author: DB Tsai Closes #113 from dbtsai/changeAffiliation. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/sp

spark git commit: [SPARK-20047][ML] Constrained Logistic Regression

2017-04-27 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.2 c29c6dead -> 4512e2ae6 [SPARK-20047][ML] Constrained Logistic Regression ## What changes were proposed in this pull request? MLlib ```LogisticRegression``` should support bound constrained optimization (only for L2 regularization). Use

spark git commit: [SPARK-20047][ML] Constrained Logistic Regression

2017-04-27 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 039e32ca1 -> 606432a13 [SPARK-20047][ML] Constrained Logistic Regression ## What changes were proposed in this pull request? MLlib ```LogisticRegression``` should support bound constrained optimization (only for L2 regularization). Users c

spark git commit: [SPARK-20483][MINOR] Test for Mesos Coarse mode may starve other Mesos frameworks

2017-04-27 Thread dbtsai
of spark.cores.max This tests the change in #17786 ## How was this patch tested? Ran the existing test suite with the new tests dbtsai Author: Davis Shepherd Closes #17788 from dgshep/add_mesos_test. (cherry picked from commit 039e32ca19d113e3be2c09171c7c921698be7ab8) Signed-off-by: DB T

spark git commit: [SPARK-20483][MINOR] Test for Mesos Coarse mode may starve other Mesos frameworks

2017-04-27 Thread dbtsai
of spark.cores.max This tests the change in #17786 ## How was this patch tested? Ran the existing test suite with the new tests dbtsai Author: Davis Shepherd Closes #17788 from dgshep/add_mesos_test. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.

spark git commit: [SPARK-20483] Mesos Coarse mode may starve other Mesos frameworks

2017-04-27 Thread dbtsai
cur. dbtsai mgummelt Author: Davis Shepherd Closes #17786 from dgshep/fix_mesos_max_cores. (cherry picked from commit 7633933e54ffb08ab9d959be5f76c26fae29d1d9) Signed-off-by: DB Tsai Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/sp

spark git commit: [SPARK-20483] Mesos Coarse mode may starve other Mesos frameworks

2017-04-27 Thread dbtsai
cur. dbtsai mgummelt Author: Davis Shepherd Closes #17786 from dgshep/fix_mesos_max_cores. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7633933e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7633933e Diff: http://

spark git commit: [SPARK-20449][ML] Upgrade breeze version to 0.13.1

2017-04-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.2 e2591c6d7 -> 55834a898 [SPARK-20449][ML] Upgrade breeze version to 0.13.1 ## What changes were proposed in this pull request? Upgrade breeze version to 0.13.1, which fixed some critical bugs of L-BFGS-B. ## How was this patch tested? E

spark git commit: [SPARK-20449][ML] Upgrade breeze version to 0.13.1

2017-04-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 387565cf1 -> 67eef47ac [SPARK-20449][ML] Upgrade breeze version to 0.13.1 ## What changes were proposed in this pull request? Upgrade breeze version to 0.13.1, which fixed some critical bugs of L-BFGS-B. ## How was this patch tested? Exist

spark git commit: [SPARK-20423][ML] fix MLOR coeffs centering when reg == 0

2017-04-21 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.2 adaa3f7e0 -> ff1f989f2 [SPARK-20423][ML] fix MLOR coeffs centering when reg == 0 ## What changes were proposed in this pull request? When reg == 0, MLOR has multiple solutions and we need to centralize the coeffs to get identical resu

spark git commit: [SPARK-20423][ML] fix MLOR coeffs centering when reg == 0

2017-04-21 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master a750a5959 -> eb00378f0 [SPARK-20423][ML] fix MLOR coeffs centering when reg == 0 ## What changes were proposed in this pull request? When reg == 0, MLOR has multiple solutions and we need to centralize the coeffs to get identical result.
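The "multiple solutions" remark can be made concrete. In a softmax model, adding the same constant to every class's coefficient for a given feature leaves the predicted probabilities unchanged, so without regularization the coefficients are determined only up to such shifts; subtracting the per-feature mean across classes picks one representative. The Python sketch below demonstrates that invariance with made-up numbers. It is an illustration under those assumptions, not the patch's implementation, and all names are hypothetical.

```python
import math


def softmax(margins):
    """Numerically stable softmax over a list of class margins."""
    top = max(margins)
    exps = [math.exp(m - top) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]


def class_margins(coef_rows, x):
    """One margin per class: dot product of that class's coefficients with x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in coef_rows]


def center_across_classes(coef_rows):
    """Subtract, for each feature, the mean coefficient over all classes."""
    n_classes = len(coef_rows)
    n_features = len(coef_rows[0])
    means = [sum(row[j] for row in coef_rows) / n_classes for j in range(n_features)]
    return [[row[j] - means[j] for j in range(n_features)] for row in coef_rows]


# Hypothetical 3-class, 2-feature coefficients and one input point.
coefs = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
x = [0.7, -1.3]

centered = center_across_classes(coefs)
before = softmax(class_margins(coefs, x))
after = softmax(class_margins(centered, x))

# Probabilities match, and each feature's coefficients now sum to zero.
assert all(math.isclose(p, q, abs_tol=1e-12) for p, q in zip(before, after))
assert all(abs(sum(row[j] for row in centered)) < 1e-12 for j in range(2))
```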

spark git commit: [SPARK-20291][SQL] NaNvl(FloatType, NullType) should not be cast to NaNvl(DoubleType, DoubleType)

2017-04-11 Thread dbtsai
sts. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: DB Tsai Closes #17606 from dbtsai/fixNaNvl. (cherry picked from commit 8ad63ee158815de57bf03cdf25aef312095f) Signed-off-by: DB Tsai Project: http://git-wip-us.apache.org/repos/asf/spark/r

spark git commit: [SPARK-18555][MINOR][SQL] Fix the @since tag when backporting from 2.2 branch into 2.1 branch

2017-04-10 Thread dbtsai
555) from 2.2 branch into 2.1 branch. ## How was this patch tested? N/A Please review http://spark.apache.org/contributing.html before opening a pull request. Author: DB Tsai Closes #17600 from dbtsai/branch-2.1. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-

spark git commit: [MINOR][SQL] Fix the @since tag when backporting SPARK-18555 from 2.2 branch into 2.0 branch

2017-04-10 Thread dbtsai
555) from 2.2 branch into 2.0 branch. ## How was this patch tested? N/A Please review http://spark.apache.org/contributing.html before opening a pull request. Author: DB Tsai Closes #17601 from dbtsai/branch-2.0. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-

spark git commit: [SPARK-20270][SQL] na.fill should not change the values in long or integer when the default value is in double

2017-04-10 Thread dbtsai
ich behaves correctly without changing the original Long values and also avoids extra cost of unnecessary casting. ## How was this patch tested? unit test added. +cc srowen rxin cloud-fan gatorsmile Thanks. Author: DB Tsai Closes #17577 from dbtsai/fixnafill. (cherry picked from commit 1a

spark git commit: [SPARK-20270][SQL] na.fill should not change the values in long or integer when the default value is in double

2017-04-10 Thread dbtsai
ich behaves correctly without changing the original Long values and also avoids extra cost of unnecessary casting. ## How was this patch tested? unit test added. +cc srowen rxin cloud-fan gatorsmile Thanks. Author: DB Tsai Closes #17577 from dbtsai/fixnafill. (cherry picked from commit 1a

spark git commit: [SPARK-18555][SQL] DataFrameNaFunctions.fill miss up original values in long integers

2017-04-10 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.0 87be9652b -> 735e2039a [SPARK-18555][SQL] DataFrameNaFunctions.fill miss up original values in long integers ## What changes were proposed in this pull request? DataSet.na.fill(0) used on a DataSet which has a long value column, it

spark git commit: [SPARK-18555][SQL] DataFrameNaFunctions.fill miss up original values in long integers

2017-04-10 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.1 489c1f357 -> b26f2c2c6 [SPARK-18555][SQL] DataFrameNaFunctions.fill miss up original values in long integers ## What changes were proposed in this pull request? DataSet.na.fill(0) used on a DataSet which has a long value column, it

spark git commit: [SPARK-20270][SQL] na.fill should not change the values in long or integer when the default value is in double

2017-04-09 Thread dbtsai
behaves correctly without changing the original Long values and also avoids extra cost of unnecessary casting. ## How was this patch tested? unit test added. +cc srowen rxin cloud-fan gatorsmile Thanks. Author: DB Tsai Closes #17577 from dbtsai/fixnafill. Project: http://git-wip-us.apache
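A short illustration of how a fill can alter long values when everything is routed through a double: a 64-bit float cannot represent every 64-bit integer, so a round trip through it can change large values. The Python sketch below assumes that failure mode for the sake of illustration. It does not reproduce Spark's code, and both function names are hypothetical.

```python
def fill_via_double(values, default: float):
    """Lossy approach: convert every value to a 64-bit float and back to int."""
    return [int(float(default if v is None else v)) for v in values]


def fill_preserving_type(values, default: float):
    """Convert only the default to int; existing values are left untouched."""
    fill = int(default)
    return [fill if v is None else v for v in values]


big = 2**53 + 1  # smallest positive integer a 64-bit float cannot represent
column = [big, None]

print(fill_via_double(column, 0.0))       # the first value has changed
print(fill_preserving_type(column, 0.0))  # the first value is intact
```

The second function also skips the per-value conversion entirely, which matches the snippet's point about avoiding unnecessary casting.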

spark git commit: [SPARK-17137][ML][WIP] Compress logistic regression coefficients

2017-03-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master e8ddb91c7 -> be85245a9 [SPARK-17137][ML][WIP] Compress logistic regression coefficients ## What changes were proposed in this pull request? Use the new `compressed` method on matrices to store the logistic regression coefficients as spars

spark git commit: [SPARK-17471][ML] Add compressed method to ML matrices

2017-03-24 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 707e50183 -> e8810b73c [SPARK-17471][ML] Add compressed method to ML matrices ## What changes were proposed in this pull request? This patch adds a `compressed` method to ML `Matrix` class, which returns the minimal storage representation

spark git commit: [SPARK-19746][ML] Faster indexing for logistic aggregator

2017-02-27 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 8a5a58506 -> 16d8472f7 [SPARK-19746][ML] Faster indexing for logistic aggregator ## What changes were proposed in this pull request? JIRA: [SPARK-19746](https://issues.apache.org/jira/browse/SPARK-19746) The following code is inefficient:

spark git commit: [SPARK-18456][ML][FOLLOWUP] Use matrix abstraction for coefficients in LogisticRegression training

2016-11-19 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.1 15ad3a319 -> 15eb86c29 [SPARK-18456][ML][FOLLOWUP] Use matrix abstraction for coefficients in LogisticRegression training ## What changes were proposed in this pull request? This is a follow up to some of the discussion [here](https:

spark git commit: [SPARK-18456][ML][FOLLOWUP] Use matrix abstraction for coefficients in LogisticRegression training

2016-11-19 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master ea77c81ec -> 856e00420 [SPARK-18456][ML][FOLLOWUP] Use matrix abstraction for coefficients in LogisticRegression training ## What changes were proposed in this pull request? This is a follow up to some of the discussion [here](https://gi

spark git commit: [SPARK-11496][GRAPHX][FOLLOWUP] Add param checking for runParallelPersonalizedPageRank

2016-11-14 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.1 db691f05c -> cff7a70b5 [SPARK-11496][GRAPHX][FOLLOWUP] Add param checking for runParallelPersonalizedPageRank ## What changes were proposed in this pull request? add the param checking to keep in line with other algos ## How was this

spark git commit: [SPARK-11496][GRAPHX][FOLLOWUP] Add param checking for runParallelPersonalizedPageRank

2016-11-14 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 89d1fa58d -> 75934457d [SPARK-11496][GRAPHX][FOLLOWUP] Add param checking for runParallelPersonalizedPageRank ## What changes were proposed in this pull request? add the param checking to keep in line with other algos ## How was this patc

spark git commit: [SPARK-18060][ML] Avoid unnecessary computation for MLOR

2016-11-11 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.1 c2ebda443 -> 56859c029 [SPARK-18060][ML] Avoid unnecessary computation for MLOR ## What changes were proposed in this pull request? Before this patch, the gradient updates for multinomial logistic regression were computed by an outer

spark git commit: [SPARK-18060][ML] Avoid unnecessary computation for MLOR

2016-11-11 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master ba23f768f -> 46b2550bc [SPARK-18060][ML] Avoid unnecessary computation for MLOR ## What changes were proposed in this pull request? Before this patch, the gradient updates for multinomial logistic regression were computed by an outer loop

spark git commit: [SPARK-17941][ML][TEST] Logistic regression tests should use sample weights.

2016-10-14 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 05800b4b4 -> de1c1ca5c [SPARK-17941][ML][TEST] Logistic regression tests should use sample weights. ## What changes were proposed in this pull request? The sample weight testing for logistic regressions is not robust. Logistic regression

spark git commit: [SPARK-17239][ML][DOC] Update user guide for multiclass logistic regression

2016-10-05 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 6a05eb24d -> 9df54f532 [SPARK-17239][ML][DOC] Update user guide for multiclass logistic regression ## What changes were proposed in this pull request? Updates user guide to reflect that LogisticRegression now supports multiclass. Also adds

spark git commit: [SPARK-17718][DOCS][MLLIB] Make loss function formulation label note clearer in MLlib docs

2016-10-03 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 7bf921276 -> 1dd68d382 [SPARK-17718][DOCS][MLLIB] Make loss function formulation label note clearer in MLlib docs ## What changes were proposed in this pull request? Move note about labels being +1/-1 in formulation only to be just under

spark git commit: [SPARK-11918][ML] Better error from WLS for cases like singular input

2016-09-21 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master d7ee12211 -> b4a4421b6 [SPARK-11918][ML] Better error from WLS for cases like singular input ## What changes were proposed in this pull request? Update error handling for Cholesky decomposition to provide a little more info when input is

[1/3] spark git commit: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-19 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master e719b1c04 -> 26145a5af http://git-wip-us.apache.org/repos/asf/spark/blob/26145a5a/mllib/src/test/scala/org/apache/spark/ml/classification/MultinomialLogisticRegressionSuite.scala --

[2/3] spark git commit: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-19 Thread dbtsai
http://git-wip-us.apache.org/repos/asf/spark/blob/26145a5a/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala -- diff --git a/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegre

[3/3] spark git commit: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-19 Thread dbtsai
[SPARK-17163][ML] Unified LogisticRegression interface ## What changes were proposed in this pull request? Merge `MultinomialLogisticRegression` into `LogisticRegression` and remove `MultinomialLogisticRegression`. Marked as WIP because we should discuss the coefficients API in the model. See

spark git commit: [SPARK-11496][GRAPHX] Parallel implementation of personalized pagerank

2016-09-10 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 335491704 -> 1fec3ce4e [SPARK-11496][GRAPHX] Parallel implementation of personalized pagerank (Updated version of [PR-9457](https://github.com/apache/spark/pull/9457), rebased on latest Spark master, and using mllib-local). This implement

spark git commit: [SPARK-17207][MLLIB] fix comparing Vector bug in TestingUtils

2016-08-26 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 9812f7d53 -> c0949dc94 [SPARK-17207][MLLIB] fix comparing Vector bug in TestingUtils ## What changes were proposed in this pull request? fix comparing Vector bug in TestingUtils. There is the same bug for Matrix comparing. How to check the

spark git commit: [SPARK-17090][ML] Make tree aggregation level in linear/logistic regression configurable

2016-08-20 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 9f37d4eac -> 61ef74f22 [SPARK-17090][ML] Make tree aggregation level in linear/logistic regression configurable ## What changes were proposed in this pull request? Linear/logistic regression use treeAggregate with default depth (always =

[2/2] spark git commit: [SPARK-7159][ML] Add multiclass logistic regression to Spark ML

2016-08-18 Thread dbtsai
bles. An alternative approach to the problem models class conditional probabilities using the softmax function and will return uniquely identifiable coefficients (assuming regularization is applied). This second approach is used in R's glmnet and was also recommended by dbtsai. ### Separate multin

[1/2] spark git commit: [SPARK-7159][ML] Add multiclass logistic regression to Spark ML

2016-08-18 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master b482c09fa -> 287bea130 http://git-wip-us.apache.org/repos/asf/spark/blob/287bea13/mllib/src/test/scala/org/apache/spark/ml/classification/MultinomialLogisticRegressionSuite.scala --

spark git commit: [SPARK-16404][ML] LeastSquaresAggregators serializes unnecessary data

2016-08-08 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master e076fb05a -> 1db1c6567 [SPARK-16404][ML] LeastSquaresAggregators serializes unnecessary data ## What changes were proposed in this pull request? Similar to `LogisticAggregator`, `LeastSquaresAggregator` used for linear regression ends up s

spark git commit: [SPARK-15411][ML] Add @since to ml.stat.MultivariateOnlineSummarizer.scala

2016-05-19 Thread dbtsai
tch tested? unit tests Author: DB Tsai Closes #13197 from dbtsai/cleanup. (cherry picked from commit 5255e55c843c7b67fcb2abb4284b8b1a09bd6672) Signed-off-by: DB Tsai Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/833dbf92 T

spark git commit: [SPARK-15411][ML] Add @since to ml.stat.MultivariateOnlineSummarizer.scala

2016-05-19 Thread dbtsai
ted? unit tests Author: DB Tsai Closes #13197 from dbtsai/cleanup. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5255e55c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5255e55c Diff: http://git-wip-us.apache.org/re

spark git commit: [SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local

2016-05-19 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.0 9f2730b0c -> bd609b0b7 [SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local ## What changes were proposed in this pull request? This PR adds `Since` annotations in `Vectors.scala` and `Matrices.scala` of

spark git commit: [SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local

2016-05-19 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 8ecf7f77b -> 31f63ac25 [SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local ## What changes were proposed in this pull request? This PR adds `Since` annotations in `Vectors.scala` and `Matrices.scala` of sp

spark git commit: [SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local

2016-04-28 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 78c8aaf84 -> dae538a4d [SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local ## What changes were proposed in this pull request? This PR adds `since` tag into the matrix and vector classes in spark-mllib-lo

spark git commit: [SPARK-14732][ML] spark.ml GaussianMixture should use MultivariateGaussian in mllib-local

2016-04-26 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 0c99c23b7 -> bd2c9a6d4 [SPARK-14732][ML] spark.ml GaussianMixture should use MultivariateGaussian in mllib-local ## What changes were proposed in this pull request? Before, spark.ml GaussianMixtureModel used the spark.mllib MultivariateGa

spark git commit: [SPARK-14734][ML][MLLIB] Added asML, fromML methods for all spark.mllib Vector, Matrix types

2016-04-21 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master e2b5647ab -> f25a3ea8d [SPARK-14734][ML][MLLIB] Added asML, fromML methods for all spark.mllib Vector, Matrix types ## What changes were proposed in this pull request? For maintaining wrappers around spark.mllib algorithms in spark.ml, it

spark git commit: [SPARK-14612][ML] Consolidate the version of dependencies in mllib and mllib-local into one place

2016-04-14 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 3e27940a1 -> 9fa43a33b [SPARK-14612][ML] Consolidate the version of dependencies in mllib and mllib-local into one place ## What changes were proposed in this pull request? Move json4s, breeze dependency declaration into parent ## How wa

spark git commit: [SPARK-14498][ML][PYTHON][SQL] Many cleanups to ML and ML-related docs

2016-04-08 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 813e96e6f -> d7af736b2 [SPARK-14498][ML][PYTHON][SQL] Many cleanups to ML and ML-related docs ## What changes were proposed in this pull request? Cleanups to documentation. No changes to code. * GBT docs: Move Scala doc for private object

spark git commit: [SPARK-13927][MLLIB] add row/column iterator to local matrices

2016-03-20 Thread dbtsai
Matrix conversion. It handles dense and sparse matrices properly. ## How was this patch tested? Unit tests on sparse and dense matrix. cc: dbtsai Author: Xiangrui Meng Closes #11757 from mengxr/SPARK-13927. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org

spark git commit: [SPARK-13545][MLLIB][PYSPARK] Make MLlib LogisticRegressionWithLBFGS's default parameters consistent in Scala and Python

2016-02-29 Thread dbtsai
entation. We should update the API doc to clarify that ```numCorrections``` will have no effect if we fall into that route. * Make a pass for all parameters of ```LogisticRegressionWithLBFGS```, others are set properly. cc mengxr dbtsai ## How was this patch tested? No new tests, it should pass all

spark git commit: [SPARK-13379][MLLIB] Fix MLlib LogisticRegressionWithLBFGS set regularization incorrectly

2016-02-21 Thread dbtsai
as: ```SquaredL2Updater``` -> ```elasticNetParam = 0.0``` ```L1Updater``` -> ```elasticNetParam = 1.0``` cc dbtsai ## How was this patch tested? unit tests Author: Yanbo Liang Closes #11258 from yanboliang/spark-13379. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http:
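The mapping above follows from the elastic-net penalty in the form documented for spark.ml, regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2), where alpha is `elasticNetParam`: alpha = 0.0 leaves only the squared-L2 term and alpha = 1.0 leaves only the L1 term. The Python sketch below evaluates that formula on made-up weights. It is an illustration, not MLlib code, and the function name is hypothetical.

```python
def elastic_net_penalty(weights, reg_param: float, elastic_net_param: float) -> float:
    """regParam * (alpha * L1 norm + (1 - alpha) / 2 * squared L2 norm)."""
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    alpha = elastic_net_param
    return reg_param * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


weights = [3.0, -4.0]  # L1 norm 7, squared L2 norm 25

print(elastic_net_penalty(weights, 0.5, 0.0))  # 6.25, pure squared-L2 penalty
print(elastic_net_penalty(weights, 0.5, 1.0))  # 3.5, pure L1 penalty
```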

spark git commit: [SPARK-12732][ML] bug fix in linear regression train

2016-02-02 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 99a6e3c1e -> 055714661 [SPARK-12732][ML] bug fix in linear regression train Fixed the bug in linear regression train for the case when the target variable is constant. The two cases for `fitIntercept=true` or `fitIntercept=false` should b

spark git commit: [SPARK-7780][MLLIB] intercept in logisticregressionwith lbfgs should not be regularized

2016-01-26 Thread dbtsai
reviewed at https://github.com/apache/spark/pull/6386#issuecomment-168781424 re-opening for dbtsai to review. Author: Holden Karau Author: Holden Karau Closes #10788 from holdenk/SPARK-7780-intercept-in-logisticregressionwithLBFGS-should-not-be-regularized. Project: http://git-wip-us.ap

spark git commit: [SPARK-12908][ML] Add warning message for LogisticRegression for potential converge issue

2016-01-21 Thread dbtsai
doesn't support this case, and will just exit. GLM can train, but will have a warning message saying the algorithm doesn't converge. Author: DB Tsai Closes #10862 from dbtsai/add-tests. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.o

spark git commit: [SPARK-12804][ML] Fix LogisticRegression with FitIntercept on all same label training data

2016-01-19 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master b122c861c -> 2388de519 [SPARK-12804][ML] Fix LogisticRegression with FitIntercept on all same label training data CC jkbradley mengxr dbtsai Author: Feynman Liang Closes #10743 from feynmanliang/SPARK-12804. Project: http://git-

spark git commit: [SPARK-10991][ML] logistic regression training summary handle empty prediction col

2015-12-10 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master b1b4ee7f3 -> 518ab5101 [SPARK-10991][ML] logistic regression training summary handle empty prediction col LogisticRegression training summary should still function if the predictionCol is set to an empty string or otherwise unset (related

spark git commit: [SPARK-11432][GRAPHX] Personalized PageRank shouldn't use uniform initialization

2015-11-02 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 9cb5c731d -> efaa4721b [SPARK-11432][GRAPHX] Personalized PageRank shouldn't use uniform initialization Changes the personalized pagerank initialization to be non-uniform. Author: Yves Raimond Closes #9386 from moustaki/personalized-page

spark git commit: [MINOR][ML] removed the old `getModelWeights` function

2015-11-02 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 476f4348e -> 21ad84623 [MINOR][ML] removed the old `getModelWeights` function Removed the old `getModelWeights` function which was private and renamed into `getModelCoefficients` Author: DB Tsai Closes #9426 from dbtsai/feature-mi

spark git commit: [SPARK-10592] [ML] [PySpark] Deprecate weights and use coefficients instead in ML models

2015-11-02 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master ec03866a7 -> c020f7d9d [SPARK-10592] [ML] [PySpark] Deprecate weights and use coefficients instead in ML models Deprecated in `LogisticRegression` and `LinearRegression` Author: vectorijk Closes #9311 from vectorijk/spark-10592. Proje

spark git commit: [SPARK-9722] [ML] Pass random seed to spark.ml DecisionTree*

2015-11-01 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 3e770a64a -> e963070c1 [SPARK-9722] [ML] Pass random seed to spark.ml DecisionTree* Author: Yu ISHIKAWA Closes #9402 from yu-iskw/SPARK-9722. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.or

spark git commit: [SPARK-11385] [ML] foreachActive made public in MLLib's vector API

2015-10-30 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master e8ec2a7b0 -> 69b9e4b3c [SPARK-11385] [ML] foreachActive made public in MLLib's vector API Made foreachActive public in MLLib's vector API Author: Nakul Jindal Closes #9362 from nakul02/SPARK-11385_foreach_for_mllib_linalg_vector. Proje

spark git commit: [SPARK-11207] [ML] Add test cases for solver selection of LinearRegres…

2015-10-30 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master eb59b94c4 -> 86d65265f [SPARK-11207] [ML] Add test cases for solver selection of LinearRegres… …sion as followup. This is the follow up work of SPARK-10668. * Fix minor style issues. * Add test case for checking whether solver is selec

spark git commit: [SPARK-11332] [ML] Refactored to use ml.feature.Instance instead of WeightedLeastSquare.Instance

2015-10-28 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 82c1c5772 -> 5f1cee6f1 [SPARK-11332] [ML] Refactored to use ml.feature.Instance instead of WeightedLeastSquare.Instance WeightedLeastSquares now uses the common Instance class in ml.feature instead of a private one. Author: Nakul Jindal

spark git commit: [SPARK-10668] [ML] Use WeightedLeastSquares in LinearRegression with L…

2015-10-19 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master dfa41e63b -> 4c33a34ba [SPARK-10668] [ML] Use WeightedLeastSquares in LinearRegression with L… …2 regularization if the number of features is small Author: lewuathe Author: Lewuathe Author: Kai Sasaki Author: Lewuathe Closes #8884

spark git commit: [SPARK-9642] [ML] LinearRegression should supported weighted data

2015-09-21 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master ca9fe540f -> 331f0b10f [SPARK-9642] [ML] LinearRegression should supported weighted data In many modeling applications, data points are not necessarily sampled with equal probabilities. Linear regression should support weighting which accou

spark git commit: [SPARK-10336][example] fix not being able to set intercept in LR example

2015-08-28 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-1.5 ccda27a9b -> 9c58f6441 [SPARK-10336][example] fix not being able to set intercept in LR example `fitIntercept` is a command line option but not set in the main program. dbtsai Author: Shuo Xiang Closes #8510 from coderxi

spark git commit: [SPARK-10336][example] fix not being able to set intercept in LR example

2015-08-28 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master c53c902fa -> 45723214e [SPARK-10336][example] fix not being able to set intercept in LR example `fitIntercept` is a command line option but not set in the main program. dbtsai Author: Shuo Xiang Closes #8510 from coderxiang/interc

spark git commit: [SPARK-10236] [MLLIB] update since versions in mllib.feature

2015-08-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 4657fa1f3 -> 321d77596 [SPARK-10236] [MLLIB] update since versions in mllib.feature Same as #8421 but for `mllib.feature`. cc dbtsai Author: Xiangrui Meng Closes #8449 from mengxr/SPARK-10236.feature and squashes the following comm

spark git commit: [SPARK-10236] [MLLIB] update since versions in mllib.feature

2015-08-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-1.5 08d390f45 -> 21a10a86d [SPARK-10236] [MLLIB] update since versions in mllib.feature Same as #8421 but for `mllib.feature`. cc dbtsai Author: Xiangrui Meng Closes #8449 from mengxr/SPARK-10236.feature and squashes the follow

spark git commit: [SPARK-10235] [MLLIB] update since versions in mllib.regression

2015-08-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master fb7e12fe2 -> 4657fa1f3 [SPARK-10235] [MLLIB] update since versions in mllib.regression Same as #8421 but for `mllib.regression`. cc freeman-lab dbtsai Author: Xiangrui Meng Closes #8426 from mengxr/SPARK-10235 and squashes

spark git commit: [SPARK-10235] [MLLIB] update since versions in mllib.regression

2015-08-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-1.5 6d8ebc801 -> 08d390f45 [SPARK-10235] [MLLIB] update since versions in mllib.regression Same as #8421 but for `mllib.regression`. cc freeman-lab dbtsai Author: Xiangrui Meng Closes #8426 from mengxr/SPARK-10235 and squashes

spark git commit: [SPARK-10238] [MLLIB] update since versions in mllib.linalg

2015-08-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-1.5 af98e51f2 -> 46750b912 [SPARK-10238] [MLLIB] update since versions in mllib.linalg Same as #8421 but for `mllib.linalg`. cc dbtsai Author: Xiangrui Meng Closes #8440 from mengxr/SPARK-10238 and squashes the following comm

spark git commit: [SPARK-10238] [MLLIB] update since versions in mllib.linalg

2015-08-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 8668ead2e -> ab431f8a9 [SPARK-10238] [MLLIB] update since versions in mllib.linalg Same as #8421 but for `mllib.linalg`. cc dbtsai Author: Xiangrui Meng Closes #8440 from mengxr/SPARK-10238 and squashes the following commits: b384

spark git commit: [SPARK-10239] [SPARK-10244] [MLLIB] update since versions in mllib.pmml and mllib.util

2015-08-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-1.5 055387c08 -> 6f05b7aeb [SPARK-10239] [SPARK-10244] [MLLIB] update since versions in mllib.pmml and mllib.util Same as #8421 but for `mllib.pmml` and `mllib.util`. cc dbtsai Author: Xiangrui Meng Closes #8430 from mengxr/SP

spark git commit: [SPARK-10239] [SPARK-10244] [MLLIB] update since versions in mllib.pmml and mllib.util

2015-08-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 920590787 -> 00ae4be97 [SPARK-10239] [SPARK-10244] [MLLIB] update since versions in mllib.pmml and mllib.util Same as #8421 but for `mllib.pmml` and `mllib.util`. cc dbtsai Author: Xiangrui Meng Closes #8430 from mengxr/SPARK-10

spark git commit: [SPARK-10231] [MLLIB] update @Since annotation for mllib.classification

2015-08-25 Thread dbtsai
in constructors 2. correct some versions 3. remove `Since` on `toString` MechCoder dbtsai Author: Xiangrui Meng Closes #8421 from mengxr/SPARK-10231 and squashes the following commits: b2dce80 [Xiangrui Meng] update @Since annotation for mllib.classification (cherry picked from com

spark git commit: [SPARK-10231] [MLLIB] update @Since annotation for mllib.classification

2015-08-25 Thread dbtsai
in constructors 2. correct some versions 3. remove `Since` on `toString` MechCoder dbtsai Author: Xiangrui Meng Closes #8421 from mengxr/SPARK-10231 and squashes the following commits: b2dce80 [Xiangrui Meng] update @Since annotation for mllib.classification Project: http://git-wip-us.apache.org/re

spark git commit: SPARK-8916 [Documentation, MLlib] Add @since tags to mllib.regression

2015-08-17 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-1.5 eaeebb92f -> f5ed9ede9 SPARK-8916 [Documentation, MLlib] Add @since tags to mllib.regression Added since tags to mllib.regression Author: Prayag Chandran Closes #7518 from prayagchandran/sinceTags and squashes the following commits:

spark git commit: SPARK-8916 [Documentation, MLlib] Add @since tags to mllib.regression

2015-08-17 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 0076e8212 -> 18523c130 SPARK-8916 [Documentation, MLlib] Add @since tags to mllib.regression Added since tags to mllib.regression Author: Prayag Chandran Closes #7518 from prayagchandran/sinceTags and squashes the following commits: fa4

spark git commit: [SPARK-9204][ML] Add default params test for linearyregression suite

2015-07-20 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master a3c7a3ce3 -> 4d97be953 [SPARK-9204][ML] Add default params test for linearyregression suite Author: Holden Karau Closes #7553 from holdenk/SPARK-9204-add-default-params-test-to-linear-regression and squashes the following commits: 630b
