spark git commit: [SPARK-8168] [MLLIB] Add Python friendly constructor to PipelineModel

2015-06-08 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master f3eec92ce -> 82870d507 [SPARK-8168] [MLLIB] Add Python friendly constructor to PipelineModel This makes the constructor callable in Python. dbtsai Author: Xiangrui Meng <m...@databricks.com> Closes #6709 from mengxr/SPARK-8168 and squashes

spark git commit: [SPARK-7888] Be able to disable intercept in linear regression in ml package

2015-06-23 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 6f4cadf5e -> 2bdd0 [SPARK-7888] Be able to disable intercept in linear regression in ml package Author: Holden Karau <hol...@pigscanfly.ca> Closes #6927 from

spark git commit: [SPARK-8613] [ML] [TRIVIAL] add param to disable linear feature scaling

2015-06-26 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 9fed6abfd -> c9e05a315 [SPARK-8613] [ML] [TRIVIAL] add param to disable linear feature scaling Add a param to disable linear feature scaling (to be implemented later in linear logistic regression). Done as a separate PR so we can use same

spark git commit: [SPARK-8314][MLlib] improvement in performance of MLUtils.appendBias

2015-06-12 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master e9471d341 -> 6e9c3ff1e [SPARK-8314][MLlib] improvement in performance of MLUtils.appendBias MLUtils.appendBias method is heavily used in creating intercepts for linear models. This method uses Breeze's vector concatenation which is very
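As a rough illustration of the operation this commit message describes, the sketch below appends a constant bias entry of 1.0 to a feature vector so that an intercept can be learned as an ordinary coefficient. It is plain Python, not the MLlib implementation: the function names and the `(size, indices, values)` sparse layout are assumptions made for this example only.

```python
from typing import List, Tuple

# Hypothetical sparse layout for this sketch: (size, indices, values).
Sparse = Tuple[int, List[int], List[float]]


def append_bias_dense(values: List[float]) -> List[float]:
    """Return a new dense vector with a trailing 1.0 bias entry."""
    out = [0.0] * (len(values) + 1)  # allocate the result once
    out[:len(values)] = values       # copy the original features
    out[-1] = 1.0                    # bias term in the last slot
    return out


def append_bias_sparse(vec: Sparse) -> Sparse:
    """Return a new sparse vector whose last position holds the 1.0 bias."""
    size, indices, values = vec
    # The bias occupies index `size`, which is past every existing index,
    # so the index list stays sorted without any re-sorting.
    return (size + 1, indices + [size], values + [1.0])
```

Writing straight into a preallocated result, as above, is one way to avoid a general-purpose concatenation routine; whether the actual patch takes this route is not visible in the truncated message.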

spark git commit: SPARK-8916 [Documentation, MLlib] Add @since tags to mllib.regression

2015-08-17 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-1.5 eaeebb92f -> f5ed9ede9 SPARK-8916 [Documentation, MLlib] Add @since tags to mllib.regression Added since tags to mllib.regression Author: Prayag Chandran <prayagchand...@gmail.com> Closes #7518 from prayagchandran/sinceTags and squashes

spark git commit: SPARK-8916 [Documentation, MLlib] Add @since tags to mllib.regression

2015-08-17 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 0076e8212 -> 18523c130 SPARK-8916 [Documentation, MLlib] Add @since tags to mllib.regression Added since tags to mllib.regression Author: Prayag Chandran <prayagchand...@gmail.com> Closes #7518 from prayagchandran/sinceTags and squashes the

spark git commit: [SPARK-8551] [ML] Elastic net python code example

2015-06-30 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 12671dd5e -> 545245741 [SPARK-8551] [ML] Elastic net python code example Author: Shuo Xiang <shuoxiang...@gmail.com> Closes #6946 from coderxiang/en-java-code-example and squashes the following commits: 7a4bdf8 [Shuo Xiang] address

spark git commit: [SPARK-9204][ML] Add default params test for linearyregression suite

2015-07-20 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master a3c7a3ce3 -> 4d97be953 [SPARK-9204][ML] Add default params test for linearyregression suite Author: Holden Karau <hol...@pigscanfly.ca> Closes #7553 from holdenk/SPARK-9204-add-default-params-test-to-linear-regression and squashes the

spark git commit: [SPARK-11332] [ML] Refactored to use ml.feature.Instance instead of WeightedLeastSquare.Instance

2015-10-28 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 82c1c5772 -> 5f1cee6f1 [SPARK-11332] [ML] Refactored to use ml.feature.Instance instead of WeightedLeastSquare.Instance WeightedLeastSquares now uses the common Instance class in ml.feature instead of a private one. Author: Nakul Jindal

spark git commit: [SPARK-11385] [ML] foreachActive made public in MLLib's vector API

2015-10-30 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master e8ec2a7b0 -> 69b9e4b3c [SPARK-11385] [ML] foreachActive made public in MLLib's vector API Made foreachActive public in MLLib's vector API Author: Nakul Jindal Closes #9362 from

spark git commit: [SPARK-9722] [ML] Pass random seed to spark.ml DecisionTree*

2015-11-01 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 3e770a64a -> e963070c1 [SPARK-9722] [ML] Pass random seed to spark.ml DecisionTree* Author: Yu ISHIKAWA Closes #9402 from yu-iskw/SPARK-9722. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-10592] [ML] [PySpark] Deprecate weights and use coefficients instead in ML models

2015-11-02 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master ec03866a7 -> c020f7d9d [SPARK-10592] [ML] [PySpark] Deprecate weights and use coefficients instead in ML models Deprecated in `LogisticRegression` and `LinearRegression` Author: vectorijk Closes #9311 from

spark git commit: [SPARK-11432][GRAPHX] Personalized PageRank shouldn't use uniform initialization

2015-11-02 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 9cb5c731d -> efaa4721b [SPARK-11432][GRAPHX] Personalized PageRank shouldn't use uniform initialization Changes the personalized pagerank initialization to be non-uniform. Author: Yves Raimond Closes #9386 from

spark git commit: [MINOR][ML] removed the old `getModelWeights` function

2015-11-02 Thread dbtsai
9426 from dbtsai/feature-minor. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/21ad8462 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/21ad8462 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/21ad8462 Branch: r

spark git commit: [SPARK-10668] [ML] Use WeightedLeastSquares in LinearRegression with L…

2015-10-19 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master dfa41e63b -> 4c33a34ba [SPARK-10668] [ML] Use WeightedLeastSquares in LinearRegression with L… …2 regularization if the number of features is small Author: lewuathe Author: Lewuathe Author:

spark git commit: [SPARK-8700][ML] Disable feature scaling in Logistic Regression

2015-07-08 Thread dbtsai
jkbradley Author: DB Tsai <d...@netflix.com> Closes #7080 from dbtsai/lors and squashes the following commits: 877e6c7 [DB Tsai] repahse the doc 7cf45f2 [DB Tsai] address feedback 78d75c9 [DB Tsai] small change c2c9e60 [DB Tsai] style 6e1a8e0 [DB Tsai] first commit Project: http://git-wip-us.apache.org

spark git commit: [SPARK-8913] [ML] Simplify LogisticRegression suite to use Vector Vector comparision

2015-07-09 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 0e78e40c0 -> 272730466 [SPARK-8913] [ML] Simplify LogisticRegression suite to use Vector Vector comparision Cleanup tests from SPARK 8700. Author: Holden Karau <hol...@pigscanfly.ca> Closes #7335 from

spark git commit: [SPARK-8963][ML] cleanup tests in linear regression suite

2015-07-09 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 691653303 -> e29ce319f [SPARK-8963][ML] cleanup tests in linear regression suite Simplify model weight assertions to use vector comparison, switch to using absTol when comparing with 0.0 intercepts Author: Holden Karau

spark git commit: [SPARK-10238] [MLLIB] update since versions in mllib.linalg

2015-08-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 8668ead2e -> ab431f8a9 [SPARK-10238] [MLLIB] update since versions in mllib.linalg Same as #8421 but for `mllib.linalg`. cc dbtsai Author: Xiangrui Meng <m...@databricks.com> Closes #8440 from mengxr/SPARK-10238 and squashes the following

spark git commit: [SPARK-10238] [MLLIB] update since versions in mllib.linalg

2015-08-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-1.5 af98e51f2 -> 46750b912 [SPARK-10238] [MLLIB] update since versions in mllib.linalg Same as #8421 but for `mllib.linalg`. cc dbtsai Author: Xiangrui Meng <m...@databricks.com> Closes #8440 from mengxr/SPARK-10238 and squashes

spark git commit: [SPARK-10235] [MLLIB] update since versions in mllib.regression

2015-08-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master fb7e12fe2 -> 4657fa1f3 [SPARK-10235] [MLLIB] update since versions in mllib.regression Same as #8421 but for `mllib.regression`. cc freeman-lab dbtsai Author: Xiangrui Meng <m...@databricks.com> Closes #8426 from mengxr/SPARK-10235

spark git commit: [SPARK-10235] [MLLIB] update since versions in mllib.regression

2015-08-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-1.5 6d8ebc801 -> 08d390f45 [SPARK-10235] [MLLIB] update since versions in mllib.regression Same as #8421 but for `mllib.regression`. cc freeman-lab dbtsai Author: Xiangrui Meng <m...@databricks.com> Closes #8426 from mengxr/SPARK-10235

spark git commit: [SPARK-10236] [MLLIB] update since versions in mllib.feature

2015-08-26 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-1.5 08d390f45 -> 21a10a86d [SPARK-10236] [MLLIB] update since versions in mllib.feature Same as #8421 but for `mllib.feature`. cc dbtsai Author: Xiangrui Meng <m...@databricks.com> Closes #8449 from mengxr/SPARK-10236.feature and squashes

spark git commit: [SPARK-10236] [MLLIB] update since versions in mllib.feature

2015-08-26 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 4657fa1f3 -> 321d77596 [SPARK-10236] [MLLIB] update since versions in mllib.feature Same as #8421 but for `mllib.feature`. cc dbtsai Author: Xiangrui Meng <m...@databricks.com> Closes #8449 from mengxr/SPARK-10236.feature and squashes

spark git commit: [SPARK-10231] [MLLIB] update @Since annotation for mllib.classification

2015-08-25 Thread dbtsai
in constructors 2. correct some versions 3. remove `Since` on `toString` MechCoder dbtsai Author: Xiangrui Meng <m...@databricks.com> Closes #8421 from mengxr/SPARK-10231 and squashes the following commits: b2dce80 [Xiangrui Meng] update @Since annotation for mllib.classification Project: http://git-wip

spark git commit: [SPARK-10239] [SPARK-10244] [MLLIB] update since versions in mllib.pmml and mllib.util

2015-08-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-1.5 055387c08 -> 6f05b7aeb [SPARK-10239] [SPARK-10244] [MLLIB] update since versions in mllib.pmml and mllib.util Same as #8421 but for `mllib.pmml` and `mllib.util`. cc dbtsai Author: Xiangrui Meng <m...@databricks.com> Closes #8430 from

spark git commit: [SPARK-10239] [SPARK-10244] [MLLIB] update since versions in mllib.pmml and mllib.util

2015-08-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 920590787 -> 00ae4be97 [SPARK-10239] [SPARK-10244] [MLLIB] update since versions in mllib.pmml and mllib.util Same as #8421 but for `mllib.pmml` and `mllib.util`. cc dbtsai Author: Xiangrui Meng <m...@databricks.com> Closes #8430 from

spark git commit: [SPARK-9642] [ML] LinearRegression should supported weighted data

2015-09-21 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master ca9fe540f -> 331f0b10f [SPARK-9642] [ML] LinearRegression should supported weighted data In many modeling applications, data points are not necessarily sampled with equal probabilities. Linear regression should support weighting which
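To make the idea of instance weighting concrete, here is a minimal sketch of weighted least squares for a single feature with an intercept, minimizing the sum of `w_i * (y_i - prediction_i)^2`. It is a closed-form toy in plain Python and says nothing about how the Spark patch is implemented; the function name `weighted_linear_fit` is hypothetical.

```python
from typing import Sequence, Tuple


def weighted_linear_fit(
    xs: Sequence[float], ys: Sequence[float], ws: Sequence[float]
) -> Tuple[float, float]:
    """Fit y ~ slope * x + intercept under per-instance weights ws."""
    w_sum = sum(ws)
    # Weighted means of the feature and the label.
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    # Weighted covariance and variance (unnormalized; the factor cancels).
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept
```

One useful sanity check on any weighted fit: giving an instance an integer weight of 2 should produce the same model as listing that instance twice with weight 1.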

spark git commit: [SPARK-12732][ML] bug fix in linear regression train

2016-02-02 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 99a6e3c1e -> 055714661 [SPARK-12732][ML] bug fix in linear regression train Fixed the bug in linear regression train for the case when the target variable is constant. The two cases for `fitIntercept=true` or `fitIntercept=false` should

spark git commit: [SPARK-13545][MLLIB][PYSPARK] Make MLlib LogisticRegressionWithLBFGS's default parameters consistent in Scala and Python

2016-02-29 Thread dbtsai
ion. We should update the API doc to clarifying ```numCorrections``` will have no effect if we fall into that route. * Make a pass for all parameters of ```LogisticRegressionWithLBFGS```, others are set properly. cc mengxr dbtsai ## How was this patch tested? No new tests, it should pass all curr

spark git commit: [SPARK-12804][ML] Fix LogisticRegression with FitIntercept on all same label training data

2016-01-19 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master b122c861c -> 2388de519 [SPARK-12804][ML] Fix LogisticRegression with FitIntercept on all same label training data CC jkbradley mengxr dbtsai Author: Feynman Liang <feynman.li...@gmail.com> Closes #10743 from feynmanliang/SP

spark git commit: [SPARK-12908][ML] Add warning message for LogisticRegression for potential converge issue

2016-01-21 Thread dbtsai
n't support this case, and will just exit. GLM can train, but will have a warning message saying the algorithm doesn't converge. Author: DB Tsai <d...@netflix.com> Closes #10862 from dbtsai/add-tests. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.a

spark git commit: [SPARK-7780][MLLIB] intercept in logisticregressionwith lbfgs should not be regularized

2016-01-26 Thread dbtsai
wed at https://github.com/apache/spark/pull/6386#issuecomment-168781424 re-opening for dbtsai to review. Author: Holden Karau <hol...@us.ibm.com> Author: Holden Karau <hol...@pigscanfly.ca> Closes #10788 from holdenk/SPARK-7780-intercept-in-logisticregressionwithLBFGS-should-not

spark git commit: [SPARK-14498][ML][PYTHON][SQL] Many cleanups to ML and ML-related docs

2016-04-08 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 813e96e6f -> d7af736b2 [SPARK-14498][ML][PYTHON][SQL] Many cleanups to ML and ML-related docs ## What changes were proposed in this pull request? Cleanups to documentation. No changes to code. * GBT docs: Move Scala doc for private

spark git commit: [SPARK-13927][MLLIB] add row/column iterator to local matrices

2016-03-20 Thread dbtsai
Matrix conversion. It handles dense and sparse matrices properly. ## How was this patch tested? Unit tests on sparse and dense matrix. cc: dbtsai Author: Xiangrui Meng <m...@databricks.com> Closes #11757 from mengxr/SPARK-13927. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit

spark git commit: [SPARK-14732][ML] spark.ml GaussianMixture should use MultivariateGaussian in mllib-local

2016-04-26 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 0c99c23b7 -> bd2c9a6d4 [SPARK-14732][ML] spark.ml GaussianMixture should use MultivariateGaussian in mllib-local ## What changes were proposed in this pull request? Before, spark.ml GaussianMixtureModel used the spark.mllib

spark git commit: [SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local

2016-04-28 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 78c8aaf84 -> dae538a4d [SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local ## What changes were proposed in this pull request? This PR adds `since` tag into the matrix and vector classes in

spark git commit: [SPARK-14734][ML][MLLIB] Added asML, fromML methods for all spark.mllib Vector, Matrix types

2016-04-21 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master e2b5647ab -> f25a3ea8d [SPARK-14734][ML][MLLIB] Added asML, fromML methods for all spark.mllib Vector, Matrix types ## What changes were proposed in this pull request? For maintaining wrappers around spark.mllib algorithms in spark.ml,

spark git commit: [SPARK-15411][ML] Add @since to ml.stat.MultivariateOnlineSummarizer.scala

2016-05-19 Thread dbtsai
tch tested? unit tests Author: DB Tsai <d...@netflix.com> Closes #13197 from dbtsai/cleanup. (cherry picked from commit 5255e55c843c7b67fcb2abb4284b8b1a09bd6672) Signed-off-by: DB Tsai <d...@netflix.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wi

spark git commit: [SPARK-15411][ML] Add @since to ml.stat.MultivariateOnlineSummarizer.scala

2016-05-19 Thread dbtsai
ted? unit tests Author: DB Tsai <d...@netflix.com> Closes #13197 from dbtsai/cleanup. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5255e55c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5255e55c Diff: http:

spark git commit: [SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local

2016-05-19 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 8ecf7f77b -> 31f63ac25 [SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local ## What changes were proposed in this pull request? This PR adds `Since` annotations in `Vectors.scala` and `Matrices.scala` of

spark git commit: [SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local

2016-05-19 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.0 9f2730b0c -> bd609b0b7 [SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local ## What changes were proposed in this pull request? This PR adds `Since` annotations in `Vectors.scala` and `Matrices.scala`

spark git commit: [SPARK-14612][ML] Consolidate the version of dependencies in mllib and mllib-local into one place

2016-04-14 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 3e27940a1 -> 9fa43a33b [SPARK-14612][ML] Consolidate the version of dependencies in mllib and mllib-local into one place ## What changes were proposed in this pull request? Move json4s, breeze dependency declaration into parent ## How

spark git commit: [SPARK-16404][ML] LeastSquaresAggregators serializes unnecessary data

2016-08-08 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master e076fb05a -> 1db1c6567 [SPARK-16404][ML] LeastSquaresAggregators serializes unnecessary data ## What changes were proposed in this pull request? Similar to `LogisticAggregator`, `LeastSquaresAggregator` used for linear regression ends up

[1/2] spark git commit: [SPARK-7159][ML] Add multiclass logistic regression to Spark ML

2016-08-18 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master b482c09fa -> 287bea130 http://git-wip-us.apache.org/repos/asf/spark/blob/287bea13/mllib/src/test/scala/org/apache/spark/ml/classification/MultinomialLogisticRegressionSuite.scala

[2/2] spark git commit: [SPARK-7159][ML] Add multiclass logistic regression to Spark ML

2016-08-18 Thread dbtsai
bles. An alternative approach to the problem models class conditional probabilities using the softmax function and will return uniquely identifiable coefficients (assuming regularization is applied). This second approach is used in R's glmnet and was also recommended by dbtsai. ### Separate multinomial

spark git commit: [SPARK-17090][ML] Make tree aggregation level in linear/logistic regression configurable

2016-08-20 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 9f37d4eac -> 61ef74f22 [SPARK-17090][ML] Make tree aggregation level in linear/logistic regression configurable ## What changes were proposed in this pull request? Linear/logistic regression use treeAggregate with default depth (always =

spark git commit: [SPARK-11496][GRAPHX] Parallel implementation of personalized pagerank

2016-09-10 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 335491704 -> 1fec3ce4e [SPARK-11496][GRAPHX] Parallel implementation of personalized pagerank (Updated version of [PR-9457](https://github.com/apache/spark/pull/9457), rebased on latest Spark master, and using mllib-local). This

spark git commit: [SPARK-17239][ML][DOC] Update user guide for multiclass logistic regression

2016-10-05 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 6a05eb24d -> 9df54f532 [SPARK-17239][ML][DOC] Update user guide for multiclass logistic regression ## What changes were proposed in this pull request? Updates user guide to reflect that LogisticRegression now supports multiclass. Also

spark git commit: [SPARK-11918][ML] Better error from WLS for cases like singular input

2016-09-21 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master d7ee12211 -> b4a4421b6 [SPARK-11918][ML] Better error from WLS for cases like singular input ## What changes were proposed in this pull request? Update error handling for Cholesky decomposition to provide a little more info when input is

[1/3] spark git commit: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-19 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master e719b1c04 -> 26145a5af http://git-wip-us.apache.org/repos/asf/spark/blob/26145a5a/mllib/src/test/scala/org/apache/spark/ml/classification/MultinomialLogisticRegressionSuite.scala

[2/3] spark git commit: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-19 Thread dbtsai
http://git-wip-us.apache.org/repos/asf/spark/blob/26145a5a/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala -- diff --git

[3/3] spark git commit: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-19 Thread dbtsai
[SPARK-17163][ML] Unified LogisticRegression interface ## What changes were proposed in this pull request? Merge `MultinomialLogisticRegression` into `LogisticRegression` and remove `MultinomialLogisticRegression`. Marked as WIP because we should discuss the coefficients API in the model. See

spark git commit: [SPARK-17207][MLLIB] fix comparing Vector bug in TestingUtils

2016-08-26 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 9812f7d53 -> c0949dc94 [SPARK-17207][MLLIB] fix comparing Vector bug in TestingUtils ## What changes were proposed in this pull request? fix comparing Vector bug in TestingUtils. There is the same bug for Matrix comparing. How to check

spark git commit: [SPARK-17941][ML][TEST] Logistic regression tests should use sample weights.

2016-10-14 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 05800b4b4 -> de1c1ca5c [SPARK-17941][ML][TEST] Logistic regression tests should use sample weights. ## What changes were proposed in this pull request? The sample weight testing for logistic regressions is not robust. Logistic regression

spark git commit: [SPARK-18060][ML] Avoid unnecessary computation for MLOR

2016-11-11 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master ba23f768f -> 46b2550bc [SPARK-18060][ML] Avoid unnecessary computation for MLOR ## What changes were proposed in this pull request? Before this patch, the gradient updates for multinomial logistic regression were computed by an outer

spark git commit: [SPARK-18060][ML] Avoid unnecessary computation for MLOR

2016-11-11 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.1 c2ebda443 -> 56859c029 [SPARK-18060][ML] Avoid unnecessary computation for MLOR ## What changes were proposed in this pull request? Before this patch, the gradient updates for multinomial logistic regression were computed by an outer

spark git commit: [SPARK-18456][ML][FOLLOWUP] Use matrix abstraction for coefficients in LogisticRegression training

2016-11-19 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master ea77c81ec -> 856e00420 [SPARK-18456][ML][FOLLOWUP] Use matrix abstraction for coefficients in LogisticRegression training ## What changes were proposed in this pull request? This is a follow up to some of the discussion

spark git commit: [SPARK-18456][ML][FOLLOWUP] Use matrix abstraction for coefficients in LogisticRegression training

2016-11-19 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.1 15ad3a319 -> 15eb86c29 [SPARK-18456][ML][FOLLOWUP] Use matrix abstraction for coefficients in LogisticRegression training ## What changes were proposed in this pull request? This is a follow up to some of the discussion

spark git commit: [SPARK-11496][GRAPHX][FOLLOWUP] Add param checking for runParallelPersonalizedPageRank

2016-11-14 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 89d1fa58d -> 75934457d [SPARK-11496][GRAPHX][FOLLOWUP] Add param checking for runParallelPersonalizedPageRank ## What changes were proposed in this pull request? add the param checking to keep in line with other algos ## How was this

spark git commit: [SPARK-11496][GRAPHX][FOLLOWUP] Add param checking for runParallelPersonalizedPageRank

2016-11-14 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.1 db691f05c -> cff7a70b5 [SPARK-11496][GRAPHX][FOLLOWUP] Add param checking for runParallelPersonalizedPageRank ## What changes were proposed in this pull request? add the param checking to keep in line with other algos ## How was this

spark git commit: [SPARK-17471][ML] Add compressed method to ML matrices

2017-03-24 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 707e50183 -> e8810b73c [SPARK-17471][ML] Add compressed method to ML matrices ## What changes were proposed in this pull request? This patch adds a `compressed` method to ML `Matrix` class, which returns the minimal storage

spark git commit: [SPARK-17137][ML][WIP] Compress logistic regression coefficients

2017-03-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master e8ddb91c7 -> be85245a9 [SPARK-17137][ML][WIP] Compress logistic regression coefficients ## What changes were proposed in this pull request? Use the new `compressed` method on matrices to store the logistic regression coefficients as

spark git commit: [SPARK-18555][SQL] DataFrameNaFunctions.fill miss up original values in long integers

2017-04-10 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.0 87be9652b -> 735e2039a [SPARK-18555][SQL] DataFrameNaFunctions.fill miss up original values in long integers ## What changes were proposed in this pull request? DataSet.na.fill(0) used on a DataSet which has a long value column,

spark git commit: [SPARK-18555][SQL] DataFrameNaFunctions.fill miss up original values in long integers

2017-04-10 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.1 489c1f357 -> b26f2c2c6 [SPARK-18555][SQL] DataFrameNaFunctions.fill miss up original values in long integers ## What changes were proposed in this pull request? DataSet.na.fill(0) used on a DataSet which has a long value column,
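The message above is truncated, so the exact cause is not stated here. One plausible mechanism, consistent with the SPARK-20270 title elsewhere in this listing, is that filling routes long values through a double. The sketch below shows in plain Python why that would alter large 64-bit integers: a double has a 53-bit significand and cannot represent every long exactly. Both function names are hypothetical and this is not Spark code.

```python
from typing import Optional


def fill_via_double(value: Optional[int], default: float) -> int:
    """Naive fill: every value, missing or not, is converted through a float."""
    filled = default if value is None else value
    return int(float(filled))  # precision is lost here for large integers


def fill_preserving_type(value: Optional[int], default: float) -> int:
    """Type-preserving fill: only the missing slot takes the (cast) default."""
    return int(default) if value is None else value


# 2**53 + 1 is the smallest positive integer a 64-bit float cannot represent.
big = 2**53 + 1
```

With these definitions, `fill_via_double(big, 0.0)` no longer equals `big`, while `fill_preserving_type(big, 0.0)` returns it unchanged and also skips the cast for values that were never missing.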

spark git commit: [SPARK-20270][SQL] na.fill should not change the values in long or integer when the default value is in double

2017-04-10 Thread dbtsai
rectly without changing the original Long values and also avoids extra cost of unnecessary casting. ## How was this patch tested? unit test added. +cc srowen rxin cloud-fan gatorsmile Thanks. Author: DB Tsai <d...@netflix.com> Closes #17577 from dbtsai/fixnafill

spark git commit: [SPARK-20270][SQL] na.fill should not change the values in long or integer when the default value is in double

2017-04-10 Thread dbtsai
rectly without changing the original Long values and also avoids extra cost of unnecessary casting. ## How was this patch tested? unit test added. +cc srowen rxin cloud-fan gatorsmile Thanks. Author: DB Tsai <d...@netflix.com> Closes #17577 from dbtsai/fixnafill

spark git commit: [SPARK-20291][SQL] NaNvl(FloatType, NullType) should not be cast to NaNvl(DoubleType, DoubleType)

2017-04-12 Thread dbtsai
sts. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: DB Tsai <d...@netflix.com> Closes #17606 from dbtsai/fixNaNvl. (cherry picked from commit 8ad63ee158815de57bf03cdf25aef312095f) Signed-off-by: DB Tsai <dbt...@dbtsai.com> Project:

spark git commit: [SPARK-20270][SQL] na.fill should not change the values in long or integer when the default value is in double

2017-04-09 Thread dbtsai
rectly without changing the original Long values and also avoids extra cost of unnecessary casting. ## How was this patch tested? unit test added. +cc srowen rxin cloud-fan gatorsmile Thanks. Author: DB Tsai <d...@netflix.com> Closes #17577 from dbtsai/fixnafill. Project: http://git-wip-

spark git commit: [MINOR][SQL] Fix the @since tag when backporting SPARK-18555 from 2.2 branch into 2.0 branch

2017-04-10 Thread dbtsai
555) from 2.2 branch into 2.0 branch. ## How was this patch tested? N/A Please review http://spark.apache.org/contributing.html before opening a pull request. Author: DB Tsai <dbt...@dbtsai.com> Closes #17601 from dbtsai/branch-2.0. Project: http://git-wip-us.apache.org/repos/asf/spark/re

spark git commit: [SPARK-18555][MINOR][SQL] Fix the @since tag when backporting from 2.2 branch into 2.1 branch

2017-04-10 Thread dbtsai
555) from 2.2 branch into 2.1 branch. ## How was this patch tested? N/A Please review http://spark.apache.org/contributing.html before opening a pull request. Author: DB Tsai <dbt...@dbtsai.com> Closes #17600 from dbtsai/branch-2.1. Project: http://git-wip-us.apache.org/repos/asf/s

spark git commit: [SPARK-20423][ML] fix MLOR coeffs centering when reg == 0

2017-04-21 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master a750a5959 -> eb00378f0 [SPARK-20423][ML] fix MLOR coeffs centering when reg == 0 ## What changes were proposed in this pull request? When reg == 0, MLOR has multiple solutions and we need to centralize the coeffs to get identical result.

spark git commit: [SPARK-20423][ML] fix MLOR coeffs centering when reg == 0

2017-04-21 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.2 adaa3f7e0 -> ff1f989f2 [SPARK-20423][ML] fix MLOR coeffs centering when reg == 0 ## What changes were proposed in this pull request? When reg == 0, MLOR has multiple solutions and we need to centralize the coeffs to get identical
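Why an unregularized multinomial model has multiple solutions can be seen from the softmax itself: adding the same offset to every class's coefficient for a given feature leaves all predicted probabilities unchanged. The sketch below demonstrates that invariance and one common way to pick a canonical representative, subtracting the per-feature mean across classes. It is plain Python with hypothetical helper names; the exact centering procedure used in the patch is not visible in the truncated message.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = classes, columns = features


def softmax(margins: List[float]) -> List[float]:
    """Numerically stable softmax over per-class margins."""
    m = max(margins)
    exps = [math.exp(v - m) for v in margins]
    total = sum(exps)
    return [e / total for e in exps]


def class_margins(coeffs: Matrix, x: List[float]) -> List[float]:
    """Dot product of each class's coefficient row with the feature vector."""
    return [sum(c * xi for c, xi in zip(row, x)) for row in coeffs]


def center(coeffs: Matrix) -> Matrix:
    """Subtract, for each feature, the mean coefficient across classes."""
    k, n = len(coeffs), len(coeffs[0])
    means = [sum(row[j] for row in coeffs) / k for j in range(n)]
    return [[row[j] - means[j] for j in range(n)] for row in coeffs]
```

Two coefficient matrices that differ only by a per-feature shift give identical probabilities, and `center` maps both to the same matrix, which is what makes results comparable across runs.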

spark git commit: [SPARK-19746][ML] Faster indexing for logistic aggregator

2017-02-27 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 8a5a58506 -> 16d8472f7 [SPARK-19746][ML] Faster indexing for logistic aggregator ## What changes were proposed in this pull request? JIRA: [SPARK-19746](https://issues.apache.org/jira/browse/SPARK-19746) The following code is

spark git commit: [SPARK-20449][ML] Upgrade breeze version to 0.13.1

2017-04-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 387565cf1 -> 67eef47ac [SPARK-20449][ML] Upgrade breeze version to 0.13.1 ## What changes were proposed in this pull request? Upgrade breeze version to 0.13.1, which fixed some critical bugs of L-BFGS-B. ## How was this patch tested?

spark git commit: [SPARK-20449][ML] Upgrade breeze version to 0.13.1

2017-04-25 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.2 e2591c6d7 -> 55834a898 [SPARK-20449][ML] Upgrade breeze version to 0.13.1 ## What changes were proposed in this pull request? Upgrade breeze version to 0.13.1, which fixed some critical bugs of L-BFGS-B. ## How was this patch tested?

spark git commit: [SPARK-20483][MINOR] Test for Mesos Coarse mode may starve other Mesos frameworks

2017-04-27 Thread dbtsai
sor of spark.cores.max This tests the change in #17786 ## How was this patch tested? Ran the existing test suite with the new tests dbtsai Author: Davis Shepherd <dsheph...@netflix.com> Closes #17788 from dgshep/add_mesos_test. (cherry picked from commit 039e32ca19d113e3be2c09171c7c921

spark git commit: [SPARK-20483][MINOR] Test for Mesos Coarse mode may starve other Mesos frameworks

2017-04-27 Thread dbtsai
sor of spark.cores.max This tests the change in #17786 ## How was this patch tested? Ran the existing test suite with the new tests dbtsai Author: Davis Shepherd <dsheph...@netflix.com> Closes #17788 from dgshep/add_mesos_test. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Com

spark git commit: [SPARK-20047][ML] Constrained Logistic Regression

2017-04-27 Thread dbtsai
Repository: spark Updated Branches: refs/heads/branch-2.2 c29c6dead -> 4512e2ae6 [SPARK-20047][ML] Constrained Logistic Regression ## What changes were proposed in this pull request? MLlib ```LogisticRegression``` should support bound constrained optimization (only for L2 regularization).

spark git commit: [SPARK-20047][ML] Constrained Logistic Regression

2017-04-27 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 039e32ca1 -> 606432a13 [SPARK-20047][ML] Constrained Logistic Regression ## What changes were proposed in this pull request? MLlib ```LogisticRegression``` should support bound constrained optimization (only for L2 regularization). Users

spark git commit: [SPARK-20483] Mesos Coarse mode may starve other Mesos frameworks

2017-04-27 Thread dbtsai
cur. dbtsai mgummelt Author: Davis Shepherd <dsheph...@netflix.com> Closes #17786 from dgshep/fix_mesos_max_cores. (cherry picked from commit 7633933e54ffb08ab9d959be5f76c26fae29d1d9) Signed-off-by: DB Tsai <dbt...@dbtsai.com> Project: http://git-wip-us.apache.org/repos/asf/spark/re

spark git commit: [SPARK-20483] Mesos Coarse mode may starve other Mesos frameworks

2017-04-27 Thread dbtsai
cur. dbtsai mgummelt Author: Davis Shepherd <dsheph...@netflix.com> Closes #17786 from dgshep/fix_mesos_max_cores. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7633933e Tree: http://git-wip-us.apache.org/repos/asf/s

spark git commit: [SPARK-24181][SQL] Better error message for writing sorted data

2018-05-09 Thread dbtsai
as this patch tested? More tests in `DataFrameReaderWriterSuite.scala` Author: DB Tsai <d_t...@apple.com> Closes #21235 from dbtsai/fixException. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6ea582e3 Tree: http://git

spark git commit: [SPARK-24412][SQL] Adding docs about automagical type casting in `isin` and `isInCollection` APIs

2018-06-08 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master f433ef786 -> 36a340913 [SPARK-24412][SQL] Adding docs about automagical type casting in `isin` and `isInCollection` APIs ## What changes were proposed in this pull request? Update documentation for `isInCollection` API to clearly explain

spark git commit: [SPARK-24419][BUILD] Upgrade SBT to 0.13.17 with Scala 2.10.7 for JDK9+

2018-05-30 Thread dbtsai
ing tests Author: DB Tsai Closes #21458 from dbtsai/sbt. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9e7bad0e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9e7bad0e Diff: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-24371][SQL] Added isInCollection in DataFrame API for Scala and Java.

2018-05-29 Thread dbtsai
;"".stripMargin ``` ## How was this patch tested? Several unit tests are added. Author: DB Tsai Closes #21416 from dbtsai/optimize-set. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/900bc1f7 Tree: http:/

spark git commit: [SPARK-25009][CORE] Standalone Cluster mode application submit is not working

2018-08-03 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master ebf33a333 -> 53ca9755d [SPARK-25009][CORE] Standalone Cluster mode application submit is not working ## What changes were proposed in this pull request? It seems 'doRunMain()' has been removed accidentally by other PR and due to that the

spark git commit: [SPARK-24993][SQL] Make Avro Fast Again

2018-08-03 Thread dbtsai
) = (t2 - t1) / 1000.0 i += 1 } spark.sparkContext.parallelize(writeTimes.slice(50, 150)).toDF("writeTimes").describe("writeTimes").show() spark.sparkContext.parallelize(readTimes.slice(50, 150)).toDF("readTimes").describe("readTimes").show() ``` #

spark git commit: [SPARK-22974][ML] Attach attributes to output column of CountVectorModel

2018-08-13 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master ab197308a -> 3eb52092b [SPARK-22974][ML] Attach attributes to output column of CountVectorModel ## What changes were proposed in this pull request? The output column from `CountVectorModel` lacks attribute. So a later transformer like

spark git commit: [SPARK-25115][CORE] Eliminate extra memory copy done when a ByteBuf is used that is backed by > 1 ByteBuffer.

2018-08-14 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 3c614d056 -> 92fd7f321 [SPARK-25115][CORE] Eliminate extra memory copy done when a ByteBuf is used that is backed by > 1 ByteBuffer. …d by > 1 ByteBuffer. ## What changes were proposed in this pull request? Check how many ByteBuffer

spark git commit: [SPARK-24420][BUILD][FOLLOW-UP] Upgrade ASM6 APIs

2018-08-12 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 5d6abad36 -> a9928277d [SPARK-24420][BUILD][FOLLOW-UP] Upgrade ASM6 APIs ## What changes were proposed in this pull request? Use ASM 6 APIs after upgrading to ASM6. ## How was this patch tested? N/A Closes #22082 from

spark git commit: [SPARK-24855][SQL][EXTERNAL] Built-in AVRO support should support specified schema on write

2018-08-09 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master bdd27961c -> 0cea9e3cd [SPARK-24855][SQL][EXTERNAL] Built-in AVRO support should support specified schema on write ## What changes were proposed in this pull request? Allows `avroSchema` option to be specified on write, allowing a user

spark git commit: [SPARK-25104][SQL] Avro: Validate user specified output schema

2018-08-13 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master c220cc42a -> ab197308a [SPARK-25104][SQL] Avro: Validate user specified output schema ## What changes were proposed in this pull request? With code changes in https://github.com/apache/spark/pull/21847 , Spark can write out to Avro file

spark git commit: [SPARK-23042][ML] Use OneHotEncoderModel to encode labels in MultilayerPerceptronClassifier

2018-08-17 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 162326c0e -> 8b0e94d89 [SPARK-23042][ML] Use OneHotEncoderModel to encode labels in MultilayerPerceptronClassifier ## What changes were proposed in this pull request? In MultilayerPerceptronClassifier, we use RDD operation to encode

spark git commit: [SPARK-25235][SHELL] Merge the REPL code in Scala 2.11 and 2.12 branches

2018-08-28 Thread dbtsai
tch tested? Existing tests. Closes #22246 from dbtsai/repl. Lead-authored-by: DB Tsai Co-authored-by: Liang-Chi Hsieh Signed-off-by: DB Tsai Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ff8dcc1d Tree: http://git-

spark git commit: [SPARK-24785][SHELL] Making sure REPL prints Spark UI info and then Welcome message

2018-08-22 Thread dbtsai
ble is private, so reflection has to be used which is not desirable. We can use this PR to brainstorm how to handle it properly and how Scala can change their APIs to fit our need. ## How was this patch tested? Existing test Closes #21749 from dbtsai/repl-followup. Authored-by: DB Tsai Signed-

spark git commit: [SPARK-24402][SQL] Optimize `In` expression when only one element in the collection or collection is empty

2018-07-17 Thread dbtsai
) null else false)`** can be optimized to false. ## How was this patch tested? A couple of new tests are added. Author: DB Tsai Closes #21797 from dbtsai/optimize-in. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/681845fd Tree: ht

spark git commit: [SPARK-24908][R][STYLE] removing spaces to make lintr happy

2018-07-24 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master fc21f192a -> 3efdf3532 [SPARK-24908][R][STYLE] removing spaces to make lintr happy ## What changes were proposed in this pull request? during my travails in porting spark builds to run on our centos worker, i managed to recreate (as best

spark git commit: [SPARK-24411][SQL] Adding native Java tests for 'isInCollection'

2018-08-30 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 135ff16a3 -> c685b5f56 [SPARK-24411][SQL] Adding native Java tests for 'isInCollection' ## What changes were proposed in this pull request? `JavaColumnExpressionSuite.java` was added and

spark git commit: [SPARK-24420][BUILD] Upgrade ASM to 6.1 to support JDK9+

2018-07-03 Thread dbtsai
sai Closes #21459 from dbtsai/asm. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5585c576 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5585c576 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5585c576 Bra
