[GitHub] spark pull request #20709: [SPARK-18844][MLLIB] Adding more binary classific...
Github user sandecho closed the pull request at: https://github.com/apache/spark/pull/20709 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA a...
Github user sandecho closed the pull request at: https://github.com/apache/spark/pull/20708
[GitHub] spark issue #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20708 ok to test
[GitHub] spark issue #20709: [SPARK-18844][MLLIB] Adding more binary classification e...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20709 ok to test
[GitHub] spark pull request #20063: Branch 2.1
GitHub user sandecho opened a pull request: https://github.com/apache/spark/pull/20063 Branch 2.1 ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/spark branch-2.1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20063.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20063 commit 21afc4534f90e063330ad31033aa178b37ef8340 Author: Marcelo Vanzin <vanzin@...> Date: 2017-02-22T21:19:31Z [SPARK-19652][UI] Do auth checks for REST API access (branch-2.1). The REST API has a security filter that performs auth checks based on the UI root's security manager. That works fine when the UI root is the app's UI, but not when it's the history server. In the SHS case, all users would be allowed to see all applications through the REST API, even if the UI itself wouldn't be available to them. This change adds auth checks for each app access through the API too, so that only authorized users can see the app's data. The change also modifies the existing security filter to use `HttpServletRequest.getRemoteUser()`, which is used in other places. That is not necessarily the same as the principal's name; for example, when using Hadoop's SPNEGO auth filter, the remote user strips the realm information, which then matches the user name registered as the owner of the application. 
I also renamed the UIRootFromServletContext trait to a more generic name since I'm using it to store more context information now.

Tested manually with an authentication filter enabled.

Author: Marcelo Vanzin <van...@cloudera.com>
Closes #17019 from vanzin/SPARK-19652_2.1.

commit d30238f1b9096c9fd85527d95be639de9388fcc7
Author: actuaryzhang <actuaryzhang10@...>
Date: 2017-02-23T19:12:02Z

[SPARK-19682][SPARKR] Issue warning (or error) when subset method "[[" takes vector index

## What changes were proposed in this pull request?

The `[[` method is supposed to take a single index and return a column. This is different from base R, which takes a vector index. We should check for this and issue a warning or error when a vector index is supplied (which is very likely given the behavior in base R).

Currently I'm issuing a warning message and just taking the first element of the vector index. We could change this to an error if that's better.

## How was this patch tested?

new tests

Author: actuaryzhang <actuaryzhan...@gmail.com>
Closes #17017 from actuaryzhang/sparkRSubsetter.
(cherry picked from commit 7bf09433f5c5e08154ba106be21fe24f17cd282b)
Signed-off-by: Felix Cheung <felixche...@apache.org>

commit 43084b3cc3918b720fe28053d2037fa22a71264e
Author: Herman van Hovell <hvanhovell@...>
Date: 2017-02-23T22:58:02Z

[SPARK-19459][SQL][BRANCH-2.1] Support for nested char/varchar fields in ORC

## What changes were proposed in this pull request?

This is a backport of the two following commits: https://github.com/apache/spark/commit/78eae7e67fd5dec0c2d5b1853ce86cd0f1ae & https://github.com/apache/spark/commit/de8a03e68202647555e30fffba551f65bc77608d

This PR adds support for ORC tables with (nested) char/varchar fields.

## How was this patch tested?

Added a regression test to `OrcSourceSuite`.

Author: Herman van Hovell <hvanhov...@databricks.com>
Closes #17041 from hvanhovell/SPARK-19459-branch-2.1.
commit 66a7ca28a9de92e67ce24896a851a0c96c92aec6 Author: Takeshi Yamamuro <yamamuro@...> Date: 2017-02-24T09:54:00Z [SPARK-19691][SQL][BRANCH-2.1] Fix ClassCastException when calculating percentile of decimal column ## What changes were proposed in this pull request? This is a backport of the two following commits: https://github.com/apache/spark/commit/93aa4271596a30752dc5234d869c3ae2f6e8e723 This pr fixed a class-cast exception below; ``` scala> spark.range(10).selectExpr("cast (id as decimal) as x").selectExpr("percentile(x, 0.5)").collect() java.lang.ClassCastException: org.apache.spark.sql.types.Decimal cannot be cast to java.lang.Number at org.a
[GitHub] spark pull request #20063: Branch 2.1
Github user sandecho closed the pull request at: https://github.com/apache/spark/pull/20063
[GitHub] spark issue #20063: Branch 2.1
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20063 I want to work on Spark MLLIB Jira.
[GitHub] spark pull request #20549: Add more binary classification metrics to BinaryC...
GitHub user sandecho reopened a pull request: https://github.com/apache/spark/pull/20549

Add more binary classification metrics to BinaryClassificationMetrics

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark new_branch

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20549.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20549

commit 9f33d677586043fe7c75ac1930c51c138f281a49
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-02-08T16:49:13Z

Add more binary classification metrics to BinaryClassificationMetrics

commit d7144f63a99e575d5c996fd7919bdbe44266620f
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-02-08T17:20:52Z

SPARK-18844
[GitHub] spark pull request #20549: Add more binary classification metrics to BinaryC...
Github user sandecho closed the pull request at: https://github.com/apache/spark/pull/20549
[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20549 [SPARK-18844.zip](https://github.com/apache/spark/files/1708136/SPARK-18844.zip)
[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20549 ok to test. Jenkins, add to whitelist.
[GitHub] spark pull request #20549: Add more binary classification metrics to BinaryC...
GitHub user sandecho opened a pull request: https://github.com/apache/spark/pull/20549

Add more binary classification metrics to BinaryClassificationMetrics

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark new_branch

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20549.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20549

commit 9f33d677586043fe7c75ac1930c51c138f281a49
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-02-08T16:49:13Z

Add more binary classification metrics to BinaryClassificationMetrics
[GitHub] spark pull request #20549: SPARK-18844[MLLIB] Add more binary classification...
GitHub user sandecho reopened a pull request: https://github.com/apache/spark/pull/20549

SPARK-18844[MLLIB] Add more binary classification metrics to BinaryClassificationMetrics

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark new_branch

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20549.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20549

commit 9f33d677586043fe7c75ac1930c51c138f281a49
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-02-08T16:49:13Z

Add more binary classification metrics to BinaryClassificationMetrics

commit d7144f63a99e575d5c996fd7919bdbe44266620f
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-02-08T17:20:52Z

SPARK-18844
[GitHub] spark pull request #20549: Add more binary classification metrics to BinaryC...
Github user sandecho closed the pull request at: https://github.com/apache/spark/pull/20549
[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20549 Srowen: Will the result of the test not be posted?
[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20549 As a first-time contributor and a novice at this, I will submit the final patch and leave it up to you. You can merge it or leave it. I will close the pull request after that, whether the patch is accepted or not.
[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20549 Then why has the JIRA status been left open for so many days? It was supposed to be closed earlier.
[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20549 I have committed the changes. Can you please run the test?
[GitHub] spark pull request #20549: SPARK-18844[MLLIB] Add more binary classification...
Github user sandecho closed the pull request at: https://github.com/apache/spark/pull/20549
[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20549 Can you please run the test again? [SPARK-JIRA-18844.zip](https://github.com/apache/spark/files/1724418/SPARK-JIRA-18844.zip)
[GitHub] spark issue #20609: SPARK-18844[MLLIB] Add more binary classification metric...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20609 ok to test
[GitHub] spark pull request #20609: SPARK-18844[MLLIB] Add more binary classification...
Github user sandecho closed the pull request at: https://github.com/apache/spark/pull/20609
[GitHub] spark pull request #20609: SPARK-18844[MLLIB] Add more binary classification...
Github user sandecho closed the pull request at: https://github.com/apache/spark/pull/20609
[GitHub] spark pull request #20549: SPARK-18844[MLLIB] Add more binary classification...
GitHub user sandecho reopened a pull request: https://github.com/apache/spark/pull/20549

SPARK-18844[MLLIB] Add more binary classification metrics to BinaryClassificationMetrics

## What changes were proposed in this pull request?

In this PR, more binary classification metrics have been added to BinaryClassificationMetrics, as described in SPARK-18844.

## How was this patch tested?

By running the existing unit tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark new_branch

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20549.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20549

commit 9f33d677586043fe7c75ac1930c51c138f281a49
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-02-08T16:49:13Z

Add more binary classification metrics to BinaryClassificationMetrics

commit d7144f63a99e575d5c996fd7919bdbe44266620f
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-02-08T17:20:52Z

SPARK-18844

commit 981a1c14892e7e458e1492b3fdb6c77bbb35a0fb
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-02-11T17:18:01Z

SPARK JIRA 18844

commit 47e56658b83b4c2763f636ba025bdfa39a635960
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-02-14T14:38:06Z

SPARK JIRA 18844
[GitHub] spark pull request #20609: SPARK-18844[MLLIB] Add more binary classification...
GitHub user sandecho opened a pull request: https://github.com/apache/spark/pull/20609

SPARK-18844[MLLIB] Add more binary classification metrics to BinaryClassificationMetrics with Examples

## What changes were proposed in this pull request?

In this PR, more binary classification metrics have been added to BinaryClassificationMetrics, as described in SPARK-18844.

## How was this patch tested?

By running the existing unit tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark new_branch

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20609.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20609

commit 9f33d677586043fe7c75ac1930c51c138f281a49
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-02-08T16:49:13Z

Add more binary classification metrics to BinaryClassificationMetrics

commit d7144f63a99e575d5c996fd7919bdbe44266620f
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-02-08T17:20:52Z

SPARK-18844

commit 981a1c14892e7e458e1492b3fdb6c77bbb35a0fb
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-02-11T17:18:01Z

SPARK JIRA 18844

commit 47e56658b83b4c2763f636ba025bdfa39a635960
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-02-14T14:38:06Z

SPARK JIRA 18844
[GitHub] spark pull request #20609: SPARK-18844[MLLIB] Add more binary classification...
GitHub user sandecho reopened a pull request: https://github.com/apache/spark/pull/20609

SPARK-18844[MLLIB] Add more binary classification metrics to BinaryClassificationMetrics with Examples

## What changes were proposed in this pull request?

In this PR, more binary classification metrics have been added to BinaryClassificationMetrics, as described in SPARK-18844.

## How was this patch tested?

By running the existing unit tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark new_branch

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20609.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20609

commit 9f33d677586043fe7c75ac1930c51c138f281a49
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-02-08T16:49:13Z

Add more binary classification metrics to BinaryClassificationMetrics

commit d7144f63a99e575d5c996fd7919bdbe44266620f
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-02-08T17:20:52Z

SPARK-18844

commit 981a1c14892e7e458e1492b3fdb6c77bbb35a0fb
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-02-11T17:18:01Z

SPARK JIRA 18844

commit 47e56658b83b4c2763f636ba025bdfa39a635960
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-02-14T14:38:06Z

SPARK JIRA 18844
[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20549 I will generate the patch once again and submit
[GitHub] spark pull request #20709: [SPARK-18844][MLLIB] Adding more binary classific...
GitHub user sandecho opened a pull request: https://github.com/apache/spark/pull/20709

[SPARK-18844][MLLIB] Adding more binary classification evaluation metrics

## What changes were proposed in this pull request?

The following additional binary classification metrics are added:

- False omission rate: `forByThreshold`
- False discovery rate: `fdrByThreshold`
- Negative predictive value: `npvByThreshold`
- False negative rate: `fnrByThreshold`
- True negative rate (Specificity): `specificityByThreshold`
- False positive rate: `fprByThreshold`

## How was this patch tested?

Unit Testing

[EvaluationMetrics.zip](https://github.com/apache/spark/files/1772914/EvaluationMetrics.zip)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark binary

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20709.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20709

commit cb5dce1565edca67a3763b7610137b48545ea998
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-03-01T16:15:12Z

Adding more binary classification evaluation metrics
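The six metrics named in pull request #20709 are all ratios of confusion-matrix counts taken at one threshold. The following is a minimal plain-Python sketch of those standard definitions, not the code in the patch and not Spark's `BinaryClassificationMetrics` API. It assumes a list of `(score, label)` pairs with labels 0 or 1, and assumes that a score at or above the threshold counts as a positive prediction. The names `Confusion`, `confusion_at`, and `extra_metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of true/false positives and negatives at one threshold."""
    tp: int
    fp: int
    tn: int
    fn: int


def _ratio(numerator, denominator):
    # Convention chosen for this sketch: an empty denominator yields 0.0.
    return numerator / denominator if denominator else 0.0


def confusion_at(scores_and_labels, threshold):
    """Count outcomes, predicting positive when score >= threshold (assumed)."""
    tp = fp = tn = fn = 0
    for score, label in scores_and_labels:
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, tn, fn)


def extra_metrics(c):
    """Standard definitions of the six metrics listed in the pull request."""
    return {
        "false_omission_rate": _ratio(c.fn, c.fn + c.tn),
        "false_discovery_rate": _ratio(c.fp, c.fp + c.tp),
        "negative_predictive_value": _ratio(c.tn, c.tn + c.fn),
        "false_negative_rate": _ratio(c.fn, c.fn + c.tp),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
    }


# Illustrative data only: three positives and three negatives.
sample = [(0.9, 1), (0.8, 0), (0.7, 1), (0.4, 1), (0.3, 0), (0.1, 0)]
metrics_at_half = extra_metrics(confusion_at(sample, 0.5))
```

Several of these are complements of one another (for example, false omission rate plus negative predictive value is 1 whenever any negatives are predicted), which is one way to sanity-check an implementation.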
[GitHub] spark issue #20709: [SPARK-18844][MLLIB] Adding more binary classification e...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20709 ok to test
[GitHub] spark pull request #20707: [SPARK-21209][MLLLIB] Implement Incremental PCA a...
GitHub user sandecho reopened a pull request: https://github.com/apache/spark/pull/20707

[SPARK-21209][MLLLIB] Implement Incremental PCA algorithm

## What changes were proposed in this pull request?

A new feature, the Incremental Principal Component Analysis (IPCA) algorithm, is proposed. It divides the incoming data into batches and computes the PCA of each batch to generate the principal components of the entire data set.

## How was this patch tested?

Unit Testing

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.3

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20707.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20707

commit 6bb22961c0c9df1a1f22e9491894895b297f5288
Author: Sameer Agarwal <sameerag@...>
Date: 2018-01-11T23:23:17Z

Preparing development version 2.3.1-SNAPSHOT

commit 55695c7127cb2f357dfdf677cab4d21fc840aa3d
Author: WeichenXu <weichen.xu@...>
Date: 2018-01-12T00:20:30Z

[SPARK-23008][ML] OnehotEncoderEstimator python API

## What changes were proposed in this pull request?

OnehotEncoderEstimator python API.

## How was this patch tested?

doctest

Author: WeichenXu <weichen...@databricks.com>
Closes #20209 from WeichenXu123/ohe_py.
(cherry picked from commit b5042d75c2faa5f15bc1e160d75f06dfdd6eea37)
Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit 3ae3e1bb71aa88be1c963b4416986ef679d7c8a2
Author: ho3rexqj <ho3rexqj@...>
Date: 2018-01-12T07:27:00Z

[SPARK-22986][CORE] Use a cache to avoid instantiating multiple instances of broadcast variable values

When resources happen to be constrained on an executor the first time a broadcast variable is instantiated it is persisted to disk by the BlockManager.
Consequently, every subsequent call to TorrentBroadcast::readBroadcastBlock from other instances of that broadcast variable spawns another instance of the underlying value. That is, broadcast variables are spawned once per executor **unless** memory is constrained, in which case every instance of a broadcast variable is provided with a unique copy of the underlying value. This patch fixes the above by explicitly caching the underlying values using weak references in a ReferenceMap. Author: ho3rexqj <ho3re...@gmail.com> Closes #20183 from ho3rexqj/fix/cache-broadcast-values. (cherry picked from commit cbe7c6fbf9dc2fc422b93b3644c40d449a869eea) Signed-off-by: Wenchen Fan <wenc...@databricks.com> commit d512d873b3f445845bd113272d7158388427f8a6 Author: WeichenXu <weichen.xu@...> Date: 2018-01-12T09:27:02Z [SPARK-23008][ML][FOLLOW-UP] mark OneHotEncoder python API deprecated ## What changes were proposed in this pull request? mark OneHotEncoder python API deprecated ## How was this patch tested? N/A Author: WeichenXu <weichen...@databricks.com> Closes #20241 from WeichenXu123/mark_ohe_deprecated. (cherry picked from commit a7d98d53ceaf69cabaecc6c9113f17438c4e61f6) Signed-off-by: Nick Pentreath <ni...@za.ibm.com> commit 6152da3893a05b3f8dc0f13895af9be9548e5895 Author: Marco Gaido <marcogaido91@...> Date: 2018-01-12T10:04:44Z [SPARK-23025][SQL] Support Null type in scala reflection ## What changes were proposed in this pull request? Add support for `Null` type in the `schemaFor` method for Scala reflection. ## How was this patch tested? Added UT Author: Marco Gaido <marcogaid...@gmail.com> Closes #20219 from mgaido91/SPARK-23025. (cherry picked from commit 505086806997b4331d4a8c2fc5e08345d869a23c) Signed-off-by: gatorsmile <gatorsm...@gmail.com> commit db27a93652780f234f3c5fe750ef07bc5525d177 Author: Dongjoon Hyun <dongjoon@...> Date: 2018-01-12T18:18:42Z [MINOR][BUILD] Fix Java linter errors ## What changes were proposed in this pull request? 
This PR cleans up the java-lint errors (for v2.3.0-rc1 tag). Hopefully, this will be the final one. ``` $ dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks failed at following occurrences: [ERROR] src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java:[85] (sizes) LineLength: Line is longer than 100 characters (found 101). [ERROR] src/main/java/org/apache/spark/launcher/InProcessAppHandle.java:[20,8] (imports) UnusedImports: Unused import - java.io.IOException. [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java:[41,9] (modifier) ModifierOrder: '
[GitHub] spark pull request #20707: [SPARK-21209][MLLLIB] Implement Incremental PCA a...
Github user sandecho closed the pull request at: https://github.com/apache/spark/pull/20707
[GitHub] spark pull request #20549: SPARK-18844[MLLIB] Add more binary classification...
Github user sandecho closed the pull request at: https://github.com/apache/spark/pull/20549
[GitHub] spark issue #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20708 ok to test
[GitHub] spark pull request #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA a...
GitHub user sandecho opened a pull request: https://github.com/apache/spark/pull/20708

[SPARK-21209][MLLLIB] Implement Incremental PCA algorithm

## What changes were proposed in this pull request?

A new feature, the Incremental Principal Component Analysis (IPCA) algorithm, is proposed. It divides the incoming data into batches and computes the PCA of each batch to generate the principal components of the entire data set.

## How was this patch tested?

Unit Testing

[IPCA.zip](https://github.com/apache/spark/files/1772562/IPCA.zip)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark IPCA

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20708.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20708

commit 7900d21138de542fd89763a68417d74792725afd
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-03-01T13:35:20Z

Implemented Incremental PCA
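The pull request describes processing the data batch by batch to arrive at principal components for the whole data set, but the patch itself is not shown here. As a rough illustration of the batch-wise idea only, the sketch below swaps in a simpler technique: it merges each batch into a running mean and scatter matrix (the pairwise update for combining two groups' statistics) and then finds the leading component by power iteration. This is an assumption-laden stand-in, not the algorithm in the patch and not a Spark API. The names `RunningCovariance` and `top_component` are hypothetical.

```python
import math


class RunningCovariance:
    """Accumulates the mean and scatter matrix one batch at a time."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.scatter = [[0.0] * dim for _ in range(dim)]

    def update(self, batch):
        """Merge a batch (a list of equal-length rows) into the running statistics."""
        m = len(batch)
        if m == 0:
            return
        dim = len(self.mean)
        batch_mean = [sum(row[j] for row in batch) / m for j in range(dim)]
        total = self.n + m
        delta = [batch_mean[j] - self.mean[j] for j in range(dim)]
        # Weight of the correction term that accounts for the shift between
        # the old mean and the batch mean; it is zero for the first batch.
        weight = self.n * m / total
        for i in range(dim):
            for j in range(dim):
                within = sum(
                    (row[i] - batch_mean[i]) * (row[j] - batch_mean[j])
                    for row in batch
                )
                self.scatter[i][j] += within + weight * delta[i] * delta[j]
        self.mean = [self.mean[j] + delta[j] * m / total for j in range(dim)]
        self.n = total

    def covariance(self):
        """Sample covariance of all rows seen so far (needs at least 2 rows)."""
        return [[value / (self.n - 1) for value in row] for row in self.scatter]


def top_component(cov, iterations=200):
    """Leading eigenvector of a symmetric matrix by power iteration.

    Starts from a uniform vector, so it will not converge if that vector
    happens to be orthogonal to the leading eigenvector.
    """
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


# Illustrative data only: four points on the line y = 2x, fed in two batches.
stats = RunningCovariance(dim=2)
for batch in ([[1.0, 2.0], [2.0, 4.0]], [[3.0, 6.0], [4.0, 8.0]]):
    stats.update(batch)
component = top_component(stats.covariance())
```

Only the running statistics are kept between batches, so memory use depends on the number of features rather than the number of rows. A production implementation would more likely update a truncated decomposition directly instead of forming the full covariance matrix.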
[GitHub] spark pull request #20707: [SPARK-21209][MLLLIB] Implement Incremental PCA a...
GitHub user sandecho opened a pull request: https://github.com/apache/spark/pull/20707

[SPARK-21209][MLLLIB] Implement Incremental PCA algorithm

## What changes were proposed in this pull request?

A new feature, the Incremental Principal Component Analysis (IPCA) algorithm, is proposed. It divides the incoming data into batches and computes the PCA of each batch to generate the principal components of the entire data set.

## How was this patch tested?

Unit Testing

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.3

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20707.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20707

commit 6bb22961c0c9df1a1f22e9491894895b297f5288
Author: Sameer Agarwal <sameerag@...>
Date: 2018-01-11T23:23:17Z

Preparing development version 2.3.1-SNAPSHOT

commit 55695c7127cb2f357dfdf677cab4d21fc840aa3d
Author: WeichenXu <weichen.xu@...>
Date: 2018-01-12T00:20:30Z

[SPARK-23008][ML] OnehotEncoderEstimator python API

## What changes were proposed in this pull request?

OnehotEncoderEstimator python API.

## How was this patch tested?

doctest

Author: WeichenXu <weichen...@databricks.com>
Closes #20209 from WeichenXu123/ohe_py.
(cherry picked from commit b5042d75c2faa5f15bc1e160d75f06dfdd6eea37)
Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit 3ae3e1bb71aa88be1c963b4416986ef679d7c8a2
Author: ho3rexqj <ho3rexqj@...>
Date: 2018-01-12T07:27:00Z

[SPARK-22986][CORE] Use a cache to avoid instantiating multiple instances of broadcast variable values

When resources happen to be constrained on an executor the first time a broadcast variable is instantiated it is persisted to disk by the BlockManager.
Consequently, every subsequent call to TorrentBroadcast::readBroadcastBlock from other instances of that broadcast variable spawns another instance of the underlying value. That is, broadcast variables are spawned once per executor **unless** memory is constrained, in which case every instance of a broadcast variable is provided with a unique copy of the underlying value. This patch fixes the above by explicitly caching the underlying values using weak references in a ReferenceMap. Author: ho3rexqj <ho3re...@gmail.com> Closes #20183 from ho3rexqj/fix/cache-broadcast-values. (cherry picked from commit cbe7c6fbf9dc2fc422b93b3644c40d449a869eea) Signed-off-by: Wenchen Fan <wenc...@databricks.com> commit d512d873b3f445845bd113272d7158388427f8a6 Author: WeichenXu <weichen.xu@...> Date: 2018-01-12T09:27:02Z [SPARK-23008][ML][FOLLOW-UP] mark OneHotEncoder python API deprecated ## What changes were proposed in this pull request? mark OneHotEncoder python API deprecated ## How was this patch tested? N/A Author: WeichenXu <weichen...@databricks.com> Closes #20241 from WeichenXu123/mark_ohe_deprecated. (cherry picked from commit a7d98d53ceaf69cabaecc6c9113f17438c4e61f6) Signed-off-by: Nick Pentreath <ni...@za.ibm.com> commit 6152da3893a05b3f8dc0f13895af9be9548e5895 Author: Marco Gaido <marcogaido91@...> Date: 2018-01-12T10:04:44Z [SPARK-23025][SQL] Support Null type in scala reflection ## What changes were proposed in this pull request? Add support for `Null` type in the `schemaFor` method for Scala reflection. ## How was this patch tested? Added UT Author: Marco Gaido <marcogaid...@gmail.com> Closes #20219 from mgaido91/SPARK-23025. (cherry picked from commit 505086806997b4331d4a8c2fc5e08345d869a23c) Signed-off-by: gatorsmile <gatorsm...@gmail.com> commit db27a93652780f234f3c5fe750ef07bc5525d177 Author: Dongjoon Hyun <dongjoon@...> Date: 2018-01-12T18:18:42Z [MINOR][BUILD] Fix Java linter errors ## What changes were proposed in this pull request? 
This PR cleans up the java-lint errors (for v2.3.0-rc1 tag). Hopefully, this will be the final one. ``` $ dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks failed at following occurrences: [ERROR] src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java:[85] (sizes) LineLength: Line is longer than 100 characters (found 101). [ERROR] src/main/java/org/apache/spark/launcher/InProcessAppHandle.java:[20,8] (imports) UnusedImports: Unused import - java.io.IOException. [ERROR] src/main/java/org/apache/spark
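The description of pull request #20707 above says the proposed algorithm splits the incoming data into batches and derives the principal components of the whole data set from per-batch work. The patch itself is not shown in this thread, so the following is only a minimal sketch of that batch-wise idea, not the code from the pull request. It assumes a simpler technique than a full incremental SVD: each batch contributes its count, feature sums, and sum of outer products, these are pooled into one covariance matrix, and the leading component is found by power iteration. All function names (`batch_stats`, `merge_stats`, `covariance`, `leading_component`, `top_component_in_batches`) are hypothetical and chosen for illustration.

```python
import math


def batch_stats(batch):
    """Summarise one batch as (row count, per-feature sums, sum of outer products)."""
    d = len(batch[0])
    total = [0.0] * d
    outer = [[0.0] * d for _ in range(d)]
    for row in batch:
        for i in range(d):
            total[i] += row[i]
            for j in range(d):
                outer[i][j] += row[i] * row[j]
    return len(batch), total, outer


def merge_stats(a, b):
    """Combine the summaries of two batches into one summary."""
    n_a, sum_a, outer_a = a
    n_b, sum_b, outer_b = b
    total = [x + y for x, y in zip(sum_a, sum_b)]
    outer = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(outer_a, outer_b)]
    return n_a + n_b, total, outer


def covariance(stats):
    """Population covariance matrix recovered from a pooled summary."""
    n, total, outer = stats
    d = len(total)
    mean = [s / n for s in total]
    return [[outer[i][j] / n - mean[i] * mean[j] for j in range(d)] for i in range(d)]


def leading_component(cov, iterations=200):
    """Power iteration: approximate the eigenvector with the largest eigenvalue."""
    d = len(cov)
    v = [1.0] * d  # assumed starting vector; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v
        v = [x / norm for x in w]
    return v


def top_component_in_batches(rows, batch_size):
    """Process rows batch by batch and return the first principal component."""
    stats = None
    for start in range(0, len(rows), batch_size):
        current = batch_stats(rows[start:start + batch_size])
        stats = current if stats is None else merge_stats(stats, current)
    return leading_component(covariance(stats))


# Illustrative data lying close to the line y = 2x.
rows = [(float(t), 2.0 * t + 0.01 * (-1) ** k) for k, t in enumerate(range(-5, 6))]
component = top_component_in_batches(rows, batch_size=3)
print(component)
```

Because only the pooled summary is kept between batches, memory use in this sketch depends on the number of features rather than the number of rows, which is the usual motivation for a batch-wise PCA.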
[GitHub] spark issue #20707: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20707 ok to test
[GitHub] spark pull request #20707: [SPARK-21209][MLLLIB] Implement Incremental PCA a...
Github user sandecho closed the pull request at: https://github.com/apache/spark/pull/20707
[GitHub] spark issue #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20708 @sethah Thank you. I accept your recommendation and will move it to ML. I have also written unit tests and adhered to the style guidelines. My concern is that no one is discussing this on the JIRA; even the creator of the JIRA, @wbstclair, is not reachable.
[GitHub] spark issue #20709: [SPARK-18844][MLLIB] Adding more binary classification e...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20709 @sethah Would you recommend closing this one and opening the previous one?
[GitHub] spark issue #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20708 Thanks, @wbstclair. That's a good suggestion. Although I would have to move it from MLLIB to ML, the rest will stay the same.
[GitHub] spark issue #20709: [SPARK-18844][MLLIB] Adding more binary classification e...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20709 Actually, the previous pull request could not be merged, so I opened a new one. Can you please run the tests?
[GitHub] spark issue #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20708 ok to test
[GitHub] spark issue #20709: [SPARK-18844][MLLIB] Adding more binary classification e...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20709 Can you please test it?