[GitHub] spark pull request #20709: [SPARK-18844][MLLIB] Adding more binary classific...

2018-06-11 Thread sandecho

Github user sandecho closed the pull request at:

https://github.com/apache/spark/pull/20709


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA a...

2018-06-11 Thread sandecho

Github user sandecho closed the pull request at:

https://github.com/apache/spark/pull/20708


---


[GitHub] spark issue #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...

2018-06-09 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20708
  
ok to test


---


[GitHub] spark issue #20709: [SPARK-18844][MLLIB] Adding more binary classification e...

2018-06-09 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20709
  
ok to test


---


[GitHub] spark pull request #20063: Branch 2.1

2017-12-23 Thread sandecho

GitHub user sandecho opened a pull request:

https://github.com/apache/spark/pull/20063

Branch 2.1

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20063.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20063


commit 21afc4534f90e063330ad31033aa178b37ef8340
Author: Marcelo Vanzin <vanzin@...>
Date:   2017-02-22T21:19:31Z

[SPARK-19652][UI] Do auth checks for REST API access (branch-2.1).

The REST API has a security filter that performs auth checks
based on the UI root's security manager. That works fine when
the UI root is the app's UI, but not when it's the history server.

In the SHS case, all users would be allowed to see all applications
through the REST API, even if the UI itself wouldn't be available
to them.

This change adds auth checks for each app access through the API
too, so that only authorized users can see the app's data.

The change also modifies the existing security filter to use
`HttpServletRequest.getRemoteUser()`, which is used in other
places. That is not necessarily the same as the principal's
name; for example, when using Hadoop's SPNEGO auth filter,
the remote user strips the realm information, which then matches
the user name registered as the owner of the application.

I also renamed the UIRootFromServletContext trait to a more generic
name since I'm using it to store more context information now.

Tested manually with an authentication filter enabled.

Author: Marcelo Vanzin <van...@cloudera.com>

Closes #17019 from vanzin/SPARK-19652_2.1.

commit d30238f1b9096c9fd85527d95be639de9388fcc7
Author: actuaryzhang <actuaryzhang10@...>
Date:   2017-02-23T19:12:02Z

[SPARK-19682][SPARKR] Issue warning (or error) when subset method "[[" 
takes vector index

## What changes were proposed in this pull request?
The `[[` method is supposed to take a single index and return a column. 
This is different from base R which takes a vector index.  We should check for 
this and issue warning or error when vector index is supplied (which is very 
likely given the behavior in base R).

Currently I'm issuing a warning message and just taking the first element of 
the vector index. We could change this to an error if that's better.

## How was this patch tested?
new tests

Author: actuaryzhang <actuaryzhan...@gmail.com>

Closes #17017 from actuaryzhang/sparkRSubsetter.

(cherry picked from commit 7bf09433f5c5e08154ba106be21fe24f17cd282b)
Signed-off-by: Felix Cheung <felixche...@apache.org>

commit 43084b3cc3918b720fe28053d2037fa22a71264e
Author: Herman van Hovell <hvanhovell@...>
Date:   2017-02-23T22:58:02Z

[SPARK-19459][SQL][BRANCH-2.1] Support for nested char/varchar fields in ORC

## What changes were proposed in this pull request?
This is a backport of the two following commits: 
https://github.com/apache/spark/commit/78eae7e67fd5dec0c2d5b1853ce86cd0f1ae 
& 
https://github.com/apache/spark/commit/de8a03e68202647555e30fffba551f65bc77608d

This PR adds support for ORC tables with (nested) char/varchar fields.

## How was this patch tested?
Added a regression test to `OrcSourceSuite`.

Author: Herman van Hovell <hvanhov...@databricks.com>

Closes #17041 from hvanhovell/SPARK-19459-branch-2.1.

commit 66a7ca28a9de92e67ce24896a851a0c96c92aec6
Author: Takeshi Yamamuro <yamamuro@...>
Date:   2017-02-24T09:54:00Z

[SPARK-19691][SQL][BRANCH-2.1] Fix ClassCastException when calculating 
percentile of decimal column

## What changes were proposed in this pull request?
This is a backport of the following commit: 
https://github.com/apache/spark/commit/93aa4271596a30752dc5234d869c3ae2f6e8e723

This PR fixes the class-cast exception shown below:
```
scala> spark.range(10).selectExpr("cast (id as decimal) as x").selectExpr("percentile(x, 0.5)").collect()
 java.lang.ClassCastException: org.apache.spark.sql.types.Decimal cannot be cast to java.lang.Number
at 
org.a

[GitHub] spark pull request #20063: Branch 2.1

2017-12-23 Thread sandecho

Github user sandecho closed the pull request at:

https://github.com/apache/spark/pull/20063


---


[GitHub] spark issue #20063: Branch 2.1

2017-12-23 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20063
  
I want to work on Spark MLLIB Jira.


---


[GitHub] spark pull request #20549: Add more binary classification metrics to BinaryC...

2018-02-08 Thread sandecho

GitHub user sandecho reopened a pull request:

https://github.com/apache/spark/pull/20549

Add more binary classification metrics to BinaryClassificationMetrics

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark new_branch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20549.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20549


commit 9f33d677586043fe7c75ac1930c51c138f281a49
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-02-08T16:49:13Z

Add more binary classification metrics to BinaryClassificationMetrics

commit d7144f63a99e575d5c996fd7919bdbe44266620f
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-02-08T17:20:52Z

SPARK-18844




---


[GitHub] spark pull request #20549: Add more binary classification metrics to BinaryC...

2018-02-08 Thread sandecho

Github user sandecho closed the pull request at:

https://github.com/apache/spark/pull/20549


---


[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...

2018-02-08 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20549
  


[SPARK-18844.zip](https://github.com/apache/spark/files/1708136/SPARK-18844.zip)




---


[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...

2018-02-08 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20549
  
ok to test. Jenkins, add to whitelist.  


---


[GitHub] spark pull request #20549: Add more binary classification metrics to BinaryC...

2018-02-08 Thread sandecho

GitHub user sandecho opened a pull request:

https://github.com/apache/spark/pull/20549

Add more binary classification metrics to BinaryClassificationMetrics

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark new_branch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20549.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20549


commit 9f33d677586043fe7c75ac1930c51c138f281a49
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-02-08T16:49:13Z

Add more binary classification metrics to BinaryClassificationMetrics




---


[GitHub] spark pull request #20549: SPARK-18844[MLLIB] Add more binary classification...

2018-02-08 Thread sandecho

GitHub user sandecho reopened a pull request:

https://github.com/apache/spark/pull/20549

SPARK-18844[MLLIB] Add more binary classification metrics to 
BinaryClassificationMetrics

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark new_branch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20549.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20549


commit 9f33d677586043fe7c75ac1930c51c138f281a49
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-02-08T16:49:13Z

Add more binary classification metrics to BinaryClassificationMetrics

commit d7144f63a99e575d5c996fd7919bdbe44266620f
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-02-08T17:20:52Z

SPARK-18844




---


[GitHub] spark pull request #20549: Add more binary classification metrics to BinaryC...

2018-02-08 Thread sandecho

Github user sandecho closed the pull request at:

https://github.com/apache/spark/pull/20549


---


[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...

2018-02-08 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20549
  
Srowen: Will the result of the test not be posted?


---


[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...

2018-02-09 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20549
  
As a first-time contributor and a novice at this, I will submit the final 
patch and leave it up to you. You can merge it or leave it. I will close the 
pull request after that, whether the patch is accepted or not.


---


[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...

2018-02-09 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20549
  
Then why was the JIRA status left open for so many days? It was 
supposed to be closed earlier.


---


[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...

2018-02-11 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20549
  
I have committed the changes. Can you please run the test?


---


[GitHub] spark pull request #20549: SPARK-18844[MLLIB] Add more binary classification...

2018-02-14 Thread sandecho

Github user sandecho closed the pull request at:

https://github.com/apache/spark/pull/20549


---


[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...

2018-02-14 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20549
  
Can you please run the test again?

[SPARK-JIRA-18844.zip](https://github.com/apache/spark/files/1724418/SPARK-JIRA-18844.zip)



---


[GitHub] spark issue #20609: SPARK-18844[MLLIB] Add more binary classification metric...

2018-02-14 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20609
  
ok to test


---


[GitHub] spark pull request #20609: SPARK-18844[MLLIB] Add more binary classification...

2018-02-14 Thread sandecho

Github user sandecho closed the pull request at:

https://github.com/apache/spark/pull/20609


---


[GitHub] spark pull request #20609: SPARK-18844[MLLIB] Add more binary classification...

2018-02-14 Thread sandecho

Github user sandecho closed the pull request at:

https://github.com/apache/spark/pull/20609


---


[GitHub] spark pull request #20549: SPARK-18844[MLLIB] Add more binary classification...

2018-02-14 Thread sandecho

GitHub user sandecho reopened a pull request:

https://github.com/apache/spark/pull/20549

SPARK-18844[MLLIB] Add more binary classification metrics to 
BinaryClassificationMetrics

## What changes were proposed in this pull request?

In this PR, more binary classification metrics have been added to 
BinaryClassificationMetrics, as described in SPARK-18844.

## How was this patch tested?
By running the existing unit tests.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark new_branch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20549.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20549


commit 9f33d677586043fe7c75ac1930c51c138f281a49
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-02-08T16:49:13Z

Add more binary classification metrics to BinaryClassificationMetrics

commit d7144f63a99e575d5c996fd7919bdbe44266620f
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-02-08T17:20:52Z

SPARK-18844

commit 981a1c14892e7e458e1492b3fdb6c77bbb35a0fb
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-02-11T17:18:01Z

SPARK JIRA 18844

commit 47e56658b83b4c2763f636ba025bdfa39a635960
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-02-14T14:38:06Z

SPARK JIRA 18844




---


[GitHub] spark pull request #20609: SPARK-18844[MLLIB] Add more binary classification...

2018-02-14 Thread sandecho

GitHub user sandecho opened a pull request:

https://github.com/apache/spark/pull/20609

SPARK-18844[MLLIB] Add more binary classification metrics to 
BinaryClassificationMetrics with Examples

## What changes were proposed in this pull request?

In this PR, more binary classification metrics have been added to 
BinaryClassificationMetrics, as described in SPARK-18844.

## How was this patch tested?

By running the existing unit tests.
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark new_branch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20609.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20609


commit 9f33d677586043fe7c75ac1930c51c138f281a49
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-02-08T16:49:13Z

Add more binary classification metrics to BinaryClassificationMetrics

commit d7144f63a99e575d5c996fd7919bdbe44266620f
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-02-08T17:20:52Z

SPARK-18844

commit 981a1c14892e7e458e1492b3fdb6c77bbb35a0fb
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-02-11T17:18:01Z

SPARK JIRA 18844

commit 47e56658b83b4c2763f636ba025bdfa39a635960
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-02-14T14:38:06Z

SPARK JIRA 18844




---


[GitHub] spark pull request #20609: SPARK-18844[MLLIB] Add more binary classification...

2018-02-14 Thread sandecho

GitHub user sandecho reopened a pull request:

https://github.com/apache/spark/pull/20609

SPARK-18844[MLLIB] Add more binary classification metrics to 
BinaryClassificationMetrics with Examples

## What changes were proposed in this pull request?

In this PR, more binary classification metrics have been added to 
BinaryClassificationMetrics, as described in SPARK-18844.

## How was this patch tested?

By running the existing unit tests.
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark new_branch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20609.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20609


commit 9f33d677586043fe7c75ac1930c51c138f281a49
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-02-08T16:49:13Z

Add more binary classification metrics to BinaryClassificationMetrics

commit d7144f63a99e575d5c996fd7919bdbe44266620f
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-02-08T17:20:52Z

SPARK-18844

commit 981a1c14892e7e458e1492b3fdb6c77bbb35a0fb
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-02-11T17:18:01Z

SPARK JIRA 18844

commit 47e56658b83b4c2763f636ba025bdfa39a635960
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-02-14T14:38:06Z

SPARK JIRA 18844




---


[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...

2018-02-09 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20549
  
I will generate the patch once again and submit it.


---


[GitHub] spark pull request #20709: [SPARK-18844][MLLIB] Adding more binary classific...

2018-03-01 Thread sandecho

GitHub user sandecho opened a pull request:

https://github.com/apache/spark/pull/20709

[SPARK-18844][MLLIB] Adding more binary classification evaluation metrics

## What changes were proposed in this pull request?

The following additional binary classification metrics are added.
False omission rate: `forByThreshold`
False discovery rate: `fdrByThreshold`
Negative predictive value: `npvByThreshold`
False negative rate: `fnrByThreshold`
True negative rate (Specificity): `specificityByThreshold`
False positive rate: `fprByThreshold`
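
For reference, each of these rates follows directly from the confusion-matrix counts at a given threshold. The sketch below is plain Python rather than the Scala code in the PR, which is not shown in this message. The names `ConfusionCounts` and `extra_metrics` are hypothetical, and returning 0.0 when a denominator is empty is a convention assumed here, not something the PR states.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts at one decision threshold (hypothetical type)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator: int, denominator: int) -> float:
    # Assumption of this sketch: an empty denominator yields 0.0.
    return numerator / denominator if denominator else 0.0


def extra_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute the six rates listed above from raw counts."""
    return {
        # Share of predicted negatives that are actually positive.
        "false_omission_rate": _ratio(c.fn, c.fn + c.tn),
        # Share of predicted positives that are actually negative.
        "false_discovery_rate": _ratio(c.fp, c.fp + c.tp),
        # Share of predicted negatives that are actually negative.
        "negative_predictive_value": _ratio(c.tn, c.tn + c.fn),
        # Share of actual positives predicted negative.
        "false_negative_rate": _ratio(c.fn, c.fn + c.tp),
        # Share of actual negatives predicted negative (true negative rate).
        "specificity": _ratio(c.tn, c.tn + c.fp),
        # Share of actual negatives predicted positive.
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
    }
```

Calling `extra_metrics(ConfusionCounts(tp=40, fp=10, tn=45, fn=5))` returns all six values at once. Two pairs are complementary whenever their denominators are non-zero: false omission rate plus negative predictive value equals 1, and so does false positive rate plus specificity.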

## How was this patch tested?
Unit Testing

[EvaluationMetrics.zip](https://github.com/apache/spark/files/1772914/EvaluationMetrics.zip)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark binary

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20709.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20709


commit cb5dce1565edca67a3763b7610137b48545ea998
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-03-01T16:15:12Z

Adding more binary classification evaluation metrics




---


[GitHub] spark issue #20709: [SPARK-18844][MLLIB] Adding more binary classification e...

2018-03-01 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20709
  
ok to test


---


[GitHub] spark pull request #20707: [SPARK-21209][MLLLIB] Implement Incremental PCA a...

2018-03-01 Thread sandecho

GitHub user sandecho reopened a pull request:

https://github.com/apache/spark/pull/20707

[SPARK-21209][MLLLIB] Implement Incremental PCA algorithm

## What changes were proposed in this pull request?

A new feature, the Incremental Principal Component Analysis (IPCA) 
algorithm, is proposed. It splits the incoming data into batches of a fixed 
size and computes the PCA of each batch to build up the principal components 
of the entire data set.
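
The patch itself is not part of this message, so the following is only a rough illustration of the batch-wise idea. The plain-Python sketch merges per-batch means and scatter matrices into a running covariance (a pairwise update of the kind described by Chan et al.) and then extracts the leading component by power iteration. This is a swapped-in technique chosen for brevity and may differ from the algorithm in the PR. `BatchCovariance` and `top_component` are hypothetical names.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


class BatchCovariance:
    """Running mean and scatter matrix, updated one batch at a time."""

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.mean: Vector = [0.0] * dim
        self.scatter: Matrix = [[0.0] * dim for _ in range(dim)]

    def update(self, batch: Sequence[Sequence[float]]) -> None:
        m = len(batch)
        if m == 0:
            return
        dim = len(self.mean)
        batch_mean = [sum(row[j] for row in batch) / m for j in range(dim)]
        # Scatter of the batch around its own mean.
        batch_scatter = [[0.0] * dim for _ in range(dim)]
        for row in batch:
            d = [row[j] - batch_mean[j] for j in range(dim)]
            for i in range(dim):
                for j in range(dim):
                    batch_scatter[i][j] += d[i] * d[j]
        # Merge the batch statistics into the running statistics.
        total = self.n + m
        delta = [batch_mean[j] - self.mean[j] for j in range(dim)]
        weight = self.n * m / total
        for i in range(dim):
            for j in range(dim):
                self.scatter[i][j] += batch_scatter[i][j] + weight * delta[i] * delta[j]
        self.mean = [self.mean[j] + delta[j] * m / total for j in range(dim)]
        self.n = total

    def covariance(self) -> Matrix:
        if self.n < 2:
            raise ValueError("need at least two samples")
        return [[v / (self.n - 1) for v in row] for row in self.scatter]


def top_component(cov: Matrix, iterations: int = 200) -> Vector:
    """Leading eigenvector by power iteration, starting from the all-ones vector."""
    dim = len(cov)
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # start vector has no component along any non-zero eigenvalue
        v = [x / norm for x in w]
    return v
```

Each call to `update` touches only one batch, so memory use depends on the batch size and the dimension rather than on the total number of rows. Only the leading component is computed here. A full PCA would need a complete eigendecomposition of the covariance matrix.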

## How was this patch tested?
Unit Testing



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20707.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20707


commit 6bb22961c0c9df1a1f22e9491894895b297f5288
Author: Sameer Agarwal <sameerag@...>
Date:   2018-01-11T23:23:17Z

Preparing development version 2.3.1-SNAPSHOT

commit 55695c7127cb2f357dfdf677cab4d21fc840aa3d
Author: WeichenXu <weichen.xu@...>
Date:   2018-01-12T00:20:30Z

[SPARK-23008][ML] OnehotEncoderEstimator python API

## What changes were proposed in this pull request?

OnehotEncoderEstimator python API.

## How was this patch tested?

doctest

Author: WeichenXu <weichen...@databricks.com>

Closes #20209 from WeichenXu123/ohe_py.

(cherry picked from commit b5042d75c2faa5f15bc1e160d75f06dfdd6eea37)
Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit 3ae3e1bb71aa88be1c963b4416986ef679d7c8a2
Author: ho3rexqj <ho3rexqj@...>
Date:   2018-01-12T07:27:00Z

[SPARK-22986][CORE] Use a cache to avoid instantiating multiple instances 
of broadcast variable values

When resources happen to be constrained on an executor the first time a 
broadcast variable is instantiated it is persisted to disk by the BlockManager. 
Consequently, every subsequent call to TorrentBroadcast::readBroadcastBlock 
from other instances of that broadcast variable spawns another instance of the 
underlying value. That is, broadcast variables are spawned once per executor 
**unless** memory is constrained, in which case every instance of a broadcast 
variable is provided with a unique copy of the underlying value.

This patch fixes the above by explicitly caching the underlying values 
using weak references in a ReferenceMap.
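
The general idea of caching values behind weak references can be illustrated outside Spark. The sketch below uses Python's `weakref.WeakValueDictionary`. The patch itself works on the JVM with a ReferenceMap, so this is an analogy rather than the patch's code, and `WeakValueCache`, `BroadcastValue`, and the loader callback are hypothetical names.

```python
import threading
import weakref


class BroadcastValue:
    """Stand-in for a large deserialized value (hypothetical)."""

    def __init__(self, payload):
        self.payload = payload


class WeakValueCache:
    """Share one live instance per key and let unused instances be collected."""

    def __init__(self, loader):
        self._loader = loader                      # builds a value on a cache miss
        self._values = weakref.WeakValueDictionary()
        self._lock = threading.Lock()
        self.loads = 0                             # number of times the loader ran

    def get(self, key):
        with self._lock:
            value = self._values.get(key)
            if value is None:
                # Miss: either never loaded, or the old instance was collected.
                value = self._loader(key)
                self._values[key] = value
                self.loads += 1
            return value
```

While any caller still holds a value, every `get` for the same key returns that same object instead of building a new copy. Once the last strong reference is dropped, the entry disappears and the next `get` loads the value again, so the cache never keeps a value alive on its own.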

Author: ho3rexqj <ho3re...@gmail.com>

Closes #20183 from ho3rexqj/fix/cache-broadcast-values.

(cherry picked from commit cbe7c6fbf9dc2fc422b93b3644c40d449a869eea)
Signed-off-by: Wenchen Fan <wenc...@databricks.com>

commit d512d873b3f445845bd113272d7158388427f8a6
Author: WeichenXu <weichen.xu@...>
Date:   2018-01-12T09:27:02Z

[SPARK-23008][ML][FOLLOW-UP] mark OneHotEncoder python API deprecated

## What changes were proposed in this pull request?

mark OneHotEncoder python API deprecated

## How was this patch tested?

N/A

Author: WeichenXu <weichen...@databricks.com>

Closes #20241 from WeichenXu123/mark_ohe_deprecated.

(cherry picked from commit a7d98d53ceaf69cabaecc6c9113f17438c4e61f6)
Signed-off-by: Nick Pentreath <ni...@za.ibm.com>

commit 6152da3893a05b3f8dc0f13895af9be9548e5895
Author: Marco Gaido <marcogaido91@...>
Date:   2018-01-12T10:04:44Z

[SPARK-23025][SQL] Support Null type in scala reflection

## What changes were proposed in this pull request?

Add support for `Null` type in the `schemaFor` method for Scala reflection.

## How was this patch tested?

Added UT

Author: Marco Gaido <marcogaid...@gmail.com>

Closes #20219 from mgaido91/SPARK-23025.

(cherry picked from commit 505086806997b4331d4a8c2fc5e08345d869a23c)
Signed-off-by: gatorsmile <gatorsm...@gmail.com>

commit db27a93652780f234f3c5fe750ef07bc5525d177
Author: Dongjoon Hyun <dongjoon@...>
Date:   2018-01-12T18:18:42Z

[MINOR][BUILD] Fix Java linter errors

## What changes were proposed in this pull request?

This PR cleans up the java-lint errors (for v2.3.0-rc1 tag). Hopefully, 
this will be the final one.

```
$ dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks failed at following occurrences:
[ERROR] 
src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java:[85] 
(sizes) LineLength: Line is longer than 100 characters (found 101).
[ERROR] 
src/main/java/org/apache/spark/launcher/InProcessAppHandle.java:[20,8] 
(imports) UnusedImports: Unused import - java.io.IOException.
[ERROR] 
src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java:[41,9]
 (modifier) ModifierOrder: '

[GitHub] spark pull request #20707: [SPARK-21209][MLLLIB] Implement Incremental PCA a...

2018-03-01 Thread sandecho

Github user sandecho closed the pull request at:

https://github.com/apache/spark/pull/20707


---


[GitHub] spark pull request #20549: SPARK-18844[MLLIB] Add more binary classification...

2018-03-01 Thread sandecho

Github user sandecho closed the pull request at:

https://github.com/apache/spark/pull/20549


---


[GitHub] spark issue #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...

2018-03-01 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20708
  
ok to test


---


[GitHub] spark pull request #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA a...

2018-03-01 Thread sandecho

GitHub user sandecho opened a pull request:

https://github.com/apache/spark/pull/20708

[SPARK-21209][MLLLIB] Implement Incremental PCA algorithm

## What changes were proposed in this pull request?

A new feature, the Incremental Principal Component Analysis (IPCA) 
algorithm, is proposed. It splits the incoming data into batches of a fixed 
size and computes the PCA of each batch to build up the principal components 
of the entire data set.
## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
Unit Testing
[IPCA.zip](https://github.com/apache/spark/files/1772562/IPCA.zip)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark IPCA

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20708.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20708


commit 7900d21138de542fd89763a68417d74792725afd
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date:   2018-03-01T13:35:20Z

Implemented Incremental PCA




---


[GitHub] spark pull request #20707: [SPARK-21209][MLLLIB] Implement Incremental PCA a...

2018-03-01 Thread sandecho

GitHub user sandecho opened a pull request:

https://github.com/apache/spark/pull/20707

[SPARK-21209][MLLLIB] Implement Incremental PCA algorithm

## What changes were proposed in this pull request?

A new feature, the Incremental Principal Component Analysis (IPCA) 
algorithm, is proposed. It splits the incoming data into batches of a fixed 
size and computes the PCA of each batch to build up the principal components 
of the entire data set.

## How was this patch tested?
Unit Testing

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20707.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20707


commit 6bb22961c0c9df1a1f22e9491894895b297f5288
Author: Sameer Agarwal <sameerag@...>
Date:   2018-01-11T23:23:17Z

Preparing development version 2.3.1-SNAPSHOT

commit 55695c7127cb2f357dfdf677cab4d21fc840aa3d
Author: WeichenXu <weichen.xu@...>
Date:   2018-01-12T00:20:30Z

[SPARK-23008][ML] OnehotEncoderEstimator python API

## What changes were proposed in this pull request?

OnehotEncoderEstimator python API.

## How was this patch tested?

doctest

Author: WeichenXu <weichen...@databricks.com>

Closes #20209 from WeichenXu123/ohe_py.

(cherry picked from commit b5042d75c2faa5f15bc1e160d75f06dfdd6eea37)
Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit 3ae3e1bb71aa88be1c963b4416986ef679d7c8a2
Author: ho3rexqj <ho3rexqj@...>
Date:   2018-01-12T07:27:00Z

[SPARK-22986][CORE] Use a cache to avoid instantiating multiple instances 
of broadcast variable values

When resources happen to be constrained on an executor the first time a 
broadcast variable is instantiated it is persisted to disk by the BlockManager. 
Consequently, every subsequent call to TorrentBroadcast::readBroadcastBlock 
from other instances of that broadcast variable spawns another instance of the 
underlying value. That is, broadcast variables are spawned once per executor 
**unless** memory is constrained, in which case every instance of a broadcast 
variable is provided with a unique copy of the underlying value.

This patch fixes the above by explicitly caching the underlying values 
using weak references in a ReferenceMap.
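
As a rough illustration of the idea only (the patch itself uses a ReferenceMap on the JVM), the sketch below caches values behind weak references using Python's `weakref.WeakValueDictionary`. The names `BroadcastValue`, `read_broadcast_block`, and `loader` are hypothetical stand-ins, not Spark APIs.

```python
import weakref


class BroadcastValue:
    """Hypothetical wrapper for a large deserialized value.

    A wrapper class is used because plain lists and dicts cannot be
    weakly referenced in Python.
    """

    def __init__(self, payload):
        self.payload = payload


# Entries disappear automatically once no strong reference to the value remains.
_cache = weakref.WeakValueDictionary()


def read_broadcast_block(broadcast_id, loader):
    """Return the live cached value for broadcast_id, or load and cache it."""
    value = _cache.get(broadcast_id)
    if value is None:
        value = loader()  # e.g. read from disk and deserialize
        _cache[broadcast_id] = value
    return value
```

While any caller still holds the value, later reads with the same id return that same object instead of building another copy. Once all callers drop it, the cache does not keep it alive.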

Author: ho3rexqj <ho3re...@gmail.com>

Closes #20183 from ho3rexqj/fix/cache-broadcast-values.

(cherry picked from commit cbe7c6fbf9dc2fc422b93b3644c40d449a869eea)
Signed-off-by: Wenchen Fan <wenc...@databricks.com>

commit d512d873b3f445845bd113272d7158388427f8a6
Author: WeichenXu <weichen.xu@...>
Date:   2018-01-12T09:27:02Z

[SPARK-23008][ML][FOLLOW-UP] mark OneHotEncoder python API deprecated

## What changes were proposed in this pull request?

mark OneHotEncoder python API deprecated

## How was this patch tested?

N/A

Author: WeichenXu <weichen...@databricks.com>

Closes #20241 from WeichenXu123/mark_ohe_deprecated.

(cherry picked from commit a7d98d53ceaf69cabaecc6c9113f17438c4e61f6)
Signed-off-by: Nick Pentreath <ni...@za.ibm.com>

commit 6152da3893a05b3f8dc0f13895af9be9548e5895
Author: Marco Gaido <marcogaido91@...>
Date:   2018-01-12T10:04:44Z

[SPARK-23025][SQL] Support Null type in scala reflection

## What changes were proposed in this pull request?

Add support for `Null` type in the `schemaFor` method for Scala reflection.

## How was this patch tested?

Added UT

Author: Marco Gaido <marcogaid...@gmail.com>

Closes #20219 from mgaido91/SPARK-23025.

(cherry picked from commit 505086806997b4331d4a8c2fc5e08345d869a23c)
Signed-off-by: gatorsmile <gatorsm...@gmail.com>

commit db27a93652780f234f3c5fe750ef07bc5525d177
Author: Dongjoon Hyun <dongjoon@...>
Date:   2018-01-12T18:18:42Z

[MINOR][BUILD] Fix Java linter errors

## What changes were proposed in this pull request?

This PR cleans up the java-lint errors (for v2.3.0-rc1 tag). Hopefully, 
this will be the final one.

```
$ dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java:[85] (sizes) LineLength: Line is longer than 100 characters (found 101).
[ERROR] src/main/java/org/apache/spark/launcher/InProcessAppHandle.java:[20,8] (imports) UnusedImports: Unused import - java.io.IOException.
[ERROR] 
src/main/java/org/apache/spark

[GitHub] spark issue #20707: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...

2018-03-01 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20707
  
ok to test


---


[GitHub] spark pull request #20707: [SPARK-21209][MLLLIB] Implement Incremental PCA a...

2018-03-01 Thread sandecho

Github user sandecho closed the pull request at:

https://github.com/apache/spark/pull/20707


---


[GitHub] spark issue #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...

2018-03-01 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20708
  
@sethah Thank you. I accept your recommendation and will move it to ML. 
I have also written unit tests and adhered to the style guidelines. 
My concern is that no one is discussing this on the JIRA. Even the 
creator of the JIRA, @wbstclair, is not reachable.



---


[GitHub] spark issue #20709: [SPARK-18844][MLLIB] Adding more binary classification e...

2018-03-01 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20709
  
@sethah Would you recommend closing this one and opening the previous one?


---


[GitHub] spark issue #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...

2018-03-01 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20708
  
Thanks @wbstclair. That's a good suggestion. Although I would have to move 
it from MLLIB to ML, the rest will be the same.


---


[GitHub] spark issue #20709: [SPARK-18844][MLLIB] Adding more binary classification e...

2018-03-01 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20709
  
Actually, the previous pull request could not be merged, so I opened a 
new pull request. Can you please run the tests?


---


[GitHub] spark issue #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...

2018-03-01 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20708
  
ok to test


---


[GitHub] spark issue #20709: [SPARK-18844][MLLIB] Adding more binary classification e...

2018-03-02 Thread sandecho

Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20709
  
Can you please test it?


---

