date:20171030

[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...

2017-10-30 Thread ScrapCodes

Github user ScrapCodes commented on the issue:

https://github.com/apache/spark/pull/14151
  
@gatorsmile Ping !   


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19620: [SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check for vers...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19620
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19620: [SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check for vers...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19620
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83243/
Test PASSed.


---


[GitHub] spark issue #19620: [SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check for vers...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19620
  
**[Test build #83243 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83243/testReport)** for PR 19620 at commit [`a762b1f`](https://github.com/apache/spark/commit/a762b1fbebcb73964e4fb2bcd910014fb9a67989).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19619: [SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check for vers...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19619
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83244/
Test PASSed.


---


[GitHub] spark issue #19619: [SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check for vers...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19619
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19619: [SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check for vers...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19619
  
**[Test build #83244 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83244/testReport)** for PR 19619 at commit [`efa16a6`](https://github.com/apache/spark/commit/efa16a636ec508c13a54a42b292233b0eed55df9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #83249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83249/testReport)** for PR 16677 at commit [`e53648e`](https://github.com/apache/spark/commit/e53648e7f58f439bb09a702521c2f84cf2e344bd).


---


[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83247/
Test FAILed.


---


[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #83247 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83247/testReport)** for PR 16677 at commit [`7598337`](https://github.com/apache/spark/commit/759833712a9be4b3f3f65cf4722ddd33851726e8).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19557: [SPARK-22281][SPARKR] Handle R method breaking signature...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19557
  
**[Test build #83248 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83248/testReport)** for PR 19557 at commit [`1b41f73`](https://github.com/apache/spark/commit/1b41f73a2cdea5ebc7a0c3346dd37d9841cc72df).


---


[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2017-10-30 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
ping @cloud-fan @jiangxb1987 


---


[GitHub] spark issue #19528: [SPARK-20393][WEBU UI][1.6] Strengthen Spark to prevent ...

2017-10-30 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19528
  
@shaneknapp - could you help check - what version of SciPy Jenkins is 
running with? thanks!


---


[GitHub] spark issue #19557: [SPARK-22281][SPARKR] Handle R method breaking signature...

2017-10-30 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19557
  
rebased


---


[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #83247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83247/testReport)** for PR 16677 at commit [`7598337`](https://github.com/apache/spark/commit/759833712a9be4b3f3f65cf4722ddd33851726e8).


---


[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-30 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16578
  
@mallman I will try to go through this again. Do you think this can be 
generalized to the data source v2 API?


---


[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19617
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83242/
Test PASSed.


---


[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19617
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19617
  
**[Test build #83242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83242/testReport)** for PR 19617 at commit [`a43430b`](https://github.com/apache/spark/commit/a43430b99d0e5aab351467386fe566461b2a4b06).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19601
  
**[Test build #83246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83246/testReport)** for PR 19601 at commit [`b971506`](https://github.com/apache/spark/commit/b971506f8d5138a2c23e039427d547b736079c13).


---


[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-30 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16578
  
thanks! ping/add  @rxin @hvanhovell @gatorsmile @cloud-fan @liancheng 
@joseph-torres 


---


[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-30 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19601
  
Jenkins, retest this please


---


[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18906
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #19619: [SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check for vers...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19619
  
**[Test build #83244 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83244/testReport)** for PR 19619 at commit [`efa16a6`](https://github.com/apache/spark/commit/efa16a636ec508c13a54a42b292233b0eed55df9).


---


[GitHub] spark issue #19620: [SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check for vers...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19620
  
**[Test build #83243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83243/testReport)** for PR 19620 at commit [`a762b1f`](https://github.com/apache/spark/commit/a762b1fbebcb73964e4fb2bcd910014fb9a67989).


---


[GitHub] spark issue #19618: [SPARK-5484][Followup] PeriodicRDDCheckpointer doc clean...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19618
  
**[Test build #83245 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83245/testReport)** for PR 19618 at commit [`2858cbb`](https://github.com/apache/spark/commit/2858cbb5c8264d7bee592835b56a415961ed1dc4).


---


[GitHub] spark pull request #19620: [SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check f...

2017-10-30 Thread felixcheung

GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/19620

[SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check for version warning

## What changes were proposed in this pull request?

Will need to port this to branch-1.6, -2.0, -2.1, -2.2

## How was this patch tested?

manually
Jenkins, AppVeyor

Author: Felix Cheung 

Closes #19549 from felixcheung/rcranversioncheck.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark rcranversioncheck21

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19620.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19620


commit a762b1fbebcb73964e4fb2bcd910014fb9a67989
Author: Felix Cheung 
Date:   2017-10-31T04:44:24Z

[SPARK-22327][SPARKR][TEST] check for version warning

## What changes were proposed in this pull request?

Will need to port this to branch-1.6, -2.0, -2.1, -2.2

## How was this patch tested?

manually
Jenkins, AppVeyor

Author: Felix Cheung 

Closes #19549 from felixcheung/rcranversioncheck.




---


[GitHub] spark pull request #19619: [SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check f...

2017-10-30 Thread felixcheung

GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/19619

[SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check for version warning

## What changes were proposed in this pull request?

Will need to port this to branch-1.6, -2.0, -2.1, -2.2

## How was this patch tested?

manually
Jenkins, AppVeyor

Author: Felix Cheung 

Closes #19549 from felixcheung/rcranversioncheck.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark rcranversioncheck22

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19619.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19619


commit efa16a636ec508c13a54a42b292233b0eed55df9
Author: Felix Cheung 
Date:   2017-10-31T04:44:24Z

[SPARK-22327][SPARKR][TEST] check for version warning

## What changes were proposed in this pull request?

Will need to port this to branch-1.6, -2.0, -2.1, -2.2

## How was this patch tested?

manually
Jenkins, AppVeyor

Author: Felix Cheung 

Closes #19549 from felixcheung/rcranversioncheck.




---


[GitHub] spark pull request #19550: [SPARK-22327][SPARKR][TEST][BACKPORT-2.0] check f...

2017-10-30 Thread felixcheung

Github user felixcheung closed the pull request at:

https://github.com/apache/spark/pull/19550


---


[GitHub] spark pull request #19618: [SPARK-5484][Followup] PeriodicRDDCheckpointer do...

2017-10-30 Thread zhengruifeng

GitHub user zhengruifeng opened a pull request:

https://github.com/apache/spark/pull/19618

[SPARK-5484][Followup] PeriodicRDDCheckpointer doc cleanup

## What changes were proposed in this pull request?
PeriodicRDDCheckpointer was already moved out of mllib in SPARK-5484.

## How was this patch tested?
existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark checkpointer_doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19618.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19618


commit 2858cbb5c8264d7bee592835b56a415961ed1dc4
Author: Zheng RuiFeng 
Date:   2017-10-31T04:39:59Z

create pr




---


[GitHub] spark pull request #19549: [SPARK-22327][SPARKR][TEST] check for version war...

2017-10-30 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19549


---


[GitHub] spark issue #19549: [SPARK-22327][SPARKR][TEST] check for version warning

2017-10-30 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19549
  
merged to master. will backport separately


---


[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19617
  
**[Test build #83242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83242/testReport)** for PR 19617 at commit [`a43430b`](https://github.com/apache/spark/commit/a43430b99d0e5aab351467386fe566461b2a4b06).


---


[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

2017-10-30 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19617
  
retest this please.


---


[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

2017-10-30 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19617
  
cc @HyukjinKwon @BryanCutler 


---


[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19617
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19617
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83241/
Test FAILed.


---


[GitHub] spark pull request #19617: [SPARK-22347][PySpark][DOC] Add document to notic...

2017-10-30 Thread viirya

GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/19617

[SPARK-22347][PySpark][DOC] Add document to notice users for using udfs 
with conditional expressions

## What changes were proposed in this pull request?

Under the current execution mode of Python UDFs, Python UDFs are not well 
supported as branch values or as the else value in a CaseWhen expression.

Since fixing this might require a non-trivial change and the issue has a 
simpler workaround, we should just note it for users in the documentation.
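
To illustrate the limitation, here is a minimal sketch in plain Python, not Spark code. It assumes a simplified model in which the UDF is evaluated for every row of a batch before the conditional is applied; `eval_case_when`, `unguarded`, `guarded`, and `is_nonzero` are hypothetical names used only for this illustration. Under that assumption, the workaround is to move the guard inside the UDF itself.

```python
def eval_case_when(rows, condition, udf, else_value):
    """Simplified model (an assumption, not Spark's implementation):
    run the UDF on every row first, then choose per row between the
    UDF result and the else value."""
    udf_results = [udf(row) for row in rows]  # evaluated even where condition is False
    return [
        result if condition(row) else else_value
        for row, result in zip(rows, udf_results)
    ]


def unguarded(row):
    # Relies on the surrounding conditional to keep zeros away.
    return 10 // row


def guarded(row):
    # Workaround: the UDF checks the condition itself.
    return 10 // row if row != 0 else None


def is_nonzero(row):
    return row != 0
```

In this model, calling `eval_case_when([1, 0, 5], is_nonzero, unguarded, None)` raises `ZeroDivisionError` because the UDF still sees the zero row, while the same call with `guarded` returns a value for every row.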

## How was this patch tested?

Only document change.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-22347-3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19617.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19617


commit a43430b99d0e5aab351467386fe566461b2a4b06
Author: Liang-Chi Hsieh 
Date:   2017-10-31T04:28:16Z

Add document to notice users for using udfs with conditional expressions.




---


[GitHub] spark issue #19592: [SPARK-22347][SQL][PySpark] Support optionally running P...

2017-10-30 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19592
  
Having collected the opinions so far, the consensus is to just document this. 
I will close this for now and submit a simple PR to document it later.


---


[GitHub] spark pull request #19592: [SPARK-22347][SQL][PySpark] Support optionally ru...

2017-10-30 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19592#discussion_r147892336
  
--- Diff: python/pyspark/worker.py ---
@@ -105,8 +105,14 @@ def read_single_udf(pickleSer, infile, eval_type):
     elif eval_type == PythonEvalType.SQL_PANDAS_GROUPED_UDF:
         # a groupby apply udf has already been wrapped under apply()
         return arg_offsets, row_func
-    else:
+    elif eval_type == PythonEvalType.SQL_BATCHED_UDF:
         return arg_offsets, wrap_udf(row_func, return_type)
+    elif eval_type == PythonEvalType.SQL_BATCHED_OPT_UDF:
--- End diff --

One possibility is to do the wrapping when creating UDFs on the Python side. 
Even for UDFs not used in conditional expressions, we would still add an extra 
boolean argument to the end of the argument list. With this fix, we don't need 
another eval_type.

But for now I think documenting it is the fix others find more acceptable.
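
The wrapping idea described above could look roughly like the following plain-Python sketch. It is only an illustration under stated assumptions: `with_condition_flag` and `safe_div` are hypothetical names, and returning `None` when the flag is false is an assumed convention, not something taken from the pull request.

```python
def with_condition_flag(func):
    """Wrap func so that its last positional argument is a boolean flag
    saying whether the original function should actually run."""
    def wrapped(*args):
        *real_args, should_run = args
        if not should_run:
            # Assumed convention: skip the call and yield a null result.
            return None
        return func(*real_args)
    return wrapped


# Hypothetical UDF body: integer division, unsafe when b == 0.
safe_div = with_condition_flag(lambda a, b: a // b)
```

A caller would pass the evaluated condition as the extra trailing argument, for example `safe_div(6, 3, True)` for a row that matches the branch and `safe_div(1, 0, False)` for a row that does not.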


---


[GitHub] spark pull request #19592: [SPARK-22347][SQL][PySpark] Support optionally ru...

2017-10-30 Thread viirya

Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/19592


---


[GitHub] spark pull request #19592: [SPARK-22347][SQL][PySpark] Support optionally ru...

2017-10-30 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19592#discussion_r147891641
  
--- Diff: python/pyspark/worker.py ---
@@ -105,8 +105,14 @@ def read_single_udf(pickleSer, infile, eval_type):
     elif eval_type == PythonEvalType.SQL_PANDAS_GROUPED_UDF:
         # a groupby apply udf has already been wrapped under apply()
         return arg_offsets, row_func
-    else:
+    elif eval_type == PythonEvalType.SQL_BATCHED_UDF:
         return arg_offsets, wrap_udf(row_func, return_type)
+    elif eval_type == PythonEvalType.SQL_BATCHED_OPT_UDF:
--- End diff --

Because the Python functions are serialized and may be broadcast further, 
I couldn't figure out a way to do this wrapping in `BatchEvalPython` on the 
Scala side.


---


[GitHub] spark issue #19613: Fixed a typo

2017-10-30 Thread jmchung

Github user jmchung commented on the issue:

https://github.com/apache/spark/pull/19613
  
Hi @ganeshchand , could you also fix the typo in `JdbcUtils.scala`? Thanks!
#L459 underling => underlying


---


[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19601
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83240/
Test FAILed.


---


[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19601
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #19606: [SPARK-22333][SQL][Backport-2.2]timeFunctionCall(CURRENT...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19606
  
**[Test build #83239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83239/testReport)** for PR 19606 at commit [`2bcc2ea`](https://github.com/apache/spark/commit/2bcc2ea6fd0ca9f12959246bb9ee6796cb7a90a0).


---


[GitHub] spark issue #19606: [SPARK-22333][SQL][Backport-2.2]timeFunctionCall(CURRENT...

2017-10-30 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19606
  
ok to test


---


[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-30 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19601
  
Jenkins, retest this please


---


[GitHub] spark pull request #19479: [SPARK-17074] [SQL] Generate equi-height histogra...

2017-10-30 Thread wzhfy

Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19479#discussion_r147887853
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ---
@@ -89,19 +93,159 @@ case class AnalyzeColumnCommand(
 // The first element in the result will be the overall row count, the 
following elements
 // will be structs containing all column stats.
 // The layout of each struct follows the layout of the ColumnStats.
-val ndvMaxErr = sparkSession.sessionState.conf.ndvMaxError
 val expressions = Count(Literal(1)).toAggregateExpression() +:
-  attributesToAnalyze.map(ColumnStat.statExprs(_, ndvMaxErr))
+  attributesToAnalyze.map(statExprs(_, sparkSession.sessionState.conf))
 
 val namedExpressions = expressions.map(e => Alias(e, e.toString)())
 val statsRow = new QueryExecution(sparkSession, Aggregate(Nil, 
namedExpressions, relation))
   .executedPlan.executeTake(1).head
 
 val rowCount = statsRow.getLong(0)
-val columnStats = attributesToAnalyze.zipWithIndex.map { case (attr, 
i) =>
-  // according to `ColumnStat.statExprs`, the stats struct always have 
6 fields.
-  (attr.name, ColumnStat.rowToColumnStat(statsRow.getStruct(i + 1, 6), 
attr))
-}.toMap
-(rowCount, columnStats)
+val colStats = rowToColumnStats(sparkSession, relation, 
attributesToAnalyze, statsRow, rowCount)
+(rowCount, colStats)
+  }
+
+  /**
+   * Constructs an expression to compute column statistics for a given 
column.
+   *
+   * The expression should create a single struct column with the 
following schema:
+   * distinctCount: Long, min: T, max: T, nullCount: Long, avgLen: Long, 
maxLen: Long,
+   * percentiles: Array[T]
+   *
+   * Together with [[rowToColumnStats]], this function is used to create 
[[ColumnStat]] and
+   * as a result should stay in sync with it.
+   */
+  private def statExprs(col: Attribute, conf: SQLConf): CreateNamedStruct 
= {
+def struct(exprs: Expression*): CreateNamedStruct = 
CreateStruct(exprs.map { expr =>
+  expr.transformUp { case af: AggregateFunction => 
af.toAggregateExpression() }
+})
+val one = Literal(1, LongType)
+
+// the approximate ndv (num distinct value) should never be larger 
than the number of rows
+val numNonNulls = if (col.nullable) Count(col) else Count(one)
+val ndv = Least(Seq(HyperLogLogPlusPlus(col, conf.ndvMaxError), 
numNonNulls))
+val numNulls = Subtract(Count(one), numNonNulls)
+val defaultSize = Literal(col.dataType.defaultSize, LongType)
+val nullArray = Literal(null, ArrayType(DoubleType))
+
+def fixedLenTypeExprs(castType: DataType) = {
+  // For fixed width types, avg size should be the same as max size.
+  Seq(ndv, Cast(Min(col), castType), Cast(Max(col), castType), 
numNulls, defaultSize,
+defaultSize)
+}
+
+def fixedLenTypeStruct(castType: DataType, genHistogram: Boolean) = {
+  val percentileExpr = if (genHistogram) {
+// To generate equi-height histogram, we need to:
+// 1. get percentiles p(1/n), p(2/n) ... p((n-1)/n),
+// 2. use min, max, and percentiles as range values of buckets, 
e.g. [min, p(1/n)],
+// [p(1/n), p(2/n)] ... [p((n-1)/n), max], and then count ndv in 
each bucket.
+// Step 2 will be performed in `rowToColumnStats`.
--- End diff --

Do you mean calculating percentiles for min/max in step 1? Currently the 
other percentiles are already calculated in step 1.
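
For readers following the thread, the two steps described in the diff comment can be sketched roughly as below. This is a minimal illustration in plain Python, not the Spark implementation: the function name `equi_height_buckets`, the nearest-rank percentile rule, and the half-open bucket boundaries (chosen here so that no value is counted twice) are all assumptions.

```python
import math


def equi_height_buckets(values, num_buckets):
    """Return a list of (lower, upper, ndv) tuples (hypothetical helper)."""
    data = sorted(values)
    n = len(data)
    if n == 0 or num_buckets < 1:
        raise ValueError("need at least one value and one bucket")

    # Step 1: percentiles p(1/k), p(2/k), ..., p((k-1)/k), by nearest rank.
    percentiles = [
        data[max(math.ceil(n * i / num_buckets) - 1, 0)]
        for i in range(1, num_buckets)
    ]

    # Step 2: min, the percentiles, and max become the bucket boundaries;
    # then count the distinct values (ndv) that fall into each bucket.
    bounds = [data[0]] + percentiles + [data[-1]]
    buckets = []
    for idx, (lo, hi) in enumerate(zip(bounds, bounds[1:])):
        if idx == 0:
            # The first bucket is closed on both ends so that min is included.
            distinct = {v for v in data if lo <= v <= hi}
        else:
            # Later buckets exclude their lower bound (an assumption).
            distinct = {v for v in data if lo < v <= hi}
        buckets.append((lo, hi, len(distinct)))
    return buckets
```

With skewed input such as `[1, 1, 1, 1, 2, 3, 4, 5]` and two buckets, the first bucket collapses to the single repeated value, which is the behavior that makes equi-height histograms useful for skewed columns.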


---


[GitHub] spark issue #19272: [Spark-21842][Mesos] Support Kerberos ticket renewal and...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19272
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83235/
Test PASSed.


---


[GitHub] spark issue #19272: [Spark-21842][Mesos] Support Kerberos ticket renewal and...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19272
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19272: [Spark-21842][Mesos] Support Kerberos ticket renewal and...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19272
  
**[Test build #83235 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83235/testReport)**
 for PR 19272 at commit 
[`864ab7e`](https://github.com/apache/spark/commit/864ab7ec659a5071e0ed1a87d2448c507b815a79).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #19479: [SPARK-17074] [SQL] Generate equi-height histogra...

2017-10-30 Thread wzhfy

Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19479#discussion_r147887335
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala
 ---
@@ -216,65 +218,61 @@ object ColumnStat extends Logging {
 }
   }
 
-  /**
-   * Constructs an expression to compute column statistics for a given 
column.
-   *
-   * The expression should create a single struct column with the 
following schema:
-   * distinctCount: Long, min: T, max: T, nullCount: Long, avgLen: Long, 
maxLen: Long
-   *
-   * Together with [[rowToColumnStat]], this function is used to create 
[[ColumnStat]] and
-   * as a result should stay in sync with it.
-   */
-  def statExprs(col: Attribute, relativeSD: Double): CreateNamedStruct = {
-def struct(exprs: Expression*): CreateNamedStruct = 
CreateStruct(exprs.map { expr =>
-  expr.transformUp { case af: AggregateFunction => 
af.toAggregateExpression() }
-})
-val one = Literal(1, LongType)
+  private def convertToHistogram(s: String): EquiHeightHistogram = {
+val idx = s.indexOf(",")
+if (idx <= 0) {
+  throw new AnalysisException("Failed to parse histogram.")
+}
+val height = s.substring(0, idx).toDouble
+val pattern = "Bucket\\(([^,]+), ([^,]+), ([^\\)]+)\\)".r
+val buckets = pattern.findAllMatchIn(s).map { m =>
+  EquiHeightBucket(m.group(1).toDouble, m.group(2).toDouble, 
m.group(3).toLong)
+}.toSeq
+EquiHeightHistogram(height, buckets)
+  }
 
-// the approximate ndv (num distinct value) should never be larger 
than the number of rows
-val numNonNulls = if (col.nullable) Count(col) else Count(one)
-val ndv = Least(Seq(HyperLogLogPlusPlus(col, relativeSD), numNonNulls))
-val numNulls = Subtract(Count(one), numNonNulls)
-val defaultSize = Literal(col.dataType.defaultSize, LongType)
+}
 
-def fixedLenTypeStruct(castType: DataType) = {
-  // For fixed width types, avg size should be the same as max size.
-  struct(ndv, Cast(Min(col), castType), Cast(Max(col), castType), 
numNulls, defaultSize,
-defaultSize)
-}
+/**
+ * There are a few types of histograms in state-of-the-art estimation 
methods. E.g. equi-width
+ * histogram, equi-height histogram, frequency histogram (value-frequency 
pairs) and hybrid
+ * histogram, etc.
+ * Currently in Spark, we support equi-height histogram since it is good 
at handling skew
+ * distribution, and also provides reasonable accuracy in other cases.
+ * We can add other histograms in the future, which will make estimation 
logic more complicated.
--- End diff --

It's not a high priority; here I just want to say that it's doable, but it 
will complicate the estimation logic.
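
As a side note on the `convertToHistogram` helper in the diff above, its parsing logic can be mirrored in a few lines of Python. This is only a sketch: the function name `parse_histogram` is hypothetical, and the serialized layout (`<height>, Bucket(lo, hi, ndv), ...`) is an assumption inferred from the regular expression in the diff, not a documented format.

```python
import re

# Same pattern as in the Scala diff: three comma-separated fields per bucket.
_BUCKET = re.compile(r"Bucket\(([^,]+), ([^,]+), ([^\)]+)\)")


def parse_histogram(s):
    """Return (height, [(lo, hi, ndv), ...]) parsed from the serialized string."""
    idx = s.find(",")
    if idx <= 0:
        # Mirrors the AnalysisException branch in the diff.
        raise ValueError("Failed to parse histogram.")
    height = float(s[:idx])
    buckets = [
        (float(m.group(1)), float(m.group(2)), int(m.group(3)))
        for m in _BUCKET.finditer(s)
    ]
    return height, buckets
```

For example, `parse_histogram("25.0, Bucket(1.0, 25.0, 25)")` yields a height of `25.0` and one bucket tuple.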


---


[GitHub] spark pull request #19479: [SPARK-17074] [SQL] Generate equi-height histogra...

2017-10-30 Thread wzhfy

Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19479#discussion_r147886882
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala
 ---
@@ -177,13 +180,12 @@ object ColumnStat extends Logging {
   Some(ColumnStat(
 distinctCount = BigInt(map(KEY_DISTINCT_COUNT).toLong),
 // Note that flatMap(Option.apply) turns Option(null) into None.
-min = map.get(KEY_MIN_VALUE)
-  .map(fromExternalString(_, field.name, 
field.dataType)).flatMap(Option.apply),
-max = map.get(KEY_MAX_VALUE)
-  .map(fromExternalString(_, field.name, 
field.dataType)).flatMap(Option.apply),
+min = map.get(KEY_MIN_VALUE).map(fromString(_, field.name, 
field.dataType)),
+max = map.get(KEY_MAX_VALUE).map(fromString(_, field.name, 
field.dataType)),
--- End diff --

Yes, but I'm inclined to revert the change because keeping 
`flatMap(Option.apply)` is more robust.


---


[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19439
  
**[Test build #83238 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83238/testReport)**
 for PR 19439 at commit 
[`e314327`](https://github.com/apache/spark/commit/e314327dd74c0092194c311a531c8a8bb90fdb86).


---


[GitHub] spark pull request #19479: [SPARK-17074] [SQL] Generate equi-height histogra...

2017-10-30 Thread wzhfy

Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19479#discussion_r147886758
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala
 ---
@@ -155,6 +156,8 @@ object ColumnStat extends Logging {
   private val KEY_NULL_COUNT = "nullCount"
   private val KEY_AVG_LEN = "avgLen"
   private val KEY_MAX_LEN = "maxLen"
+  val KEY_HISTOGRAM = "histogram"
+  val KEY_HISTOGRAM_SEPARATOR = "-"
--- End diff --

They are used in `HiveExternalCatalog` for stats/properties conversion.


---


[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-10-30 Thread imatiach-msft

Github user imatiach-msft commented on the issue:

https://github.com/apache/spark/pull/19439
  
Jenkins retest this please


---


[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19601
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83237/
Test FAILed.


---


[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19601
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-30 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19601
  
Jenkins, retest this please


---


[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19601
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83236/
Test FAILed.


---


[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19601
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-30 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19601
  
Jenkins, retest this please


---


[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19459
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19459
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83233/
Test FAILed.


---


[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19459
  
**[Test build #83233 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83233/testReport)**
 for PR 19459 at commit 
[`cfb1c3d`](https://github.com/apache/spark/commit/cfb1c3dd48abc7073cf0f98e529afae4e1157d78).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19615: [SPARK-19611][SQL][followup] set dataSchema correctly in...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19615
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19615: [SPARK-19611][SQL][followup] set dataSchema correctly in...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19615
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83234/
Test PASSed.


---


[GitHub] spark issue #19615: [SPARK-19611][SQL][followup] set dataSchema correctly in...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19615
  
**[Test build #83234 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83234/testReport)**
 for PR 19615 at commit 
[`46f530f`](https://github.com/apache/spark/commit/46f530fe777c921d43a2f323abc91d8bb69423d5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19611: [SPARK-22305] Write HDFSBackedStateStoreProvider.loadMap...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19611
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19611: [SPARK-22305] Write HDFSBackedStateStoreProvider.loadMap...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19611
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83229/
Test PASSed.


---


[GitHub] spark issue #19611: [SPARK-22305] Write HDFSBackedStateStoreProvider.loadMap...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19611
  
**[Test build #83229 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83229/testReport)**
 for PR 19611 at commit 
[`d98ce9e`](https://github.com/apache/spark/commit/d98ce9e34050d0ef08a6e8802952a3c3bb6fc896).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19614
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83232/
Test PASSed.


---


[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19614
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19614
  
**[Test build #83232 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83232/testReport)**
 for PR 19614 at commit 
[`5c04540`](https://github.com/apache/spark/commit/5c045400659f3bf149e39ba8ec6d4a13f1210e72).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19616: [SPARK-22404][YARN][WIP] Provide an option to use unmana...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19616
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request #19616: [SPARK-22404][YARN][WIP] Provide an option to use...

2017-10-30 Thread devaraj-kavali

GitHub user devaraj-kavali opened a pull request:

https://github.com/apache/spark/pull/19616

[SPARK-22404][YARN][WIP] Provide an option to use unmanaged AM in 
yarn-client mode

## What changes were proposed in this pull request?

This PR adds a new configuration, "spark.yarn.un-managed-am" (defaults to 
false), to enable an unmanaged AM application in YARN client mode, which 
launches the Application Master service as part of the client. It reuses the 
existing code for communication between the Application Master <-> Task 
Scheduler for container requests/allocations/launches, and eliminates the following:
1. Allocating and launching the Application Master container
2. Remote node/process communication between the Application Master <-> Task 
Scheduler

## How was this patch tested?

I verified this manually by running applications in yarn-client mode with 
"spark.yarn.un-managed-am" enabled, and also ensured that there is no impact on 
the existing execution flows. I am verifying some more failure scenarios and will 
update the PR if anything needs to be fixed. I would like to hear others' 
feedback/thoughts on this.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/devaraj-kavali/spark SPARK-22404

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19616.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19616


commit e51f99ef04e4fd797f4c715b1773c1d245a8a0cd
Author: Devaraj K 
Date:   2017-10-31T00:06:48Z

[SPARK-22404][YARN] Provide an option to use unmanaged AM in yarn-client
mode




---


[GitHub] spark issue #19272: [Spark-21842][Mesos] Support Kerberos ticket renewal and...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19272
  
**[Test build #83235 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83235/testReport)**
 for PR 19272 at commit 
[`864ab7e`](https://github.com/apache/spark/commit/864ab7ec659a5071e0ed1a87d2448c507b815a79).


---


[GitHub] spark pull request #19272: [Spark-21842][Mesos] Support Kerberos ticket rene...

2017-10-30 Thread ArtRand

Github user ArtRand commented on a diff in the pull request:

https://github.com/apache/spark/pull/19272#discussion_r147866441
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterManager.scala
 ---
@@ -17,7 +17,7 @@
 
 package org.apache.spark.scheduler.cluster.mesos
 
-import org.apache.spark.{SparkContext, SparkException}
+import org.apache.spark.SparkContext
--- End diff --

`SparkException` is unused, not sure why it was there in the first place


---


[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...

2017-10-30 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19568
  
@rdblue Yes, the current implementation implicitly assumes that the rule 
`CollapseCodegenStages` excludes all the illegal cases. How about adding an 
`assert` to check that the condition of `SortMergeJoinExec` does not 
contain `CodegenFallback` expressions? Could you also write a code comment explaining 
that `CollapseCodegenStages` guarantees this assumption?




---


[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...

2017-10-30 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19459
  
I think it is a bug; we should fix it first.

BTW, I'm fine with upgrading Arrow. Just make sure we get everything we need in 
the Arrow version we want to upgrade to, then remove all the hacks on the Spark side. We 
should throw an exception if users have an old Arrow version installed.
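
A version guard of the kind suggested above could look roughly like the following. This is a hedged sketch only: `require_minimum_version` and `parse_version` are hypothetical names, the minimum version passed in the usage note is a placeholder, and the thread does not say which exception type or version Spark would actually use.

```python
def parse_version(text):
    """Turn '0.8.0' into (0, 8, 0); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad so that '0.8' and '0.8.0' compare as equal.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def require_minimum_version(installed, minimum, package="pyarrow"):
    """Raise ImportError when the installed version is older than the minimum."""
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            "%s >= %s must be installed; however, your version was %s."
            % (package, minimum, installed)
        )
```

In practice the `installed` argument would come from the package's own version string, for example `require_minimum_version(pyarrow.__version__, "0.8.0")`, where `"0.8.0"` is purely illustrative.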


---


[GitHub] spark issue #19615: [SPARK-19611][SQL][followup] set dataSchema correctly in...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19615
  
**[Test build #83234 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83234/testReport)**
 for PR 19615 at commit 
[`46f530f`](https://github.com/apache/spark/commit/46f530fe777c921d43a2f323abc91d8bb69423d5).


---


[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...

2017-10-30 Thread BryanCutler

Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/19459
  
After incorporating date and timestamp types for this, I had to refactor a 
little to use `_create_batch` from serializers to make Arrow batches from 
Columns even when the user doesn't specify the schema, so that the casts for 
these types can be used. Based on the initial benchmark, it doesn't seem to 
affect performance.

I came across an issue when using a pandas DataFrame with timestamps without 
Arrow. Spark will read the values as long and not datetime, so currently a test 
for this will fail:

```
In [1]: spark.conf.set("spark.sql.execution.arrow.enabled", "false")

In [2]: import pandas as pd
   ...: from datetime import datetime
   ...: 

In [3]: pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)]})

In [4]: df = spark.createDataFrame(pdf)

In [5]: df.show()
+-------------------+
|                 ts|
+-------------------+
|1509411661000000000|
+-------------------+


In [6]: df.schema
Out[6]: StructType(List(StructField(ts,LongType,true)))

In [7]: pdf
Out[7]: 
   ts
0 2017-10-31 01:01:01

In [9]: pdf.dtypes
Out[9]: 
ts    datetime64[ns]
dtype: object
```
@HyukjinKwon or @ueshin, could you confirm that you see the same? And do you 
consider this a bug?
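
For context, the `LongType` result above is consistent with pandas storing `datetime64[ns]` values internally as 64-bit integers counting nanoseconds since the Unix epoch. The stdlib-only sketch below shows the conversion in both directions; the helper names `to_epoch_nanos` and `from_epoch_nanos` are hypothetical, and treating a naive datetime as UTC is an assumption made for the illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_nanos(dt):
    """Nanoseconds since the Unix epoch; naive datetimes are treated as UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    delta = dt - EPOCH
    # Use integer arithmetic throughout to avoid float rounding.
    micros = (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * 1000


def from_epoch_nanos(nanos):
    """Inverse of to_epoch_nanos, truncated to microsecond precision."""
    return EPOCH + timedelta(microseconds=nanos // 1000)


ts = datetime(2017, 10, 31, 1, 1, 1)
print(to_epoch_nanos(ts))  # 1509411661000000000
```

Going the other way, `from_epoch_nanos(1509411661000000000)` recovers `2017-10-31 01:01:01` in UTC, which is the step a non-Arrow conversion path would need in order to produce a timestamp column instead of a long.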


---


[GitHub] spark issue #19615: [SPARK-19611][SQL][followup] set dataSchema correctly in...

2017-10-30 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19615
  
cc @budde @gatorsmile 


---


[GitHub] spark pull request #19615: [SPARK-19611][SQL][followup] set dataSchema corre...

2017-10-30 Thread cloud-fan

GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/19615

[SPARK-19611][SQL][followup] set dataSchema correctly in 
HiveMetastoreCatalog.convertToLogicalRelation

## What changes were proposed in this pull request?

We made a mistake in https://github.com/apache/spark/pull/16944 . In 
`HiveMetastoreCatalog#inferIfNeeded` we infer the data schema, merge it with the full 
schema, and return the new full schema. On the caller side, we treat the full schema 
as the data schema and set it on `HadoopFsRelation`.

This doesn't cause any problems because both Parquet and ORC can work with a 
wrong data schema that has extra columns, but it's better to fix this mistake.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark infer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19615.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19615


commit 46f530fe777c921d43a2f323abc91d8bb69423d5
Author: Wenchen Fan 
Date:   2017-10-30T23:05:57Z

set dataSchema correctly in HiveMetastoreCatalog.convertToLogicalRelation




---


[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19614
  
**[Test build #83232 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83232/testReport)**
 for PR 19614 at commit 
[`5c04540`](https://github.com/apache/spark/commit/5c045400659f3bf149e39ba8ec6d4a13f1210e72).


---


[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19459
  
**[Test build #83233 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83233/testReport)**
 for PR 19459 at commit 
[`cfb1c3d`](https://github.com/apache/spark/commit/cfb1c3dd48abc7073cf0f98e529afae4e1157d78).


---


[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper

2017-10-30 Thread bomeng

Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/19614
  
I will fix the style shortly. 


---


[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15770
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83230/
Test PASSed.


---


[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15770
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15770
  
**[Test build #83230 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83230/testReport)**
 for PR 15770 at commit 
[`cfa18af`](https://github.com/apache/spark/commit/cfa18af7ed27eccebc7af97be8d7e1f4227a5ffa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19614
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper

2017-10-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19614
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83231/
Test FAILed.


---


[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19614
  
**[Test build #83231 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83231/testReport)**
 for PR 19614 at commit 
[`ddc97ef`](https://github.com/apache/spark/commit/ddc97efed418698b81cce70e8cd0498e46dbcd88).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19614
  
**[Test build #83231 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83231/testReport)**
 for PR 19614 at commit 
[`ddc97ef`](https://github.com/apache/spark/commit/ddc97efed418698b81cce70e8cd0498e46dbcd88).


---


[GitHub] spark pull request #19614: update the location of reference paper

2017-10-30 Thread bomeng

GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/19614

update the location of reference paper

## What changes were proposed in this pull request?
Update the URL of the reference paper.

## How was this patch tested?
The change only touches comments, so nothing was tested.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark 22399

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19614.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19614


commit ddc97efed418698b81cce70e8cd0498e46dbcd88
Author: bomeng 
Date:   2017-10-30T22:31:05Z

update the location of reference paper




---


[GitHub] spark issue #19611: [SPARK-22305] Write HDFSBackedStateStoreProvider.loadMap...

2017-10-30 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/19611
  
LGTM pending tests


---


[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15770
  
**[Test build #83230 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83230/testReport)**
 for PR 15770 at commit 
[`cfa18af`](https://github.com/apache/spark/commit/cfa18af7ed27eccebc7af97be8d7e1f4227a5ffa).


---


[GitHub] spark issue #19611: [SPARK-22305] Write HDFSBackedStateStoreProvider.loadMap...

2017-10-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19611
  
**[Test build #83229 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83229/testReport)**
 for PR 19611 at commit 
[`d98ce9e`](https://github.com/apache/spark/commit/d98ce9e34050d0ef08a6e8802952a3c3bb6fc896).


---
