[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-09-18 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/19868
  
Can somebody explain to me what the PR description has to do with 
`missingFiles`? I'm probably missing something, but I feel the implementation 
is very different from the PR description.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22192
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96211/
Test PASSed.


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22192
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22192
  
**[Test build #96211 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96211/testReport)**
 for PR 22192 at commit 
[`447c5e5`](https://github.com/apache/spark/commit/447c5e5974ca2a176026e63518a7a6cf29b78008).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22457: [SPARK-24626] Add statistics prefix to parallelFi...

2018-09-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22457


---




[GitHub] spark pull request #22459: [SPARK-23173][SQL] rename spark.sql.fromJsonForce...

2018-09-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22459


---




[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22173
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22173
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96208/
Test FAILed.


---




[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22173
  
**[Test build #96208 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96208/testReport)**
 for PR 22173 at commit 
[`0a43f22`](https://github.com/apache/spark/commit/0a43f223fb97da4fcf355dba945a3a350245c5f3).
 * This patch **fails from timeout after a configured wait of `400m`**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22462
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22462
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3219/
Test PASSed.


---




[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22462
  
**[Test build #96225 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96225/testReport)**
 for PR 22462 at commit 
[`d238a66`](https://github.com/apache/spark/commit/d238a66f0ebcfe0d93a14e6685d6294668b82953).


---




[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...

2018-09-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22462
  
cc @cloud-fan, this actually bugged me. Mind taking a look when you are 
available please?


---




[GitHub] spark pull request #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not...

2018-09-18 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/22462

[SPARK-25460][SS] DataSourceV2: SS sources do not respect 
SessionConfigSupport

## What changes were proposed in this pull request?

This PR proposes to respect `SessionConfigSupport` in SS data sources as 
well. Currently it is only respected in batch sources:


https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L198-L203


https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L244-L249

If a developer makes a DataSource V2 that supports both structured 
streaming and batch jobs, batch jobs respect a session configuration, say, a 
URL to connect to and fetch data from (which end users might not be aware of); 
however, structured streaming ends up not supporting this, and the value must 
be set explicitly in the options.
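For context, the session-config mechanism being extended works roughly like this: every session conf of the form `spark.datasource.<keyPrefix>.<k>` is turned into a source option `<k>`. The sketch below is illustrative and does not use Spark's internal API (`extractSessionConfigs`, the source name, and the option names are stand-ins):

```scala
// Illustrative sketch of session-config extraction for a source whose
// SessionConfigSupport.keyPrefix() returns `keyPrefix`: confs under
// `spark.datasource.<keyPrefix>.` become source options.
def extractSessionConfigs(
    keyPrefix: String,
    sessionConfs: Map[String, String]): Map[String, String] = {
  val prefix = s"spark.datasource.$keyPrefix."
  sessionConfs.collect {
    case (key, value) if key.startsWith(prefix) =>
      key.stripPrefix(prefix) -> value
  }
}

val confs = Map(
  "spark.datasource.mysource.url" -> "jdbc:sketch://host/db", // picked up
  "spark.sql.shuffle.partitions" -> "200")                    // ignored
val opts = extractSessionConfigs("mysource", confs)
assert(opts == Map("url" -> "jdbc:sketch://host/db"))
```

The point of the PR is that streaming reads and writes should run through the same extraction that batch reads already do.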

## How was this patch tested?

Unit tests were added.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-25460

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22462.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22462


commit d238a66f0ebcfe0d93a14e6685d6294668b82953
Author: hyukjinkwon 
Date:   2018-09-19T04:44:40Z

DataSourceV2: Structured Streaming does not respect SessionConfigSupport




---




[GitHub] spark issue #22449: [SPARK-22666][ML][FOLLOW-UP] Return a correctly formatte...

2018-09-18 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/22449
  
@WeichenXu123 I think we should fix the test instead of removing "//" from 
the URI if the authority is empty, because both "scheme:/" and "scheme:///" are valid.

~~~scala
scala> val u1 = new URI("file:///a/b/c")
u1: java.net.URI = file:///a/b/c

scala> val u2 = new URI("file:/a/b/c")
u2: java.net.URI = file:/a/b/c

scala> u1 == u2
res1: Boolean = true
~~~

Shall we update the test? Instead of comparing the whole row record, we could 
compare its fields one by one and convert `origin` to a `URI` before comparison.
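A minimal sketch of the suggested normalization, runnable outside Spark (the `origin` values here are made up for illustration; the real test compares row fields):

```scala
import java.net.URI

// Normalizing both sides through java.net.URI makes the "scheme:/" and
// "scheme:///" spellings compare equal, so the test no longer depends on
// which form the filesystem API happens to produce.
val expectedOrigin = "file:///a/b/c"
val actualOrigin = "file:/a/b/c"

val originsMatch = new URI(expectedOrigin) == new URI(actualOrigin)
assert(originsMatch)
```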


---




[GitHub] spark issue #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite IllegalA...

2018-09-18 Thread seancxmao
Github user seancxmao commented on the issue:

https://github.com/apache/spark/pull/22461
  
@gatorsmile Thanks a lot!


---




[GitHub] spark issue #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite IllegalA...

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22461
  
**[Test build #96224 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96224/testReport)**
 for PR 22461 at commit 
[`fb7b9d2`](https://github.com/apache/spark/commit/fb7b9d22812f789b970f8dae8a9cd3763c0eccff).


---




[GitHub] spark issue #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite IllegalA...

2018-09-18 Thread seancxmao
Github user seancxmao commented on the issue:

https://github.com/apache/spark/pull/22461
  
Jenkins, retest this please.


---




[GitHub] spark issue #22433: [SPARK-25442][SQL][K8S] Support STS to run in k8s deploy...

2018-09-18 Thread suryag10
Github user suryag10 commented on the issue:

https://github.com/apache/spark/pull/22433
  
@mridulm @liyinan926 @jacobdr @ifilonenko 
The code that handles spaces and "/" is already present at 
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L259

I have reverted the fix in start-thriftserver.sh. Please review and 
merge.


---




[GitHub] spark issue #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite IllegalA...

2018-09-18 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22461
  
add to whitelist


---




[GitHub] spark pull request #22392: [SPARK-23200] Reset Kubernetes-specific config on...

2018-09-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22392


---




[GitHub] spark issue #22433: [SPARK-25442][SQL][K8S] Support STS to run in k8s deploy...

2018-09-18 Thread suryag10
Github user suryag10 commented on the issue:

https://github.com/apache/spark/pull/22433
  
> > Agreed with @mridulm that the naming restriction is specific to k8s and 
should be handled in a k8s specific way, e.g., somewhere around 
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L208.
> 
> Ok, Will update the PR with the same.

Hi, handling of this conversion is already present in 

https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L259

I have reverted the change in the start-thriftserver.sh file. Please review 
and merge.
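For reference, Kubernetes resource names must be valid DNS-1123 labels (lowercase alphanumerics and '-'), which is why spaces and "/" in an app name need normalizing. The sketch below only illustrates that kind of sanitization; it is not the actual Spark code:

```scala
// Illustrative DNS-1123-style sanitization: lowercase everything, turn
// disallowed characters (spaces, '/', '.', ...) into '-', collapse runs of
// '-', and trim leading/trailing '-'.
def sanitizeK8sName(name: String): String = {
  name.toLowerCase
    .replaceAll("[^a-z0-9\\-]", "-")
    .replaceAll("-+", "-")
    .stripPrefix("-")
    .stripSuffix("-")
}

assert(sanitizeK8sName("Thrift JDBC/ODBC Server") == "thrift-jdbc-odbc-server")
```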


---




[GitHub] spark issue #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22443
  
**[Test build #96223 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96223/testReport)**
 for PR 22443 at commit 
[`075ef7a`](https://github.com/apache/spark/commit/075ef7a0696332c3b9d35ff1750ea7ade08e7c3d).


---




[GitHub] spark issue #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22443
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3218/
Test PASSed.


---




[GitHub] spark issue #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22443
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark

2018-09-18 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22443
  
retest this please


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16677
  
I'm convinced; there are 2 major issues:
1. Abusing shuffle: we need a new mechanism for the driver to analyze some 
statistics about the data (records per map task).
2. Too many small tasks: we need a better algorithm to decide the 
parallelism of limit.


---




[GitHub] spark issue #22459: [SPARK-23173][SQL] rename spark.sql.fromJsonForceNullabl...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22459
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22459: [SPARK-23173][SQL] rename spark.sql.fromJsonForceNullabl...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22459
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96210/
Test PASSed.


---




[GitHub] spark issue #22459: [SPARK-23173][SQL] rename spark.sql.fromJsonForceNullabl...

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22459
  
**[Test build #96210 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96210/testReport)**
 for PR 22459 at commit 
[`f6f427f`](https://github.com/apache/spark/commit/f6f427fd06a87274d6950550137ce0f551276ff6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22192
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96212/
Test FAILed.


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22192
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22192
  
**[Test build #96212 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96212/testReport)**
 for PR 22192 at commit 
[`447c5e5`](https://github.com/apache/spark/commit/447c5e5974ca2a176026e63518a7a6cf29b78008).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22443
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96213/
Test FAILed.


---




[GitHub] spark issue #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22443
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22443
  
**[Test build #96213 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96213/testReport)**
 for PR 22443 at commit 
[`075ef7a`](https://github.com/apache/spark/commit/075ef7a0696332c3b9d35ff1750ea7ade08e7c3d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-09-18 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16677
  
ok after thinking about it more, i think we should just revert all of these 
changes and go back to the drawing board. here's why:

1. the prs change some of the most common/core parts of spark, and are not 
properly designed (as in they haven't gone through actual discussions; there's 
not even a doc on how they work). the prs created a much more complicated 
implementation for limit / top-k. you might be able to justify the complexity 
with the perf improvements, but we'd better write them down, discuss them, and 
make sure they are the right design choices. this is just a comment about the 
process, not the actual design.

2. now onto the design, i am having issues with two major parts:

2a. this pr really wanted an abstraction to buffer data, and then have the 
driver analyze some statistics about data (records per map task), and then make 
decisions. because spark doesn't yet have that infrastructure, this pr just 
adds some hacks to shuffle to make it work. there is no proper abstraction here.

2b. i'm not even sure if the algorithm here is the right one. the pr tries 
to parallelize as much as possible by keeping the same number of tasks. imo a 
simpler design that would work for more common cases is to buffer the data, get 
the records per map task, and create a new rdd with the first N partitions 
that reach the limit. that way, we don't launch too many tasks, and we 
retain ordering.

3. the pr implementation quality is poor. variable names are confusing 
(output vs records); it's severely lacking documentation; the doc for the 
config option is arcane.

sorry about all of the above, but we gotta do better.
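The simpler design sketched in 2b can be expressed as a small calculation over the per-map-task record counts; `partitionsToTake` below is a hypothetical helper, not Spark code:

```scala
// Given the record count each map task produced, keep only a prefix of
// partitions whose cumulative count covers `limit`, truncating the last
// one. Partition order (and thus output ordering) is retained, and no
// tasks are launched for partitions past the cutoff.
def partitionsToTake(counts: Seq[Long], limit: Long): Seq[Long] = {
  val before = counts.scanLeft(0L)(_ + _)  // rows preceding each partition
  counts.zip(before).collect {
    case (n, b) if b < limit => math.min(n, limit - b)
  }
}

// Counts (3, 5, 4) with limit 7: read partition 0 fully and 4 rows of
// partition 1; partition 2 is never touched.
assert(partitionsToTake(Seq(3L, 5L, 4L), 7L) == Seq(3L, 4L))
```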




---




[GitHub] spark issue #22448: [SPARK-25417][SQL] Improve findTightestCommonType to coe...

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22448
  
**[Test build #96222 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96222/testReport)**
 for PR 22448 at commit 
[`2304c7d`](https://github.com/apache/spark/commit/2304c7d759ee950785b81219b48d8a441741bd83).


---




[GitHub] spark issue #22448: [SPARK-25417][SQL] Improve findTightestCommonType to coe...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22448
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22448: [SPARK-25417][SQL] Improve findTightestCommonType to coe...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22448
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3217/
Test PASSed.


---




[GitHub] spark pull request #22456: [SPARK-19355][SQL] Fix variable names numberOfOut...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22456#discussion_r218666270
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -31,7 +31,7 @@ import org.apache.spark.util.Utils
 
 /**
  * Result returned by a ShuffleMapTask to a scheduler. Includes the block 
manager address that the
- * task ran on, the sizes of outputs for each reducer, and the number of 
outputs of the map task,
+ * task ran on, the sizes of outputs for each reducer, and the number of 
records of the map task,
--- End diff --

size was about bytes; so it doesn't really matter whether it's a record or 
a row or a block. it's also already pointed out below that it's about bytes.


---




[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16677#discussion_r218665902
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -93,25 +96,93 @@ trait BaseLimitExec extends UnaryExecNode with 
CodegenSupport {
 }
 
 /**
- * Take the first `limit` elements of each child partition, but do not 
collect or shuffle them.
+ * Take the `limit` elements of the child output.
  */
-case class LocalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
+case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
UnaryExecNode {
 
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+  override def output: Seq[Attribute] = child.output
 
   override def outputPartitioning: Partitioning = child.outputPartitioning
-}
 
-/**
- * Take the first `limit` elements of the child's single output partition.
- */
-case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
 
-  override def requiredChildDistribution: List[Distribution] = AllTuples 
:: Nil
+  private val serializer: Serializer = new 
UnsafeRowSerializer(child.output.size)
 
-  override def outputPartitioning: Partitioning = child.outputPartitioning
+  protected override def doExecute(): RDD[InternalRow] = {
+val childRDD = child.execute()
+val partitioner = LocalPartitioning(childRDD)
+val shuffleDependency = ShuffleExchangeExec.prepareShuffleDependency(
+  childRDD, child.output, partitioner, serializer)
+val numberOfOutput: Seq[Long] = if 
(shuffleDependency.rdd.getNumPartitions != 0) {
+  // submitMapStage does not accept RDD with 0 partition.
+  // So, we will not submit this dependency.
+  val submittedStageFuture = 
sparkContext.submitMapStage(shuffleDependency)
+  submittedStageFuture.get().recordsByPartitionId.toSeq
+} else {
+  Nil
+}
 
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+// During global limit, try to evenly distribute limited rows across 
data
+// partitions. If disabled, scanning data partitions sequentially 
until reaching limit number.
+// Besides, if child output has certain ordering, we can't evenly pick 
up rows from
+// each partition.
+val flatGlobalLimit = sqlContext.conf.limitFlatGlobalLimit && 
child.outputOrdering == Nil
+
+val shuffled = new ShuffledRowRDD(shuffleDependency)
+
+val sumOfOutput = numberOfOutput.sum
+if (sumOfOutput <= limit) {
+  shuffled
+} else if (!flatGlobalLimit) {
+  var numRowTaken = 0
+  val takeAmounts = numberOfOutput.map { num =>
+if (numRowTaken + num < limit) {
+  numRowTaken += num.toInt
+  num.toInt
+} else {
+  val toTake = limit - numRowTaken
+  numRowTaken += toTake
+  toTake
+}
+  }
+  val broadMap = sparkContext.broadcast(takeAmounts)
+  shuffled.mapPartitionsWithIndexInternal { case (index, iter) =>
+iter.take(broadMap.value(index).toInt)
+  }
+} else {
+  // We try to evenly require the asked limit number of rows across 
all child rdd's partitions.
+  var rowsNeedToTake: Long = limit
+  val takeAmountByPartition: Array[Long] = 
Array.fill[Long](numberOfOutput.length)(0L)
+  val remainingRowsByPartition: Array[Long] = Array(numberOfOutput: _*)
+
+  while (rowsNeedToTake > 0) {
+val nonEmptyParts = remainingRowsByPartition.count(_ > 0)
+// If the rows needed to take are less the number of non-empty 
partitions, take one row from
+// each non-empty partitions until we reach `limit` rows.
+// Otherwise, evenly divide the needed rows to each non-empty 
partitions.
+val takePerPart = math.max(1, rowsNeedToTake / nonEmptyParts)
+remainingRowsByPartition.zipWithIndex.foreach { case (num, index) 
=>
+  // In case `rowsNeedToTake` < `nonEmptyParts`, we may run out of 
`rowsNeedToTake` during
+  // the traversal, so we need to add this check.
+  if (rowsNeedToTake > 0 && num > 0) {
+if (num >= takePerPart) {
+  rowsNeedToTake -= takePerPart
+  takeAmountByPartition(index) += takePerPart
+  remainingRowsByPartition(index) -= takePerPart
+} else {
+  rowsNeedToTake -= num
+  takeAmountByPartition(index) += num
+  remainingRowsByPartition(index) -= num
+}
+  }
+}
+  }
+  val broadMap 
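The even-distribution loop quoted in the diff above can be exercised standalone. A sketch, assuming `limit` does not exceed the total row count (the real code handles that case earlier via the `sumOfOutput <= limit` branch):

```scala
// Spread `limit` rows across partitions: in each round, take at most
// `takePerPart` rows from every non-empty partition, until the limit is
// exhausted. Mirrors the while-loop in GlobalLimitExec above.
def evenTakes(counts: Array[Long], limit: Long): Array[Long] = {
  require(limit <= counts.sum)  // guarded earlier in the real code
  var rowsNeedToTake = limit
  val takeAmount = Array.fill[Long](counts.length)(0L)
  val remaining = counts.clone()
  while (rowsNeedToTake > 0) {
    val nonEmptyParts = remaining.count(_ > 0)
    val takePerPart = math.max(1, rowsNeedToTake / nonEmptyParts)
    remaining.zipWithIndex.foreach { case (num, i) =>
      if (rowsNeedToTake > 0 && num > 0) {
        val toTake = math.min(num, takePerPart)
        rowsNeedToTake -= toTake
        takeAmount(i) += toTake
        remaining(i) -= toTake
      }
    }
  }
  takeAmount
}

// 10 rows spread over partitions of sizes (8, 1, 8): the first round takes
// (3, 1, 3) and later rounds distribute the remainder.
val takes = evenTakes(Array(8L, 1L, 8L), 10L)
assert(takes.sum == 10L)
```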

[GitHub] spark pull request #22448: [SPARK-25417][SQL] Improve findTightestCommonType...

2018-09-18 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22448#discussion_r218662840
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -106,6 +107,22 @@ object TypeCoercion {
 case (t1, t2) => findTypeForComplex(t1, t2, findTightestCommonType)
   }
 
+  /**
+   * Finds a wider decimal type between the two supplied decimal types 
without
+   * any loss of precision.
+   */
+  def findWiderDecimalType(d1: DecimalType, d2: DecimalType): 
Option[DecimalType] = {
+val scale = max(d1.scale, d2.scale)
+val range = max(d1.precision - d1.scale, d2.precision - d2.scale)
+
+// Check the resultant decimal type does not exceed the allowable 
limits.
+if (range + scale <= DecimalType.MAX_PRECISION && scale <= 
DecimalType.MAX_SCALE) {
--- End diff --

@maropu OK.. i will remove this check.
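The widening rule in the diff above can be tried standalone. `widerDecimal` below is a self-contained sketch (returning precision/scale pairs rather than `DecimalType`), with `MAX_PRECISION` mirroring Spark's limit of 38:

```scala
import scala.math.max

val MAX_PRECISION = 38  // Spark's DecimalType.MAX_PRECISION

// The result keeps the larger scale and the larger integral range, so
// neither operand loses precision; None if the result would overflow.
def widerDecimal(p1: Int, s1: Int, p2: Int, s2: Int): Option[(Int, Int)] = {
  val scale = max(s1, s2)
  val range = max(p1 - s1, p2 - s2)  // digits left of the decimal point
  if (range + scale <= MAX_PRECISION) Some((range + scale, scale)) else None
}

// Decimal(10, 2) vs Decimal(5, 4): range = max(8, 1) = 8, scale = 4 -> (12, 4)
assert(widerDecimal(10, 2, 5, 4) == Some((12, 4)))
// Two maximal types cannot be widened without loss:
assert(widerDecimal(38, 0, 38, 38).isEmpty)
```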


---




[GitHub] spark pull request #22381: [SPARK-25394][CORE] Add an application status met...

2018-09-18 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/22381#discussion_r218660634
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -503,9 +503,12 @@ private[spark] object AppStatusStore {
   /**
* Create an in-memory store for a live application.
*/
-  def createLiveStore(conf: SparkConf): AppStatusStore = {
+  def createLiveStore(
+  conf: SparkConf,
+  appStatusSource: Option[AppStatusSource] = None):
--- End diff --

Yep, sorry for the late reply. :(


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22460
  
**[Test build #4341 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4341/testReport)**
 for PR 22460 at commit 
[`76c99af`](https://github.com/apache/spark/commit/76c99afc9c5013968f34e9603a896e21c92a4eb2).


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96218/
Test FAILed.


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19773: [SPARK-22546][SQL] Supporting for changing column dataTy...

2018-09-18 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/19773
  
@maropu @dongjoon-hyun Many thanks for your guidance!
```
Apache Spark already supports changing column types as a part of schema 
evolution. Especially, ORC vectorized reader support upcasting although it's 
not the same with canCast.

For the detail support Spark coverage, see SPARK-23007. It covered all 
built-in data source at that time.
```
Thanks, I'll study this background soon.
```
Please note that every data sources have different capability. So, this PR 
needs to prevent ALTER TABLE CHANGE COLUMN for those data sources case-by-case. 
And, we need corresponding test cases.
```
Got it. I'll keep following these cases in this PR. I've roughly split the work into
4 tasks and will update this PR's description first; I'll pay attention to the
corresponding test cases in each task.


---




[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22447
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3216/
Test PASSed.


---




[GitHub] spark issue #22461: [SPARK-25453] OracleIntegrationSuite IllegalArgumentExce...

2018-09-18 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/22461
  
To `[SPARK-25453][SQL][TEST]`


---




[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22447
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22461: [SPARK-25453] OracleIntegrationSuite IllegalArgumentExce...

2018-09-18 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/22461
  
Thanks! LGTM, too.
@gatorsmile @HyukjinKwon Can you trigger tests?
BTW, doesn't Jenkins run these docker integration tests?


---




[GitHub] spark issue #22461: [SPARK-25453] OracleIntegrationSuite IllegalArgumentExce...

2018-09-18 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22461
  
cc @maropu


---




[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22447
  
**[Test build #96221 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96221/testReport)**
 for PR 22447 at commit 
[`7193de3`](https://github.com/apache/spark/commit/7193de3ad8675229eef131214ed62f2ece5cd416).


---




[GitHub] spark issue #22461: [SPARK-25453] OracleIntegrationSuite IllegalArgumentExce...

2018-09-18 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22461
  
Could you add `[TEST]` to the title? Otherwise LGTM.


---




[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...

2018-09-18 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22447
  
retest this please


---




[GitHub] spark issue #22461: [SPARK-25453] OracleIntegrationSuite IllegalArgumentExce...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22461
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21632
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3215/
Test PASSed.


---




[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21632
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22461: [SPARK-25453] OracleIntegrationSuite IllegalArgumentExce...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22461
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22461: [SPARK-25453] OracleIntegrationSuite IllegalArgumentExce...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22461
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22456: [SPARK-19355][SQL] Fix variable names numberOfOutput -> ...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22456
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22456: [SPARK-19355][SQL] Fix variable names numberOfOutput -> ...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22456
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96206/
Test PASSed.


---




[GitHub] spark pull request #22461: [SPARK-25453] OracleIntegrationSuite IllegalArgum...

2018-09-18 Thread seancxmao
GitHub user seancxmao opened a pull request:

https://github.com/apache/spark/pull/22461

[SPARK-25453] OracleIntegrationSuite IllegalArgumentException: Timestamp 
format must be yyyy-mm-dd hh:mm:ss[.fffffffff]

## What changes were proposed in this pull request?
This PR aims to fix the failing test in `OracleIntegrationSuite`.

## How was this patch tested?
Existing integration tests.
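
For context, the exception text in the title originates in `java.sql.Timestamp.valueOf`, which is strict about its input layout. A minimal sketch of the failure mode (the sample strings below are ours, not from the suite):

```java
import java.sql.Timestamp;

public class TimestampFormatDemo {
    public static void main(String[] args) {
        // Accepted: the strict JDBC escape layout yyyy-mm-dd hh:mm:ss[.fffffffff]
        Timestamp ok = Timestamp.valueOf("2018-09-19 03:18:38.123");
        System.out.println(ok);

        // Any other layout is rejected with an IllegalArgumentException
        // like the one quoted in this PR's title.
        try {
            Timestamp.valueOf("2018/09/19 03:18:38");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```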

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/seancxmao/spark SPARK-25453

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22461.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22461


commit fb7b9d22812f789b970f8dae8a9cd3763c0eccff
Author: seancxmao 
Date:   2018-09-19T03:18:38Z

[SPARK-25453] OracleIntegrationSuite IllegalArgumentException: Timestamp 
format must be yyyy-mm-dd hh:mm:ss[.fffffffff]




---




[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21632
  
**[Test build #96220 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96220/testReport)**
 for PR 21632 at commit 
[`b9f2425`](https://github.com/apache/spark/commit/b9f2425bcf29997160b4f582ecc74d8a94c708cf).


---




[GitHub] spark issue #22456: [SPARK-19355][SQL] Fix variable names numberOfOutput -> ...

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22456
  
**[Test build #96206 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96206/testReport)**
 for PR 22456 at commit 
[`793fc19`](https://github.com/apache/spark/commit/793fc19d2519f47f5f3278b79e827f1159d9e440).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21632: [SPARK-19591][ML][MLlib] Add sample weights to de...

2018-09-18 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/21632#discussion_r218657097
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala ---
@@ -66,6 +69,9 @@ class DecisionTreeClassifier @Since("1.4.0") (
   override def setMinInstancesPerNode(value: Int): this.type = set(minInstancesPerNode, value)
 
   /** @group setParam */
+  @Since("2.2.0")
--- End diff --

done


---




[GitHub] spark pull request #21632: [SPARK-19591][ML][MLlib] Add sample weights to de...

2018-09-18 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/21632#discussion_r218657039
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala ---
@@ -65,6 +68,9 @@ class DecisionTreeRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: S
   override def setMinInstancesPerNode(value: Int): this.type = set(minInstancesPerNode, value)
 
   /** @group setParam */
+  @Since("2.2.0")
--- End diff --

done


---




[GitHub] spark pull request #21632: [SPARK-19591][ML][MLlib] Add sample weights to de...

2018-09-18 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/21632#discussion_r218657065
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala ---
@@ -97,28 +103,48 @@ class DecisionTreeClassifier @Since("1.4.0") (
   @Since("1.6.0")
   override def setSeed(value: Long): this.type = set(seed, value)
 
+  /**
+   * Sets the value of param [[weightCol]].
+   * If this is not set or empty, we treat all instance weights as 1.0.
+   * Default is not set, so all instances have weight one.
+   *
+   * @group setParam
+   */
+  @Since("2.2.0")
--- End diff --

done


---




[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21632
  
**[Test build #96219 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96219/testReport)**
 for PR 21632 at commit 
[`8a18157`](https://github.com/apache/spark/commit/8a18157b9ba7952457d91c771cae5c70405cacf7).


---




[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21632
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3214/
Test PASSed.


---




[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21632
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21527: [SPARK-24519] Make the threshold for highly compr...

2018-09-18 Thread hthuynh2
Github user hthuynh2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21527#discussion_r218656012
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -50,7 +50,9 @@ private[spark] sealed trait MapStatus {
 private[spark] object MapStatus {
 
   def apply(loc: BlockManagerId, uncompressedSizes: Array[Long]): MapStatus = {
-    if (uncompressedSizes.length > 2000) {
+    if (uncompressedSizes.length > Option(SparkEnv.get)
--- End diff --

How about creating a "static" `val shuffleMinNumPartsToHighlyCompress` for 
this? Please let me know if this works for you so I can update it. Thanks.
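
A minimal sketch of that suggestion, assuming a hypothetical config key and keeping the thread's proposed `shuffleMinNumPartsToHighlyCompress` name (the final Spark change may differ): read the threshold once at construction, defaulting to the old hard-coded 2000, rather than consulting `SparkEnv` on every `MapStatus.apply` call.

```java
import java.util.Map;

public class MapStatusFactory {
    // Hypothetical config key; defaults preserve the previous hard-coded cutoff.
    static final String KEY = "spark.shuffle.minNumPartitionsToHighlyCompress";
    static final int DEFAULT_THRESHOLD = 2000;

    final int shuffleMinNumPartsToHighlyCompress;

    MapStatusFactory(Map<String, String> conf) {
        // Read once up front, mirroring a "static" val on a Scala object.
        this.shuffleMinNumPartsToHighlyCompress =
            Integer.parseInt(conf.getOrDefault(KEY, String.valueOf(DEFAULT_THRESHOLD)));
    }

    // Returns which MapStatus flavor would be chosen for this many partitions.
    String apply(long[] uncompressedSizes) {
        return uncompressedSizes.length > shuffleMinNumPartsToHighlyCompress
            ? "HighlyCompressedMapStatus"
            : "CompressedMapStatus";
    }

    public static void main(String[] args) {
        MapStatusFactory defaults = new MapStatusFactory(Map.of());
        System.out.println(defaults.apply(new long[2001])); // over the default cutoff
    }
}
```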


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22460
  
**[Test build #96218 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96218/testReport)**
 for PR 22460 at commit 
[`76c99af`](https://github.com/apache/spark/commit/76c99afc9c5013968f34e9603a896e21c92a4eb2).


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22460: DO NOT MERGE

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3213/
Test PASSed.


---




[GitHub] spark pull request #22460: DO NOT MERGE

2018-09-18 Thread squito
GitHub user squito opened a pull request:

https://github.com/apache/spark/pull/22460

DO NOT MERGE



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/squito/spark debugging

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22460.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22460


commit 76c99afc9c5013968f34e9603a896e21c92a4eb2
Author: Imran Rashid 
Date:   2018-09-19T03:04:03Z

debugging




---




[GitHub] spark issue #22457: [SPARK-24626] Add statistics prefix to parallelFileListi...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22457
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96209/
Test PASSed.


---




[GitHub] spark issue #22457: [SPARK-24626] Add statistics prefix to parallelFileListi...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22457
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22451: [SPARK-24777][SQL] Add write benchmark for AVRO

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22451
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3212/
Test PASSed.


---




[GitHub] spark issue #22451: [SPARK-24777][SQL] Add write benchmark for AVRO

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22451
  
**[Test build #96217 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96217/testReport)**
 for PR 22451 at commit 
[`34af59d`](https://github.com/apache/spark/commit/34af59db044274e0c32a3d01cea7f260f0a36bdd).


---




[GitHub] spark issue #22457: [SPARK-24626] Add statistics prefix to parallelFileListi...

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22457
  
**[Test build #96209 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96209/testReport)**
 for PR 22457 at commit 
[`dde781f`](https://github.com/apache/spark/commit/dde781fa51782150f84932a5e2e701d0fdf2d355).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22451: [SPARK-24777][SQL] Add write benchmark for AVRO

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22451
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22451: [SPARK-24777][SQL] Add write benchmark for AVRO

2018-09-18 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22451
  
retest this please.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16677
  
Let me take an example from the PR description
> For example, we have three partitions with rows (100, 100, 50) 
respectively. In global limit of 100 rows, we may take (34, 33, 33) rows for 
each partition locally. After global limit we still have three partitions.

Without this patch, we need to take the first 100 rows from each partition, 
and then perform a shuffle to send all data into one partition and take the 
first 100 rows.

So if the limit is big, this patch is super useful; if the limit is small, 
it is not that useful but should not be slower.

The only overhead I can think of is that `MapStatus` needs to carry the 
numRecords metric. It should be a small overhead, as `MapStatus` already 
carries a lot of information.


---




[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16677#discussion_r218652707
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -93,25 +96,93 @@ trait BaseLimitExec extends UnaryExecNode with CodegenSupport {
 }
 
 /**
- * Take the first `limit` elements of each child partition, but do not collect or shuffle them.
+ * Take the `limit` elements of the child output.
  */
-case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
+case class GlobalLimitExec(limit: Int, child: SparkPlan) extends UnaryExecNode {
 
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+  override def output: Seq[Attribute] = child.output
 
   override def outputPartitioning: Partitioning = child.outputPartitioning
-}
 
-/**
- * Take the first `limit` elements of the child's single output partition.
- */
-case class GlobalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
 
-  override def requiredChildDistribution: List[Distribution] = AllTuples :: Nil
+  private val serializer: Serializer = new UnsafeRowSerializer(child.output.size)
 
-  override def outputPartitioning: Partitioning = child.outputPartitioning
+  protected override def doExecute(): RDD[InternalRow] = {
+    val childRDD = child.execute()
+    val partitioner = LocalPartitioning(childRDD)
+    val shuffleDependency = ShuffleExchangeExec.prepareShuffleDependency(
+      childRDD, child.output, partitioner, serializer)
+    val numberOfOutput: Seq[Long] = if (shuffleDependency.rdd.getNumPartitions != 0) {
+      // submitMapStage does not accept RDD with 0 partition.
+      // So, we will not submit this dependency.
+      val submittedStageFuture = sparkContext.submitMapStage(shuffleDependency)
+      submittedStageFuture.get().recordsByPartitionId.toSeq
+    } else {
+      Nil
+    }
 
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+    // During global limit, try to evenly distribute limited rows across data
+    // partitions. If disabled, scanning data partitions sequentially until reaching limit number.
+    // Besides, if child output has certain ordering, we can't evenly pick up rows from
+    // each parititon.
+    val flatGlobalLimit = sqlContext.conf.limitFlatGlobalLimit && child.outputOrdering == Nil
+
+    val shuffled = new ShuffledRowRDD(shuffleDependency)
+
+    val sumOfOutput = numberOfOutput.sum
+    if (sumOfOutput <= limit) {
+      shuffled
+    } else if (!flatGlobalLimit) {
+      var numRowTaken = 0
+      val takeAmounts = numberOfOutput.map { num =>
+        if (numRowTaken + num < limit) {
+          numRowTaken += num.toInt
+          num.toInt
+        } else {
+          val toTake = limit - numRowTaken
+          numRowTaken += toTake
+          toTake
+        }
+      }
+      val broadMap = sparkContext.broadcast(takeAmounts)
+      shuffled.mapPartitionsWithIndexInternal { case (index, iter) =>
+        iter.take(broadMap.value(index).toInt)
+      }
+    } else {
+      // We try to evenly require the asked limit number of rows across all child rdd's partitions.
+      var rowsNeedToTake: Long = limit
+      val takeAmountByPartition: Array[Long] = Array.fill[Long](numberOfOutput.length)(0L)
+      val remainingRowsByPartition: Array[Long] = Array(numberOfOutput: _*)
+
+      while (rowsNeedToTake > 0) {
+        val nonEmptyParts = remainingRowsByPartition.count(_ > 0)
+        // If the rows needed to take are less the number of non-empty partitions, take one row from
+        // each non-empty partitions until we reach `limit` rows.
+        // Otherwise, evenly divide the needed rows to each non-empty partitions.
+        val takePerPart = math.max(1, rowsNeedToTake / nonEmptyParts)
+        remainingRowsByPartition.zipWithIndex.foreach { case (num, index) =>
+          // In case `rowsNeedToTake` < `nonEmptyParts`, we may run out of `rowsNeedToTake` during
+          // the traversal, so we need to add this check.
+          if (rowsNeedToTake > 0 && num > 0) {
+            if (num >= takePerPart) {
+              rowsNeedToTake -= takePerPart
+              takeAmountByPartition(index) += takePerPart
+              remainingRowsByPartition(index) -= takePerPart
+            } else {
+              rowsNeedToTake -= num
+              takeAmountByPartition(index) += num
+              remainingRowsByPartition(index) -= num
+            }
+          }
+        }
+      }
+      val 

[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22237
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22237
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96205/
Test PASSed.


---




[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22237
  
**[Test build #96205 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96205/testReport)**
 for PR 22237 at commit 
[`5884173`](https://github.com/apache/spark/commit/58841735e14fe334c779b83eb915bff3a4f900e6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...

2018-09-18 Thread stanzhai
Github user stanzhai commented on the issue:

https://github.com/apache/spark/pull/18544
  
The issue was addressed a long time ago, @cloud-fan @maropu.


---




[GitHub] spark issue #22448: [SPARK-25417][SQL] Improve findTightestCommonType to coe...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22448
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96204/
Test PASSed.


---




[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16677#discussion_r218651545
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -93,25 +96,93 @@ trait BaseLimitExec extends UnaryExecNode with CodegenSupport {
 }
 
 /**
- * Take the first `limit` elements of each child partition, but do not collect or shuffle them.
+ * Take the `limit` elements of the child output.
  */
-case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
+case class GlobalLimitExec(limit: Int, child: SparkPlan) extends UnaryExecNode {
 
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+  override def output: Seq[Attribute] = child.output
 
   override def outputPartitioning: Partitioning = child.outputPartitioning
-}
 
-/**
- * Take the first `limit` elements of the child's single output partition.
- */
-case class GlobalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
 
-  override def requiredChildDistribution: List[Distribution] = AllTuples :: Nil
+  private val serializer: Serializer = new UnsafeRowSerializer(child.output.size)
 
-  override def outputPartitioning: Partitioning = child.outputPartitioning
+  protected override def doExecute(): RDD[InternalRow] = {
+    val childRDD = child.execute()
+    val partitioner = LocalPartitioning(childRDD)
+    val shuffleDependency = ShuffleExchangeExec.prepareShuffleDependency(
+      childRDD, child.output, partitioner, serializer)
+    val numberOfOutput: Seq[Long] = if (shuffleDependency.rdd.getNumPartitions != 0) {
+      // submitMapStage does not accept RDD with 0 partition.
+      // So, we will not submit this dependency.
+      val submittedStageFuture = sparkContext.submitMapStage(shuffleDependency)
+      submittedStageFuture.get().recordsByPartitionId.toSeq
+    } else {
+      Nil
+    }
 
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+    // During global limit, try to evenly distribute limited rows across data
+    // partitions. If disabled, scanning data partitions sequentially until reaching limit number.
+    // Besides, if child output has certain ordering, we can't evenly pick up rows from
+    // each parititon.
+    val flatGlobalLimit = sqlContext.conf.limitFlatGlobalLimit && child.outputOrdering == Nil
+
+    val shuffled = new ShuffledRowRDD(shuffleDependency)
+
+    val sumOfOutput = numberOfOutput.sum
+    if (sumOfOutput <= limit) {
+      shuffled
+    } else if (!flatGlobalLimit) {
+      var numRowTaken = 0
+      val takeAmounts = numberOfOutput.map { num =>
+        if (numRowTaken + num < limit) {
+          numRowTaken += num.toInt
+          num.toInt
+        } else {
+          val toTake = limit - numRowTaken
+          numRowTaken += toTake
+          toTake
+        }
+      }
+      val broadMap = sparkContext.broadcast(takeAmounts)
+      shuffled.mapPartitionsWithIndexInternal { case (index, iter) =>
+        iter.take(broadMap.value(index).toInt)
+      }
+    } else {
+      // We try to evenly require the asked limit number of rows across all child rdd's partitions.
+      var rowsNeedToTake: Long = limit
+      val takeAmountByPartition: Array[Long] = Array.fill[Long](numberOfOutput.length)(0L)
+      val remainingRowsByPartition: Array[Long] = Array(numberOfOutput: _*)
+
+      while (rowsNeedToTake > 0) {
+        val nonEmptyParts = remainingRowsByPartition.count(_ > 0)
+        // If the rows needed to take are less the number of non-empty partitions, take one row from
+        // each non-empty partitions until we reach `limit` rows.
+        // Otherwise, evenly divide the needed rows to each non-empty partitions.
+        val takePerPart = math.max(1, rowsNeedToTake / nonEmptyParts)
+        remainingRowsByPartition.zipWithIndex.foreach { case (num, index) =>
+          // In case `rowsNeedToTake` < `nonEmptyParts`, we may run out of `rowsNeedToTake` during
+          // the traversal, so we need to add this check.
+          if (rowsNeedToTake > 0 && num > 0) {
+            if (num >= takePerPart) {
+              rowsNeedToTake -= takePerPart
+              takeAmountByPartition(index) += takePerPart
+              remainingRowsByPartition(index) -= takePerPart
+            } else {
+              rowsNeedToTake -= num
+              takeAmountByPartition(index) += num
+              remainingRowsByPartition(index) -= num
+            }
+          }
+        }
+      }
+      val 

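For reference, the two row-taking strategies in the truncated diff above can be sketched as plain functions. This is a hypothetical Python sketch of the Scala logic, not Spark's actual API; it assumes the total row count is at least `limit` (the Scala code only reaches these branches when `sumOfOutput > limit`):

```python
def take_sequential(rows_per_partition, limit):
    """Non-flat branch: scan partitions in order, taking whole partitions
    until the next one would exceed the limit (mirrors `takeAmounts`)."""
    taken = 0
    amounts = []
    for num in rows_per_partition:
        if taken + num < limit:
            amounts.append(num)
            taken += num
        else:
            to_take = limit - taken
            amounts.append(to_take)
            taken += to_take
    return amounts

def take_evenly(rows_per_partition, limit):
    """Flat branch: repeatedly divide the rows still needed across the
    non-empty partitions (mirrors the `rowsNeedToTake` while-loop)."""
    remaining = list(rows_per_partition)
    take = [0] * len(remaining)
    rows_needed = limit
    while rows_needed > 0:
        non_empty = sum(1 for n in remaining if n > 0)
        # When rows_needed < non_empty, integer division gives 0, so take
        # at least one row per partition; the rows_needed > 0 guard below
        # stops the sweep once the limit is reached mid-traversal.
        per_part = max(1, rows_needed // non_empty)
        for i, num in enumerate(remaining):
            if rows_needed > 0 and num > 0:
                t = per_part if num >= per_part else num
                rows_needed -= t
                take[i] += t
                remaining[i] -= t
    return take
```

With 3 partitions of 30 rows and a limit of 70, the sequential strategy drains partitions in order (`[30, 30, 10]`), while the even strategy spreads skewed partition sizes, e.g. `take_evenly([1000, 400, 20], 100)` yields `[40, 40, 20]`.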
[GitHub] spark issue #22448: [SPARK-25417][SQL] Improve findTightestCommonType to coe...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22448
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22448: [SPARK-25417][SQL] Improve findTightestCommonType to coe...

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22448
  
**[Test build #96204 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96204/testReport)**
 for PR 22448 at commit 
[`0de3328`](https://github.com/apache/spark/commit/0de332801c4b50f175b4c2eb0205096bcd4093b3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22456: [SPARK-19355][SQL] Fix variable names numberOfOut...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22456#discussion_r218651070
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -31,7 +31,7 @@ import org.apache.spark.util.Utils
 
 /**
  * Result returned by a ShuffleMapTask to a scheduler. Includes the block 
manager address that the
- * task ran on, the sizes of outputs for each reducer, and the number of 
outputs of the map task,
+ * task ran on, the sizes of outputs for each reducer, and the number of 
records of the map task,
--- End diff --

Shall we also change `the sizes of outputs for each reducer` to `the sizes 
of output records for each reducer`?


---




[GitHub] spark issue #22381: [SPARK-25394][CORE] Add an application status metrics so...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22381
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3211/
Test PASSed.


---




[GitHub] spark issue #22381: [SPARK-25394][CORE] Add an application status metrics so...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22381
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22381: [SPARK-25394][CORE] Add an application status metrics so...

2018-09-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22381
  
**[Test build #96215 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96215/testReport)**
 for PR 22381 at commit 
[`3a2db16`](https://github.com/apache/spark/commit/3a2db16813a2bab3160f444fe0855855187a6178).


---




[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

2018-09-18 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/22414
  
How about this? 
https://github.com/apache/spark/compare/master...maropu:pr22414
IMO simple fixes and tests are better.


---




[GitHub] spark issue #22165: [SPARK-25017][Core] Add test suite for BarrierCoordinato...

2018-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22165
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3210/
Test PASSed.


---



