[GitHub] spark issue #20195: [SPARK-22972] Couldn't find corresponding Hive SerDe for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20195 **[Test build #85842 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85842/testReport)** for PR 20195 at commit [`f55ace6`](https://github.com/apache/spark/commit/f55ace645b46a429a512eb8e922a7074c4cd8cc0). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20013: [SPARK-20657][core] Speed up rendering of the stages pag...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/20013 The major concern is that with these code changes, the memory usage will be much larger with `InMemoryStore`. Also, building so many new indexes just for getting `computedQuantiles` seems like overkill.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18853 **[Test build #85840 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85840/testReport)** for PR 18853 at commit [`97a071d`](https://github.com/apache/spark/commit/97a071d91ec25159bba655b2bd9f6e2134d87088). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20096 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85832/ Test FAILed.
[GitHub] spark issue #20176: [SPARK-22981][SQL] Fix incorrect results of Casting Stru...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20176 Merged build finished. Test FAILed.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18853 **[Test build #85841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85841/testReport)** for PR 18853 at commit [`408e889`](https://github.com/apache/spark/commit/408e889caa8d61b7267f0f391be4af5fde82a0c9). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #85839 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85839/testReport)** for PR 13599 at commit [`9896de6`](https://github.com/apache/spark/commit/9896de66a6a2eb376aed75be6189c3852cd83f92). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)`
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test FAILed.
[GitHub] spark issue #20176: [SPARK-22981][SQL] Fix incorrect results of Casting Stru...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20176 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85838/ Test FAILed.
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20096 **[Test build #85835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85835/testReport)** for PR 20096 at commit [`341fb20`](https://github.com/apache/spark/commit/341fb20aa4d18f6964d27c87b48822588dfb1833). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, Long]) extends OffsetV2 `
[GitHub] spark issue #20176: [SPARK-22981][SQL] Fix incorrect results of Casting Stru...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20176 **[Test build #85838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85838/testReport)** for PR 20176 at commit [`6f5b080`](https://github.com/apache/spark/commit/6f5b0803fb65b1cc88b0dc2e09d2e9efd76a1368). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20096 Merged build finished. Test FAILed.
[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19943 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85837/ Test FAILed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test FAILed.
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20096 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85835/ Test FAILed.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18853 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85841/ Test FAILed.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18853 Merged build finished. Test FAILed.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18853 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85840/ Test FAILed.
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20096 **[Test build #85832 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85832/testReport)** for PR 20096 at commit [`2261566`](https://github.com/apache/spark/commit/22615669cc20cda77819786df4ff34aab925a958). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class KafkaContinuousSourceTopicDeletionSuite extends KafkaContinuousTest `
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #85830 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85830/testReport)** for PR 13599 at commit [`e231516`](https://github.com/apache/spark/commit/e231516ab7a9c1d380005f568f2a8decb2987186). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)`
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20096 Merged build finished. Test FAILed.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18853 Merged build finished. Test FAILed.
[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19943 Merged build finished. Test FAILed.
[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19943 **[Test build #85837 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85837/testReport)** for PR 19943 at commit [`2cf98b6`](https://github.com/apache/spark/commit/2cf98b6734c806f66e21df50520a465b03d9f060). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85839/ Test FAILed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85830/ Test FAILed.
[GitHub] spark issue #19290: [SPARK-22063][R] Fixes lint check failures in R by lates...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19290 BTW, I believe we are testing it with R 3.4.1 via AppVeyor too. I have been thinking it's good to test both old and new versions ... I think we have a weak promise for `R 3.1+` - http://spark.apache.org/docs/latest/index.html#downloading
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13599 retest this please
[GitHub] spark pull request #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests ...
GitHub user Ngone51 opened a pull request: https://github.com/apache/spark/pull/20199 [Spark-22967][Hive]Fix VersionSuite's unit tests by change Windows path into URI path ## What changes were proposed in this pull request? Two unit tests fail due to Windows-format paths: 1. test(s"$version: read avro file containing decimal") ``` org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string); ``` 2. test(s"$version: SPARK-17920: Insert into/overwrite avro table") ``` Unable to infer the schema. The schema specification is required to create the table `default`.`tab2`.; org.apache.spark.sql.AnalysisException: Unable to infer the schema. The schema specification is required to create the table `default`.`tab2`.; ``` This PR fixes these two unit tests by changing the Windows paths into URI paths. ## How was this patch tested? Existing tests. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Ngone51/spark SPARK-22967 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20199.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20199 commit 3d1cafa1c9387b017a98f2983a4e98842a4a5921 Author: wuyi5 Date: 2018-01-09T08:01:03Z change Windows path into URI format path commit 22669d1ff0cb00261fa146d276af237c115a0488 Author: wuyi5 Date: 2018-01-09T08:08:30Z leave deletion work to ShutdownHookManager
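The fix described in this PR replaces a raw Windows path with its URI form, so path parsers that expect URI syntax no longer choke on drive letters and backslashes. As an illustrative sketch of the transformation (Python's `pathlib`, not the PR's actual Scala code):

```python
from pathlib import PureWindowsPath

def windows_path_to_uri(raw: str) -> str:
    """Convert an absolute Windows-style path (drive letter, backslashes)
    to a file:// URI, a form that URI-based path parsers handle portably."""
    return PureWindowsPath(raw).as_uri()

print(windows_path_to_uri("C:\\tmp\\spark-test"))
# file:///C:/tmp/spark-test
```

The path `C:\tmp\spark-test` here is an arbitrary example, not a path from the test suite.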
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #85843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85843/testReport)** for PR 13599 at commit [`9896de6`](https://github.com/apache/spark/commit/9896de66a6a2eb376aed75be6189c3852cd83f92).
[GitHub] spark issue #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests by chan...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20199 Can one of the admins verify this patch?
[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19943 retest this please
[GitHub] spark issue #20176: [SPARK-22981][SQL] Fix incorrect results of Casting Stru...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20176 retest this please
[GitHub] spark issue #20176: [SPARK-22981][SQL] Fix incorrect results of Casting Stru...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20176 **[Test build #85844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85844/testReport)** for PR 20176 at commit [`6f5b080`](https://github.com/apache/spark/commit/6f5b0803fb65b1cc88b0dc2e09d2e9efd76a1368).
[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19943 **[Test build #85845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85845/testReport)** for PR 19943 at commit [`2cf98b6`](https://github.com/apache/spark/commit/2cf98b6734c806f66e21df50520a465b03d9f060).
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18853 retest this please
[GitHub] spark issue #20196: [SPARK-23000] Fix Flaky test suite DataSourceWithHiveMet...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20196 LGTM, merging to master/2.3!
[GitHub] spark pull request #20196: [SPARK-23000] Fix Flaky test suite DataSourceWith...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20196
[GitHub] spark pull request #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests ...
Github user Ngone51 commented on a diff in the pull request: https://github.com/apache/spark/pull/20199#discussion_r160344054 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala --- @@ -58,7 +58,7 @@ class VersionsSuite extends SparkFunSuite with Logging { */ protected def withTempDir(f: File => Unit): Unit = { val dir = Utils.createTempDir().getCanonicalFile -try f(dir) finally Utils.deleteRecursively(dir) +f(dir) --- End diff -- Leave the deletion work to ShutdownHookManager, to avoid an IOException on delete caused by a 'file occupied by another program' error on Windows (see SPARK-22967). Temp dirs will be cleaned up after the unit tests complete, but this is only guaranteed for test(s"$version: SPARK-17920: Insert into/overwrite avro table"). Many temp dirs produced by other unit tests still remain on Windows for unclear reasons, possibly 'file occupied by another program' as well.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18853 **[Test build #85846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85846/testReport)** for PR 18853 at commit [`408e889`](https://github.com/apache/spark/commit/408e889caa8d61b7267f0f391be4af5fde82a0c9).
[GitHub] spark issue #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests by chan...
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/20199 cc @HyukjinKwon
[GitHub] spark pull request #20163: [SPARK-22966][PySpark] Spark SQL should handle Py...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r160345387 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala --- @@ -120,10 +121,18 @@ object EvaluatePython { case (c: java.math.BigDecimal, dt: DecimalType) => Decimal(c, dt.precision, dt.scale) case (c: Int, DateType) => c +// Pyrolite will unpickle a Python datetime.date to a java.util.Calendar +case (c: Calendar, DateType) => DateTimeUtils.fromJavaCalendarForDate(c) --- End diff -- How about we return `null` in this case? Other cases also seem to return `null` if the value fails to be converted: ``` >>> from pyspark.sql.functions import udf >>> f = udf(lambda x: x, "double") >>> spark.range(1).select(f("id")).show() +------------+ |<lambda>(id)| +------------+ | null| +------------+ ``` Seems we can do it like: ```scala case StringType => (obj: Any) => nullSafeConvert(obj) { case c: Calendar => null case _ => UTF8String.fromString(obj.toString) } ```
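The `nullSafeConvert` suggestion above maps un-convertible UDF results to null instead of raising. A minimal Python sketch of that policy (the names mirror, but are not, Spark's internal Scala helpers; `to_double` is a hypothetical converter for the "double" return type in the example):

```python
from datetime import date
from typing import Any, Callable, Optional

def null_safe_convert(obj: Any, convert: Callable[[Any], Any]) -> Optional[Any]:
    """Return None for null input or when the converter rejects the value,
    mirroring how failed UDF result conversions surface as null."""
    if obj is None:
        return None
    try:
        return convert(obj)
    except (TypeError, ValueError):
        return None

def to_double(obj: Any) -> float:
    # Reject non-numeric values (bool is excluded deliberately).
    if isinstance(obj, bool) or not isinstance(obj, (int, float)):
        raise TypeError(f"cannot convert {type(obj).__name__} to double")
    return float(obj)

print(null_safe_convert(1, to_double))                 # 1.0
print(null_safe_convert(date(2018, 1, 9), to_double))  # None
```

This is the same behavior the `show()` output above demonstrates: a `datetime.date` flowing into a `double` column comes out as null rather than an error.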
[GitHub] spark issue #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests by chan...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20199 ok to test
[GitHub] spark issue #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests by chan...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20199 Will take a look soon.
[GitHub] spark issue #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests by chan...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20199 **[Test build #85847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85847/testReport)** for PR 20199 at commit [`22669d1`](https://github.com/apache/spark/commit/22669d1ff0cb00261fa146d276af237c115a0488).
[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/20179#discussion_r160347387 --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala --- @@ -196,11 +196,24 @@ private[spark] class IndexShuffleBlockResolver( // find out the consolidated file, then the offset within that from our index val indexFile = getIndexFile(blockId.shuffleId, blockId.mapId) -val in = new DataInputStream(new FileInputStream(indexFile)) +// SPARK-22982: if this FileInputStream's position is seeked forward by another piece of code +// which is incorrectly using our file descriptor then this code will fetch the wrong offsets +// (which may cause a reducer to be sent a different reducer's data). The explicit position +// checks added here were a useful debugging aid during SPARK-22982 and may help prevent this +// class of issue from re-occurring in the future which is why they are left here even though +// SPARK-22982 is fixed. +val channel = Files.newByteChannel(indexFile.toPath) +channel.position(blockId.reduceId * 8) +val in = new DataInputStream(Channels.newInputStream(channel)) try { - ByteStreams.skipFully(in, blockId.reduceId * 8) val offset = in.readLong() val nextOffset = in.readLong() + val actualPosition = channel.position() + val expectedPosition = blockId.reduceId * 8 + 16 + if (actualPosition != expectedPosition) { +throw new Exception(s"SPARK-22982: Incorrect channel position after index file reads: " + --- End diff -- Maybe we'd better change to some specific `Exception` type here.
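The defensive check in the diff above seeks to `reduceId * 8`, reads two big-endian longs, and then verifies the cursor sits exactly two longs past the seek point. An illustrative Python sketch of the same index-file layout and position check (not Spark's Scala code; big-endian 8-byte longs, as `DataInputStream` reads them):

```python
import io
import struct

def read_block_offsets(index_bytes: bytes, reduce_id: int) -> tuple[int, int]:
    """Read (offset, nextOffset) for one reducer from a shuffle index file:
    a flat array of big-endian 8-byte offsets, entry i starting at i * 8."""
    buf = io.BytesIO(index_bytes)
    buf.seek(reduce_id * 8)
    offset, next_offset = struct.unpack(">qq", buf.read(16))
    # SPARK-22982-style sanity check: the cursor must be exactly two longs
    # past where we seeked, or something else moved our position.
    expected = reduce_id * 8 + 16
    if buf.tell() != expected:
        raise IOError(f"position {buf.tell()}, expected {expected}")
    return offset, next_offset

# An index for 3 blocks: each entry is the start of the next block's data.
index = struct.pack(">4q", 0, 100, 250, 400)
print(read_block_offsets(index, 1))  # (100, 250)
```

Without the check, a shared cursor moved by another reader would silently return a different reducer's offsets, which is exactly the corruption SPARK-22982 describes.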
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20085#discussion_r160347559 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -182,6 +182,111 @@ case class StaticInvoke( } } +/** + * Invokes a call to reference to a static field. + * + * @param staticObject The target of the static call. This can either be the object itself + * (methods defined on scala objects), or the class object + * (static methods defined in java). + * @param dataType The expected return type of the function call. + * @param fieldName The field to reference. + */ +case class StaticField( + staticObject: Class[_], + dataType: DataType, + fieldName: String) extends Expression with NonSQLExpression { + + val objectName = staticObject.getName.stripSuffix("$") + + override def nullable: Boolean = false + override def children: Seq[Expression] = Nil + + override def eval(input: InternalRow): Any = +throw new UnsupportedOperationException("Only code-generated evaluation is supported.") + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val javaType = ctx.javaType(dataType) + +val code = s""" + final $javaType ${ev.value} = $objectName.$fieldName; --- End diff -- do we need this expression for such a simple function?
[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/20179#discussion_r160347716 --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala --- @@ -196,11 +196,24 @@ private[spark] class IndexShuffleBlockResolver( // find out the consolidated file, then the offset within that from our index val indexFile = getIndexFile(blockId.shuffleId, blockId.mapId) -val in = new DataInputStream(new FileInputStream(indexFile)) +// SPARK-22982: if this FileInputStream's position is seeked forward by another piece of code +// which is incorrectly using our file descriptor then this code will fetch the wrong offsets +// (which may cause a reducer to be sent a different reducer's data). The explicit position +// checks added here were a useful debugging aid during SPARK-22982 and may help prevent this +// class of issue from re-occurring in the future which is why they are left here even though +// SPARK-22982 is fixed. +val channel = Files.newByteChannel(indexFile.toPath) +channel.position(blockId.reduceId * 8) --- End diff -- Sorry, I'm not clear whether the change here is related to the "asynchronous close()" issue?
[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20179#discussion_r160347954 --- Diff: core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala --- @@ -376,18 +374,13 @@ private[netty] class NettyRpcEnv( def setError(e: Throwable): Unit = { error = e - source.close() } override def read(dst: ByteBuffer): Int = { Try(source.read(dst)) match { +case _ if error != null => throw error --- End diff -- I think it is better to also add a short comment here. This bug is subtle and there is no test against it now. Just from this code, it is hard to know why we check for an error even on success.
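The subtlety flagged above is that `read` must surface an error recorded asynchronously by `setError`, even when the underlying `source.read` itself succeeds. A Python sketch of that check-the-flag-first pattern (hypothetical class; not the NettyRpcEnv code):

```python
import threading

class DownloadChannel:
    """Minimal sketch: a reader that re-throws an error recorded by
    another thread, even if the local read succeeded."""
    def __init__(self, source):
        self._source = source
        self._error = None
        self._lock = threading.Lock()

    def set_error(self, exc: Exception) -> None:
        # Called from another thread; the error is raised on the next
        # read() instead of closing the source out from under the reader.
        with self._lock:
            self._error = exc

    def read(self, n: int) -> bytes:
        data = self._source.read(n)
        with self._lock:
            # Check the error flag even on success: a good local read
            # must not mask a failure reported by the writer thread.
            if self._error is not None:
                raise self._error
        return data
```

Raising the stored error, rather than calling `close()` from the error-reporting thread, avoids the unsafe asynchronous close this PR removes.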
[GitHub] spark issue #20179: [SPARK-22982] Remove unsafe asynchronous close() call fr...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20179 LGTM
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20085#discussion_r160349007 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -1237,47 +1342,91 @@ case class DecodeUsingSerializer[T](child: Expression, tag: ClassTag[T], kryo: B } /** - * Initialize a Java Bean instance by setting its field values via setters. + * Initialize an object by invoking the given sequence of method names and method arguments. + * + * @param objectInstance An expression evaluating to a new instance of the object to initialize + * @param setters A sequence of method names and their sequence of argument expressions to apply in + *series to the object instance */ -case class InitializeJavaBean(beanInstance: Expression, setters: Map[String, Expression]) +case class InitializeObject( + objectInstance: Expression, + setters: Seq[(String, Seq[Expression])]) --- End diff -- To generalize, I think we can just have a `NewObject` expression, which just does `new SomeClass`; the setters are just a bunch of `Invoke`s.
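The `InitializeObject` signature under discussion constructs an instance and then applies an ordered sequence of (method name, arguments) setter calls. A small Python sketch of that pattern using plain reflection (illustrative only; Catalyst generates Java code rather than reflecting at runtime):

```python
from typing import Any, Sequence, Tuple

def initialize_object(cls: type,
                      setters: Sequence[Tuple[str, Sequence[Any]]]) -> Any:
    """Create cls(), then invoke each named setter with its arguments in
    order -- the Java-bean initialization pattern generalized to any
    (method name, args) sequence, as in the diff above."""
    instance = cls()
    for name, args in setters:
        getattr(instance, name)(*args)
    return instance

class Bean:
    """Hypothetical bean with one- and two-argument setters."""
    def __init__(self) -> None:
        self.a = None
        self.b = None
    def set_a(self, v):
        self.a = v
    def set_b(self, x, y):
        self.b = (x, y)

bean = initialize_object(Bean, [("set_a", [1]), ("set_b", [2, 3])])
print(bean.a, bean.b)  # 1 (2, 3)
```

Note how the generalization from `Map[String, Expression]` to `Seq[(String, Seq[Expression])]` allows repeated setter names, multi-argument setters, and a guaranteed call order.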
[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20179#discussion_r160349274 --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala --- @@ -196,11 +196,24 @@ private[spark] class IndexShuffleBlockResolver( // find out the consolidated file, then the offset within that from our index val indexFile = getIndexFile(blockId.shuffleId, blockId.mapId) -val in = new DataInputStream(new FileInputStream(indexFile)) +// SPARK-22982: if this FileInputStream's position is seeked forward by another piece of code +// which is incorrectly using our file descriptor then this code will fetch the wrong offsets +// (which may cause a reducer to be sent a different reducer's data). The explicit position +// checks added here were a useful debugging aid during SPARK-22982 and may help prevent this +// class of issue from re-occurring in the future which is why they are left here even though +// SPARK-22982 is fixed. +val channel = Files.newByteChannel(indexFile.toPath) +channel.position(blockId.reduceId * 8) --- End diff -- It's used to detect bugs like "asynchronous close()" earlier in the future.
[GitHub] spark pull request #20163: [SPARK-22966][PySpark] Spark SQL should handle Py...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r160349750 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala --- @@ -120,10 +121,18 @@ object EvaluatePython { case (c: java.math.BigDecimal, dt: DecimalType) => Decimal(c, dt.precision, dt.scale) case (c: Int, DateType) => c +// Pyrolite will unpickle a Python datetime.date to a java.util.Calendar +case (c: Calendar, DateType) => DateTimeUtils.fromJavaCalendarForDate(c) --- End diff -- Yea, it's consistent with the other un-convertible cases, but `StringType` is the default return type; I'm afraid many users may hit this and get confused.
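For context, a `java.util.Calendar` carries an epoch-millis instant, while `DateType` stores an integer number of days since 1970-01-01, so the core of a Calendar-to-date conversion is a floor division. A rough Python sketch of that arithmetic (my illustration, not the `DateTimeUtils.fromJavaCalendarForDate` code, which must also account for the calendar's time zone):

```python
MILLIS_PER_DAY = 24 * 60 * 60 * 1000

def millis_to_epoch_days(millis):
    # DateType stores dates as days since 1970-01-01; floor division
    # keeps pre-epoch instants (negative millis) on the correct day.
    return millis // MILLIS_PER_DAY

# 1970-01-02T00:00:00 UTC is day 1
assert millis_to_epoch_days(86_400_000) == 1
```

Note Python's `//` already floors toward negative infinity; in Java/Scala the equivalent needs `Math.floorDiv`, since plain integer division truncates toward zero.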
[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/20179#discussion_r160351383 --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala --- @@ -196,11 +196,24 @@ private[spark] class IndexShuffleBlockResolver( // find out the consolidated file, then the offset within that from our index val indexFile = getIndexFile(blockId.shuffleId, blockId.mapId) -val in = new DataInputStream(new FileInputStream(indexFile)) +// SPARK-22982: if this FileInputStream's position is seeked forward by another piece of code +// which is incorrectly using our file descriptor then this code will fetch the wrong offsets +// (which may cause a reducer to be sent a different reducer's data). The explicit position +// checks added here were a useful debugging aid during SPARK-22982 and may help prevent this +// class of issue from re-occurring in the future which is why they are left here even though +// SPARK-22982 is fixed. +val channel = Files.newByteChannel(indexFile.toPath) +channel.position(blockId.reduceId * 8) --- End diff -- I see. Thanks!
[GitHub] spark pull request #20163: [SPARK-22966][PySpark] Spark SQL should handle Py...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r160355531 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala --- @@ -120,10 +121,18 @@ object EvaluatePython { case (c: java.math.BigDecimal, dt: DecimalType) => Decimal(c, dt.precision, dt.scale) case (c: Int, DateType) => c +// Pyrolite will unpickle a Python datetime.date to a java.util.Calendar +case (c: Calendar, DateType) => DateTimeUtils.fromJavaCalendarForDate(c) --- End diff -- Right. Let's go ahead with 2. then. I am fine if it's done as an exception for practical purposes. Maybe we could add an `isinstance(.., basestring)` check and return directly as a shortcut. I haven't checked the perf diff, but I think we can measure it easily via profiling, as I mentioned above.
[GitHub] spark pull request #20193: [SPARK-22998][K8S] Set missing value for SPARK_MO...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20193
[GitHub] spark issue #20193: [SPARK-22998][K8S] Set missing value for SPARK_MOUNTED_C...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20193 merged to master/2.3
[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20163 I investigated the behavior differences between `udf` and `pandas_udf` for wrong return types and found there are actually many differences. Basically `udf`s return `null`, as @HyukjinKwon mentioned, whereas `pandas_udf`s throw some `ArrowException`. There seem to be some exceptions, though.
[GitHub] spark pull request #20163: [SPARK-22966][PySpark] Spark SQL should handle Py...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r160358011 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala --- @@ -120,10 +121,18 @@ object EvaluatePython { case (c: java.math.BigDecimal, dt: DecimalType) => Decimal(c, dt.precision, dt.scale) case (c: Int, DateType) => c +// Pyrolite will unpickle a Python datetime.date to a java.util.Calendar +case (c: Calendar, DateType) => DateTimeUtils.fromJavaCalendarForDate(c) --- End diff -- WDYT about ^ @ueshin?
[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160358917 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -0,0 +1,523 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.datasources.orc; + +import java.io.IOException; +import java.util.stream.IntStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.RecordReader; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.input.FileSplit; +import org.apache.orc.OrcConf; +import org.apache.orc.OrcFile; +import org.apache.orc.Reader; +import org.apache.orc.TypeDescription; +import org.apache.orc.mapred.OrcInputFormat; +import org.apache.orc.storage.common.type.HiveDecimal; +import org.apache.orc.storage.ql.exec.vector.*; +import org.apache.orc.storage.serde2.io.HiveDecimalWritable; + +import org.apache.spark.memory.MemoryMode; +import org.apache.spark.sql.catalyst.InternalRow; +import org.apache.spark.sql.execution.vectorized.ColumnVectorUtils; +import org.apache.spark.sql.execution.vectorized.OffHeapColumnVector; +import org.apache.spark.sql.execution.vectorized.OnHeapColumnVector; +import org.apache.spark.sql.execution.vectorized.WritableColumnVector; +import org.apache.spark.sql.types.*; +import org.apache.spark.sql.vectorized.ColumnarBatch; + + +/** + * To support vectorization in WholeStageCodeGen, this reader returns ColumnarBatch. + * After creating, `initialize` and `initBatch` should be called sequentially. + */ +public class OrcColumnarBatchReader extends RecordReader { + + /** + * The default size of batch. We use this value for both ORC and Spark consistently --- End diff -- nit: We use this value for ORC reader to make it consistent with Spark's columnar batch, because their default batch sizes are different like the following.
[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20163 Probably we should consider catching exceptions and setting nulls in pandas_udf, if possible, to match the behaviour with udf ...
[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160360721 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala --- @@ -118,6 +118,13 @@ class OrcFileFormat } } + override def supportBatch(sparkSession: SparkSession, schema: StructType): Boolean = { +val conf = sparkSession.sessionState.conf +conf.orcVectorizedReaderEnabled && conf.wholeStageEnabled && + schema.length <= conf.wholeStageMaxNumFields && + schema.forall(_.dataType.isInstanceOf[AtomicType]) + } + --- End diff -- Do we need to implement `vectorTypes` as `ParquetFileFormat` does?
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20096 **[Test build #85848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85848/testReport)** for PR 20096 at commit [`2628bd4`](https://github.com/apache/spark/commit/2628bd4fd170b2d11dd77947312a57361b186bf7).
[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20023#discussion_r160361785 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala --- @@ -117,6 +117,7 @@ object DecimalType extends AbstractDataType { val MAX_SCALE = 38 val SYSTEM_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 18) val USER_DEFAULT: DecimalType = DecimalType(10, 0) + val MINIMUM_ADJUSTED_SCALE = 6 --- End diff -- @gatorsmile what about `spark.sql.decimalOperations.mode` which defaults to `native` and accepts also `hive` (and in future also `sql2011` for throwing exception instead of returning NULL)?
[GitHub] spark pull request #20163: [SPARK-22966][PySpark] Spark SQL should handle Py...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r160364055 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala --- @@ -120,10 +121,18 @@ object EvaluatePython { case (c: java.math.BigDecimal, dt: DecimalType) => Decimal(c, dt.precision, dt.scale) case (c: Int, DateType) => c +// Pyrolite will unpickle a Python datetime.date to a java.util.Calendar +case (c: Calendar, DateType) => DateTimeUtils.fromJavaCalendarForDate(c) --- End diff -- Yeah, 2. should work for `StringType`. I'd also like to add some documentation like 1. so that users are careful about the return type. I've found that `udf`s return `null` and `pandas_udf`s throw some exception in most cases when the return type is mismatched. Of course we can try to bring the behavior of `udf` and `pandas_udf` as close as possible in the future, but I think handling a mismatched return type is best-effort.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18853 **[Test build #85846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85846/testReport)** for PR 18853 at commit [`408e889`](https://github.com/apache/spark/commit/408e889caa8d61b7267f0f391be4af5fde82a0c9). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18853 Merged build finished. Test FAILed.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18853 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85846/ Test FAILed.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18853 **[Test build #85849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85849/testReport)** for PR 18853 at commit [`e763330`](https://github.com/apache/spark/commit/e763330edae88d4dad410214608fb5448d90a989).
[GitHub] spark issue #20195: [SPARK-22972] Couldn't find corresponding Hive SerDe for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20195 **[Test build #85842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85842/testReport)** for PR 20195 at commit [`f55ace6`](https://github.com/apache/spark/commit/f55ace645b46a429a512eb8e922a7074c4cd8cc0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20195: [SPARK-22972] Couldn't find corresponding Hive SerDe for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20195 Merged build finished. Test PASSed.
[GitHub] spark issue #20195: [SPARK-22972] Couldn't find corresponding Hive SerDe for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20195 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85842/ Test PASSed.
[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20023#discussion_r160376096 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala --- @@ -117,6 +117,7 @@ object DecimalType extends AbstractDataType { val MAX_SCALE = 38 val SYSTEM_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 18) val USER_DEFAULT: DecimalType = DecimalType(10, 0) + val MINIMUM_ADJUSTED_SCALE = 6 --- End diff -- how about `spark.sql.decimalOperations.allowTruncat`? Let's leave the mode stuff to the type coercion mode.
[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20023#discussion_r160376186 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala --- @@ -117,6 +117,7 @@ object DecimalType extends AbstractDataType { val MAX_SCALE = 38 val SYSTEM_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 18) val USER_DEFAULT: DecimalType = DecimalType(10, 0) + val MINIMUM_ADJUSTED_SCALE = 6 --- End diff -- We should make it an internal conf and remove it after some releases.
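For context, the `MINIMUM_ADJUSTED_SCALE = 6` under discussion follows the Hive/SQL Server rule the PR title references: when a result's exact precision exceeds the maximum of 38, digits are dropped from the scale, but at least `min(scale, 6)` fractional digits are preserved. A sketch of that adjustment in Python (function and parameter names are mine, for illustration only):

```python
MAX_PRECISION = 38
MINIMUM_ADJUSTED_SCALE = 6

def adjust_precision_scale(precision, scale):
    """Cap (precision, scale) at MAX_PRECISION while keeping at least
    min(scale, MINIMUM_ADJUSTED_SCALE) fractional digits."""
    if precision <= MAX_PRECISION:
        return precision, scale  # already representable, no truncation
    int_digits = precision - scale          # digits left of the decimal point
    min_scale = min(scale, MINIMUM_ADJUSTED_SCALE)
    adjusted_scale = max(MAX_PRECISION - int_digits, min_scale)
    return MAX_PRECISION, adjusted_scale
```

For example, a multiplication of decimal(38, 10) and decimal(9, 0) would exactly need (48, 10); the rule adjusts it to (38, 6), trading fractional digits for a representable type instead of returning NULL.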
[GitHub] spark issue #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests by chan...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20199 **[Test build #85847 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85847/testReport)** for PR 20199 at commit [`22669d1`](https://github.com/apache/spark/commit/22669d1ff0cb00261fa146d276af237c115a0488). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests by chan...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20199 Merged build finished. Test PASSed.
[GitHub] spark issue #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests by chan...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20199 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85847/ Test PASSed.
[GitHub] spark issue #20189: [SPARK-22975] MetricsReporter should not throw exception...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20189 **[Test build #85850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85850/testReport)** for PR 20189 at commit [`7242eab`](https://github.com/apache/spark/commit/7242eabe00ce84cb132a4a4f16cb53bed1e6afa7).
[GitHub] spark issue #20167: Allow providing Mesos principal & secret via files (SPAR...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/20167 CC @ArtRand @vanzin I would appreciate your reviews as and when you have time
[GitHub] spark issue #20176: [SPARK-22981][SQL] Fix incorrect results of Casting Stru...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20176 **[Test build #85844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85844/testReport)** for PR 20176 at commit [`6f5b080`](https://github.com/apache/spark/commit/6f5b0803fb65b1cc88b0dc2e09d2e9efd76a1368). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20176: [SPARK-22981][SQL] Fix incorrect results of Casting Stru...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20176 Merged build finished. Test PASSed.
[GitHub] spark issue #20176: [SPARK-22981][SQL] Fix incorrect results of Casting Stru...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20176 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85844/ Test PASSed.
[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19943 **[Test build #85845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85845/testReport)** for PR 19943 at commit [`2cf98b6`](https://github.com/apache/spark/commit/2cf98b6734c806f66e21df50520a465b03d9f060). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19943 Merged build finished. Test PASSed.
[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19943 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85845/ Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #85843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85843/testReport)** for PR 13599 at commit [`9896de6`](https://github.com/apache/spark/commit/9896de66a6a2eb376aed75be6189c3852cd83f92). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)`
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85843/ Test PASSed.
[GitHub] spark pull request #20200: [SPARK-23005][Core] Improve RDD.take on small num...
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/20200 [SPARK-23005][Core] Improve RDD.take on small number of partitions ## What changes were proposed in this pull request? In the current implementation of RDD.take, we overestimate the number of partitions we need to try by 50%: `(1.5 * num * partsScanned / buf.size).toInt` However, when the number is small, the result of `.toInt` is not what we want. E.g., 2.9 will become 2, when it should be 3. Using `math.ceil` fixes the problem. Also cleans up the code in RDD.scala. ## How was this patch tested? Unit test You can merge this pull request into a Git repository by running: $ git pull https://github.com/gengliangwang/spark Take Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20200.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20200 commit 93a3d8447f5d0d3c576a312084144f16c787cf16 Author: Wang Gengliang Date: 2018-01-09T11:46:36Z Improve take and clean up RDD.scala
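The rounding issue is easy to see in isolation: `.toInt` truncates, so a fractional estimate always under-shoots the partition target. A quick Python illustration of the two behaviours (a sketch of the Scala expression `(1.5 * num * partsScanned / buf.size).toInt`, not the actual RDD code):

```python
import math

def parts_to_try_truncated(num, parts_scanned, buf_size):
    # old behaviour: toInt truncates, so an estimate of 5.9 becomes 5
    return max(int(1.5 * num * parts_scanned / buf_size) - parts_scanned, 1)

def parts_to_try_ceil(num, parts_scanned, buf_size):
    # proposed behaviour: round up, so 5.9 becomes 6
    return max(math.ceil(1.5 * num * parts_scanned / buf_size) - parts_scanned, 1)
```

With `num=59`, `parts_scanned=2`, `buf_size=30` the estimate is 5.9, so truncation schedules one fewer extra partition than rounding up, which can cost an extra scheduling round on small jobs.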
[GitHub] spark issue #20200: [SPARK-23005][Core] Improve RDD.take on small number of ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20200 **[Test build #85851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85851/testReport)** for PR 20200 at commit [`93a3d84`](https://github.com/apache/spark/commit/93a3d8447f5d0d3c576a312084144f16c787cf16).
[GitHub] spark pull request #20200: [SPARK-23005][Core] Improve RDD.take on small num...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20200#discussion_r160390893 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -985,7 +985,7 @@ abstract class RDD[T: ClassTag]( def subtract( other: RDD[T], p: Partitioner)(implicit ord: Ordering[T] = null): RDD[T] = withScope { -if (partitioner == Some(p)) { +if (partitioner.contains(p)) { --- End diff -- Do we still support Scala 2.10? If we do, this will fail to compile.
[GitHub] spark pull request #20200: [SPARK-23005][Core] Improve RDD.take on small num...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20200#discussion_r160391233 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -985,7 +985,7 @@ abstract class RDD[T: ClassTag]( def subtract( other: RDD[T], p: Partitioner)(implicit ord: Ordering[T] = null): RDD[T] = withScope { -if (partitioner == Some(p)) { +if (partitioner.contains(p)) { --- End diff -- Actually I think the previous code is more readable.
[GitHub] spark pull request #20200: [SPARK-23005][Core] Improve RDD.take on small num...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20200#discussion_r160391487 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1345,13 +1346,12 @@ abstract class RDD[T: ClassTag]( if (buf.isEmpty) { numPartsToTry = partsScanned * scaleUpFactor } else { -// the left side of max is >=1 whenever partsScanned >= 2 -numPartsToTry = Math.max((1.5 * num * partsScanned / buf.size).toInt - partsScanned, 1) +// As left > 0, numPartsToTry is always >= 1 --- End diff -- This is the same as `SparkPlan.executeTake()`. Should we also fix that?
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20096 **[Test build #85848 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85848/testReport)** for PR 20096 at commit [`2628bd4`](https://github.com/apache/spark/commit/2628bd4fd170b2d11dd77947312a57361b186bf7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20096 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85848/ Test FAILed.
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20096 Merged build finished. Test FAILed.
[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20023#discussion_r160394589 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala --- @@ -117,6 +117,7 @@ object DecimalType extends AbstractDataType { val MAX_SCALE = 38 val SYSTEM_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 18) val USER_DEFAULT: DecimalType = DecimalType(10, 0) + val MINIMUM_ADJUSTED_SCALE = 6 --- End diff -- ok, I'll go with that, thanks @cloud-fan.
[GitHub] spark pull request #20201: [SPARK-22389][SQL] data source v2 partitioning re...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/20201 [SPARK-22389][SQL] data source v2 partitioning reporting interface ## What changes were proposed in this pull request? a new interface which allows a data source to report partitioning and avoid a shuffle on the Spark side. The design is much like the internal distribution/partitioning framework: Spark defines a `Distribution` interface and several concrete implementations, and asks the data source to report a `Partitioning`; the `Partitioning` should tell Spark whether it can satisfy a `Distribution` or not. ## How was this patch tested? new test You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark partition-reporting Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20201.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20201 commit be14e3bd7598eb3ed583e18c1d9927d5c7f563b4 Author: Wenchen Fan Date: 2018-01-09T02:08:53Z data source v2 partitioning reporting interface
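The design described above can be sketched as a small satisfaction check: the source reports a `Partitioning`, and Spark asks whether it satisfies a required `Distribution` before deciding to shuffle. A schematic Python model (class shapes and method names are my illustration, not the actual Java interfaces in the PR):

```python
class ClusteredDistribution:
    """Requirement: rows sharing the same values of `clustered_cols`
    must end up in the same partition."""
    def __init__(self, clustered_cols):
        self.clustered_cols = set(clustered_cols)

class HashPartitioning:
    """Reported by the source: data is hash-partitioned on `cols`."""
    def __init__(self, cols, num_partitions):
        self.cols = set(cols)
        self.num_partitions = num_partitions

    def satisfies(self, distribution):
        # Hash partitioning on a subset of the required clustering columns
        # already co-locates equal keys, so no Spark-side shuffle is needed.
        return self.cols <= distribution.clustered_cols
```

For example, data hash-partitioned on `a` satisfies a requirement to cluster on `(a, b)` (equal `(a, b)` pairs share an `a`, hence a partition), but data partitioned on `(a, c)` does not satisfy clustering on `a` alone.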
[GitHub] spark issue #20201: [SPARK-22389][SQL] data source v2 partitioning reporting...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20201 cc @rxin @RussellSpitzer @kiszk @gatorsmile
[GitHub] spark issue #20201: [SPARK-22389][SQL] data source v2 partitioning reporting...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20201 **[Test build #85852 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85852/testReport)** for PR 20201 at commit [`be14e3b`](https://github.com/apache/spark/commit/be14e3bd7598eb3ed583e18c1d9927d5c7f563b4).