date:20180130

[GitHub] spark issue #10949: [SPARK-12832][MESOS] mesos scheduler respect agent attri...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/10949
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...

2018-01-30 Thread bersprockets

Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/20424#discussion_r164855511
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -191,7 +191,20 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
         daemon = pb.start()
 
         val in = new DataInputStream(daemon.getInputStream)
-        daemonPort = in.readInt()
+        try {
+          daemonPort = in.readInt()
+        } catch {
+          case exc: EOFException =>
+            throw new IOException(s"No port number in $daemonModule's stdout")
+        }
+
+        // test that the returned port number is within a valid range.
+        // note: this does not cover the case where the port number
+        // is arbitrary data but is also coincidentally within range
+        if (daemonPort < 1 || daemonPort > 0xffff) {
--- End diff --

Oh, I see. Let me address the two parts of the comment separately:

First part: What's the point of throwing an exception for a bad port number 
when the original handling did that already?

The original handling was:


java.lang.IllegalArgumentException: port out of range:1315905645
    at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)


This error occurred in a different function from the one that obtained the 
port number, so you had to track down where the port number came from. This 
actually added an extra step to the original debugging of the sitecustomize.py 
issue.

The proposed handling is:


java.io.IOException: Bad port number in pyspark.daemon's stdout: 0x4e6f206d
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:205)


Here we're not saying "somehow we got a bad port number", but that a 
particular python module (the name of which is displayed, since the name of 
that module is configurable) has returned bad data.

Also, the check is at the point at which the port number is obtained, not 
waiting until sometime later in another function where the port number is used 
(and where possibly something else has changed the port number, which is not 
likely, but you would need to check that).

Perhaps the message could be better, e.g.:


Bad data in pyspark.daemon's output.
Expected valid port number, got 0x4e6f206d.
PYTHONPATH set to '/Users/brobbins/github/spark_fork/python/lib/pyspark.zip:/Users/brobbins/github/spark_fork/python/lib/py4j-0.10.6-src.zip:/Users/brobbins/github/spark_fork/assembly/target/scala-2.11/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar:/Users/brobbins/github/spark_fork/python/lib/py4j-0.10.6-src.zip:/Users/brobbins/github/spark_fork/python/:'
Command to run python module was 'python -m pyspark.daemon'
Check whether you have a sitecustomize.py module that may be printing output to stdout.
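The check described above can be sketched as a standalone helper. It reads the port at the point where it is obtained and reports bad data with the module name and the raw value in hex. `DaemonPortCheck`, `readDaemonPort`, and `streamOf` are hypothetical names. This is an illustration under those assumptions, not the code that was merged into `PythonWorkerFactory`.

The bad value in the traces, 0x4e6f206d (1315905645), is simply the ASCII bytes "No m" read as a big-endian Int. That is consistent with stray text appearing on the daemon's stdout.

```scala
import java.io.{ByteArrayInputStream, DataInputStream, EOFException, IOException}

// Hypothetical helper; not the code that was merged into PythonWorkerFactory.
object DaemonPortCheck {
  // Largest valid TCP port number (0xffff == 65535).
  val MaxPort: Int = 0xffff

  /** Reads a 4-byte big-endian Int from the daemon's stdout and validates it as a port. */
  def readDaemonPort(in: DataInputStream, daemonModule: String): Int = {
    val port =
      try in.readInt()
      catch {
        case _: EOFException =>
          throw new IOException(s"No port number in $daemonModule's stdout")
      }
    // Range check only: arbitrary bytes that happen to land in 1..65535 still pass.
    if (port < 1 || port > MaxPort) {
      throw new IOException(
        s"Bad data in $daemonModule's output. " +
          f"Expected valid port number, got 0x$port%08x. " +
          "Check whether you have a sitecustomize.py module that may be printing output to stdout.")
    }
    port
  }

  /** Convenience for trying the check: wraps raw bytes as if they were the daemon's stdout. */
  def streamOf(bytes: Array[Byte]): DataInputStream =
    new DataInputStream(new ByteArrayInputStream(bytes))
}
```

Feeding it the bytes of "No module named ..." reproduces the 0x4e6f206d value from the traces. An empty stream produces the "No port number" message instead.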


Second part: Why don't we fix this?

That's reasonable. A couple of points:

- The name of the Python daemon module is now configurable, so that the 
module can be wrapped with customizations. This appears to be only in the 
main branch and not yet released, not even in Spark 2.3, so it might be safe 
to change daemon.py (and potentially its existing wrappers) to return the port 
number in a different way.
- I don't have a good feel for how often sitecustomize.py is used, so I'm not 
sure of the relative value of some mild hacking up of this code vs. just 
letting the user know what happened.



---


[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...

2018-01-30 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20434


---


[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

2018-01-30 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20434
  
Thanks! Merged to master/2.3


---


[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20434
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86835/


---


[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20434
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20434
  
**[Test build #86835 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86835/testReport)**
 for PR 20434 at commit 
[`c64bdfa`](https://github.com/apache/spark/commit/c64bdfa919cbb61cef636519673d780a2f2b6923).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20439: [SPARK-23261][PYSPARK][BACKPORT-2.3] Rename Pandas UDFs

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20439
  
**[Test build #86841 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86841/testReport)**
 for PR 20439 at commit 
[`1a70ae1`](https://github.com/apache/spark/commit/1a70ae195d345962fb9bc03a2abf4e3b812ae376).


---


[GitHub] spark issue #20439: [SPARK-23261][PYSPARK][BACKPORT-2.3] Rename Pandas UDFs

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20439
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/399/


---


[GitHub] spark issue #20439: [SPARK-23261][PYSPARK][BACKPORT-2.3] Rename Pandas UDFs

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20439
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20439: [SPARK-23261][PYSPARK][BACKPORT-2.3] Rename Pandas UDFs

2018-01-30 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20439
  
cc @HyukjinKwon @cloud-fan @ueshin @BryanCutler @icexelloss 


---


[GitHub] spark pull request #20439: [SPARK-23261][PYSPARK][BACKPORT-2.3] Rename Panda...

2018-01-30 Thread gatorsmile

GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/20439

[SPARK-23261][PYSPARK][BACKPORT-2.3] Rename Pandas UDFs

This PR is to backport https://github.com/apache/spark/pull/20428 to Spark 
2.3 without adding the changes regarding `GROUPED AGG PANDAS UDF`

---

## What changes were proposed in this pull request?
Rename the public APIs and names of pandas udfs. 

- `PANDAS SCALAR UDF` -> `SCALAR PANDAS UDF`
- `PANDAS GROUP MAP UDF` -> `GROUPED MAP PANDAS UDF`
- `PANDAS GROUP AGG UDF` -> `GROUPED AGG PANDAS UDF`

## How was this patch tested?
The existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark backport2.3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20439.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20439


commit 1a70ae195d345962fb9bc03a2abf4e3b812ae376
Author: gatorsmile 
Date:   2018-01-30T12:55:55Z

[SPARK-23261][PYSPARK] Rename Pandas UDFs

Rename the public APIs and names of pandas udfs.

- `PANDAS SCALAR UDF` -> `SCALAR PANDAS UDF`
- `PANDAS GROUP MAP UDF` -> `GROUPED MAP PANDAS UDF`
- `PANDAS GROUP AGG UDF` -> `GROUPED AGG PANDAS UDF`

The existing tests

Author: gatorsmile 

Closes #20428 from gatorsmile/renamePandasUDFs.




---


[GitHub] spark issue #19802: [SPARK-22594][CORE] Handling spark-submit and master ver...

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19802
  
**[Test build #4086 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4086/testReport)**
 for PR 19802 at commit 
[`4f79632`](https://github.com/apache/spark/commit/4f79632d22b67128a6be8a285f4fc1fec0d5f12f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

2018-01-30 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20434
  
Yes. We need to avoid a performance regression relative to the last release, 
Spark 2.2.


---


[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

2018-01-30 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20434
  
I see. The baseline is 2.2, right?


---


[GitHub] spark issue #20408: [SPARK-23189][Core][Web UI] Reflect stage level blacklis...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20408
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

2018-01-30 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20434
  
This reverts to the original behavior, so we do not introduce anything new 
compared with 2.2.


---


[GitHub] spark issue #20408: [SPARK-23189][Core][Web UI] Reflect stage level blacklis...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20408
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86833/


---


[GitHub] spark issue #20408: [SPARK-23189][Core][Web UI] Reflect stage level blacklis...

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20408
  
**[Test build #86833 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86833/testReport)**
 for PR 20408 at commit 
[`ea47877`](https://github.com/apache/spark/commit/ea478779392429f2e84f762819ed29fa392abae1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

2018-01-30 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20434
  
@gatorsmile .
In the original PR, https://github.com/apache/spark/pull/18810, there was a 
microbenchmark.
Can we have the result on the same benchmark here, too?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...

2018-01-30 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20434#discussion_r164845127
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -660,12 +660,13 @@ object SQLConf {
   val WHOLESTAGE_HUGE_METHOD_LIMIT = buildConf("spark.sql.codegen.hugeMethodLimit")
     .internal()
     .doc("The maximum bytecode size of a single compiled Java function generated by whole-stage " +
-      "codegen. When the compiled function exceeds this threshold, " +
-      "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
-      s"The default value is ${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} and " +
-      "this is a limit in the OpenJDK JVM implementation.")
+      "codegen. When the compiled function exceeds this threshold, the whole-stage codegen is " +
+      "deactivated for this subtree of the current query plan. The default value is 65535, which " +
+      "is the largest bytecode size possible for a valid Java method. When running on HotSpot, " +
+      s"it may be preferable to set the value to ${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} " +
+      "to match HotSpot's implementation.")
     .intConf
-    .createWithDefault(CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT)
+    .createWithDefault(65535)
--- End diff --

cc @mgaido91 .
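The trade-off in the new doc string can be made concrete with a tiny model of the threshold rule. The model assumes the rule is simply "deactivate whole-stage codegen when the largest compiled method exceeds the limit". `HugeMethodLimitSketch` and its members are hypothetical names.

The two limits compared are:

- 65535, the new default in the diff.
- 8000, the value of `CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT`, which matches HotSpot's default `HugeMethodLimit`.

```scala
// Hypothetical names; only the two numeric limits come from the discussion above.
object HugeMethodLimitSketch {
  // Proposed default: the largest bytecode size a valid Java method can have.
  val JvmMaxMethodBytes: Int = 65535
  // HotSpot's huge-method threshold (the value of CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT).
  val HotSpotHugeMethodLimit: Int = 8000

  /** Whole-stage codegen is kept only if the largest compiled method does not exceed the limit. */
  def keepsWholeStageCodegen(maxMethodBytes: Int, limit: Int): Boolean =
    maxMethodBytes <= limit
}
```

Consider a generated method of 10,000 bytes:

- Under the proposed 65535 default, it keeps whole-stage codegen.
- Under the old 8000 default, it falls back.

By default HotSpot does not JIT-compile methods larger than 8000 bytes of bytecode, which is why the doc string suggests 8000 when running on HotSpot.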


---


[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20422
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86831/


---


[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20422
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20422
  
**[Test build #86831 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86831/testReport)**
 for PR 20422 at commit 
[`6196770`](https://github.com/apache/spark/commit/61967706c6f3804a84819f8484abeff5d1d77eea).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-30 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20343
  
Thanks for submitting the PR https://github.com/apache/spark/pull/20433. It 
sounds like there are still some test failures. Will review it after the 2.3 
release. Thanks!


---


[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-30 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20343
  
@maropu Yeah. As long as the queries are different, we should keep both 
versions. This helps others understand that we fully support the TPC-DS queries 
without the changes. Thanks!


---


[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20435
  
**[Test build #86840 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86840/testReport)**
 for PR 20435 at commit 
[`0b6b59e`](https://github.com/apache/spark/commit/0b6b59ea86d00e8128af98891fa5d10934cb65cd).


---


[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20435
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20435
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/398/


---


[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/20435
  
cc @marmbrus 


---


[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20435
  
LGTM to adding the new package of partitioning/distribution. 


---


[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20435
  
**[Test build #86839 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86839/testReport)**
 for PR 20435 at commit 
[`0b6b59e`](https://github.com/apache/spark/commit/0b6b59ea86d00e8128af98891fa5d10934cb65cd).


---


[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread gengliangwang

Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/20435
  
retest this please


---


[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20386
  
@rdblue The target is the 2.3 release. Thanks for your time!


---


[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread jose-torres

Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/20435
  
Streaming part LGTM; I have no particular opinion or context on the 
distribution stuff.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20435
  
cc @zsxwing @marmbrus too


---


[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20435
  
**[Test build #86838 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86838/testReport)**
 for PR 20435 at commit 
[`0b6b59e`](https://github.com/apache/spark/commit/0b6b59ea86d00e8128af98891fa5d10934cb65cd).


---


[GitHub] spark pull request #20408: [SPARK-23189][Core][Web UI] Reflect stage level b...

2018-01-30 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/20408#discussion_r164824439
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
@@ -594,12 +606,24 @@ private[spark] class AppStatusListener(
 
           stage.executorSummaries.values.foreach(update(_, now))
           update(stage, now, last = true)
+
+          val executorIdsForStage = stage.executorSummaries.keySet
+          executorIdsForStage.foreach { executorId =>
+            liveExecutors.get(executorId).foreach { exec =>
+              removeBlackListedStageFrom(exec, event.stageInfo.stageId, now)
+            }
+          }
         }
 
         appSummary = new AppSummary(appSummary.numCompletedJobs, appSummary.numCompletedStages + 1)
         kvstore.write(appSummary)
       }
 
+  private def removeBlackListedStageFrom(exec: LiveExecutor, stageId: Int, now: Long) = {
+    exec.blacklistedInStages -= stageId
+    liveUpdate(exec, now)
--- End diff --

Hmm, actually I just thought of something else. It looks like you're 
calling `liveUpdate` here for *every* executor when the stage finishes. Say 
you have 1000 execs, a very quick stage, and no blacklisting: this is an 
expensive update for no actual change.

So you should at least avoid the `liveUpdate` if `exec.blacklistedInStages` 
hasn't changed at all.  But really, I think that `LiveStage` should maintain a 
set of blacklisted executors, so you avoid calling this entirely for execs 
which aren't blacklisted.
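The suggestion above can be sketched with small stand-in classes. `ExecState`, `StageState`, and `ListenerSketch` are hypothetical; they are not Spark's `LiveExecutor`, `LiveStage`, or `AppStatusListener` APIs, and the "expensive update" is modeled as a counter.

The sketch shows two things:

- The stage keeps its own set of blacklisted executor IDs, so stage completion visits only those executors.
- The write is skipped when `blacklistedInStages` did not actually change.

```scala
import scala.collection.mutable

// Hypothetical stand-in for LiveExecutor; only the field relevant here.
final class ExecState(val id: String) {
  var blacklistedInStages: Set[Int] = Set.empty
}

// Hypothetical stand-in for LiveStage, extended with the suggested set of blacklisted executors.
final class StageState(val stageId: Int) {
  val blacklistedExecutors: mutable.Set[String] = mutable.HashSet.empty[String]
}

final class ListenerSketch {
  val liveExecutors: mutable.Map[String, ExecState] = mutable.HashMap.empty
  var liveUpdates: Int = 0 // counts the (expensive) store writes

  private def liveUpdate(exec: ExecState): Unit = liveUpdates += 1

  def onExecutorBlacklistedForStage(stage: StageState, execId: String): Unit = {
    stage.blacklistedExecutors += execId
    liveExecutors.get(execId).foreach { exec =>
      exec.blacklistedInStages += stage.stageId
      liveUpdate(exec)
    }
  }

  def onStageCompleted(stage: StageState): Unit = {
    // Visit only executors blacklisted in this stage, not every executor that ran in it.
    stage.blacklistedExecutors.foreach { execId =>
      liveExecutors.get(execId).foreach { exec =>
        val before = exec.blacklistedInStages
        exec.blacklistedInStages -= stage.stageId
        // Skip the write when nothing actually changed.
        if (exec.blacklistedInStages != before) liveUpdate(exec)
      }
    }
  }
}
```

Suppose there are 1000 executors and two of them are blacklisted in one stage. Completing that stage then costs two writes rather than one per executor.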


---


[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20435
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20435
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/397/


---


[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20435
  
retest this please


---


[GitHub] spark pull request #20167: [SPARK-16501] [MESOS] Allow providing Mesos princ...

2018-01-30 Thread ArtRand

Github user ArtRand commented on a diff in the pull request:

https://github.com/apache/spark/pull/20167#discussion_r164825718
  
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala ---
@@ -71,40 +74,64 @@ trait MesosSchedulerUtils extends Logging {
       failoverTimeout: Option[Double] = None,
       frameworkId: Option[String] = None): SchedulerDriver = {
     val fwInfoBuilder = FrameworkInfo.newBuilder().setUser(sparkUser).setName(appName)
-    val credBuilder = Credential.newBuilder()
+    fwInfoBuilder.setHostname(Option(conf.getenv("SPARK_PUBLIC_DNS")).getOrElse(
+      conf.get(DRIVER_HOST_ADDRESS)))
     webuiUrl.foreach { url => fwInfoBuilder.setWebuiUrl(url) }
     checkpoint.foreach { checkpoint => fwInfoBuilder.setCheckpoint(checkpoint) }
     failoverTimeout.foreach { timeout => fwInfoBuilder.setFailoverTimeout(timeout) }
     frameworkId.foreach { id =>
       fwInfoBuilder.setId(FrameworkID.newBuilder().setValue(id).build())
     }
-    fwInfoBuilder.setHostname(Option(conf.getenv("SPARK_PUBLIC_DNS")).getOrElse(
-      conf.get(DRIVER_HOST_ADDRESS)))
-    conf.getOption("spark.mesos.principal").foreach { principal =>
-      fwInfoBuilder.setPrincipal(principal)
-      credBuilder.setPrincipal(principal)
-    }
-    conf.getOption("spark.mesos.secret").foreach { secret =>
-      credBuilder.setSecret(secret)
-    }
-    if (credBuilder.hasSecret && !fwInfoBuilder.hasPrincipal) {
-      throw new SparkException(
-        "spark.mesos.principal must be configured when spark.mesos.secret is set")
-    }
+
     conf.getOption("spark.mesos.role").foreach { role =>
       fwInfoBuilder.setRole(role)
     }
     val maxGpus = conf.getInt("spark.mesos.gpus.max", 0)
     if (maxGpus > 0) {
       fwInfoBuilder.addCapabilities(Capability.newBuilder().setType(Capability.Type.GPU_RESOURCES))
     }
+    val credBuilder = buildCredentials(conf, fwInfoBuilder)
     if (credBuilder.hasPrincipal) {
       new MesosSchedulerDriver(
         scheduler, fwInfoBuilder.build(), masterUrl, credBuilder.build())
     } else {
       new MesosSchedulerDriver(scheduler, fwInfoBuilder.build(), masterUrl)
     }
   }
+  
+  def buildCredentials(
+      conf: SparkConf, 
+      fwInfoBuilder: Protos.FrameworkInfo.Builder): Protos.Credential.Builder = {
+    val credBuilder = Credential.newBuilder()
+    conf.getOption("spark.mesos.principal")
+      .orElse(Option(conf.getenv("SPARK_MESOS_PRINCIPAL")))
--- End diff --

Sorry for the delay. I have a use case where I start the Dispatcher in the 
Mesos cluster and then execute `spark-submit` cluster calls from within the 
container. Unfortunately this requires me to unset a few environment variables 
(`MESOS_EXECUTOR_ID MESOS_FRAMEWORK_ID MESOS_SLAVE_ID MESOS_SLAVE_PID 
MESOS_TASK_ID`) because they interfere with `spark-submit` due to this function 
in the rest client. 

If the Dispatcher is started in a mode where it needs these Mesos 
authentication credentials, can we assume that we'll always want to forward 
them this same way? I realize I might be getting into the weeds here, and this 
might be a _me_ problem, but I thought I'd bring it up.
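The precedence in the `buildCredentials` diff can be modeled without Mesos dependencies. In this model, `spark.mesos.principal` wins, and `SPARK_MESOS_PRINCIPAL` from the environment is consulted only as a fallback. That fallback is why whatever environment the Dispatcher or `spark-submit` sees can matter.

The model rests on several assumptions:

- `MesosCreds` and `CredentialSketch` are hypothetical names.
- Configuration and environment are plain `Map`s.
- The secret is read from `spark.mesos.secret` only, because the diff is truncated before any secret handling is shown.
- The "secret without principal" rule is carried over from the removed lines.
- It uses `IllegalArgumentException` in place of `SparkException` to stay dependency-free.

```scala
// Hypothetical, dependency-free model; the real code fills Mesos protobuf builders.
final case class MesosCreds(principal: Option[String], secret: Option[String])

object CredentialSketch {
  /** Principal comes from spark.mesos.principal, falling back to SPARK_MESOS_PRINCIPAL. */
  def resolve(conf: Map[String, String], env: Map[String, String]): MesosCreds = {
    val principal = conf.get("spark.mesos.principal")
      .orElse(env.get("SPARK_MESOS_PRINCIPAL"))
    // Assumption: secret read from config only (not shown in the truncated diff).
    val secret = conf.get("spark.mesos.secret")
    // Same rule as the removed check: a secret without a principal is a configuration error.
    if (secret.isDefined && principal.isEmpty) {
      throw new IllegalArgumentException(
        "spark.mesos.principal must be configured when spark.mesos.secret is set")
    }
    MesosCreds(principal, secret)
  }
}
```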


---


[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs

2018-01-30 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20428
  
Will submit a new PR to 2.3 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs

2018-01-30 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20428
  
Let me manually push it to 2.3. Thanks!


---


[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs

2018-01-30 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20428
  
Yes. We need to backport it to 2.3


---


[GitHub] spark issue #20436: [MINOR] Fix typos in dev/* scripts.

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20436
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86828/


---


[GitHub] spark issue #20436: [MINOR] Fix typos in dev/* scripts.

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20436
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20436: [MINOR] Fix typos in dev/* scripts.

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20436
  
**[Test build #86828 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86828/testReport)**
 for PR 20436 at commit 
[`0a09dcb`](https://github.com/apache/spark/commit/0a09dcb4ac012b8ec8a5833e1e08e0a678b70302).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20435
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86832/


---

[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20435
  
Merged build finished. Test FAILed.


---

[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20435
  
**[Test build #86832 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86832/testReport)**
 for PR 20435 at commit 
[`0b6b59e`](https://github.com/apache/spark/commit/0b6b59ea86d00e8128af98891fa5d10934cb65cd).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20386
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86829/


---

[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20386
  
Merged build finished. Test PASSed.


---

[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20386
  
**[Test build #86829 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86829/testReport)**
 for PR 20386 at commit 
[`540ff06`](https://github.com/apache/spark/commit/540ff0631471a27af23abb7e8c034bad1ba27cbc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-30 Thread jose-torres

Github user jose-torres commented on a diff in the pull request:

https://github.com/apache/spark/pull/20386#discussion_r164810169
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java ---
@@ -63,32 +68,42 @@
   DataWriterFactory createWriterFactory();
 
   /**
-   * Commits this writing job with a list of commit messages. The commit messages are collected from
-   * successful data writers and are produced by {@link DataWriter#commit()}.
+   * Handles a commit message which is collected from a successful data writer.
+   *
+   * Note that, implementations might need to cache all commit messages before calling
+   * {@link #commit()} or {@link #abort()}.
    *
    * If this method fails (by throwing an exception), this writing job is considered to to have been
-   * failed, and {@link #abort(WriterCommitMessage[])} would be called. The state of the destination
-   * is undefined and @{@link #abort(WriterCommitMessage[])} may not be able to deal with it.
+   * failed, and {@link #abort()} would be called. The state of the destination
+   * is undefined and @{@link #abort()} may not be able to deal with it.
+   */
+  void add(WriterCommitMessage message);
+
+  /**
+   * Commits this writing job.
+   * When this method is called, the number of commit messages added by
+   * {@link #add(WriterCommitMessage)} equals to the number of input data partitions.
    *
-   * Note that, one partition may have multiple committed data writers because of speculative tasks.
-   * Spark will pick the first successful one and get its commit message. Implementations should be
-   * aware of this and handle it correctly, e.g., have a coordinator to make sure only one data
-   * writer can commit, or have a way to clean up the data of already-committed writers.
+   * If this method fails (by throwing an exception), this writing job is considered to to have been
+   * failed, and {@link #abort()} would be called. The state of the destination
+   * is undefined and @{@link #abort()} may not be able to deal with it.
    */
-  void commit(WriterCommitMessage[] messages);
+  void commit();
--- End diff --

WDYT of using the same API as FileCommitProtocol, where the engine both 
calls add() for each message and also passes them to commit() at the end? It 
seems like most writers will have to keep an array of the messages they 
received.
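The suggestion above can be sketched in plain Java. As I understand it, Spark's `FileCommitProtocol` pairs a per-task `onTaskCommit` hook with a job-level `commitJob` that receives all task commit messages; the sketch mirrors that shape using hypothetical names (`CommitMessage`, `JobWriter`, `Engine`, `RecordingWriter`) that are not Spark APIs. The engine keeps its own list of messages, so a writer that only needs the final set can ignore `add()` entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical per-task commit message; NOT Spark's WriterCommitMessage.
final class CommitMessage {
  final int partitionId;
  final String payload;

  CommitMessage(int partitionId, String payload) {
    this.partitionId = partitionId;
    this.payload = payload;
  }
}

// Hypothetical job-level writer: add() is an optional per-message hook,
// while commit()/abort() receive every message the engine collected.
interface JobWriter {
  default void add(CommitMessage message) {}

  void commit(List<CommitMessage> messages);

  void abort(List<CommitMessage> messages);
}

// Hypothetical driver loop: notifies the writer for each message AND keeps
// its own copy, so writers are not forced to buffer messages themselves.
final class Engine {
  static void runJob(JobWriter writer, int numPartitions) {
    List<CommitMessage> collected = new ArrayList<>();
    try {
      for (int p = 0; p < numPartitions; p++) {
        CommitMessage msg = new CommitMessage(p, "part-" + p);
        writer.add(msg);
        collected.add(msg);
      }
      writer.commit(Collections.unmodifiableList(collected));
    } catch (RuntimeException e) {
      writer.abort(Collections.unmodifiableList(collected));
      throw e;
    }
  }
}

// A writer that relies only on the list passed to commit().
final class RecordingWriter implements JobWriter {
  final List<String> committed = new ArrayList<>();
  int addCalls = 0;

  @Override
  public void add(CommitMessage message) {
    addCalls++;
  }

  @Override
  public void commit(List<CommitMessage> messages) {
    for (CommitMessage m : messages) {
      committed.add(m.payload);
    }
  }

  @Override
  public void abort(List<CommitMessage> messages) {
    committed.clear();
  }
}
```

With this shape, buffering lives in one place (the engine) rather than being repeated in every writer, which is the duplication the comment points at.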


---

[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread rdblue

Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20386
  
@cloud-fan, is the intent to get this into 2.3.0? If so, I'll make time to 
review it today.


---

[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

2018-01-30 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20438#discussion_r164807808
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
@@ -236,9 +238,29 @@ public MapData getMap(int ordinal) {
   public abstract byte[] getBinary(int rowId);
 
   /**
-   * Returns the ordinal's child column vector.
+   * Returns the calendar interval type value for rowId.
+   *
+   * In Spark, calendar interval type value is basically an integer value representing the number of
+   * months in this interval, and a long value representing the number of microseconds in this
+   * interval. An interval type vector is the same as a struct type vector with 2 fields: `months`
+   * and `microseconds`.
+   *
+   * To support interval type, implementations must implement {@link #getChild(int)} and define 2
+   * child vectors: the first child vector is an int type vector, containing all the month values of
+   * all the interval values in this vector. The second child vector is a long type vector,
+   * containing all the microsecond values of all the interval values in this vector.
+   */
+  public final CalendarInterval getInterval(int rowId) {
+    if (isNullAt(rowId)) return null;
+    final int months = getChild(0).getInt(rowId);
+    final long microseconds = getChild(1).getLong(rowId);
+    return new CalendarInterval(months, microseconds);
+  }
+
+  /**
+   * @return child [[ColumnVector]] at the given ordinal.
    */
-  public abstract ColumnVector getChild(int ordinal);
+  protected abstract ColumnVector getChild(int ordinal);
--- End diff --

Since `ColumnVector` is public, could you add a note to the PR 
description about this kind of visibility change?
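The layout described in the Javadoc above, where an interval vector behaves like a struct vector with an int `months` child and a long `microseconds` child, can be sketched with hypothetical stand-in classes (`Interval`, `SimpleColumnVector`, `IntArrayVector`, `LongArrayVector`, `IntervalVector`; none of these are Spark classes). It also shows what a `protected` `getChild` means in practice: implementations still supply the children, while outside callers read values through `getInterval`. Note that in Java, `protected` also grants access to classes in the same package.

```java
// Hypothetical interval value; NOT Spark's CalendarInterval.
final class Interval {
  final int months;
  final long microseconds;

  Interval(int months, long microseconds) {
    this.months = months;
    this.microseconds = microseconds;
  }
}

// Hypothetical, simplified column vector; NOT Spark's ColumnVector.
abstract class SimpleColumnVector {
  abstract boolean isNullAt(int rowId);

  int getInt(int rowId) {
    throw new UnsupportedOperationException("not an int vector");
  }

  long getLong(int rowId) {
    throw new UnsupportedOperationException("not a long vector");
  }

  // protected: visible to subclasses (and to the same package in Java).
  protected abstract SimpleColumnVector getChild(int ordinal);

  // Child 0 holds months (int), child 1 holds microseconds (long).
  final Interval getInterval(int rowId) {
    if (isNullAt(rowId)) return null;
    final int months = getChild(0).getInt(rowId);
    final long microseconds = getChild(1).getLong(rowId);
    return new Interval(months, microseconds);
  }
}

// Leaf vector backed by an int array; never null.
final class IntArrayVector extends SimpleColumnVector {
  private final int[] values;

  IntArrayVector(int... values) { this.values = values; }

  @Override boolean isNullAt(int rowId) { return false; }

  @Override int getInt(int rowId) { return values[rowId]; }

  @Override protected SimpleColumnVector getChild(int ordinal) {
    throw new UnsupportedOperationException("leaf vector has no children");
  }
}

// Leaf vector backed by a long array; never null.
final class LongArrayVector extends SimpleColumnVector {
  private final long[] values;

  LongArrayVector(long... values) { this.values = values; }

  @Override boolean isNullAt(int rowId) { return false; }

  @Override long getLong(int rowId) { return values[rowId]; }

  @Override protected SimpleColumnVector getChild(int ordinal) {
    throw new UnsupportedOperationException("leaf vector has no children");
  }
}

// Interval vector: a null mask plus two child vectors, like a 2-field struct.
final class IntervalVector extends SimpleColumnVector {
  private final boolean[] nulls;
  private final SimpleColumnVector months;
  private final SimpleColumnVector microseconds;

  IntervalVector(boolean[] nulls, SimpleColumnVector months, SimpleColumnVector microseconds) {
    this.nulls = nulls;
    this.months = months;
    this.microseconds = microseconds;
  }

  @Override boolean isNullAt(int rowId) { return nulls[rowId]; }

  @Override protected SimpleColumnVector getChild(int ordinal) {
    switch (ordinal) {
      case 0: return months;
      case 1: return microseconds;
      default: throw new IllegalArgumentException("no child at ordinal " + ordinal);
    }
  }
}
```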


---

[GitHub] spark pull request #20427: [SPARK-23260][SPARK-23262][SQL] several data sour...

2018-01-30 Thread rdblue

Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/20427#discussion_r164807449
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SessionConfigSupport.java ---
@@ -25,7 +25,7 @@
  * session.
  */
 @InterfaceStability.Evolving
-public interface SessionConfigSupport {
+public interface SessionConfigSupport extends DataSourceV2 {
--- End diff --

Ping me on the new PR. I'm happy to review it (though it is non-binding).


---

[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20295
  
Merged build finished. Test PASSed.


---

[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20295
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/396/


---

[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

2018-01-30 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20438#discussion_r164806777
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
@@ -236,9 +238,29 @@ public MapData getMap(int ordinal) {
   public abstract byte[] getBinary(int rowId);
 
   /**
-   * Returns the ordinal's child column vector.
+   * Returns the calendar interval type value for rowId.
+   *
+   * In Spark, calendar interval type value is basically an integer value representing the number of
+   * months in this interval, and a long value representing the number of microseconds in this
+   * interval. An interval type vector is the same as a struct type vector with 2 fields: `months`
+   * and `microseconds`.
+   *
+   * To support interval type, implementations must implement {@link #getChild(int)} and define 2
+   * child vectors: the first child vector is an int type vector, containing all the month values of
+   * all the interval values in this vector. The second child vector is a long type vector,
+   * containing all the microsecond values of all the interval values in this vector.
+   */
+  public final CalendarInterval getInterval(int rowId) {
+    if (isNullAt(rowId)) return null;
+    final int months = getChild(0).getInt(rowId);
+    final long microseconds = getChild(1).getLong(rowId);
+    return new CalendarInterval(months, microseconds);
+  }
+
+  /**
+   * @return child [[ColumnVector]] at the given ordinal.
    */
-  public abstract ColumnVector getChild(int ordinal);
+  protected abstract ColumnVector getChild(int ordinal);
--- End diff --

Oh, I see. Now, it became `protected`.


---

[GitHub] spark pull request #20427: [SPARK-23260][SPARK-23262][SQL] several data sour...

2018-01-30 Thread rdblue

Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/20427#discussion_r164806676
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SessionConfigSupport.java ---
@@ -25,7 +25,7 @@
  * session.
  */
 @InterfaceStability.Evolving
-public interface SessionConfigSupport {
+public interface SessionConfigSupport extends DataSourceV2 {
--- End diff --

Mixing large migration commits like this one with unrelated changes makes 
it harder to pick or revert changes without unintended side-effects. What 
happens if we realize that this rename was a bad idea? Reverting this commit 
would also revert the constraint that SessionConfigSupport extends 
DataSourceV2. Similarly, if we realize that these mix-ins don't need to extend 
DataSourceV2, then we would have to find and remove them all instead of 
reverting a commit. That might even sound okay, but when you're picking commits 
deliberately to patch branches, you need to make as few changes as possible and 
cherry-pick conflicts make that much harder.

The fact that you're rushing to get commits into 2.3 is even more 
concerning and a reason to be careful, not a reason to relax our standards. 
Please move this to its own PR and fix all of the interfaces at once.
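The constraint in the diff above, `SessionConfigSupport extends DataSourceV2`, follows the usual marker-plus-mix-in pattern. A minimal sketch with hypothetical stand-ins (`DataSourceMarker`, `SessionConfigMixin`, `ExampleSource`, `MixinLookup`; none are Spark types) shows what the `extends` buys: a class cannot implement the mix-in without also being a data source, so code that starts from a data source can probe for the optional capability with `instanceof`.

```java
// Hypothetical marker interface standing in for DataSourceV2.
interface DataSourceMarker {}

// Hypothetical mix-in standing in for SessionConfigSupport. Because it
// extends the marker, every implementation is necessarily a data source.
interface SessionConfigMixin extends DataSourceMarker {
  String keyPrefix();
}

final class ExampleSource implements SessionConfigMixin {
  @Override
  public String keyPrefix() {
    return "example";
  }
}

final class MixinLookup {
  // Starts from any data source and checks for the optional capability.
  static String keyPrefixOrNull(DataSourceMarker source) {
    if (source instanceof SessionConfigMixin) {
      return ((SessionConfigMixin) source).keyPrefix();
    }
    return null;
  }
}
```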


---

[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20295
  
**[Test build #86837 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86837/testReport)**
 for PR 20295 at commit 
[`8f0782c`](https://github.com/apache/spark/commit/8f0782c07f4c6f02610918e6d4edc5907f7d6aaa).


---

[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-30 Thread jose-torres

Github user jose-torres commented on a diff in the pull request:

https://github.com/apache/spark/pull/20386#discussion_r164805751
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/ConsoleWriterSuite.scala ---
@@ -34,9 +33,9 @@ class ConsoleWriterSuite extends StreamTest {
 Console.withOut(captured) {
   val query = input.toDF().writeStream.format("console").start()
   try {
-input.addData(1, 2, 3)
+input.addData(1, 1, 1)
--- End diff --

Makes sense, but can we set the parallelism to 1 instead? I worry that 
making all the elements the same is more likely to disguise a bug.
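The concern about identical inputs can be made concrete with a small, hypothetical example (`SinkCheck` and its methods are illustrative only, not code from `ConsoleWriterSuite`): a sink that always repeats the first row passes an exact-match check on `(1, 1, 1)` but fails on `(1, 2, 3)`. Assuming a single partition makes the output order deterministic, a test can keep distinct values and still compare the output exactly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sinks used to show why identical test inputs can hide bugs.
final class SinkCheck {
  // Buggy: repeats the first row instead of writing each row.
  static List<Integer> buggySink(List<Integer> rows) {
    List<Integer> out = new ArrayList<>();
    for (int i = 0; i < rows.size(); i++) {
      out.add(rows.get(0));
    }
    return out;
  }

  // Correct: writes every row in input order.
  static List<Integer> correctSink(List<Integer> rows) {
    return new ArrayList<>(rows);
  }

  // Exact-match check, which is only safe when output order is deterministic.
  static boolean matches(List<Integer> expected, List<Integer> actual) {
    return expected.equals(actual);
  }
}
```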


---

[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20435
  
cc @jose-torres 


---

[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20386
  
Build finished. Test PASSed.


---

[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20386
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86822/


---

[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20386
  
**[Test build #86822 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86822/testReport)**
 for PR 20386 at commit 
[`86de2f0`](https://github.com/apache/spark/commit/86de2f0e6da1a82ea8bcb9b4b1d7a47e4ec0c7e3).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---

[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

2018-01-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20438#discussion_r164802008
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
@@ -236,9 +238,29 @@ public MapData getMap(int ordinal) {
   public abstract byte[] getBinary(int rowId);
 
   /**
-   * Returns the ordinal's child column vector.
+   * Returns the calendar interval type value for rowId.
+   *
+   * In Spark, calendar interval type value is basically an integer value representing the number of
+   * months in this interval, and a long value representing the number of microseconds in this
+   * interval. An interval type vector is the same as a struct type vector with 2 fields: `months`
+   * and `microseconds`.
+   *
+   * To support interval type, implementations must implement {@link #getChild(int)} and define 2
--- End diff --

It's a little annoying to type `calendar interval type` all the time...


---

[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

2018-01-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20438#discussion_r164801042
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
@@ -235,10 +237,30 @@ public MapData getMap(int ordinal) {
    */
   public abstract byte[] getBinary(int rowId);
 
+  /**
+   * Returns the calendar interval type value for rowId.
+   *
+   * In Spark, calendar interval type value is basically an integer value representing the number of
+   * months in this interval, and a long value representing the number of microseconds in this
+   * interval. A interval type vector is same as a struct type vector with 2 fields: `months` and
+   * `microseconds`.
+   *
+   * To support interval type, implementations must implement {@link #getChild(int)} and define 2
+   * child vectors: the first child vector is a int type vector, containing all the month values of
+   * all the interval values in this vector. The second child vector is a long type vector,
+   * containing all the microsecond values of all the interval values in this vector.
+   */
+  public final CalendarInterval getInterval(int rowId) {
+    if (isNullAt(rowId)) return null;
+    final int months = getChild(0).getInt(rowId);
--- End diff --

It's from the previous code; it probably tries to make the JVM happy and 
run the code faster.


---

[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20438
  
Merged build finished. Test PASSed.


---

[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20438
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86825/


---

[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20438
  
**[Test build #86825 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86825/testReport)**
 for PR 20438 at commit 
[`2f23a1d`](https://github.com/apache/spark/commit/2f23a1d4a6f6968b1c1209b94ca340ca25cc67e1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20295
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86836/


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20295
  
**[Test build #86836 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86836/testReport)**
 for PR 20295 at commit 
[`2399b77`](https://github.com/apache/spark/commit/2399b770551bcc16721af0199971b5b66536707b).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20295
  
Merged build finished. Test FAILed.


---

[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20295
  
**[Test build #86836 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86836/testReport)**
 for PR 20295 at commit 
[`2399b77`](https://github.com/apache/spark/commit/2399b770551bcc16721af0199971b5b66536707b).


---

[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20295
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/395/


---

[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20295
  
Merged build finished. Test PASSed.


---

[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20386
  
Merged build finished. Test PASSed.


---

[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20386
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86826/


---

[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20386
  
**[Test build #86826 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86826/testReport)**
 for PR 20386 at commit 
[`d198671`](https://github.com/apache/spark/commit/d198671aa6794e76f606a364b479b3143bec2c19).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20434
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/394/


---

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20434
  
Merged build finished. Test PASSed.


---

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20434
  
**[Test build #86835 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86835/testReport)**
 for PR 20434 at commit 
[`c64bdfa`](https://github.com/apache/spark/commit/c64bdfa919cbb61cef636519673d780a2f2b6923).


---

[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...

2018-01-30 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20434#discussion_r164791479
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -660,12 +660,10 @@ object SQLConf {
   val WHOLESTAGE_HUGE_METHOD_LIMIT = buildConf("spark.sql.codegen.hugeMethodLimit")
     .internal()
     .doc("The maximum bytecode size of a single compiled Java function generated by whole-stage " +
-      "codegen. When the compiled function exceeds this threshold, " +
-      "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
-      s"The default value is ${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} and " +
-      "this is a limit in the OpenJDK JVM implementation.")
--- End diff --

Did the update
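The removed `.doc(...)` text in the diff above describes the rule behind `spark.sql.codegen.hugeMethodLimit`: when a function compiled from whole-stage codegen exceeds the threshold, whole-stage codegen is deactivated for that subtree of the query plan. A minimal sketch of that comparison follows; `HugeMethodCheck` and `keepWholeStageCodegen` are hypothetical names, not Spark's implementation, and the 8000-byte constant is HotSpot's default `-XX:HugeMethodLimit`, used here only as an example value.

```java
// Hypothetical sketch of the documented rule; NOT Spark's implementation.
final class HugeMethodCheck {
  // HotSpot's default -XX:HugeMethodLimit, in bytes of bytecode. With the
  // default -XX:+DontCompileHugeMethods, larger methods are not JIT-compiled.
  static final int HOTSPOT_HUGE_METHOD_LIMIT = 8000;

  // Keep whole-stage codegen only if the largest generated method does not
  // exceed the configured limit; otherwise fall back for this subtree.
  static boolean keepWholeStageCodegen(int maxMethodBytecodeSize, int hugeMethodLimit) {
    return maxMethodBytecodeSize <= hugeMethodLimit;
  }
}
```

Raising the limit keeps more plans on the whole-stage path, at the risk that the JVM leaves an oversized generated method interpreted.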


---

[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20386
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86823/


---

[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20386
  
Merged build finished. Test PASSed.


---

[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20386
  
**[Test build #86823 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86823/testReport)**
 for PR 20386 at commit 
[`f72c86c`](https://github.com/apache/spark/commit/f72c86ce97ef7004c0a16b6fbe390308feda7759).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

2018-01-30 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20434
  
@kiszk TPC-DS only covers typical data analytics workloads. However, 
Spark SQL is also used for ETL-like workloads. The regression happened in a 
complex pipeline of Structured Streaming workloads. Will do more investigation 
after the 2.3 release.


---

[GitHub] spark pull request #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycode...

2018-01-30 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20432


---

[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20432
  
Merged build finished. Test PASSed.


---

[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20432
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86818/


---

[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....

2018-01-30 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20432
  
Merged to master.


---

[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20432
  
**[Test build #86818 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86818/testReport)**
 for PR 20432 at commit 
[`3fb3d78`](https://github.com/apache/spark/commit/3fb3d785a9b2497b6ec3b9ac9329db776568197c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

[GitHub] spark issue #20410: [SPARK-23234][ML][PYSPARK] Remove setting defaults on Ja...

2018-01-30 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20410
  
Any more comments on this?


---

[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20295
  
Merged build finished. Test FAILed.


---

[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-01-30 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20295
  
**[Test build #86834 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86834/testReport)**
 for PR 20295 at commit 
[`2668251`](https://github.com/apache/spark/commit/266825167f0bf308c0b4213b1ef718a930a47c2b).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20295
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86834/
Test FAILed.


---

[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-01-30 Thread icexelloss

Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/20295
  
Rebased


---
