[GitHub] spark issue #10949: [SPARK-12832][MESOS] mesos scheduler respect agent attri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/10949 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...
Github user bersprockets commented on a diff in the pull request: https://github.com/apache/spark/pull/20424#discussion_r164855511

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -191,7 +191,20 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
      daemon = pb.start()
      val in = new DataInputStream(daemon.getInputStream)
-     daemonPort = in.readInt()
+     try {
+       daemonPort = in.readInt()
+     } catch {
+       case exc: EOFException =>
+         throw new IOException(s"No port number in $daemonModule's stdout")
+     }
+
+     // test that the returned port number is within a valid range.
+     // note: this does not cover the case where the port number
+     // is arbitrary data but is also coincidentally within range
+     if (daemonPort < 1 || daemonPort > 0xffff) {
--- End diff --

Oh, I see. Let me address the two parts of the comment separately.

First part: what's the point of throwing an exception for a bad port number when the original handling did that already?

The original handling was:

    java.lang.IllegalArgumentException: port out of range:1315905645
        at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)

This error occurred in a different function than the one that obtained the port number, so you had to track down the source of the port number. That actually added an extra step to the original debugging of the sitecustomize.py issue.

The proposed handling is:

    java.io.IOException: Bad port number in pyspark.daemon's stdout: 0x4e6f206d
        at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:205)

Here we're not saying "somehow we got a bad port number", but that a particular python module (whose name is displayed, since the name of that module is configurable) has returned bad data.

Also, the check is at the point at which the port number is obtained, not sometime later in another function where the port number is used (and where possibly something else has changed the port number, which is not likely, but you would need to check that). Perhaps the message could be better, e.g.:

    Bad data in pyspark.daemon's output. Expected valid port number, got 0x4e6f206d.
    PYTHONPATH set to '/Users/brobbins/github/spark_fork/python/lib/pyspark.zip:/Users/brobbins/github/spark_fork/python/lib/py4j-0.10.6-src.zip:/Users/brobbins/github/spark_fork/assembly/target/scala-2.11/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar:/Users/brobbins/github/spark_fork/python/lib/py4j-0.10.6-src.zip:/Users/brobbins/github/spark_fork/python/:'
    Command to run python module was 'python -m pyspark.daemon'
    Check whether you have a sitecustomize.py module that may be printing output to stdout.

Second part: why don't we fix this? That's reasonable. A couple of points:

- The name of the python daemon module is now configurable so that the module can be wrapped with customizations. It appears that this is only in the main branch and not even released on Spark 2.3, so it might be safe to change daemon.py (and potentially its existing wrappers) to return the port number in a different way.
- I don't have a good feel for how often sitecustomize.py is used, so I'm not sure of the relative value of some mild hacking up of this code vs. just letting the user know what happened.
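The flow discussed above (read a 4-byte big-endian integer from the daemon's stdout, fail fast with the module's name when the data is missing or out of range) can be sketched in Python. `read_daemon_port` is a hypothetical helper for illustration, not the actual PySpark code:

```python
import io
import struct

def read_daemon_port(stdout, daemon_module="pyspark.daemon"):
    """Read a 4-byte big-endian int from the daemon's stdout and
    validate it as a TCP port (1..0xffff)."""
    data = stdout.read(4)
    if len(data) < 4:
        # analogous to translating EOFException into a descriptive IOException
        raise IOError("No port number in %s's stdout" % daemon_module)
    (port,) = struct.unpack(">i", data)
    if port < 1 or port > 0xffff:
        # arbitrary text such as b"No m" decodes to 0x4e6f206d (1315905645),
        # far outside the valid port range
        raise IOError("Bad port number in %s's stdout: 0x%08x"
                      % (daemon_module, port))
    return port

print(read_daemon_port(io.BytesIO(struct.pack(">i", 5007))))  # → 5007
```

Note that the bytes `b"No m"` really do decode to 1315905645, the exact value in the `port out of range` stack trace quoted above, which is what motivates reporting the value in hex.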
[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20434
[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20434 Thanks! Merged to master/2.3
[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20434 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86835/ Test PASSed.
[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20434 Merged build finished. Test PASSed.
[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20434 **[Test build #86835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86835/testReport)** for PR 20434 at commit [`c64bdfa`](https://github.com/apache/spark/commit/c64bdfa919cbb61cef636519673d780a2f2b6923). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20439: [SPARK-23261][PYSPARK][BACKPORT-2.3] Rename Pandas UDFs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20439 **[Test build #86841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86841/testReport)** for PR 20439 at commit [`1a70ae1`](https://github.com/apache/spark/commit/1a70ae195d345962fb9bc03a2abf4e3b812ae376).
[GitHub] spark issue #20439: [SPARK-23261][PYSPARK][BACKPORT-2.3] Rename Pandas UDFs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20439 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/399/ Test PASSed.
[GitHub] spark issue #20439: [SPARK-23261][PYSPARK][BACKPORT-2.3] Rename Pandas UDFs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20439 Merged build finished. Test PASSed.
[GitHub] spark issue #20439: [SPARK-23261][PYSPARK][BACKPORT-2.3] Rename Pandas UDFs
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20439 cc @HyukjinKwon @cloud-fan @ueshin @BryanCutler @icexelloss
[GitHub] spark pull request #20439: [SPARK-23261][PYSPARK][BACKPORT-2.3] Rename Panda...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/20439

[SPARK-23261][PYSPARK][BACKPORT-2.3] Rename Pandas UDFs

This PR is to backport https://github.com/apache/spark/pull/20428 to Spark 2.3 without adding the changes regarding `GROUPED AGG PANDAS UDF`

## What changes were proposed in this pull request?

Rename the public APIs and names of pandas udfs.

- `PANDAS SCALAR UDF` -> `SCALAR PANDAS UDF`
- `PANDAS GROUP MAP UDF` -> `GROUPED MAP PANDAS UDF`
- `PANDAS GROUP AGG UDF` -> `GROUPED AGG PANDAS UDF`

## How was this patch tested?

The existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark backport2.3

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20439.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20439

commit 1a70ae195d345962fb9bc03a2abf4e3b812ae376
Author: gatorsmile
Date: 2018-01-30T12:55:55Z

[SPARK-23261][PYSPARK] Rename Pandas UDFs

Rename the public APIs and names of pandas udfs.

- `PANDAS SCALAR UDF` -> `SCALAR PANDAS UDF`
- `PANDAS GROUP MAP UDF` -> `GROUPED MAP PANDAS UDF`
- `PANDAS GROUP AGG UDF` -> `GROUPED AGG PANDAS UDF`

The existing tests

Author: gatorsmile

Closes #20428 from gatorsmile/renamePandasUDFs.
[GitHub] spark issue #19802: [SPARK-22594][CORE] Handling spark-submit and master ver...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19802 **[Test build #4086 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4086/testReport)** for PR 19802 at commit [`4f79632`](https://github.com/apache/spark/commit/4f79632d22b67128a6be8a285f4fc1fec0d5f12f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20434 Yes. We need to avoid a performance regression since the last release, Spark 2.2.
[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20434 I see. The baseline is 2.2, right?
[GitHub] spark issue #20408: [SPARK-23189][Core][Web UI] Reflect stage level blacklis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20408 Merged build finished. Test PASSed.
[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20434 This is to revert back to the original behavior. Thus, we do not introduce anything else compared with 2.2.
[GitHub] spark issue #20408: [SPARK-23189][Core][Web UI] Reflect stage level blacklis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20408 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86833/ Test PASSed.
[GitHub] spark issue #20408: [SPARK-23189][Core][Web UI] Reflect stage level blacklis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20408 **[Test build #86833 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86833/testReport)** for PR 20408 at commit [`ea47877`](https://github.com/apache/spark/commit/ea478779392429f2e84f762819ed29fa392abae1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20434 @gatorsmile . In the original PR, https://github.com/apache/spark/pull/18810, there was a microbenchmark. Can we have the result on the same benchmark here, too?
[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20434#discussion_r164845127

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -660,12 +660,13 @@ object SQLConf {
   val WHOLESTAGE_HUGE_METHOD_LIMIT = buildConf("spark.sql.codegen.hugeMethodLimit")
     .internal()
     .doc("The maximum bytecode size of a single compiled Java function generated by whole-stage " +
-      "codegen. When the compiled function exceeds this threshold, " +
-      "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
-      s"The default value is ${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} and " +
-      "this is a limit in the OpenJDK JVM implementation.")
+      "codegen. When the compiled function exceeds this threshold, the whole-stage codegen is " +
+      "deactivated for this subtree of the current query plan. The default value is 65535, which " +
+      "is the largest bytecode size possible for a valid Java method. When running on HotSpot, " +
+      s"it may be preferable to set the value to ${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} " +
+      "to match HotSpot's implementation.")
     .intConf
-    .createWithDefault(CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT)
+    .createWithDefault(65535)
--- End diff --

cc @mgaido91.
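The decision this config controls can be sketched as follows. This is a hypothetical Python sketch, assuming (as the diff suggests) that 65535 is the JVM's hard cap on a method's bytecode size and that `CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT` refers to HotSpot's stricter 8000-byte huge-method threshold; the function and constant names here are illustrative, not Spark's:

```python
# 65535 is the largest bytecode size a valid Java method may have;
# 8000 is HotSpot's "huge method" threshold, above which HotSpot
# refuses to JIT-compile the method (assumed here to be the value of
# CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT mentioned in the diff).
JVM_BYTECODE_LIMIT = 65535
HOTSPOT_HUGE_METHOD_LIMIT = 8000

def use_wholestage_codegen(compiled_size, huge_method_limit=JVM_BYTECODE_LIMIT):
    """Whole-stage codegen stays active only while the compiled function
    is within the configured hugeMethodLimit; otherwise Spark falls back
    for that subtree of the query plan."""
    return compiled_size <= huge_method_limit

print(use_wholestage_codegen(9000))                             # → True
print(use_wholestage_codegen(9000, HOTSPOT_HUGE_METHOD_LIMIT))  # → False
```

With the new 65535 default, a 9000-byte generated method keeps whole-stage codegen (avoiding the 2.2-to-2.3 regression discussed above), while a HotSpot user can opt back into the stricter limit.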
[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20422 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86831/ Test PASSed.
[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20422 Merged build finished. Test PASSed.
[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20422 **[Test build #86831 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86831/testReport)** for PR 20422 at commit [`6196770`](https://github.com/apache/spark/commit/61967706c6f3804a84819f8484abeff5d1d77eea). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20343 Thanks for submitting the PR https://github.com/apache/spark/pull/20433. It sounds like there are still some test failures. Will review it after the 2.3 release. Thanks!
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20343 @maropu Yeah. As long as the queries are different, we should keep both versions. This helps others understand that we fully support TPC-DS queries without changes. Thanks!
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20435 **[Test build #86840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86840/testReport)** for PR 20435 at commit [`0b6b59e`](https://github.com/apache/spark/commit/0b6b59ea86d00e8128af98891fa5d10934cb65cd).
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20435 Merged build finished. Test PASSed.
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20435 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/398/ Test PASSed.
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/20435 cc @marmbrus
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20435 LGTM to adding the new package of partitioning/distribution.
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20435 **[Test build #86839 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86839/testReport)** for PR 20435 at commit [`0b6b59e`](https://github.com/apache/spark/commit/0b6b59ea86d00e8128af98891fa5d10934cb65cd).
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/20435 retest this please
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20386 @rdblue The target is 2.3 release. Thanks for your time!
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20435 Streaming part LGTM; I have no particular opinion or context on the distribution stuff.
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20435 cc @zsxwing @marmbrus too
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20435 **[Test build #86838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86838/testReport)** for PR 20435 at commit [`0b6b59e`](https://github.com/apache/spark/commit/0b6b59ea86d00e8128af98891fa5d10934cb65cd).
[GitHub] spark pull request #20408: [SPARK-23189][Core][Web UI] Reflect stage level b...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20408#discussion_r164824439

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
@@ -594,12 +606,24 @@ private[spark] class AppStatusListener(
       stage.executorSummaries.values.foreach(update(_, now))
       update(stage, now, last = true)
+
+      val executorIdsForStage = stage.executorSummaries.keySet
+      executorIdsForStage.foreach { executorId =>
+        liveExecutors.get(executorId).foreach { exec =>
+          removeBlackListedStageFrom(exec, event.stageInfo.stageId, now)
+        }
+      }
     }
     appSummary = new AppSummary(appSummary.numCompletedJobs, appSummary.numCompletedStages + 1)
     kvstore.write(appSummary)
   }

+  private def removeBlackListedStageFrom(exec: LiveExecutor, stageId: Int, now: Long) = {
+    exec.blacklistedInStages -= stageId
+    liveUpdate(exec, now)
--- End diff --

hmm, actually I just thought of something else. It looks like you're calling `liveUpdate` here for *every* executor when the stage finishes. Say you have 1000 execs, a very quick stage, and no blacklisting: this is an expensive update for no actual change. So you should at least avoid the `liveUpdate` if `exec.blacklistedInStages` hasn't changed at all. But really, I think that `LiveStage` should maintain a set of blacklisted executors, so you avoid calling this entirely for execs which aren't blacklisted.
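The reviewer's suggestion, that the stage itself track which executors were blacklisted for it so stage completion only touches those, can be sketched as follows. This is a hypothetical Python model of the idea, not Spark's actual `AppStatusListener` code:

```python
class LiveExecutor:
    def __init__(self, executor_id):
        self.executor_id = executor_id
        self.blacklisted_in_stages = set()
        self.updates = 0  # stand-in for how often liveUpdate() persists this executor

class LiveStage:
    def __init__(self, stage_id):
        self.stage_id = stage_id
        # ids of executors blacklisted for this stage, maintained as
        # blacklisting events arrive, per the reviewer's suggestion
        self.blacklisted_executors = set()

def on_stage_completed(stage, live_executors):
    # Only executors actually blacklisted in this stage are touched; a
    # 1000-executor stage with no blacklisting performs no per-executor
    # writes at all.
    for executor_id in stage.blacklisted_executors:
        exec_ = live_executors.get(executor_id)
        if exec_ is not None:
            exec_.blacklisted_in_stages.discard(stage.stage_id)
            exec_.updates += 1  # stand-in for liveUpdate(exec, now)

execs = {eid: LiveExecutor(eid) for eid in ("1", "2", "3")}
stage = LiveStage(stage_id=4)
stage.blacklisted_executors.add("2")
execs["2"].blacklisted_in_stages.add(4)
on_stage_completed(stage, execs)
print([e.executor_id for e in execs.values() if e.updates])  # → ['2']
```

Compared with the diff above, which iterates over every executor summary in the stage, this shape makes the no-blacklisting case free.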
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20435 Merged build finished. Test PASSed.
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20435 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/397/ Test PASSed.
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20435 retest this please
[GitHub] spark pull request #20167: [SPARK-16501] [MESOS] Allow providing Mesos princ...
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/20167#discussion_r164825718

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala ---
@@ -71,40 +74,64 @@ trait MesosSchedulerUtils extends Logging {
       failoverTimeout: Option[Double] = None,
       frameworkId: Option[String] = None): SchedulerDriver = {
     val fwInfoBuilder = FrameworkInfo.newBuilder().setUser(sparkUser).setName(appName)
-    val credBuilder = Credential.newBuilder()
+    fwInfoBuilder.setHostname(Option(conf.getenv("SPARK_PUBLIC_DNS")).getOrElse(
+      conf.get(DRIVER_HOST_ADDRESS)))
     webuiUrl.foreach { url => fwInfoBuilder.setWebuiUrl(url) }
     checkpoint.foreach { checkpoint => fwInfoBuilder.setCheckpoint(checkpoint) }
     failoverTimeout.foreach { timeout => fwInfoBuilder.setFailoverTimeout(timeout) }
     frameworkId.foreach { id => fwInfoBuilder.setId(FrameworkID.newBuilder().setValue(id).build()) }
-    fwInfoBuilder.setHostname(Option(conf.getenv("SPARK_PUBLIC_DNS")).getOrElse(
-      conf.get(DRIVER_HOST_ADDRESS)))
-    conf.getOption("spark.mesos.principal").foreach { principal =>
-      fwInfoBuilder.setPrincipal(principal)
-      credBuilder.setPrincipal(principal)
-    }
-    conf.getOption("spark.mesos.secret").foreach { secret =>
-      credBuilder.setSecret(secret)
-    }
-    if (credBuilder.hasSecret && !fwInfoBuilder.hasPrincipal) {
-      throw new SparkException(
-        "spark.mesos.principal must be configured when spark.mesos.secret is set")
-    }
+    conf.getOption("spark.mesos.role").foreach { role => fwInfoBuilder.setRole(role) }
     val maxGpus = conf.getInt("spark.mesos.gpus.max", 0)
     if (maxGpus > 0) {
       fwInfoBuilder.addCapabilities(Capability.newBuilder().setType(Capability.Type.GPU_RESOURCES))
     }
+    val credBuilder = buildCredentials(conf, fwInfoBuilder)
     if (credBuilder.hasPrincipal) {
       new MesosSchedulerDriver(
         scheduler, fwInfoBuilder.build(), masterUrl, credBuilder.build())
     } else {
       new MesosSchedulerDriver(scheduler, fwInfoBuilder.build(), masterUrl)
     }
   }
+
+  def buildCredentials(
+      conf: SparkConf,
+      fwInfoBuilder: Protos.FrameworkInfo.Builder): Protos.Credential.Builder = {
+    val credBuilder = Credential.newBuilder()
+    conf.getOption("spark.mesos.principal")
+      .orElse(Option(conf.getenv("SPARK_MESOS_PRINCIPAL")))
--- End diff --

Sorry for the delay. I have a use case where I start the Dispatcher in the Mesos cluster and then execute `spark-submit` cluster calls from within the container. Unfortunately this requires me to unset a few environment variables (`MESOS_EXECUTOR_ID MESOS_FRAMEWORK_ID MESOS_SLAVE_ID MESOS_SLAVE_PID MESOS_TASK_ID`) because they interfere with `spark-submit` due to this function in the rest client. If the Dispatcher is started in a mode where it needs these Mesos authentication credentials, can we assume that we'll want to always forward them this same way? I realize I might be getting into the weeds here and this might be a _me_ problem. But I thought I'd bring it up.
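The workaround described above, unsetting Mesos task-level environment variables before invoking `spark-submit` from inside a Mesos-launched container, can be sketched in Python. `env_for_spark_submit` is a hypothetical helper, not part of Spark:

```python
import os

# The variables the commenter reports having to unset because they
# interfere with spark-submit's rest client inside a Mesos container.
MESOS_TASK_VARS = ("MESOS_EXECUTOR_ID", "MESOS_FRAMEWORK_ID",
                   "MESOS_SLAVE_ID", "MESOS_SLAVE_PID", "MESOS_TASK_ID")

def env_for_spark_submit(env):
    """Return a copy of `env` with the interfering Mesos task variables removed."""
    return {k: v for k, v in env.items() if k not in MESOS_TASK_VARS}

# e.g. pass the filtered mapping as the child environment for spark-submit
child_env = env_for_spark_submit(dict(os.environ, MESOS_TASK_ID="t-1"))
print("MESOS_TASK_ID" in child_env)  # → False
```

The filtered dict would then be passed as the `env=` argument of the `subprocess` call that launches `spark-submit`, leaving the parent container's environment untouched.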
[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20428 Will submit a new PR to 2.3
[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20428 Let me manually push it to 2.3. Thanks!
[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20428 Yes. We need to backport it to 2.3
[GitHub] spark issue #20436: [MINOR] Fix typos in dev/* scripts.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20436 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86828/ Test PASSed.
[GitHub] spark issue #20436: [MINOR] Fix typos in dev/* scripts.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20436 Merged build finished. Test PASSed.
[GitHub] spark issue #20436: [MINOR] Fix typos in dev/* scripts.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20436 **[Test build #86828 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86828/testReport)** for PR 20436 at commit [`0a09dcb`](https://github.com/apache/spark/commit/0a09dcb4ac012b8ec8a5833e1e08e0a678b70302). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20435 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86832/ Test FAILed.
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20435 Merged build finished. Test FAILed.
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20435 **[Test build #86832 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86832/testReport)** for PR 20435 at commit [`0b6b59e`](https://github.com/apache/spark/commit/0b6b59ea86d00e8128af98891fa5d10934cb65cd). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20386 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86829/ Test PASSed.
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20386 Merged build finished. Test PASSed.
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20386 **[Test build #86829 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86829/testReport)** for PR 20386 at commit [`540ff06`](https://github.com/apache/spark/commit/540ff0631471a27af23abb7e8c034bad1ba27cbc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r164810169 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java --- @@ -63,32 +68,42 @@ DataWriterFactory createWriterFactory(); /** - * Commits this writing job with a list of commit messages. The commit messages are collected from - * successful data writers and are produced by {@link DataWriter#commit()}. + * Handles a commit message which is collected from a successful data writer. + * + * Note that, implementations might need to cache all commit messages before calling + * {@link #commit()} or {@link #abort()}. * * If this method fails (by throwing an exception), this writing job is considered to to have been - * failed, and {@link #abort(WriterCommitMessage[])} would be called. The state of the destination - * is undefined and @{@link #abort(WriterCommitMessage[])} may not be able to deal with it. + * failed, and {@link #abort()} would be called. The state of the destination + * is undefined and @{@link #abort()} may not be able to deal with it. + */ + void add(WriterCommitMessage message); + + /** + * Commits this writing job. + * When this method is called, the number of commit messages added by + * {@link #add(WriterCommitMessage)} equals to the number of input data partitions. * - * Note that, one partition may have multiple committed data writers because of speculative tasks. - * Spark will pick the first successful one and get its commit message. Implementations should be - * aware of this and handle it correctly, e.g., have a coordinator to make sure only one data - * writer can commit, or have a way to clean up the data of already-committed writers. + * If this method fails (by throwing an exception), this writing job is considered to to have been + * failed, and {@link #abort()} would be called. 
The state of the destination + * is undefined and @{@link #abort()} may not be able to deal with it. */ - void commit(WriterCommitMessage[] messages); + void commit(); --- End diff -- WDYT of using the same API as FileCommitProtocol, where the engine both calls add() for each message but also passes them in to commit() at the end? It seems like most writers will have to keep an array of the messages they received.
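To make the trade-off jose-torres describes concrete, here is a purely illustrative sketch of the two commit-API shapes under discussion. All names here (AccumulatingWriter, ArrayCommitWriter, InMemoryDest, and the implementations) are made up for this example and are not Spark's actual API:

```scala
// Shape proposed in this PR: the writer receives messages one at a time via
// add(), and commit() takes no arguments, so each implementation must buffer.
trait AccumulatingWriter[M] {
  def add(message: M): Unit
  def commit(): Unit
}

// FileCommitProtocol-style shape suggested in the comment: the engine collects
// the messages and passes them all to commit() at the end.
trait ArrayCommitWriter[M] {
  def commit(messages: Seq[M]): Unit
}

// A toy destination that just records what was committed.
class InMemoryDest {
  var committed: Seq[String] = Nil
}

// With the add()-based shape, the writer keeps its own buffer of messages.
class BufferingWriter(dest: InMemoryDest) extends AccumulatingWriter[String] {
  private val buf = scala.collection.mutable.ArrayBuffer.empty[String]
  override def add(message: String): Unit = buf += message
  override def commit(): Unit = dest.committed = buf.toSeq
}

// With the array-based shape, no buffering is needed inside the writer.
class DirectWriter(dest: InMemoryDest) extends ArrayCommitWriter[String] {
  override def commit(messages: Seq[String]): Unit = dest.committed = messages
}
```

Both shapes end up with the same committed state; the difference is only where the message array lives, which is the point of the review question.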
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20386 @cloud-fan, is the intent to get this into 2.3.0? If so, I'll make time to review it today.
[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20438#discussion_r164807808 --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java --- @@ -236,9 +238,29 @@ public MapData getMap(int ordinal) { public abstract byte[] getBinary(int rowId); /** - * Returns the ordinal's child column vector. + * Returns the calendar interval type value for rowId. + * + * In Spark, calendar interval type value is basically an integer value representing the number of + * months in this interval, and a long value representing the number of microseconds in this + * interval. An interval type vector is the same as a struct type vector with 2 fields: `months` + * and `microseconds`. + * + * To support interval type, implementations must implement {@link #getChild(int)} and define 2 + * child vectors: the first child vector is an int type vector, containing all the month values of + * all the interval values in this vector. The second child vector is a long type vector, + * containing all the microsecond values of all the interval values in this vector. + */ + public final CalendarInterval getInterval(int rowId) { +if (isNullAt(rowId)) return null; +final int months = getChild(0).getInt(rowId); +final long microseconds = getChild(1).getLong(rowId); +return new CalendarInterval(months, microseconds); + } + + /** + * @return child [[ColumnVector]] at the given ordinal. */ - public abstract ColumnVector getChild(int ordinal); + protected abstract ColumnVector getChild(int ordinal); --- End diff -- Since `ColumnVector` is public, could you add some description in PR description for this kind of visibility change?
[GitHub] spark pull request #20427: [SPARK-23260][SPARK-23262][SQL] several data sour...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20427#discussion_r164807449 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SessionConfigSupport.java --- @@ -25,7 +25,7 @@ * session. */ @InterfaceStability.Evolving -public interface SessionConfigSupport { +public interface SessionConfigSupport extends DataSourceV2 { --- End diff -- Ping me on the new PR. I'm happy to review it (though it is non-binding).
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20295 Merged build finished. Test PASSed.
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20295 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/396/ Test PASSed.
[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20438#discussion_r164806777 --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java --- @@ -236,9 +238,29 @@ public MapData getMap(int ordinal) { public abstract byte[] getBinary(int rowId); /** - * Returns the ordinal's child column vector. + * Returns the calendar interval type value for rowId. + * + * In Spark, calendar interval type value is basically an integer value representing the number of + * months in this interval, and a long value representing the number of microseconds in this + * interval. An interval type vector is the same as a struct type vector with 2 fields: `months` + * and `microseconds`. + * + * To support interval type, implementations must implement {@link #getChild(int)} and define 2 + * child vectors: the first child vector is an int type vector, containing all the month values of + * all the interval values in this vector. The second child vector is a long type vector, + * containing all the microsecond values of all the interval values in this vector. + */ + public final CalendarInterval getInterval(int rowId) { +if (isNullAt(rowId)) return null; +final int months = getChild(0).getInt(rowId); +final long microseconds = getChild(1).getLong(rowId); +return new CalendarInterval(months, microseconds); + } + + /** + * @return child [[ColumnVector]] at the given ordinal. */ - public abstract ColumnVector getChild(int ordinal); + protected abstract ColumnVector getChild(int ordinal); --- End diff -- Oh, I see. Now, it became `protected`.
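The contract the javadoc above describes — an interval vector behaving like a struct vector with an int `months` child and a long `microseconds` child — can be modeled with a small toy sketch. These class names (ToyColumnVector, IntervalVector, etc.) are illustrative stand-ins, not Spark's actual classes:

```scala
final case class CalendarInterval(months: Int, microseconds: Long)

// Minimal stand-in for the ColumnVector contract; each vector is array-backed.
abstract class ToyColumnVector {
  def isNullAt(rowId: Int): Boolean = false
  def getInt(rowId: Int): Int = throw new UnsupportedOperationException
  def getLong(rowId: Int): Long = throw new UnsupportedOperationException
  protected def getChild(ordinal: Int): ToyColumnVector =
    throw new UnsupportedOperationException

  // Same shape as the diff's getInterval: child 0 holds the month values as
  // ints, child 1 holds the microsecond values as longs.
  final def getInterval(rowId: Int): CalendarInterval = {
    if (isNullAt(rowId)) return null
    CalendarInterval(getChild(0).getInt(rowId), getChild(1).getLong(rowId))
  }
}

class IntVector(data: Array[Int]) extends ToyColumnVector {
  override def getInt(rowId: Int): Int = data(rowId)
}

class LongVector(data: Array[Long]) extends ToyColumnVector {
  override def getLong(rowId: Int): Long = data(rowId)
}

// The interval vector is struct-like: two children, months then microseconds.
class IntervalVector(months: Array[Int], micros: Array[Long]) extends ToyColumnVector {
  override protected def getChild(ordinal: Int): ToyColumnVector =
    if (ordinal == 0) new IntVector(months) else new LongVector(micros)
}
```

Note that, as in the diff, `getChild` stays `protected`: callers read intervals only through the `final` `getInterval`, which is what the visibility discussion above is about.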
[GitHub] spark pull request #20427: [SPARK-23260][SPARK-23262][SQL] several data sour...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20427#discussion_r164806676 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SessionConfigSupport.java --- @@ -25,7 +25,7 @@ * session. */ @InterfaceStability.Evolving -public interface SessionConfigSupport { +public interface SessionConfigSupport extends DataSourceV2 { --- End diff -- Mixing large migration commits like this one with unrelated changes makes it harder to pick or revert changes without unintended side-effects. What happens if we realize that this rename was a bad idea? Reverting this commit would also revert the constraint that SessionConfigSupport extends DataSourceV2. Similarly, if we realize that these mix-ins don't need to extend DataSourceV2, then we would have to find and remove them all instead of reverting a commit. That might even sound okay, but when you're picking commits deliberately to patch branches, you need to make as few changes as possible and cherry-pick conflicts make that much harder. The fact that you're rushing to get commits into 2.3 is even more concerning and reason to be careful, not a reason to relax our standards. Please move this to its own PR and fix all of the interfaces at once.
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20295 **[Test build #86837 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86837/testReport)** for PR 20295 at commit [`8f0782c`](https://github.com/apache/spark/commit/8f0782c07f4c6f02610918e6d4edc5907f7d6aaa).
[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r164805751 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/ConsoleWriterSuite.scala --- @@ -34,9 +33,9 @@ class ConsoleWriterSuite extends StreamTest { Console.withOut(captured) { val query = input.toDF().writeStream.format("console").start() try { -input.addData(1, 2, 3) +input.addData(1, 1, 1) --- End diff -- Makes sense, but can we set the parallelism to 1 instead? I worry that making all the elements the same is more likely to disguise a bug.
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20435 cc @jose-torres
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20386 Build finished. Test PASSed.
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20386 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86822/ Test PASSed.
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20386 **[Test build #86822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86822/testReport)** for PR 20386 at commit [`86de2f0`](https://github.com/apache/spark/commit/86de2f0e6da1a82ea8bcb9b4b1d7a47e4ec0c7e3). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20438#discussion_r164802008 --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java --- @@ -236,9 +238,29 @@ public MapData getMap(int ordinal) { public abstract byte[] getBinary(int rowId); /** - * Returns the ordinal's child column vector. + * Returns the calendar interval type value for rowId. + * + * In Spark, calendar interval type value is basically an integer value representing the number of + * months in this interval, and a long value representing the number of microseconds in this + * interval. An interval type vector is the same as a struct type vector with 2 fields: `months` + * and `microseconds`. + * + * To support interval type, implementations must implement {@link #getChild(int)} and define 2 --- End diff -- It's a little annoying to type `calendar interval type` all the time...
[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20438#discussion_r164801042 --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java --- @@ -235,10 +237,30 @@ public MapData getMap(int ordinal) { */ public abstract byte[] getBinary(int rowId); + /** + * Returns the calendar interval type value for rowId. + * + * In Spark, calendar interval type value is basically an integer value representing the number of + * months in this interval, and a long value representing the number of microseconds in this + * interval. A interval type vector is same as a struct type vector with 2 fields: `months` and + * `microseconds`. + * + * To support interval type, implementations must implement {@link #getChild(int)} and define 2 + * child vectors: the first child vector is a int type vector, containing all the month values of + * all the interval values in this vector. The second child vector is a long type vector, + * containing all the microsecond values of all the interval values in this vector. + */ + public final CalendarInterval getInterval(int rowId) { +if (isNullAt(rowId)) return null; +final int months = getChild(0).getInt(rowId); --- End diff -- It's from the previous code, probably it tries to make the JVM happy and run the code faster.
[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20438 Merged build finished. Test PASSed.
[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20438 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86825/ Test PASSed.
[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20438 **[Test build #86825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86825/testReport)** for PR 20438 at commit [`2f23a1d`](https://github.com/apache/spark/commit/2f23a1d4a6f6968b1c1209b94ca340ca25cc67e1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20295 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86836/ Test FAILed.
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20295 **[Test build #86836 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86836/testReport)** for PR 20295 at commit [`2399b77`](https://github.com/apache/spark/commit/2399b770551bcc16721af0199971b5b66536707b). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20295 Merged build finished. Test FAILed.
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20295 **[Test build #86836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86836/testReport)** for PR 20295 at commit [`2399b77`](https://github.com/apache/spark/commit/2399b770551bcc16721af0199971b5b66536707b).
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20295 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/395/ Test PASSed.
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20295 Merged build finished. Test PASSed.
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20386 Merged build finished. Test PASSed.
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20386 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86826/ Test PASSed.
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20386 **[Test build #86826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86826/testReport)** for PR 20386 at commit [`d198671`](https://github.com/apache/spark/commit/d198671aa6794e76f606a364b479b3143bec2c19). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20434 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/394/ Test PASSed.
[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20434 Merged build finished. Test PASSed.
[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20434 **[Test build #86835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86835/testReport)** for PR 20434 at commit [`c64bdfa`](https://github.com/apache/spark/commit/c64bdfa919cbb61cef636519673d780a2f2b6923).
[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20434#discussion_r164791479 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -660,12 +660,10 @@ object SQLConf { val WHOLESTAGE_HUGE_METHOD_LIMIT = buildConf("spark.sql.codegen.hugeMethodLimit") .internal() .doc("The maximum bytecode size of a single compiled Java function generated by whole-stage " + - "codegen. When the compiled function exceeds this threshold, " + - "the whole-stage codegen is deactivated for this subtree of the current query plan. " + - s"The default value is ${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} and " + - "this is a limit in the OpenJDK JVM implementation.") --- End diff -- Did the update
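The removed doc text describes a simple fallback rule: when a compiled function's bytecode size exceeds `spark.sql.codegen.hugeMethodLimit`, whole-stage codegen is deactivated for that plan subtree. A toy restatement of just that predicate (the function name is made up for illustration; the 8000 value is OpenJDK's huge-method threshold that the old doc string references via `CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT`):

```scala
// OpenJDK refuses to JIT-compile methods whose bytecode exceeds this size,
// which is why it was chosen as the original default limit.
val OpenJdkHugeMethodLimit = 8000

// Whole-stage codegen is kept only while the generated function's bytecode
// stays within the configured limit; otherwise that subtree falls back.
def keepWholeStageCodegen(compiledByteCodeSize: Int, hugeMethodLimit: Int): Boolean =
  compiledByteCodeSize <= hugeMethodLimit
```

Raising the limit, as this PR does, therefore widens the range of generated functions that still run through whole-stage codegen instead of falling back to the interpreted path.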
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20386 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86823/ Test PASSed.
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20386 Merged build finished. Test PASSed.
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20386 **[Test build #86823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86823/testReport)** for PR 20386 at commit [`f72c86c`](https://github.com/apache/spark/commit/f72c86ce97ef7004c0a16b6fbe390308feda7759). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20434 @kiszk TPC-DS just represents typical data analytics workloads. However, Spark SQL is also being used for ETL-like workloads. The regression happened in a complex pipeline of structured streaming workloads. Will do more investigation after the 2.3 release.
[GitHub] spark pull request #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycode...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20432
[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20432 Merged build finished. Test PASSed.
[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20432 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86818/ Test PASSed.
[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20432 Merged to master.
[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20432 **[Test build #86818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86818/testReport)** for PR 20432 at commit [`3fb3d78`](https://github.com/apache/spark/commit/3fb3d785a9b2497b6ec3b9ac9329db776568197c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20410: [SPARK-23234][ML][PYSPARK] Remove setting defaults on Ja...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20410 any more comments on this?
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20295 Merged build finished. Test FAILed.
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20295 **[Test build #86834 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86834/testReport)** for PR 20295 at commit [`2668251`](https://github.com/apache/spark/commit/266825167f0bf308c0b4213b1ef718a930a47c2b). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20295 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86834/ Test FAILed.
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/20295 Rebased