[GitHub] spark issue #22464: Revert [SPARK-19355][SPARK-25352]
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22464 **[Test build #96240 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96240/testReport)** for PR 22464 at commit [`8ee721c`](https://github.com/apache/spark/commit/8ee721c2923ba125e8d00610a7d9d489010022de). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait BaseLimitExec extends UnaryExecNode with CodegenSupport ` * `case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec ` * `case class GlobalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22469 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22419: [SPARK-23906][SQL] Add built-in UDF TRUNCATE(number)
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22419 **[Test build #96242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96242/testReport)** for PR 22419 at commit [`c715694`](https://github.com/apache/spark/commit/c7156943a2a32ba57e67aa6d8fa7035a09847e07). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user szyszy commented on the issue: https://github.com/apache/spark/pull/20761 @srowen: Once again, thanks for the review. I fixed most of what you suggested, if not then I made a comment. Please check the updated code! Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22465: [SPARK-25457][SQL] IntegralDivide returns data type of t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22465 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22468 **[Test build #96262 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96262/testReport)** for PR 22468 at commit [`bc5f144`](https://github.com/apache/spark/commit/bc5f1445ff6ee45456020d6f0afdd91d14c844af). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22173 **[Test build #96264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96264/testReport)** for PR 22173 at commit [`e7b47e9`](https://github.com/apache/spark/commit/e7b47e9c37e42e8de251f9f91d9f85428ea7df73). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22465: [SPARK-25457][SQL] IntegralDivide returns data type of t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22465 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3246/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22467 **[Test build #96248 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96248/testReport)** for PR 22467 at commit [`8c67e0b`](https://github.com/apache/spark/commit/8c67e0b25746444c583ea4a8ece0f5aaa88afd25). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class HiveParquetMetastoreSuite extends ParquetPartitioningTest ` * `class HiveParquetSourceSuite extends ParquetPartitioningTest ` * `case class ParquetData(intField: Int, stringField: String)` * `case class ParquetDataWithKey(p: Int, intField: Int, stringField: String)` * `case class StructContainer(intStructField: Int, stringStructField: String)` * `case class ParquetDataWithComplexTypes(` * `case class ParquetDataWithKeyAndComplexTypes(` * `abstract class ParquetPartitioningTest extends QueryTest with SQLTestUtils with TestHiveSingleton ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22402 **[Test build #96244 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96244/testReport)** for PR 22402 at commit [`0c661a0`](https://github.com/apache/spark/commit/0c661a08e74fea90b025ad21fb9da6113ef70d4c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22465: [SPARK-25457][SQL] IntegralDivide returns data type of t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22465 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96243/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22173: [SPARK-24335] Spark external shuffle server impro...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/22173#discussion_r218807868 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/ChunkFetchRequestHandler.java --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.server; + +import java.net.SocketAddress; + +import com.google.common.base.Throwables; +import io.netty.channel.Channel; +import io.netty.channel.ChannelFuture; +import io.netty.channel.ChannelFutureListener; +import io.netty.channel.ChannelHandlerContext; +import io.netty.channel.SimpleChannelInboundHandler; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.client.TransportClient; +import org.apache.spark.network.protocol.ChunkFetchFailure; +import org.apache.spark.network.protocol.ChunkFetchRequest; +import org.apache.spark.network.protocol.ChunkFetchSuccess; +import org.apache.spark.network.protocol.Encodable; + +import static org.apache.spark.network.util.NettyUtils.*; + + +/** + * A dedicated ChannelHandler for processing ChunkFetchRequest messages. When sending response + * of ChunkFetchRequest messages to the clients, the thread performing the I/O on the underlying + * channel could potentially be blocked due to disk contentions. If several hundreds of clients + * send ChunkFetchRequest to the server at the same time, it could potentially occupying all + * threads from TransportServer's default EventLoopGroup for waiting for disk reads before it + * can send the block data back to the client as part of the ChunkFetchSuccess messages. As a + * result, it would leave no threads left to process other RPC messages, which takes much less + * time to process, and could lead to client timing out on either performing SASL authentication, + * registering executors, or waiting for response for an OpenBlocks messages. + */ +public class ChunkFetchRequestHandler extends SimpleChannelInboundHandler { + private static final Logger logger = LoggerFactory.getLogger(ChunkFetchRequestHandler.class); + + private final TransportClient client; + private final StreamManager streamManager; + /** The max number of chunks being transferred and not finished yet. */ + private final long maxChunksBeingTransferred; + + public ChunkFetchRequestHandler( + TransportClient client, + StreamManager streamManager, + Long maxChunksBeingTransferred) { +this.client = client; +this.streamManager = streamManager; +this.maxChunksBeingTransferred = maxChunksBeingTransferred; + } + + @Override + public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) throws Exception { +logger.warn("Exception in connection from " + getRemoteAddress(ctx.channel()), cause); +ctx.close(); + } + + @Override + protected void channelRead0(ChannelHandlerContext ctx, final ChunkFetchRequest msg) +throws Exception { --- End diff -- indent this 2 more spaces --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22173 **[Test build #96256 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96256/testReport)** for PR 22173 at commit [`ec912d6`](https://github.com/apache/spark/commit/ec912d6ca4e6911de218dc48ea05eecf9f0ff23e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21451: [SPARK-24296][CORE] Replicate large blocks as a stream.
Github user squito commented on the issue: https://github.com/apache/spark/pull/21451 still looking -- will put comments on the jira so its more visible --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22464: Revert [SPARK-19355][SPARK-25352]
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22464 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96240/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22464: Revert [SPARK-19355][SPARK-25352]
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22464 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note ...
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/22469 [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN subqueries behavior ## What changes were proposed in this pull request? The PR updates the migration guide in order to explain the changes introduced in the behavior of the IN operator with subqueries, in particular, the improved handling of struct attributes in these situations. ## How was this patch tested? NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/mgaido91/spark SPARK-24341_followup Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22469.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22469 commit fed3911554ded1fc3bf9211f489f0093ea1578f1 Author: Marco Gaido Date: 2018-09-19T14:00:18Z [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN subqueries behavior --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22469 cc @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22469 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3243/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22469 **[Test build #96258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96258/testReport)** for PR 22469 at commit [`fed3911`](https://github.com/apache/spark/commit/fed3911554ded1fc3bf9211f489f0093ea1578f1). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22419: [SPARK-23906][SQL] Add built-in UDF TRUNCATE(number)
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22419 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22419: [SPARK-23906][SQL] Add built-in UDF TRUNCATE(number)
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22419 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96242/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22469 **[Test build #96258 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96258/testReport)** for PR 22469 at commit [`fed3911`](https://github.com/apache/spark/commit/fed3911554ded1fc3bf9211f489f0093ea1578f1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22305 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3244/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22305 **[Test build #96260 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96260/testReport)** for PR 22305 at commit [`bb05ee0`](https://github.com/apache/spark/commit/bb05ee036e63c629fb8fb6225e53cd9340019397). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22469 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96258/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22173 **[Test build #96259 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96259/testReport)** for PR 22173 at commit [`573033c`](https://github.com/apache/spark/commit/573033c5b42abf9220b6bf656b4c2f04ea615ab7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22469 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22468 **[Test build #96261 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96261/testReport)** for PR 22468 at commit [`0ee6a00`](https://github.com/apache/spark/commit/0ee6a00082cecac9e8163a1a32215318f53325a1). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22465: [SPARK-25457][SQL] IntegralDivide returns data type of t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22465 **[Test build #96263 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96263/testReport)** for PR 22465 at commit [`3f045c0`](https://github.com/apache/spark/commit/3f045c0512efe968aaf40861b5054af5da254ce3). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22402 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22402 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96244/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22465: [SPARK-25457][SQL] IntegralDivide returns data type of t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22465 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22325: [SPARK-25318]. Add exception handling when wrappi...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/22325#discussion_r218857081 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -444,36 +444,34 @@ final class ShuffleBlockFetcherIterator( throwFetchFailedException(blockId, address, e) } - input = streamWrapper(blockId, in) - // Only copy the stream if it's wrapped by compression or encryption, also the size of - // block is small (the decompressed block is smaller than maxBytesInFlight) - if (detectCorrupt && !input.eq(in) && size < maxBytesInFlight / 3) { -val originalInput = input -val out = new ChunkedByteBufferOutputStream(64 * 1024, ByteBuffer.allocate) -try { + try { +input = streamWrapper(blockId, in) +// Only copy the stream if it's wrapped by compression or encryption, also the size of +// block is small (the decompressed block is smaller than maxBytesInFlight) +if (detectCorrupt && !input.eq(in) && size < maxBytesInFlight / 3) { + val out = new ChunkedByteBufferOutputStream(64 * 1024, ByteBuffer.allocate) // Decompress the whole block at once to detect any corruption, which could increase // the memory usage tne potential increase the chance of OOM. // TODO: manage the memory used here, and spill it into disk in case of OOM. Utils.copyStream(input, out) out.close() input = out.toChunkedByteBuffer.toInputStream(dispose = true) --- End diff -- We create a new `input` here, so the original input shall be closed to avoid memory leak. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22463: remove annotation @Experimental
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/22463 Where is the discussion on these utility methods no longer being Experimental? I'm not saying that they are not stable, but the Kafka 0.10 API in general being considered to be stable doesn't preclude some Kafka-related methods from still being other than stable, and promoting API to stable (i.e. cannot be changed without a major release) is a pretty big deal. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22462 **[Test build #96247 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96247/testReport)** for PR 22462 at commit [`d23c0ed`](https://github.com/apache/spark/commit/d23c0ede4a463e671adcdd3f88ef2457c82ab8da). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22462 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96247/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22462 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22460: DO NOT MERGE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22460 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22460: DO NOT MERGE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22460 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3250/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22460: DO NOT MERGE
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22460 **[Test build #96270 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96270/testReport)** for PR 22460 at commit [`89dcafe`](https://github.com/apache/spark/commit/89dcafe979a150f8d722fad72a575b67152ccb58). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r218865220 --- Diff: core/src/test/java/org/apache/spark/ExecutorPluginSuite.java --- @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark; + +import org.apache.spark.api.java.JavaSparkContext; + +import org.junit.After; +import org.junit.Before; +import org.junit.Test; + +import static org.junit.Assert.*; + +public class ExecutorPluginSuite { + private static final String EXECUTOR_PLUGIN_CONF_NAME = "spark.executor.plugins"; + private static final String testBadPluginName = TestBadShutdownPlugin.class.getName(); + private static final String testPluginName = TestExecutorPlugin.class.getName(); + + // Static value modified by testing plugin to ensure plugin loaded correctly. + public static int numSuccessfulPlugins = 0; + + // Static value modified by testing plugin to verify plugins shut down properly. + public static int numSuccessfulTerminations = 0; + + private JavaSparkContext sc; + + @Before + public void setUp() { +sc = null; +numSuccessfulPlugins = 0; +numSuccessfulTerminations = 0; + } + + @After + public void tearDown() { +if (sc != null) { + sc.stop(); + sc = null; +} + } + + private SparkConf initializeSparkConf(String pluginNames) { +return new SparkConf() +.setMaster("local") +.setAppName("test") +.set(EXECUTOR_PLUGIN_CONF_NAME, pluginNames); + } + + @Test + public void testPluginClassDoesNotExist() { +SparkConf conf = initializeSparkConf("nonexistant.plugin"); +try { + sc = new JavaSparkContext(conf); + fail("No exception thrown for nonexistant plugin"); +} catch (Exception e) { + // We cannot catch ClassNotFoundException directly because Java doesn't think it'll be thrown + assertTrue(e.toString().startsWith("java.lang.ClassNotFoundException")); +} + } + + @Test + public void testAddPlugin() throws InterruptedException { +// Load the sample TestExecutorPlugin, which will change the value of numSuccessfulPlugins +SparkConf conf = initializeSparkConf(testPluginName); +sc = new JavaSparkContext(conf); +assertEquals(1, numSuccessfulPlugins); +sc.stop(); +sc = null; +assertEquals(1, numSuccessfulTerminations); + } + + @Test + public void testAddMultiplePlugins() throws InterruptedException { --- End diff -- super nit: shall we test whether we can load multiple different plugins? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r218861435 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -136,6 +136,26 @@ private[spark] class Executor( // for fetching remote cached RDD blocks, so need to make sure it uses the right classloader too. env.serializerManager.setDefaultClassLoader(replClassLoader) + private val executorPlugins: Seq[ExecutorPlugin] = { +val pluginNames = conf.get(EXECUTOR_PLUGINS) +if (pluginNames.nonEmpty) { + logDebug(s"Initializing the following plugins: ${pluginNames.mkString(", ")}") + + // Plugins need to load using a class loader that includes the executor's user classpath + val pluginList: Seq[ExecutorPlugin] = +Utils.withContextClassLoader(replClassLoader) { + val plugins = Utils.loadExtensions(classOf[ExecutorPlugin], pluginNames, conf) + plugins.foreach(_.init()) --- End diff -- nit: Might be good to log whether each `plugin.init()` succeeded. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22462 **[Test build #96249 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96249/testReport)** for PR 22462 at commit [`bc486d7`](https://github.com/apache/spark/commit/bc486d7e31e282a02c75961c040c2e00710f9a49). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22462 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96249/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22462 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite IllegalA...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22461 **[Test build #96245 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96245/testReport)** for PR 22461 at commit [`fb7b9d2`](https://github.com/apache/spark/commit/fb7b9d22812f789b970f8dae8a9cd3763c0eccff). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite IllegalA...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22461 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite IllegalA...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22461 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96245/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22325: [SPARK-25318]. Add exception handling when wrappi...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22325#discussion_r218872037 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -444,36 +444,34 @@ final class ShuffleBlockFetcherIterator( throwFetchFailedException(blockId, address, e) } - input = streamWrapper(blockId, in) - // Only copy the stream if it's wrapped by compression or encryption, also the size of - // block is small (the decompressed block is smaller than maxBytesInFlight) - if (detectCorrupt && !input.eq(in) && size < maxBytesInFlight / 3) { -val originalInput = input -val out = new ChunkedByteBufferOutputStream(64 * 1024, ByteBuffer.allocate) -try { + try { +input = streamWrapper(blockId, in) +// Only copy the stream if it's wrapped by compression or encryption, also the size of +// block is small (the decompressed block is smaller than maxBytesInFlight) +if (detectCorrupt && !input.eq(in) && size < maxBytesInFlight / 3) { + val out = new ChunkedByteBufferOutputStream(64 * 1024, ByteBuffer.allocate) // Decompress the whole block at once to detect any corruption, which could increase // the memory usage tne potential increase the chance of OOM. // TODO: manage the memory used here, and spill it into disk in case of OOM. Utils.copyStream(input, out) out.close() input = out.toChunkedByteBuffer.toInputStream(dispose = true) --- End diff -- @jiangxb1987 Thanks for the comment. Was that the purpose of "originalInput" val in the code before my change? That was closing in finally part though not before create an new input --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22413: [SPARK-25425][SQL] Extra options should override session...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22413 +1, thanks for fixing this, @dongjoon-hyun! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22325: [SPARK-25318]. Add exception handling when wrappi...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/22325#discussion_r218873184 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -444,36 +444,34 @@ final class ShuffleBlockFetcherIterator( throwFetchFailedException(blockId, address, e) } - input = streamWrapper(blockId, in) - // Only copy the stream if it's wrapped by compression or encryption, also the size of - // block is small (the decompressed block is smaller than maxBytesInFlight) - if (detectCorrupt && !input.eq(in) && size < maxBytesInFlight / 3) { -val originalInput = input -val out = new ChunkedByteBufferOutputStream(64 * 1024, ByteBuffer.allocate) -try { + try { +input = streamWrapper(blockId, in) +// Only copy the stream if it's wrapped by compression or encryption, also the size of +// block is small (the decompressed block is smaller than maxBytesInFlight) +if (detectCorrupt && !input.eq(in) && size < maxBytesInFlight / 3) { + val out = new ChunkedByteBufferOutputStream(64 * 1024, ByteBuffer.allocate) // Decompress the whole block at once to detect any corruption, which could increase // the memory usage tne potential increase the chance of OOM. // TODO: manage the memory used here, and spill it into disk in case of OOM. Utils.copyStream(input, out) out.close() input = out.toChunkedByteBuffer.toInputStream(dispose = true) --- End diff -- I'm not the original author of that, but I think so. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22450 I feel there are more places we need to fix for negative scale. I couldn't find any design doc for negative scale in Spark and I believe we supported it by accident. That said, fixing division is just fixing the specific case the user reported, which is not ideal. We should either officially support negative scale and fix all the cases, or officially forbid negative scale. However, neither of them can be made into a bug fix for branch 2.3 and 2.4. Instead, I'm proposing a different fix: un-officially forbids negative scale. Users can still create a decimal value with negative scale, but Spark itself should avoid generating such values. See https://github.com/apache/spark/pull/22470 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22470: [SPARK-25454][SQL] should not generate negative s...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/22470 [SPARK-25454][SQL] should not generate negative scale as possible as we can ## What changes were proposed in this pull request? An alternative fix of https://github.com/apache/spark/pull/22450 The issue is, Spark SQL accepts negative decimal scale by accident (I believe so), and have problems to deal with negative scale for some cases (e.g. decimal division, [decimal rounding](https://community.oracle.com/thread/966388)). The best solution is to officially support negative scale and make the behaviors following SQL standard, but this is hard to do and will take a long time. Or we can officially forbid negative scale, which is a breaking change and we can't do it as a bug fix. This PR proposes to avoid generating negative scale as possible as we can, to work around this issue for common cases. ## How was this patch tested? new tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark decimal Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22470.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22470 commit 3c05636b879b678cfcf72dfb515ea691317e470d Author: Wenchen Fan Date: 2018-09-19T15:36:49Z should not generate negative scale as possible as we can --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22470: [SPARK-25454][SQL] should not generate negative scale as...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22470 **[Test build #96271 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96271/testReport)** for PR 22470 at commit [`3c05636`](https://github.com/apache/spark/commit/3c05636b879b678cfcf72dfb515ea691317e470d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22470: [SPARK-25454][SQL] should not generate negative scale as...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22470 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22470: [SPARK-25454][SQL] should not generate negative scale as...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22470 cc @mgaido91 @hvanhovell @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22470: [SPARK-25454][SQL] should not generate negative scale as...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22470 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3251/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22173: [SPARK-24335] Spark external shuffle server impro...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/22173#discussion_r218874415 --- Diff: common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java --- @@ -281,4 +284,44 @@ public Properties cryptoConf() { public long maxChunksBeingTransferred() { return conf.getLong("spark.shuffle.maxChunksBeingTransferred", Long.MAX_VALUE); } + + /** + * Check if it is a shuffle client + * and avoid creating unnecessary event loops + * in the TransportChannelHandler + */ + public boolean shuffleClient() { --- End diff -- I think since this is so specialized and really only used from one place (the external shuffle service client) perhaps putting it here isn't that useful. I think if we add a new constructor to the TransportContext with that optional parameter it might make more sense --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22402 thanks, merging to master/2.4! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22469#discussion_r218877589 --- Diff: docs/sql-programming-guide.md --- @@ -1879,6 +1879,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see ## Upgrading From Spark SQL 2.3 to 2.4 + - Since Spark 2.4, when there is a struct field in front of the IN operator, the inner query must contain a struct field as well. In previous versions, instead, the fields of the struct were compared to the output of the inner query. Eg. if `a` is a `struct(a string, b int)`, in Spark 2.4 `a in (select (1 as a, 'a' as b) from range(1))` is a valid query, while `a in (select 1, 'a' from range(1))` is not. In previous version it was the opposite. --- End diff -- `IN operator` => `IN operator before a subquery` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22469 LGTM, thanks for adding it! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22460: DO NOT MERGE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22460 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3276/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22138 retest this, please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22460: DO NOT MERGE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22460 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22460: DO NOT MERGE
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22460 **[Test build #96316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96316/testReport)** for PR 22460 at commit [`aa0505f`](https://github.com/apache/spark/commit/aa0505fe57d257f49843349e0656a55717f1dac1). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22358 **[Test build #96314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96314/testReport)** for PR 22358 at commit [`0e5d0bc`](https://github.com/apache/spark/commit/0e5d0bc84c53356a28dce27b7acbcbab3ea7e106). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22358 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96314/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22358 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22173: [SPARK-24355] Spark external shuffle server improvement ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22173 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22173: [SPARK-24355] Spark external shuffle server improvement ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22173 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96293/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22173: [SPARK-24355] Spark external shuffle server improvement ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22173 **[Test build #96293 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96293/testReport)** for PR 22173 at commit [`40cfbed`](https://github.com/apache/spark/commit/40cfbed70bd51e30ac451cb2204f34c7105fa15f). * This patch **fails from timeout after a configured wait of `400m`**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22462#discussion_r219032068 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/sources/StreamingDataSourceV2Suite.scala --- @@ -143,15 +185,18 @@ class StreamingDataSourceV2Suite extends StreamTest { Trigger.ProcessingTime(1000), Trigger.Continuous(1000)) - private def testPositiveCase(readFormat: String, writeFormat: String, trigger: Trigger) = { --- End diff -- Yup --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22443 thanks, merging to master! @dongjoon-hyun Can you create an umbrella JIRA for updating all the benchmark and take care of it? Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22482: WIP - [SPARK-10816][SS] Support session window na...
GitHub user HeartSaVioR opened a pull request: https://github.com/apache/spark/pull/22482 WIP - [SPARK-10816][SS] Support session window natively ## What changes were proposed in this pull request? This patch proposes native support of session window, like Spark has been supporting for time window. Please refer the attached doc in [SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816) for more details on rationalization, concepts, and limitation, etc. In point of end users' view, only the change is addition of "session" SQL function. End users could define query with session window as replacing "window" function to "session" function, and "window" column to "session" column. After then the patch will provide same experience with time window. Internally, this patch will change the physical plan of aggregation a bit: if there's session function being used in query, it will sort the input rows as "grouping keys" + "session", and merge overlapped sessions into one with applying aggregations, so it's like a sort based aggregation but the unit of group is grouping keys + session. Due to handle late event, there's a case multiple session windows co-exist per key which are not yet to evict. This patch handles the case via borrowing state implementation from streaming join which can handle multiple values for given key. ## How was this patch tested? Many UTs are added to verify session window queries for both batch and streaming. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/HeartSaVioR/spark SPARK-10816 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22482.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22482 commit a1af74611df7dd5b979fc1a288de96e0b3d415da Author: Jungtaek Lim Date: 2018-09-04T23:10:47Z WIP nothing worked, just recording the progress commit be502485047283e203933a4d78e3b580a0c567df Author: Jungtaek Lim Date: 2018-09-06T04:36:11Z WIP not working yet... lots of implementations needed commit 7c60c0ad922ddacf025ad4762b85d06ab7cb258f Author: Jungtaek Lim Date: 2018-09-06T13:31:08Z WIP Finished implementing UpdatingSessionIterator commit 4e8c260a6e6b73b9bcd347ca242b8e77aedf8d1e Author: Jungtaek Lim Date: 2018-09-07T08:35:32Z WIP add verification on precondition "rows in iterator are sorted by key" commit 39069ded62dc5836b0b0f7c8ec7fb8ce869e5292 Author: Jungtaek Lim Date: 2018-09-08T04:36:46Z Rename SymmetricHashJoinStateManager to MultiValuesStateManager * This will be also used from session window state as well commit c2716340e008000e1fcc5e4d3fcf9befa419ff77 Author: Jungtaek Lim Date: 2018-09-08T04:41:37Z Move package of UpdatingSessionIterator commit df4cffd5fd1ea82be509f1cd97e5fc3a7ef8acb6 Author: Jungtaek Lim Date: 2018-09-10T05:52:28Z WIP add MergingSortWithMultiValuesStateIterator, now integrating with stateful operators (WIP...) commit 79e32b918c3db41c7d6c1c1d55276d3f696746d5 Author: Jungtaek Lim Date: 2018-09-13T06:54:37Z WIP the first version of working one! Still have lots of TODOs and FIXMEs to go commit fb7aa17488e5753c5460f383e1b0f4bedca6dee8 Author: Jungtaek Lim Date: 2018-09-13T08:13:45Z Add more explanations commit 9f41b9d6e7960031c52603bd1da9aeca747e1dfb Author: Jungtaek Lim Date: 2018-09-13T08:49:01Z Silly bugfix & block session window for batch query as of now We can enable it but there're lots of approaches on aggregations in batch side... * AggUtils.planAggregateWithoutDistinct * AggUtils.planAggregateWithOneDistinct * RewriteDistinctAggregates * AggregateInPandasExec So unless we are sure which things to support, just block them for now... commit 0a62b1f0c274859061c0f3ab2c63450052985ac7 Author: Jungtaek Lim Date: 2018-09-13T09:28:34Z More works: majorly split out updating session to individual physical node * we will leverage such node for batch case if we want commit acb5a0c42641041ca3adae2c9f2293b4dfa837cf Author: Jungtaek Lim Date: 2018-09-13T09:38:00Z Fix a silly bug and also add check for session window against batch query commit 1b6502c92231b7aaa9d0d6f620a5bcc624b862ec Author: Jungtaek Lim Date: 2018-09-13T11:30:15Z WIP Fixed eviction on update mode commit fec9a8ae5c1d421322738bd474fcb5508421f51a Author: Jungtaek Lim Date: 2018-09-13T12:48:07Z WIP found root reason of broken UT... fixed it commit c87e4eebcc53c81328d52e4d4ea270bcede8b26e Author: Jungtaek Lim Date: 2018-09-13T12:50:31Z WIP remove printing "explain" on UTs commit
[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219039187 --- Diff: docs/sql-programming-guide.md --- @@ -1879,6 +1879,66 @@ working with timestamps in `pandas_udf`s to get the best performance, see ## Upgrading From Spark SQL 2.3 to 2.4 + - In Spark version 2.3 and earlier, the second parameter to array_contains function is implicitly promoted to the element type of first array type parameter. This type promotion can be lossy and may cause `array_contains` function to return wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some change in behavior and are illustrated in the table below. + + + +Query + + +Result Spark 2.3 or Prior + + +Result Spark 2.4 + + +Remarks + + + + +SELECT array_contains(array(1), 1.34D); + + +true + + +false + + +In Spark 2.4, both left and right parameters are promoted to array(double) and double type respectively. + + + + +SELECT array_contains(array(1), '1'); + + +true + + +AnalysisException is thrown since integer type can not be promoted to string type in a loss-less manner. --- End diff -- Ah then it's fine, we don't need to change anything here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22462 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22462 **[Test build #96325 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96325/testReport)** for PR 22462 at commit [`897cf69`](https://github.com/apache/spark/commit/897cf69a4b3c6eb07eb321c23644167c1bed211b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22462 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3281/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22467 **[Test build #96326 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96326/testReport)** for PR 22467 at commit [`11d61a4`](https://github.com/apache/spark/commit/11d61a414ee41449feb2db744657696d79db5560). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22467 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3282/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22467 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22325: [SPARK-25318]. Add exception handling when wrappi...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22325#discussion_r219021367 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -444,36 +445,36 @@ final class ShuffleBlockFetcherIterator( throwFetchFailedException(blockId, address, e) } - input = streamWrapper(blockId, in) - // Only copy the stream if it's wrapped by compression or encryption, also the size of - // block is small (the decompressed block is smaller than maxBytesInFlight) - if (detectCorrupt && !input.eq(in) && size < maxBytesInFlight / 3) { -val originalInput = input -val out = new ChunkedByteBufferOutputStream(64 * 1024, ByteBuffer.allocate) -try { + try { +input = streamWrapper(blockId, in) --- End diff -- in only need to be closed when we actually copy streams meaning that only when `(detectCorrupt && !input.eq(in) && size < maxBytesInFlight / 3)` is true. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22480: [SPARK-25473][PYTHON][SS][TEST] ForeachWriter tests fail...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22480 cc @tdas and @BryanCutler --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22480: [SPARK-25473][PYTHON][SS][TEST] ForeachWriter tests fail...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22480 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22480: [SPARK-25473][PYTHON][SS][TEST] ForeachWriter tests fail...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22480 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3278/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22478: [SPARK-25472][SS] Don't have legitimate stops of streams...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22478 **[Test build #96308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96308/testReport)** for PR 22478 at commit [`3b8addb`](https://github.com/apache/spark/commit/3b8addb9cf02489978594505470fdd527a35c2a7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22460: DO NOT MERGE
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22460 **[Test build #96318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96318/testReport)** for PR 22460 at commit [`826ed76`](https://github.com/apache/spark/commit/826ed7613af3f0a0e4ceb92113d09c86af345096). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22460: DO NOT MERGE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22460 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96318/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22460: DO NOT MERGE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22460 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22475: [SPARK-4502][SQL] Rename to spark.sql.optimizer.n...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22475 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/7 long thread, are we all good with this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21649: [SPARK-23648][R][SQL]Adds more types for hint in SparkR
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21649 merged to master, thx --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22480: [SPARK-25473][PYTHON][SS][TEST] ForeachWriter tests fail...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22480 +1 for adding the note --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22460: DO NOT MERGE
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22460 **[Test build #96327 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96327/testReport)** for PR 22460 at commit [`3bdb38a`](https://github.com/apache/spark/commit/3bdb38aec74b08b135aa5976982c20f74aae9736). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22460: DO NOT MERGE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22460 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22460: DO NOT MERGE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22460 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96316/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22138 **[Test build #96317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96317/testReport)** for PR 22138 at commit [`ddd4f2f`](https://github.com/apache/spark/commit/ddd4f2fc38c42dd1b781b0e3df46432bb6829e7b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22460: DO NOT MERGE
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22460 **[Test build #96316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96316/testReport)** for PR 22460 at commit [`aa0505f`](https://github.com/apache/spark/commit/aa0505fe57d257f49843349e0656a55717f1dac1). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org