[GitHub] [spark] zjf2012 commented on issue #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor
zjf2012 commented on issue #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor URL: https://github.com/apache/spark/pull/23560#issuecomment-487828436 @vanzin, I tried to add some documentation for this PR. Please help review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zjf2012 commented on issue #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor
zjf2012 commented on issue #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor URL: https://github.com/apache/spark/pull/23560#issuecomment-487828500 @carsonwang, please help review.
[GitHub] [spark] zjf2012 commented on a change in pull request #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor
zjf2012 commented on a change in pull request #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor URL: https://github.com/apache/spark/pull/23560#discussion_r279620060
## File path: core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala
## @@ -194,12 +195,29 @@ private[netty] class Dispatcher(nettyEnv: NettyRpcEnv, numUsableCores: Int) exte
     endpoints.containsKey(name)
   }
-  /** Thread pool used for dispatching messages. */
-  private val threadpool: ThreadPoolExecutor = {
+  private def getNumOfThreads(conf: SparkConf): Int = {
     val availableCores =
       if (numUsableCores > 0) numUsableCores else Runtime.getRuntime.availableProcessors()
-    val numThreads = nettyEnv.conf.get(RPC_NETTY_DISPATCHER_NUM_THREADS)
+    // module configuration
Review comment: The comment has been removed.
[GitHub] [spark] zjf2012 commented on a change in pull request #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor
zjf2012 commented on a change in pull request #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor URL: https://github.com/apache/spark/pull/23560#discussion_r279620007
## File path: core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala
## @@ -194,12 +195,29 @@ private[netty] class Dispatcher(nettyEnv: NettyRpcEnv, numUsableCores: Int) exte
     endpoints.containsKey(name)
   }
-  /** Thread pool used for dispatching messages. */
-  private val threadpool: ThreadPoolExecutor = {
+  private def getNumOfThreads(conf: SparkConf): Int = {
     val availableCores =
       if (numUsableCores > 0) numUsableCores else Runtime.getRuntime.availableProcessors()
-    val numThreads = nettyEnv.conf.get(RPC_NETTY_DISPATCHER_NUM_THREADS)
+    // module configuration
+    val modNumThreads = conf.get(RPC_NETTY_DISPATCHER_NUM_THREADS)
       .getOrElse(math.max(2, availableCores))
+    // get right role
Review comment: The comment has been removed.
[GitHub] [spark] zjf2012 commented on a change in pull request #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor
zjf2012 commented on a change in pull request #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor URL: https://github.com/apache/spark/pull/23560#discussion_r279619960
## File path: core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala
## @@ -194,12 +195,29 @@ private[netty] class Dispatcher(nettyEnv: NettyRpcEnv, numUsableCores: Int) exte
     endpoints.containsKey(name)
   }
-  /** Thread pool used for dispatching messages. */
-  private val threadpool: ThreadPoolExecutor = {
+  private def getNumOfThreads(conf: SparkConf): Int = {
     val availableCores =
       if (numUsableCores > 0) numUsableCores else Runtime.getRuntime.availableProcessors()
-    val numThreads = nettyEnv.conf.get(RPC_NETTY_DISPATCHER_NUM_THREADS)
+    // module configuration
+    val modNumThreads = conf.get(RPC_NETTY_DISPATCHER_NUM_THREADS)
       .getOrElse(math.max(2, availableCores))
+    // get right role
+    val executorId = conf.get(EXECUTOR_ID).getOrElse("")
Review comment: @vanzin, thanks for the clean code. The code has been rewritten; please help review.
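In `getNumOfThreads`, the module-wide value falls back to `math.max(2, availableCores)`, and the method then determines the process role ("get right role"). A minimal sketch of that resolution order, assuming a driver- or executor-specific setting overrides the module-wide one when present, is below. It models the configuration as a plain `Map[String, String]` instead of `SparkConf`; the object name, the role-specific key pattern `spark.<role>.rpc.netty.dispatcher.numThreads`, and the exact module key string are assumptions for illustration, not the PR's code.

```scala
// Hypothetical sketch: configuration is a plain Map rather than SparkConf.
object DispatcherThreadsSketch {
  // Assumed module-wide key (read in Spark via RPC_NETTY_DISPATCHER_NUM_THREADS).
  val ModuleKey = "spark.rpc.netty.dispatcher.numThreads"

  // Hypothetical role-specific key, e.g. "spark.driver.rpc.netty.dispatcher.numThreads".
  def roleKey(role: String): String = s"spark.$role.rpc.netty.dispatcher.numThreads"

  /** Role-specific setting, else module-wide setting, else max(2, available cores). */
  def numThreads(conf: Map[String, String], role: Option[String], numUsableCores: Int): Int = {
    // Prefer the cores assigned to this RPC env; otherwise use what the JVM reports.
    val availableCores =
      if (numUsableCores > 0) numUsableCores else Runtime.getRuntime.availableProcessors()
    // Module-wide configuration, defaulting to at least two threads.
    val modNumThreads =
      conf.get(ModuleKey).map(_.toInt).getOrElse(math.max(2, availableCores))
    // A driver- or executor-specific value, when present, overrides the module-wide one.
    role.flatMap(r => conf.get(roleKey(r))).map(_.toInt).getOrElse(modNumThreads)
  }
}
```

With `role = None` (no executor id set), only the module-wide key and the core-based default apply, which keeps the existing behavior for processes that are neither driver nor executor.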
[GitHub] [spark] AmplabJenkins commented on issue #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor
AmplabJenkins commented on issue #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor URL: https://github.com/apache/spark/pull/23560#issuecomment-487828006 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor
AmplabJenkins commented on issue #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor URL: https://github.com/apache/spark/pull/23560#issuecomment-487828009 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10317/
[GitHub] [spark] zjf2012 commented on a change in pull request #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor
zjf2012 commented on a change in pull request #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor URL: https://github.com/apache/spark/pull/23560#discussion_r279619817
## File path: core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala
## @@ -47,11 +48,21 @@ private[netty] class NettyRpcEnv(
     host: String,
     securityManager: SecurityManager,
     numUsableCores: Int) extends RpcEnv(conf) with Logging {
+  // try to get specific threads configurations of driver and executor
+  val executorId = conf.get(EXECUTOR_ID).getOrElse("")
+  // neither driver nor executor if executor id is not set
Review comment: The comment has been removed.
[GitHub] [spark] zjf2012 commented on a change in pull request #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor
zjf2012 commented on a change in pull request #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor URL: https://github.com/apache/spark/pull/23560#discussion_r279619771
## File path: core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala
## @@ -47,11 +48,21 @@ private[netty] class NettyRpcEnv(
     host: String,
     securityManager: SecurityManager,
     numUsableCores: Int) extends RpcEnv(conf) with Logging {
+  // try to get specific threads configurations of driver and executor
+  val executorId = conf.get(EXECUTOR_ID).getOrElse("")
+  // neither driver nor executor if executor id is not set
+  val role = executorId match {
+    case "" => None
+    case SparkContext.DRIVER_IDENTIFIER => Some("driver")
+    // any other non-empty values since executor must has "spark.executor.id" set
Review comment: The comment has been removed.
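The role match in this hunk maps an unset executor id to no role, the driver's reserved id to `"driver"`, and, per its last comment, any other non-empty id to an executor. A self-contained sketch of that mapping follows. The final `case _ => Some("executor")` branch is inferred from that comment because the hunk stops before it, and `DriverIdentifier` is a local stand-in assumed to equal `SparkContext.DRIVER_IDENTIFIER` (`"driver"`); `RoleSketch` itself is a hypothetical name.

```scala
// Hypothetical sketch of role detection from the executor id.
object RoleSketch {
  // Assumed to match SparkContext.DRIVER_IDENTIFIER; hardcoded to stay self-contained.
  val DriverIdentifier = "driver"

  def role(executorId: Option[String]): Option[String] =
    executorId.getOrElse("") match {
      case "" => None                         // id not set: neither driver nor executor
      case DriverIdentifier => Some("driver") // the driver's reserved id (stable identifier pattern)
      case _ => Some("executor")              // any other non-empty id belongs to an executor
    }
}
```

Because `DriverIdentifier` starts with an uppercase letter, Scala treats it as a constant to compare against rather than a fresh variable binding. The PR relies on the same rule with `SparkContext.DRIVER_IDENTIFIER`.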
[GitHub] [spark] SparkQA commented on issue #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor
SparkQA commented on issue #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor URL: https://github.com/apache/spark/pull/23560#issuecomment-487827369 **[Test build #105023 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105023/testReport)** for PR 23560 at commit [`cb27a2c`](https://github.com/apache/spark/commit/cb27a2c54ce65aefb7571ccfdd18481cc7c05375).
[GitHub] [spark] SparkQA commented on issue #24344: [SPARK-27440][SQL] Optimize uncorrelated predicate subquery
SparkQA commented on issue #24344: [SPARK-27440][SQL] Optimize uncorrelated predicate subquery URL: https://github.com/apache/spark/pull/24344#issuecomment-487825638 **[Test build #105022 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105022/testReport)** for PR 24344 at commit [`5b8f60a`](https://github.com/apache/spark/commit/5b8f60a19a5600e5269de61704547371a3773a7b).
[GitHub] [spark] SparkQA commented on issue #24493: Revert "[SPARK-24601][SPARK-27051][BACKPORT][CORE] Update to Jackson 2.9.8
SparkQA commented on issue #24493: Revert "[SPARK-24601][SPARK-27051][BACKPORT][CORE] Update to Jackson 2.9.8 URL: https://github.com/apache/spark/pull/24493#issuecomment-487825639 **[Test build #105021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105021/testReport)** for PR 24493 at commit [`b867208`](https://github.com/apache/spark/commit/b8672084b69a550e13f2cac2941a02a60fac8482).
[GitHub] [spark] sandeep-katta commented on issue #24489: [SPARK-27555][SQL] sqlConf of SessionState should load hive-site.xml
sandeep-katta commented on issue #24489: [SPARK-27555][SQL] sqlConf of SessionState should load hive-site.xml URL: https://github.com/apache/spark/pull/24489#issuecomment-487825516 No, we can't. The reason is that any runtime conf set using the SET command will not be available in hadoopconf; it is only present in sqlConf.
[GitHub] [spark] AmplabJenkins commented on issue #24493: Revert "[SPARK-24601][SPARK-27051][BACKPORT][CORE] Update to Jackson 2.9.8
AmplabJenkins commented on issue #24493: Revert "[SPARK-24601][SPARK-27051][BACKPORT][CORE] Update to Jackson 2.9.8 URL: https://github.com/apache/spark/pull/24493#issuecomment-487825378 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24493: Revert "[SPARK-24601][SPARK-27051][BACKPORT][CORE] Update to Jackson 2.9.8
AmplabJenkins commented on issue #24493: Revert "[SPARK-24601][SPARK-27051][BACKPORT][CORE] Update to Jackson 2.9.8 URL: https://github.com/apache/spark/pull/24493#issuecomment-487825381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10315/
[GitHub] [spark] AmplabJenkins commented on issue #24344: [SPARK-27440][SQL] Optimize uncorrelated predicate subquery
AmplabJenkins commented on issue #24344: [SPARK-27440][SQL] Optimize uncorrelated predicate subquery URL: https://github.com/apache/spark/pull/24344#issuecomment-487825391 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24344: [SPARK-27440][SQL] Optimize uncorrelated predicate subquery
AmplabJenkins commented on issue #24344: [SPARK-27440][SQL] Optimize uncorrelated predicate subquery URL: https://github.com/apache/spark/pull/24344#issuecomment-487825392 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10316/
[GitHub] [spark] ScrapCodes commented on issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be closed if any task is using it - adds inuse tracking.
ScrapCodes commented on issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be closed if any task is using it - adds inuse tracking. URL: https://github.com/apache/spark/pull/19096#issuecomment-487825329 @zsxwing Ping!
[GitHub] [spark] gatorsmile opened a new pull request #24493: Revert "[SPARK-24601][SPARK-27051][BACKPORT][CORE] Update to Jackson 2.9.8
gatorsmile opened a new pull request #24493: Revert "[SPARK-24601][SPARK-27051][BACKPORT][CORE] Update to Jackson 2.9.8 URL: https://github.com/apache/spark/pull/24493
## What changes were proposed in this pull request?
This reverts commit 6f394a20bf49f67b4d6329a1c25171c8024a2fae.
In general, we need to be very cautious about Jackson upgrades in patch releases, especially when an upgrade could break the existing behavior of external packages or data sources and make them produce different results. The external packages and data sources would need to change their source code to keep their original behavior. I think the upgrade requires more discussion before it is released.
In the previous PR https://github.com/apache/spark/pull/22071, we turned off `spark.master.rest.enabled` by default and added the following claim to our security doc:
> The Rest Submission Server and the MesosClusterDispatcher do not support authentication. You should ensure that all network access to the REST API & MesosClusterDispatcher (port 6066 and 7077 respectively by default) are restricted to hosts that are trusted to submit jobs.
We need to understand whether this Jackson CVE applies to Spark. Before officially releasing the upgrade, we need more input from all of you. For now, I suggest reverting this upgrade in the upcoming 2.4.3 release, which is intended to fix the accidental default Scala version change in the pre-built artifacts.
## How was this patch tested?
N/A
[GitHub] [spark] cloud-fan commented on a change in pull request #24442: [SPARK-27547][SQL] fix DataFrame self-join problems
cloud-fan commented on a change in pull request #24442: [SPARK-27547][SQL] fix DataFrame self-join problems URL: https://github.com/apache/spark/pull/24442#discussion_r279616103
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/analysis/ResolveDatasetColumnReference.scala
## @@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.analysis
+
+import scala.collection.mutable
+import scala.util.Try
+
+import org.apache.spark.sql.Dataset
+import org.apache.spark.sql.catalyst.AliasIdentifier
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.MetadataBuilder
+
+/**
+ * Resolves the Dataset column reference by traversing the query plan and finding the plan subtree
+ * of the Dataset that the column reference belongs to.
+ *
+ * Dataset column reference is simply an [[AttributeReference]] that is returned by `Dataset#col`.
+ * Most of time we don't need to do anything special, as [[AttributeReference]] can point to
+ * the column precisely. However, in case of self-join, the analyzer generates
+ * [[AttributeReference]] with new expr IDs for the right side plan of the join. If the Dataset
+ * column reference points to a column in the right side plan of a self-join, we need to replace it
+ * with the corresponding newly generated [[AttributeReference]].
+ */
+class ResolveDatasetColumnReference(conf: SQLConf) extends Rule[LogicalPlan] {
+
+  // Dataset column reference is an `AttributeReference` with 2 special metadata.
+  private def isColumnReference(a: AttributeReference): Boolean = {
+    a.metadata.contains(Dataset.ID_PREFIX) && a.metadata.contains(Dataset.COL_POS_PREFIX)
+  }
+
+  private case class ColumnReference(datasetId: Long, colPos: Int)
+
+  private def toColumnReference(a: AttributeReference): ColumnReference = {
+    ColumnReference(
+      a.metadata.getLong(Dataset.ID_PREFIX),
+      a.metadata.getLong(Dataset.COL_POS_PREFIX).toInt)
+  }
+
+  private def stripColumnReferenceMetadata(a: AttributeReference): AttributeReference = {
+    val metadataWithoutId = new MetadataBuilder()
+      .withMetadata(a.metadata)
+      .remove(Dataset.ID_PREFIX)
+      .remove(Dataset.COL_POS_PREFIX)
+      .build()
+    a.withMetadata(metadataWithoutId)
+  }
+
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    if (!conf.getConf(SQLConf.RESOLVE_DATASET_COLUMN_REFERENCE)) return plan
+
+    // We always remove the special metadata from `AttributeReference` at the end of this rule, so
+    // Dataset column reference only exists in the root node via Dataset transformations like
+    // `Dataset#select`.
+    val colRefs = plan.expressions.flatMap(_.collect {
+      case a: AttributeReference if isColumnReference(a) => toColumnReference(a)
+    })
+
+    if (colRefs.isEmpty) {
+      plan
+    } else {
+      // Keeps the mapping between the column reference and the actual column it points to. This
+      // will be used to replace the column references with actual columns in the root node later.
+      val colRefToActualCol = new mutable.HashMap[ColumnReference, AttributeReference]()
+      // Keeps the column references that points to more than one actual columns. We will not
+      // replace these ambiguous column references and leave them as they were.
+      val ambiguousColRefs = new mutable.HashSet[ColumnReference]()
Review comment: Think about outer joins: columns from different join sides are always different columns.
[GitHub] [spark] francis0407 commented on a change in pull request #24344: [SPARK-27440][SQL] Optimize uncorrelated predicate subquery
francis0407 commented on a change in pull request #24344: [SPARK-27440][SQL] Optimize uncorrelated predicate subquery URL: https://github.com/apache/spark/pull/24344#discussion_r279615839
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala
## @@ -118,6 +224,19 @@ case class PlanSubqueries(sparkSession: SparkSession) extends Rule[SparkPlan] {
         ScalarSubquery(
           SubqueryExec(s"subquery${subquery.exprId.id}", executedPlan),
           subquery.exprId)
+      case subquery: expressions.Exists =>
Review comment: OK, I will open another PR for it after this one and remove it from here.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487823491 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487823494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105019/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
AmplabJenkins commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487823491 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
AmplabJenkins commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487823494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105019/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
SparkQA removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487809138 **[Test build #105019 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105019/testReport)** for PR 24490 at commit [`5708086`](https://github.com/apache/spark/commit/5708086b6f2248bc7ce568d3a2b020af62c6b3ee).
[GitHub] [spark] cloud-fan commented on a change in pull request #24344: [SPARK-27440][SQL] Optimize uncorrelated predicate subquery
cloud-fan commented on a change in pull request #24344: [SPARK-27440][SQL] Optimize uncorrelated predicate subquery URL: https://github.com/apache/spark/pull/24344#discussion_r279615361 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala ## @@ -118,6 +224,19 @@ case class PlanSubqueries(sparkSession: SparkSession) extends Rule[SparkPlan] { ScalarSubquery( SubqueryExec(s"subquery${subquery.exprId.id}", executedPlan), subquery.exprId) + case subquery: expressions.Exists => Review comment: We need `RewriteUncorrelatedSubquery`, but it should be a separate PR.
[GitHub] [spark] SparkQA commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
SparkQA commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487823242 **[Test build #105019 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105019/testReport)** for PR 24490 at commit [`5708086`](https://github.com/apache/spark/commit/5708086b6f2248bc7ce568d3a2b020af62c6b3ee). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on issue #24489: [SPARK-27555][SQL] sqlConf of SessionState should load hive-site.xml
cloud-fan commented on issue #24489: [SPARK-27555][SQL] sqlConf of SessionState should load hive-site.xml URL: https://github.com/apache/spark/pull/24489#issuecomment-487823066 Shall we just make `HiveSerDe.getDefaultStorage` accept the Hadoop conf?
[GitHub] [spark] HyukjinKwon commented on issue #14918: [SPARK-17360][PYSPARK] Support generator in createDataFrame
HyukjinKwon commented on issue #14918: [SPARK-17360][PYSPARK] Support generator in createDataFrame URL: https://github.com/apache/spark/pull/14918#issuecomment-487821872 You can already do all the things above via other APIs like `rdd.mapPartitions`.
[GitHub] [spark] xor007 commented on issue #14918: [SPARK-17360][PYSPARK] Support generator in createDataFrame
xor007 commented on issue #14918: [SPARK-17360][PYSPARK] Support generator in createDataFrame URL: https://github.com/apache/spark/pull/14918#issuecomment-487821340 > Do we have any usecases or benchmarks for cases where this would be helpful? Yes. My big use case, which I am surprised more people in industry don't have, is **massive data mining**: - You have a lot of files on the internet (for instance, text from a large collection of webpages). - You are able to write a Python generator that goes through the files to find and output sentences containing the word "covfefe". I have seen a Python generator go through 90G of such a real collection of 11000 files within minutes (the files were already downloaded). - You want to create a DataFrame of all those sentences, and the actual collection of those sentences ends up being less than 20Mb. You could create a Dataset from the generator. Now that I have written this, it seems I can run `flatMap` on the files with the generator as the transformation. But something like `Dataframe.from_generator` in Spark would be nice.
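The `flatMap` approach xor007 arrives at can be sketched as a lazy, generator-style function from one file's text to its matching sentences. This is a hypothetical helper written in Scala, the language of the diffs in this thread. `SentenceFilter` and the naive sentence-splitting regex are assumptions, not part of the original comment.

```scala
// Hypothetical helper: lazily yields the sentences of `text` that contain `word`.
object SentenceFilter {
  def sentencesContaining(word: String)(text: String): Iterator[String] =
    text
      .split("(?<=[.!?])\\s+") // naive split on whitespace that follows '.', '!' or '?'
      .iterator
      .map(_.trim)
      .filter(s => s.nonEmpty && s.contains(word))
}
```

Assuming an `RDD[String]` of file contents, for example `sc.wholeTextFiles(path).values`, the helper could be applied as `rdd.flatMap(SentenceFilter.sentencesContaining("covfefe"))`, so the resulting RDD holds only the matching sentences. In PySpark, `rdd.flatMap` likewise accepts a Python generator function.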
[GitHub] [spark] AmplabJenkins removed a comment on issue #24492: [SPARK-27601][BUILD] Upgrade stream-lib to 2.9.6
AmplabJenkins removed a comment on issue #24492: [SPARK-27601][BUILD] Upgrade stream-lib to 2.9.6 URL: https://github.com/apache/spark/pull/24492#issuecomment-487819635 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105017/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24492: [SPARK-27601][BUILD] Upgrade stream-lib to 2.9.6
AmplabJenkins commented on issue #24492: [SPARK-27601][BUILD] Upgrade stream-lib to 2.9.6 URL: https://github.com/apache/spark/pull/24492#issuecomment-487819634 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24492: [SPARK-27601][BUILD] Upgrade stream-lib to 2.9.6
AmplabJenkins removed a comment on issue #24492: [SPARK-27601][BUILD] Upgrade stream-lib to 2.9.6 URL: https://github.com/apache/spark/pull/24492#issuecomment-487819634 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24492: [SPARK-27601][BUILD] Upgrade stream-lib to 2.9.6
AmplabJenkins commented on issue #24492: [SPARK-27601][BUILD] Upgrade stream-lib to 2.9.6 URL: https://github.com/apache/spark/pull/24492#issuecomment-487819635 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105017/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24492: [SPARK-27601][BUILD] Upgrade stream-lib to 2.9.6
SparkQA removed a comment on issue #24492: [SPARK-27601][BUILD] Upgrade stream-lib to 2.9.6 URL: https://github.com/apache/spark/pull/24492#issuecomment-487805183 **[Test build #105017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105017/testReport)** for PR 24492 at commit [`1ae8592`](https://github.com/apache/spark/commit/1ae85922ddb6f6533f1e632cb6b93ed34e9b8cff).
[GitHub] [spark] SparkQA commented on issue #24492: [SPARK-27601][BUILD] Upgrade stream-lib to 2.9.6
SparkQA commented on issue #24492: [SPARK-27601][BUILD] Upgrade stream-lib to 2.9.6 URL: https://github.com/apache/spark/pull/24492#issuecomment-487819464 **[Test build #105017 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105017/testReport)** for PR 24492 at commit [`1ae8592`](https://github.com/apache/spark/commit/1ae85922ddb6f6533f1e632cb6b93ed34e9b8cff). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] sandeep-katta commented on a change in pull request #24489: [SPARK-27555][SQL] sqlConf of SessionState should load hive-site.xml
sandeep-katta commented on a change in pull request #24489: [SPARK-27555][SQL] sqlConf of SessionState should load hive-site.xml URL: https://github.com/apache/spark/pull/24489#discussion_r279611160 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala ## @@ -152,4 +152,9 @@ class SparkSessionBuilderSuite extends SparkFunSuite with BeforeAndAfterEach { session.sparkContext.hadoopConfiguration.unset(mySpecialKey) } } + + test("SPARK-27555: Spark SessionState.conf should load hive-site.xml ") { +val session = SparkSession.builder().master("local").getOrCreate() +assert(session.sessionState.conf.getConfString("hive.in.test") == "true") Review comment: "hive.in.test" is present in hive-site.xml, which is loaded into the Hadoop conf, so this test asserts that the Hive conf is present in sessionState.conf. Moreover, we cannot use Hive confs inside spark-defaults.conf.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24469: [SPARK-27576][SQL] table capability to skip the output column resolution
dongjoon-hyun commented on a change in pull request #24469: [SPARK-27576][SQL] table capability to skip the output column resolution URL: https://github.com/apache/spark/pull/24469#discussion_r279609273 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ## @@ -44,6 +44,8 @@ case class DataSourceV2Relation( override def name: String = table.name() + override def requireSchemaMatch: Boolean = !table.supports(TableCapability.ACCEPT_ANY_SCHEMA) Review comment: This can also be flipped to a more natural (positive) form. ```scala - override def requireSchemaMatch: Boolean = !table.supports(TableCapability.ACCEPT_ANY_SCHEMA) + override def ignoreSchemaMatch: Boolean = table.supports(TableCapability.ACCEPT_ANY_SCHEMA) ```
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24469: [SPARK-27576][SQL] table capability to skip the output column resolution
dongjoon-hyun commented on a change in pull request #24469: [SPARK-27576][SQL] table capability to skip the output column resolution URL: https://github.com/apache/spark/pull/24469#discussion_r279609273 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ## @@ -44,6 +44,8 @@ case class DataSourceV2Relation( override def name: String = table.name() + override def requireSchemaMatch: Boolean = !table.supports(TableCapability.ACCEPT_ANY_SCHEMA) Review comment: This can also be switched to a more natural (positive) form. ```scala - override def requireSchemaMatch: Boolean = !table.supports(TableCapability.ACCEPT_ANY_SCHEMA) + override def ignoreSchemaMatch: Boolean = table.supports(TableCapability.ACCEPT_ANY_SCHEMA) ```
[GitHub] [spark] yuchaoran2011 closed pull request #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentry-secured cluster
yuchaoran2011 closed pull request #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentry-secured cluster URL: https://github.com/apache/spark/pull/21398
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24469: [SPARK-27576][SQL] table capability to skip the output column resolution
dongjoon-hyun commented on a change in pull request #24469: [SPARK-27576][SQL] table capability to skip the output column resolution URL: https://github.com/apache/spark/pull/24469#discussion_r279609139 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NamedRelation.scala ## @@ -21,4 +21,7 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan trait NamedRelation extends LogicalPlan { def name: String + + // When true, the schema of input data must match the schema of this relation, during write. + def requireSchemaMatch: Boolean = true Review comment: Can we use `def ignoreSchemaMatch: Boolean = false` instead? There is only one usage, and it is in negated form. ```scala - !table.requireSchemaMatch || { + table.ignoreSchemaMatch || { ```
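The renaming suggested above can be illustrated with a small, self-contained model. In it, the flag is stated positively, so the write-time check needs no negation. `NamedRelationSketch`, `SimpleRelation`, and `WriteCheck.outputResolved` are hypothetical simplifications; only the `ignoreSchemaMatch` name comes from the review comment. Comparing column names stands in for the real schema resolution.

```scala
// Hypothetical simplified model; not Spark's NamedRelation or DataSourceV2Relation.
trait NamedRelationSketch {
  def name: String
  // Positive form: when true, writes skip matching the input schema against this relation.
  def ignoreSchemaMatch: Boolean = false
}

// Stand-in for a relation whose table reports a capability such as ACCEPT_ANY_SCHEMA.
final case class SimpleRelation(name: String, acceptAnySchema: Boolean) extends NamedRelationSketch {
  override def ignoreSchemaMatch: Boolean = acceptAnySchema
}

object WriteCheck {
  // Reads `ignoreSchemaMatch || ...` rather than `!requireSchemaMatch || ...`.
  // Comparing column names is a placeholder for the real schema resolution.
  def outputResolved(table: NamedRelationSketch, inputCols: Seq[String], tableCols: Seq[String]): Boolean =
    table.ignoreSchemaMatch || inputCols == tableCols
}
```

Defaulting the positive flag to `false` keeps the existing behavior (schemas must match) for every relation that does not override it.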
[GitHub] [spark] cloud-fan commented on a change in pull request #24489: [SPARK-27555][SQL] sqlConf of SessionState should load hive-site.xml
cloud-fan commented on a change in pull request #24489: [SPARK-27555][SQL] sqlConf of SessionState should load hive-site.xml URL: https://github.com/apache/spark/pull/24489#discussion_r279608344 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala ## @@ -152,4 +152,9 @@ class SparkSessionBuilderSuite extends SparkFunSuite with BeforeAndAfterEach { session.sparkContext.hadoopConfiguration.unset(mySpecialKey) } } + + test("SPARK-27555: Spark SessionState.conf should load hive-site.xml ") { +val session = SparkSession.builder().master("local").getOrCreate() +assert(session.sessionState.conf.getConfString("hive.in.test") == "true") Review comment: I'm not sure we want to do this. We load `hive-site.xml` into the Hadoop conf for Hive compatibility, but `hive-site.xml` is not a native/official way to load Spark configs, so this conf should not appear in `SessionState.conf`. Please use `spark-defaults.conf`.
[GitHub] [spark] cloud-fan commented on a change in pull request #24489: [SPARK-27555][SQL] sqlConf of SessionState should load hive-site.xml
cloud-fan commented on a change in pull request #24489: [SPARK-27555][SQL] sqlConf of SessionState should load hive-site.xml URL: https://github.com/apache/spark/pull/24489#discussion_r279607994 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala ## @@ -152,4 +152,9 @@ class SparkSessionBuilderSuite extends SparkFunSuite with BeforeAndAfterEach { session.sparkContext.hadoopConfiguration.unset(mySpecialKey) } } + + test("SPARK-27555: Spark SessionState.conf should load hive-site.xml ") { +val session = SparkSession.builder().master("local").getOrCreate() +assert(session.sessionState.conf.getConfString("hive.in.test") == "true") Review comment: I'm not sure we want to do this. We load `hive-site.xml` into the Hadoop conf for Hive compatibility, but `hive-site.xml` is not a native/official way to load Spark configs, so this conf should not appear in `SessionState.conf`. Please use `spark-defaults.conf`.
[GitHub] [spark] SparkQA commented on issue #24469: [SPARK-27576][SQL] table capability to skip the output column resolution
SparkQA commented on issue #24469: [SPARK-27576][SQL] table capability to skip the output column resolution URL: https://github.com/apache/spark/pull/24469#issuecomment-487812291 **[Test build #105020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105020/testReport)** for PR 24469 at commit [`65cefd3`](https://github.com/apache/spark/commit/65cefd368943bf58b4f841490902329cb0326d70).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24469: [SPARK-27576][SQL] table capability to skip the output column resolution
AmplabJenkins removed a comment on issue #24469: [SPARK-27576][SQL] table capability to skip the output column resolution URL: https://github.com/apache/spark/pull/24469#issuecomment-487812090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10314/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24469: [SPARK-27576][SQL] table capability to skip the output column resolution
AmplabJenkins removed a comment on issue #24469: [SPARK-27576][SQL] table capability to skip the output column resolution URL: https://github.com/apache/spark/pull/24469#issuecomment-487812086 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24469: [SPARK-27576][SQL] table capability to skip the output column resolution
AmplabJenkins commented on issue #24469: [SPARK-27576][SQL] table capability to skip the output column resolution URL: https://github.com/apache/spark/pull/24469#issuecomment-487812090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10314/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24469: [SPARK-27576][SQL] table capability to skip the output column resolution
AmplabJenkins commented on issue #24469: [SPARK-27576][SQL] table capability to skip the output column resolution URL: https://github.com/apache/spark/pull/24469#issuecomment-487812086 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on issue #24469: [SPARK-27576][SQL] table capability to skip the output column resolution
dongjoon-hyun commented on issue #24469: [SPARK-27576][SQL] table capability to skip the output column resolution URL: https://github.com/apache/spark/pull/24469#issuecomment-487811729 Retest this please.
[GitHub] [spark] 10110346 commented on a change in pull request #24446: [SPARK-27552][SQL]The configuration `hive.exec.stagingdir` is invalid on Windows OS
10110346 commented on a change in pull request #24446: [SPARK-27552][SQL]The configuration `hive.exec.stagingdir` is invalid on Windows OS URL: https://github.com/apache/spark/pull/24446#discussion_r279604518 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala ## @@ -219,7 +219,7 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand { val fs: FileSystem = inputPath.getFileSystem(hadoopConf) var stagingPathName: String = if (inputPathName.indexOf(stagingDir) == -1) { -new Path(inputPathName, stagingDir).toString +new Path(inputPathName, stagingDir).toUri.getPath Review comment: Can we change this code: ``` val inputPathUri: URI = inputPath.toUri val inputPathName: String = inputPathUri.getPath ``` to: `val inputPathName: String = inputPath.toString`
[GitHub] [spark] AmplabJenkins commented on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope
AmplabJenkins commented on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope URL: https://github.com/apache/spark/pull/24070#issuecomment-487810532 Build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope
AmplabJenkins removed a comment on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope URL: https://github.com/apache/spark/pull/24070#issuecomment-487810532 Build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope
AmplabJenkins removed a comment on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope URL: https://github.com/apache/spark/pull/24070#issuecomment-487810537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105012/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope
AmplabJenkins commented on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope URL: https://github.com/apache/spark/pull/24070#issuecomment-487810537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105012/ Test PASSed.
[GitHub] [spark] Ngone51 commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks when executors are lost
Ngone51 commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks when executors are lost URL: https://github.com/apache/spark/pull/24462#issuecomment-487810457 I think @attilapiros has a good point here. Now I'm wondering how your (@bsidhom) pluggable shuffle manager achieves a shuffle process (map & reduce) with `ExternalShuffleService` disabled. I have a similar question for @jealous: does your shuffle manager still work with `ExternalShuffleService` disabled?
[GitHub] [spark] SparkQA removed a comment on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope
SparkQA removed a comment on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope URL: https://github.com/apache/spark/pull/24070#issuecomment-487790534 **[Test build #105012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105012/testReport)** for PR 24070 at commit [`1e0e5f1`](https://github.com/apache/spark/commit/1e0e5f1c4f8fbe3500fa1302654b12adf0421e50).
[GitHub] [spark] SparkQA commented on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope
SparkQA commented on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope URL: https://github.com/apache/spark/pull/24070#issuecomment-487810300 **[Test build #105012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105012/testReport)** for PR 24070 at commit [`1e0e5f1`](https://github.com/apache/spark/commit/1e0e5f1c4f8fbe3500fa1302654b12adf0421e50). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24456: [SPARK-27557][DOC] Add copy button to Python API docs for easier copying of code-blocks
AmplabJenkins removed a comment on issue #24456: [SPARK-27557][DOC] Add copy button to Python API docs for easier copying of code-blocks URL: https://github.com/apache/spark/pull/24456#issuecomment-487810103 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24456: [SPARK-27557][DOC] Add copy button to Python API docs for easier copying of code-blocks
AmplabJenkins commented on issue #24456: [SPARK-27557][DOC] Add copy button to Python API docs for easier copying of code-blocks URL: https://github.com/apache/spark/pull/24456#issuecomment-487810103 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24456: [SPARK-27557][DOC] Add copy button to Python API docs for easier copying of code-blocks
AmplabJenkins commented on issue #24456: [SPARK-27557][DOC] Add copy button to Python API docs for easier copying of code-blocks URL: https://github.com/apache/spark/pull/24456#issuecomment-487810105 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105015/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24456: [SPARK-27557][DOC] Add copy button to Python API docs for easier copying of code-blocks
AmplabJenkins removed a comment on issue #24456: [SPARK-27557][DOC] Add copy button to Python API docs for easier copying of code-blocks URL: https://github.com/apache/spark/pull/24456#issuecomment-487810105 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105015/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24456: [SPARK-27557][DOC] Add copy button to Python API docs for easier copying of code-blocks
SparkQA removed a comment on issue #24456: [SPARK-27557][DOC] Add copy button to Python API docs for easier copying of code-blocks URL: https://github.com/apache/spark/pull/24456#issuecomment-487794454 **[Test build #105015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105015/testReport)** for PR 24456 at commit [`50bd0e3`](https://github.com/apache/spark/commit/50bd0e3d0d50c8a89acb26b7c4bebc992db9cb8d).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487809694 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24456: [SPARK-27557][DOC] Add copy button to Python API docs for easier copying of code-blocks
SparkQA commented on issue #24456: [SPARK-27557][DOC] Add copy button to Python API docs for easier copying of code-blocks URL: https://github.com/apache/spark/pull/24456#issuecomment-487809886 **[Test build #105015 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105015/testReport)** for PR 24456 at commit [`50bd0e3`](https://github.com/apache/spark/commit/50bd0e3d0d50c8a89acb26b7c4bebc992db9cb8d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487809696 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10313/ Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24446: [SPARK-27552][SQL]The configuration `hive.exec.stagingdir` is invalid on Windows OS
dongjoon-hyun commented on a change in pull request #24446: [SPARK-27552][SQL]The configuration `hive.exec.stagingdir` is invalid on Windows OS URL: https://github.com/apache/spark/pull/24446#discussion_r279603516 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala ## @@ -219,7 +219,7 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand { val fs: FileSystem = inputPath.getFileSystem(hadoopConf) var stagingPathName: String = if (inputPathName.indexOf(stagingDir) == -1) { -new Path(inputPathName, stagingDir).toString +new Path(inputPathName, stagingDir).toUri.getPath Review comment: +1 for @HyukjinKwon 's comment.
[GitHub] [spark] AmplabJenkins commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
AmplabJenkins commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487809696 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10313/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
AmplabJenkins commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487809694 Merged build finished. Test PASSed.
[GitHub] [spark] wanghui93 commented on issue #24489: [SPARK-27555][SQL] sqlConf of SessionState should load hive-site.xml
wanghui93 commented on issue #24489: [SPARK-27555][SQL] sqlConf of SessionState should load hive-site.xml URL: https://github.com/apache/spark/pull/24489#issuecomment-487809368 > I think it's done in `SharedState`? The configuration parameters in hive-site.xml are loaded into session.sparkContext.hadoopConfiguration by SharedState; they are not copied to session.sparkContext.conf. And sqlConf only copies session.sparkContext.conf.
[GitHub] [spark] SparkQA commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
SparkQA commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487809138 **[Test build #105019 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105019/testReport)** for PR 24490 at commit [`5708086`](https://github.com/apache/spark/commit/5708086b6f2248bc7ce568d3a2b020af62c6b3ee).
[GitHub] [spark] wanghui93 commented on issue #24489: [SPARK-27555][SQL] sqlConf of SessionState should load hive-site.xml
wanghui93 commented on issue #24489: [SPARK-27555][SQL] sqlConf of SessionState should load hive-site.xml URL: https://github.com/apache/spark/pull/24489#issuecomment-487808648 I think sqlConf should load at least hive-site.xml, since Spark needs 'hive.default.fileformat', which comes from hive-site.xml.
[GitHub] [spark] s1ck commented on a change in pull request #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
s1ck commented on a change in pull request #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#discussion_r279602483 ## File path: pom.xml ## @@ -106,6 +106,9 @@ external/kafka-0-10-assembly external/kafka-0-10-sql external/avro +graph/api +graph/cypher +graph/graph Review comment: Thanks for the hint. We wanted to add the modules for Python testing together with https://issues.apache.org/jira/browse/SPARK-27306, https://issues.apache.org/jira/browse/SPARK-27307 and https://issues.apache.org/jira/browse/SPARK-27308.
[GitHub] [spark] cloud-fan commented on issue #24489: [SPARK-27555][SQL] sqlConf of SessionState should load hive-site.xml
cloud-fan commented on issue #24489: [SPARK-27555][SQL] sqlConf of SessionState should load hive-site.xml URL: https://github.com/apache/spark/pull/24489#issuecomment-487808568 I think it's done in `SharedState`?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope
AmplabJenkins removed a comment on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope URL: https://github.com/apache/spark/pull/24070#issuecomment-487808439 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope
AmplabJenkins removed a comment on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope URL: https://github.com/apache/spark/pull/24070#issuecomment-487808441 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105013/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope
AmplabJenkins commented on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope URL: https://github.com/apache/spark/pull/24070#issuecomment-487808439 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope
AmplabJenkins commented on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope URL: https://github.com/apache/spark/pull/24070#issuecomment-487808441 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105013/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope
SparkQA commented on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope URL: https://github.com/apache/spark/pull/24070#issuecomment-487808238 **[Test build #105013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105013/testReport)** for PR 24070 at commit [`dee0dfc`](https://github.com/apache/spark/commit/dee0dfcc5e22829cf5fb18f05d517bac81926bd3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope
SparkQA removed a comment on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope URL: https://github.com/apache/spark/pull/24070#issuecomment-487792537 **[Test build #105013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105013/testReport)** for PR 24070 at commit [`dee0dfc`](https://github.com/apache/spark/commit/dee0dfcc5e22829cf5fb18f05d517bac81926bd3).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487807845 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105018/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
SparkQA commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487807836 **[Test build #105018 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105018/testReport)** for PR 24490 at commit [`02e803a`](https://github.com/apache/spark/commit/02e803aa9b139eb6a2b95ab0f057a675d2b7e0b4). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class DummySuite extends SparkFunSuite `
[GitHub] [spark] AmplabJenkins commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
AmplabJenkins commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487807842 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
AmplabJenkins commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487807845 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105018/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
SparkQA removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487807550 **[Test build #105018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105018/testReport)** for PR 24490 at commit [`02e803a`](https://github.com/apache/spark/commit/02e803aa9b139eb6a2b95ab0f057a675d2b7e0b4).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487807842 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
SparkQA commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487807550 **[Test build #105018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105018/testReport)** for PR 24490 at commit [`02e803a`](https://github.com/apache/spark/commit/02e803aa9b139eb6a2b95ab0f057a675d2b7e0b4).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487807369 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10312/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487807365 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
AmplabJenkins commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487807369 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10312/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
AmplabJenkins commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487807365 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun closed pull request #24491: [SPARK-26936][MINOR][FOLLOWUP] Don't need the JobConf anymore, it seems
dongjoon-hyun closed pull request #24491: [SPARK-26936][MINOR][FOLLOWUP] Don't need the JobConf anymore, it seems URL: https://github.com/apache/spark/pull/24491
[GitHub] [spark] AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
AmplabJenkins removed a comment on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487740127 Can one of the admins verify this patch?
[GitHub] [spark] dongjoon-hyun commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies
dongjoon-hyun commented on issue #24490: [SPARK-27300][GRAPH] Add Spark Graph modules and dependencies URL: https://github.com/apache/spark/pull/24490#issuecomment-487806911 ok to test