[GitHub] [spark] HeartSaVioR commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
HeartSaVioR commented on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-735051667 retest this, please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
SparkQA removed a comment on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-735033751 **[Test build #131891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131891/testReport)** for PR 27649 at commit [`6406e36`](https://github.com/apache/spark/commit/6406e36eb34377983aaf113495ca16b1553317a3).
[GitHub] [spark] SparkQA removed a comment on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
SparkQA removed a comment on pull request #29966: URL: https://github.com/apache/spark/pull/29966#issuecomment-735050130 **[Test build #131896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131896/testReport)** for PR 29966 at commit [`7f878c2`](https://github.com/apache/spark/commit/7f878c2f675470fbeae12bdce2bfaf26971a16f4).
[GitHub] [spark] AmplabJenkins commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
AmplabJenkins commented on pull request #29966: URL: https://github.com/apache/spark/pull/29966#issuecomment-735050901
[GitHub] [spark] AmplabJenkins commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
AmplabJenkins commented on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-735050607
[GitHub] [spark] AmplabJenkins commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
AmplabJenkins commented on pull request #29966: URL: https://github.com/apache/spark/pull/29966#issuecomment-735050580
[GitHub] [spark] SparkQA commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
SparkQA commented on pull request #29966: URL: https://github.com/apache/spark/pull/29966#issuecomment-735050577 **[Test build #131896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131896/testReport)** for PR 29966 at commit [`7f878c2`](https://github.com/apache/spark/commit/7f878c2f675470fbeae12bdce2bfaf26971a16f4). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
SparkQA commented on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-735050459 **[Test build #131891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131891/testReport)** for PR 27649 at commit [`6406e36`](https://github.com/apache/spark/commit/6406e36eb34377983aaf113495ca16b1553317a3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
SparkQA commented on pull request #29966: URL: https://github.com/apache/spark/pull/29966#issuecomment-735050130 **[Test build #131896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131896/testReport)** for PR 29966 at commit [`7f878c2`](https://github.com/apache/spark/commit/7f878c2f675470fbeae12bdce2bfaf26971a16f4).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30525: [SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite
AmplabJenkins removed a comment on pull request #30525: URL: https://github.com/apache/spark/pull/30525#issuecomment-735049891
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
AmplabJenkins removed a comment on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-735049890
[GitHub] [spark] AmplabJenkins commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
AmplabJenkins commented on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-735049890
[GitHub] [spark] AmplabJenkins commented on pull request #30525: [SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite
AmplabJenkins commented on pull request #30525: URL: https://github.com/apache/spark/pull/30525#issuecomment-735049891
[GitHub] [spark] AngersZhuuuu edited a comment on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
AngersZhuuuu edited a comment on pull request #29966: URL: https://github.com/apache/spark/pull/29966#issuecomment-735049807 Gentle ping @maropu @xkrogen. Sorry for my late reply; I have been busy with work. I have updated the PR to merge the code using `DependencyUtils`. Its access was originally restricted to the `deploy` package, but it is no longer used only in `deploy`, so I moved it to the `utils` package. I also added some UTs for the query format.
[GitHub] [spark] AngersZhuuuu commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
AngersZhuuuu commented on pull request #29966: URL: https://github.com/apache/spark/pull/29966#issuecomment-735049807 Gentle ping @maropu @xkrogen. Sorry for my late reply; I have been busy with work. I have updated the PR to merge the code using `DependencyUtils`. Its access was originally restricted to the `deploy` package, but it is no longer used only in `deploy`, so I moved it to the `utils` package.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
AngersZhuuuu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r531971599

## File path: core/src/main/scala/org/apache/spark/util/Utils.scala ##
@@ -2980,6 +2980,77 @@ private[spark] object Utils extends Logging {
     metadata.toString
   }
 
+  /**
+   * Download Ivy URIs dependent jars.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Comma separated string list of URIs of downloaded jars
+   */
+  def resolveMavenDependencies(uri: URI): String = {
+    val Seq(repositories, ivyRepoPath, ivySettingsPath) =
+      Seq(
+        "spark.jars.repositories",
+        "spark.jars.ivy",
+        "spark.jars.ivySettings"
+      ).map(sys.props.get(_).orNull)
+    // Create the IvySettings, either load from file or build defaults
+    val ivySettings = Option(ivySettingsPath) match {
+      case Some(path) =>
+        SparkSubmitUtils.loadIvySettings(path, Option(repositories), Option(ivyRepoPath))
+
+      case None =>
+        SparkSubmitUtils.buildIvySettings(Option(repositories), Option(ivyRepoPath))
+    }
+    SparkSubmitUtils.resolveMavenCoordinates(uri.getAuthority, ivySettings,
+      parseExcludeList(uri.getQuery), parseTransitive(uri.getQuery))
+  }
+
+  private def parseURLQueryParameter(queryString: String, queryTag: String): Array[String] = {
+    if (queryString == null || queryString.isEmpty) {
+      Array.empty[String]
+    } else {
+      val mapTokens = queryString.split("&")
+      assert(mapTokens.forall(_.split("=").length == 2), "Invalid query string: " + queryString)

Review comment:
Added
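For readers following the thread: the `assert` in the hunk above depends on how `String.split` treats each `key=value` token. The snippet below is only an illustrative sketch of that predicate; `QueryTokenCheck` and `isWellFormed` are hypothetical names and are not part of the PR.

```scala
object QueryTokenCheck {
  // Same predicate as the assert in the quoted hunk: a token is accepted
  // only if splitting it on "=" yields exactly two pieces.
  def isWellFormed(token: String): Boolean = token.split("=").length == 2
}
```

Because `split` drops trailing empty strings, a token such as `transitive=` produces a single element and is rejected, as are `transitive` (no separator) and `a=b=c` (three pieces). A token with an empty key, such as `=true`, still produces two elements and would pass this check.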
[GitHub] [spark] SparkQA removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
SparkQA removed a comment on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-735033722 **[Test build #131890 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131890/testReport)** for PR 28363 at commit [`686fc6d`](https://github.com/apache/spark/commit/686fc6d216c07cdbb7829f690734bdb12309314b).
[GitHub] [spark] SparkQA commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
SparkQA commented on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-735049358 **[Test build #131890 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131890/testReport)** for PR 28363 at commit [`686fc6d`](https://github.com/apache/spark/commit/686fc6d216c07cdbb7829f690734bdb12309314b). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #30525: [SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite
SparkQA removed a comment on pull request #30525: URL: https://github.com/apache/spark/pull/30525#issuecomment-735041615 **[Test build #131894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131894/testReport)** for PR 30525 at commit [`13bcfe2`](https://github.com/apache/spark/commit/13bcfe2794d84d9f2211625b06b38dc8dc204bd6).
[GitHub] [spark] SparkQA commented on pull request #30525: [SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite
SparkQA commented on pull request #30525: URL: https://github.com/apache/spark/pull/30525#issuecomment-735049084 **[Test build #131894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131894/testReport)** for PR 30525 at commit [`13bcfe2`](https://github.com/apache/spark/commit/13bcfe2794d84d9f2211625b06b38dc8dc204bd6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
AngersZhuuuu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r531944677

## File path: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ##
@@ -159,6 +161,13 @@ class SessionResourceLoader(session: SparkSession) extends FunctionResourceLoade
     }
   }
 
+  protected def resolveJars(path: String): List[String] = {
+    new Path(path).toUri.getScheme match {

Review comment:
Updated
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30525: [SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite
AmplabJenkins removed a comment on pull request #30525: URL: https://github.com/apache/spark/pull/30525#issuecomment-735046241
[GitHub] [spark] SparkQA removed a comment on pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact
SparkQA removed a comment on pull request #30524: URL: https://github.com/apache/spark/pull/30524#issuecomment-735038674 **[Test build #131893 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131893/testReport)** for PR 30524 at commit [`788ff08`](https://github.com/apache/spark/commit/788ff080d5b66c5aaa709b1f578afeee463cb89e).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact
AmplabJenkins removed a comment on pull request #30524: URL: https://github.com/apache/spark/pull/30524#issuecomment-735046501
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
AngersZhuuuu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r531943937

## File path: core/src/main/scala/org/apache/spark/util/Utils.scala ##
@@ -2980,6 +2980,77 @@ private[spark] object Utils extends Logging {
     metadata.toString
   }
 
+  /**
+   * Download Ivy URIs dependent jars.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Comma separated string list of URIs of downloaded jars
+   */
+  def resolveMavenDependencies(uri: URI): String = {
+    val Seq(repositories, ivyRepoPath, ivySettingsPath) =
+      Seq(
+        "spark.jars.repositories",
+        "spark.jars.ivy",
+        "spark.jars.ivySettings"
+      ).map(sys.props.get(_).orNull)
+    // Create the IvySettings, either load from file or build defaults
+    val ivySettings = Option(ivySettingsPath) match {
+      case Some(path) =>
+        SparkSubmitUtils.loadIvySettings(path, Option(repositories), Option(ivyRepoPath))
+
+      case None =>
+        SparkSubmitUtils.buildIvySettings(Option(repositories), Option(ivyRepoPath))
+    }
+    SparkSubmitUtils.resolveMavenCoordinates(uri.getAuthority, ivySettings,
+      parseExcludeList(uri.getQuery), parseTransitive(uri.getQuery))
+  }
+
+  private def parseURLQueryParameter(queryString: String, queryTag: String): Array[String] = {
+    if (queryString == null || queryString.isEmpty) {
+      Array.empty[String]
+    } else {
+      val mapTokens = queryString.split("&")
+      assert(mapTokens.forall(_.split("=").length == 2), "Invalid query string: " + queryString)
+      mapTokens.map(_.split("=")).map(kv => (kv(0), kv(1))).filter(_._1 == queryTag).map(_._2)
+    }
+  }
+
+  /**
+   * Parse excluded list in ivy URL. When download ivy URL jar, Spark won't download transitive jar
+   * in excluded list.
+   *
+   * @param queryString Ivy URI query part string.
+   * @return Exclude list which contains grape parameters of exclude.
+   * Example: Input: exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http
+   * Output: [org.mortbay.jetty:jetty, org.eclipse.jetty:jetty-http]
+   */
+  private def parseExcludeList(queryString: String): Array[String] = {
+    parseURLQueryParameter(queryString, "exclude")
+      .flatMap { excludeString =>
+        val excludes: Array[String] = excludeString.split(",")
+        assert(excludes.forall(_.split(":").length == 2),
+          "Invalid exclude string: expected 'org:module,org:module,..', found " + excludeString)
+        excludes
+      }
+  }
+
+  /**
+   * Parse transitive parameter in ivy URL, default value is false.
+   *
+   * @param queryString Ivy URI query part string.
+   * @return Exclude list which contains grape parameters of transitive.
+   * Example: Input: exclude=org.mortbay.jetty:jetty=true
+   * Output: true
+   */
+  private def parseTransitive(queryString: String): Boolean = {
+    val transitive = parseURLQueryParameter(queryString, "transitive")
+    if (transitive.isEmpty) {
+      false
+    } else {
+      transitive.last.toBoolean

Review comment:
Done
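As a reading aid for the hunk above, here is a minimal standalone sketch of the same idea: pull the `exclude` and `transitive` values out of a URI's query part. It is not the PR's code. `IvyQuerySketch`, `queryValues`, `excludes`, `transitive`, and `fromUri` are hypothetical names, and the `ivy://org.example:demo:1.0` URI in the usage note is an assumed example. Unlike the quoted code, which asserts on malformed input, this sketch silently skips tokens that are not of the form `key=value`.

```scala
import java.net.URI

object IvyQuerySketch {
  // Collect every value given for `tag` in a query string such as "a=1&b=2".
  // Tokens that do not split into exactly key and value are ignored here.
  def queryValues(query: String, tag: String): Seq[String] = {
    if (query == null || query.isEmpty) {
      Seq.empty
    } else {
      query.split("&").toSeq.map(_.split("=")).collect {
        case Array(key, value) if key == tag => value
      }
    }
  }

  // "exclude=a:b,c:d" becomes Seq("a:b", "c:d").
  def excludes(query: String): Seq[String] =
    queryValues(query, "exclude").flatMap(_.split(",").toSeq)

  // Defaults to false; if the parameter is repeated, the last value wins.
  def transitive(query: String): Boolean =
    queryValues(query, "transitive").lastOption.exists(_.toBoolean)

  // Convenience wrapper: read both settings from a URI's query part.
  def fromUri(uri: URI): (Seq[String], Boolean) =
    (excludes(uri.getQuery), transitive(uri.getQuery))
}
```

For example, under these assumptions, `IvyQuerySketch.fromUri(URI.create("ivy://org.example:demo:1.0?exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http&transitive=true"))` yields the two excluded coordinates and `true`, while a URI with no query part yields an empty list and `false`.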
[GitHub] [spark] SparkQA commented on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite
SparkQA commented on pull request #30515: URL: https://github.com/apache/spark/pull/30515#issuecomment-735046829 **[Test build #131895 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131895/testReport)** for PR 30515 at commit [`ee1c976`](https://github.com/apache/spark/commit/ee1c976f2002ce004a132322722251ce7345b55d).
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
AngersZhuuuu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r531939805

## File path: core/src/main/scala/org/apache/spark/util/Utils.scala ##
@@ -2980,6 +2980,77 @@ private[spark] object Utils extends Logging {
     metadata.toString
   }
 
+  /**
+   * Download Ivy URIs dependent jars.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Comma separated string list of URIs of downloaded jars
+   */
+  def resolveMavenDependencies(uri: URI): String = {
+    val Seq(repositories, ivyRepoPath, ivySettingsPath) =
+      Seq(
+        "spark.jars.repositories",
+        "spark.jars.ivy",
+        "spark.jars.ivySettings"
+      ).map(sys.props.get(_).orNull)
+    // Create the IvySettings, either load from file or build defaults
+    val ivySettings = Option(ivySettingsPath) match {
+      case Some(path) =>
+        SparkSubmitUtils.loadIvySettings(path, Option(repositories), Option(ivyRepoPath))

Review comment:
> Some of this logic is duplicated from `DependencyUtils.resolveMavenDependencies`, seems it will be better if we can unify? `DependencyUtils` seems like a more suitable place for this logic anyway.
>
> With the addition of `Utils.resolveMavenDependencies` we have three identically-named methods in 3 different utility classes... I worry this will become very confusing. Better to consolidate.

Yes, we can merge them. I have updated the PR; how does the current change look?
[GitHub] [spark] gaborgsomogyi commented on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite
gaborgsomogyi commented on pull request #30515: URL: https://github.com/apache/spark/pull/30515#issuecomment-735046601 retest this please
[GitHub] [spark] AmplabJenkins commented on pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact
AmplabJenkins commented on pull request #30524: URL: https://github.com/apache/spark/pull/30524#issuecomment-735046501
[GitHub] [spark] SparkQA commented on pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact
SparkQA commented on pull request #30524: URL: https://github.com/apache/spark/pull/30524#issuecomment-735046405 **[Test build #131893 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131893/testReport)** for PR 30524 at commit [`788ff08`](https://github.com/apache/spark/commit/788ff080d5b66c5aaa709b1f578afeee463cb89e). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #30525: [SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite
AmplabJenkins commented on pull request #30525: URL: https://github.com/apache/spark/pull/30525#issuecomment-735046241
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite
AmplabJenkins removed a comment on pull request #30515: URL: https://github.com/apache/spark/pull/30515#issuecomment-735045817
[GitHub] [spark] AmplabJenkins commented on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite
AmplabJenkins commented on pull request #30515: URL: https://github.com/apache/spark/pull/30515#issuecomment-735045817
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact
AmplabJenkins removed a comment on pull request #30524: URL: https://github.com/apache/spark/pull/30524#issuecomment-735045648
[GitHub] [spark] SparkQA removed a comment on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite
SparkQA removed a comment on pull request #30515: URL: https://github.com/apache/spark/pull/30515#issuecomment-735038417 **[Test build #131892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131892/testReport)** for PR 30515 at commit [`ee1c976`](https://github.com/apache/spark/commit/ee1c976f2002ce004a132322722251ce7345b55d).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite
AmplabJenkins removed a comment on pull request #30515: URL: https://github.com/apache/spark/pull/30515#issuecomment-735045649
[GitHub] [spark] SparkQA commented on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite
SparkQA commented on pull request #30515: URL: https://github.com/apache/spark/pull/30515#issuecomment-735045721 **[Test build #131892 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131892/testReport)** for PR 30515 at commit [`ee1c976`](https://github.com/apache/spark/commit/ee1c976f2002ce004a132322722251ce7345b55d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact
AmplabJenkins commented on pull request #30524: URL: https://github.com/apache/spark/pull/30524#issuecomment-735045648
[GitHub] [spark] AmplabJenkins commented on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite
AmplabJenkins commented on pull request #30515: URL: https://github.com/apache/spark/pull/30515#issuecomment-735045649
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
AngersZhuuuu commented on a change in pull request #29966: URL: https://github.com/apache/spark/pull/29966#discussion_r531921325 ## File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ## @@ -1348,6 +1348,7 @@ private[spark] object SparkSubmitUtils { coordinates: String, ivySettings: IvySettings, exclusions: Seq[String] = Nil, + transitive: Boolean = true, Review comment: Done, and `transitive` is moved in front of `exclusions`.
[GitHub] [spark] aokolnychyi commented on pull request #29066: [SPARK-23889][SQL] DataSourceV2: required sorting and clustering for writes
aokolnychyi commented on pull request #29066: URL: https://github.com/apache/spark/pull/29066#issuecomment-735044798 also cc @dbtsai @dongjoon-hyun, it would be great to get your input on this one after the holidays.
[GitHub] [spark] aokolnychyi commented on pull request #29066: [SPARK-23889][SQL] DataSourceV2: required sorting and clustering for writes
aokolnychyi commented on pull request #29066: URL: https://github.com/apache/spark/pull/29066#issuecomment-735044653 I also have a prototype for this logic in micro-batch streaming. I added dedicated plans which I think we were missing for a while. Right now, `MicroBatchExecution` replaces `WriteToMicroBatchDataSource` with `WriteToDataSourceV2`, a deprecated node, in `runBatch`. I don't think we use that path anywhere except tests but it would be great to get that done.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
AngersZhuuuu commented on a change in pull request #29966: URL: https://github.com/apache/spark/pull/29966#discussion_r531917374 ## File path: core/src/main/scala/org/apache/spark/util/Utils.scala ## @@ -2980,6 +2980,77 @@ private[spark] object Utils extends Logging { metadata.toString } + /** + * Download Ivy URIs dependent jars. + * + * @param uri Ivy uri need to be downloaded. + * @return Comma separated string list of URIs of downloaded jars Review comment: > Should we be returning a `List[String]` instead of `String` (here and in `SparkSubmitUtils`)? It seems odd to have `SparkSubmitUtils` do a `mkString` to convert a list to string, then re-convert back to a list later. That's a nice suggestion. This returns `String` because `SparkSubmit.resolveMavenCoordinates` returns `String`; changing these methods would touch a lot of code and is not related to the current PR. How about starting a new PR, before or after this one, to refactor this code to return `Array[String]`?
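The round trip the reviewer describes can be illustrated with a minimal sketch. The object and method names below are hypothetical and are not Spark's actual `SparkSubmitUtils` API; the sketch only contrasts a resolver that joins paths into one comma-separated `String` (which every caller must split again) with one that returns the paths directly.

```scala
object JarListSketch {
  // Hypothetical stand-in for a resolver that returns a comma-separated string.
  def resolveAsString(paths: Seq[String]): String = paths.mkString(",")

  // Hypothetical alternative that hands the paths back without joining them.
  def resolveAsSeq(paths: Seq[String]): Seq[String] = paths

  // What a caller of the string form has to do: split, trim, drop empty entries.
  def parse(joined: String): Seq[String] =
    joined.split(",").map(_.trim).filter(_.nonEmpty).toSeq
}
```

With the sequence form, the `parse` step and its edge cases (such as an empty result string) disappear from every call site, which is the motivation behind the suggested follow-up refactoring.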
[GitHub] [spark] viirya commented on a change in pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API
viirya commented on a change in pull request #30521: URL: https://github.com/apache/spark/pull/30521#discussion_r531918276 ## File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ## @@ -304,46 +308,68 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) { * @since 3.1.0 */ @throws[TimeoutException] - def saveAsTable(tableName: String): StreamingQuery = { -this.source = SOURCE_NAME_TABLE + def table(tableName: String): StreamingQuery = { this.tableName = tableName -startInternal(None) - } - private def startInternal(path: Option[String]): StreamingQuery = { -if (source.toLowerCase(Locale.ROOT) == DDLUtils.HIVE_PROVIDER) { - throw new AnalysisException("Hive data source can only be used with tables, you can not " + -"write files of Hive data source directly.") -} +import df.sparkSession.sessionState.analyzer.CatalogAndIdentifier -if (source == SOURCE_NAME_TABLE) { - assertNotPartitioned(SOURCE_NAME_TABLE) +import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._ +val originalMultipartIdentifier = df.sparkSession.sessionState.sqlParser + .parseMultipartIdentifier(tableName) +val CatalogAndIdentifier(catalog, identifier) = originalMultipartIdentifier - import df.sparkSession.sessionState.analyzer.CatalogAndIdentifier +// Currently we don't create a logical streaming writer node in logical plan, so cannot rely +// on analyzer to resolve it. Directly lookup only for temp view to provide clearer message. +// TODO (SPARK-27484): we should add the writing node before the plan is analyzed. 
+if (df.sparkSession.sessionState.catalog.isTempView(originalMultipartIdentifier)) { + throw new AnalysisException(s"Temporary view $tableName doesn't support streaming write") +} +if (!catalog.asTableCatalog.tableExists(identifier)) { import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._ - val originalMultipartIdentifier = df.sparkSession.sessionState.sqlParser -.parseMultipartIdentifier(tableName) - val CatalogAndIdentifier(catalog, identifier) = originalMultipartIdentifier - - // Currently we don't create a logical streaming writer node in logical plan, so cannot rely - // on analyzer to resolve it. Directly lookup only for temp view to provide clearer message. - // TODO (SPARK-27484): we should add the writing node before the plan is analyzed. - if (df.sparkSession.sessionState.catalog.isTempView(originalMultipartIdentifier)) { -throw new AnalysisException(s"Temporary view $tableName doesn't support streaming write") - } + val cmd = CreateTableStatement( +originalMultipartIdentifier, +df.schema.asNullable, +partitioningColumns.getOrElse(Nil).asTransforms.toSeq, +None, +Map.empty[String, String], +Some(source), +Map("createBy" -> "DataStreamWriterAPI"), Review comment: What do you plan to do with this special `createBy` option?
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path
AngersZhuuuu commented on a change in pull request #29966: URL: https://github.com/apache/spark/pull/29966#discussion_r531917374 ## File path: core/src/main/scala/org/apache/spark/util/Utils.scala ## @@ -2980,6 +2980,77 @@ private[spark] object Utils extends Logging { metadata.toString } + /** + * Download Ivy URIs dependent jars. + * + * @param uri Ivy uri need to be downloaded. + * @return Comma separated string list of URIs of downloaded jars Review comment: > Should we be returning a `List[String]` instead of `String` (here and in `SparkSubmitUtils`)? It seems odd to have `SparkSubmitUtils` do a `mkString` to convert a list to string, then re-convert back to a list later. This returns `String` because `SparkSubmit.resolveMavenCoordinates` returns `String`; changing these methods would touch a lot of code and is not related to the current PR. How about starting a new PR, before or after this one, to refactor this code to return `Array[String]`?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29066: [SPARK-23889][SQL] DataSourceV2: required sorting and clustering for writes
AmplabJenkins removed a comment on pull request #29066: URL: https://github.com/apache/spark/pull/29066#issuecomment-734175289
[GitHub] [spark] aokolnychyi commented on a change in pull request #29066: [SPARK-23889][SQL] DataSourceV2: required sorting and clustering for writes
aokolnychyi commented on a change in pull request #29066: URL: https://github.com/apache/spark/pull/29066#discussion_r531891972 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -189,6 +189,13 @@ abstract class Optimizer(catalogManager: CatalogManager) // plan may contain nodes that do not report stats. Anything that uses stats must run after // this batch. Batch("Early Filter and Projection Push-Down", Once, earlyScanPushDownRules: _*) :+ +// This batch contains rules that should be applied to writes early. For example, +// we have to construct a logical write early so that we can inject needed repartition/sort +// operators to satisfy data source distribution and ordering requirements. +// Expression optimizations must be run before this batch so that we have optimal +// expressions when we construct writes. At the same time, rules that dedup repartition and +// sort operators must by run afterwards. +Batch("Early Writes", Once, earlyWriteRules: _*) :+ Review comment: I think there may be more rules like this in the future, not just for writes. I am definitely +1 on making this more flexible. Let me do this in a separate PR.
[GitHub] [spark] SparkQA commented on pull request #30525: [SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite
SparkQA commented on pull request #30525: URL: https://github.com/apache/spark/pull/30525#issuecomment-735041615 **[Test build #131894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131894/testReport)** for PR 30525 at commit [`13bcfe2`](https://github.com/apache/spark/commit/13bcfe2794d84d9f2211625b06b38dc8dc204bd6).
[GitHub] [spark] wangyum opened a new pull request #30525: [SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite
wangyum opened a new pull request #30525: URL: https://github.com/apache/spark/pull/30525 ### What changes were proposed in this pull request? This PR refactors HivePartitionFilteringSuite. ### Why are the changes needed? To make it easier to maintain. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement
AmplabJenkins removed a comment on pull request #29893: URL: https://github.com/apache/spark/pull/29893#issuecomment-735037962
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
AmplabJenkins removed a comment on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-735038074
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
AmplabJenkins removed a comment on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-735038424
[GitHub] [spark] SparkQA commented on pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact
SparkQA commented on pull request #30524: URL: https://github.com/apache/spark/pull/30524#issuecomment-735038674 **[Test build #131893 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131893/testReport)** for PR 30524 at commit [`788ff08`](https://github.com/apache/spark/commit/788ff080d5b66c5aaa709b1f578afeee463cb89e).
[GitHub] [spark] viirya commented on pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact
viirya commented on pull request #30524: URL: https://github.com/apache/spark/pull/30524#issuecomment-735038509 cc @sunchao @HyukjinKwon @dongjoon-hyun
[GitHub] [spark] viirya opened a new pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact
viirya opened a new pull request #30524: URL: https://github.com/apache/spark/pull/30524 ### What changes were proposed in this pull request? This patch proposes to use the classifier attribute, instead of the type, to construct the artifact path. ### Why are the changes needed? `resolveDependencyPaths` currently uses the artifact type to decide whether to add the "-tests" postfix. However, the ivy path pattern in `resolveMavenCoordinates` is `[organization]_[artifact][revision](-[classifier]).[ext]`. We should use the classifier instead of the type to construct the file path. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit test. Manual test.
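To make the classifier-based naming concrete, here is a minimal sketch of building a jar file name from artifact attributes. It is not the code from the patch: the `Artifact` case class and `fileName` helper are hypothetical, and the `-` placed between artifact name and revision is an assumption of this sketch (the pattern as quoted in the description shows no separator there). The point it illustrates is that the optional `(-[classifier])` segment is driven by the classifier, not by the artifact type.

```scala
object IvyJarPathSketch {
  // Hypothetical, simplified view of a resolved artifact (not Ivy's own classes).
  final case class Artifact(
      organization: String,
      name: String,
      revision: String,
      classifier: Option[String] = None,
      ext: String = "jar")

  // Builds "<organization>_<name>-<revision>[-<classifier>].<ext>".
  // The "-" before the revision is an assumption of this sketch.
  def fileName(a: Artifact): String = {
    // The classifier segment is emitted only when a non-empty classifier is present.
    val classifierPart = a.classifier.filter(_.nonEmpty).map("-" + _).getOrElse("")
    s"${a.organization}_${a.name}-${a.revision}${classifierPart}.${a.ext}"
  }
}
```

Under these assumptions, an artifact with classifier `tests` gets a `-tests` suffix regardless of its type, while an artifact without a classifier gets none.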
[GitHub] [spark] AmplabJenkins commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
AmplabJenkins commented on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-735038424
[GitHub] [spark] SparkQA commented on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite
SparkQA commented on pull request #30515: URL: https://github.com/apache/spark/pull/30515#issuecomment-735038417 **[Test build #131892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131892/testReport)** for PR 30515 at commit [`ee1c976`](https://github.com/apache/spark/commit/ee1c976f2002ce004a132322722251ce7345b55d).
[GitHub] [spark] sarutak commented on a change in pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite
sarutak commented on a change in pull request #30515: URL: https://github.com/apache/spark/pull/30515#discussion_r531848189 ## File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala ## @@ -24,15 +24,21 @@ import com.spotify.docker.client.messages.{ContainerConfig, HostConfig} import org.apache.spark.sql.execution.datasources.jdbc.connection.SecureConnectionProvider import org.apache.spark.tags.DockerTest +/** + * To run this test suite for a specific version (e.g., mariadb:10.5.8): + * {{{ + * MARIADB_DOCKER_IMAGE_NAME=mariadb:10.5.8 + * ./build/sbt -Pdocker-integration-tests + * "testOnly org.apache.spark.sql.jdbc.MariaDBKrb2IntegrationSuite" Review comment: Oops. I'll fix it.
[GitHub] [spark] AmplabJenkins commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
AmplabJenkins commented on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-735038074
[GitHub] [spark] AmplabJenkins commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement
AmplabJenkins commented on pull request #29893: URL: https://github.com/apache/spark/pull/29893#issuecomment-735037962
[GitHub] [spark] HeartSaVioR edited a comment on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API
HeartSaVioR edited a comment on pull request #30521: URL: https://github.com/apache/spark/pull/30521#issuecomment-735034126 In a streaming query we only append, so there are no other options for creating the table if we handle it. I don't think it's a difficult requirement for end users to create the table beforehand, hence I'd be in favor of dealing with existing tables only. That's also why I'm actually in favor of `insertIntoTable` instead of `saveAsTable`. Furthermore, I see we're still putting a lot of effort into V1 tables (most likely the file (streaming) sink), instead of finding the reason we can't migrate the file (streaming) sink to V2 and resolving it. (Probably #29066 would help unblock it?) I roughly remember we said that external data sources leveraging the streaming V1 sink are not in the supported range, and the V1 sink lacks the functionality tied to the output mode: you cannot even truncate when the mode is "complete". I'm not sure this is a good direction.
[GitHub] [spark] HeartSaVioR edited a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
HeartSaVioR edited a comment on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-735034868 cc. @tdas @zsxwing @gaborgsomogyi @viirya @xuanyuanking Just a final reminder. I'll merge this early next week if there are no further comments, according to the feedback from the dev@ mailing list.
[GitHub] [spark] HeartSaVioR edited a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
HeartSaVioR edited a comment on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-735034882 cc. @tdas @zsxwing @gaborgsomogyi @viirya @xuanyuanking Just a final reminder. I'll merge this early next week if there are no further comments, according to the feedback from the dev@ mailing list.
[GitHub] [spark] HeartSaVioR edited a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
HeartSaVioR edited a comment on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-735034882 cc. @tdas @zsxwing @gaborgsomogyi @viirya @xuanyuanking Just a final reminder. I'll merge this early next week according to the feedback from the dev@ mailing list, if there are no further comments.
[GitHub] [spark] HeartSaVioR commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
HeartSaVioR commented on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-735034882 cc. @tdas @zsxwing @gaborgsomogyi @viirya @xuanyuanking Just a final reminder. I'll merge this early next week according to the feedback from the dev@ mailing list.
[GitHub] [spark] HeartSaVioR commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
HeartSaVioR commented on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-735034868 cc. @tdas @zsxwing @gaborgsomogyi @viirya @xuanyuanking Just a final reminder. I'll merge this early next week according to the feedback from the dev@ mailing list.
[GitHub] [spark] HeartSaVioR edited a comment on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API
HeartSaVioR edited a comment on pull request #30521: URL: https://github.com/apache/spark/pull/30521#issuecomment-735034126 In a streaming query we only append, so there are no other options for creating the table if we handle it. I don't think it's a difficult requirement for end users to create the table beforehand, hence I'd be in favor of dealing with existing tables only. That's also why I'm actually in favor of `insertIntoTable` instead of `saveAsTable`. Furthermore, I see we're still putting a lot of effort into V1 tables (most likely the file (streaming) sink), instead of finding the reason we can't migrate the file (streaming) sink to V2 and resolving it. (Probably #29066 would help unblock it?) The V1 sink lacks the functionality tied to the output mode: you cannot even truncate when the mode is "complete". I'm not sure this is a good direction.
[GitHub] [spark] HeartSaVioR edited a comment on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API
HeartSaVioR edited a comment on pull request #30521: URL: https://github.com/apache/spark/pull/30521#issuecomment-735034126 In a streaming query we only do append - there are no other options for creating the table if we handle it. I don't think it's a difficult requirement for end users to create the table beforehand, hence I'd be in favor of dealing with existing tables only. That's also why I'm actually in favor of `insertIntoTable` instead of `saveAsTable`. Furthermore, I see we're still putting lots of effort into V1 tables (most likely the file (streaming) sink), instead of finding the reason we can't migrate the file (streaming) sink to V2 and resolving it. The V1 sink lacks the functionality tied to the output mode - you are not even able to truncate even when the mode is "complete". I'm not sure this is a good direction.
[GitHub] [spark] HeartSaVioR edited a comment on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API
HeartSaVioR edited a comment on pull request #30521: URL: https://github.com/apache/spark/pull/30521#issuecomment-735034126 In a streaming query we only do append - there are no other options for creating the table if we handle it. I don't think it's a difficult requirement for end users to create the table beforehand, hence I'd be in favor of dealing with existing tables only. That's also why I'm actually in favor of `insertIntoTable` instead of `saveAsTable`. (In the V1 sink you are not even able to truncate even when the mode is "complete".) Furthermore, I see we're still putting lots of effort into V1 tables (most likely the file (streaming) sink), instead of finding the reason we can't migrate the file (streaming) sink to V2 and resolving it. I'm not sure this is a good direction.
[GitHub] [spark] HeartSaVioR edited a comment on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API
HeartSaVioR edited a comment on pull request #30521: URL: https://github.com/apache/spark/pull/30521#issuecomment-735034126 In a streaming query we only do append. There are no overwrite, truncate, or partition-based operations, etc. I don't think it's a difficult requirement for end users to create the table beforehand, hence I'd be in favor of dealing with existing tables only. That's also why I'm actually in favor of `insertIntoTable` instead of `saveAsTable`. Furthermore, I see we're still putting lots of effort into V1 tables (most likely the file (streaming) sink), instead of finding the reason we can't migrate the file (streaming) sink to V2 and resolving it. I'm not sure this is a good direction.
[GitHub] [spark] HeartSaVioR commented on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API
HeartSaVioR commented on pull request #30521: URL: https://github.com/apache/spark/pull/30521#issuecomment-735034126 In a streaming query we only do append. There are no overwrite, truncate, or partition-based operations, etc. I don't think it's a difficult requirement for end users to create the table beforehand, hence I'd be in favor of dealing with existing tables only. That's also why I'm actually in favor of `insertIntoTable` instead of `saveAsTable`. Furthermore, I see we're still putting lots of effort into V1 tables (most likely the file (streaming) sink), instead of finding the reason we can't migrate the file (streaming) sink to V2. I'm not sure this is a good direction.
[GitHub] [spark] SparkQA commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
SparkQA commented on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-735033751 **[Test build #131891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131891/testReport)** for PR 27649 at commit [`6406e36`](https://github.com/apache/spark/commit/6406e36eb34377983aaf113495ca16b1553317a3).
[GitHub] [spark] SparkQA commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
SparkQA commented on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-735033722 **[Test build #131890 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131890/testReport)** for PR 28363 at commit [`686fc6d`](https://github.com/apache/spark/commit/686fc6d216c07cdbb7829f690734bdb12309314b).
[GitHub] [spark] SparkQA commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement
SparkQA commented on pull request #29893: URL: https://github.com/apache/spark/pull/29893#issuecomment-735033393 **[Test build #131889 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131889/testReport)** for PR 29893 at commit [`17d0564`](https://github.com/apache/spark/commit/17d056467751785664b6fd5c89c602f7c3e07e94).
[GitHub] [spark] HeartSaVioR commented on a change in pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API
HeartSaVioR commented on a change in pull request #30521: URL: https://github.com/apache/spark/pull/30521#discussion_r531826778 ## File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ## @@ -304,46 +308,68 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) { * @since 3.1.0 */ @throws[TimeoutException] - def saveAsTable(tableName: String): StreamingQuery = { -this.source = SOURCE_NAME_TABLE + def table(tableName: String): StreamingQuery = { Review comment: This could be another opportunity to revisit #29767 - I think DataFrameWriter and DataStreamWriter are not the same, and it's a bit odd to make DataStreamWriter fit DataFrameWriter. The former has lots of methods, and some of them trigger an action. The latter only allowed (before #29767) the `start` method to trigger an action. #29767 broke this and let two methods trigger an action. (My initial proposal kept `start` as the only method that triggers an action, as you remember.) Once we revisit this I think we should revisit this as well. Thoughts?
[GitHub] [spark] HeartSaVioR commented on a change in pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API
HeartSaVioR commented on a change in pull request #30521: URL: https://github.com/apache/spark/pull/30521#discussion_r531826778 ## File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ## @@ -304,46 +308,68 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) { * @since 3.1.0 */ @throws[TimeoutException] - def saveAsTable(tableName: String): StreamingQuery = { -this.source = SOURCE_NAME_TABLE + def table(tableName: String): StreamingQuery = { Review comment: This could be another opportunity to revisit #29767 - I think DataFrameWriter and DataStreamWriter are not the same, and it's a bit odd to make DataStreamWriter fit DataFrameWriter. The former has lots of methods, and some of them trigger an action. The latter only allowed (before #29767) the `start` method to trigger an action. #29767 broke this and let two methods trigger an action. Once we revisit this I think we should revisit this as well. Thoughts?
[GitHub] [spark] HeartSaVioR commented on a change in pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API
HeartSaVioR commented on a change in pull request #30521: URL: https://github.com/apache/spark/pull/30521#discussion_r531825312 ## File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ## @@ -304,46 +308,68 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) { * @since 3.1.0 */ @throws[TimeoutException] - def saveAsTable(tableName: String): StreamingQuery = { -this.source = SOURCE_NAME_TABLE + def table(tableName: String): StreamingQuery = { Review comment: Did you read all the comments there? The name `table` doesn't convey any sense of an "action". It's more natural to understand `table` as syntactic sugar for `format("table")`, as we provided for `foreach`. I'm -1 on making the change. If we want to rename, we should bring back my initial proposal, making `table` semantically the same as `format("table")` with a table name.
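To make the naming argument in the review comment concrete, the following is a minimal sketch, assuming a toy builder rather than Spark's real `DataStreamWriter`. `ToyStreamWriter`, `ToyQuery`, and the `"console"` default are hypothetical names invented for illustration. In this sketch `table` only records configuration, like `format("table")` plus a table name, and `start` remains the single method that triggers an action.

```scala
// Toy model only: NOT Spark's DataStreamWriter. All names here are hypothetical.
final case class ToyQuery(format: String, target: Option[String])

final class ToyStreamWriter {
  private var source: String = "console"
  private var tableName: Option[String] = None

  // Configuration only: picks the sink format and starts nothing.
  def format(name: String): ToyStreamWriter = { source = name; this }

  // Sugar for format("table") plus the table name; still no action.
  def table(name: String): ToyStreamWriter = {
    tableName = Some(name)
    format("table")
  }

  // The single method that triggers an action.
  def start(): ToyQuery = ToyQuery(source, tableName)
}

object ToyStreamWriterDemo {
  def main(args: Array[String]): Unit = {
    val query = new ToyStreamWriter().table("events").start()
    println(query)
  }
}
```

With this shape a caller reads `table("events")` as a setting and `start()` as the action, which is the distinction the comment argues for.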
[GitHub] [spark] HeartSaVioR commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
HeartSaVioR commented on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-735030374 retest this, please
[GitHub] [spark] HeartSaVioR commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
HeartSaVioR commented on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-735030266 retest this, please
[GitHub] [spark] yaooqinn commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement
yaooqinn commented on pull request #29893: URL: https://github.com/apache/spark/pull/29893#issuecomment-735030004 retest this please
[GitHub] [spark] HeartSaVioR commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join
HeartSaVioR commented on pull request #30395: URL: https://github.com/apache/spark/pull/30395#issuecomment-735028698 FYI, while reviewing I found confusing method names: `setupWindowedJoinWithRangeCondition` / `setupWindowedSelfJoin`. We should remove `Windowed` there, but not in this PR, as these methods already exist. @c21 Would you mind submitting a new MINOR PR for this? Otherwise I'll do it instead.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #30395: [SPARK-32863][SS] Full outer stream-stream join
HeartSaVioR commented on a change in pull request #30395: URL: https://github.com/apache/spark/pull/30395#discussion_r531818134 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala ## @@ -411,11 +411,12 @@ class UnsupportedOperationsSuite extends SparkFunSuite with SQLHelper { // Full outer joins: only batch-batch is allowed Review comment: This comment should no longer be valid - you may want to elaborate here on the reason for `streamStreamSupported = false`.
[GitHub] [spark] gliptak commented on pull request #26194: [SPARK-29536][PYTHON] Upgrade cloudpickle to 1.1.1 to support Python 3.8
gliptak commented on pull request #26194: URL: https://github.com/apache/spark/pull/26194#issuecomment-735027828 https://github.com/capitalone/datacompy/pull/88
[GitHub] [spark] SparkQA removed a comment on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
SparkQA removed a comment on pull request #30403: URL: https://github.com/apache/spark/pull/30403#issuecomment-734994326 **[Test build #131888 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131888/testReport)** for PR 30403 at commit [`7e788ce`](https://github.com/apache/spark/commit/7e788cea5c2dd03f71ee30b0106e04d7f036f30f).
[GitHub] [spark] AmplabJenkins commented on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
AmplabJenkins commented on pull request #30403: URL: https://github.com/apache/spark/pull/30403#issuecomment-735024962
[GitHub] [spark] SparkQA commented on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
SparkQA commented on pull request #30403: URL: https://github.com/apache/spark/pull/30403#issuecomment-735024762 **[Test build #131888 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131888/testReport)** for PR 30403 at commit [`7e788ce`](https://github.com/apache/spark/commit/7e788cea5c2dd03f71ee30b0106e04d7f036f30f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HeartSaVioR commented on pull request #29066: [SPARK-23889][SQL] DataSourceV2: required sorting and clustering for writes
HeartSaVioR commented on pull request #29066: URL: https://github.com/apache/spark/pull/29066#issuecomment-735021901 Great to see this making progress. It'd be nice if we include this in Spark 3.1.0, so that custom V2 writers could leverage this instead of sticking with the V1 writer. (Personally I think it's worth ensuring this is included in 3.1.0. This shouldn't drag on any longer: the discussion happened years ago, and the V2 writer has also existed for years while lacking such essential functionality.) I tried to take a look when the PR was WIP, but I felt I'm not quite qualified to review and finally approve this. I'll try taking a look soon, but it'd be nice if some others could review this. cc. @cloud-fan @viirya Would you mind taking a look at this so that it makes it into 3.1.0? Thanks in advance.
[GitHub] [spark] HeartSaVioR commented on pull request #29729: [SPARK-32032][SS] Avoid infinite wait in driver because of KafkaConsumer.poll(long) API
HeartSaVioR commented on pull request #29729: URL: https://github.com/apache/spark/pull/29729#issuecomment-735020039 I see there's no major change in either `KafkaOffsetReaderConsumer` or `KafkaOffsetReaderAdmin`. In this case I'd be OK with refactoring here as well, though I'd just change `KafkaOffsetReader` to a trait and have a companion object `KafkaOffsetReaderConsumer` which provides the implementation depending on the config. We would no longer need the wrapper then.
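The refactoring suggested in the comment above (a trait plus a companion object that picks the implementation from config) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the Spark Kafka connector's code: `OffsetReader`, `ConsumerOffsetReader`, `AdminOffsetReader`, `build`, and the `useAdminClient` flag are all hypothetical names.

```scala
// Toy sketch with hypothetical names; not the Spark Kafka connector's code.
trait OffsetReader {
  def describe: String
}

final class ConsumerOffsetReader extends OffsetReader {
  val describe: String = "consumer-based"
}

final class AdminOffsetReader extends OffsetReader {
  val describe: String = "admin-based"
}

// Companion object: the only place that knows which implementation to build.
object OffsetReader {
  // `useAdminClient` stands in for whatever config flag selects the reader.
  def build(useAdminClient: Boolean): OffsetReader =
    if (useAdminClient) new AdminOffsetReader else new ConsumerOffsetReader
}

object OffsetReaderDemo {
  def main(args: Array[String]): Unit = {
    val reader: OffsetReader = OffsetReader.build(useAdminClient = true)
    println(reader.describe)
  }
}
```

Callers depend only on the trait, so no separate wrapper class is needed to switch between the two implementations.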
[GitHub] [spark] github-actions[bot] closed pull request #29446: [WIP][SPARK-32628][SQL] Use bloom filter to improve dynamicPartitionPruning
github-actions[bot] closed pull request #29446: URL: https://github.com/apache/spark/pull/29446
[GitHub] [spark] github-actions[bot] closed pull request #29413: [SPARK-32597][CORE] Tune Event Drop in Async Event Queue
github-actions[bot] closed pull request #29413: URL: https://github.com/apache/spark/pull/29413
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30522: [SPARK-33578][CORE] enableHiveSupport is invalid after sparkContext t…
HyukjinKwon commented on a change in pull request #30522: URL: https://github.com/apache/spark/pull/30522#discussion_r531811066 ## File path: core/src/main/scala/org/apache/spark/SparkContext.scala ## @@ -710,6 +710,11 @@ class SparkContext(config: SparkConf) extends Logging { } } + /** Set spark conf */ + def setSparkConf(sparkConf: SparkConf): Unit = { Review comment: Spark config is supposed to be immutable. I don't think we should allow this in the Spark context. If you need to set a static SQL config, you should either stop and start the context again or set it initially when you create the Spark context.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30522: [SPARK-33578][CORE] enableHiveSupport is invalid after sparkContext t…
HyukjinKwon commented on a change in pull request #30522: URL: https://github.com/apache/spark/pull/30522#discussion_r531811066 ## File path: core/src/main/scala/org/apache/spark/SparkContext.scala ## @@ -710,6 +710,11 @@ class SparkContext(config: SparkConf) extends Logging { } } + /** Set spark conf */ + def setSparkConf(sparkConf: SparkConf): Unit = { Review comment: Spark config is supposed to be immutable. I don't think we should allow this in the Spark context. If you need to set a static SQL config, you should either stop and start the context again or set it initially when you create the Spark context.
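The point of the review comment, that configuration is fixed when the context is created and a different static setting means a new context, can be illustrated with a small sketch. It assumes a toy model, not Spark's `SparkConf` or `SparkContext`: `ToyConf`, `ToyContext`, and the `"catalog"` key are hypothetical and exist only for this example.

```scala
// Toy model with hypothetical names; it mimics the idea, not Spark's classes.
final case class ToyConf(settings: Map[String, String] = Map.empty) {
  // Returns a new conf; the receiver is never modified.
  def set(key: String, value: String): ToyConf = copy(settings + (key -> value))
}

// The context captures its conf once, at construction time, and has no setter.
final class ToyContext(val conf: ToyConf)

object ToyContextDemo {
  def main(args: Array[String]): Unit = {
    val base = ToyConf().set("catalog", "in-memory")
    val first = new ToyContext(base)
    // Changing a static setting means building a new conf and a new context.
    val second = new ToyContext(base.set("catalog", "hive"))
    println(first.conf.settings("catalog"))
    println(second.conf.settings("catalog"))
  }
}
```

Because `ToyContext` exposes no way to replace its conf, the only routes to a different setting are the two the reviewer names: set it before creating the context, or create a fresh context.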
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
AmplabJenkins removed a comment on pull request #30403: URL: https://github.com/apache/spark/pull/30403#issuecomment-735009844
[GitHub] [spark] AmplabJenkins commented on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
AmplabJenkins commented on pull request #30403: URL: https://github.com/apache/spark/pull/30403#issuecomment-735009844
[GitHub] [spark] maropu commented on a change in pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite
maropu commented on a change in pull request #30515: URL: https://github.com/apache/spark/pull/30515#discussion_r531805474 ## File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala ## @@ -24,15 +24,21 @@ import com.spotify.docker.client.messages.{ContainerConfig, HostConfig} import org.apache.spark.sql.execution.datasources.jdbc.connection.SecureConnectionProvider import org.apache.spark.tags.DockerTest +/** + * To run this test suite for a specific version (e.g., mariadb:10.5.8): + * {{{ + * MARIADB_DOCKER_IMAGE_NAME=mariadb:10.5.8 + * ./build/sbt -Pdocker-integration-tests + * "testOnly org.apache.spark.sql.jdbc.MariaDBKrb2IntegrationSuite" Review comment: wrong class name: `MariaDBKrb2IntegrationSuite`
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30412: [SPARK-33480][SQL] Support char/varchar type
AmplabJenkins removed a comment on pull request #30412: URL: https://github.com/apache/spark/pull/30412#issuecomment-734930827
[GitHub] [spark] SparkQA commented on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables
SparkQA commented on pull request #30403: URL: https://github.com/apache/spark/pull/30403#issuecomment-734994326 **[Test build #131888 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131888/testReport)** for PR 30403 at commit [`7e788ce`](https://github.com/apache/spark/commit/7e788cea5c2dd03f71ee30b0106e04d7f036f30f).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30430: [SPARK-33503][SQL] Refactor SortOrder class to allow multiple childrens
AmplabJenkins removed a comment on pull request #30430: URL: https://github.com/apache/spark/pull/30430#issuecomment-734957534
[GitHub] [spark] AmplabJenkins commented on pull request #30430: [SPARK-33503][SQL] Refactor SortOrder class to allow multiple childrens
AmplabJenkins commented on pull request #30430: URL: https://github.com/apache/spark/pull/30430#issuecomment-734957534
[GitHub] [spark] SparkQA removed a comment on pull request #30430: [SPARK-33503][SQL] Refactor SortOrder class to allow multiple childrens
SparkQA removed a comment on pull request #30430: URL: https://github.com/apache/spark/pull/30430#issuecomment-734880395 **[Test build #131886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131886/testReport)** for PR 30430 at commit [`815396e`](https://github.com/apache/spark/commit/815396e21bbe5f371c7603a46a8a59c764d6cefd).