[GitHub] [spark] HeartSaVioR commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-11-27 Thread GitBox


HeartSaVioR commented on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-735051667


   retest this, please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-11-27 Thread GitBox


SparkQA removed a comment on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-735033751


   **[Test build #131891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131891/testReport)** for PR 27649 at commit [`6406e36`](https://github.com/apache/spark/commit/6406e36eb34377983aaf113495ca16b1553317a3).






[GitHub] [spark] SparkQA removed a comment on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-27 Thread GitBox


SparkQA removed a comment on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-735050130


   **[Test build #131896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131896/testReport)** for PR 29966 at commit [`7f878c2`](https://github.com/apache/spark/commit/7f878c2f675470fbeae12bdce2bfaf26971a16f4).






[GitHub] [spark] AmplabJenkins commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-27 Thread GitBox


AmplabJenkins commented on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-735050901










[GitHub] [spark] AmplabJenkins commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-11-27 Thread GitBox


AmplabJenkins commented on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-735050607










[GitHub] [spark] AmplabJenkins commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-27 Thread GitBox


AmplabJenkins commented on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-735050580










[GitHub] [spark] SparkQA commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-27 Thread GitBox


SparkQA commented on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-735050577


   **[Test build #131896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131896/testReport)** for PR 29966 at commit [`7f878c2`](https://github.com/apache/spark/commit/7f878c2f675470fbeae12bdce2bfaf26971a16f4).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-11-27 Thread GitBox


SparkQA commented on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-735050459


   **[Test build #131891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131891/testReport)** for PR 27649 at commit [`6406e36`](https://github.com/apache/spark/commit/6406e36eb34377983aaf113495ca16b1553317a3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-27 Thread GitBox


SparkQA commented on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-735050130


   **[Test build #131896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131896/testReport)** for PR 29966 at commit [`7f878c2`](https://github.com/apache/spark/commit/7f878c2f675470fbeae12bdce2bfaf26971a16f4).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30525: [SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite

2020-11-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30525:
URL: https://github.com/apache/spark/pull/30525#issuecomment-735049891










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-11-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-735049890










[GitHub] [spark] AmplabJenkins commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-11-27 Thread GitBox


AmplabJenkins commented on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-735049890










[GitHub] [spark] AmplabJenkins commented on pull request #30525: [SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite

2020-11-27 Thread GitBox


AmplabJenkins commented on pull request #30525:
URL: https://github.com/apache/spark/pull/30525#issuecomment-735049891










[GitHub] [spark] AngersZhuuuu edited a comment on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-27 Thread GitBox


AngersZhuuuu edited a comment on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-735049807


   Gentle ping @maropu @xkrogen. Sorry for my late reply; I have been busy with work. I have updated the PR to merge the code using `DependencyUtils`. Its original access privilege was the `deploy` package, but it is now used outside `deploy` as well, so I moved it to the `utils` package. I also added some UTs about the query format.






[GitHub] [spark] AngersZhuuuu commented on pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-27 Thread GitBox


AngersZhuuuu commented on pull request #29966:
URL: https://github.com/apache/spark/pull/29966#issuecomment-735049807


   Gentle ping @maropu @xkrogen. Sorry for my late reply; I have been busy with work. I have updated the PR to merge the code using `DependencyUtils`. Its original access privilege was the `deploy` package, but it is now used outside `deploy` as well, so I moved it to the `utils` package.






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-27 Thread GitBox


AngersZhuuuu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r531971599



##
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##
@@ -2980,6 +2980,77 @@ private[spark] object Utils extends Logging {
     metadata.toString
   }
 
+  /**
+   * Download Ivy URIs dependent jars.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Comma separated string list of URIs of downloaded jars
+   */
+  def resolveMavenDependencies(uri: URI): String = {
+    val Seq(repositories, ivyRepoPath, ivySettingsPath) =
+      Seq(
+        "spark.jars.repositories",
+        "spark.jars.ivy",
+        "spark.jars.ivySettings"
+      ).map(sys.props.get(_).orNull)
+    // Create the IvySettings, either load from file or build defaults
+    val ivySettings = Option(ivySettingsPath) match {
+      case Some(path) =>
+        SparkSubmitUtils.loadIvySettings(path, Option(repositories), Option(ivyRepoPath))
+
+      case None =>
+        SparkSubmitUtils.buildIvySettings(Option(repositories), Option(ivyRepoPath))
+    }
+    SparkSubmitUtils.resolveMavenCoordinates(uri.getAuthority, ivySettings,
+      parseExcludeList(uri.getQuery), parseTransitive(uri.getQuery))
+  }
+
+  private def parseURLQueryParameter(queryString: String, queryTag: String): Array[String] = {
+    if (queryString == null || queryString.isEmpty) {
+      Array.empty[String]
+    } else {
+      val mapTokens = queryString.split("&")
+      assert(mapTokens.forall(_.split("=").length == 2), "Invalid query string: " + queryString)

Review comment:
   Added








[GitHub] [spark] SparkQA removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-11-27 Thread GitBox


SparkQA removed a comment on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-735033722


   **[Test build #131890 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131890/testReport)** for PR 28363 at commit [`686fc6d`](https://github.com/apache/spark/commit/686fc6d216c07cdbb7829f690734bdb12309314b).






[GitHub] [spark] SparkQA commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-11-27 Thread GitBox


SparkQA commented on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-735049358


   **[Test build #131890 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131890/testReport)** for PR 28363 at commit [`686fc6d`](https://github.com/apache/spark/commit/686fc6d216c07cdbb7829f690734bdb12309314b).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #30525: [SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite

2020-11-27 Thread GitBox


SparkQA removed a comment on pull request #30525:
URL: https://github.com/apache/spark/pull/30525#issuecomment-735041615


   **[Test build #131894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131894/testReport)** for PR 30525 at commit [`13bcfe2`](https://github.com/apache/spark/commit/13bcfe2794d84d9f2211625b06b38dc8dc204bd6).






[GitHub] [spark] SparkQA commented on pull request #30525: [SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite

2020-11-27 Thread GitBox


SparkQA commented on pull request #30525:
URL: https://github.com/apache/spark/pull/30525#issuecomment-735049084


   **[Test build #131894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131894/testReport)** for PR 30525 at commit [`13bcfe2`](https://github.com/apache/spark/commit/13bcfe2794d84d9f2211625b06b38dc8dc204bd6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-27 Thread GitBox


AngersZhuuuu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r531944677



##
File path: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala
##
@@ -159,6 +161,13 @@ class SessionResourceLoader(session: SparkSession) extends FunctionResourceLoade
     }
   }
 
+  protected def resolveJars(path: String): List[String] = {
+    new Path(path).toUri.getScheme match {

Review comment:
   Updated
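
The scheme dispatch started in `resolveJars` above could look roughly like the following sketch. It is an assumption about the general shape only: the quoted diff stops after `match {`, so the cases shown here, the use of `java.net.URI` in place of Hadoop's `Path`, the object name `ResolveJarsSketch`, and the `resolveIvy` parameter are all hypothetical.

```scala
import java.net.URI

object ResolveJarsSketch {
  // `resolveIvy` is a hypothetical stand-in for whatever resolver the PR
  // ends up calling; it maps an ivy:// URI to local jar paths.
  def resolveJars(path: String, resolveIvy: URI => Seq[String]): Seq[String] = {
    val uri = URI.create(path)
    uri.getScheme match {
      // An ivy:// URI is expanded into the jars it resolves to.
      case "ivy" => resolveIvy(uri)
      // Any other scheme (or no scheme at all) is passed through unchanged.
      case _ => Seq(path)
    }
  }
}
```

Passing the resolver in as a function keeps the sketch free of Spark and Ivy dependencies, so the dispatch can be exercised on its own.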








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30525: [SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite

2020-11-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30525:
URL: https://github.com/apache/spark/pull/30525#issuecomment-735046241










[GitHub] [spark] SparkQA removed a comment on pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact

2020-11-27 Thread GitBox


SparkQA removed a comment on pull request #30524:
URL: https://github.com/apache/spark/pull/30524#issuecomment-735038674


   **[Test build #131893 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131893/testReport)** for PR 30524 at commit [`788ff08`](https://github.com/apache/spark/commit/788ff080d5b66c5aaa709b1f578afeee463cb89e).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact

2020-11-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30524:
URL: https://github.com/apache/spark/pull/30524#issuecomment-735046501










[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-27 Thread GitBox


AngersZhuuuu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r531943937



##
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##
@@ -2980,6 +2980,77 @@ private[spark] object Utils extends Logging {
     metadata.toString
   }
 
+  /**
+   * Download Ivy URIs dependent jars.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Comma separated string list of URIs of downloaded jars
+   */
+  def resolveMavenDependencies(uri: URI): String = {
+    val Seq(repositories, ivyRepoPath, ivySettingsPath) =
+      Seq(
+        "spark.jars.repositories",
+        "spark.jars.ivy",
+        "spark.jars.ivySettings"
+      ).map(sys.props.get(_).orNull)
+    // Create the IvySettings, either load from file or build defaults
+    val ivySettings = Option(ivySettingsPath) match {
+      case Some(path) =>
+        SparkSubmitUtils.loadIvySettings(path, Option(repositories), Option(ivyRepoPath))
+
+      case None =>
+        SparkSubmitUtils.buildIvySettings(Option(repositories), Option(ivyRepoPath))
+    }
+    SparkSubmitUtils.resolveMavenCoordinates(uri.getAuthority, ivySettings,
+      parseExcludeList(uri.getQuery), parseTransitive(uri.getQuery))
+  }
+
+  private def parseURLQueryParameter(queryString: String, queryTag: String): Array[String] = {
+    if (queryString == null || queryString.isEmpty) {
+      Array.empty[String]
+    } else {
+      val mapTokens = queryString.split("&")
+      assert(mapTokens.forall(_.split("=").length == 2), "Invalid query string: " + queryString)
+      mapTokens.map(_.split("=")).map(kv => (kv(0), kv(1))).filter(_._1 == queryTag).map(_._2)
+    }
+  }
+
+  /**
+   * Parse excluded list in ivy URL. When download ivy URL jar, Spark won't download transitive jar
+   * in excluded list.
+   *
+   * @param queryString Ivy URI query part string.
+   * @return Exclude list which contains grape parameters of exclude.
+   * Example: Input:  exclude=org.mortbay.jetty:jetty,org.eclipse.jetty:jetty-http
+   * Output:  [org.mortbay.jetty:jetty, org.eclipse.jetty:jetty-http]
+   */
+  private def parseExcludeList(queryString: String): Array[String] = {
+    parseURLQueryParameter(queryString, "exclude")
+      .flatMap { excludeString =>
+        val excludes: Array[String] = excludeString.split(",")
+        assert(excludes.forall(_.split(":").length == 2),
+          "Invalid exclude string: expected 'org:module,org:module,..', found " + excludeString)
+        excludes
+      }
+  }
+
+  /**
+   * Parse transitive parameter in ivy URL, default value is false.
+   *
+   * @param queryString Ivy URI query part string.
+   * @return Exclude list which contains grape parameters of transitive.
+   * Example: Input:  exclude=org.mortbay.jetty:jetty=true
+   * Output:  true
+   */
+  private def parseTransitive(queryString: String): Boolean = {
+    val transitive = parseURLQueryParameter(queryString, "transitive")
+    if (transitive.isEmpty) {
+      false
+    } else {
+      transitive.last.toBoolean

Review comment:
   Done
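
The query-parsing helpers in the diff above can be illustrated in isolation. The following is a minimal, self-contained sketch, not the PR's code: the object name `IvyUriQuerySketch` and its method names are hypothetical, it uses `require` instead of `assert`, and it returns `Seq` rather than `Array`. It assumes the query part of an Ivy URI has the form `exclude=org:module,org:module&transitive=true`, as the doc comments describe.

```scala
object IvyUriQuerySketch {
  // Collect the values of every `key=value` pair whose key equals `tag`.
  // A null or empty query yields no values.
  def queryValues(query: String, tag: String): Seq[String] = {
    if (query == null || query.isEmpty) {
      Seq.empty
    } else {
      query.split("&").toSeq.map { token =>
        val kv = token.split("=")
        require(kv.length == 2, s"Invalid query string: $query")
        (kv(0), kv(1))
      }.collect { case (key, value) if key == tag => value }
    }
  }

  // Split each `exclude` value on commas and check the `org:module` shape.
  def excludes(query: String): Seq[String] =
    queryValues(query, "exclude").flatMap { value =>
      val items = value.split(",").toSeq
      require(items.forall(_.split(":").length == 2),
        s"Invalid exclude string: expected 'org:module,org:module,..', found $value")
      items
    }

  // The last `transitive` value wins; the default is false when it is absent.
  def transitive(query: String): Boolean =
    queryValues(query, "transitive").lastOption.exists(_.toBoolean)
}
```

As in the diff, a repeated `transitive` parameter is resolved by taking the last occurrence, and a missing one falls back to `false`.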








[GitHub] [spark] SparkQA commented on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-27 Thread GitBox


SparkQA commented on pull request #30515:
URL: https://github.com/apache/spark/pull/30515#issuecomment-735046829


   **[Test build #131895 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131895/testReport)** for PR 30515 at commit [`ee1c976`](https://github.com/apache/spark/commit/ee1c976f2002ce004a132322722251ce7345b55d).






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-27 Thread GitBox


AngersZhuuuu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r531939805



##
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##
@@ -2980,6 +2980,77 @@ private[spark] object Utils extends Logging {
     metadata.toString
   }
 
+  /**
+   * Download Ivy URIs dependent jars.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Comma separated string list of URIs of downloaded jars
+   */
+  def resolveMavenDependencies(uri: URI): String = {
+    val Seq(repositories, ivyRepoPath, ivySettingsPath) =
+      Seq(
+        "spark.jars.repositories",
+        "spark.jars.ivy",
+        "spark.jars.ivySettings"
+      ).map(sys.props.get(_).orNull)
+    // Create the IvySettings, either load from file or build defaults
+    val ivySettings = Option(ivySettingsPath) match {
+      case Some(path) =>
+        SparkSubmitUtils.loadIvySettings(path, Option(repositories), Option(ivyRepoPath))

Review comment:
   > Some of this logic is duplicated from `DependencyUtils.resolveMavenDependencies`, seems it will be better if we can unify? `DependencyUtils` seems like a more suitable place for this logic anyway.
   > 
   > With the addition of `Utils.resolveMavenDependencies` we have three identically-named methods in 3 different utility classes... I worry this will become very confusing. Better to consolidate.
   
   Yes, we can merge it. I have updated the code now; how about the current change?








[GitHub] [spark] gaborgsomogyi commented on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-27 Thread GitBox


gaborgsomogyi commented on pull request #30515:
URL: https://github.com/apache/spark/pull/30515#issuecomment-735046601


   retest this please






[GitHub] [spark] AmplabJenkins commented on pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact

2020-11-27 Thread GitBox


AmplabJenkins commented on pull request #30524:
URL: https://github.com/apache/spark/pull/30524#issuecomment-735046501










[GitHub] [spark] SparkQA commented on pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact

2020-11-27 Thread GitBox


SparkQA commented on pull request #30524:
URL: https://github.com/apache/spark/pull/30524#issuecomment-735046405


   **[Test build #131893 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131893/testReport)** for PR 30524 at commit [`788ff08`](https://github.com/apache/spark/commit/788ff080d5b66c5aaa709b1f578afeee463cb89e).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #30525: [SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite

2020-11-27 Thread GitBox


AmplabJenkins commented on pull request #30525:
URL: https://github.com/apache/spark/pull/30525#issuecomment-735046241










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30515:
URL: https://github.com/apache/spark/pull/30515#issuecomment-735045817










[GitHub] [spark] AmplabJenkins commented on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-27 Thread GitBox


AmplabJenkins commented on pull request #30515:
URL: https://github.com/apache/spark/pull/30515#issuecomment-735045817










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact

2020-11-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30524:
URL: https://github.com/apache/spark/pull/30524#issuecomment-735045648










[GitHub] [spark] SparkQA removed a comment on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-27 Thread GitBox


SparkQA removed a comment on pull request #30515:
URL: https://github.com/apache/spark/pull/30515#issuecomment-735038417


   **[Test build #131892 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131892/testReport)**
 for PR 30515 at commit 
[`ee1c976`](https://github.com/apache/spark/commit/ee1c976f2002ce004a132322722251ce7345b55d).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30515:
URL: https://github.com/apache/spark/pull/30515#issuecomment-735045649










[GitHub] [spark] SparkQA commented on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-27 Thread GitBox


SparkQA commented on pull request #30515:
URL: https://github.com/apache/spark/pull/30515#issuecomment-735045721


   **[Test build #131892 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131892/testReport)**
 for PR 30515 at commit 
[`ee1c976`](https://github.com/apache/spark/commit/ee1c976f2002ce004a132322722251ce7345b55d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact

2020-11-27 Thread GitBox


AmplabJenkins commented on pull request #30524:
URL: https://github.com/apache/spark/pull/30524#issuecomment-735045648










[GitHub] [spark] AmplabJenkins commented on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-27 Thread GitBox


AmplabJenkins commented on pull request #30515:
URL: https://github.com/apache/spark/pull/30515#issuecomment-735045649










[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-27 Thread GitBox


AngersZhuuuu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r531921325



##
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##
@@ -1348,6 +1348,7 @@ private[spark] object SparkSubmitUtils {
   coordinates: String,
   ivySettings: IvySettings,
   exclusions: Seq[String] = Nil,
+  transitive: Boolean = true,

Review comment:
   Done; `transitive` is now placed before `exclusions`.








[GitHub] [spark] aokolnychyi commented on pull request #29066: [SPARK-23889][SQL] DataSourceV2: required sorting and clustering for writes

2020-11-27 Thread GitBox


aokolnychyi commented on pull request #29066:
URL: https://github.com/apache/spark/pull/29066#issuecomment-735044798


   also cc @dbtsai @dongjoon-hyun, it would be great to get your input on this 
one after the holidays.






[GitHub] [spark] aokolnychyi commented on pull request #29066: [SPARK-23889][SQL] DataSourceV2: required sorting and clustering for writes

2020-11-27 Thread GitBox


aokolnychyi commented on pull request #29066:
URL: https://github.com/apache/spark/pull/29066#issuecomment-735044653


   I also have a prototype for this logic in micro-batch streaming. I added 
dedicated plans, which I think we have been missing for a while. Right now, 
`MicroBatchExecution` replaces `WriteToMicroBatchDataSource` with 
`WriteToDataSourceV2`, a deprecated node, in `runBatch`. I don't think we use 
that path anywhere except in tests, but it would be great to get that done.
   






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-27 Thread GitBox


AngersZhuuuu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r531917374



##
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##
@@ -2980,6 +2980,77 @@ private[spark] object Utils extends Logging {
 metadata.toString
   }
 
+  /**
+   * Download Ivy URIs dependent jars.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Comma separated string list of URIs of downloaded jars

Review comment:
   > Should we be returning a `List[String]` instead of `String` (here and 
in `SparkSubmitUtils`)? It seems odd to have `SparkSubmitUtils` do a `mkString` 
to convert a list to string, then re-convert back to a list later.
   
   That's a nice suggestion.
   This returns a `String` because `SparkSubmit.resolveMavenCoordinates` returns a 
`String`. Changing these methods would touch a lot of code and isn't related to 
the current PR. How about starting a separate PR, before or after this one, to 
refactor this code to return `Array[String]`?
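
For illustration, a minimal sketch of the round trip discussed above. It assumes nothing about the real Spark code: `JarListSketch`, `joinPaths`, and `splitPaths` are hypothetical names, not Spark APIs. It shows a list of jar URIs being joined into a comma-separated `String` and then split back, which is the conversion that returning a list or array directly would avoid.

```scala
object JarListSketch {
  // Hypothetical helper: join resolved jar URIs into one comma-separated string.
  def joinPaths(paths: Seq[String]): String = paths.mkString(",")

  // Hypothetical helper: split the string back into individual URIs.
  // Empty entries are dropped because "".split(",") yields Array("").
  def splitPaths(joined: String): Seq[String] =
    joined.split(",").filter(_.nonEmpty).toSeq
}
```

Returning the sequence directly would make `splitPaths` unnecessary, at the cost of changing every caller that currently expects a `String`.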








[GitHub] [spark] viirya commented on a change in pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API

2020-11-27 Thread GitBox


viirya commented on a change in pull request #30521:
URL: https://github.com/apache/spark/pull/30521#discussion_r531918276



##
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
##
@@ -304,46 +308,68 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
* @since 3.1.0
*/
   @throws[TimeoutException]
-  def saveAsTable(tableName: String): StreamingQuery = {
-this.source = SOURCE_NAME_TABLE
+  def table(tableName: String): StreamingQuery = {
 this.tableName = tableName
-startInternal(None)
-  }
 
-  private def startInternal(path: Option[String]): StreamingQuery = {
-if (source.toLowerCase(Locale.ROOT) == DDLUtils.HIVE_PROVIDER) {
-  throw new AnalysisException("Hive data source can only be used with tables, you can not " +
-"write files of Hive data source directly.")
-}
+import df.sparkSession.sessionState.analyzer.CatalogAndIdentifier
 
-if (source == SOURCE_NAME_TABLE) {
-  assertNotPartitioned(SOURCE_NAME_TABLE)
+import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+val originalMultipartIdentifier = df.sparkSession.sessionState.sqlParser
+  .parseMultipartIdentifier(tableName)
+val CatalogAndIdentifier(catalog, identifier) = originalMultipartIdentifier
 
-  import df.sparkSession.sessionState.analyzer.CatalogAndIdentifier
+// Currently we don't create a logical streaming writer node in logical plan, so cannot rely
+// on analyzer to resolve it. Directly lookup only for temp view to provide clearer message.
+// TODO (SPARK-27484): we should add the writing node before the plan is analyzed.
+if (df.sparkSession.sessionState.catalog.isTempView(originalMultipartIdentifier)) {
+  throw new AnalysisException(s"Temporary view $tableName doesn't support streaming write")
+}
 
+if (!catalog.asTableCatalog.tableExists(identifier)) {
   import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
-  val originalMultipartIdentifier = df.sparkSession.sessionState.sqlParser
-.parseMultipartIdentifier(tableName)
-  val CatalogAndIdentifier(catalog, identifier) = originalMultipartIdentifier
-
-  // Currently we don't create a logical streaming writer node in logical plan, so cannot rely
-  // on analyzer to resolve it. Directly lookup only for temp view to provide clearer message.
-  // TODO (SPARK-27484): we should add the writing node before the plan is analyzed.
-  if (df.sparkSession.sessionState.catalog.isTempView(originalMultipartIdentifier)) {
-throw new AnalysisException(s"Temporary view $tableName doesn't support streaming write")
-  }
+  val cmd = CreateTableStatement(
+originalMultipartIdentifier,
+df.schema.asNullable,
+partitioningColumns.getOrElse(Nil).asTransforms.toSeq,
+None,
+Map.empty[String, String],
+Some(source),
+Map("createBy" -> "DataStreamWriterAPI"),

Review comment:
   What do you plan to do with this special `createBy` option?








[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29966: [SPARK-33084][CORE][SQL] Add jar support ivy path

2020-11-27 Thread GitBox


AngersZhuuuu commented on a change in pull request #29966:
URL: https://github.com/apache/spark/pull/29966#discussion_r531917374



##
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##
@@ -2980,6 +2980,77 @@ private[spark] object Utils extends Logging {
 metadata.toString
   }
 
+  /**
+   * Download Ivy URIs dependent jars.
+   *
+   * @param uri Ivy uri need to be downloaded.
+   * @return Comma separated string list of URIs of downloaded jars

Review comment:
   > Should we be returning a `List[String]` instead of `String` (here and 
in `SparkSubmitUtils`)? It seems odd to have `SparkSubmitUtils` do a `mkString` 
to convert a list to string, then re-convert back to a list later.
   
   This returns a `String` because `SparkSubmit.resolveMavenCoordinates` returns a 
`String`. Changing these methods would touch a lot of code and isn't related to 
the current PR. How about starting a separate PR, before or after this one, to 
refactor this code to return `Array[String]`?








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29066: [SPARK-23889][SQL] DataSourceV2: required sorting and clustering for writes

2020-11-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29066:
URL: https://github.com/apache/spark/pull/29066#issuecomment-734175289










[GitHub] [spark] aokolnychyi commented on a change in pull request #29066: [SPARK-23889][SQL] DataSourceV2: required sorting and clustering for writes

2020-11-27 Thread GitBox


aokolnychyi commented on a change in pull request #29066:
URL: https://github.com/apache/spark/pull/29066#discussion_r531891972



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##
@@ -189,6 +189,13 @@ abstract class Optimizer(catalogManager: CatalogManager)
 // plan may contain nodes that do not report stats. Anything that uses stats must run after
 // this batch.
 Batch("Early Filter and Projection Push-Down", Once, earlyScanPushDownRules: _*) :+
+// This batch contains rules that should be applied to writes early. For example,
+// we have to construct a logical write early so that we can inject needed repartition/sort
+// operators to satisfy data source distribution and ordering requirements.
+// Expression optimizations must be run before this batch so that we have optimal
+// expressions when we construct writes. At the same time, rules that dedup repartition and
+// sort operators must by run afterwards.
+Batch("Early Writes", Once, earlyWriteRules: _*) :+

Review comment:
   I think there may be more rules like this in the future, not just for writes. 
I am definitely +1 on making this more flexible. Let me do this in a separate 
PR.








[GitHub] [spark] SparkQA commented on pull request #30525: [SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite

2020-11-27 Thread GitBox


SparkQA commented on pull request #30525:
URL: https://github.com/apache/spark/pull/30525#issuecomment-735041615


   **[Test build #131894 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131894/testReport)**
 for PR 30525 at commit 
[`13bcfe2`](https://github.com/apache/spark/commit/13bcfe2794d84d9f2211625b06b38dc8dc204bd6).






[GitHub] [spark] wangyum opened a new pull request #30525: [SPARK-33581][SQL][TEST] Refactor HivePartitionFilteringSuite

2020-11-27 Thread GitBox


wangyum opened a new pull request #30525:
URL: https://github.com/apache/spark/pull/30525


   ### What changes were proposed in this pull request?
   
   This PR refactors HivePartitionFilteringSuite.
   
   ### Why are the changes needed?
   
   To make the suite easier to maintain.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   N/A
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement

2020-11-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29893:
URL: https://github.com/apache/spark/pull/29893#issuecomment-735037962










[GitHub] [spark] AmplabJenkins removed a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-11-27 Thread GitBox


AmplabJenkins removed a comment on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-735038074










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-11-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-735038424










[GitHub] [spark] SparkQA commented on pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact

2020-11-27 Thread GitBox


SparkQA commented on pull request #30524:
URL: https://github.com/apache/spark/pull/30524#issuecomment-735038674


   **[Test build #131893 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131893/testReport)**
 for PR 30524 at commit 
[`788ff08`](https://github.com/apache/spark/commit/788ff080d5b66c5aaa709b1f578afeee463cb89e).






[GitHub] [spark] viirya commented on pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact

2020-11-27 Thread GitBox


viirya commented on pull request #30524:
URL: https://github.com/apache/spark/pull/30524#issuecomment-735038509


   cc @sunchao @HyukjinKwon @dongjoon-hyun 






[GitHub] [spark] viirya opened a new pull request #30524: [SPARK-33580][CORE] resolveDependencyPaths should use classifier attribute of artifact

2020-11-27 Thread GitBox


viirya opened a new pull request #30524:
URL: https://github.com/apache/spark/pull/30524


   
   
   ### What changes were proposed in this pull request?
   
   
   This patch proposes to use the classifier attribute, instead of the type, to 
construct the artifact path.
   
   ### Why are the changes needed?
   
   
   `resolveDependencyPaths` currently uses the artifact type to decide whether to 
add the "-tests" postfix. However, the Ivy path pattern in `resolveMavenCoordinates` is 
`[organization]_[artifact][revision](-[classifier]).[ext]`. We should use the 
classifier instead of the type to construct the file path.
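
As a rough illustration of the point above, here is a minimal sketch of building a file name from a pattern with an optional classifier segment. `ArtifactInfo` and `ArtifactPathSketch` are hypothetical names, not Spark or Ivy classes, and the hyphen placed before the revision is an assumption of this sketch rather than something the quoted pattern shows.

```scala
// Hypothetical, simplified stand-in for an Ivy artifact; not an Ivy or Spark class.
final case class ArtifactInfo(
    organization: String,
    name: String,
    revision: String,
    classifier: Option[String],
    ext: String)

object ArtifactPathSketch {
  // The optional "(-[classifier])" segment is emitted only when a classifier is set.
  // Assumption: a hyphen separates the artifact name from the revision.
  def fileName(a: ArtifactInfo): String = {
    val classifierSuffix = a.classifier.map(c => s"-$c").getOrElse("")
    s"${a.organization}_${a.name}-${a.revision}${classifierSuffix}.${a.ext}"
  }
}
```

With this shape, a "tests" jar gets its suffix from the classifier field, so the artifact type does not need to be inspected at all.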
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   Unit test. Manual test.






[GitHub] [spark] AmplabJenkins commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-11-27 Thread GitBox


AmplabJenkins commented on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-735038424










[GitHub] [spark] SparkQA commented on pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-27 Thread GitBox


SparkQA commented on pull request #30515:
URL: https://github.com/apache/spark/pull/30515#issuecomment-735038417


   **[Test build #131892 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131892/testReport)**
 for PR 30515 at commit 
[`ee1c976`](https://github.com/apache/spark/commit/ee1c976f2002ce004a132322722251ce7345b55d).






[GitHub] [spark] sarutak commented on a change in pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-27 Thread GitBox


sarutak commented on a change in pull request #30515:
URL: https://github.com/apache/spark/pull/30515#discussion_r531848189



##
File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala
##
@@ -24,15 +24,21 @@ import com.spotify.docker.client.messages.{ContainerConfig, HostConfig}
 import org.apache.spark.sql.execution.datasources.jdbc.connection.SecureConnectionProvider
 import org.apache.spark.tags.DockerTest
 
+/**
+ * To run this test suite for a specific version (e.g., mariadb:10.5.8):
+ * {{{
+ *   MARIADB_DOCKER_IMAGE_NAME=mariadb:10.5.8
+ * ./build/sbt -Pdocker-integration-tests
+ * "testOnly org.apache.spark.sql.jdbc.MariaDBKrb2IntegrationSuite"

Review comment:
   Oops. I'll fix it.








[GitHub] [spark] AmplabJenkins commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-11-27 Thread GitBox


AmplabJenkins commented on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-735038074










[GitHub] [spark] AmplabJenkins commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement

2020-11-27 Thread GitBox


AmplabJenkins commented on pull request #29893:
URL: https://github.com/apache/spark/pull/29893#issuecomment-735037962










[GitHub] [spark] HeartSaVioR edited a comment on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API

2020-11-27 Thread GitBox


HeartSaVioR edited a comment on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-735034126


   In a streaming query we only append, so there are no other options for creating 
the table if we handle it. I don't think it's a difficult requirement for end users 
to create the table in advance, hence I'm in favor of dealing with existing tables 
only. That's also why I actually prefer `insertIntoTable` over 
`saveAsTable`.
   
   Furthermore, I see we're still putting a lot of effort into V1 tables (most 
likely the file (streaming) sink), instead of finding the reason we can't migrate 
the file (streaming) sink to V2 and resolving it. (Probably #29066 would help 
unblock it?) I roughly remember we said external data sources leveraging the 
streaming V1 sink are outside the supported scope, and the V1 sink lacks the 
functionality tied to the output mode: you cannot even truncate when the mode 
is "complete".
   
   I'm not sure this is a good direction.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-11-27 Thread GitBox


HeartSaVioR edited a comment on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-735034868


   cc. @tdas @zsxwing @gaborgsomogyi @viirya @xuanyuanking 
   
   Just a final reminder. I'll merge this early next week if there are no 
further comments, according to the feedback from the dev@ mailing list.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-11-27 Thread GitBox


HeartSaVioR edited a comment on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-735034882


   cc. @tdas @zsxwing @gaborgsomogyi @viirya @xuanyuanking 
   
   Just a final reminder. I'll merge this early next week if there are no 
further comments, according to the feedback from the dev@ mailing list.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-11-27 Thread GitBox


HeartSaVioR edited a comment on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-735034882


   cc. @tdas @zsxwing @gaborgsomogyi @viirya @xuanyuanking 
   
   Just a final reminder. I'll merge this in early next week according to the 
feedback from dev@ mailing list, if there's no further comments.






[GitHub] [spark] HeartSaVioR commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-11-27 Thread GitBox


HeartSaVioR commented on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-735034882


   cc. @tdas @zsxwing @gaborgsomogyi @viirya @xuanyuanking 
   
   Just a final reminder. I'll merge this in early next week according to the 
feedback from dev@ mailing list.






[GitHub] [spark] HeartSaVioR commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-11-27 Thread GitBox


HeartSaVioR commented on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-735034868


   cc. @tdas @zsxwing @gaborgsomogyi @viirya @xuanyuanking 
   
   Just a final reminder. I'll merge this in early next week according to the 
feedback from dev@ mailing list.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API

2020-11-27 Thread GitBox


HeartSaVioR edited a comment on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-735034126


   In streaming query we only do append - there's no other options for creating 
table if we handle it. I don't think it's a difficult requirement for end users 
to create the table in advance, hence I'd be in favor of dealing with an existing table 
only. That's also why I'm actually in favor of `insertIntoTable` instead of 
`saveAsTable`.
   
   Furthermore, I see we're still putting lots of efforts in V1 table (most 
likely file (streaming) sink), instead of finding the reason we can't migrate 
file (streaming) sink to V2 and resolving it. (Probably #29066 would help 
unblocking it?) V1 sink lacks the functionalities tied with the output mode - 
you are not even able to truncate even if the mode is "complete".
   
   I'm not sure this is a good direction.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API

2020-11-27 Thread GitBox


HeartSaVioR edited a comment on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-735034126


   In streaming query we only do append - there's no other options for creating 
table if we handle it. I don't think it's a difficult requirement for end users 
to create the table in advance, hence I'd be in favor of dealing with an existing table 
only. That's also why I'm actually in favor of `insertIntoTable` instead of 
`saveAsTable`.
   
   Furthermore, I see we're still putting lots of efforts in V1 table (most 
likely file (streaming) sink), instead of finding the reason we can't migrate 
file (streaming) sink to V2 and resolving it. V1 sink lacks the functionalities 
tied with the output mode - you are not even able to truncate even if the mode 
is "complete".
   
   I'm not sure this is a good direction.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API

2020-11-27 Thread GitBox


HeartSaVioR edited a comment on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-735034126


   In streaming query we only do append - there's no other options for creating 
table if we handle it. I don't think it's a difficult requirement for end users 
to create the table in advance, hence I'd be in favor of dealing with an existing table 
only. That's also why I'm actually in favor of `insertIntoTable` instead of 
`saveAsTable`. (In the V1 sink you are not even able to truncate even if the mode 
is "complete".)
   
   Furthermore, I see we're still putting lots of efforts in V1 table (most 
likely file (streaming) sink), instead of finding the reason we can't migrate 
file (streaming) sink to V2 and resolving it. I'm not sure this is a good 
direction.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API

2020-11-27 Thread GitBox


HeartSaVioR edited a comment on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-735034126


   In streaming query we only do append. There's no overwrite, truncate, 
partition based operations, etc. I don't think it's a difficult requirement for 
end users to create the table in advance, hence I'd be in favor of dealing with an existing 
table only. That's also why I'm actually in favor of `insertIntoTable` instead 
of `saveAsTable`.
   
   Furthermore, I see we're still putting lots of efforts in V1 table (most 
likely file (streaming) sink), instead of finding the reason we can't migrate 
file (streaming) sink to V2 and resolving it. I'm not sure this is a good 
direction.






[GitHub] [spark] HeartSaVioR commented on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API

2020-11-27 Thread GitBox


HeartSaVioR commented on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-735034126


   In streaming query we only do append. There's no overwrite, truncate, 
partition based operations, etc. I don't think it's a difficult requirement for 
end users to create the table in advance, hence I'd be in favor of dealing with an existing 
table only. That's also why I'm actually in favor of `insertIntoTable` instead 
of `saveAsTable`.
   
   Furthermore, I see we're still putting lots of efforts in V1 table (most 
likely file (streaming) sink), instead of finding the reason we can't migrate 
file (streaming) sink to V2. I'm not sure this is a good direction.






[GitHub] [spark] SparkQA commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-11-27 Thread GitBox


SparkQA commented on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-735033751


   **[Test build #131891 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131891/testReport)**
 for PR 27649 at commit 
[`6406e36`](https://github.com/apache/spark/commit/6406e36eb34377983aaf113495ca16b1553317a3).






[GitHub] [spark] SparkQA commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-11-27 Thread GitBox


SparkQA commented on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-735033722


   **[Test build #131890 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131890/testReport)**
 for PR 28363 at commit 
[`686fc6d`](https://github.com/apache/spark/commit/686fc6d216c07cdbb7829f690734bdb12309314b).






[GitHub] [spark] SparkQA commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement

2020-11-27 Thread GitBox


SparkQA commented on pull request #29893:
URL: https://github.com/apache/spark/pull/29893#issuecomment-735033393


   **[Test build #131889 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131889/testReport)**
 for PR 29893 at commit 
[`17d0564`](https://github.com/apache/spark/commit/17d056467751785664b6fd5c89c602f7c3e07e94).






[GitHub] [spark] HeartSaVioR commented on a change in pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API

2020-11-27 Thread GitBox


HeartSaVioR commented on a change in pull request #30521:
URL: https://github.com/apache/spark/pull/30521#discussion_r531826778



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
##
@@ -304,46 +308,68 @@ final class DataStreamWriter[T] private[sql](ds: 
Dataset[T]) {
* @since 3.1.0
*/
   @throws[TimeoutException]
-  def saveAsTable(tableName: String): StreamingQuery = {
-this.source = SOURCE_NAME_TABLE
+  def table(tableName: String): StreamingQuery = {

Review comment:
   Probably this could be the another opportunity we can revisit #29767 - I 
think DataFrameWriter and DataStreamWriter are not the same, and it's a bit odd to 
make DataStreamWriter fit to DataFrameWriter. The former has lots of methods 
and some methods trigger action. The latter only allowed (before #29767) 
`start` method to trigger action. #29767 broke this and let two methods trigger 
action. (My initial proposal kept `start` method to only trigger action, as you 
remember.)
   
   Once we revisit #29767 I think we should revisit this as well. Thoughts?








[GitHub] [spark] HeartSaVioR commented on a change in pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API

2020-11-27 Thread GitBox


HeartSaVioR commented on a change in pull request #30521:
URL: https://github.com/apache/spark/pull/30521#discussion_r531826778



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
##
@@ -304,46 +308,68 @@ final class DataStreamWriter[T] private[sql](ds: 
Dataset[T]) {
* @since 3.1.0
*/
   @throws[TimeoutException]
-  def saveAsTable(tableName: String): StreamingQuery = {
-this.source = SOURCE_NAME_TABLE
+  def table(tableName: String): StreamingQuery = {

Review comment:
   Probably this could be the another opportunity we can revisit #29767 - I 
think DataFrameWriter and DataStreamWriter are not the same, and it's a bit odd to 
make DataStreamWriter fit to DataFrameWriter. The former has lots of methods 
and some methods trigger action. The latter only allowed (before #29767) 
`start` method to trigger action. #29767 broke this and let two methods trigger 
action.
   
   Once we revisit #29767 I think we should revisit this as well. Thoughts?








[GitHub] [spark] HeartSaVioR commented on a change in pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API

2020-11-27 Thread GitBox


HeartSaVioR commented on a change in pull request #30521:
URL: https://github.com/apache/spark/pull/30521#discussion_r531825312



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
##
@@ -304,46 +308,68 @@ final class DataStreamWriter[T] private[sql](ds: 
Dataset[T]) {
* @since 3.1.0
*/
   @throws[TimeoutException]
-  def saveAsTable(tableName: String): StreamingQuery = {
-this.source = SOURCE_NAME_TABLE
+  def table(tableName: String): StreamingQuery = {

Review comment:
   Did you read all comments there?
   
   The name `table` doesn't provide any meaning of "action". It's more natural 
to understand `table` as syntax sugar of `format("table")` as we provided for 
`foreach`.
   
   I'm -1 to make the change. If we want to rename, it should bring back my 
initial proposal, letting `table` as semantically same as `format("table")` 
with table name.
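
The distinction drawn above, between a method that only configures the writer and a method that triggers an action, can be sketched with a toy builder. This is not Spark code: `ToyStreamWriter`, `ToyQuery`, and the `"console"` default are hypothetical names made up for illustration. In this sketch `table` is pure sugar for `format("table")` plus a table name, and `start` stays the only action.

```scala
// Toy model only; none of these names exist in Spark.
final case class ToyQuery(format: String, target: Option[String])

final class ToyStreamWriter private (fmt: String, target: Option[String]) {
  // Configuration only: returns a new writer and starts nothing.
  def format(name: String): ToyStreamWriter = new ToyStreamWriter(name, target)

  // Sugar for format("table") plus the table name; still no action.
  def table(tableName: String): ToyStreamWriter =
    new ToyStreamWriter("table", Some(tableName))

  // The single method that triggers an action.
  def start(): ToyQuery = ToyQuery(fmt, target)
}

object ToyStreamWriter {
  // Hypothetical default format for the toy writer.
  def apply(): ToyStreamWriter = new ToyStreamWriter("console", None)
}
```

With this shape, `ToyStreamWriter().table("events")` reads as configuration, and nothing runs until `start()` is called.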








[GitHub] [spark] HeartSaVioR commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-11-27 Thread GitBox


HeartSaVioR commented on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-735030374


   retest this, please






[GitHub] [spark] HeartSaVioR commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-11-27 Thread GitBox


HeartSaVioR commented on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-735030266


   retest this, please






[GitHub] [spark] yaooqinn commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement

2020-11-27 Thread GitBox


yaooqinn commented on pull request #29893:
URL: https://github.com/apache/spark/pull/29893#issuecomment-735030004


   retest this please






[GitHub] [spark] HeartSaVioR commented on pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

2020-11-27 Thread GitBox


HeartSaVioR commented on pull request #30395:
URL: https://github.com/apache/spark/pull/30395#issuecomment-735028698


   FYI, while reviewing I found confusing method names: 
`setupWindowedJoinWithRangeCondition` / `setupWindowedSelfJoin`. We should 
remove `Windowed` there, but not in this PR, as these methods already exist. 
   
   @c21 Would you mind submitting a new MINOR PR for this? Otherwise I'll do it 
instead.






[GitHub] [spark] HeartSaVioR commented on a change in pull request #30395: [SPARK-32863][SS] Full outer stream-stream join

2020-11-27 Thread GitBox


HeartSaVioR commented on a change in pull request #30395:
URL: https://github.com/apache/spark/pull/30395#discussion_r531818134



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
##
@@ -411,11 +411,12 @@ class UnsupportedOperationsSuite extends SparkFunSuite 
with SQLHelper {
 
   // Full outer joins: only batch-batch is allowed

Review comment:
   This comment should no longer be valid - you may want to elaborate here 
on the reason for `streamStreamSupported = false`.








[GitHub] [spark] gliptak commented on pull request #26194: [SPARK-29536][PYTHON] Upgrade cloudpickle to 1.1.1 to support Python 3.8

2020-11-27 Thread GitBox


gliptak commented on pull request #26194:
URL: https://github.com/apache/spark/pull/26194#issuecomment-735027828


   https://github.com/capitalone/datacompy/pull/88






[GitHub] [spark] SparkQA removed a comment on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables

2020-11-27 Thread GitBox


SparkQA removed a comment on pull request #30403:
URL: https://github.com/apache/spark/pull/30403#issuecomment-734994326


   **[Test build #131888 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131888/testReport)**
 for PR 30403 at commit 
[`7e788ce`](https://github.com/apache/spark/commit/7e788cea5c2dd03f71ee30b0106e04d7f036f30f).






[GitHub] [spark] AmplabJenkins commented on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables

2020-11-27 Thread GitBox


AmplabJenkins commented on pull request #30403:
URL: https://github.com/apache/spark/pull/30403#issuecomment-735024962










[GitHub] [spark] SparkQA commented on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables

2020-11-27 Thread GitBox


SparkQA commented on pull request #30403:
URL: https://github.com/apache/spark/pull/30403#issuecomment-735024762


   **[Test build #131888 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131888/testReport)**
 for PR 30403 at commit 
[`7e788ce`](https://github.com/apache/spark/commit/7e788cea5c2dd03f71ee30b0106e04d7f036f30f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HeartSaVioR commented on pull request #29066: [SPARK-23889][SQL] DataSourceV2: required sorting and clustering for writes

2020-11-27 Thread GitBox


HeartSaVioR commented on pull request #29066:
URL: https://github.com/apache/spark/pull/29066#issuecomment-735021901


   Great to see this making progress. It'd be nice if we include this in Spark 
3.1.0, so that custom V2 writers could leverage this instead of sticking with 
V1 writer.
   (Personally I think it's worth ensuring this is included in 3.1.0. This 
shouldn't drag on any longer: the discussion happened years ago, and the V2 writer 
has also existed for years while lacking such essential functionality.)
   
   I tried to take a look when the PR was WIP, but I felt I'm not quite 
qualified to review and finally approve this. I'll try taking a look soon, but 
it'd be nice if some others can review this.
   
   cc. @cloud-fan @viirya Would you mind taking a look at this to make this in 
3.1.0? Thanks in advance.






[GitHub] [spark] HeartSaVioR commented on pull request #29729: [SPARK-32032][SS] Avoid infinite wait in driver because of KafkaConsumer.poll(long) API

2020-11-27 Thread GitBox


HeartSaVioR commented on pull request #29729:
URL: https://github.com/apache/spark/pull/29729#issuecomment-735020039


   I see there's no major change in both `KafkaOffsetReaderConsumer` and 
`KafkaOffsetReaderAdmin`. For this case I'd be OK to refactor here as well, 
though I'd just change `KafkaOffsetReader` to trait and have a companion object 
`KafkaOffsetReaderConsumer` which provides implementation depending on the 
config. We no longer need to require wrapper then.
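
A minimal sketch of the trait-plus-companion idea suggested above, assuming a boolean-like config entry decides which implementation is returned. Everything here is hypothetical: `OffsetReader`, `ConsumerBasedReader`, `AdminBasedReader`, and the config key are stand-ins, not the actual Spark classes or settings.

```scala
// Hypothetical sketch; names and the config key are illustrative only.
trait OffsetReader {
  def strategy: String
}

// Stand-ins for the two concrete implementations.
final class ConsumerBasedReader extends OffsetReader { val strategy = "consumer" }
final class AdminBasedReader extends OffsetReader { val strategy = "admin" }

object OffsetReader {
  // Made-up config key used only for this example.
  val UseConsumerKey = "toy.offsetReader.useConsumer"

  // The companion picks the implementation, so callers need no wrapper class.
  def build(conf: Map[String, String]): OffsetReader =
    if (conf.getOrElse(UseConsumerKey, "false").toBoolean) new ConsumerBasedReader
    else new AdminBasedReader
}
```

Callers depend only on the trait and call `OffsetReader.build(conf)`; the choice between implementations stays in one place.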






[GitHub] [spark] github-actions[bot] closed pull request #29446: [WIP][SPARK-32628][SQL] Use bloom filter to improve dynamicPartitionPruning

2020-11-27 Thread GitBox


github-actions[bot] closed pull request #29446:
URL: https://github.com/apache/spark/pull/29446


   






[GitHub] [spark] github-actions[bot] closed pull request #29413: [SPARK-32597][CORE] Tune Event Drop in Async Event Queue

2020-11-27 Thread GitBox


github-actions[bot] closed pull request #29413:
URL: https://github.com/apache/spark/pull/29413


   






[GitHub] [spark] HyukjinKwon commented on a change in pull request #30522: [SPARK-33578][CORE] enableHiveSupport is invalid after sparkContext t…

2020-11-27 Thread GitBox


HyukjinKwon commented on a change in pull request #30522:
URL: https://github.com/apache/spark/pull/30522#discussion_r531811066



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -710,6 +710,11 @@ class SparkContext(config: SparkConf) extends Logging {
 }
   }
 
+  /** Set spark conf */
+  def setSparkConf(sparkConf: SparkConf): Unit = {

Review comment:
   Spark config is supposed to be immutable. I don't think we should allow 
this in Spark context. If you should set a static SQL config, you should either 
stop and start the context again or set it initially when you create a Spark 
context.
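
The two options described above (set the config when the context is created, or stop and start a new context) can be illustrated with a toy model. This is only a sketch of the design principle: `ToyConf`, `ToyContext`, and the `"catalog"` key are hypothetical and are not Spark's `SparkConf` or `SparkContext`.

```scala
// Toy model, not Spark's classes.
final case class ToyConf(settings: Map[String, String]) {
  // Returns a new conf; the receiver is never mutated.
  def set(key: String, value: String): ToyConf = ToyConf(settings + (key -> value))
  def get(key: String): Option[String] = settings.get(key)
}

final class ToyContext(val conf: ToyConf) {
  // No setter on purpose: a different static setting needs a new context.
  def restartWith(newConf: ToyConf): ToyContext = new ToyContext(newConf)
}
```

Because a running `ToyContext` never sees later changes to a conf value, code that reads a static setting once at startup cannot be left in an inconsistent state.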








[GitHub] [spark] HyukjinKwon commented on a change in pull request #30522: [SPARK-33578][CORE] enableHiveSupport is invalid after sparkContext t…

2020-11-27 Thread GitBox


HyukjinKwon commented on a change in pull request #30522:
URL: https://github.com/apache/spark/pull/30522#discussion_r531811066



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -710,6 +710,11 @@ class SparkContext(config: SparkConf) extends Logging {
 }
   }
 
+  /** Set spark conf */
+  def setSparkConf(sparkConf: SparkConf): Unit = {

Review comment:
   Spark config is supposed to be immutable. I don't think we should allow 
this in Spark context. If you should set a static SQL config, you should either 
stop and start the context again or set it initially when you create a Spark 
context.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables

2020-11-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30403:
URL: https://github.com/apache/spark/pull/30403#issuecomment-735009844










[GitHub] [spark] AmplabJenkins commented on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables

2020-11-27 Thread GitBox


AmplabJenkins commented on pull request #30403:
URL: https://github.com/apache/spark/pull/30403#issuecomment-735009844










[GitHub] [spark] maropu commented on a change in pull request #30515: [SPARK-33570][SQL][TESTS] Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-27 Thread GitBox


maropu commented on a change in pull request #30515:
URL: https://github.com/apache/spark/pull/30515#discussion_r531805474



##
File path: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala
##
@@ -24,15 +24,21 @@ import com.spotify.docker.client.messages.{ContainerConfig, 
HostConfig}
 import 
org.apache.spark.sql.execution.datasources.jdbc.connection.SecureConnectionProvider
 import org.apache.spark.tags.DockerTest
 
+/**
+ * To run this test suite for a specific version (e.g., mariadb:10.5.8):
+ * {{{
+ *   MARIADB_DOCKER_IMAGE_NAME=mariadb:10.5.8
+ * ./build/sbt -Pdocker-integration-tests
+ * "testOnly org.apache.spark.sql.jdbc.MariaDBKrb2IntegrationSuite"

Review comment:
   Wrong class name: `MariaDBKrb2IntegrationSuite` should be `MariaDBKrbIntegrationSuite`, matching the file name.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30412: [SPARK-33480][SQL] Support char/varchar type

2020-11-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30412:
URL: https://github.com/apache/spark/pull/30412#issuecomment-734930827










[GitHub] [spark] SparkQA commented on pull request #30403: [SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables

2020-11-27 Thread GitBox


SparkQA commented on pull request #30403:
URL: https://github.com/apache/spark/pull/30403#issuecomment-734994326


   **[Test build #131888 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131888/testReport)** for PR 30403 at commit [`7e788ce`](https://github.com/apache/spark/commit/7e788cea5c2dd03f71ee30b0106e04d7f036f30f).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30430: [SPARK-33503][SQL] Refactor SortOrder class to allow multiple childrens

2020-11-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30430:
URL: https://github.com/apache/spark/pull/30430#issuecomment-734957534










[GitHub] [spark] AmplabJenkins commented on pull request #30430: [SPARK-33503][SQL] Refactor SortOrder class to allow multiple childrens

2020-11-27 Thread GitBox


AmplabJenkins commented on pull request #30430:
URL: https://github.com/apache/spark/pull/30430#issuecomment-734957534










[GitHub] [spark] SparkQA removed a comment on pull request #30430: [SPARK-33503][SQL] Refactor SortOrder class to allow multiple childrens

2020-11-27 Thread GitBox


SparkQA removed a comment on pull request #30430:
URL: https://github.com/apache/spark/pull/30430#issuecomment-734880395


   **[Test build #131886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131886/testReport)** for PR 30430 at commit [`815396e`](https://github.com/apache/spark/commit/815396e21bbe5f371c7603a46a8a59c764d6cefd).





