[GitHub] [spark] HyukjinKwon commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax

2020-10-27 Thread GitBox


HyukjinKwon commented on a change in pull request #28026:
URL: https://github.com/apache/spark/pull/28026#discussion_r513197469



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##
@@ -295,18 +295,61 @@ private[sql] object CatalogV2Util {
     catalog.name().equalsIgnoreCase(CatalogManager.SESSION_CATALOG_NAME)
   }
 
-  def convertTableProperties(
+  def convertTableProperties(c: CreateTableStatement): Map[String, String] = {
+    convertTableProperties(
+      c.properties, c.options, c.serde, c.location, c.comment, c.provider, c.external)
+  }
+
+  def convertTableProperties(c: CreateTableAsSelectStatement): Map[String, String] = {
+    convertTableProperties(
+      c.properties, c.options, c.serde, c.location, c.comment, c.provider, c.external)
+  }
+
+  def convertTableProperties(r: ReplaceTableStatement): Map[String, String] = {
+    convertTableProperties(r.properties, r.options, r.serde, r.location, r.comment, r.provider)
+  }
+
+  def convertTableProperties(r: ReplaceTableAsSelectStatement): Map[String, String] = {
+    convertTableProperties(r.properties, r.options, r.serde, r.location, r.comment, r.provider)
+  }
+
+  private def convertTableProperties(
       properties: Map[String, String],
       options: Map[String, String],
+      serdeInfo: Option[SerdeInfo],
       location: Option[String],
       comment: Option[String],
-      provider: Option[String]): Map[String, String] = {
-    properties ++ options ++
+      provider: Option[String],
+      external: Boolean = false): Map[String, String] = {
+    properties ++
+      options ++ // to make the transition to the "option." prefix easier, add both
+      options.map { case (key, value) => TableCatalog.OPTION_PREFIX + key -> value } ++
+      convertToProperties(serdeInfo) ++
+      (if (external) Map(TableCatalog.PROP_EXTERNAL -> "true") else Map.empty) ++

Review comment:
   Making separate JIRAs and PRs:
   
   - makes the change easier to port, revert, and manage, e.g. reverting a specific change.
   - makes it easier to track discussions within a smaller scope, e.g. the current discussion thread already looks very long to read.
   - a +1653 and −1125 change is large, which makes it easy for people to miss key things during a review.
   - this `CREATE TABLE` syntax is pretty much an entry point for users, so this is where we need more calls for review. I would encourage making it easier to review so that many people can review it closely and nothing surprising gets overlooked.
   
   If I am not mistaken, it is usually considered good practice to split a change if it can be split logically. I also encourage splitting PRs in general. If we should not split, I think the argument should be about why it should not be split (instead of why it should be split).
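
   For readers following the diff hunk quoted above, here is a minimal Java sketch of the merge it performs: each option is kept under its bare key and also added under the "option." prefix, and an external flag is added when set. The class name, the `convert` signature, and the two constant values are assumptions for illustration; the diff only names `TableCatalog.OPTION_PREFIX` and `TableCatalog.PROP_EXTERNAL`, and the serde, location, comment, and provider handling is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in for the merge in the quoted diff; not Spark's actual code.
class TablePropertiesSketch {
  // Assumed values: the diff refers to these constants by name only.
  static final String OPTION_PREFIX = "option.";
  static final String PROP_EXTERNAL = "external";

  static Map<String, String> convert(
      Map<String, String> properties, Map<String, String> options, boolean external) {
    Map<String, String> result = new LinkedHashMap<>(properties);
    // Keep the bare option keys so existing readers keep working.
    result.putAll(options);
    // Also add every option under the prefixed key to ease the transition.
    options.forEach((key, value) -> result.put(OPTION_PREFIX + key, value));
    if (external) {
      result.put(PROP_EXTERNAL, "true");
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> merged =
        convert(Map.of("owner", "alice"), Map.of("path", "/tmp/t"), true);
    System.out.println(merged);
  }
}
```

   As with `++` on Scala maps, later entries win when keys collide, so an option overrides a property of the same name in this sketch.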
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR edited a comment on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


HeartSaVioR edited a comment on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-717716130


   Isn't it declared "incorrect" behavior in the discussion thread on the dev mailing list? If it's not a bug, what exactly are we fixing on the master branch? I think it's a fairly serious bug, as Spark "ignores" the end user's intention and silently falls back. Suppose the custom session catalog does auditing: Spark then silently skips the audit and continues with the operation.
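
   The difference between the two behaviors can be sketched in Java as follows. Every name here (`Catalog`, `loadWithFallback`, `loadFailFast`) is hypothetical and chosen for illustration; this is not Spark's catalog-loading code, only a contrast between swallowing an instantiation failure and surfacing it.

```java
import java.util.function.Supplier;

// Hypothetical names throughout; this is not Spark's CatalogManager code.
class SessionCatalogLoadingSketch {
  interface Catalog {
    String name();
  }

  // Silent fallback: a failure to build the configured catalog is swallowed,
  // so anything the custom catalog would have done (e.g. auditing) is skipped.
  static Catalog loadWithFallback(Supplier<Catalog> configured, Catalog builtIn) {
    try {
      return configured.get();
    } catch (RuntimeException e) {
      return builtIn;
    }
  }

  // Fail fast: the failure is reported to the user instead of being hidden.
  static Catalog loadFailFast(Supplier<Catalog> configured) {
    try {
      return configured.get();
    } catch (RuntimeException e) {
      throw new IllegalStateException(
          "Cannot instantiate the configured session catalog", e);
    }
  }

  public static void main(String[] args) {
    Catalog builtIn = () -> "builtin";
    Supplier<Catalog> broken = () -> {
      throw new IllegalArgumentException("bad catalog config");
    };
    System.out.println(loadWithFallback(broken, builtIn).name());
    try {
      loadFailFast(broken);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```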






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717714917


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34953/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-717714930


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34955/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-10-27 Thread GitBox


SparkQA commented on pull request #26935:
URL: https://github.com/apache/spark/pull/26935#issuecomment-717716376


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34957/
   






[GitHub] [spark] HeartSaVioR commented on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


HeartSaVioR commented on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-717716130


   Isn't it declared "incorrect" behavior in the discussion thread on the dev mailing list? If it's not a bug, what exactly are we fixing on the master branch? I think it's a fairly serious bug, as Spark "ignores" the end user's intention and silently falls back. Suppose the custom session catalog does auditing: Spark then silently skips the audit.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-717714926


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30147:
URL: https://github.com/apache/spark/pull/30147#issuecomment-717708872










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717714910


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


SparkQA removed a comment on pull request #30147:
URL: https://github.com/apache/spark/pull/30147#issuecomment-717631159


   **[Test build #130343 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130343/testReport)** for PR 30147 at commit [`d50c691`](https://github.com/apache/spark/commit/d50c6911c76ff62286c3bef571d5a488b1f7b5ec).






[GitHub] [spark] cloud-fan commented on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

2020-10-27 Thread GitBox


cloud-fan commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717715392


   @viirya @maropu what do you think about backporting? Shall we backport 
without the new rule or with it?






[GitHub] [spark] cloud-fan closed pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

2020-10-27 Thread GitBox


cloud-fan closed pull request #30093:
URL: https://github.com/apache/spark/pull/30093


   






[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717714903


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34953/
   






[GitHub] [spark] SparkQA commented on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


SparkQA commented on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-717714913


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34955/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717714910










[GitHub] [spark] AmplabJenkins commented on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-717714926










[GitHub] [spark] cloud-fan commented on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

2020-10-27 Thread GitBox


cloud-fan commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717715051


   thanks, merging to master!






[GitHub] [spark] SparkQA commented on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13

2020-10-27 Thread GitBox


SparkQA commented on pull request #30166:
URL: https://github.com/apache/spark/pull/30166#issuecomment-717713671


   **[Test build #130355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130355/testReport)** for PR 30166 at commit [`d0decd8`](https://github.com/apache/spark/commit/d0decd8994f67c0085ef2f71c03d4f04da4fe64c).






[GitHub] [spark] cloud-fan closed pull request #30079: [SPARK-33174][SQL] Migrate DROP TABLE to use UnresolvedTableOrView to resolve the identifier

2020-10-27 Thread GitBox


cloud-fan closed pull request #30079:
URL: https://github.com/apache/spark/pull/30079


   






[GitHub] [spark] cloud-fan commented on pull request #30079: [SPARK-33174][SQL] Migrate DROP TABLE to use UnresolvedTableOrView to resolve the identifier

2020-10-27 Thread GitBox


cloud-fan commented on pull request #30079:
URL: https://github.com/apache/spark/pull/30079#issuecomment-717712906


   thanks, merging to master!






[GitHub] [spark] cloud-fan commented on a change in pull request #30079: [SPARK-33174][SQL] Migrate DROP TABLE to use UnresolvedTableOrView to resolve the identifier

2020-10-27 Thread GitBox


cloud-fan commented on a change in pull request #30079:
URL: https://github.com/apache/spark/pull/30079#discussion_r513193419



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##
@@ -367,9 +367,17 @@ class ResolveSessionCatalog(
         orCreate = c.orCreate)
     }
 
+    case DropTable(
+        r @ ResolvedTable(_, _, _: V1Table), ifExists, purge) if isSessionCatalog(r.catalog) =>
+      DropTableCommand(r.identifier.asTableIdentifier, ifExists, isView = false, purge = purge)
+
     // v1 DROP TABLE supports temp view.
-    case DropTableStatement(TempViewOrV1Table(name), ifExists, purge) =>
-      DropTableCommand(name.asTableIdentifier, ifExists, isView = false, purge = purge)
+    case DropTable(r: ResolvedView, ifExists, purge) =>
+      if (!r.isTemp) {

Review comment:
   Currently, this duplicates the check in `DropTableCommand`. I think that's OK, as we want to move to v2 commands eventually.








[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717711223


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34954/
   






[GitHub] [spark] HyukjinKwon commented on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13

2020-10-27 Thread GitBox


HyukjinKwon commented on pull request #30166:
URL: https://github.com/apache/spark/pull/30166#issuecomment-717711256


   retest this please






[GitHub] [spark] SparkQA commented on pull request #29247: [SPARK-32446][SHS] Add new executor metrics summary REST APIs

2020-10-27 Thread GitBox


SparkQA commented on pull request #29247:
URL: https://github.com/apache/spark/pull/29247#issuecomment-717710315


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34956/
   






[GitHub] [spark] SparkQA commented on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


SparkQA commented on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-717709003


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34955/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #30147:
URL: https://github.com/apache/spark/pull/30147#issuecomment-717708872










[GitHub] [spark] SparkQA commented on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


SparkQA commented on pull request #30147:
URL: https://github.com/apache/spark/pull/30147#issuecomment-717708073


   **[Test build #130343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130343/testReport)** for PR 30147 at commit [`d50c691`](https://github.com/apache/spark/commit/d50c6911c76ff62286c3bef571d5a488b1f7b5ec).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HyukjinKwon edited a comment on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


HyukjinKwon edited a comment on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-717706506


   Hm, is it something we should port back? I think it's hard to call it a bug. Maintenance releases shouldn't include such behaviour changes in general, according to semver. I understand that DSv2 is still experimental, but this doesn't look like something that needs to be ported back.






[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717706418


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34953/
   






[GitHub] [spark] HyukjinKwon commented on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


HyukjinKwon commented on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-717706506


   Hm, is it something we should port back? I think it's hard to call it a bug. Maintenance releases shouldn't include such behaviour changes in general, according to semver, though I understand that DSv2 is still experimental.






[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-27 Thread GitBox


otterc commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r513185051



##
File path: 
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##
@@ -94,6 +95,9 @@
   static final String STOP_ON_FAILURE_KEY = "spark.yarn.shuffle.stopOnFailure";
   private static final boolean DEFAULT_STOP_ON_FAILURE = false;
 
+  // Used by shuffle merge manager to create merged shuffle files.
+  protected static final String APP_BASE_RELATIVE_PATH = "usercache/%s/appcache/%s/";

Review comment:
   Also, I think we don't need a separate `registerApplication` API anymore. It just creates an empty `appsPathInfo` for the appId and adds it to the `appsPathInfo`. This could be done in `registerExecutor`. Even `ExternalShuffleBlockResolver` doesn't have a registerApplication API.
   My next commit will remove this as well.
   cc. @Victsm
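
   A minimal Java sketch of that idea, assuming the per-application entry can be created lazily on the first executor registration. The class name, the map's value type, and the method signatures are hypothetical and not taken from the PR; only the `APP_BASE_RELATIVE_PATH` format string comes from the diff above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names and shapes are assumptions, not the PR's code.
class MergeManagerSketch {
  // Format string quoted from the diff: user first, then application id.
  static final String APP_BASE_RELATIVE_PATH = "usercache/%s/appcache/%s/";

  // Per-application path info, keyed by application id.
  private final Map<String, String> appsPathInfo = new ConcurrentHashMap<>();

  // No separate registerApplication step: the entry is created the first
  // time an executor of the application registers, and reused afterwards.
  void registerExecutor(String appId, String user) {
    appsPathInfo.computeIfAbsent(
        appId, id -> String.format(APP_BASE_RELATIVE_PATH, user, id));
  }

  String relativePath(String appId) {
    return appsPathInfo.get(appId);
  }

  public static void main(String[] args) {
    MergeManagerSketch manager = new MergeManagerSketch();
    manager.registerExecutor("app_1", "alice");
    manager.registerExecutor("app_1", "alice"); // second call is a no-op
    System.out.println(manager.relativePath("app_1"));
  }
}
```

   `computeIfAbsent` keeps the registration idempotent, so repeated executor registrations for the same application do not overwrite the existing entry.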








[GitHub] [spark] SparkQA commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-10-27 Thread GitBox


SparkQA commented on pull request #26935:
URL: https://github.com/apache/spark/pull/26935#issuecomment-717702143


   **[Test build #130354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130354/testReport)** for PR 26935 at commit [`95028e2`](https://github.com/apache/spark/commit/95028e2a70a3a32f6b6c10cef432dbb2eb5bed58).






[GitHub] [spark] HeartSaVioR commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-10-27 Thread GitBox


HeartSaVioR commented on pull request #26935:
URL: https://github.com/apache/spark/pull/26935#issuecomment-717701730


   retest this, please






[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-27 Thread GitBox


otterc commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r513182393



##
File path: 
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##
@@ -279,6 +287,7 @@ public void initializeApplication(ApplicationInitializationContext context) {
     } catch (Exception e) {
       logger.error("Exception when initializing application {}", appId, e);
     }
+    shuffleMergeManager.registerApplication(appId, context.getUser());

Review comment:
   This is a mistake. My next commit will have the fix for this.








[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-27 Thread GitBox


otterc commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r513180854



##
File path: common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java
##
@@ -0,0 +1,462 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.Arrays;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Throwables;
+import com.google.common.collect.ImmutableMap;
+
+import org.apache.commons.io.FileUtils;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.junit.Assert.*;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.MapConfigProvider;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * Tests for {@link RemoteBlockPushResolver}.
+ */
+public class RemoteBlockPushResolverSuite {
+
+  private static final Logger log = LoggerFactory.getLogger(RemoteBlockPushResolverSuite.class);
+  private final String MERGE_DIR_RELATIVE_PATH = "usercache/%s/appcache/%s/";
+  private final String TEST_USER = "testUser";
+  private final String TEST_APP = "testApp";
+  private final String BLOCK_MANAGER_DIR = "blockmgr-193d8401";
+
+  private TransportConf conf;
+  private RemoteBlockPushResolver pushResolver;
+  private String[] localDirs;
+
+  @Before
+  public void before() throws IOException {
+localDirs = new String[]{Paths.get("target/l1").toAbsolutePath().toString(),
+  Paths.get("target/l2").toAbsolutePath().toString()};
+cleanupLocalDirs();
+MapConfigProvider provider = new MapConfigProvider(
+  ImmutableMap.of("spark.shuffle.server.minChunkSizeInMergedShuffleFile", "4"));
+conf = new TransportConf("shuffle", provider);
+pushResolver = new RemoteBlockPushResolver(conf, MERGE_DIR_RELATIVE_PATH);
+registerApplication(TEST_APP, TEST_USER);
+registerExecutor(TEST_APP, prepareBlockManagerLocalDirs(TEST_APP, TEST_USER, localDirs));
+  }
+
+  @After
+  public void after() {
+try {
+  cleanupLocalDirs();
+  removeApplication(TEST_APP);
+} catch (Exception e) {
+  // don't fail if clean up doesn't succeed.
+  log.debug("Error while tearing down", e);
+}
+  }
+
+  private void cleanupLocalDirs() throws IOException {
+for (String local : localDirs) {
+  FileUtils.deleteDirectory(new File(local));
+}
+  }
+
+  @Test(expected = RuntimeException.class)
+  public void testNoIndexFile() {
+try {
+  pushResolver.getMergedBlockMeta(TEST_APP, 0, 0);
+} catch (Throwable t) {
+  assertTrue(t.getMessage().startsWith("Merged shuffle index file"));
+  Throwables.propagate(t);
+}
+  }
+
+  @Test
+  public void testBasicBlockMerge() throws IOException {
+PushBlockStream[] pushBlocks = new PushBlockStream[] {
+  new PushBlockStream(TEST_APP, "shuffle_0_0_0", 0),
+  new PushBlockStream(TEST_APP, "shuffle_0_1_0", 0),
+};
+ByteBuffer[] blocks = new ByteBuffer[]{
+  ByteBuffer.wrap(new byte[4]),
+  ByteBuffer.wrap(new byte[5])
+};
+pushBlockHelper(TEST_APP, pushBlocks, blocks);
+MergedBlockMeta blockMeta = pushResolver.getMergedBlockMeta(TEST_APP, 0, 0);
+validateChunks(TEST_APP, 0, 0, blockMeta, new int[]{4, 5}, new int[][]{{0}, {1}});
+  }
+
+  @Test
+  public void testDividingMergedBlocksIntoChunks() throws IOException {
+PushBlockStream[] pushBlocks = new PushBlockStream[] {
+  new PushBlockStream(TEST_APP, "shuffle_0_0_0", 0),
+  new PushBlockStream(TEST_APP, "shuffle_0_1_0", 0),
+  new PushBlockStream(TEST_APP, "shuffle_0_2_0", 0),
+  new PushBlockStream(TEST_APP, "shuffle_0_3_0", 0),
+};
+
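
The expectation in `testBasicBlockMerge` above (blocks of 4 and 5 bytes, a minimum chunk size of 4, resulting chunks of sizes 4 and 5) is consistent with a rule that closes a chunk as soon as it holds at least the minimum number of bytes. A minimal sketch of that rule follows. It is an assumption inferred from the test, not the resolver's actual code; `ChunkSketch` and `chunkSizes` are hypothetical names, and the handling of trailing bytes is also assumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: groups consecutive block sizes into chunks.
final class ChunkSketch {
  /** Closes the current chunk as soon as it holds at least minChunkSize bytes. */
  static List<Integer> chunkSizes(int[] blockSizes, int minChunkSize) {
    List<Integer> chunks = new ArrayList<>();
    int current = 0;
    for (int size : blockSizes) {
      current += size;
      if (current >= minChunkSize) {
        chunks.add(current);
        current = 0;
      }
    }
    if (current > 0) {
      chunks.add(current); // assumption: leftover bytes form a final, smaller chunk
    }
    return chunks;
  }
}
```

Under this rule, `chunkSizes(new int[]{4, 5}, 4)` yields two chunks, one per block, matching the `{4, 5}` sizes asserted by the test.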

[GitHub] [spark] HeartSaVioR commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


HeartSaVioR commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513180738



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala
##
@@ -89,10 +89,15 @@ case class OffsetSeqMetadata(
 
 object OffsetSeqMetadata extends Logging {
   private implicit val format = Serialization.formats(NoTypeHints)
+  /**
+   * These configs are related to streaming query execution and should not be changed across
+   * batches of a streaming query. The values of these configs are persisted into the offset
+   * log in the checkpoint position.
+   */
   private val relevantSQLConfs = Seq(
 SHUFFLE_PARTITIONS, STATE_STORE_PROVIDER_CLASS, STREAMING_MULTIPLE_WATERMARK_POLICY,
 FLATMAPGROUPSWITHSTATE_STATE_FORMAT_VERSION, STREAMING_AGGREGATION_STATE_FORMAT_VERSION,
-STREAMING_JOIN_STATE_FORMAT_VERSION)
+STREAMING_JOIN_STATE_FORMAT_VERSION, STATE_STORE_COMPRESSION_CODEC)

Review comment:
   I'd also recommend adding the default value ("lz4") to 
`relevantSQLConfDefaultValues` so that we can't make a mistake here. 
(That's a form of defensive programming, though.)
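
A minimal sketch of the defensive default suggested above, assuming a checkpoint written before the config existed simply lacks the key. `CodecDefaultSketch`, `DEFAULTS`, and `resolve` are hypothetical names used only for illustration; this is not how `OffsetSeqMetadata` is written.

```java
import java.util.Map;

// Hypothetical sketch; not the Spark implementation.
final class CodecDefaultSketch {
  static final String CODEC_KEY = "spark.sql.streaming.stateStore.compression.codec";
  // Default recorded for checkpoints that were written before the config existed.
  static final Map<String, String> DEFAULTS = Map.of(CODEC_KEY, "lz4");

  /** Prefers the checkpointed value, then the recorded default, then the session value. */
  static String resolve(Map<String, String> checkpointConf, String sessionValue) {
    String persisted = checkpointConf.get(CODEC_KEY);
    if (persisted != null) {
      return persisted;
    }
    return DEFAULTS.getOrDefault(CODEC_KEY, sessionValue);
  }
}
```

With a recorded default, an old checkpoint keeps being read with "lz4" even if the restarted session asks for a different codec.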








[GitHub] [spark] SparkQA commented on pull request #29247: [SPARK-32446][SHS] Add new executor metrics summary REST APIs

2020-10-27 Thread GitBox


SparkQA commented on pull request #29247:
URL: https://github.com/apache/spark/pull/29247#issuecomment-717694392


   **[Test build #130353 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130353/testReport)** for PR 29247 at commit [`1854e74`](https://github.com/apache/spark/commit/1854e7465d5309a382e118bfe3fe7ada5023ef52).






[GitHub] [spark] SparkQA commented on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


SparkQA commented on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-717694405


   **[Test build #130352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130352/testReport)** for PR 30167 at commit [`f6550d0`](https://github.com/apache/spark/commit/f6550d07f201e4f23cdc40e049aa7f550eabdf2b).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29247: [SPARK-32446][SHS] Add new executor metrics summary REST APIs

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29247:
URL: https://github.com/apache/spark/pull/29247#issuecomment-664032815


   Can one of the admins verify this patch?






[GitHub] [spark] gengliangwang commented on pull request #29247: [SPARK-32446][SHS] Add new executor metrics summary REST APIs

2020-10-27 Thread GitBox


gengliangwang commented on pull request #29247:
URL: https://github.com/apache/spark/pull/29247#issuecomment-717693646










[GitHub] [spark] HeartSaVioR opened a new pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


HeartSaVioR opened a new pull request #30167:
URL: https://github.com/apache/spark/pull/30167


   ### What changes were proposed in this pull request?
   
   This patch proposes changing the behavior to fail fast when Spark fails 
to instantiate the configured v2 session catalog.
   
   ### Why are the changes needed?
   
   The current Spark behavior goes against the intention of end users: if end 
users configure a session catalog that Spark fails to initialize, Spark 
swallows the error, only logging the error message, and silently uses the 
default catalog implementation.
   
   This follows the feedback on the [discussion 
thread](https://lists.apache.org/thread.html/rdfa22a5ebdc4ac66e2c5c8ff0cd9d750e8a1690cd6fb456d119c2400%40%3Cdev.spark.apache.org%3E)
 in the dev mailing list.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. After this PR, Spark will fail immediately if it fails to instantiate 
the configured session catalog.
   
   ### How was this patch tested?
   
   New UT added.
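
The difference between the two behaviors described in this PR can be sketched roughly as below. This is a generic illustration only: `CatalogLoaderSketch`, `loadOrFallback`, and `loadOrFail` are hypothetical names, the catalog is modeled as a plain `Object`, and none of this is taken from the actual patch.

```java
import java.util.function.Supplier;

// Hypothetical sketch; the names are illustrative and are not Spark APIs.
final class CatalogLoaderSketch {
  /** Old behavior: log the failure and silently return the default. */
  static Object loadOrFallback(String className, Supplier<Object> defaultCatalog) {
    try {
      return instantiate(className);
    } catch (ReflectiveOperationException e) {
      System.err.println("Cannot instantiate " + className + ", using default: " + e);
      return defaultCatalog.get();
    }
  }

  /** Proposed behavior: surface the failure immediately. */
  static Object loadOrFail(String className) {
    try {
      return instantiate(className);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate session catalog " + className, e);
    }
  }

  // Loads the class by name and calls its no-argument constructor.
  private static Object instantiate(String className) throws ReflectiveOperationException {
    return Class.forName(className).getDeclaredConstructor().newInstance();
  }
}
```

With the fallback variant, a misconfigured class name goes unnoticed unless someone reads the logs; with the fail-fast variant, the caller sees the exception at once.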






[GitHub] [spark] HeartSaVioR commented on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


HeartSaVioR commented on pull request #30147:
URL: https://github.com/apache/spark/pull/30147#issuecomment-717692702


   I'll submit a PR for the 3.0 branch as well.






[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717692390


   **[Test build #130351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130351/testReport)** for PR 30162 at commit [`86f0924`](https://github.com/apache/spark/commit/86f09243816112112addb8cf18c97118a983f434).






[GitHub] [spark] viirya commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


viirya commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513177486



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala
##
@@ -89,10 +89,15 @@ case class OffsetSeqMetadata(
 
 object OffsetSeqMetadata extends Logging {
   private implicit val format = Serialization.formats(NoTypeHints)
+  /**
+   * These configs are related to streaming query execution and should not be changed across
+   * batches of a streaming query. The values of these configs are persisted into the offset
+   * log in the checkpoint position.
+   */

Review comment:
   Added some comments to notify readers.








[GitHub] [spark] HeartSaVioR commented on pull request #30148: [SPARK-33244][SQL] Unify the code paths for spark.table and spark.read.table

2020-10-27 Thread GitBox


HeartSaVioR commented on pull request #30148:
URL: https://github.com/apache/spark/pull/30148#issuecomment-717690813


   Let's fix it to remove any confusion, then: let's ensure that neither 
`spark.read.table` nor `spark.table` can handle a streaming table (even if it 
comes from a temp view), so end users need to use `spark.readStream.table`.






[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717689524


   @HeartSaVioR Makes sense. Looks like we need to put the config into 
`relevantSQLConfs`.






[GitHub] [spark] HeartSaVioR commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


HeartSaVioR commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717687791


   @dongjoon-hyun @viirya 
   The problem isn't about changing the config during a single run. 
It's about changing the config in a new run that restarts from a 
checkpoint. That's why we should persist the config on the first run, and then 
always read the value from the checkpoint and ignore any further modification.
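
A minimal sketch of the "write once, then always read from the checkpoint" rule described above. The in-memory map stands in for the offset log, and `CheckpointConfSketch` and `startRun` are hypothetical names; this illustrates the idea and is not Spark code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "write once, then always read from the checkpoint".
final class CheckpointConfSketch {
  // Stands in for the config values persisted in the checkpoint's offset log.
  private final Map<String, String> checkpoint = new HashMap<>();

  /** Returns the effective value for a run that starts with the given session value. */
  String startRun(String key, String sessionValue) {
    // First run: nothing is persisted yet, so the session value is recorded.
    // Later runs: the persisted value wins and the new session value is ignored.
    return checkpoint.computeIfAbsent(key, k -> sessionValue);
  }
}
```

For example, a first run started with "lz4" followed by a restart with "zstd" would still use "lz4", so files written by the first run stay readable.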






[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717685222


   > https://github.com/apache/spark/blob/fcf8aa59b5025dde9b4af36953146894659967e2/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L92-L115
   > 
   > Please look into how `relevantSQLConfs` is handled in OffsetSeqMetadata.
   
   Hmm, since the default value is lz4, do we still need to add the new config 
here?






[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717684847


   > @viirya and @HeartSaVioR .
   > Shall we put the new config into `StaticSQLConf.scala` instead of 
`SQLConf.scala`? I think that is enough.
   
   Yes, that is also what I think. Will update later.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30166:
URL: https://github.com/apache/spark/pull/30166#issuecomment-717682965


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130346/
   Test FAILed.






[GitHub] [spark] HeartSaVioR commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


HeartSaVioR commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717683158


   
https://github.com/apache/spark/blob/fcf8aa59b5025dde9b4af36953146894659967e2/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L92-L115
   
   Please look into how `relevantSQLConfs` is handled in OffsetSeqMetadata.






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


dongjoon-hyun commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513168650



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -1324,6 +1324,16 @@ object SQLConf {
 .intConf
 .createWithDefault(2)
 
+  val STATE_STORE_COMPRESSION_CODEC =
+buildConf("spark.sql.streaming.stateStore.compression.codec")
+  .internal()
+  .doc("The codec used to compress delta and snapshot files generated by StateStore. " +
+"By default, Spark provides four codecs: lz4, lzf, snappy, and zstd. You can also " +
+"use fully qualified class names to specify the codec. Default codec is lz4.")
+  .version("3.1.0")
+  .stringConf
+  .createWithDefault("lz4")

Review comment:
   This doesn't change the default value. I believe we can add the 
benchmark result as a follow-up, @HeartSaVioR .
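
The config doc in the hunk above says the value can be one of four short names or a fully qualified class name. A minimal sketch of how such a setting might be resolved is shown below. `CodecNameSketch` and `resolveClassName` are hypothetical, and the class names in the table are assumptions for illustration; they are not taken from this diff.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of resolving a codec setting; not the Spark implementation.
final class CodecNameSketch {
  // Short names come from the config doc; the class names are assumptions.
  private static final Map<String, String> SHORT_NAMES = Map.of(
      "lz4", "org.apache.spark.io.LZ4CompressionCodec",
      "lzf", "org.apache.spark.io.LZFCompressionCodec",
      "snappy", "org.apache.spark.io.SnappyCompressionCodec",
      "zstd", "org.apache.spark.io.ZStdCompressionCodec");

  /** Maps a short name to a class name; any other value is treated as a class name already. */
  static String resolveClassName(String configured) {
    return SHORT_NAMES.getOrDefault(configured.toLowerCase(Locale.ROOT), configured);
  }
}
```

A value such as `com.example.MyCodec` (a made-up name) passes through unchanged, which is what allows user-supplied codecs.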








[GitHub] [spark] AmplabJenkins commented on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #30166:
URL: https://github.com/apache/spark/pull/30166#issuecomment-717682955










[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


dongjoon-hyun commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513168650



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -1324,6 +1324,16 @@ object SQLConf {
 .intConf
 .createWithDefault(2)
 
+  val STATE_STORE_COMPRESSION_CODEC =
+buildConf("spark.sql.streaming.stateStore.compression.codec")
+  .internal()
+  .doc("The codec used to compress delta and snapshot files generated by StateStore. " +
+"By default, Spark provides four codecs: lz4, lzf, snappy, and zstd. You can also " +
+"use fully qualified class names to specify the codec. Default codec is lz4.")
+  .version("3.1.0")
+  .stringConf
+  .createWithDefault("lz4")

Review comment:
   This doesn't change the default value. I believe we can add the 
benchmark result as a follow-up.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30166:
URL: https://github.com/apache/spark/pull/30166#issuecomment-717682955


   Merged build finished. Test FAILed.






[GitHub] [spark] dongjoon-hyun commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


dongjoon-hyun commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717682587


   @viirya and @HeartSaVioR .
   Shall we put the new config into `StaticSQLConf.scala` instead of 
`SQLConf.scala`? I think that is enough.






[GitHub] [spark] SparkQA commented on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13

2020-10-27 Thread GitBox


SparkQA commented on pull request #30166:
URL: https://github.com/apache/spark/pull/30166#issuecomment-717682590


   **[Test build #130346 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130346/testReport)** for PR 30166 at commit [`d0decd8`](https://github.com/apache/spark/commit/d0decd8994f67c0085ef2f71c03d4f04da4fe64c).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13

2020-10-27 Thread GitBox


SparkQA removed a comment on pull request #30166:
URL: https://github.com/apache/spark/pull/30166#issuecomment-717645869


   **[Test build #130346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130346/testReport)** for PR 30166 at commit [`d0decd8`](https://github.com/apache/spark/commit/d0decd8994f67c0085ef2f71c03d4f04da4fe64c).






[GitHub] [spark] HeartSaVioR commented on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


HeartSaVioR commented on pull request #30147:
URL: https://github.com/apache/spark/pull/30147#issuecomment-717681783


   Thanks for reviewing and merging!






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less th

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717681498










[GitHub] [spark] AmplabJenkins commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schem

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717681498










[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size

2020-10-27 Thread GitBox


SparkQA commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717681485


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34951/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schem

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717680723










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less th

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717680723










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-717674157


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-717674160


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34950/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size

2020-10-27 Thread GitBox


SparkQA commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717680708


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34952/
   






[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size

2020-10-27 Thread GitBox


SparkQA commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717675078


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34952/
   






[GitHub] [spark] AmplabJenkins commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-717674157










[GitHub] [spark] SparkQA commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


SparkQA commented on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-717674144


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34950/
   






[GitHub] [spark] cloud-fan closed pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


cloud-fan closed pull request #30147:
URL: https://github.com/apache/spark/pull/30147


   






[GitHub] [spark] cloud-fan commented on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox


cloud-fan commented on pull request #30147:
URL: https://github.com/apache/spark/pull/30147#issuecomment-717673091


   thanks, merging to master!






[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size

2020-10-27 Thread GitBox


SparkQA commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717672685


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34951/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29677: [SPARK-32820][SQL] Remove redundant shuffle exchanges inserted by EnsureRequirements

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29677:
URL: https://github.com/apache/spark/pull/29677#issuecomment-717668190


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34949/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29677: [SPARK-32820][SQL] Remove redundant shuffle exchanges inserted by EnsureRequirements

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29677:
URL: https://github.com/apache/spark/pull/29677#issuecomment-717668186


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #29677: [SPARK-32820][SQL] Remove redundant shuffle exchanges inserted by EnsureRequirements

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #29677:
URL: https://github.com/apache/spark/pull/29677#issuecomment-717668186










[GitHub] [spark] SparkQA commented on pull request #29677: [SPARK-32820][SQL] Remove redundant shuffle exchanges inserted by EnsureRequirements

2020-10-27 Thread GitBox


SparkQA commented on pull request #29677:
URL: https://github.com/apache/spark/pull/29677#issuecomment-717668172


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34949/
   






[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r513153232



##
File path: sql/core/src/test/resources/sql-tests/results/window.sql.out
##
@@ -479,6 +479,38 @@ Anthony Bow	6627	Gerard Bondur
 Leslie Thompson	5186	Gerard Bondur
 
 
+-- !query
+SELECT
+employee_name,
+salary,
+nth_value(employee_name, 2) OVER (
+  ORDER BY salary DESC
+  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) second_highest_salary
+FROM
+basic_pays
+ORDER BY salary DESC
+-- !query schema
+struct
+-- !query output
+Larry Bott	11798	NULL
+Gerard Bondur	11472	Gerard Bondur
+Pamela Castillo	11303	Gerard Bondur

Review comment:
   The output comes from `hiveResultString`
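
   As a hedged illustration of the golden output above (a plain-Python sketch, not Spark code): with the frame `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, `nth_value(employee_name, 2)` has no second row to read at the first position, which would explain the leading `NULL`. The helper name `nth_value_running` is hypothetical, and the sketch ignores ties and null handling.

```python
def nth_value_running(values, n):
    """Emulate nth_value(col, n) over a frame that grows from the
    first row up to the current row (1-based n)."""
    out = []
    for i in range(len(values)):
        frame = values[: i + 1]  # rows from the start up to the current row
        # The frame must hold at least n rows, otherwise the result is NULL.
        out.append(frame[n - 1] if len(frame) >= n else None)
    return out


# Employee names already ordered by salary DESC, as in the test output.
names = ["Larry Bott", "Gerard Bondur", "Pamela Castillo"]
result = nth_value_running(names, 2)
```

   Under these assumptions the first row yields `None` and every later row yields the second name in the ordering.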








[GitHub] [spark] SparkQA commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


SparkQA commented on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-71702


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34950/
   






[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox


viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717666167


   > either 1) make it as a configuration but prevent the value to be changed 
after the query starts (like we do in state store formats)
   
   Btw, out of curiosity, I checked the two state format configs, 
`spark.sql.streaming.join.stateFormatVersion` and 
`spark.sql.streaming.aggregation.stateFormatVersion`. Their docs only 
state that `state format version shouldn't be modified after 
running.`, and I couldn't find any code that prevents the values from being 
changed after the query starts. Maybe I missed it.
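
   For illustration only, one way such a guard could work, sketched in plain Python: the value is recorded when the query first starts, and the recorded value wins on every restart. `resolve_pinned_conf` and the `checkpoint_confs` dict are hypothetical names, and this is not a claim about how Spark handles these configs.

```python
def resolve_pinned_conf(key, session_value, checkpoint_confs):
    """Return the effective value of `key` for a streaming query.

    `checkpoint_confs` stands in for metadata persisted with the query's
    checkpoint (a hypothetical dict for this sketch).
    """
    recorded = checkpoint_confs.get(key)
    if recorded is None:
        # First start: remember the value the query was started with.
        checkpoint_confs[key] = session_value
        return session_value
    # Restart: ignore the session value and keep the recorded one.
    return recorded


checkpoint = {}
key = "spark.sql.streaming.join.stateFormatVersion"
first_run = resolve_pinned_conf(key, "2", checkpoint)
restart = resolve_pinned_conf(key, "1", checkpoint)
```

   A stricter variant could raise an error on mismatch instead of silently keeping the recorded value.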






[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size

2020-10-27 Thread GitBox


SparkQA commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717664437


   **[Test build #130350 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130350/testReport)**
 for PR 30156 at commit 
[`19f`](https://github.com/apache/spark/commit/19f6af09fa9d612592ecb86b442efcd6f738).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30166:
URL: https://github.com/apache/spark/pull/30166#issuecomment-717662046


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34948/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30166:
URL: https://github.com/apache/spark/pull/30166#issuecomment-717662042


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size

2020-10-27 Thread GitBox


SparkQA commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717662099


   **[Test build #130349 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130349/testReport)**
 for PR 30156 at commit 
[`db7d53a`](https://github.com/apache/spark/commit/db7d53a096149c91cb4376ae890bad1eaa2db679).






[GitHub] [spark] SparkQA commented on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13

2020-10-27 Thread GitBox


SparkQA commented on pull request #30166:
URL: https://github.com/apache/spark/pull/30166#issuecomment-717662031


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34948/
   






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less

2020-10-27 Thread GitBox


AngersZhuuuu commented on a change in pull request #30156:
URL: https://github.com/apache/spark/pull/30156#discussion_r513148800



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2748,6 +2748,16 @@ object SQLConf {
   .checkValue(_ > 0, "The timeout value must be positive")
   .createWithDefault(10L)
 
+  val LEGACY_SCRIPT_TRANSFORM_PAD_NULL =
+buildConf("spark.sql.legacy.transformationPadNullWhenValueLessThenSchema")
+  .internal()
+  .doc("Whether pad null value when transformation output value size less 
then schema size." +
+"When true, we pad NULL value to keep same behavior with hive." +
+"When false, we keep original behavior to throw 
`ArrayIndexOutOfBoundsException`")

Review comment:
   > `we` -> `Spark`
   
   Done








[GitHub] [spark] AmplabJenkins commented on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #30166:
URL: https://github.com/apache/spark/pull/30166#issuecomment-717662042










[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less

2020-10-27 Thread GitBox


AngersZhuuuu commented on a change in pull request #30156:
URL: https://github.com/apache/spark/pull/30156#discussion_r513148844



##
File path: docs/sql-migration-guide.md
##
@@ -49,6 +49,8 @@ license: |
   - In Spark 3.1, we remove the built-in Hive 1.2. You need to migrate your 
custom SerDes to Hive 2.3. See 
[HIVE-15167](https://issues.apache.org/jira/browse/HIVE-15167) for more details.
   
   - In Spark 3.1, loading and saving of timestamps from/to parquet files fails 
if the timestamps are before 1900-01-01 00:00:00Z, and loaded (saved) as the 
INT96 type. In Spark 3.0, the actions don't fail but might lead to shifting of 
the input timestamps due to rebasing from/to Julian to/from Proleptic Gregorian 
calendar. To restore the behavior before Spark 3.1, you can set 
`spark.sql.legacy.parquet.int96RebaseModeInRead` or/and 
`spark.sql.legacy.parquet.int96RebaseModeInWrite` to `LEGACY`.
+  
+  - In Spark 3.1, when 
`spark.sql.legacy.transformationPadNullWhenValueLessThenSchema` is true, Spark 
will pad NULL value when scrip transformation's output value size less then 
schema size in default-serde mode. If false, we will keep original behavior to 
throw `ArrayIndexOutOfBoundsException`.

Review comment:
   > `scrip` -> `script`. Could we a bit more elaborate about 
"default-serde mode"?
   > 
   > `we will keep original behavior to throw ...` -> `Spark will keep original 
behavior to throw ...`
   
   Done
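
   A minimal sketch of the two behaviors the migration note describes, in plain Python rather than Spark code. `fit_to_schema` is a hypothetical helper, and Python's `IndexError` stands in for `ArrayIndexOutOfBoundsException`. Rows that already have enough fields are returned unchanged, since the note does not cover that case.

```python
def fit_to_schema(fields, schema_size, pad_null):
    """Align one script-output row with the expected number of columns."""
    if len(fields) >= schema_size:
        # Enough values already; outside the scope of this sketch.
        return list(fields)
    if pad_null:
        # Flag on: fill the missing trailing columns with NULL (None).
        return list(fields) + [None] * (schema_size - len(fields))
    # Flag off: fail, as reading a missing column index would.
    raise IndexError(
        f"row has {len(fields)} values but schema expects {schema_size}"
    )


padded = fit_to_schema(["a", "b"], 3, pad_null=True)
```

   With the flag off, the same call raises instead of returning a padded row.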








[GitHub] [spark] SparkQA commented on pull request #29677: [SPARK-32820][SQL] Remove redundant shuffle exchanges inserted by EnsureRequirements

2020-10-27 Thread GitBox


SparkQA commented on pull request #29677:
URL: https://github.com/apache/spark/pull/29677#issuecomment-717661324


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34949/
   






[GitHub] [spark] SparkQA commented on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13

2020-10-27 Thread GitBox


SparkQA commented on pull request #30166:
URL: https://github.com/apache/spark/pull/30166#issuecomment-717659086


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34948/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717658158










[GitHub] [spark] AmplabJenkins commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717658158










[GitHub] [spark] SparkQA removed a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


SparkQA removed a comment on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717606710


   **[Test build #130342 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130342/testReport)**
 for PR 30139 at commit 
[`53b3c0b`](https://github.com/apache/spark/commit/53b3c0bb90d4b22e4994ea44a659026c4ba0c93c).






[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


SparkQA commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717657234


   **[Test build #130342 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130342/testReport)**
 for PR 30139 at commit 
[`53b3c0b`](https://github.com/apache/spark/commit/53b3c0bb90d4b22e4994ea44a659026c4ba0c93c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #26935:
URL: https://github.com/apache/spark/pull/26935#issuecomment-717652952


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34947/
   Test FAILed.






[GitHub] [spark] maropu commented on pull request #30165: [SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents

2020-10-27 Thread GitBox


maropu commented on pull request #30165:
URL: https://github.com/apache/spark/pull/30165#issuecomment-717653139


   Thanks! Merged to master/3.0.






[GitHub] [spark] AmplabJenkins commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #26935:
URL: https://github.com/apache/spark/pull/26935#issuecomment-717652945










[GitHub] [spark] SparkQA commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


SparkQA commented on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-717652916


   **[Test build #130348 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130348/testReport)**
 for PR 29800 at commit 
[`1c0e82b`](https://github.com/apache/spark/commit/1c0e82ba0ab95af7871b4b16f8be1e0662d49c78).






[GitHub] [spark] maropu closed pull request #30165: [SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents

2020-10-27 Thread GitBox


maropu closed pull request #30165:
URL: https://github.com/apache/spark/pull/30165


   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #26935:
URL: https://github.com/apache/spark/pull/26935#issuecomment-717652945


   Merged build finished. Test FAILed.





