date:20201028

[GitHub] [spark] AmplabJenkins commented on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



AmplabJenkins commented on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718370669







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



AmplabJenkins removed a comment on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718370669








[GitHub] [spark] SparkQA removed a comment on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



SparkQA removed a comment on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718295002


   **[Test build #130389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130389/testReport)** for PR 30177 at commit [`ac42502`](https://github.com/apache/spark/commit/ac42502576d428325126a38aac8c179dcbc26f88).




[GitHub] [spark] SparkQA commented on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



SparkQA commented on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718369946


   **[Test build #130389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130389/testReport)** for PR 30177 at commit [`ac42502`](https://github.com/apache/spark/commit/ac42502576d428325126a38aac8c179dcbc26f88).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class ContextAwareIterator[IN](iter: Iterator[IN], context: TaskContext) extends Iterator[IN]`
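
As a rough illustration of the idea in the PR title (stop consuming after the task ends), the sketch below wraps an iterator and reports exhaustion once the owning task has finished. It is written in Java and does not use Spark: `TaskState`, `ContextAwareIteratorSketch`, and `demo` are hypothetical names introduced here, and the completed/interrupted checks are an assumption about what "task ends" means, not a copy of the class added by the patch.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the task state a task context exposes; not Spark's API.
interface TaskState {
    boolean isCompleted();
    boolean isInterrupted();
}

// Wraps an iterator and reports exhaustion once the owning task has ended.
final class ContextAwareIteratorSketch<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final TaskState task;

    ContextAwareIteratorSketch(Iterator<T> delegate, TaskState task) {
        this.delegate = delegate;
        this.task = task;
    }

    @Override
    public boolean hasNext() {
        // Check the task state first so the delegate is not touched after the task ends.
        return !task.isCompleted() && !task.isInterrupted() && delegate.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException("task ended or input exhausted");
        }
        return delegate.next();
    }

    // Consumes a four-element input, simulating task completion after two
    // elements, and returns how many elements were actually consumed.
    static int demo() {
        AtomicBoolean done = new AtomicBoolean(false);
        TaskState task = new TaskState() {
            @Override public boolean isCompleted() { return done.get(); }
            @Override public boolean isInterrupted() { return false; }
        };
        Iterator<Integer> it =
            new ContextAwareIteratorSketch<>(Arrays.asList(1, 2, 3, 4).iterator(), task);
        int consumed = 0;
        while (it.hasNext()) {
            it.next();
            consumed++;
            if (consumed == 2) {
                done.set(true); // simulate the task finishing mid-stream
            }
        }
        return consumed;
    }
}
```

In this sketch the consumer loop needs no special handling: once the flag flips, `hasNext()` returns false and the loop exits without reading the remaining input.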




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29247: [SPARK-32446][SHS] Add new executor metrics summary REST APIs

2020-10-28 Thread GitBox



AmplabJenkins removed a comment on pull request #29247:
URL: https://github.com/apache/spark/pull/29247#issuecomment-718367667








[GitHub] [spark] AmplabJenkins commented on pull request #29247: [SPARK-32446][SHS] Add new executor metrics summary REST APIs

2020-10-28 Thread GitBox



AmplabJenkins commented on pull request #29247:
URL: https://github.com/apache/spark/pull/29247#issuecomment-718367667








[GitHub] [spark] SparkQA commented on pull request #29247: [SPARK-32446][SHS] Add new executor metrics summary REST APIs

2020-10-28 Thread GitBox



SparkQA commented on pull request #29247:
URL: https://github.com/apache/spark/pull/29247#issuecomment-718367645


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34997/
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



dongjoon-hyun commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513980401



##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCompatibilitySuite.scala
##
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.io.CompressionCodec
+import org.apache.spark.sql.catalyst.plans.PlanTestBase
+import org.apache.spark.sql.catalyst.streaming.InternalOutputModes.Update
+import org.apache.spark.sql.execution.streaming.MemoryStream
+import org.apache.spark.sql.functions.count
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.streaming.StreamTest
+import org.apache.spark.util.Utils
+
+class StateStoreCompatibilitySuite extends StreamTest with StateStoreCodecsTest {
+   testWithAllCodec(

Review comment:
   Thanks, @HeartSaVioR .






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



AmplabJenkins removed a comment on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718363821








[GitHub] [spark] AmplabJenkins commented on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



AmplabJenkins commented on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718363821








[GitHub] [spark] SparkQA removed a comment on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



SparkQA removed a comment on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718283783


   **[Test build #130388 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130388/testReport)** for PR 30177 at commit [`b34805c`](https://github.com/apache/spark/commit/b34805c9e32fc43011d789e5209ca8f83e646502).




[GitHub] [spark] SparkQA commented on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



SparkQA commented on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718363117


   **[Test build #130388 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130388/testReport)** for PR 30177 at commit [`b34805c`](https://github.com/apache/spark/commit/b34805c9e32fc43011d789e5209ca8f83e646502).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

2020-10-28 Thread GitBox



AmplabJenkins removed a comment on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718362027








[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

2020-10-28 Thread GitBox



SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718362021


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34996/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

2020-10-28 Thread GitBox



AmplabJenkins commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718362027








[GitHub] [spark] SparkQA commented on pull request #29247: [SPARK-32446][SHS] Add new executor metrics summary REST APIs

2020-10-28 Thread GitBox



SparkQA commented on pull request #29247:
URL: https://github.com/apache/spark/pull/29247#issuecomment-718361145


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34997/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718359146








[GitHub] [spark] SparkQA removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



SparkQA removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718281337


   **[Test build #130387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130387/testReport)** for PR 30162 at commit [`c828811`](https://github.com/apache/spark/commit/c8288111064b98107346e93c7686bbd48d5138a1).




[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718359146








[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718358492


   **[Test build #130387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130387/testReport)** for PR 30162 at commit [`c828811`](https://github.com/apache/spark/commit/c8288111064b98107346e93c7686bbd48d5138a1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

2020-10-28 Thread GitBox



SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718356033


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34996/
   




[GitHub] [spark] SparkQA commented on pull request #29247: [SPARK-32446][SHS] Add new executor metrics summary REST APIs

2020-10-28 Thread GitBox



SparkQA commented on pull request #29247:
URL: https://github.com/apache/spark/pull/29247#issuecomment-718348533


   **[Test build #130394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130394/testReport)** for PR 29247 at commit [`1854e74`](https://github.com/apache/spark/commit/1854e7465d5309a382e118bfe3fe7ada5023ef52).




[GitHub] [spark] gengliangwang commented on pull request #29247: [SPARK-32446][SHS] Add new executor metrics summary REST APIs

2020-10-28 Thread GitBox



gengliangwang commented on pull request #29247:
URL: https://github.com/apache/spark/pull/29247#issuecomment-718347748


   retest this please.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718345319








[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718345319








[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718345308


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34995/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718344106








[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718344106








[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718343569


   **[Test build #130386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130386/testReport)** for PR 30162 at commit [`ab69ccd`](https://github.com/apache/spark/commit/ab69ccd424d350fc1dbc1d9e7892da5cb2854094).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



SparkQA removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718262767


   **[Test build #130386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130386/testReport)** for PR 30162 at commit [`ab69ccd`](https://github.com/apache/spark/commit/ab69ccd424d350fc1dbc1d9e7892da5cb2854094).




[GitHub] [spark] SparkQA commented on pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

2020-10-28 Thread GitBox



SparkQA commented on pull request #30178:
URL: https://github.com/apache/spark/pull/30178#issuecomment-718342791


   **[Test build #130393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130393/testReport)** for PR 30178 at commit [`181186c`](https://github.com/apache/spark/commit/181186c6cce9c3b4e3061dc84b667ee898dd3f40).




[GitHub] [spark] cloud-fan commented on a change in pull request #24246: [SPARK-24252][SQL] Add TableCatalog API

2020-10-28 Thread GitBox



cloud-fan commented on a change in pull request #24246:
URL: https://github.com/apache/spark/pull/24246#discussion_r513932838



##
File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableCatalog.java
##
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2;
+
+import org.apache.spark.sql.catalog.v2.expressions.Transform;
+import org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException;
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
+import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
+import org.apache.spark.sql.sources.v2.Table;
+import org.apache.spark.sql.types.StructType;
+
+import java.util.Map;
+
+/**
+ * Catalog methods for working with Tables.
+ */
+public interface TableCatalog extends CatalogPlugin {
+  /**
+   * List the tables in a namespace from the catalog.
+   * 
+   * If the catalog supports views, this must return identifiers for only tables and not views.
+   *
+   * @param namespace a multi-part namespace
+   * @return an array of Identifiers for tables
+   * @throws NoSuchNamespaceException If the namespace does not exist (optional).
+   */
+  Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException;
+
+  /**
+   * Load table metadata by {@link Identifier identifier} from the catalog.
+   * 
+   * If the catalog supports views and contains a view for the identifier and not a table, this
+   * must throw {@link NoSuchTableException}.
+   *
+   * @param ident a table identifier
+   * @return the table's metadata
+   * @throws NoSuchTableException If the table doesn't exist or is a view
+   */
+  Table loadTable(Identifier ident) throws NoSuchTableException;
+
+  /**
+   * Invalidate cached table metadata for an {@link Identifier identifier}.
+   * 
+   * If the table is already loaded or cached, drop cached data. If the table does not exist or is
+   * not cached, do nothing. Calling this method should not query remote services.
+   *
+   * @param ident a table identifier
+   */
+  default void invalidateTable(Identifier ident) {

Review comment:
   Yes, we should add this kind of missing action to the v2 commands as well. I think DROP TABLE has the same issue.
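
The `invalidateTable` contract quoted in the diff (drop cached data if present, do nothing otherwise, and never query remote services) can be sketched as follows. This is a minimal, Spark-free illustration: `CachingTableLoaderSketch`, its string identifiers, and the `remoteCalls` counter are hypothetical names introduced here, not part of the `TableCatalog` API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical caching wrapper; identifiers and metadata are plain strings for brevity.
final class CachingTableLoaderSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> remoteLoader;

    // Counts how often the (simulated) remote service was queried.
    int remoteCalls = 0;

    CachingTableLoaderSketch(Function<String, String> remoteLoader) {
        this.remoteLoader = remoteLoader;
    }

    // Returns cached metadata when present; otherwise queries the remote loader once.
    String loadTable(String ident) {
        return cache.computeIfAbsent(ident, key -> {
            remoteCalls++;
            return remoteLoader.apply(key);
        });
    }

    // Drops any cached entry. A missing entry is a no-op, and the remote
    // loader is never called from here.
    void invalidateTable(String ident) {
        cache.remove(ident);
    }
}
```

A caller that changes a table outside the catalog would call `invalidateTable` afterwards so the next `loadTable` fetches fresh metadata; a command that forgets this step keeps serving the stale cached entry.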






[GitHub] [spark] beliefer opened a new pull request #30178: [SPARK-33278][SQL] Improve the performance for FIRST_VALUE

2020-10-28 Thread GitBox



beliefer opened a new pull request #30178:
URL: https://github.com/apache/spark/pull/30178


   ### What changes were proposed in this pull request?
   https://github.com/apache/spark/pull/29800 provides a performance improvement for `NTH_VALUE`.
   `FIRST_VALUE` could also use the `UnboundedOffsetWindowFunctionFrame` and `UnboundedPrecedingOffsetWindowFunctionFrame`.
   
   
   ### Why are the changes needed?
   Improve the performance for `FIRST_VALUE`.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   
   
   ### How was this patch tested?
   Jenkins test.
   




[GitHub] [spark] HeartSaVioR commented on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-28 Thread GitBox



HeartSaVioR commented on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-718339453


   The current behavior was already a surprise; otherwise I wouldn't insist strongly and try to fix it.
   
   Btw, I'd probably just revive the other thread, "disallow setting custom v2 session catalog until all statements are going through v2 session catalog", instead of this, as we are finding more and more issues from there.




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718337953


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34995/
   




[GitHub] [spark] HyukjinKwon edited a comment on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-28 Thread GitBox



HyukjinKwon edited a comment on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-718336795


   _In a way_, a user _could_ think it's all internal as long as the tables are 
created somewhere and they can be read regardless of where the tables are 
created. And, a user _could_ think "there's no functional difference".
   
   I agree ^ this isn't right, for all the reasons we discussed on the mailing 
list and here, but I would prefer to avoid such a potential surprise in a 
maintenance release.




[GitHub] [spark] HyukjinKwon commented on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-28 Thread GitBox



HyukjinKwon commented on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-718336795


   _In a way_, someone could think it's all internal as long as the tables are 
created somewhere and they can be read regardless of where the tables are 
created. And, someone _could_ think "there's no functional difference". 




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30176: [SQL][MINOR] Update from_unixtime doc

2020-10-28 Thread GitBox



AmplabJenkins removed a comment on pull request #30176:
URL: https://github.com/apache/spark/pull/30176#issuecomment-718335189








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less th

2020-10-28 Thread GitBox



AmplabJenkins removed a comment on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-718335063








[GitHub] [spark] AmplabJenkins commented on pull request #30176: [SQL][MINOR] Update from_unixtime doc

2020-10-28 Thread GitBox



AmplabJenkins commented on pull request #30176:
URL: https://github.com/apache/spark/pull/30176#issuecomment-718335189








[GitHub] [spark] AmplabJenkins commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schem

2020-10-28 Thread GitBox



AmplabJenkins commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-718335063








[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size

2020-10-28 Thread GitBox



SparkQA commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-718335052


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34994/
   




[GitHub] [spark] SparkQA removed a comment on pull request #30176: [SQL][MINOR] Update from_unixtime doc

2020-10-28 Thread GitBox



SparkQA removed a comment on pull request #30176:
URL: https://github.com/apache/spark/pull/30176#issuecomment-718252249


   **[Test build #130384 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130384/testReport)**
 for PR 30176 at commit 
[`25efb4f`](https://github.com/apache/spark/commit/25efb4f80eb33447dd3dc9f93f4b0258ef0912dc).




[GitHub] [spark] SparkQA commented on pull request #30176: [SQL][MINOR] Update from_unixtime doc

2020-10-28 Thread GitBox



SparkQA commented on pull request #30176:
URL: https://github.com/apache/spark/pull/30176#issuecomment-718334511


   **[Test build #130384 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130384/testReport)**
 for PR 30176 at commit 
[`25efb4f`](https://github.com/apache/spark/commit/25efb4f80eb33447dd3dc9f93f4b0258ef0912dc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] HeartSaVioR commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-10-28 Thread GitBox



HeartSaVioR commented on pull request #26935:
URL: https://github.com/apache/spark/pull/26935#issuecomment-718334009


   @viirya Would you like another pair of eyes on the review, or would a review 
from @gaborgsomogyi convince you?




[GitHub] [spark] HeartSaVioR edited a comment on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-28 Thread GitBox



HeartSaVioR edited a comment on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-718330774








[GitHub] [spark] HeartSaVioR edited a comment on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-28 Thread GitBox



HeartSaVioR edited a comment on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-718331516


   Additionally, if this pattern is normal in the Spark codebase, I think we 
should revisit it: if users configure something (A) and Spark decides to fall 
back to something else (B), that must be allowed only when there is no 
functional difference between A and B (e.g. the whole-stage codegen fallback 
might be OK, as it should ideally differ only in performance). Otherwise Spark 
is silently breaking the user's intention.




[GitHub] [spark] HeartSaVioR edited a comment on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-28 Thread GitBox



HeartSaVioR edited a comment on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-718331516


   If this pattern is normal in the Spark codebase, I think we should revisit 
it: if users configure something (A) and Spark decides to fall back to 
something else (B), that must be allowed only when there is no behavioral 
difference between A and B (e.g. the whole-stage codegen fallback might be OK, 
as it should ideally differ only in performance). Otherwise Spark is silently 
breaking the user's intention.
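   
   To make the difference concrete, here is a minimal plain-Scala sketch of the two policies. It is only an illustration, not Spark's catalog-loading code: `Catalog`, `DefaultCatalog`, `loadWithFallback` and `loadFailFast` are all hypothetical names.

```scala
import scala.util.control.NonFatal

object CatalogLoadingSketch {
  // Hypothetical stand-in for a session catalog.
  trait Catalog { def name: String }
  object DefaultCatalog extends Catalog { val name = "default" }

  // Silent fallback: the caller cannot tell that the configured catalog
  // was never instantiated and that a different catalog is now in use.
  def loadWithFallback(instantiate: () => Catalog): Catalog =
    try instantiate()
    catch { case NonFatal(_) => DefaultCatalog }

  // Fail fast: the misconfiguration surfaces immediately with its cause.
  def loadFailFast(instantiate: () => Catalog): Catalog =
    try instantiate()
    catch {
      case NonFatal(e) =>
        throw new IllegalStateException("Cannot instantiate the configured catalog", e)
    }
}
```

   With the first policy a broken configuration still returns a catalog, just not the one the user asked for. With the second policy the same configuration fails at the point of use.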




[GitHub] [spark] HeartSaVioR commented on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-28 Thread GitBox



HeartSaVioR commented on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-718331516


   If this pattern is normal in the Spark codebase, I think we should revisit 
it: if users configure something (A) and Spark decides to fall back to 
something else (B), that must be allowed only when there is no behavioral 
difference between A and B. Otherwise Spark is silently breaking the user's 
intention.




[GitHub] [spark] HeartSaVioR commented on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog

2020-10-28 Thread GitBox



HeartSaVioR commented on pull request #30167:
URL: https://github.com/apache/spark/pull/30167#issuecomment-718330774


   > For example, for any reason, in a cluster the catalog was set and users 
used to use it with no problem. Users don't want to change their existing 
workload and code but switch the catalog they are using. So they just reset the 
configuration from their cluster. After the fix, this case wouldn't work
   
   "The catalog was set and users used to use it with no problem" - we are 
talking about the case where the catalog was set but wasn't effectively 
working, as the default catalog (the fallback) isn't the same as the one they 
provided, right? I'd call this "silently broken", which is "worse" than "not 
functioning". That is not a kind of "no problem". If you don't agree with that 
statement, let's stop repeating our own opinions and hear from a broader group.




[GitHub] [spark] waitinfuture commented on pull request #30168: [SPARK-33208][SQL] Update the document of SparkSession#sql

2020-10-28 Thread GitBox



waitinfuture commented on pull request #30168:
URL: https://github.com/apache/spark/pull/30168#issuecomment-718329039


   > @waitinfuture can you leave a comment in the JIRA so that I can assign the 
ticket to you?
   
   done




[GitHub] [spark] HeartSaVioR commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



HeartSaVioR commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513903355



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCompatibilitySuite.scala
##
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.io.CompressionCodec
+import org.apache.spark.sql.catalyst.plans.PlanTestBase
+import org.apache.spark.sql.catalyst.streaming.InternalOutputModes.Update
+import org.apache.spark.sql.execution.streaming.MemoryStream
+import org.apache.spark.sql.functions.count
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.streaming.StreamTest
+import org.apache.spark.util.Utils
+
+class StateStoreCompatibilitySuite extends StreamTest with StateStoreCodecsTest {
+   testWithAllCodec(

Review comment:
   I didn't know we apply the all-codec test here as well (I guess this case is 
different from the existing one, as the default value of the configuration 
isn't changed).
   
   Given that, I feel it becomes a bit ambiguous where the best place to put 
StateStoreCodecsTest is. Any of these options would be fine: 1) here, 2) 
StateStoreSuite, 3) even a new file.
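   
   For readers unfamiliar with the pattern, a helper like `testWithAllCodec` can be sketched roughly as below. This is a plain-Scala illustration under assumptions, not the trait from this PR: the codec list is hard-coded here (the real one would come from Spark's `CompressionCodec`), and the helper runs the body directly instead of registering ScalaTest cases.

```scala
object CodecTestSketch {
  // Assumed short names, hard-coded for the sketch only.
  val codecShortNames: Seq[String] = Seq("lz4", "lzf", "snappy", "zstd")

  // Hypothetical helper: runs `body` once per codec and returns the
  // per-codec test names that a real suite would register.
  def testWithAllCodec(name: String)(body: String => Unit): Seq[String] =
    codecShortNames.map { codec =>
      body(codec)
      s"$name - with codec $codec"
    }
}
```

   Because the helper only depends on the codec list, it can live in any of the three places listed above without changing how the suites call it.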






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718327943


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34993/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size

2020-10-28 Thread GitBox



SparkQA commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-718328603


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34994/
   




[GitHub] [spark] HeartSaVioR commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



HeartSaVioR commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513903355



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCompatibilitySuite.scala
##
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.io.CompressionCodec
+import org.apache.spark.sql.catalyst.plans.PlanTestBase
+import org.apache.spark.sql.catalyst.streaming.InternalOutputModes.Update
+import org.apache.spark.sql.execution.streaming.MemoryStream
+import org.apache.spark.sql.functions.count
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.streaming.StreamTest
+import org.apache.spark.util.Utils
+
+class StateStoreCompatibilitySuite extends StreamTest with StateStoreCodecsTest {
+   testWithAllCodec(

Review comment:
   I didn't know we apply the all-codec test here as well. If so, I feel it 
becomes a bit ambiguous where the best place to put StateStoreCodecsTest is. 
Any of these options would be fine: 1) here, 2) StateStoreSuite, 3) even a new 
file.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718327938


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718327928


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34993/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718327938








[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718326137


   **[Test build #130392 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130392/testReport)**
 for PR 30162 at commit 
[`4dc153d`](https://github.com/apache/spark/commit/4dc153db5eeea904e0c2208726cbd41bd514b62f).




[GitHub] [spark] HyukjinKwon commented on a change in pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



HyukjinKwon commented on a change in pull request #30177:
URL: https://github.com/apache/spark/pull/30177#discussion_r513898999



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvalPythonExec.scala
##
@@ -137,3 +138,18 @@ trait EvalPythonExec extends UnaryExecNode {
 }
   }
 }
+
+/**
+ * A TaskContext aware iterator.
+ *
+ * As the Python evaluation consumes the parent iterator in a separate thread,
+ * it could consume more data from the parent even after the task ends and the parent is closed.
+ * Thus, we should use ContextAwareIterator to stop consuming after the task ends.
+ */
+class ContextAwareIterator[IN](iter: Iterator[IN], context: TaskContext) extends Iterator[IN] {
+
+  override def hasNext: Boolean =
+    !context.isCompleted() && !context.isInterrupted() && iter.hasNext

Review comment:
   BTW, one thing I would like to note is that this is not a clean fix.
   
   This is rather a band-aid fix, because the consumption of the iterator is 
asynchronous to the main task thread. So the close can happen at any point in 
the upstream, e.g. in the middle of `hasNext`, and it can still cause the same 
issue.
   
   To fix this completely, IMHO, we would have to synchronize completely. But 
then there's no point in having a separate thread to process Python UDFs.
   
   I think the cause is basically similar to that of the `input_file_name` 
issue, which is due to the lack of synchronization between this thread and the 
main thread (see 
https://github.com/apache/spark/pull/24958#issuecomment-511364075).
   
   If there's a better option, that'd be great, but I think this fix is good 
enough (given that I see a similar approach in `ContinuousQueuedDataReader`).
   
   Let me know if I missed something here.
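   
   A self-contained sketch of the guard, for illustration only. `TaskState` and `FlagTaskState` are hypothetical stand-ins for Spark's `TaskContext`, and the `next()` delegation is an assumption, since the quoted diff only shows `hasNext`.

```scala
import java.util.concurrent.atomic.AtomicBoolean

object ContextAwareSketch {
  // Hypothetical stand-in for the two TaskContext checks used by the guard.
  trait TaskState {
    def isCompleted(): Boolean
    def isInterrupted(): Boolean
  }

  // Simple flag-backed state so the sketch can simulate the task ending.
  final class FlagTaskState extends TaskState {
    private val completed = new AtomicBoolean(false)
    def markCompleted(): Unit = completed.set(true)
    def isCompleted(): Boolean = completed.get()
    def isInterrupted(): Boolean = false
  }

  // Stops reporting elements once the task has completed or been interrupted.
  class ContextAwareIterator[A](iter: Iterator[A], context: TaskState) extends Iterator[A] {
    override def hasNext: Boolean =
      !context.isCompleted() && !context.isInterrupted() && iter.hasNext
    override def next(): A = iter.next() // assumed plain delegation
  }

  // Consumes five elements, marking the task completed after `n` of them.
  def consumeUntilCompleted(n: Int): List[Int] = {
    val state = new FlagTaskState
    val it = new ContextAwareIterator(Iterator(1, 2, 3, 4, 5), state)
    val out = List.newBuilder[Int]
    var taken = 0
    while (it.hasNext) {
      out += it.next()
      taken += 1
      if (taken == n) state.markCompleted()
    }
    out.result()
  }
}
```

   The sketch is single-threaded, so it does not reproduce the race: in the real case the task can complete between the context checks and `iter.hasNext`, which is why the guard narrows the window rather than closing it.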






[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



dongjoon-hyun edited a comment on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718324895


   Thanks. Could you update the affected version of SPARK-33277 accordingly?




[GitHub] [spark] viirya commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



viirya commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513897382



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCompatibilitySuite.scala
##
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.io.CompressionCodec
+import org.apache.spark.sql.catalyst.plans.PlanTestBase
+import org.apache.spark.sql.catalyst.streaming.InternalOutputModes.Update
+import org.apache.spark.sql.execution.streaming.MemoryStream
+import org.apache.spark.sql.functions.count
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.streaming.StreamTest
+import org.apache.spark.util.Utils
+
+class StateStoreCompatibleSuite extends StreamTest with StateStoreCodecsTest {

Review comment:
   Oops!






[GitHub] [spark] dongjoon-hyun commented on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



dongjoon-hyun commented on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718324895


   Thanks. Could you update SPARK-33277 accordingly?




[GitHub] [spark] HyukjinKwon commented on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



HyukjinKwon commented on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718322801


   I believe this is a long-standing bug, even in the 2.4 releases.




[GitHub] [spark] dongjoon-hyun commented on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



dongjoon-hyun commented on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718321831


   Is this only for Apache Spark 3.1 and 3.0, @ueshin and @HyukjinKwon?




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718321187


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34993/
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



dongjoon-hyun commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513885144



##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCompatibilitySuite.scala
##
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.io.CompressionCodec
+import org.apache.spark.sql.catalyst.plans.PlanTestBase
+import org.apache.spark.sql.catalyst.streaming.InternalOutputModes.Update
+import org.apache.spark.sql.execution.streaming.MemoryStream
+import org.apache.spark.sql.functions.count
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.streaming.StreamTest
+import org.apache.spark.util.Utils
+
+class StateStoreCompatibleSuite extends StreamTest with StateStoreCodecsTest {

Review comment:
   `StateStoreCompatibleSuite` -> `StateStoreCompatibilitySuite`.






[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size

2020-10-28 Thread GitBox



SparkQA commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-718314992


   **[Test build #130391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130391/testReport)** for PR 30156 at commit [`3148608`](https://github.com/apache/spark/commit/314860863c592563d91fc98cc0a2a4ec06a5e837).




[GitHub] [spark] AngersZhuuuu commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema

2020-10-28 Thread GitBox



AngersZhuuuu commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-718314367


   > Ah, sorry can you update the conflict in migration guide? @AngersZh
   
   Done




[GitHub] [spark] viirya edited a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



viirya edited a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718305957


   > I'm sorry to try back and forth, but I feel it weird that 
StateStoreCodecsTest belongs to StateStoreCompatibilitySuite.scala file. 
Probably we can just inline StateStoreCodecsTest to StateStoreSuite.
   
   Inlining `StateStoreCodecsTest` into `StateStoreSuite` means `StateStoreCompatibleSuite` would need to extend `StateStoreSuite`. That would run the tests in `StateStoreSuite` a second time, and `StateStoreCompatibleSuite` would also need to provide implementations for `newStoreProvider`, `newStoreProvider` and `getLatestData`, which are not used at all.
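
   The trade-off in this reply can be sketched without any test framework. This is a minimal sketch under stated assumptions: `TestRegistry`, `BaseSuite`, `CodecsTest`, the test names, and the codec list are all hypothetical stand-ins (for the test framework, `StateStoreSuite`, and `StateStoreCodecsTest` respectively), not Spark's actual definitions.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for a test framework's registration mechanism.
trait TestRegistry {
  val registered: ListBuffer[String] = ListBuffer.empty
  def test(name: String): Unit = registered += name
}

// Hypothetical stand-in for StateStoreSuite: it registers its own tests
// and forces subclasses to implement provider-related members.
abstract class BaseSuite extends TestRegistry {
  def newStoreProvider(): String
  def getLatestData(provider: String): Set[Int]

  test("get, put, remove")
  test("snapshotting")
}

// Hypothetical stand-in for StateStoreCodecsTest: only a shared helper.
trait CodecsTest extends TestRegistry {
  val codecs: Seq[String] = Seq("lz4", "snappy", "zstd") // illustrative list
  def testWithAllCodecs(name: String): Unit =
    codecs.foreach(c => test(s"$name - with codec $c"))
}

// Option A: extend the base suite to reach the helper. The subclass
// inherits the base tests and must implement members it never uses.
class ExtendingSuite extends BaseSuite with CodecsTest {
  def newStoreProvider(): String = "unused"
  def getLatestData(provider: String): Set[Int] = Set.empty
  testWithAllCodecs("common functions")
}

// Option B: mix in only the trait. No inherited tests, no unused members.
class MixinSuite extends CodecsTest {
  testWithAllCodecs("common functions")
}
```

   In this sketch `ExtendingSuite` ends up with five registered tests (two inherited plus three codec variants), while `MixinSuite` registers only the three it declares, which is the duplication the reply is concerned about.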
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



AmplabJenkins removed a comment on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718312288








[GitHub] [spark] AmplabJenkins commented on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



AmplabJenkins commented on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718312288








[GitHub] [spark] SparkQA commented on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



SparkQA commented on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718312272


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34992/
   




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718308057


   **[Test build #130390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130390/testReport)** for PR 30162 at commit [`8e435a1`](https://github.com/apache/spark/commit/8e435a1a7266b13e9cfe0072c5a808f10c41ca6b).




[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718305957


   > I'm sorry to try back and forth, but I feel it weird that 
StateStoreCodecsTest belongs to StateStoreCompatibilitySuite.scala file. 
Probably we can just inline StateStoreCodecsTest to StateStoreSuite.
   
   Inlining `StateStoreCodecsTest` into `StateStoreSuite` means `StateStoreCompatibleSuite` would need to extend `StateStoreSuite`. That would run the tests in `StateStoreSuite` a second time, and `StateStoreCompatibleSuite` would also need to provide implementations for `newStoreProvider`, `newStoreProvider` and `getLatestData`.
   




[GitHub] [spark] HyukjinKwon commented on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



HyukjinKwon commented on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718305873


   cc @BryanCutler for another look if you find some time.




[GitHub] [spark] SparkQA commented on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



SparkQA commented on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718305551


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34992/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718305004


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130385/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718304998


   Merged build finished. Test FAILed.




[GitHub] [spark] HyukjinKwon commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema

2020-10-28 Thread GitBox



HyukjinKwon commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-718305127


   Ah, sorry, can you update the conflict in the migration guide? @AngersZhuuuu




[GitHub] [spark] SparkQA removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



SparkQA removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718260070


   **[Test build #130385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130385/testReport)** for PR 30162 at commit [`f16c563`](https://github.com/apache/spark/commit/f16c563cddc24d9df596a7a2b690457b514c8464).




[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718304998








[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718304768


   **[Test build #130385 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130385/testReport)** for PR 30162 at commit [`f16c563`](https://github.com/apache/spark/commit/f16c563cddc24d9df596a7a2b690457b514c8464).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class StateStoreCompatibleSuite extends StreamTest with StateStoreCodecsTest`
 * `trait StateStoreCodecsTest extends SparkFunSuite with PlanTestBase`




[GitHub] [spark] HyukjinKwon closed pull request #30172: [SPARK-33270][SQL] Return SQL schema instead of Catalog string from the `SchemaOfJson` expression

2020-10-28 Thread GitBox



HyukjinKwon closed pull request #30172:
URL: https://github.com/apache/spark/pull/30172


   




[GitHub] [spark] HyukjinKwon closed pull request #30176: [SQL][MINOR] Update from_unixtime doc

2020-10-28 Thread GitBox



HyukjinKwon closed pull request #30176:
URL: https://github.com/apache/spark/pull/30176


   




[GitHub] [spark] HyukjinKwon commented on pull request #30172: [SPARK-33270][SQL] Return SQL schema instead of Catalog string from the `SchemaOfJson` expression

2020-10-28 Thread GitBox



HyukjinKwon commented on pull request #30172:
URL: https://github.com/apache/spark/pull/30172#issuecomment-718302297


   Merged to master.




[GitHub] [spark] HyukjinKwon commented on a change in pull request #30172: [SPARK-33270][SQL] Return SQL schema instead of Catalog string from the `SchemaOfJson` expression

2020-10-28 Thread GitBox



HyukjinKwon commented on a change in pull request #30172:
URL: https://github.com/apache/spark/pull/30172#discussion_r513859872



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
##
@@ -801,7 +801,7 @@ case class SchemaOfJson(
   }
 }
 
-UTF8String.fromString(dt.catalogString)
+UTF8String.fromString(dt.sql)

Review comment:
   Okay .. NVM ..
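
   For readers unfamiliar with the two renderings swapped in the diff above, here is a toy model of how a catalog-style string and a SQL-style string can differ for the same type tree. It is only a sketch: `ToyType`, `ToyLong`, `ToyField`, and `ToyStruct` are hypothetical names, not Spark's `org.apache.spark.sql.types` classes, and the exact output formats are an assumption based on how Spark 3.x appears to print these strings.

```scala
// Toy model of a data type tree; NOT Spark's own type classes.
sealed trait ToyType {
  def catalogString: String // compact, lower-case, unquoted field names
  def sql: String           // upper-case type names, backquoted field names
}

case object ToyLong extends ToyType {
  val catalogString: String = "bigint"
  val sql: String = "BIGINT"
}

case class ToyField(name: String, dataType: ToyType)

case class ToyStruct(fields: Seq[ToyField]) extends ToyType {
  def catalogString: String =
    fields
      .map(f => s"${f.name}:${f.dataType.catalogString}")
      .mkString("struct<", ",", ">")

  def sql: String =
    fields
      .map(f => s"`${f.name}`: ${f.dataType.sql}")
      .mkString("STRUCT<", ", ", ">")
}
```

   For a struct with a single long field named `a`, this model's `catalogString` is `struct<a:bigint>`, while its `sql` is the same shape with an upper-case `STRUCT` and `BIGINT` and the field name wrapped in backquotes.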






[GitHub] [spark] SparkQA commented on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



SparkQA commented on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718301803


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34991/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



AmplabJenkins removed a comment on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718301816








[GitHub] [spark] AmplabJenkins commented on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-10-28 Thread GitBox



AmplabJenkins commented on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-718301816








[GitHub] [spark] HyukjinKwon commented on pull request #30176: [SQL][MINOR] Update from_unixtime doc

2020-10-28 Thread GitBox



HyukjinKwon commented on pull request #30176:
URL: https://github.com/apache/spark/pull/30176#issuecomment-718301543


   Merged to master and branch-3.0.




[GitHub] [spark] HyukjinKwon commented on pull request #30174: [SPARK-33271] load HADOOP_HOME and SPARK_DIST_CLASSPATH in class path

2020-10-28 Thread GitBox



HyukjinKwon commented on pull request #30174:
URL: https://github.com/apache/spark/pull/30174#issuecomment-718300403


   @BigaDev, this looks like a backport of SPARK-29574. You'll have to pick the commits and the PR description as they are, and keep the JIRA IDs in the PR title.




[GitHub] [spark] HyukjinKwon edited a comment on pull request #30174: [SPARK-33271] load HADOOP_HOME and SPARK_DIST_CLASSPATH in class path

2020-10-28 Thread GitBox



HyukjinKwon edited a comment on pull request #30174:
URL: https://github.com/apache/spark/pull/30174#issuecomment-718300403


   @BigaDev, this looks like a backport of SPARK-29574. You'll have to pick the commits, keep the PR description as it is, and keep the JIRA IDs in the PR title.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718299797








[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718299797








[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718299791


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34990/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718296486








[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718296486








[GitHub] [spark] SparkQA removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



SparkQA removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718191772


   **[Test build #130383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130383/testReport)** for PR 30162 at commit [`92b0bee`](https://github.com/apache/spark/commit/92b0beefcce229acaa93e3d76b2b7ffa52ae0369).




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-28 Thread GitBox



SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718295797


   **[Test build #130383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130383/testReport)** for PR 30162 at commit [`92b0bee`](https://github.com/apache/spark/commit/92b0beefcce229acaa93e3d76b2b7ffa52ae0369).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



