[GitHub] [spark] xuanyuanking edited a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-14 Thread GitBox


xuanyuanking edited a comment on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-643916110


   cc @maropu @gatorsmile @HeartSaVioR @dongjoon-hyun 
   
   A new regression bug, SPARK-31990, was found while investigating the test failure https://github.com/apache/spark/pull/28707#issuecomment-639861273. The root cause is that [this line](https://github.com/apache/spark/pull/28062/files#diff-7a46f10c3cedbf013cf255564d9483cdL2458) in SPARK-31292 changed the order of groupCols in Deduplicate, and the new order breaks the validation logic here. In other words, without this PR, the executor JVM could crash, throw a random exception, or even return a wrong answer when using a checkpoint written by a previous version.
   
   So there are two pieces of work related to this PR:
   
   - [ ] **[Block]** Fix and merge the compatibility issue in #28830
   - [ ] [Follow-up] Add a new test (or modify the current Kafka test) in #28725
   
   --
   ### More detailed analysis:
   The expected order of `Deduplicate.groupCols` in the UT `KafkaMicroBatchV2SourceSuite` is
   ```
   [timestamp, partition, timestampType, key, offset, topic, value]
   ```
   This is also the order in checkpoints written by versions before Spark 3.0.
   After the changes in SPARK-31292, the groupCols changed to
   ```
   [key, value, topic, partition, offset, timestamp, timestampType]
   ```
   
   Why didn't this incompatibility bug fail `KafkaMicroBatchV2SourceSuite` when SPARK-31292 was merged?
   
   Because the UT `default config of includeHeader doesn't break the existing query from Spark 2.4` doesn't test deduplication or check the query answer.
   Although the UT uses a checkpoint written by version 2.4.3 and a streaming deduplicate operation, it only aims to prove that the new header (added in SPARK-23539) doesn't break the original checkpoint file.
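   Why a column-order change alone can corrupt results: the state store keys are rows whose fields are read by position, so a key written with one groupCols order and interpreted with another is misread. The plain Scala sketch below models that with a positional `Seq` instead of Spark's `UnsafeRow`; it is an analogy, not Spark code. `GroupColsOrderSketch`, `stateKey`, and the record values are hypothetical; only the two column orders come from the analysis above.

   ```scala
   // Minimal sketch (not Spark's UnsafeRow or state store code): a state key is
   // modeled as grouping-column values laid out positionally, so its meaning
   // depends on the column order used when it was written.
   object GroupColsOrderSketch {
     // Column orders as reported in the analysis above.
     val preSpark30Order: Seq[String] =
       Seq("timestamp", "partition", "timestampType", "key", "offset", "topic", "value")
     val afterSpark31292Order: Seq[String] =
       Seq("key", "value", "topic", "partition", "offset", "timestamp", "timestampType")

     // Hypothetical Kafka record; the values are made up for illustration.
     val sampleRecord: Map[String, Any] = Map(
       "key" -> "k1", "value" -> "v1", "topic" -> "t1", "partition" -> 0,
       "offset" -> 42L, "timestamp" -> 1592092800000L, "timestampType" -> 0)

     // Hypothetical helper: project a record onto the grouping columns, in order.
     def stateKey(record: Map[String, Any], groupCols: Seq[String]): Seq[Any] =
       groupCols.map(record)

     def main(args: Array[String]): Unit = {
       val written = stateKey(sampleRecord, preSpark30Order)       // key in an old checkpoint
       val lookedUp = stateKey(sampleRecord, afterSpark31292Order) // key built by the new plan
       // Same set of columns, same logical row, but different positional keys.
       println(preSpark30Order.toSet == afterSpark31292Order.toSet) // true
       println(written == lookedUp)                                 // false
       // Reading slot 0 of the old key as if it held "key" yields the timestamp.
       println(s"slot 0 read as 'key': ${written.head}")
     }
   }
   ```

   In the real state store the misread happens at the byte level, which is why the symptoms range from a JVM crash to a silently wrong answer rather than a clean mismatch.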



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28619: [SPARK-21040][CORE] Speculate tasks which are running on decommission executors

2020-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28619:
URL: https://github.com/apache/spark/pull/28619#issuecomment-643916951










[GitHub] [spark] xuanyuanking commented on pull request #28830: [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates

2020-06-14 Thread GitBox


xuanyuanking commented on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-643916855


   ```
   How we plan to consolidate both? How we will write JIRA title/description 
and PR title/description? Which is the type of the consolidated issue? Is the 
consolidated issue a blocker?
   ```
   Here's my plan to consolidate both: https://github.com/apache/spark/pull/28707#issuecomment-643916110. I will also add it to the JIRA and the PR description.
   Yes, #28707 is blocked by this fix.
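   Judging from the PR title, the fix goes back to `toSet.toSeq` when de-duplicating column names in `Dataset.dropDuplicates`. The sketch below contrasts that with `distinct`; that SPARK-31292 had switched to `distinct` is an assumption not stated in this thread. `distinct` keeps input order, while the order of `toSet.toSeq` comes from the Set implementation. The exact order printed depends on the Scala version, so the sketch does not claim it matches the order stored in old checkpoints. `DistinctVsToSetSketch` and its helpers are hypothetical.

   ```scala
   // Sketch contrasting two ways to de-duplicate column names before building groupCols.
   object DistinctVsToSetSketch {
     // Kafka source columns in schema order, as quoted in the #28707 analysis.
     val kafkaColumns: Seq[String] =
       Seq("key", "value", "topic", "partition", "offset", "timestamp", "timestampType")

     // Keeps the first occurrence of each name, preserving input order.
     def viaDistinct(cols: Seq[String]): Seq[String] = cols.distinct

     // Order comes from the Set implementation; with more than four elements Scala's
     // default immutable Set is hash-based, so input order is generally not kept.
     def viaToSetToSeq(cols: Seq[String]): Seq[String] = cols.toSet.toSeq

     def main(args: Array[String]): Unit = {
       println(viaDistinct(kafkaColumns).mkString("[", ", ", "]"))
       println(viaToSetToSeq(kafkaColumns).mkString("[", ", ", "]"))
     }
   }
   ```

   Both variants yield the same set of columns. Only the order differs, and that order is what a checkpointed state key depends on.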
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28829: [WIP][SQL] Benchmark the EXCEPTION rebase mode

2020-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28829:
URL: https://github.com/apache/spark/pull/28829#issuecomment-643916877










[GitHub] [spark] SparkQA commented on pull request #28829: [WIP][SQL] Benchmark the EXCEPTION rebase mode

2020-06-14 Thread GitBox


SparkQA commented on pull request #28829:
URL: https://github.com/apache/spark/pull/28829#issuecomment-643916564


   **[Test build #124033 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124033/testReport)**
 for PR 28829 at commit 
[`16e90be`](https://github.com/apache/spark/commit/16e90bebf9314105d20c581a07120adb6d288e0b).






[GitHub] [spark] AmplabJenkins commented on pull request #28829: [WIP][SQL] Benchmark the EXCEPTION rebase mode

2020-06-14 Thread GitBox


AmplabJenkins commented on pull request #28829:
URL: https://github.com/apache/spark/pull/28829#issuecomment-643916882


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/28652/
   Test PASSed.






[GitHub] [spark] HyukjinKwon commented on pull request #17953: [SPARK-20680][SQL] Spark-sql do not support for void column datatype …

2020-06-14 Thread GitBox


HyukjinKwon commented on pull request #17953:
URL: https://github.com/apache/spark/pull/17953#issuecomment-643916503


   Yeah .. I personally support this change FWIW.






[GitHub] [spark] AmplabJenkins commented on pull request #28619: [SPARK-21040][CORE] Speculate tasks which are running on decommission executors

2020-06-14 Thread GitBox


AmplabJenkins commented on pull request #28619:
URL: https://github.com/apache/spark/pull/28619#issuecomment-643916951










[GitHub] [spark] SparkQA commented on pull request #28619: [SPARK-21040][CORE] Speculate tasks which are running on decommission executors

2020-06-14 Thread GitBox


SparkQA commented on pull request #28619:
URL: https://github.com/apache/spark/pull/28619#issuecomment-643916615


   **[Test build #124034 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124034/testReport)**
 for PR 28619 at commit 
[`4affa58`](https://github.com/apache/spark/commit/4affa58f95f893ef6de1c1bf1c6b731468a2519d).






[GitHub] [spark] HyukjinKwon commented on pull request #27805: [SPARK-31056][SQL] Add CalendarIntervals division

2020-06-14 Thread GitBox


HyukjinKwon commented on pull request #27805:
URL: https://github.com/apache/spark/pull/27805#issuecomment-643915859


   Do we have an answer to https://github.com/apache/spark/pull/27805#issuecomment-635381702? It's easier to justify this with actual references and/or a standard.






[GitHub] [spark] xuanyuanking commented on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-14 Thread GitBox


xuanyuanking commented on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-643916110


   A new regression bug, SPARK-31990, was found while investigating the test failure https://github.com/apache/spark/pull/28707#issuecomment-639861273. The root cause is that [this line](https://github.com/apache/spark/pull/28062/files#diff-7a46f10c3cedbf013cf255564d9483cdL2458) in SPARK-31292 changed the order of groupCols in Deduplicate, and the new order breaks the validation logic here. In other words, without this PR, the executor JVM could crash, throw a random exception, or even return a wrong answer when using a checkpoint written by a previous version.
   
   So there are two pieces of work related to this PR:
   
   - [ ] Fix and merge the compatibility issue in #28830
   - [ ] Add a new test (or modify the current Kafka test) in #28725
   
   --
   ### More detailed analysis:
   The expected order of `Deduplicate.groupCols` in the UT `KafkaMicroBatchV2SourceSuite` is
   ```
   [timestamp, partition, timestampType, key, offset, topic, value]
   ```
   After the changes in SPARK-31292, the groupCols changed to
   ```
   [key, value, topic, partition, offset, timestamp, timestampType]
   ```
   
   Why didn't this incompatibility bug fail `KafkaMicroBatchV2SourceSuite` when SPARK-31292 was merged?
   
   Because the UT `default config of includeHeader doesn't break the existing query from Spark 2.4` doesn't test deduplication or check the query answer.
   Although the UT uses a checkpoint written by version 2.4.3 and a streaming deduplicate operation, it only aims to prove that the new header (added in SPARK-23539) doesn't break the original checkpoint file.






[GitHub] [spark] MaxGekk commented on pull request #28829: [WIP][SQL] Benchmark the EXCEPTION rebase mode

2020-06-14 Thread GitBox


MaxGekk commented on pull request #28829:
URL: https://github.com/apache/spark/pull/28829#issuecomment-643915417


   jenkins, retest this, please






[GitHub] [spark] Ngone51 commented on pull request #28619: [SPARK-21040][CORE] Speculate tasks which are running on decommission executors

2020-06-14 Thread GitBox


Ngone51 commented on pull request #28619:
URL: https://github.com/apache/spark/pull/28619#issuecomment-643915676


   retest this please.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition

2020-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28642:
URL: https://github.com/apache/spark/pull/28642#issuecomment-643914834










[GitHub] [spark] HyukjinKwon commented on a change in pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition

2020-06-14 Thread GitBox


HyukjinKwon commented on a change in pull request #28642:
URL: https://github.com/apache/spark/pull/28642#discussion_r439940687



##
File path: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
##
@@ -1039,7 +1039,7 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
 val pythonEvals = collect(joinNode.get) {
   case p: BatchEvalPythonExec => p
 }
-assert(pythonEvals.size == 2)
+assert(pythonEvals.size == 4)

Review comment:
   Yeah, I don't think having more `BatchEvalPythonExec` nodes is more efficient. It requires more Python executions, which aren't trivial.








[GitHub] [spark] AmplabJenkins commented on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition

2020-06-14 Thread GitBox


AmplabJenkins commented on pull request #28642:
URL: https://github.com/apache/spark/pull/28642#issuecomment-643914834










[GitHub] [spark] SparkQA commented on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition

2020-06-14 Thread GitBox


SparkQA commented on pull request #28642:
URL: https://github.com/apache/spark/pull/28642#issuecomment-643914470


   **[Test build #124032 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124032/testReport)**
 for PR 28642 at commit 
[`65cd324`](https://github.com/apache/spark/commit/65cd324093fac15357fb0ca9bae7c524b40c).






[GitHub] [spark] HyukjinKwon commented on pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition

2020-06-14 Thread GitBox


HyukjinKwon commented on pull request #28642:
URL: https://github.com/apache/spark/pull/28642#issuecomment-643913716


   retest this please






[GitHub] [spark] Ngone51 commented on pull request #28801: [SPARK-31970][CORE] Make MDC configuration step be consistent between setLocalProperty and log4j.properties

2020-06-14 Thread GitBox


Ngone51 commented on pull request #28801:
URL: https://github.com/apache/spark/pull/28801#issuecomment-643912320


   thanks all!!






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27604: [SPARK-30849][CORE][SHUFFLE]Fix application failed due to failed to get MapStatuses broadcast block

2020-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #27604:
URL: https://github.com/apache/spark/pull/27604#issuecomment-643909975


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124026/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27604: [SPARK-30849][CORE][SHUFFLE]Fix application failed due to failed to get MapStatuses broadcast block

2020-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #27604:
URL: https://github.com/apache/spark/pull/27604#issuecomment-643909967


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #27604: [SPARK-30849][CORE][SHUFFLE]Fix application failed due to failed to get MapStatuses broadcast block

2020-06-14 Thread GitBox


AmplabJenkins commented on pull request #27604:
URL: https://github.com/apache/spark/pull/27604#issuecomment-643909975


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124026/
   Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #27604: [SPARK-30849][CORE][SHUFFLE]Fix application failed due to failed to get MapStatuses broadcast block

2020-06-14 Thread GitBox


SparkQA removed a comment on pull request #27604:
URL: https://github.com/apache/spark/pull/27604#issuecomment-643877230


   **[Test build #124026 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124026/testReport)**
 for PR 27604 at commit 
[`2e11d1b`](https://github.com/apache/spark/commit/2e11d1bedf15b59c89b1f686ea716a575802f1e6).






[GitHub] [spark] SparkQA commented on pull request #27604: [SPARK-30849][CORE][SHUFFLE]Fix application failed due to failed to get MapStatuses broadcast block

2020-06-14 Thread GitBox


SparkQA commented on pull request #27604:
URL: https://github.com/apache/spark/pull/27604#issuecomment-643909627


   **[Test build #124026 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124026/testReport)**
 for PR 27604 at commit 
[`2e11d1b`](https://github.com/apache/spark/commit/2e11d1bedf15b59c89b1f686ea716a575802f1e6).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HyukjinKwon commented on pull request #28828: [SPARK-24634][SS][FOLLOWUP] Rename the variable from "numLateInputs" to "numDropppedRowsByWatermark"

2020-06-14 Thread GitBox


HyukjinKwon commented on pull request #28828:
URL: https://github.com/apache/spark/pull/28828#issuecomment-643906549


   @xuanyuanking too FYI






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28821: [SPARK-31981][SQL] Keep TimestampType when taking an average of a Timestamp

2020-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28821:
URL: https://github.com/apache/spark/pull/28821#issuecomment-643904439


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124024/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28821: [SPARK-31981][SQL] Keep TimestampType when taking an average of a Timestamp

2020-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28821:
URL: https://github.com/apache/spark/pull/28821#issuecomment-643904434


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28821: [SPARK-31981][SQL] Keep TimestampType when taking an average of a Timestamp

2020-06-14 Thread GitBox


SparkQA removed a comment on pull request #28821:
URL: https://github.com/apache/spark/pull/28821#issuecomment-643865812


   **[Test build #124024 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124024/testReport)**
 for PR 28821 at commit 
[`707b0cf`](https://github.com/apache/spark/commit/707b0cf949e2532429bdc62d7ef219fe98a0751e).






[GitHub] [spark] AmplabJenkins commented on pull request #28821: [SPARK-31981][SQL] Keep TimestampType when taking an average of a Timestamp

2020-06-14 Thread GitBox


AmplabJenkins commented on pull request #28821:
URL: https://github.com/apache/spark/pull/28821#issuecomment-643904434










[GitHub] [spark] SparkQA commented on pull request #28821: [SPARK-31981][SQL] Keep TimestampType when taking an average of a Timestamp

2020-06-14 Thread GitBox


SparkQA commented on pull request #28821:
URL: https://github.com/apache/spark/pull/28821#issuecomment-643904220


   **[Test build #124024 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124024/testReport)**
 for PR 28821 at commit 
[`707b0cf`](https://github.com/apache/spark/commit/707b0cf949e2532429bdc62d7ef219fe98a0751e).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #28807: [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords

2020-06-14 Thread GitBox


AmplabJenkins commented on pull request #28807:
URL: https://github.com/apache/spark/pull/28807#issuecomment-643899506










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28807: [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords

2020-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28807:
URL: https://github.com/apache/spark/pull/28807#issuecomment-643899506










[GitHub] [spark] maropu commented on a change in pull request #28807: [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords

2020-06-14 Thread GitBox


maropu commented on a change in pull request #28807:
URL: https://github.com/apache/spark/pull/28807#discussion_r439927771



##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
##
@@ -388,12 +396,24 @@ class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper {
   val reservedKeywordsInAnsiMode = allCandidateKeywords -- nonReservedKeywordsInAnsiMode
 
   test("check # of reserved keywords") {
-val numReservedKeywords = 78
+val numReservedKeywords = 74

Review comment:
   Note: `ANTI`, `SEMI`, `MINUS`, and `!` are removed.








[GitHub] [spark] SparkQA commented on pull request #28807: [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords

2020-06-14 Thread GitBox


SparkQA commented on pull request #28807:
URL: https://github.com/apache/spark/pull/28807#issuecomment-643899210


   **[Test build #124031 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124031/testReport)**
 for PR 28807 at commit 
[`eeceb30`](https://github.com/apache/spark/commit/eeceb30e050c26acdb93372eef0ce14410bd0159).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-643897872










[GitHub] [spark] AmplabJenkins commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-14 Thread GitBox


AmplabJenkins commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-643897872










[GitHub] [spark] SparkQA commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-14 Thread GitBox


SparkQA commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-643897635


   **[Test build #124030 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124030/testReport)**
 for PR 28710 at commit 
[`2e6f35c`](https://github.com/apache/spark/commit/2e6f35c8e31fe1cde1637b922673339bfeef65fe).






[GitHub] [spark] huaxingao commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-14 Thread GitBox


huaxingao commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-643896578


   retest this please






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-643892810










[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-14 Thread GitBox


AmplabJenkins commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-643892810










[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-14 Thread GitBox


SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-643892530


   **[Test build #124029 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124029/testReport)**
 for PR 28593 at commit 
[`8fe1960`](https://github.com/apache/spark/commit/8fe1960ef3a0c598a626b7024820b74cec787642).






[GitHub] [spark] dongjoon-hyun commented on pull request #24922: [SPARK-28120][SS] Rocksdb state storage implementation

2020-06-14 Thread GitBox


dongjoon-hyun commented on pull request #24922:
URL: https://github.com/apache/spark/pull/24922#issuecomment-643892244


   Thank you for the update, @itsvikramagr .






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-643891541


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124021/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-643891538


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-14 Thread GitBox


SparkQA removed a comment on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-643855623


   **[Test build #124021 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124021/testReport)**
 for PR 28710 at commit 
[`2e6f35c`](https://github.com/apache/spark/commit/2e6f35c8e31fe1cde1637b922673339bfeef65fe).






[GitHub] [spark] SparkQA commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-14 Thread GitBox


SparkQA commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-643891334


   **[Test build #124021 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124021/testReport)**
 for PR 28710 at commit 
[`2e6f35c`](https://github.com/apache/spark/commit/2e6f35c8e31fe1cde1637b922673339bfeef65fe).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #28830: [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates

2020-06-14 Thread GitBox


HeartSaVioR edited a comment on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-64318


   How do we plan to consolidate both? How will we write the JIRA title/description 
and PR title/description? What is the type of the consolidated issue? Is the 
consolidated issue a blocker?
   
   Things would be simpler if we merge the partial revert as it is, and spend 
our efforts discussing how to guide users through known issues - this is one of the 
candidates for Spark 3.0.0. This is clearly a bugfix, a "blocker" preventing some 
end users from migrating to Spark 3.0.0, and it is worth having its own JIRA issue 
and commit. Sure, this may also need to be placed in the migration guide or release 
notes.
   
   It would do no harm for #28707 to wait for this patch to be merged, and then 
rebase to fix the test failure.






[GitHub] [spark] GuoPhilipse commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-14 Thread GitBox


GuoPhilipse commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-643890774


   It was generated by the SET command; we have now removed it.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-14 Thread GitBox


HeartSaVioR edited a comment on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-643878976


   I’m sorry, but version 4 doesn’t leverage UnsafeRow (version 2 did). Please 
read the description carefully.
   
   As I commented earlier, there are still lots of possible improvements in the 
metadata, but I don’t want to pursue them unless we commit to dedicated review 
effort. This is low-hanging fruit which brings a massive improvement.
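   For readers following along: the PR title describes serializing each log entry directly with `DataInputStream`/`DataOutputStream` (plus LZ4 compression) rather than going through UnsafeRow. Below is a minimal sketch of that style of field-by-field binary serde. It assumes a hypothetical, simplified entry with only `path`, `size`, `isDir` and `action` fields. It is not the PR's actual version 4 format, and LZ4 is omitted so the example needs only the JDK.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical, simplified metadata log entry (not Spark's real SinkFileStatus).
final case class LogEntry(path: String, size: Long, isDir: Boolean, action: String)

object LogEntrySerde {
  // Writes an entry count, then each entry's fields in a fixed order.
  // No compression is applied here (the PR adds LZ4 on top).
  def serialize(entries: Seq[LogEntry]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeInt(entries.size)
    entries.foreach { e =>
      out.writeUTF(e.path) // length-prefixed modified UTF-8 (max 65535 bytes)
      out.writeLong(e.size)
      out.writeBoolean(e.isDir)
      out.writeUTF(e.action)
    }
    out.flush()
    bytes.toByteArray
  }

  // Reads entries back in exactly the order serialize() wrote the fields;
  // Scala evaluates constructor arguments left to right.
  def deserialize(data: Array[Byte]): Seq[LogEntry] = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    val count = in.readInt()
    (0 until count).map { _ =>
      LogEntry(in.readUTF(), in.readLong(), in.readBoolean(), in.readUTF())
    }
  }
}
```

   Writing fixed-order primitive fields avoids keeping a row-format schema in the log. The trade-off is that any field change needs a new log version.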






[GitHub] [spark] HeartSaVioR edited a comment on pull request #28830: [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates

2020-06-14 Thread GitBox


HeartSaVioR edited a comment on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-64318


   How do we plan to consolidate both? How will we write the JIRA title/description 
and PR title/description? What is the type of the consolidated issue? Is the 
consolidated issue a blocker?
   
   Things would be simpler if we merge the partial fix as it is, and spend our 
efforts discussing how to guide users through known issues - this is one of the 
candidates for Spark 3.0.0. This is clearly a bugfix, a "blocker" preventing some 
end users from migrating to Spark 3.0.0. Sure, this may also need to be placed in 
the migration guide or release notes.
   
   It would do no harm for #28707 to wait for this patch to be merged, and then 
rebase to fix the test failure.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #28830: [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates

2020-06-14 Thread GitBox


HeartSaVioR edited a comment on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-64318


   How do we plan to consolidate both? How will we write the JIRA title/description 
and PR title/description? What is the type of the consolidated issue? Is the 
consolidated issue a blocker?
   
   Things would be simpler if we merge the partial fix as it is, and spend our 
efforts discussing how to guide users through known issues - this is one of the 
candidates for Spark 3.0.0. This is clearly a bugfix, a "blocker" preventing some 
end users from migrating to Spark 3.0.0, and it is worth having its own JIRA issue 
and commit. Sure, this may also need to be placed in the migration guide or release 
notes.
   
   It would do no harm for #28707 to wait for this patch to be merged, and then 
rebase to fix the test failure.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #28830: [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates

2020-06-14 Thread GitBox


HeartSaVioR edited a comment on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-64318


   How do we plan to consolidate both? How will we write the JIRA title/description 
and PR title/description? What is the type of the consolidated issue? Is the 
consolidated issue a blocker?
   
   Things would be simpler if we merge the partial fix as it is, and spend our 
efforts discussing how to guide users through the known issue - this is one of the 
candidates for Spark 3.0.0. This is clearly a bugfix, a "blocker" preventing some 
end users from migrating to Spark 3.0.0. Sure, this may also need to be placed in 
the migration guide or release notes.
   
   It would do no harm for #28707 to wait for this patch to be merged, and then 
rebase to fix the test failure.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #28830: [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates

2020-06-14 Thread GitBox


HeartSaVioR edited a comment on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-64318


   How do we plan to consolidate both? How will we write the JIRA title/description 
and PR title/description? What is the type of the consolidated issue? Is the 
consolidated issue a blocker?
   
   Things would be simpler if we merge the partial fix as it is, and spend our 
efforts discussing how to guide users through the known issue - this is one of the 
candidates for Spark 3.0.0. This is clearly a bugfix, a "blocker" preventing some 
end users from migrating to Spark 3.0.0.
   
   It would do no harm for #28707 to wait for this patch to be merged, and then 
rebase to fix the test failure.






[GitHub] [spark] HeartSaVioR commented on pull request #28830: [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates

2020-06-14 Thread GitBox


HeartSaVioR commented on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-64318


   How do we plan to consolidate both? How will we write the JIRA title/description 
and PR title/description? What is the type of the consolidated issue? Is the 
consolidated issue a blocker?
   
   Things would be simple if we merge the partial fix as it is, and spend our 
efforts discussing how to guide users through the known issue - this is one of the 
candidates for Spark 3.0.0. This is clearly a bugfix, a "blocker" preventing some 
end users from migrating to Spark 3.0.0.
   
   It would do no harm for #28707 to wait for this patch to be merged, and then 
rebase to fix the test failure.
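   Some background on the PR title: it concerns how `Dataset.dropDuplicates` builds its grouping columns. Per the discussion on #28707, that column order fixes the key layout of the streaming state stored in the checkpoint. Switching between `distinct` and `toSet.toSeq` can therefore make an older checkpoint incompatible. Below is a minimal sketch of why the two differ. It uses the Kafka source column names from that discussion and compares plain Scala collections only; it does not touch Spark itself.

```scala
object GroupColsOrder {
  // Kafka source columns in schema order (as listed in the #28707 discussion).
  val colNames: Seq[String] =
    Seq("key", "value", "topic", "partition", "offset", "timestamp", "timestampType")

  // `distinct` drops duplicates and keeps first-occurrence order.
  def viaDistinct(cols: Seq[String]): Seq[String] = cols.distinct

  // `toSet.toSeq` also drops duplicates, but an immutable Set with more than
  // four elements is hash-based. The resulting order therefore follows the
  // element hashes, not the input order, and may differ across Scala versions.
  def viaToSet(cols: Seq[String]): Seq[String] = cols.toSet.toSeq
}
```

   Both methods return the same set of names, but only `viaDistinct` is guaranteed to keep schema order. Because the state key row is laid out positionally, the same input rows can produce different key bytes under the two orderings.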






[GitHub] [spark] AmplabJenkins commented on pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-06-14 Thread GitBox


AmplabJenkins commented on pull request #27690:
URL: https://github.com/apache/spark/pull/27690#issuecomment-643887374










[GitHub] [spark] AmplabJenkins removed a comment on pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #27690:
URL: https://github.com/apache/spark/pull/27690#issuecomment-643887374










[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-06-14 Thread GitBox


moomindani commented on a change in pull request #27690:
URL: https://github.com/apache/spark/pull/27690#discussion_r439917190



##
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
##
@@ -124,11 +153,24 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
 val hiveVersion = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client.version
 val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
 val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
+logDebug(s"path '${path.toString}', staging dir '$stagingDir', " +
+  s"scratch dir '$scratchDir' are used")
 
 if (hiveVersionsUsingOldExternalTempPath.contains(hiveVersion)) {
   oldVersionExternalTempPath(path, hadoopConf, scratchDir)
 } else if (hiveVersionsUsingNewExternalTempPath.contains(hiveVersion)) {

Review comment:
   Got it. I added the description "This option is supported in Hive 2.0 or 
later." in SQLConf.scala.
   
https://github.com/apache/spark/pull/27690/files#diff-9a6b543db706f1a90f790783d6930a13R849
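   A note on the version gate: the diff above selects the external temp path by Hive client version, and the new option is documented as supported only on Hive 2.0 or later. Below is a hypothetical sketch of such a gate over plain `major.minor[.patch]` strings. `HiveVersionGate` and its methods are illustrative names, not Spark's `HiveVersion` API; real code would compare the client's version object instead.

```scala
object HiveVersionGate {
  // Parses "major.minor[.patch]" into (major, minor); a missing minor counts as 0.
  // Assumes well-formed numeric components (hypothetical helper).
  def parse(version: String): (Int, Int) = {
    val parts = version.split('.')
    (parts(0).toInt, if (parts.length > 1) parts(1).toInt else 0)
  }

  // True when the client version is at least 2.0, the documented minimum.
  def supportsNonBlobstoreStaging(version: String): Boolean =
    Ordering[(Int, Int)].gteq(parse(version), (2, 0))
}
```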








[GitHub] [spark] SparkQA commented on pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-06-14 Thread GitBox


SparkQA commented on pull request #27690:
URL: https://github.com/apache/spark/pull/27690#issuecomment-643887119


   **[Test build #124028 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124028/testReport)**
 for PR 27690 at commit 
[`0fbeaf3`](https://github.com/apache/spark/commit/0fbeaf374bf35a7d0cde2b3340d9f3c4551cbdb2).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28786: [SPARK-31925][ML] Summary.totalIterations greater than maxIters

2020-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28786:
URL: https://github.com/apache/spark/pull/28786#issuecomment-643885908










[GitHub] [spark] SparkQA removed a comment on pull request #28786: [SPARK-31925][ML] Summary.totalIterations greater than maxIters

2020-06-14 Thread GitBox


SparkQA removed a comment on pull request #28786:
URL: https://github.com/apache/spark/pull/28786#issuecomment-643867351


   **[Test build #124025 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124025/testReport)**
 for PR 28786 at commit 
[`4c4d52b`](https://github.com/apache/spark/commit/4c4d52b91e1ebbd018835c3bb2cd565df79bd430).






[GitHub] [spark] AmplabJenkins commented on pull request #28786: [SPARK-31925][ML] Summary.totalIterations greater than maxIters

2020-06-14 Thread GitBox


AmplabJenkins commented on pull request #28786:
URL: https://github.com/apache/spark/pull/28786#issuecomment-643885908










[GitHub] [spark] SparkQA commented on pull request #28786: [SPARK-31925][ML] Summary.totalIterations greater than maxIters

2020-06-14 Thread GitBox


SparkQA commented on pull request #28786:
URL: https://github.com/apache/spark/pull/28786#issuecomment-643885633


   **[Test build #124025 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124025/testReport)**
 for PR 28786 at commit 
[`4c4d52b`](https://github.com/apache/spark/commit/4c4d52b91e1ebbd018835c3bb2cd565df79bd430).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] maropu commented on pull request #28830: [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates

2020-06-14 Thread GitBox


maropu commented on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-643885408


   > Thanks for the quick fix @maropu! I think maybe we can simplify the bugfix 
by combining it together with #28707. WDYT? I'll also reference this PR with 
#28707.
   
   @xuanyuanking Yeah, that looks fine to me. Could you take this over? Thanks 
anyway!






[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-06-14 Thread GitBox


moomindani commented on a change in pull request #27690:
URL: https://github.com/apache/spark/pull/27690#discussion_r439913882



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -839,6 +839,17 @@ object SQLConf {
 .checkValues(HiveCaseSensitiveInferenceMode.values.map(_.toString))
 .createWithDefault(HiveCaseSensitiveInferenceMode.NEVER_INFER.toString)
 
+  val HIVE_SUPPORTED_SCHEMES_TO_USE_NONBLOBSTORE =
+buildConf("spark.sql.hive.supportedSchemesToUseNonBlobstore")
+  .doc("Comma-separated list of supported blobstore schemes (e.g. 's3,s3a'). " +
+"If any blobstore schemes are specified, this feature is enabled. " +
+"When writing data out to a Hive table, " +
+"Spark writes the data first into non blobstore storage, and then moves it to blobstore. " +
+"By default, this option is set to empty. It means this feature is disabled.")
+  .version("3.1.0")
+  .stringConf
+  .createWithDefault("")

Review comment:
   Note: I am not 100% sure whether all of these blob storage systems have 
similar characteristics, or whether this option is effective for every one of 
them. It is at least effective for Amazon S3.








[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-06-14 Thread GitBox


moomindani commented on a change in pull request #27690:
URL: https://github.com/apache/spark/pull/27690#discussion_r439913383



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -839,6 +839,17 @@ object SQLConf {
 .checkValues(HiveCaseSensitiveInferenceMode.values.map(_.toString))
 .createWithDefault(HiveCaseSensitiveInferenceMode.NEVER_INFER.toString)
 
+  val HIVE_SUPPORTED_SCHEMES_TO_USE_NONBLOBSTORE =
+buildConf("spark.sql.hive.supportedSchemesToUseNonBlobstore")
+  .doc("Comma-separated list of supported blobstore schemes (e.g. 's3,s3a'). " +
+"If any blobstore schemes are specified, this feature is enabled. " +
+"When writing data out to a Hive table, " +
+"Spark writes the data first into non blobstore storage, and then moves it to blobstore. " +
+"By default, this option is set to empty. It means this feature is disabled.")
+  .version("3.1.0")
+  .stringConf
+  .createWithDefault("")

Review comment:
   Users can specify any blob storage scheme, such as the following. If the copy 
operation is expensive in the storage system, this option will be effective.
   - Amazon S3: `s3`, `s3a`, `s3n`
   - Azure Blob Storage: `wasb`, `wasbs`
   - Google Cloud Storage: `gs`
   - Databricks: `dbfs`
   - OpenStack: `swift`
   
   Since any scheme could be used, I believe we cannot define a fixed set of 
supported schemes here. That's why I only listed examples in 
SQLConf.scala.
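   As proposed in this PR, `spark.sql.hive.supportedSchemesToUseNonBlobstore` takes a comma-separated list of schemes, and the feature is enabled only when that list is non-empty. Below is a minimal sketch of how a destination path's scheme could be matched against the config value. It assumes the string has already been read from the conf. `NonBlobstoreStaging`, `parseSchemes` and `shouldStage` are hypothetical names, not part of the PR.

```scala
import java.net.URI
import java.util.Locale

object NonBlobstoreStaging {
  // Splits e.g. "s3, s3a ,gs" into a normalized set.
  // An empty string yields an empty set, which means the feature is disabled.
  def parseSchemes(conf: String): Set[String] =
    conf.split(",").map(_.trim.toLowerCase(Locale.ROOT)).filter(_.nonEmpty).toSet

  // True when the destination's scheme is listed, i.e. data would first be
  // written to non-blobstore storage and then moved to the blobstore.
  // Paths without a scheme (getScheme == null) never match.
  def shouldStage(conf: String, destination: URI): Boolean = {
    val schemes = parseSchemes(conf)
    Option(destination.getScheme).exists(s => schemes.contains(s.toLowerCase(Locale.ROOT)))
  }
}
```

   For example, `shouldStage("s3,s3a", new URI("s3a://bucket/warehouse/t"))` is true, while an `hdfs://` destination, or an empty config, gives false.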








[GitHub] [spark] AmplabJenkins commented on pull request #28830: [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates

2020-06-14 Thread GitBox


AmplabJenkins commented on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-643882434










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28830: [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates

2020-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-643882434










[GitHub] [spark] SparkQA commented on pull request #28830: [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates

2020-06-14 Thread GitBox


SparkQA commented on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-643882321


   **[Test build #124027 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124027/testReport)**
 for PR 28830 at commit 
[`7546ba4`](https://github.com/apache/spark/commit/7546ba4eebeee480d9a2ff8b948e900cd6023dfc).






[GitHub] [spark] HeartSaVioR edited a comment on pull request #28830: [SPARK-31990][SS] Preserves the input order of colNames in dropDuplicates

2020-06-14 Thread GitBox


HeartSaVioR edited a comment on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-643880332


   +1 to a partial revert, which should also be OK with the author. (I guess it was applied simply by pattern rather than for some intended improvement, so there should be no problem for the author either.)






[GitHub] [spark] HeartSaVioR commented on pull request #28830: [SPARK-31990][SS] Preserves the input order of colNames in dropDuplicates

2020-06-14 Thread GitBox


HeartSaVioR commented on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-643880332


   +1 to a partial revert, which should also be OK with the author. (I guess it was applied simply by pattern rather than for some outstanding improvement, so there should be no problem for the author either.)






[GitHub] [spark] xuanyuanking commented on pull request #28830: [SPARK-31990][SS] Preserves the input order of colNames in dropDuplicates

2020-06-14 Thread GitBox


xuanyuanking commented on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-643880347


   Yep, I think just reverting that part is good enough. I will give more context and details in #28707.






[GitHub] [spark] dongjoon-hyun commented on pull request #28830: [SPARK-31990][SS] Preserves the input order of colNames in dropDuplicates

2020-06-14 Thread GitBox


dongjoon-hyun commented on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-643880429


   Ya. +1 for partial revert in this PR.






[GitHub] [spark] xuanyuanking commented on a change in pull request #28830: [SPARK-31990][SS] Preserves the input order of colNames in dropDuplicates

2020-06-14 Thread GitBox


xuanyuanking commented on a change in pull request #28830:
URL: https://github.com/apache/spark/pull/28830#discussion_r439910372



##
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
##
@@ -2548,6 +2548,21 @@ class DataFrameSuite extends QueryTest
       assert(df.schema === new StructType().add(StructField("d", DecimalType(38, 0))))
     }
   }
+
+  test("SPARK-31990: preserves the input order of colNames in dropDuplicates") {
+    val df = Seq((1, 2, 3, 4, 5), (1, 2, 3, 4, 5)).toDF("c", "e", "d", "a", "b")
+    val inputColNames = Seq("c", "b", "c", "d", "b", "c", "b")

Review comment:
   Thanks for adding a new UT here. Since this issue was found when investigating the test failure in https://github.com/apache/spark/pull/28707#issuecomment-639861273,
   how about reusing the UT `default config of includeHeader doesn't break existing query from Spark 2.4` in `KafkaMicroBatchV2SourceSuite`? I think we don't need to add a new UT for this regression after #28707. That is to say, after #28707 is merged, the mentioned UT will fail if we don't apply the fix.

##
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##
@@ -2541,7 +2542,20 @@ class Dataset[T] private[sql](
   def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
     val resolver = sparkSession.sessionState.analyzer.resolver
     val allColumns = queryExecution.analyzed.output
-    val groupCols = colNames.distinct.flatMap { (colName: String) =>
+    // SPARK-31990: We must preserve the input order of `colNames` because of the compatibility
+    // issue (the Streaming's state store depends on the `groupCols` order).
+    val orderPreservingDistinctColNames = {
+      val nameSeen = mutable.Set[String]()

Review comment:
   How about simply reverting this line to https://github.com/apache/spark/pull/28062/files#diff-7a46f10c3cedbf013cf255564d9483cdL2458, i.e. using the original `toSet` implementation?
   Yes, `toSet.toSeq` might be incompatible across Scala versions, but I think the current fix should just keep the original order. Detecting order changes and adding solid validation should be the work of SPARK-31894 and SPARK-27237. WDYT?
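To make the ordering concern concrete, here is a plain-Scala sketch contrasting the two de-duplication strategies discussed in this thread: `distinct`, used after SPARK-31292, and `toSet.toSeq`, the original implementation. It exercises only standard collections (`DedupOrder` is a hypothetical name), not `Dataset.dropDuplicates` itself; the sample input reuses `inputColNames` from the new DataFrameSuite test, and the seven names are the Kafka source columns from the #28707 analysis.

```scala
// Compares the two de-duplication strategies discussed in the review.
object DedupOrder {
  // `distinct` keeps the first occurrence of each element, in input order.
  def viaDistinct(cols: Seq[String]): Seq[String] = cols.distinct

  // `toSet.toSeq` returns elements in the Set's iteration order, which is an
  // implementation detail of the Set: small immutable Sets (up to 4 elements)
  // happen to keep insertion order, larger ones are hash-based.
  def viaToSet(cols: Seq[String]): Seq[String] = cols.toSet.toSeq
}
```

Both return the same elements, but only `distinct` guarantees their order. Since the state store key layout follows `groupCols`, the two can yield different layouts for the same query once more than four columns are involved, which is consistent with the concern that `toSet.toSeq` may not behave the same across Scala versions.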








[GitHub] [spark] HeartSaVioR edited a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-14 Thread GitBox


HeartSaVioR edited a comment on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-643878976


   I’m sorry, but version 4 doesn’t leverage UnsafeRow (version 2 did). Please read the description carefully.
   
   As I commented earlier, there are still lots of possible improvements in the metadata, but I don’t want to go through them unless we commit to dedicated review effort. This is low-hanging fruit which brings a massive improvement.






[GitHub] [spark] gatorsmile commented on pull request #28830: [SPARK-31990][SS] Preserves the input order of colNames in dropDuplicates

2020-06-14 Thread GitBox


gatorsmile commented on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-643879059


   Yes. I prefer reverting the original fix in 3.0.1 and then discussing how to solve/avoid the problems in a proper way.






[GitHub] [spark] maropu commented on pull request #28830: [SPARK-31990][SS] Preserves the input order of colNames in dropDuplicates

2020-06-14 Thread GitBox


maropu commented on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-643879180


   okay, I'll revert that part in this PR first.






[GitHub] [spark] HeartSaVioR commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-14 Thread GitBox


HeartSaVioR commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-643878976


   I’m sorry, but version 4 doesn’t leverage UnsafeRow (version 2 did). Please read the description carefully.
   
   As I commented earlier, there are still lots of possible improvements in the metadata, but I don’t want to go through them unless we commit to dedicated review effort.






[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28830: [SPARK-31990][SS] Preserves the input order of colNames in dropDuplicates

2020-06-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-643878606


   Hi, All.
   This issue is marked as a hotfix for the blocker issue, but validating this issue looks non-trivial. Since `toSet.toSeq` has been used since Apache Spark 2.2.0 (SPARK-19497) and SPARK-31292 is just an `Improvement` issue with `Trivial` priority, I'd like to propose reverting SPARK-31292 from `branch-3.0` first. We will still keep SPARK-31292 in the `master` branch and proceed with @maropu's PR to find a better way for Apache Spark 3.1.0.
   
   I know that reverting is not a good solution for the original author, as mentioned by @HeartSaVioR on the dev mailing list, but I believe it is the proper way to cut Apache Spark 3.0.1 in this case. What do you think?






[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28830: [SPARK-31990][SS] Preserves the input order of colNames in dropDuplicates

2020-06-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-643878606










[GitHub] [spark] dongjoon-hyun commented on pull request #28830: [SPARK-31990][SS] Preserves the input order of colNames in dropDuplicates

2020-06-14 Thread GitBox


dongjoon-hyun commented on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-643878606


   Hi, All.
   This issue is marked as a hotfix for the blocker issue, but validating this issue looks non-trivial. Since `toSet.toSeq` has been used since Apache Spark 2.2.0 (SPARK-19497) and SPARK-31292 is just an `Improvement` issue with `Trivial` priority, I'd like to propose reverting SPARK-31292 from `branch-3.0` first. We will still keep SPARK-31292 in the `master` branch and proceed with this PR to find a better way for Apache Spark 3.1.0.
   
   I know that reverting is not a good solution for the original author, as mentioned by @HeartSaVioR on the dev mailing list, but I believe it is the proper way to cut Apache Spark 3.0.1 in this case. What do you think?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27604: [SPARK-30849][CORE][SHUFFLE]Fix application failed due to failed to get MapStatuses broadcast block

2020-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #27604:
URL: https://github.com/apache/spark/pull/27604#issuecomment-643877494










[GitHub] [spark] AmplabJenkins commented on pull request #27604: [SPARK-30849][CORE][SHUFFLE]Fix application failed due to failed to get MapStatuses broadcast block

2020-06-14 Thread GitBox


AmplabJenkins commented on pull request #27604:
URL: https://github.com/apache/spark/pull/27604#issuecomment-643877494










[GitHub] [spark] uncleGen commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-14 Thread GitBox


uncleGen commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-643877261


   @HeartSaVioR Thanks for your efforts. The result (version 4) is very impressive. Overall, it makes sense to me. But we should resolve the concern about using `UnsafeRow`. I am not very familiar with the history of the discussion about `UnsafeRow`. By the way, is there any value in, or plan for, using this [idea](https://github.com/apache/spark/pull/24128#issuecomment-558548047)?






[GitHub] [spark] SparkQA commented on pull request #27604: [SPARK-30849][CORE][SHUFFLE]Fix application failed due to failed to get MapStatuses broadcast block

2020-06-14 Thread GitBox


SparkQA commented on pull request #27604:
URL: https://github.com/apache/spark/pull/27604#issuecomment-643877230


   **[Test build #124026 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124026/testReport)**
 for PR 27604 at commit 
[`2e11d1b`](https://github.com/apache/spark/commit/2e11d1bedf15b59c89b1f686ea716a575802f1e6).






[GitHub] [spark] iRakson commented on pull request #26901: [SPARK-29152][CORE][2.4] Executor Plugin shutdown when dynamic allocation is enabled

2020-06-14 Thread GitBox


iRakson commented on pull request #26901:
URL: https://github.com/apache/spark/pull/26901#issuecomment-643876875


   @dongjoon-hyun Its behaviour is pretty confusing. But yeah, if this is breaking the branch again, then we should not keep it. Yes, this patch failed twice, so we must move on.
   
   Thank you for actively monitoring this patch. :) :)






[GitHub] [spark] maropu commented on a change in pull request #28830: [SPARK-31990][SS] Preserves the input order of colNames in dropDuplicates

2020-06-14 Thread GitBox


maropu commented on a change in pull request #28830:
URL: https://github.com/apache/spark/pull/28830#discussion_r439907906



##
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##
@@ -2541,7 +2542,20 @@ class Dataset[T] private[sql](
   def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
     val resolver = sparkSession.sessionState.analyzer.resolver
     val allColumns = queryExecution.analyzed.output
-    val groupCols = colNames.distinct.flatMap { (colName: String) =>
+    // SPARK-31990: We must preserve the input order of `colNames` because of the compatibility
+    // issue (the Streaming's state store depends on the `groupCols` order).
+    val orderPreservingDistinctColNames = {
+      val nameSeen = mutable.Set[String]()

Review comment:
   Ah, I see. Yea, I'll update the code based on `toSet.toSeq`.








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28830: [SPARK-31990][SS] Preserves the input order of colNames in dropDuplicates

2020-06-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #28830:
URL: https://github.com/apache/spark/pull/28830#discussion_r439907916



##
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
##
@@ -2548,6 +2548,21 @@ class DataFrameSuite extends QueryTest
       assert(df.schema === new StructType().add(StructField("d", DecimalType(38, 0))))
     }
   }
+
+  test("SPARK-31990: preserves the input order of colNames in dropDuplicates") {
+    val df = Seq((1, 2, 3, 4, 5), (1, 2, 3, 4, 5)).toDF("c", "e", "d", "a", "b")
+    val inputColNames = Seq("c", "b", "c", "d", "b", "c", "b")

Review comment:
   BTW, @HeartSaVioR, is there a test case failure when the checkpoint is written by the same Spark version? I'm curious whether this only occurs between different Spark versions.
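One way to see why the problem would surface only across versions: in a toy model where a state key is the grouping-column values laid out in `groupCols` order, a query that always writes and reads with the same order produces matching keys, while a checkpoint written under one order and read under another does not. `PositionalKey` is a hypothetical name and the model is a simplification; Spark's state store actually stores keys as `UnsafeRow`s, so a mismatch there is not a clean inequality but, as discussed in #28707, can show up as crashes, exceptions, or wrong answers.

```scala
// Toy model only: Spark's state store encodes keys as UnsafeRow, not as a Seq.
object PositionalKey {
  // Lays out the grouping-column values in the given column order,
  // so the key depends on both the row and the order of groupCols.
  def encode(row: Map[String, Any], groupCols: Seq[String]): Seq[Any] =
    groupCols.map(row)
}
```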








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28830: [SPARK-31990][SS] Preserves the input order of colNames in dropDuplicates

2020-06-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #28830:
URL: https://github.com/apache/spark/pull/28830#discussion_r439907052



##
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##
@@ -2541,7 +2542,20 @@ class Dataset[T] private[sql](
   def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
     val resolver = sparkSession.sessionState.analyzer.resolver
     val allColumns = queryExecution.analyzed.output
-    val groupCols = colNames.distinct.flatMap { (colName: String) =>
+    // SPARK-31990: We must preserve the input order of `colNames` because of the compatibility
+    // issue (the Streaming's state store depends on the `groupCols` order).
+    val orderPreservingDistinctColNames = {
+      val nameSeen = mutable.Set[String]()

Review comment:
   The reported issue claims that Scala's `distinct` function was not enough. That's why I asked `Is there a change?` about the fix for the Spark issue.
   
   As @HeartSaVioR commented (https://github.com/apache/spark/pull/28830#discussion_r439904302), we need different code.








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28830: [SPARK-31990][SS] Preserves the input order of colNames in dropDuplicates

2020-06-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #28830:
URL: https://github.com/apache/spark/pull/28830#discussion_r439907052



##
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##
@@ -2541,7 +2542,20 @@ class Dataset[T] private[sql](
   def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
     val resolver = sparkSession.sessionState.analyzer.resolver
     val allColumns = queryExecution.analyzed.output
-    val groupCols = colNames.distinct.flatMap { (colName: String) =>
+    // SPARK-31990: We must preserve the input order of `colNames` because of the compatibility
+    // issue (the Streaming's state store depends on the `groupCols` order).
+    val orderPreservingDistinctColNames = {
+      val nameSeen = mutable.Set[String]()

Review comment:
   The reported issue claims that Scala's `distinct` function was not enough. That's why I asked `What is the difference to fix Spark issue`.
   
   As @HeartSaVioR commented (https://github.com/apache/spark/pull/28830#discussion_r439904302), we need different code.








[GitHub] [spark] HeartSaVioR commented on a change in pull request #28830: [SPARK-31990][SQL][SS] Preserves the input order of colNames in dropDuplicates

2020-06-14 Thread GitBox


HeartSaVioR commented on a change in pull request #28830:
URL: https://github.com/apache/spark/pull/28830#discussion_r439905543



##
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##
@@ -2541,7 +2542,20 @@ class Dataset[T] private[sql](
   def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
     val resolver = sparkSession.sessionState.analyzer.resolver
     val allColumns = queryExecution.analyzed.output
-    val groupCols = colNames.distinct.flatMap { (colName: String) =>
+    // SPARK-31990: We must preserve the input order of `colNames` because of the compatibility
+    // issue (the Streaming's state store depends on the `groupCols` order).
+    val orderPreservingDistinctColNames = {
+      val nameSeen = mutable.Set[String]()

Review comment:
   Oh, I didn't see @maropu's comment while I was commenting. ;) Thanks for explaining.








[GitHub] [spark] cloud-fan commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default

2020-06-14 Thread GitBox


cloud-fan commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-643873707


   Why are there empty golden files generated in 
`sql/hive/src/test/resources/golden`?






[GitHub] [spark] iRakson commented on pull request #28752: [SPARK-31983] Fix Sorting for duration column and make Status column sortable

2020-06-14 Thread GitBox


iRakson commented on pull request #28752:
URL: https://github.com/apache/spark/pull/28752#issuecomment-643873658


   Thank you, @srowen @sarutak.






[GitHub] [spark] iRakson commented on pull request #28823: [SPARK-31983][WEBUI][3.0] Fix sorting for duration column in structured streaming tab

2020-06-14 Thread GitBox


iRakson commented on pull request #28823:
URL: https://github.com/apache/spark/pull/28823#issuecomment-643873542


   Thank you, @srowen @sarutak :)






[GitHub] [spark] maropu commented on a change in pull request #28807: [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords

2020-06-14 Thread GitBox


maropu commented on a change in pull request #28807:
URL: https://github.com/apache/spark/pull/28807#discussion_r439905202



##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
##
@@ -388,12 +391,24 @@ class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper {
   val reservedKeywordsInAnsiMode = allCandidateKeywords -- nonReservedKeywordsInAnsiMode
 
   test("check # of reserved keywords") {
-    val numReservedKeywords = 78
+    val numReservedKeywords = 75
     assert(reservedKeywordsInAnsiMode.size == numReservedKeywords,
       s"The expected number of reserved keywords is $numReservedKeywords, but " +
         s"${reservedKeywordsInAnsiMode.size} found.")
   }
 
+  test("should follow reserved keywords in SQL:2016") {
+    withTempDir { dir =>
+      val tmpFile = new File(dir, "tmp")
+      val is = Thread.currentThread().getContextClassLoader
+        .getResourceAsStream("ansi-sql-2016-reserved-keywords.txt")
+      Files.copy(is, tmpFile.toPath)
+      val reservedKeywordsInSql2016 = Files.readAllLines(tmpFile.toPath)
+        .asScala.filterNot(_.startsWith("--")).map(_.trim).toSet
+      assert(((reservedKeywordsInAnsiMode -- Set("!")) -- reservedKeywordsInSql2016).isEmpty)

Review comment:
   Yea, will do.
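The new test boils down to parsing a keyword list and a set-difference check: every keyword reserved in ANSI mode (minus the `!` exclusion) must also be reserved in SQL:2016. A minimal sketch of that logic on in-memory lines, with made-up keyword sets and a hypothetical `ReservedKeywordCheck` object; as a small variation on the diff, it trims before filtering, so indented `--` comments and blank lines are dropped too.

```scala
// Sketch of the "reserved keywords must be a subset of SQL:2016" check.
object ReservedKeywordCheck {
  // Trim first, then drop blank lines and `--` comment lines.
  def parseKeywordList(lines: Seq[String]): Set[String] =
    lines.map(_.trim).filter(line => line.nonEmpty && !line.startsWith("--")).toSet

  // Keywords we reserve (after exclusions) that the standard does not reserve.
  // The test in the diff asserts this set is empty.
  def notInStandard(ours: Set[String], standard: Set[String], excluded: Set[String]): Set[String] =
    (ours -- excluded) -- standard
}
```

If the temp-file copy is not needed, the resource stream could also be read directly, for example with `scala.io.Source.fromInputStream(is).getLines()`, and fed to the same parsing step.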








[GitHub] [spark] AmplabJenkins commented on pull request #28828: [SPARK-24634][SS][FOLLOWUP] Rename the variable from "numLateInputs" to "numDropppedRowsByWatermark"

2020-06-14 Thread GitBox


AmplabJenkins commented on pull request #28828:
URL: https://github.com/apache/spark/pull/28828#issuecomment-643873268










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28828: [SPARK-24634][SS][FOLLOWUP] Rename the variable from "numLateInputs" to "numDropppedRowsByWatermark"

2020-06-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28828:
URL: https://github.com/apache/spark/pull/28828#issuecomment-643873268









[GitHub] [spark] maropu commented on a change in pull request #28807: [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords

2020-06-14 Thread GitBox


maropu commented on a change in pull request #28807:
URL: https://github.com/apache/spark/pull/28807#discussion_r439905098



##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
##
@@ -388,12 +391,24 @@ class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper {
   val reservedKeywordsInAnsiMode = allCandidateKeywords -- nonReservedKeywordsInAnsiMode
 
   test("check # of reserved keywords") {
-    val numReservedKeywords = 78
+    val numReservedKeywords = 75
     assert(reservedKeywordsInAnsiMode.size == numReservedKeywords,
       s"The expected number of reserved keywords is $numReservedKeywords, but " +
         s"${reservedKeywordsInAnsiMode.size} found.")
   }
 
+  test("should follow reserved keywords in SQL:2016") {

Review comment:
   Looks clearer, okay, I'll update. Thanks!







[GitHub] [spark] HeartSaVioR commented on a change in pull request #28830: [SPARK-31990][SQL] Preserves the input order of colNames in dropDuplicates

2020-06-14 Thread GitBox


HeartSaVioR commented on a change in pull request #28830:
URL: https://github.com/apache/spark/pull/28830#discussion_r439904837



##
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##
@@ -2541,7 +2542,20 @@ class Dataset[T] private[sql](
   def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
     val resolver = sparkSession.sessionState.analyzer.resolver
     val allColumns = queryExecution.analyzed.output
-    val groupCols = colNames.distinct.flatMap { (colName: String) =>
+    // SPARK-31990: We must preserve the input order of `colNames` because of the compatibility
+    // issue (the Streaming's state store depends on the `groupCols` order).
+    val orderPreservingDistinctColNames = {
+      val nameSeen = mutable.Set[String]()

Review comment:
   So consider this a manual implementation of `distinct`, so that we aren't affected even by changes in Scala itself.
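The point here is that an explicit seen-set states the contract in the code itself: the first occurrence wins and the input order is kept. The code then does not depend on how `distinct` or `Set` iteration behaves in a given Scala version. Below is a minimal standalone sketch of that idea, in the spirit of the `nameSeen` set in the diff. It is not Spark's actual code; the object and method names are hypothetical. It also assumes plain case-sensitive string comparison, whereas Spark's analyzer `resolver` may match names case-insensitively.

```scala
import scala.collection.mutable

// Hypothetical sketch of an order-preserving distinct over column names.
object OrderPreservingDistinct {
  // Keeps the first occurrence of each name, in input order, using an explicit
  // seen-set instead of relying on Seq.distinct or Set iteration order.
  def orderPreservingDistinct(colNames: Seq[String]): Seq[String] = {
    val nameSeen = mutable.Set[String]()
    val result = mutable.ArrayBuffer[String]()
    colNames.foreach { name =>
      // mutable.Set#add returns true only if the name was not already present.
      if (nameSeen.add(name)) result += name
    }
    result.toSeq
  }

  def main(args: Array[String]): Unit = {
    val cols = Seq("key", "value", "topic", "key", "partition", "value")
    println(orderPreservingDistinct(cols))
  }
}
```

By contrast, a round trip through `Set` (e.g. `colNames.toSet.toSeq`) makes no ordering guarantee. It cannot be used when the resulting order is persisted, as it is in the streaming state store here.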







[GitHub] [spark] SparkQA removed a comment on pull request #28828: [SPARK-24634][SS][FOLLOWUP] Rename the variable from "numLateInputs" to "numDropppedRowsByWatermark"

2020-06-14 Thread GitBox


SparkQA removed a comment on pull request #28828:
URL: https://github.com/apache/spark/pull/28828#issuecomment-643827509


   **[Test build #124015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124015/testReport)** for PR 28828 at commit [`ca3b3de`](https://github.com/apache/spark/commit/ca3b3de653a92090db33ca8282eea18b75ff2420).




