[GitHub] [spark] HeartSaVioR commented on pull request #24173: [SPARK-27237][SS] Introduce State schema validation among query restart
HeartSaVioR commented on pull request #24173: URL: https://github.com/apache/spark/pull/24173#issuecomment-675299208 retest this, please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on a change in pull request #29256: [SPARK-32456][SS] Check the Distinct by assuming it as Aggregate for Structured Streaming
HeartSaVioR commented on a change in pull request #29256: URL: https://github.com/apache/spark/pull/29256#discussion_r471960294

## File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala ##
@@ -1106,6 +1107,54 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter with Logging wi
     }
   }
 
+  test("union in streaming query of append mode without watermark") {
+    val inputData1 = MemoryStream[Int]
+    val inputData2 = MemoryStream[Int]
+    withTempView("s1", "s2") {
+      inputData1.toDF().createOrReplaceTempView("s1")
+      inputData2.toDF().createOrReplaceTempView("s2")
+      val unioned = spark.sql(
+        "select s1.value from s1 union select s2.value from s2")
+      checkExceptionMessage(unioned)
+    }
+  }
+
+  test("distinct in streaming query of append mode without watermark") {
+    val inputData = MemoryStream[Int]
+    withTempView("deduptest") {
+      inputData.toDF().toDF("value").createOrReplaceTempView("deduptest")
+      val distinct = spark.sql("select distinct value from deduptest")
+      checkExceptionMessage(distinct)
+    }
+  }
+
+  test("distinct in streaming query of complete mode") {
+    val inputData = MemoryStream[Int]
+    withTempView("deduptest") {
+      inputData.toDF().toDF("value").createOrReplaceTempView("deduptest")
+      val distinct = spark.sql("select distinct value from deduptest")
+
+      testStream(distinct, Complete)(
+        AddData(inputData, 1, 2, 3, 3, 4),
+        CheckAnswer(Row(1), Row(2), Row(3), Row(4))

Review comment:
As an alternative, I added a note to the SS guide doc in #29461. I'm not sure that is enough to spare us complaints about improper usage, so I marked the PR as a draft. I think it's better to collect more opinions on this.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29461: [DO-NOT-MERGE][SPARK-32456][SS][FOLLOWUP] Update doc to note about using SQL statement with streaming Dataset
AmplabJenkins removed a comment on pull request #29461: URL: https://github.com/apache/spark/pull/29461#issuecomment-675298637
[GitHub] [spark] AmplabJenkins removed a comment on pull request #25965: [SPARK-26425][SS] Add more constraint checks to avoid checkpoint corruption
AmplabJenkins removed a comment on pull request #25965: URL: https://github.com/apache/spark/pull/25965#issuecomment-675297890 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127538/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29461: [DO-NOT-MERGE][SPARK-32456][SS][FOLLOWUP] Update doc to note about using SQL statement with streaming Dataset
AmplabJenkins commented on pull request #29461: URL: https://github.com/apache/spark/pull/29461#issuecomment-675298637
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
AmplabJenkins removed a comment on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-675298265 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue
AmplabJenkins removed a comment on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-675298132 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127530/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue
AmplabJenkins removed a comment on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-675298123 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29360: [SPARK-32542][SQL] Add an optimizer rule to split an Expand into multiple Expands for aggregates
AmplabJenkins removed a comment on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-675297814 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
AmplabJenkins removed a comment on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-675297409 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127535/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27333: [SPARK-29438][SS][FOLLOWUP] Add regression tests for Streaming Aggregation and flatMapGroupsWithState
AmplabJenkins removed a comment on pull request #27333: URL: https://github.com/apache/spark/pull/27333#issuecomment-675297631 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127536/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29360: [SPARK-32542][SQL] Add an optimizer rule to split an Expand into multiple Expands for aggregates
AmplabJenkins commented on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-675297814
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29453: [SPARK-31999][SQL][FOLLOWUP] Adds negative test cases with typos
AmplabJenkins removed a comment on pull request #29453: URL: https://github.com/apache/spark/pull/29453#issuecomment-675297371 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127542/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29457: [SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis
AmplabJenkins removed a comment on pull request #29457: URL: https://github.com/apache/spark/pull/29457#issuecomment-675297236 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127545/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-675297983 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore
AmplabJenkins removed a comment on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-675297424 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127537/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27333: [SPARK-29438][SS][FOLLOWUP] Add regression tests for Streaming Aggregation and flatMapGroupsWithState
AmplabJenkins removed a comment on pull request #27333: URL: https://github.com/apache/spark/pull/27333#issuecomment-675297620 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
AmplabJenkins removed a comment on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-675297618 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127532/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29460: [DO-NOT-MERGE][SPARK-32249][3.0] Run Github Actions builds in other branches as well
AmplabJenkins removed a comment on pull request #29460: URL: https://github.com/apache/spark/pull/29460#issuecomment-675297649 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127543/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29461: [DO-NOT-MERGE][SPARK-32456][SS][FOLLOWUP] Update doc to note about using SQL statement with streaming Dataset
SparkQA commented on pull request #29461: URL: https://github.com/apache/spark/pull/29461#issuecomment-675297860 **[Test build #127546 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127546/testReport)** for PR 29461 at commit [`f8d1416`](https://github.com/apache/spark/commit/f8d1416315cdfded655d860281b807e90f84c002).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #25965: [SPARK-26425][SS] Add more constraint checks to avoid checkpoint corruption
AmplabJenkins removed a comment on pull request #25965: URL: https://github.com/apache/spark/pull/25965#issuecomment-675297882 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-675297983
[GitHub] [spark] AmplabJenkins commented on pull request #25965: [SPARK-26425][SS] Add more constraint checks to avoid checkpoint corruption
AmplabJenkins commented on pull request #25965: URL: https://github.com/apache/spark/pull/25965#issuecomment-675297882
[GitHub] [spark] SparkQA removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
SparkQA removed a comment on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-675252986 **[Test build #127534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127534/testReport)** for PR 27694 at commit [`2559928`](https://github.com/apache/spark/commit/2559928be2d7981c2c1c2d9b6111c4449e721310).
[GitHub] [spark] AmplabJenkins commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue
AmplabJenkins commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-675298123
[GitHub] [spark] SparkQA removed a comment on pull request #29453: [SPARK-31999][SQL][FOLLOWUP] Adds negative test cases with typos
SparkQA removed a comment on pull request #29453: URL: https://github.com/apache/spark/pull/29453#issuecomment-675275574 **[Test build #127542 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127542/testReport)** for PR 29453 at commit [`69b45be`](https://github.com/apache/spark/commit/69b45bed5e12064d19c4edbac94c3cdbef63f5ff).
[GitHub] [spark] SparkQA removed a comment on pull request #29360: [SPARK-32542][SQL] Add an optimizer rule to split an Expand into multiple Expands for aggregates
SparkQA removed a comment on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-675233817 **[Test build #127526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127526/testReport)** for PR 29360 at commit [`87b9a82`](https://github.com/apache/spark/commit/87b9a825359168eb07fe5f9791e1dc26ce138046).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files
AmplabJenkins removed a comment on pull request #28422: URL: https://github.com/apache/spark/pull/28422#issuecomment-675297540 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127531/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29460: [DO-NOT-MERGE][SPARK-32249][3.0] Run Github Actions builds in other branches as well
AmplabJenkins commented on pull request #29460: URL: https://github.com/apache/spark/pull/29460#issuecomment-675297637
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29456: [SPARK-32647][INFRA] Report SparkR test results with JUnit reporter
AmplabJenkins removed a comment on pull request #29456: URL: https://github.com/apache/spark/pull/29456#issuecomment-675297225 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127544/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29460: [DO-NOT-MERGE][SPARK-32249][3.0] Run Github Actions builds in other branches as well
SparkQA removed a comment on pull request #29460: URL: https://github.com/apache/spark/pull/29460#issuecomment-675273071
[GitHub] [spark] SparkQA removed a comment on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue
SparkQA removed a comment on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-675250393 **[Test build #127530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127530/testReport)** for PR 28904 at commit [`e16ebe4`](https://github.com/apache/spark/commit/e16ebe4e530d3c44bb0ba39981c4ec2287c3589e).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29460: [DO-NOT-MERGE][SPARK-32249][3.0] Run Github Actions builds in other branches as well
AmplabJenkins removed a comment on pull request #29460: URL: https://github.com/apache/spark/pull/29460#issuecomment-675297293
[GitHub] [spark] SparkQA removed a comment on pull request #29456: [SPARK-32647][INFRA] Report SparkR test results with JUnit reporter
SparkQA removed a comment on pull request #29456: URL: https://github.com/apache/spark/pull/29456#issuecomment-675291618 **[Test build #127544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127544/testReport)** for PR 29456 at commit [`603268e`](https://github.com/apache/spark/commit/603268e6598e538946102952aeb46b1874d54e38).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files
AmplabJenkins removed a comment on pull request #28422: URL: https://github.com/apache/spark/pull/28422#issuecomment-675297527 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
AmplabJenkins commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-675298265
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
AmplabJenkins removed a comment on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-675297402 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore
SparkQA removed a comment on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-675253028 **[Test build #127537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127537/testReport)** for PR 26935 at commit [`cabd38f`](https://github.com/apache/spark/commit/cabd38f32622b61c73bb3f1ca6c6390df7e89c04).
[GitHub] [spark] SparkQA removed a comment on pull request #25965: [SPARK-26425][SS] Add more constraint checks to avoid checkpoint corruption
SparkQA removed a comment on pull request #25965: URL: https://github.com/apache/spark/pull/25965#issuecomment-675253090 **[Test build #127538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127538/testReport)** for PR 25965 at commit [`d15acef`](https://github.com/apache/spark/commit/d15acef9698528239dc8a5b92d55c950cdf602b2).
[GitHub] [spark] SparkQA removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
SparkQA removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-675227409 **[Test build #127524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127524/testReport)** for PR 28841 at commit [`263dd2a`](https://github.com/apache/spark/commit/263dd2a58ee990600aae3c40ea3eb56368a9c48d).
[GitHub] [spark] SparkQA removed a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
SparkQA removed a comment on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-675252940 **[Test build #127535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127535/testReport)** for PR 27649 at commit [`6406e36`](https://github.com/apache/spark/commit/6406e36eb34377983aaf113495ca16b1553317a3).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29453: [SPARK-31999][SQL][FOLLOWUP] Adds negative test cases with typos
AmplabJenkins removed a comment on pull request #29453: URL: https://github.com/apache/spark/pull/29453#issuecomment-675297359 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29456: [SPARK-32647][INFRA] Report SparkR test results with JUnit reporter
AmplabJenkins removed a comment on pull request #29456: URL: https://github.com/apache/spark/pull/29456#issuecomment-675297211 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
SparkQA removed a comment on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-675250435 **[Test build #127532 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127532/testReport)** for PR 28363 at commit [`b648156`](https://github.com/apache/spark/commit/b64815622bb4e8cd8b474cb2983f2c9b78ed9342).
[GitHub] [spark] SparkQA removed a comment on pull request #29457: [SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis
SparkQA removed a comment on pull request #29457: URL: https://github.com/apache/spark/pull/29457#issuecomment-675294654 **[Test build #127545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127545/testReport)** for PR 29457 at commit [`090747d`](https://github.com/apache/spark/commit/090747d4a9d1540c4b65e45f960c926a23d76b84).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29457: [SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis
AmplabJenkins removed a comment on pull request #29457: URL: https://github.com/apache/spark/pull/29457#issuecomment-675297229 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #27333: [SPARK-29438][SS][FOLLOWUP] Add regression tests for Streaming Aggregation and flatMapGroupsWithState
AmplabJenkins commented on pull request #27333: URL: https://github.com/apache/spark/pull/27333#issuecomment-675297620
[GitHub] [spark] HeartSaVioR commented on pull request #29461: [DO-NOT-MERGE][SPARK-32456][SS][FOLLOWUP] Update doc to note about using SQL statement with streaming Dataset
HeartSaVioR commented on pull request #29461: URL: https://github.com/apache/spark/pull/29461#issuecomment-675297309 I'm marking this as draft because I'd like to see which approach is preferred: just documenting a warning for end users (this PR), or collecting and preventing some error-prone operations for streaming workloads (proposed in https://github.com/apache/spark/pull/29256#discussion_r471945148). If we're fine with covering this in the guide doc, this PR can be converted to "ready-to-review".
[GitHub] [spark] AmplabJenkins commented on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files
AmplabJenkins commented on pull request #28422: URL: https://github.com/apache/spark/pull/28422#issuecomment-675297527
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
AmplabJenkins removed a comment on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-675297607 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
AmplabJenkins commented on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-675297607
[GitHub] [spark] AmplabJenkins commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore
AmplabJenkins commented on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-675297411
[GitHub] [spark] SparkQA removed a comment on pull request #27333: [SPARK-29438][SS][FOLLOWUP] Add regression tests for Streaming Aggregation and flatMapGroupsWithState
SparkQA removed a comment on pull request #27333: URL: https://github.com/apache/spark/pull/27333#issuecomment-675252970 **[Test build #127536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127536/testReport)** for PR 27333 at commit [`466363e`](https://github.com/apache/spark/commit/466363edb22ea83a81e21a72f1b983dc7b5a733e).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore
AmplabJenkins removed a comment on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-675297411 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29457: [SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis
AmplabJenkins commented on pull request #29457: URL: https://github.com/apache/spark/pull/29457#issuecomment-675297229
[GitHub] [spark] SparkQA commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
SparkQA commented on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-675297170 **[Test build #127532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127532/testReport)** for PR 28363 at commit [`b648156`](https://github.com/apache/spark/commit/b64815622bb4e8cd8b474cb2983f2c9b78ed9342). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29453: [SPARK-31999][SQL][FOLLOWUP] Adds negative test cases with typos
SparkQA commented on pull request #29453: URL: https://github.com/apache/spark/pull/29453#issuecomment-675297169 **[Test build #127542 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127542/testReport)** for PR 29453 at commit [`69b45be`](https://github.com/apache/spark/commit/69b45bed5e12064d19c4edbac94c3cdbef63f5ff). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files
SparkQA removed a comment on pull request #28422: URL: https://github.com/apache/spark/pull/28422#issuecomment-675250414 **[Test build #127531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127531/testReport)** for PR 28422 at commit [`06ee53d`](https://github.com/apache/spark/commit/06ee53d9dee60756be8563d584d589e198d670f1).
[GitHub] [spark] SparkQA commented on pull request #29460: [DO-NOT-MERGE][SPARK-32249][3.0] Run Github Actions builds in other branches as well
SparkQA commented on pull request #29460: URL: https://github.com/apache/spark/pull/29460#issuecomment-675297152
[GitHub] [spark] AmplabJenkins commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
AmplabJenkins commented on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-675297402
[GitHub] [spark] SparkQA commented on pull request #27333: [SPARK-29438][SS][FOLLOWUP] Add regression tests for Streaming Aggregation and flatMapGroupsWithState
SparkQA commented on pull request #27333: URL: https://github.com/apache/spark/pull/27333#issuecomment-675297154 **[Test build #127536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127536/testReport)** for PR 27333 at commit [`466363e`](https://github.com/apache/spark/commit/466363edb22ea83a81e21a72f1b983dc7b5a733e). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #25965: [SPARK-26425][SS] Add more constraint checks to avoid checkpoint corruption
SparkQA commented on pull request #25965: URL: https://github.com/apache/spark/pull/25965#issuecomment-675297180 **[Test build #127538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127538/testReport)** for PR 25965 at commit [`d15acef`](https://github.com/apache/spark/commit/d15acef9698528239dc8a5b92d55c950cdf602b2). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29453: [SPARK-31999][SQL][FOLLOWUP] Adds negative test cases with typos
AmplabJenkins commented on pull request #29453: URL: https://github.com/apache/spark/pull/29453#issuecomment-675297359
[GitHub] [spark] SparkQA commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
SparkQA commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-675297182 **[Test build #127534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127534/testReport)** for PR 27694 at commit [`2559928`](https://github.com/apache/spark/commit/2559928be2d7981c2c1c2d9b6111c4449e721310). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29456: [SPARK-32647][INFRA] Report SparkR test results with JUnit reporter
AmplabJenkins commented on pull request #29456: URL: https://github.com/apache/spark/pull/29456#issuecomment-675297211
[GitHub] [spark] SparkQA commented on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files
SparkQA commented on pull request #28422: URL: https://github.com/apache/spark/pull/28422#issuecomment-675297171 **[Test build #127531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127531/testReport)** for PR 28422 at commit [`06ee53d`](https://github.com/apache/spark/commit/06ee53d9dee60756be8563d584d589e198d670f1). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
SparkQA commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-675297165 **[Test build #127524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127524/testReport)** for PR 28841 at commit [`263dd2a`](https://github.com/apache/spark/commit/263dd2a58ee990600aae3c40ea3eb56368a9c48d). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29360: [SPARK-32542][SQL] Add an optimizer rule to split an Expand into multiple Expands for aggregates
SparkQA commented on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-675297167 **[Test build #127526 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127526/testReport)** for PR 29360 at commit [`87b9a82`](https://github.com/apache/spark/commit/87b9a825359168eb07fe5f9791e1dc26ce138046). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29457: [SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis
SparkQA commented on pull request #29457: URL: https://github.com/apache/spark/pull/29457#issuecomment-675297166 **[Test build #127545 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127545/testReport)** for PR 29457 at commit [`090747d`](https://github.com/apache/spark/commit/090747d4a9d1540c4b65e45f960c926a23d76b84). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29460: [DO-NOT-MERGE][SPARK-32249][3.0] Run Github Actions builds in other branches as well
AmplabJenkins commented on pull request #29460: URL: https://github.com/apache/spark/pull/29460#issuecomment-675297293
[GitHub] [spark] SparkQA commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
SparkQA commented on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-675297150 **[Test build #127535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127535/testReport)** for PR 27649 at commit [`6406e36`](https://github.com/apache/spark/commit/6406e36eb34377983aaf113495ca16b1553317a3). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore
SparkQA commented on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-675297173 **[Test build #127537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127537/testReport)** for PR 26935 at commit [`cabd38f`](https://github.com/apache/spark/commit/cabd38f32622b61c73bb3f1ca6c6390df7e89c04). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` class HDFSBackedReadOnlyStateStore(val version: Long, map: MapType)` * `abstract class ReadOnlyStateStore extends StateStore ` * `class WrappedReadOnlyStateStore(store: StateStore) extends ReadOnlyStateStore `
[GitHub] [spark] SparkQA commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue
SparkQA commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-675297184 **[Test build #127530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127530/testReport)** for PR 28904 at commit [`e16ebe4`](https://github.com/apache/spark/commit/e16ebe4e530d3c44bb0ba39981c4ec2287c3589e). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29456: [SPARK-32647][INFRA] Report SparkR test results with JUnit reporter
SparkQA commented on pull request #29456: URL: https://github.com/apache/spark/pull/29456#issuecomment-675297168 **[Test build #127544 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127544/testReport)** for PR 29456 at commit [`603268e`](https://github.com/apache/spark/commit/603268e6598e538946102952aeb46b1874d54e38). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29457: [SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis
AmplabJenkins removed a comment on pull request #29457: URL: https://github.com/apache/spark/pull/29457#issuecomment-675295113
[GitHub] [spark] HeartSaVioR opened a new pull request #29461: [DO-NOT-MERGE][SPARK-32456][SS][FOLLOWUP] Update doc to note about using SQL statement with streaming Dataset
HeartSaVioR opened a new pull request #29461: URL: https://github.com/apache/spark/pull/29461 ### What changes were proposed in this pull request? This patch proposes to update the docs (both the SS guide and the Dataset `dropDuplicates` method doc) to add a note about using SQL statements with a streaming Dataset. Once end users create a temp view based on a streaming Dataset, they tend to stop thinking about "streaming" and do whatever they would do with a batch query. In many cases this works, but not smoothly when streaming aggregation is involved: they still need to take care of maintaining the state store. ### Why are the changes needed? Although SPARK-32456 fixed the weird error message, as a side effect some operations became enabled on streaming workloads via SQL statements, which is error-prone if end users don't realize what they're doing. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Only doc change.
[GitHub] [spark] AmplabJenkins commented on pull request #29457: [SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis
AmplabJenkins commented on pull request #29457: URL: https://github.com/apache/spark/pull/29457#issuecomment-675295113
[GitHub] [spark] SparkQA commented on pull request #29457: [SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis
SparkQA commented on pull request #29457: URL: https://github.com/apache/spark/pull/29457#issuecomment-675294654 **[Test build #127545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127545/testReport)** for PR 29457 at commit [`090747d`](https://github.com/apache/spark/commit/090747d4a9d1540c4b65e45f960c926a23d76b84).
[GitHub] [spark] gengliangwang commented on a change in pull request #29437: [SPARK-32621][SQL] 'path' option can cause issues while inferring schema in CSV/JSON datasources
gengliangwang commented on a change in pull request #29437: URL: https://github.com/apache/spark/pull/29437#discussion_r471954974
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala
## @@ -34,13 +35,21 @@ import org.apache.spark.sql.util.SchemaUtils
 abstract class FileTable(
     sparkSession: SparkSession,
-    options: CaseInsensitiveStringMap,
+    originalOptions: CaseInsensitiveStringMap,
     paths: Seq[String],
     userSpecifiedSchema: Option[StructType])
   extends Table with SupportsRead with SupportsWrite {
   import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+  // Options without path-related options from `originalOptions`.
+  protected final lazy val options: CaseInsensitiveStringMap = {
+    val caseInsensitiveMap = CaseInsensitiveMap(originalOptions.asCaseSensitiveMap.asScala.toMap)
+    val caseInsensitiveMapWithoutPaths = caseInsensitiveMap - "paths" - "path"
+    new CaseInsensitiveStringMap(
+      caseInsensitiveMapWithoutPaths.asInstanceOf[CaseInsensitiveMap[String]].originalMap.asJava)
+  }
Review comment: There was a time when the `FileIndex` was created in `FileDataSourceV2`, which is why the `getPaths` method was placed in `FileDataSourceV2`. In the current code, it seems fine to move the method.
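The new `options` in the quoted diff drops the `path` and `paths` keys case-insensitively. The remaining entries keep their original key casing. Below is a minimal sketch of the same filtering on a plain Scala `Map`, assuming no Spark classes are available; the real code uses `CaseInsensitiveMap` and `CaseInsensitiveStringMap`. `PathOptions` and `withoutPaths` are hypothetical names.

```scala
import java.util.Locale

// Hypothetical stand-in for the FileTable change: plain Map, no Spark classes.
object PathOptions {
  // Path-related keys, compared case-insensitively.
  private val PathKeys = Set("path", "paths")

  // Drops path-related keys while keeping the original casing of every other key.
  def withoutPaths(options: Map[String, String]): Map[String, String] =
    options.filterNot { case (key, _) => PathKeys.contains(key.toLowerCase(Locale.ROOT)) }
}
```

Lowercasing with `Locale.ROOT` avoids locale-sensitive surprises when comparing option keys, such as the Turkish dotless i.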
[GitHub] [spark] cloud-fan commented on a change in pull request #29452: [SPARK-32643][CORE] Consolidate state decommissioning in the TaskSchedulerImpl realm
cloud-fan commented on a change in pull request #29452: URL: https://github.com/apache/spark/pull/29452#discussion_r471954231
## File path: core/src/main/scala/org/apache/spark/scheduler/ExecutorDecommissionInfo.scala
## @@ -18,11 +18,21 @@ package org.apache.spark.scheduler
 /**
- * Provides more detail when an executor is being decommissioned.
+ * Message providing more detail when an executor is being decommissioned.
  * @param message Human readable reason for why the decommissioning is happening.
  * @param isHostDecommissioned Whether the host (aka the `node` or `worker` in other places) is
  *   being decommissioned too. Used to infer if the shuffle data might
  *   be lost even if the external shuffle service is enabled.
  */
 private[spark] case class ExecutorDecommissionInfo(message: String, isHostDecommissioned: Boolean)
+
+/**
+ * State related to decommissioning that is kept by the TaskSchedulerImpl. This state is derived
+ * from the info message above but it is kept distinct to allow the state to evolve independently
+ * from the message.
+ */
+case class ExecutorDecommissionState(message: String,
Review comment: why not `(info: ExecutorDecommissionInfo, tsMillis: Long)`?
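The review question suggests composing the state from the existing message instead of repeating its fields. Below is a hypothetical sketch of that shape, reusing the `ExecutorDecommissionInfo` fields from the diff. The `private[spark]` modifier is dropped so the snippet compiles on its own. The forwarding accessors are an illustrative assumption, not part of either proposal.

```scala
object DecommissionSketch {
  // Same fields as the case class in the diff (visibility modifier dropped for a standalone snippet).
  final case class ExecutorDecommissionInfo(message: String, isHostDecommissioned: Boolean)

  // Hypothetical shape following the review suggestion: wrap the message, add a timestamp.
  final case class ExecutorDecommissionState(info: ExecutorDecommissionInfo, tsMillis: Long) {
    // Illustrative forwarding accessors so call sites need not reach through `info`.
    def message: String = info.message
    def isHostDecommissioned: Boolean = info.isHostDecommissioned
  }
}
```

The trade-off is the one raised by the doc comment in the diff. Wrapping `info` ties the state's shape to the message, whereas a separate case class lets the two evolve independently.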
[GitHub] [spark] AmplabJenkins commented on pull request #29456: [SPARK-32647][INFRA] Report SparkR test results with JUnit reporter
AmplabJenkins commented on pull request #29456: URL: https://github.com/apache/spark/pull/29456#issuecomment-675292135
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29456: [SPARK-32647][INFRA] Report SparkR test results with JUnit reporter
AmplabJenkins removed a comment on pull request #29456: URL: https://github.com/apache/spark/pull/29456#issuecomment-675292135
[GitHub] [spark] SparkQA commented on pull request #29456: [SPARK-32647][INFRA] Report SparkR test results with JUnit reporter
SparkQA commented on pull request #29456: URL: https://github.com/apache/spark/pull/29456#issuecomment-675291618 **[Test build #127544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127544/testReport)** for PR 29456 at commit [`603268e`](https://github.com/apache/spark/commit/603268e6598e538946102952aeb46b1874d54e38).
[GitHub] [spark] cloud-fan edited a comment on pull request #29395: [3.0][SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources
cloud-fan edited a comment on pull request #29395: URL: https://github.com/apache/spark/pull/29395#issuecomment-675290468 thanks, merging to 3.0!
[GitHub] [spark] cloud-fan closed pull request #29395: [3.0][SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources
cloud-fan closed pull request #29395: URL: https://github.com/apache/spark/pull/29395
[GitHub] [spark] cloud-fan commented on pull request #29395: [3.0][SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources
cloud-fan commented on pull request #29395: URL: https://github.com/apache/spark/pull/29395#issuecomment-675290468 thanks, merging to master!
[GitHub] [spark] cloud-fan closed pull request #29422: [SPARK-32613][CORE] Fix regressions in DecommissionWorkerSuite
cloud-fan closed pull request #29422: URL: https://github.com/apache/spark/pull/29422
[GitHub] [spark] cloud-fan commented on pull request #29422: [SPARK-32613][CORE] Fix regressions in DecommissionWorkerSuite
cloud-fan commented on pull request #29422: URL: https://github.com/apache/spark/pull/29422#issuecomment-675289413 thanks, merging to master!
[GitHub] [spark] cloud-fan commented on pull request #29322: [SPARK-32511][SQL] Add dropFields method to Column class
cloud-fan commented on pull request #29322: URL: https://github.com/apache/spark/pull/29322#issuecomment-675287699 reopened
[GitHub] [spark] cloud-fan commented on a change in pull request #29458: [SPARK-32018][FOLLOWUP][Doc] Add migration guide for decimal value overflow in sum aggregation
cloud-fan commented on a change in pull request #29458: URL: https://github.com/apache/spark/pull/29458#discussion_r471947639
## File path: docs/sql-migration-guide.md
## @@ -36,6 +36,10 @@ license: |
   - In Spark 3.1, NULL elements of structures, arrays and maps are converted to "null" in casting them to strings. In Spark 3.0 or earlier, NULL elements are converted to empty strings. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.castComplexTypesToString.enabled` to `true`.
+  - In Spark 3.1, when `spark.sql.ansi.enabled` is false, sum aggregation of decimal type column always returns `null` on decimal value overflow. In Spark 3.0 or earlier, when `spark.sql.ansi.enabled` is false and decimal value overflow happens in sum aggregation of decimal type column:
+    - If it is hash aggregation with `group by` clause, a runtime exception is thrown.
Review comment: We can use "default mode". I don't see a difference between "may fail at runtime" and "may return null". They are mutually exclusive.
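The migration-guide text describes the Spark 3.1 behavior, and a simplified model can illustrate it. With `spark.sql.ansi.enabled` false, an overflowing decimal sum yields `null`, modeled here as `None`. With it true, this sketch assumes the query fails, modeled as an `ArithmeticException`. The `maxPrecision` parameter stands in for the result type's precision. `DecimalSumSketch` is a hypothetical name, and this is not Spark's aggregation code.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical model of decimal-sum overflow handling; not Spark's implementation.
object DecimalSumSketch {
  // `maxPrecision` stands in for the precision of the sum's result decimal type.
  def sum(values: Seq[JBigDecimal], maxPrecision: Int, ansiEnabled: Boolean): Option[JBigDecimal] = {
    // java.math.BigDecimal.add is exact, so no rounding hides an overflow.
    val total = values.foldLeft(JBigDecimal.ZERO)((acc, v) => acc.add(v))
    if (total.precision <= maxPrecision) Some(total)
    else if (ansiEnabled)
      throw new ArithmeticException(
        s"Decimal precision ${total.precision} exceeds max precision $maxPrecision")
    else None // documented Spark 3.1 default (ANSI off): overflow yields null
  }
}
```

The Spark 3.0 non-ANSI behavior depended on the aggregation path, as the quoted guide text begins to enumerate. This model therefore deliberately leaves it out.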
[GitHub] [spark] HeartSaVioR commented on a change in pull request #29256: [SPARK-32456][SS] Check the Distinct by assuming it as Aggregate for Structured Streaming
HeartSaVioR commented on a change in pull request #29256: URL: https://github.com/apache/spark/pull/29256#discussion_r471945148
## File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala
## @@ -1106,6 +1107,54 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter with Logging wi
     }
   }
+  test("union in streaming query of append mode without watermark") {
+    val inputData1 = MemoryStream[Int]
+    val inputData2 = MemoryStream[Int]
+    withTempView("s1", "s2") {
+      inputData1.toDF().createOrReplaceTempView("s1")
+      inputData2.toDF().createOrReplaceTempView("s2")
+      val unioned = spark.sql(
+        "select s1.value from s1 union select s2.value from s2")
+      checkExceptionMessage(unioned)
+    }
+  }
+
+  test("distinct in streaming query of append mode without watermark") {
+    val inputData = MemoryStream[Int]
+    withTempView("deduptest") {
+      inputData.toDF().toDF("value").createOrReplaceTempView("deduptest")
+      val distinct = spark.sql("select distinct value from deduptest")
+      checkExceptionMessage(distinct)
+    }
+  }
+
+  test("distinct in streaming query of complete mode") {
+    val inputData = MemoryStream[Int]
+    withTempView("deduptest") {
+      inputData.toDF().toDF("value").createOrReplaceTempView("deduptest")
+      val distinct = spark.sql("select distinct value from deduptest")
+
+      testStream(distinct, Complete)(
+        AddData(inputData, 1, 2, 3, 3, 4),
+        CheckAnswer(Row(1), Row(2), Row(3), Row(4))
Review comment: What I am suggesting is to wait and hear which operations we have restricted in SS and why. If the reasons make sense, we should ban those operations in SQL statements as well, not only distinct.
[GitHub] [spark] HyukjinKwon commented on pull request #29459: [MINOR][INFRA] Rename master.yml to build_and_test.yml
HyukjinKwon commented on pull request #29459: URL: https://github.com/apache/spark/pull/29459#issuecomment-675283818 Thanks guys. Let me merge this after I cherry-pick GitHub Actions to other branches (at https://github.com/apache/spark/pull/29460)
[GitHub] [spark] HyukjinKwon commented on pull request #29456: [SPARK-32647][INFRA] Report SparkR test results with JUnit reporter
HyukjinKwon commented on pull request #29456: URL: https://github.com/apache/spark/pull/29456#issuecomment-675283772 Thanks guys. Let me merge this after I cherry-pick GitHub Actions to other branches (at https://github.com/apache/spark/pull/29460)
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29459: [MINOR][INFRA] Rename master.yml to build_and_test.yml
AmplabJenkins removed a comment on pull request #29459: URL: https://github.com/apache/spark/pull/29459#issuecomment-675280417 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127529/ Test FAILed.
[GitHub] [spark] viirya commented on a change in pull request #29437: [SPARK-32621][SQL] 'path' option can cause issues while inferring schema in CSV/JSON datasources
viirya commented on a change in pull request #29437: URL: https://github.com/apache/spark/pull/29437#discussion_r471933928
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala
## @@ -34,13 +35,21 @@ import org.apache.spark.sql.util.SchemaUtils
 abstract class FileTable(
     sparkSession: SparkSession,
-    options: CaseInsensitiveStringMap,
+    originalOptions: CaseInsensitiveStringMap,
Review comment: Is there any chance that we use `path`-related options in `FileTable`? If not, can we just remove them when creating `FileTable`? It feels a bit strange that we pass some options to it but also ask it to remove a few of them.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29459: [MINOR][INFRA] Rename master.yml to build_and_test.yml
AmplabJenkins removed a comment on pull request #29459: URL: https://github.com/apache/spark/pull/29459#issuecomment-675280413 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29459: [MINOR][INFRA] Rename master.yml to build_and_test.yml
AmplabJenkins commented on pull request #29459: URL: https://github.com/apache/spark/pull/29459#issuecomment-675280413
[GitHub] [spark] SparkQA removed a comment on pull request #29459: [MINOR][INFRA] Rename master.yml to build_and_test.yml
SparkQA removed a comment on pull request #29459: URL: https://github.com/apache/spark/pull/29459#issuecomment-675238599 **[Test build #127529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127529/testReport)** for PR 29459 at commit [`3bd540f`](https://github.com/apache/spark/commit/3bd540f529970130ede596a78097b24375972841).
[GitHub] [spark] SparkQA commented on pull request #29459: [MINOR][INFRA] Rename master.yml to build_and_test.yml
SparkQA commented on pull request #29459: URL: https://github.com/apache/spark/pull/29459#issuecomment-675279846 **[Test build #127529 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127529/testReport)** for PR 29459 at commit [`3bd540f`](https://github.com/apache/spark/commit/3bd540f529970130ede596a78097b24375972841).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #29256: [SPARK-32456][SS] Check the Distinct by assuming it as Aggregate for Structured Streaming
cloud-fan commented on a change in pull request #29256: URL: https://github.com/apache/spark/pull/29256#discussion_r471939568
## File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala
## @@ -1106,6 +1107,54 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter with Logging wi
     }
   }
+  test("union in streaming query of append mode without watermark") {
+    val inputData1 = MemoryStream[Int]
+    val inputData2 = MemoryStream[Int]
+    withTempView("s1", "s2") {
+      inputData1.toDF().createOrReplaceTempView("s1")
+      inputData2.toDF().createOrReplaceTempView("s2")
+      val unioned = spark.sql(
+        "select s1.value from s1 union select s2.value from s2")
+      checkExceptionMessage(unioned)
+    }
+  }
+
+  test("distinct in streaming query of append mode without watermark") {
+    val inputData = MemoryStream[Int]
+    withTempView("deduptest") {
+      inputData.toDF().toDF("value").createOrReplaceTempView("deduptest")
+      val distinct = spark.sql("select distinct value from deduptest")
+      checkExceptionMessage(distinct)
+    }
+  }
+
+  test("distinct in streaming query of complete mode") {
+    val inputData = MemoryStream[Int]
+    withTempView("deduptest") {
+      inputData.toDF().toDF("value").createOrReplaceTempView("deduptest")
+      val distinct = spark.sql("select distinct value from deduptest")
+
+      testStream(distinct, Complete)(
+        AddData(inputData, 1, 2, 3, 3, 4),
+        CheckAnswer(Row(1), Row(2), Row(3), Row(4))
Review comment: Are you suggesting to ban `Distinct` in SS completely? I think it's fine too, as long as we don't give confusing error messages.