[GitHub] [spark] AmplabJenkins removed a comment on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
AmplabJenkins removed a comment on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508362478 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107219/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
AmplabJenkins removed a comment on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508362469 Merged build finished. Test PASSed.
[GitHub] [spark] AngersZhuuuu commented on issue #25028: [SPARK-28227][SQL] Support TRANSFORM with aggregation.
AngersZhuuuu commented on issue #25028: [SPARK-28227][SQL] Support TRANSFORM with aggregation. URL: https://github.com/apache/spark/pull/25028#issuecomment-508362862 @cloud-fan @gatorsmile @HyukjinKwon @jerryshao @wangyum Hi all, could you help review this and give some advice?
[GitHub] [spark] AmplabJenkins commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
AmplabJenkins commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508362469 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
AmplabJenkins commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508362478 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107219/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
SparkQA removed a comment on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508334567 **[Test build #107219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107219/testReport)** for PR 24996 at commit [`12655f0`](https://github.com/apache/spark/commit/12655f00305c40ccb51295b22e8d271220c84e13).
[GitHub] [spark] SparkQA commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
SparkQA commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508361904 **[Test build #107219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107219/testReport)** for PR 24996 at commit [`12655f0`](https://github.com/apache/spark/commit/12655f00305c40ccb51295b22e8d271220c84e13). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #24973: [SPARK-28169] Fix Partition table partition PushDown failed by "OR" expression
AngersZhuuuu commented on a change in pull request #24973: [SPARK-28169] Fix Partition table partition PushDown failed by "OR" expression URL: https://github.com/apache/spark/pull/24973#discussion_r300249848 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ## @@ -310,3 +310,97 @@ object PhysicalWindow { case _ => None } } + +/** + * Extract partition push down condition from ExpressionSet + * Since origin judge condition is + *{ + * !expression.references.isEmpty && + * expression.references.subsetOf(partitionKeyIds) + *} + * + * This can only push down simple condition expression. + * Such as table: + * CREATE TABLE DEFAULT.PARTITION_TABLE( + * A STRING, + * B STRING) + * PARTITIONED BY(DT STRING) + * + * With SQL: + * SELECT A, B + * FROM DEFAULT.PARTITION_TABLE + * WHERE DT = 20190601 OR (DT = 20190602 AND C = "TEST") + * + * Where condition "DT = 20190601 OR (DT = 20190602 AND C = "TEST")"

Review comment: @cloud-fan In my code, the incoming predicate Set[Expression] is implicitly combined with **AND**, so each Expression is further restricted by the other Expressions at the same level. Then:
- If the Expression is an **AND**, each side constrains the other, so if one side is tenable, that side can be returned as a tenable condition.
- If the Expression is an **OR** and one side cannot be used (for example, it has no condition on the partition columns), the whole **OR** Expression should return NONE. Only when both children of the **OR** are tenable can it return a tenable **OR** combination.
- If it is a multi-level nested Expression combined by BinaryOperator, it visits down to the lowest level; if the **OR** Expression at any level is untenable, it discards that Expression entirely and returns null.
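The rules in the review comment above can be illustrated with a small, self-contained sketch. This is not the PR's Catalyst code: the `Pred`, `And`, and `Or` classes and the `extract_partition_predicate` function are hypothetical stand-ins, and each leaf predicate is assumed to reference exactly one column. It is also only one reading of the third rule: here an untenable **OR** nested inside an **AND** drops just that side, which may differ from what the PR does.

```python
from dataclasses import dataclass
from typing import Optional, Set, Union


@dataclass(frozen=True)
class Pred:
    """Hypothetical leaf predicate that references a single column."""
    column: str
    text: str


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Pred, And, Or]


def extract_partition_predicate(expr: Expr, partition_cols: Set[str]) -> Optional[Expr]:
    """Return a weaker predicate over partition columns only, or None."""
    if isinstance(expr, Pred):
        # A leaf is usable only if it references a partition column.
        return expr if expr.column in partition_cols else None

    left = extract_partition_predicate(expr.left, partition_cols)
    right = extract_partition_predicate(expr.right, partition_cols)

    if isinstance(expr, And):
        # AND: either side alone is still a valid (weaker) filter.
        if left is not None and right is not None:
            return And(left, right)
        return left if left is not None else right

    # OR: both sides must be usable, otherwise nothing can be pushed down.
    if left is None or right is None:
        return None
    return Or(left, right)


# WHERE DT = 20190601 OR (DT = 20190602 AND C = "TEST"), partitioned by DT
where = Or(
    Pred("DT", "DT = 20190601"),
    And(Pred("DT", "DT = 20190602"), Pred("C", 'C = "TEST"')),
)
pushed = extract_partition_predicate(where, {"DT"})
```

For the example query from the diff, `pushed` keeps both `DT` conditions under the **OR** and drops the condition on `C`, so partition pruning can still apply even though the original predicate references a non-partition column.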
[GitHub] [spark] AmplabJenkins removed a comment on issue #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted
AmplabJenkins removed a comment on issue #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted URL: https://github.com/apache/spark/pull/24186#issuecomment-508356572 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107215/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted
AmplabJenkins removed a comment on issue #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted URL: https://github.com/apache/spark/pull/24186#issuecomment-508356570 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted
AmplabJenkins commented on issue #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted URL: https://github.com/apache/spark/pull/24186#issuecomment-508356570 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted
AmplabJenkins commented on issue #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted URL: https://github.com/apache/spark/pull/24186#issuecomment-508356572 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107215/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted
SparkQA removed a comment on issue #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted URL: https://github.com/apache/spark/pull/24186#issuecomment-508323894 **[Test build #107215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107215/testReport)** for PR 24186 at commit [`933d00d`](https://github.com/apache/spark/commit/933d00db1995bb6bd62b9b6b7facdc1ad7e42ea3).
[GitHub] [spark] SparkQA commented on issue #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted
SparkQA commented on issue #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted URL: https://github.com/apache/spark/pull/24186#issuecomment-508356121 **[Test build #107215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107215/testReport)** for PR 24186 at commit [`933d00d`](https://github.com/apache/spark/commit/933d00db1995bb6bd62b9b6b7facdc1ad7e42ea3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] peter-toth commented on a change in pull request #24860: [SPARK-28034][SQL] Port with.sql
peter-toth commented on a change in pull request #24860: [SPARK-28034][SQL] Port with.sql URL: https://github.com/apache/spark/pull/24860#discussion_r300240977 ## File path: sql/core/src/test/resources/sql-tests/inputs/pgSQL/with.sql ## @@ -0,0 +1,1222 @@ +-- +-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group +-- +-- +-- WITH +-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/with.sql Review comment: Sure, https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/with.sql and https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/with.sql are equal.
[GitHub] [spark] peter-toth commented on a change in pull request #24860: [SPARK-28034][SQL] Port with.sql
peter-toth commented on a change in pull request #24860: [SPARK-28034][SQL] Port with.sql URL: https://github.com/apache/spark/pull/24860#discussion_r300240977 ## File path: sql/core/src/test/resources/sql-tests/inputs/pgSQL/with.sql ## @@ -0,0 +1,1222 @@ +-- +-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group +-- +-- +-- WITH +-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/with.sql Review comment: Sure, https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/with.sql and https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/with.sql are equal. I will update the comment.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24860: [SPARK-28034][SQL] Port with.sql
AmplabJenkins removed a comment on issue #24860: [SPARK-28034][SQL] Port with.sql URL: https://github.com/apache/spark/pull/24860#issuecomment-508350331 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24860: [SPARK-28034][SQL] Port with.sql
AmplabJenkins removed a comment on issue #24860: [SPARK-28034][SQL] Port with.sql URL: https://github.com/apache/spark/pull/24860#issuecomment-508350336 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107214/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24860: [SPARK-28034][SQL] Port with.sql
AmplabJenkins commented on issue #24860: [SPARK-28034][SQL] Port with.sql URL: https://github.com/apache/spark/pull/24860#issuecomment-508350331 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24860: [SPARK-28034][SQL] Port with.sql
AmplabJenkins commented on issue #24860: [SPARK-28034][SQL] Port with.sql URL: https://github.com/apache/spark/pull/24860#issuecomment-508350336 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107214/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24860: [SPARK-28034][SQL] Port with.sql
SparkQA removed a comment on issue #24860: [SPARK-28034][SQL] Port with.sql URL: https://github.com/apache/spark/pull/24860#issuecomment-508320204 **[Test build #107214 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107214/testReport)** for PR 24860 at commit [`b9e3388`](https://github.com/apache/spark/commit/b9e338863f66017b7c8e332e9bdff2c3e895efb0).
[GitHub] [spark] SparkQA commented on issue #24860: [SPARK-28034][SQL] Port with.sql
SparkQA commented on issue #24860: [SPARK-28034][SQL] Port with.sql URL: https://github.com/apache/spark/pull/24860#issuecomment-508349925 **[Test build #107214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107214/testReport)** for PR 24860 at commit [`b9e3388`](https://github.com/apache/spark/commit/b9e338863f66017b7c8e332e9bdff2c3e895efb0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] Ngone51 commented on a change in pull request #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
Ngone51 commented on a change in pull request #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone URL: https://github.com/apache/spark/pull/25047#discussion_r300237828 ## File path: core/src/main/scala/org/apache/spark/SparkContext.scala ## @@ -380,6 +380,17 @@ class SparkContext(config: SparkConf) extends Logging { val resourcesFileOpt = conf.get(DRIVER_RESOURCES_FILE) _resources = getOrDiscoverAllResources(_conf, SPARK_DRIVER_PREFIX, resourcesFileOpt) +// driver submitted in client mode under Standalone may have conflict resources with +// workers on this host. We should sync driver's resources info into SPARK_RESOURCES +// to avoid collision. +if (deployMode == "client" && (master.startsWith("spark://") + || master.startsWith("local-cluster"))) { + val requests = parseAllResourceRequests(_conf, SPARK_DRIVER_PREFIX).map {req => +req.id.resourceName -> req.amount + }.toMap + // TODO(wuyi) log driver's acquired resources separately ? Review comment: @dongjoon-hyun Thank you for the reminder. I'll fix those TODOs in the following commits.
[GitHub] [spark] peter-toth commented on a change in pull request #24831: [SPARK-19799][SQL] Support WITH clause in subqueries
peter-toth commented on a change in pull request #24831: [SPARK-19799][SQL] Support WITH clause in subqueries URL: https://github.com/apache/spark/pull/24831#discussion_r300237706 ## File path: sql/core/src/test/resources/sql-tests/results/cte.sql.out ## @@ -88,26 +98,192 @@ struct 1 1 --- !query 7 +-- !query 8 WITH t(x) AS (SELECT 1) SELECT * FROM t WHERE x = 1 --- !query 7 schema +-- !query 8 schema struct --- !query 7 output +-- !query 8 output 1 --- !query 8 +-- !query 9 +WITH t as ( + WITH t2 AS (SELECT 1) + SELECT * FROM t2 +) +SELECT * FROM t +-- !query 9 schema +struct<1:int> +-- !query 9 output +1 + + +-- !query 10 +SELECT max(c) FROM ( + WITH t(c) AS (SELECT 1) + SELECT * FROM t +) +-- !query 10 schema +struct +-- !query 10 output +1 + + +-- !query 11 +SELECT ( + WITH t AS (SELECT 1) + SELECT * FROM t +) +-- !query 11 schema +struct +-- !query 11 output +1 + + +-- !query 12 +WITH + t AS (SELECT 1), + t2 AS ( +WITH t AS (SELECT 2) +SELECT * FROM t + ) +SELECT * FROM t2 +-- !query 12 schema +struct<1:int> +-- !query 12 output +1 + Review comment: Yes, after https://github.com/apache/spark/pull/25029 it will return `2` (https://github.com/apache/spark/pull/25029/files#diff-fc515a5db268d29b08b80f5eb8202026R145)
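Query 12 in the hunk above defines `t` twice: once at the outer level (`SELECT 1`) and once inside `t2` (`SELECT 2`). The recorded output is `1`, and the review comment says it will return `2` after the linked PR, which is consistent with the inner definition taking precedence. The sketch below only illustrates that innermost-first lookup with plain dictionaries. It is not Spark's analyzer; `resolve_cte` and the scope lists are hypothetical names introduced for illustration.

```python
from typing import Dict, List, Optional


def resolve_cte(name: str, scopes: List[Dict[str, int]]) -> Optional[int]:
    """Look up a CTE name, searching from the innermost scope outward."""
    for scope in reversed(scopes):
        if name in scope:
            return scope[name]
    return None


outer = {"t": 1}  # WITH t AS (SELECT 1)
inner = {"t": 2}  # WITH t AS (SELECT 2), nested inside t2

# Inside t2 both scopes are visible; the inner definition shadows the outer one.
inside_t2 = resolve_cte("t", [outer, inner])

# Outside t2 only the outer definition is visible.
outside_t2 = resolve_cte("t", [outer])
```

With this lookup order, `inside_t2` resolves to the inner definition, matching the `2` that the comment expects once the follow-up PR is merged.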
[GitHub] [spark] AmplabJenkins removed a comment on issue #25000: [SPARK-28107][SQL] Support 'day to hour', 'day to minute', 'hour to minute' and 'minute to second'
AmplabJenkins removed a comment on issue #25000: [SPARK-28107][SQL] Support 'day to hour', 'day to minute', 'hour to minute' and 'minute to second' URL: https://github.com/apache/spark/pull/25000#issuecomment-508347291 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25000: [SPARK-28107][SQL] Support 'day to hour', 'day to minute', 'hour to minute' and 'minute to second'
AmplabJenkins removed a comment on issue #25000: [SPARK-28107][SQL] Support 'day to hour', 'day to minute', 'hour to minute' and 'minute to second' URL: https://github.com/apache/spark/pull/25000#issuecomment-508347295 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107216/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25000: [SPARK-28107][SQL] Support 'day to hour', 'day to minute', 'hour to minute' and 'minute to second'
AmplabJenkins commented on issue #25000: [SPARK-28107][SQL] Support 'day to hour', 'day to minute', 'hour to minute' and 'minute to second' URL: https://github.com/apache/spark/pull/25000#issuecomment-508347291 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25000: [SPARK-28107][SQL] Support 'day to hour', 'day to minute', 'hour to minute' and 'minute to second'
AmplabJenkins commented on issue #25000: [SPARK-28107][SQL] Support 'day to hour', 'day to minute', 'hour to minute' and 'minute to second' URL: https://github.com/apache/spark/pull/25000#issuecomment-508347295 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107216/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type
AmplabJenkins removed a comment on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type URL: https://github.com/apache/spark/pull/25044#issuecomment-508346810 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type
AmplabJenkins removed a comment on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type URL: https://github.com/apache/spark/pull/25044#issuecomment-508346815 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107209/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25000: [SPARK-28107][SQL] Support 'day to hour', 'day to minute', 'hour to minute' and 'minute to second'
SparkQA removed a comment on issue #25000: [SPARK-28107][SQL] Support 'day to hour', 'day to minute', 'hour to minute' and 'minute to second' URL: https://github.com/apache/spark/pull/25000#issuecomment-508325239 **[Test build #107216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107216/testReport)** for PR 25000 at commit [`fa68a41`](https://github.com/apache/spark/commit/fa68a41f4653bf87aba834cd3ff14d7ca0820ade).
[GitHub] [spark] AmplabJenkins commented on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type
AmplabJenkins commented on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type URL: https://github.com/apache/spark/pull/25044#issuecomment-508346810 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25000: [SPARK-28107][SQL] Support 'day to hour', 'day to minute', 'hour to minute' and 'minute to second'
SparkQA commented on issue #25000: [SPARK-28107][SQL] Support 'day to hour', 'day to minute', 'hour to minute' and 'minute to second' URL: https://github.com/apache/spark/pull/25000#issuecomment-508346932 **[Test build #107216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107216/testReport)** for PR 25000 at commit [`fa68a41`](https://github.com/apache/spark/commit/fa68a41f4653bf87aba834cd3ff14d7ca0820ade).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type
SparkQA removed a comment on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type URL: https://github.com/apache/spark/pull/25044#issuecomment-508313740 **[Test build #107209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107209/testReport)** for PR 25044 at commit [`fd70990`](https://github.com/apache/spark/commit/fd7099009f49aa71b2d8ee10c9c0c321904d28f0).
[GitHub] [spark] AmplabJenkins commented on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type
AmplabJenkins commented on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type URL: https://github.com/apache/spark/pull/25044#issuecomment-508346815 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107209/ Test PASSed.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode
HyukjinKwon commented on a change in pull request #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode URL: https://github.com/apache/spark/pull/24637#discussion_r300235778
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -588,6 +588,16 @@ object ColumnPruning extends Rule[LogicalPlan] {
       .map(_._2)
     p.copy(child = g.copy(child = newChild, unrequiredChildIndex = unrequiredIndices))
+    // prune unrequired nested fields
+    case p @ Project(projectList, g: Generate) =>
Review comment: My impression was that we need a configuration, but I think you or @dongjoon-hyun have more context than I do about nested field pruning. @cloud-fan, @dongjoon-hyun, @gatorsmile, can you make a call here on whether we need a config or not?
[GitHub] [spark] SparkQA commented on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type
SparkQA commented on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type URL: https://github.com/apache/spark/pull/25044#issuecomment-508346449 **[Test build #107209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107209/testReport)** for PR 25044 at commit [`fd70990`](https://github.com/apache/spark/commit/fd7099009f49aa71b2d8ee10c9c0c321904d28f0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] viirya commented on a change in pull request #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode
viirya commented on a change in pull request #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode URL: https://github.com/apache/spark/pull/24637#discussion_r300234241
## File path: sql/core/benchmarks/MiscBenchmark-results.txt
@@ -2,119 +2,126 @@
 filter & aggregate without group

-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
+OpenJDK 64-Bit Server VM 1.8.0_212-8u212-b03-0ubuntu1.18.04.1-b03 on Linux 4.15.0-1021-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-range/filter/sum:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------
-range/filter/sum wholestage off              47752 / 48952         43.9          22.8       1.0X
-range/filter/sum wholestage on                 3123 / 3558        671.5           1.5      15.3X
+range/filter/sum:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+----------------------------------------------------------------------------------------------------------------
+range/filter/sum wholestage off           46703          47444        1048         44.9          22.3       1.0X
+range/filter/sum wholestage on             3109           3506         222        674.5           1.5      15.0X

 range/limit/sum

-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
+OpenJDK 64-Bit Server VM 1.8.0_212-8u212-b03-0ubuntu1.18.04.1-b03 on Linux 4.15.0-1021-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-range/limit/sum:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------
-range/limit/sum wholestage off                   229 / 236       2288.9           0.4       1.0X
-range/limit/sum wholestage on                    257 / 267       2041.0           0.5       0.9X
+range/limit/sum:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+----------------------------------------------------------------------------------------------------------------
+range/limit/sum wholestage off              191            205          19       2738.4           0.4       1.0X
+range/limit/sum wholestage on               112            124          13       4699.4           0.2       1.7X

 sample

-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
+OpenJDK 64-Bit Server VM 1.8.0_212-8u212-b03-0ubuntu1.18.04.1-b03 on Linux 4.15.0-1021-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-sample with replacement:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------
-sample with replacement wholestage off       12908 / 13076         10.2          98.5       1.0X
-sample with replacement wholestage on          7334 / 7346         17.9          56.0       1.8X
+sample with replacement:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-----------------------------------------------------------------------------------------------------------------------
+sample with replacement wholestage off           12545          12789         344         10.4          95.7       1.0X
+sample with replacement wholestage on             7666           7687          12         17.1          58.5       1.6X

-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
+OpenJDK 64-Bit Server VM 1.8.0_212-8u212-b03-0ubuntu1.18.04.1-b03 on Linux 4.15.0-1021-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-sample without replacement:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------
-sample without replacement wholestage off      3082 / 3095         42.5          23.5       1.0X
-sample without replacement wholestage on       1125 / 1211        116.5           8.6       2.7X
+sample without replacement:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+---
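A small sketch of how the `Relative` column in the updated rows above can be read. It assumes, without confirmation from this thread, that `Relative` is the best time of the first ("wholestage off") row divided by the best time of each row, printed with one decimal. The dictionary and the helper name `relative` are illustrative only; the numbers are copied from the quoted diff.

```python
# Best Time(ms) values copied from the "+" rows of the quoted MiscBenchmark-results.txt diff.
best_ms = {
    "range/filter/sum": {"wholestage off": 46703, "wholestage on": 3109},
    "range/limit/sum": {"wholestage off": 191, "wholestage on": 112},
    "sample with replacement": {"wholestage off": 12545, "wholestage on": 7666},
}


def relative(baseline_ms, case_ms):
    """Assumed definition: baseline best time / case best time, one decimal."""
    return f"{baseline_ms / case_ms:.1f}X"


for name, cases in best_ms.items():
    baseline = cases["wholestage off"]
    for case, ms in cases.items():
        print(f"{name} {case}: {relative(baseline, ms)}")
```

Under that assumption the computed values agree with the table (15.0X, 1.7X and 1.6X for the "wholestage on" rows), which also supports reading the fused digits `191205 19` as best 191, average 205, standard deviation 19.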
[GitHub] [spark] viirya commented on a change in pull request #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type
viirya commented on a change in pull request #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type URL: https://github.com/apache/spark/pull/25044#discussion_r300233304
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/MapInPandasExec.scala
@@ -38,7 +38,7 @@ import org.apache.spark.sql.vectorized.{ArrowColumnVector, ColumnarBatch}
  * `org.apache.spark.sql.catalyst.plans.logical.MapPartitionsInRWithArrow`
  *
  */
-case class MapPartitionsInPandasExec(
+case class MapInPandasExec(
Review comment: Oh, sorry, it changed. :D
[GitHub] [spark] viirya commented on a change in pull request #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type
viirya commented on a change in pull request #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type URL: https://github.com/apache/spark/pull/25044#discussion_r300233203
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/MapInPandasExec.scala
@@ -38,7 +38,7 @@ import org.apache.spark.sql.vectorized.{ArrowColumnVector, ColumnarBatch}
  * `org.apache.spark.sql.catalyst.plans.logical.MapPartitionsInRWithArrow`
  *
  */
-case class MapPartitionsInPandasExec(
+case class MapInPandasExec(
Review comment: Should the file `MapPartitionsInPandasExec.scala` be renamed to `MapInPandasExec.scala` as well?
[GitHub] [spark] HeartSaVioR commented on issue #24999: [SPARK-28142][SS][TEST][FOLLOWUP] Add configuration check test on Kafka continuous stream
HeartSaVioR commented on issue #24999: [SPARK-28142][SS][TEST][FOLLOWUP] Add configuration check test on Kafka continuous stream URL: https://github.com/apache/spark/pull/24999#issuecomment-508343366 Thanks all for reviewing and merging!
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24831: [SPARK-19799][SQL] Support WITH clause in subqueries
dongjoon-hyun commented on a change in pull request #24831: [SPARK-19799][SQL] Support WITH clause in subqueries URL: https://github.com/apache/spark/pull/24831#discussion_r300231993
## File path: sql/core/src/test/resources/sql-tests/results/cte.sql.out
@@ -88,26 +98,192 @@ struct 1 1
--- !query 7
+-- !query 8
 WITH t(x) AS (SELECT 1) SELECT * FROM t WHERE x = 1
--- !query 7 schema
+-- !query 8 schema
 struct
--- !query 7 output
+-- !query 8 output
 1
--- !query 8
+-- !query 9
+WITH t as (
+  WITH t2 AS (SELECT 1)
+  SELECT * FROM t2
+)
+SELECT * FROM t
+-- !query 9 schema
+struct<1:int>
+-- !query 9 output
+1
+
+
+-- !query 10
+SELECT max(c) FROM (
+  WITH t(c) AS (SELECT 1)
+  SELECT * FROM t
+)
+-- !query 10 schema
+struct
+-- !query 10 output
+1
+
+
+-- !query 11
+SELECT (
+  WITH t AS (SELECT 1)
+  SELECT * FROM t
+)
+-- !query 11 schema
+struct
+-- !query 11 output
+1
+
+
+-- !query 12
+WITH
+  t AS (SELECT 1),
+  t2 AS (
+    WITH t AS (SELECT 2)
+    SELECT * FROM t
+  )
+SELECT * FROM t2
+-- !query 12 schema
+struct<1:int>
+-- !query 12 output
+1
+
Review comment: I also agree that this is inevitable in this PR. (cc @gatorsmile).
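For readers who want to try the three placements of a nested WITH clause exercised by queries 9 to 11 above (inside a CTE definition, inside a FROM subquery, inside a scalar subquery), here is a minimal sketch that runs them against SQLite through Python's standard `sqlite3` module. SQLite is only a stand-in chosen for illustration and is not part of the PR; its name-resolution rules are not guaranteed to match Spark's, so query 12, whose result depends on how a re-declared CTE name is resolved, is deliberately left out. The `QUERIES` mapping and `run_all` helper are hypothetical names.

```python
import sqlite3

# Queries 9-11 from the quoted cte.sql.out hunk, keyed by where the inner WITH sits.
QUERIES = {
    "inside a CTE definition": """
        WITH t AS (
          WITH t2 AS (SELECT 1)
          SELECT * FROM t2
        )
        SELECT * FROM t
    """,
    "inside a FROM subquery": """
        SELECT max(c) FROM (
          WITH t(c) AS (SELECT 1)
          SELECT * FROM t
        )
    """,
    "inside a scalar subquery": """
        SELECT (
          WITH t AS (SELECT 1)
          SELECT * FROM t
        )
    """,
}


def run_all():
    """Run every query on an in-memory SQLite database and collect the rows."""
    conn = sqlite3.connect(":memory:")
    try:
        return {label: conn.execute(sql).fetchall() for label, sql in QUERIES.items()}
    finally:
        conn.close()


for label, rows in run_all().items():
    print(label, rows)
```

Each query is expected to return a single row containing `1`, in line with the outputs recorded in the quoted test file.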
[GitHub] [spark] AmplabJenkins removed a comment on issue #22423: [SPARK-25302][STREAMING] Checkpoint the reducedStream in ReducedWindo…
AmplabJenkins removed a comment on issue #22423: [SPARK-25302][STREAMING] Checkpoint the reducedStream in ReducedWindo… URL: https://github.com/apache/spark/pull/22423#issuecomment-437464303 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #22423: [SPARK-25302][STREAMING] Checkpoint the reducedStream in ReducedWindo…
AmplabJenkins commented on issue #22423: [SPARK-25302][STREAMING] Checkpoint the reducedStream in ReducedWindo… URL: https://github.com/apache/spark/pull/22423#issuecomment-508340418 Can one of the admins verify this patch?
[GitHub] [spark] dongjoon-hyun closed pull request #24999: [SPARK-28142][SS][TEST][FOLLOWUP] Add configuration check test on Kafka continuous stream
dongjoon-hyun closed pull request #24999: [SPARK-28142][SS][TEST][FOLLOWUP] Add configuration check test on Kafka continuous stream URL: https://github.com/apache/spark/pull/24999
[GitHub] [spark] SparkQA commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
SparkQA commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508334567 **[Test build #107219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107219/testReport)** for PR 24996 at commit [`12655f0`](https://github.com/apache/spark/commit/12655f00305c40ccb51295b22e8d271220c84e13).
[GitHub] [spark] HeartSaVioR commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
HeartSaVioR commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508334179 retest this, please
[GitHub] [spark] HeartSaVioR commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
HeartSaVioR commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508334149 Looks like I can see the failure locally on another branch (SPARK-27254). It fails intermittently, and even when it succeeds it leaves a suspicious error log. I'll look into what is happening there.
[GitHub] [spark] HeartSaVioR edited a comment on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
HeartSaVioR edited a comment on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508334149 Looks like I can see the same failure locally on another branch (SPARK-27254). It fails intermittently, and even when it succeeds it leaves a suspicious error log. I'll look into what is happening there.
[GitHub] [spark] HyukjinKwon commented on issue #25022: [SPARK-24695][SQL]: To add support to return Calendar interval from udf.
HyukjinKwon commented on issue #25022: [SPARK-24695][SQL]: To add support to return Calendar interval from udf. URL: https://github.com/apache/spark/pull/25022#issuecomment-508333003 Okay, I read it too hastily. This PR aims to expose `CalendarIntervalType` rather than to use it somewhere else. @priyankagargnitk Can you fix the PR title and description to explain what this PR fixes? We should fix the documentation as well, for instance, [here](https://github.com/apache/spark/blob/master/docs/sql-reference.md#data-types). There have been multiple discussions about this. cc @gatorsmile, @rxin, @cloud-fan. Personally, I don't mind exposing this.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted
dongjoon-hyun commented on a change in pull request #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted URL: https://github.com/apache/spark/pull/24186#discussion_r300224123
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ManifestFileCommitProtocol.scala
@@ -70,7 +81,29 @@ class ManifestFileCommitProtocol(jobId: String, path: String)
   override def abortJob(jobContext: JobContext): Unit = {
     require(fileLog != null, "setupManifestOptions must be called before this function")
-    // Do nothing
+    // Best effort cleanup of complete files from failed job.
+    // Since the file has UUID in its filename, we are safe to try deleting them
+    // as the file will not conflict with file with another attempt on same task.
+    if (pendingCommitFiles.nonEmpty) {
+      pendingCommitFiles.foreach { file =>
+        try {
+          val path = new Path(file)
+          val fs = path.getFileSystem(jobContext.getConfiguration)
+          // this is to make sure the file can be seen from driver as well
+          if (fs.exists(path)) {
+            fs.delete(path, false)
+          }
+        } catch {
+          case e: IOException =>
+            logWarning(s"Fail to remove temporary file $file, continue removing next.", e)
+        }
+      }
+      pendingCommitFiles.clear()
+    }
+  }
+
+  override def onTaskCommit(taskCommit: TaskCommitMessage): Unit = {
+    pendingCommitFiles ++= taskCommit.obj.asInstanceOf[Seq[SinkFileStatus]].map(_.path)
Review comment: Just a question. When handling `SinkFileStatus`, we have frequently hit character escaping issues before. That is why, in `SinkFileStatus`, we use `new Path(new URI(path))`. Do we have test coverage for the newly added code in this PR? I'm worried about line 90 and line 106.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted
dongjoon-hyun commented on a change in pull request #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted URL: https://github.com/apache/spark/pull/24186#discussion_r300224123
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ManifestFileCommitProtocol.scala
@@ -70,7 +81,29 @@ class ManifestFileCommitProtocol(jobId: String, path: String)
   override def abortJob(jobContext: JobContext): Unit = {
     require(fileLog != null, "setupManifestOptions must be called before this function")
-    // Do nothing
+    // Best effort cleanup of complete files from failed job.
+    // Since the file has UUID in its filename, we are safe to try deleting them
+    // as the file will not conflict with file with another attempt on same task.
+    if (pendingCommitFiles.nonEmpty) {
+      pendingCommitFiles.foreach { file =>
+        try {
+          val path = new Path(file)
+          val fs = path.getFileSystem(jobContext.getConfiguration)
+          // this is to make sure the file can be seen from driver as well
+          if (fs.exists(path)) {
+            fs.delete(path, false)
+          }
+        } catch {
+          case e: IOException =>
+            logWarning(s"Fail to remove temporary file $file, continue removing next.", e)
+        }
+      }
+      pendingCommitFiles.clear()
+    }
+  }
+
+  override def onTaskCommit(taskCommit: TaskCommitMessage): Unit = {
+    pendingCommitFiles ++= taskCommit.obj.asInstanceOf[Seq[SinkFileStatus]].map(_.path)
Review comment: Just a question. When handling `SinkFileStatus`, we have frequently hit character escaping issues before. That is why, in `SinkFileStatus`, we use `new Path(new URI(path))`. Do we have test coverage for the newly added code in this PR? I'm worried about line 90 and line 106.
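The escaping concern raised in the review comment above can be illustrated outside Spark. The sketch below is plain Python and does not use the Hadoop `Path` API; it assumes the stored path string is a percent-encoded URI, which is what the reviewer's reference to `new Path(new URI(path))` suggests. The file name and the helper names `naive_path` and `decoded_path` are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

# Hypothetical output file whose name contains a space.
original = PurePosixPath("/tmp/out/part 0.parquet")

# Assumption: the status object stores the path as a percent-encoded URI string.
stored = original.as_uri()


def naive_path(uri_string):
    """Use the stored string's path component as-is, keeping the escapes."""
    return PurePosixPath(urlparse(uri_string).path)


def decoded_path(uri_string):
    """Decode the percent-escapes first, analogous to going through a URI."""
    return PurePosixPath(unquote(urlparse(uri_string).path))


print(stored)
print(naive_path(stored) == original)    # escapes kept: points at a different name
print(decoded_path(stored) == original)  # escapes decoded: the original file
```

If the same mismatch applied to the cleanup code under review, a delete attempt built from the undecoded string would target a file name that does not exist and silently skip the real file, which is why a test with special characters in the output path seems worth having.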
[GitHub] [spark] AmplabJenkins removed a comment on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
AmplabJenkins removed a comment on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone URL: https://github.com/apache/spark/pull/25047#issuecomment-508332161 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107211/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25017: [SPARK-28218][SQL] Migrate Avro to File source V2
AmplabJenkins removed a comment on issue #25017: [SPARK-28218][SQL] Migrate Avro to File source V2 URL: https://github.com/apache/spark/pull/25017#issuecomment-508332183 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107217/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25017: [SPARK-28218][SQL] Migrate Avro to File source V2
AmplabJenkins commented on issue #25017: [SPARK-28218][SQL] Migrate Avro to File source V2 URL: https://github.com/apache/spark/pull/25017#issuecomment-508332182 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25017: [SPARK-28218][SQL] Migrate Avro to File source V2
AmplabJenkins removed a comment on issue #25017: [SPARK-28218][SQL] Migrate Avro to File source V2 URL: https://github.com/apache/spark/pull/25017#issuecomment-508332182 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25017: [SPARK-28218][SQL] Migrate Avro to File source V2
AmplabJenkins commented on issue #25017: [SPARK-28218][SQL] Migrate Avro to File source V2 URL: https://github.com/apache/spark/pull/25017#issuecomment-508332183 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107217/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
AmplabJenkins removed a comment on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone URL: https://github.com/apache/spark/pull/25047#issuecomment-508332157 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
SparkQA removed a comment on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone URL: https://github.com/apache/spark/pull/25047#issuecomment-508318918 **[Test build #107211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107211/testReport)** for PR 25047 at commit [`a9160e4`](https://github.com/apache/spark/commit/a9160e4adb6aa94579a80545b19e3190ede52b6d).
[GitHub] [spark] SparkQA commented on issue #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted
SparkQA commented on issue #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted URL: https://github.com/apache/spark/pull/24186#issuecomment-508332137 **[Test build #107218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107218/testReport)** for PR 24186 at commit [`57cd3bf`](https://github.com/apache/spark/commit/57cd3bf37af54620b4d016b1e77e3bbc95fb4c94).
[GitHub] [spark] SparkQA commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
SparkQA commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone URL: https://github.com/apache/spark/pull/25047#issuecomment-508332042 **[Test build #107211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107211/testReport)** for PR 25047 at commit [`a9160e4`](https://github.com/apache/spark/commit/a9160e4adb6aa94579a80545b19e3190ede52b6d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class ReleaseResources(toRelease: Map[String, ResourceInformation]) extends DeployMessage` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
AmplabJenkins commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone URL: https://github.com/apache/spark/pull/25047#issuecomment-508332157 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
AmplabJenkins commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone URL: https://github.com/apache/spark/pull/25047#issuecomment-508332161 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107211/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #25017: [SPARK-28218][SQL] Migrate Avro to File source V2
SparkQA removed a comment on issue #25017: [SPARK-28218][SQL] Migrate Avro to File source V2 URL: https://github.com/apache/spark/pull/25017#issuecomment-508327713 **[Test build #107217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107217/testReport)** for PR 25017 at commit [`92d2f47`](https://github.com/apache/spark/commit/92d2f4710d9071995da7860a742db5b055c37fa5). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #25017: [SPARK-28218][SQL] Migrate Avro to File source V2
SparkQA commented on issue #25017: [SPARK-28218][SQL] Migrate Avro to File source V2 URL: https://github.com/apache/spark/pull/25017#issuecomment-508332015 **[Test build #107217 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107217/testReport)** for PR 25017 at commit [`92d2f47`](https://github.com/apache/spark/commit/92d2f4710d9071995da7860a742db5b055c37fa5). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class AvroV1LogicalTypeSuite extends AvroLogicalTypeSuite ` * `class AvroV2LogicalTypeSuite extends AvroLogicalTypeSuite ` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on a change in pull request #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted
HeartSaVioR commented on a change in pull request #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted URL: https://github.com/apache/spark/pull/24186#discussion_r300223499 ## File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala ## @@ -473,6 +476,73 @@ abstract class FileStreamSinkSuite extends StreamTest { assert(outputFiles.toList.isEmpty, "Incomplete files should be cleaned up.") } } + + testQuietly("cleanup complete but invalid output for aborted job") { +withSQLConf(("spark.sql.streaming.commitProtocolClass", + classOf[PendingCommitFilesTrackingManifestFileCommitProtocol].getCanonicalName)) { + withTempDir { tempDir => +val checkpointDir = new File(tempDir, "chk") +val outputDir = new File(tempDir, "output") +val inputData = MemoryStream[Int] +inputData.addData(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) +val q = inputData.toDS() + .repartition(10) + .map { value => +// we intend task failure after some tasks succeeds +if (value == 5) { + // put some delay to let other task commits before this task fails + Thread.sleep(100) + value / 0 +} else { + value +} + } + .writeStream + .option("checkpointLocation", checkpointDir.getCanonicalPath) + .format("parquet") + .start(outputDir.getCanonicalPath) + +intercept[StreamingQueryException] { + try { +q.processAllAvailable() + } finally { +q.stop() + } +} + +import PendingCommitFilesTrackingManifestFileCommitProtocol._ +val outputFiles = Files.walk(outputDir.toPath).iterator().asScala + .filter(_.toString.endsWith(".parquet")) +// there would be possible to have race condition: +// - some tasks complete while abortJob is being called +// we can't delete complete files for these tasks (it's OK since this is a best effort) +assert(!outputFiles.toList.exists(f => tracking.contains(f.toUri.getPath)), + "abortJob should clean up files reported as successful.") + } +} + } +} + +object PendingCommitFilesTrackingManifestFileCommitProtocol 
{ Review comment: Ah yes. I thought you were suggesting adding UTs to ManifestFileCommitProtocolSuite, which only deals with ManifestFileCommitProtocol (unit tests instead of acceptance tests). If the concern was only the location and you're OK with leaving this as it is, let's leave it. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted
dongjoon-hyun commented on a change in pull request #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted URL: https://github.com/apache/spark/pull/24186#discussion_r300223049 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ManifestFileCommitProtocol.scala ## @@ -70,7 +81,30 @@ class ManifestFileCommitProtocol(jobId: String, path: String) override def abortJob(jobContext: JobContext): Unit = { require(fileLog != null, "setupManifestOptions must be called before this function") -// Do nothing +// Best effort cleanup of complete files from failed job. +// Since the file has UUID in its filename, we are safe to try deleting them +// as the file will not conflict with file with another attempt on same task. Review comment: `on same task` -> `on the same task`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted
dongjoon-hyun commented on a change in pull request #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted URL: https://github.com/apache/spark/pull/24186#discussion_r300222769 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ManifestFileCommitProtocol.scala ## @@ -70,7 +81,30 @@ class ManifestFileCommitProtocol(jobId: String, path: String) override def abortJob(jobContext: JobContext): Unit = { require(fileLog != null, "setupManifestOptions must be called before this function") -// Do nothing +// Best effort cleanup of complete files from failed job. +// Since the file has UUID in its filename, we are safe to try deleting them +// as the file will not conflict with file with another attempt on same task. +if (pendingCommitFiles.nonEmpty) { + pendingCommitFiles.foreach { file => +try { + val path = new Path(file) + val fs = path.getFileSystem(jobContext.getConfiguration) + // this is to make sure the file can be seen from driver as well + if (fs.exists(path)) { +fs.delete(path, false) + } +} catch { + case e: IOException => +logWarning(s"Fail to remove temporary file $file , continue removing next.", e) Review comment: nit `$file ,` -> `$file,`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted
dongjoon-hyun commented on a change in pull request #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted URL: https://github.com/apache/spark/pull/24186#discussion_r300222627 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ManifestFileCommitProtocol.scala ## @@ -70,7 +81,30 @@ class ManifestFileCommitProtocol(jobId: String, path: String) override def abortJob(jobContext: JobContext): Unit = { require(fileLog != null, "setupManifestOptions must be called before this function") -// Do nothing +// Best effort cleanup of complete files from failed job. +// Since the file has UUID in its filename, we are safe to try deleting them +// as the file will not conflict with file with another attempt on same task. +if (pendingCommitFiles.nonEmpty) { + pendingCommitFiles.foreach { file => +try { + val path = new Path(file) + val fs = path.getFileSystem(jobContext.getConfiguration) + // this is to make sure the file can be seen from driver as well + if (fs.exists(path)) { +fs.delete(path, false) + } +} catch { + case e: IOException => +logWarning(s"Fail to remove temporary file $file , continue removing next.", e) +} + Review comment: nit. Shall we remove the empty line? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
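The best-effort cleanup pattern discussed in the abortJob diff above can be sketched in isolation. The following is only an illustration, not the code from the pull request: it uses `java.nio.file` on the local file system instead of the Hadoop `FileSystem` API, and `BestEffortCleanup` and `deleteAll` are hypothetical names introduced for this example. The point it shows is that a failure to delete one file is logged and skipped, so the remaining files are still attempted.

```scala
import java.io.IOException
import java.nio.file.{Files, Path}

// Hypothetical helper (not part of Spark): best-effort removal of files that
// tasks reported as complete before the job was aborted.
object BestEffortCleanup {

  /** Tries to delete every path and returns the paths that could not be removed. */
  def deleteAll(pending: Seq[Path]): Seq[Path] =
    pending.flatMap { path =>
      try {
        // deleteIfExists does nothing when the file is already gone,
        // which plays the role of the exists-then-delete check in the diff.
        Files.deleteIfExists(path)
        None
      } catch {
        case e: IOException =>
          // Report the failure and continue with the next file
          // instead of failing the whole abort.
          Console.err.println(
            s"Failed to remove file $path, continuing with the next one: ${e.getMessage}")
          Some(path)
      }
    }
}
```

Returning the paths that could not be deleted is a design choice made for this sketch so that a caller can inspect what was left behind; the diff above only logs a warning.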
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted
dongjoon-hyun commented on a change in pull request #24186: [SPARK-27254][SS] Cleanup complete but invalid output files in ManifestFileCommitProtocol if job is aborted URL: https://github.com/apache/spark/pull/24186#discussion_r300222432 ## File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala ## @@ -473,6 +476,73 @@ abstract class FileStreamSinkSuite extends StreamTest { assert(outputFiles.toList.isEmpty, "Incomplete files should be cleaned up.") } } + + testQuietly("cleanup complete but invalid output for aborted job") { +withSQLConf(("spark.sql.streaming.commitProtocolClass", + classOf[PendingCommitFilesTrackingManifestFileCommitProtocol].getCanonicalName)) { + withTempDir { tempDir => +val checkpointDir = new File(tempDir, "chk") +val outputDir = new File(tempDir, "output") +val inputData = MemoryStream[Int] +inputData.addData(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) +val q = inputData.toDS() + .repartition(10) + .map { value => +// we intend task failure after some tasks succeeds +if (value == 5) { + // put some delay to let other task commits before this task fails + Thread.sleep(100) + value / 0 +} else { + value +} + } + .writeStream + .option("checkpointLocation", checkpointDir.getCanonicalPath) + .format("parquet") + .start(outputDir.getCanonicalPath) + +intercept[StreamingQueryException] { + try { +q.processAllAvailable() + } finally { +q.stop() + } +} + +import PendingCommitFilesTrackingManifestFileCommitProtocol._ +val outputFiles = Files.walk(outputDir.toPath).iterator().asScala + .filter(_.toString.endsWith(".parquet")) +// there would be possible to have race condition: +// - some tasks complete while abortJob is being called +// we can't delete complete files for these tasks (it's OK since this is a best effort) +assert(!outputFiles.toList.exists(f => tracking.contains(f.toUri.getPath)), + "abortJob should clean up files reported as successful.") + } +} + } +} + +object 
PendingCommitFilesTrackingManifestFileCommitProtocol { Review comment: Got it. Sure. No problem with the current location. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24850: [WIP][SPARK-28020][SQL][TEST] Port date.sql
dongjoon-hyun commented on a change in pull request #24850: [WIP][SPARK-28020][SQL][TEST] Port date.sql URL: https://github.com/apache/spark/pull/24850#discussion_r30075 ## File path: sql/core/src/test/resources/sql-tests/inputs/pgSQL/date.sql ## @@ -0,0 +1,358 @@ +-- +-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group +-- +-- +-- DATE +-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/date.sql Review comment: Then, let's update this line 6 from `REL_12_BETA1` to `REL_12_BETA2` because we had better point to the up-to-date branch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24850: [WIP][SPARK-28020][SQL][TEST] Port date.sql
dongjoon-hyun commented on a change in pull request #24850: [WIP][SPARK-28020][SQL][TEST] Port date.sql URL: https://github.com/apache/spark/pull/24850#discussion_r30075 ## File path: sql/core/src/test/resources/sql-tests/inputs/pgSQL/date.sql ## @@ -0,0 +1,358 @@ +-- +-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group +-- +-- +-- DATE +-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/date.sql Review comment: Thanks. Then, let's update this line 6 from `REL_12_BETA1` to `REL_12_BETA2` because we had better point to the latest branch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24850: [WIP][SPARK-28020][SQL][TEST] Port date.sql
dongjoon-hyun commented on a change in pull request #24850: [WIP][SPARK-28020][SQL][TEST] Port date.sql URL: https://github.com/apache/spark/pull/24850#discussion_r30075 ## File path: sql/core/src/test/resources/sql-tests/inputs/pgSQL/date.sql ## @@ -0,0 +1,358 @@ +-- +-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group +-- +-- +-- DATE +-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/date.sql Review comment: Then, let's update this line 6 from `REL_12_BETA1` to `REL_12_BETA2` because we had better point to the latest branch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun edited a comment on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
dongjoon-hyun edited a comment on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508330109 Thank you for the update. BTW, is the failure due to a flaky test case? ``` [info] - query without test harness *** FAILED *** (2 seconds, 931 milliseconds) [info] scala.Predef.Set.apply[Int](0, 1, 2, 3).map[org.apache.spark.sql.Row, scala.collection.immutable.Set[org.apache.spark.sql.Row]](((x$3: Int) => org.apache.spark.sql.Row.apply(x$3)))(immutable.this.Set.canBuildFrom[org.apache.spark.sql.Row]).subsetOf(scala.Predef.refArrayOps[org.apache.spark.sql.Row](results).toSet[org.apache.spark.sql.Row]) was false (ContinuousSuite.scala:226) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
dongjoon-hyun commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508330109 Thank you for the update. BTW, is the failure due to a flaky test case? ``` [info] - query without test harness *** FAILED *** (2 seconds, 931 milliseconds) [info] scala.Predef.Set.apply[Int](0, 1, 2, 3).map[org.apache.spark.sql.Row, scala.collection.immutable.Set[org.apache.spark.sql.Row]](((x$3: Int) => org.apache.spark.sql.Row.apply(x$3)))(immutable.this.Set.canBuildFrom[org.apache.spark.sql.Row]).subsetOf(scala.Predef.refArrayOps[org.apache.spark.sql.Row](results).toSet[org.apache.spark.sql.Row]) was false (ContinuousSuite.scala:226) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25022: [SPARK-24695][SQL]: To add support to return Calendar interval from udf.
dongjoon-hyun commented on a change in pull request #25022: [SPARK-24695][SQL]: To add support to return Calendar interval from udf. URL: https://github.com/apache/spark/pull/25022#discussion_r300221402 ## File path: sql/catalyst/src/test/java/org/apache/spark/sql/types/CalendarIntervalSuite.java ## @@ -0,0 +1,268 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +package org.apache.spark.sql.types; + +import org.junit.Test; + +import static org.junit.Assert.*; +import static org.apache.spark.sql.types.CalendarInterval.*; + +public class CalendarIntervalSuite { + +@Test +public void equalsTest() { +CalendarInterval i1 = new CalendarInterval(3, 123); Review comment: Also, all of the indentation was broken during the move. We use `2-space` indentation in Java, too. If you use `git mv`, this kind of issue will be avoided. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25022: [SPARK-24695][SQL]: To add support to return Calendar interval from udf.
dongjoon-hyun commented on a change in pull request #25022: [SPARK-24695][SQL]: To add support to return Calendar interval from udf. URL: https://github.com/apache/spark/pull/25022#discussion_r300221072 ## File path: sql/catalyst/src/test/java/org/apache/spark/sql/types/CalendarIntervalSuite.java ## @@ -0,0 +1,268 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +package org.apache.spark.sql.types; Review comment: Please use `git mv` first and edit later. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25022: [SPARK-24695][SQL]: To add support to return Calendar interval from udf.
dongjoon-hyun commented on a change in pull request #25022: [SPARK-24695][SQL]: To add support to return Calendar interval from udf. URL: https://github.com/apache/spark/pull/25022#discussion_r300221072 ## File path: sql/catalyst/src/test/java/org/apache/spark/sql/types/CalendarIntervalSuite.java ## @@ -0,0 +1,268 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +package org.apache.spark.sql.types; Review comment: Please use `git mv` first and edit later. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
AmplabJenkins removed a comment on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508328959 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107210/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25022: [SPARK-24695][SQL]: To add support to return Calendar interval from udf.
dongjoon-hyun commented on a change in pull request #25022: [SPARK-24695][SQL]: To add support to return Calendar interval from udf. URL: https://github.com/apache/spark/pull/25022#discussion_r300221072 ## File path: sql/catalyst/src/test/java/org/apache/spark/sql/types/CalendarIntervalSuite.java ## @@ -0,0 +1,268 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +package org.apache.spark.sql.types; Review comment: Please use `git mv`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
AmplabJenkins removed a comment on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508328955 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
AmplabJenkins commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508328959 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107210/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
AmplabJenkins commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508328955 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
SparkQA removed a comment on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508316486 **[Test build #107210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107210/testReport)** for PR 24996 at commit [`12655f0`](https://github.com/apache/spark/commit/12655f00305c40ccb51295b22e8d271220c84e13).
[GitHub] [spark] SparkQA commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users
SparkQA commented on issue #24996: [SPARK-28199][SS] Replace deprecated ProcessingTime with ProcessingTimeTrigger and hide from end users URL: https://github.com/apache/spark/pull/24996#issuecomment-508328824 **[Test build #107210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107210/testReport)** for PR 24996 at commit [`12655f0`](https://github.com/apache/spark/commit/12655f00305c40ccb51295b22e8d271220c84e13). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type
AmplabJenkins removed a comment on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type URL: https://github.com/apache/spark/pull/25044#issuecomment-508328531 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type
AmplabJenkins commented on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type URL: https://github.com/apache/spark/pull/25044#issuecomment-508328533 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107207/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type
AmplabJenkins commented on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type URL: https://github.com/apache/spark/pull/25044#issuecomment-508328531 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun edited a comment on issue #25022: [SPARK-24695][SQL]: To add support to return Calendar interval from udf.
dongjoon-hyun edited a comment on issue #25022: [SPARK-24695][SQL]: To add support to return Calendar interval from udf. URL: https://github.com/apache/spark/pull/25022#issuecomment-508328617 I realized that this is based on @hvanhovell 's [request](https://github.com/apache/spark/pull/21679#issuecomment-503772393) at #21679. Hi, @hvanhovell . Could you review this PR? (Also, cc @gatorsmile , @cloud-fan )
[GitHub] [spark] AmplabJenkins removed a comment on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type
AmplabJenkins removed a comment on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type URL: https://github.com/apache/spark/pull/25044#issuecomment-508328533 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107207/ Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on issue #25022: [SPARK-24695][SQL]: To add support to return Calendar interval from udf.
dongjoon-hyun commented on issue #25022: [SPARK-24695][SQL]: To add support to return Calendar interval from udf. URL: https://github.com/apache/spark/pull/25022#issuecomment-508328617 I realized that this is based on @hvanhovell 's request at #21679. Hi, @hvanhovell . Could you review this PR? (Also, cc @gatorsmile , @cloud-fan )
[GitHub] [spark] SparkQA removed a comment on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type
SparkQA removed a comment on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type URL: https://github.com/apache/spark/pull/25044#issuecomment-508298986 **[Test build #107207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107207/testReport)** for PR 25044 at commit [`3dd374e`](https://github.com/apache/spark/commit/3dd374e0ba007e341258d22868067189f9eca609).
[GitHub] [spark] SparkQA commented on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type
SparkQA commented on issue #25044: [SPARK-28198][PYTHON][FOLLOW-UP] Rename mapPartitionsInPandas to mapInPandas with a separate evaluation type URL: https://github.com/apache/spark/pull/25044#issuecomment-508328296 **[Test build #107207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107207/testReport)** for PR 25044 at commit [`3dd374e`](https://github.com/apache/spark/commit/3dd374e0ba007e341258d22868067189f9eca609). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] gengliangwang commented on a change in pull request #25017: [SPARK-28218][SQL] Migrate Avro to File source V2
gengliangwang commented on a change in pull request #25017: [SPARK-28218][SQL] Migrate Avro to File source V2 URL: https://github.com/apache/spark/pull/25017#discussion_r300219709 ## File path: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroLogicalTypeSuite.scala ## @@ -349,3 +349,19 @@ class AvroLogicalTypeSuite extends QueryTest with SharedSQLContext with SQLTestU } } } + +class AvroV1LogicalTypeSuite extends AvroSuite { Review comment: Thanks! I have fixed it.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24978: [SPARK-28177][SQL] Adjust post shuffle partition number in adaptive execution
AmplabJenkins removed a comment on issue #24978: [SPARK-28177][SQL] Adjust post shuffle partition number in adaptive execution URL: https://github.com/apache/spark/pull/24978#issuecomment-508327647 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24978: [SPARK-28177][SQL] Adjust post shuffle partition number in adaptive execution
AmplabJenkins removed a comment on issue #24978: [SPARK-28177][SQL] Adjust post shuffle partition number in adaptive execution URL: https://github.com/apache/spark/pull/24978#issuecomment-508327648 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107206/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24978: [SPARK-28177][SQL] Adjust post shuffle partition number in adaptive execution
AmplabJenkins commented on issue #24978: [SPARK-28177][SQL] Adjust post shuffle partition number in adaptive execution URL: https://github.com/apache/spark/pull/24978#issuecomment-508327647 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25017: [SPARK-28218][SQL] Migrate Avro to File source V2
SparkQA commented on issue #25017: [SPARK-28218][SQL] Migrate Avro to File source V2 URL: https://github.com/apache/spark/pull/25017#issuecomment-508327713 **[Test build #107217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107217/testReport)** for PR 25017 at commit [`92d2f47`](https://github.com/apache/spark/commit/92d2f4710d9071995da7860a742db5b055c37fa5).
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
dongjoon-hyun commented on a change in pull request #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone URL: https://github.com/apache/spark/pull/25047#discussion_r300217803 ## File path: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ## @@ -190,19 +189,8 @@ private[deploy] class Worker( private def createWorkDir() { workDir = Option(workDirPath).map(new File(_)).getOrElse(new File(sparkHome, "work")) -try { - // This sporadically fails - not sure why ... !workDir.exists() && !workDir.mkdirs() - // So attempting to create and then check if directory was created or not. - workDir.mkdirs() - if ( !workDir.exists() || !workDir.isDirectory) { -logError("Failed to create work directory " + workDir) -System.exit(1) - } - assert (workDir.isDirectory) -} catch { - case e: Exception => -logError("Failed to create work directory " + workDir, e) -System.exit(1) +if (!Utils.createDirectory(workDir)) { Review comment: ~Ur, is it the same? `Utils.createDirectory` seems to work differently from `workDir.mkdirs()`.~ ~Or, do we need to change the current behavior of `createWorkDir` for this PR?~ Oops. I realized that this is a newly added function [here in this PR](https://github.com/apache/spark/pull/25047/files#diff-d239aee594001f8391676e1047a0381eR275). I was confused because this overloaded the existing one. Only parameters are different.
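For readers following the review comment above, here is a minimal sketch of the "attempt `mkdirs()`, then check the resulting state" pattern from the removed `createWorkDir` code, wrapped in a Boolean-returning helper. The object name `WorkDirSketch` and this `createDirectory` are hypothetical illustrations; they are not the `Utils.createDirectory` overload added in the PR, whose exact signature and behavior are not shown in this message.

```scala
import java.io.File
import java.nio.file.Files

object WorkDirSketch {
  // Hypothetical helper (an assumption, not Spark's Utils.createDirectory):
  // returns true if `dir` exists as a directory after the call.
  def createDirectory(dir: File): Boolean = {
    try {
      // mkdirs() returns false when the directory already exists, so its
      // return value is ignored and the final state is checked instead.
      dir.mkdirs()
      dir.exists() && dir.isDirectory
    } catch {
      // mkdirs()/exists() can throw if a security manager denies access.
      case _: SecurityException => false
    }
  }

  def main(args: Array[String]): Unit = {
    val base = Files.createTempDirectory("workdir-sketch").toFile
    val workDir = new File(base, "work")

    assert(createDirectory(workDir))  // directory is created
    assert(createDirectory(workDir))  // already exists: still reported as success

    val plainFile = File.createTempFile("not-a-dir", ".tmp", base)
    assert(!createDirectory(plainFile))  // a regular file is not a directory
  }
}
```

Checking the final state rather than the return value of `mkdirs()` is what lets a caller treat "already exists" and "just created" the same way, which is the situation the removed comment ("This sporadically fails ...") was working around.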
[GitHub] [spark] AmplabJenkins commented on issue #24978: [SPARK-28177][SQL] Adjust post shuffle partition number in adaptive execution
AmplabJenkins commented on issue #24978: [SPARK-28177][SQL] Adjust post shuffle partition number in adaptive execution URL: https://github.com/apache/spark/pull/24978#issuecomment-508327648 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107206/ Test PASSed.