[GitHub] [spark] AmplabJenkins removed a comment on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery
AmplabJenkins removed a comment on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery URL: https://github.com/apache/spark/pull/28043#issuecomment-604828858 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120450/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery
AmplabJenkins commented on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery URL: https://github.com/apache/spark/pull/28043#issuecomment-604828858 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120450/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery
SparkQA removed a comment on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery URL: https://github.com/apache/spark/pull/28043#issuecomment-604828224 **[Test build #120450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120450/testReport)** for PR 28043 at commit [`ba2e6b4`](https://github.com/apache/spark/commit/ba2e6b4ee0924cf4d735c8410211ea123ded7a66).
[GitHub] [spark] AmplabJenkins commented on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery
AmplabJenkins commented on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery URL: https://github.com/apache/spark/pull/28043#issuecomment-604828852 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery
AmplabJenkins removed a comment on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery URL: https://github.com/apache/spark/pull/28043#issuecomment-604828852 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery
AmplabJenkins removed a comment on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery URL: https://github.com/apache/spark/pull/28043#issuecomment-604828553 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery
SparkQA commented on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery URL: https://github.com/apache/spark/pull/28043#issuecomment-604828843 **[Test build #120450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120450/testReport)** for PR 28043 at commit [`ba2e6b4`](https://github.com/apache/spark/commit/ba2e6b4ee0924cf4d735c8410211ea123ded7a66).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery
AmplabJenkins removed a comment on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery URL: https://github.com/apache/spark/pull/28043#issuecomment-604828558 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25158/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery
AmplabJenkins commented on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery URL: https://github.com/apache/spark/pull/28043#issuecomment-604828558 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25158/ Test PASSed.
[GitHub] [spark] wang-zhun commented on issue #28009: [SPARK-31235][YARN] Separates different categories of applications
wang-zhun commented on issue #28009: [SPARK-31235][YARN] Separates different categories of applications URL: https://github.com/apache/spark/pull/28009#issuecomment-604828562

> Please add test case as Thomas suggested, thanks!

OK, sorry, I didn't notice.
[GitHub] [spark] AmplabJenkins commented on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery
AmplabJenkins commented on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery URL: https://github.com/apache/spark/pull/28043#issuecomment-604828553 Merged build finished. Test PASSed.
[GitHub] [spark] turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window URL: https://github.com/apache/spark/pull/27943#issuecomment-604828095 Thank you for your patient review, thanks a lot @tgravescs
[GitHub] [spark] SparkQA commented on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery
SparkQA commented on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery URL: https://github.com/apache/spark/pull/28043#issuecomment-604828224 **[Test build #120450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120450/testReport)** for PR 28043 at commit [`ba2e6b4`](https://github.com/apache/spark/commit/ba2e6b4ee0924cf4d735c8410211ea123ded7a66).
[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window URL: https://github.com/apache/spark/pull/27943#issuecomment-604827678 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window URL: https://github.com/apache/spark/pull/27943#issuecomment-604827683 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120448/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window URL: https://github.com/apache/spark/pull/27943#issuecomment-604827678 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window URL: https://github.com/apache/spark/pull/27943#issuecomment-604827683 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120448/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
SparkQA removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window URL: https://github.com/apache/spark/pull/27943#issuecomment-604792133 **[Test build #120448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120448/testReport)** for PR 27943 at commit [`66adc09`](https://github.com/apache/spark/commit/66adc0983a5011b077096e5f9e62c04844793bde).
[GitHub] [spark] yaooqinn opened a new pull request #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery
yaooqinn opened a new pull request #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery URL: https://github.com/apache/spark/pull/28043

### What changes were proposed in this pull request?

```scala
scala> spark.sql(" select * from values(1), (2) t(key) where key in (select 1 as key where 1=0)").queryExecution
res15: org.apache.spark.sql.execution.QueryExecution =
== Parsed Logical Plan ==
'Project [*]
+- 'Filter 'key IN (list#39 [])
   :  +- Project [1 AS key#38]
   :     +- Filter (1 = 0)
   :        +- OneRowRelation
   +- 'SubqueryAlias t
      +- 'UnresolvedInlineTable [key], [List(1), List(2)]

== Analyzed Logical Plan ==
key: int
Project [key#40]
+- Filter key#40 IN (list#39 [])
   :  +- Project [1 AS key#38]
   :     +- Filter (1 = 0)
   :        +- OneRowRelation
   +- SubqueryAlias t
      +- LocalRelation [key#40]

== Optimized Logical Plan ==
Join LeftSemi, (key#40 = key#38)
:- LocalRelation [key#40]
+- LocalRelation <empty>, [key#38]

== Physical Plan ==
*(1) BroadcastHashJoin [key#40], [key#38], LeftSemi, BuildRight
:- *(1) LocalTableScan [key#40]
+- Br...
```

`LocalRelation <empty>` should be able to propagate once subqueries are lifted up into joins.

### Why are the changes needed?

To optimize the query.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added new tests.
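The optimized plan in this PR still contains a `Join LeftSemi` whose right side is an empty `LocalRelation`. A left-semi join keeps only the left rows that have a match on the right, so an empty right side makes the whole join empty. Below is a minimal Scala sketch of that kind of empty-relation propagation on a toy plan model. `Plan`, `LocalRel`, `Join`, the join-type objects and `PropagateEmptySketch` are hypothetical names. They are not Spark's Catalyst classes, and this is not Spark's actual `PropagateEmptyRelation` rule. The sketch only shows why such a rewrite can fire only after the IN-subquery has been turned into a join.

```scala
// Toy logical-plan model: hypothetical names, not Spark's Catalyst classes.
sealed trait JoinType
case object Inner extends JoinType
case object LeftSemi extends JoinType
case object LeftAnti extends JoinType

sealed trait Plan
// A local relation with named output columns and in-memory rows.
final case class LocalRel(output: Seq[String], rows: Seq[Seq[Any]]) extends Plan
final case class Join(left: Plan, right: Plan, joinType: JoinType) extends Plan

object PropagateEmptySketch {
  private def isEmpty(p: Plan): Boolean = p match {
    case LocalRel(_, rows) => rows.isEmpty
    case _                 => false
  }

  // Semi and anti joins output only the left side's columns.
  private def output(p: Plan): Seq[String] = p match {
    case LocalRel(out, _)                => out
    case Join(l, _, LeftSemi | LeftAnti) => output(l)
    case Join(l, r, _)                   => output(l) ++ output(r)
  }

  // Rewrites bottom-up, replacing joins that provably return no rows.
  def apply(plan: Plan): Plan = plan match {
    case Join(l, r, jt) =>
      val left  = apply(l)
      val right = apply(r)
      val join  = Join(left, right, jt)
      jt match {
        // Inner and left-semi joins are empty if either side is empty.
        case Inner | LeftSemi if isEmpty(left) || isEmpty(right) => LocalRel(output(join), Nil)
        // A left-anti join is empty if its left side is empty...
        case LeftAnti if isEmpty(left) => LocalRel(output(join), Nil)
        // ...and keeps every left row if its right side is empty.
        case LeftAnti if isEmpty(right) => left
        case _                          => join
      }
    case leaf => leaf
  }

  def main(args: Array[String]): Unit = {
    // Mirrors the PR's example: values (1), (2) semi-joined with an empty subquery result.
    val t   = LocalRel(Seq("key#40"), Seq(Seq(1), Seq(2)))
    val sub = LocalRel(Seq("key#38"), Nil)
    println(apply(Join(t, sub, LeftSemi)))
  }
}
```

In `main`, the semi join collapses to an empty `LocalRel` that keeps only the left column. This is roughly the shape the PR expects the optimizer to reach when the propagation runs after `RewritePredicateSubquery`.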
[GitHub] [spark] yaooqinn commented on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery
yaooqinn commented on issue #28043: [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery URL: https://github.com/apache/spark/pull/28043#issuecomment-604827206 cc @cloud-fan @dongjoon-hyun, thanks
[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window URL: https://github.com/apache/spark/pull/27943#issuecomment-604827078 **[Test build #120448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120448/testReport)** for PR 27943 at commit [`66adc09`](https://github.com/apache/spark/commit/66adc0983a5011b077096e5f9e62c04844793bde).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
AmplabJenkins removed a comment on issue #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive URL: https://github.com/apache/spark/pull/28042#issuecomment-604826658 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
AmplabJenkins removed a comment on issue #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive URL: https://github.com/apache/spark/pull/28042#issuecomment-604826664 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25157/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
AmplabJenkins commented on issue #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive URL: https://github.com/apache/spark/pull/28042#issuecomment-604826664 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25157/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
AmplabJenkins commented on issue #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive URL: https://github.com/apache/spark/pull/28042#issuecomment-604826658 Merged build finished. Test PASSed.
[GitHub] [spark] beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive URL: https://github.com/apache/spark/pull/28042#discussion_r399047030
## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ##
@@ -135,6 +143,7 @@ private[spark] object HiveUtils extends Logging {
"that should be shared is JDBC drivers that are needed to talk to the metastore. Other " +
"classes that need to be shared are those that interact with classes that are already " +
"shared. For example, custom appenders that are used by log4j.")
+.version("1.4.0")
Review comment:
SPARK-7491, commit ID: a8556086d33cb993fab0ae2751e31455e6c664ab#diff-ff50aea397a607b79df9bec6f2a841db
[GitHub] [spark] beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive URL: https://github.com/apache/spark/pull/28042#discussion_r399047055
## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ##
@@ -146,12 +155,14 @@ private[spark] object HiveUtils extends Logging {
.doc("A comma separated list of class prefixes that should explicitly be reloaded for each " +
"version of Hive that Spark SQL is communicating with. For example, Hive UDFs that are " +
"declared in a prefix that typically would be shared (i.e. org.apache.spark.*).")
+.version("1.4.0")
Review comment:
SPARK-7491, commit ID: a8556086d33cb993fab0ae2751e31455e6c664ab#diff-ff50aea397a607b79df9bec6f2a841db
[GitHub] [spark] SparkQA commented on issue #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
SparkQA commented on issue #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive URL: https://github.com/apache/spark/pull/28042#issuecomment-604826376 **[Test build #120449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120449/testReport)** for PR 28042 at commit [`3b3cb8c`](https://github.com/apache/spark/commit/3b3cb8c1cae88575c7c0bcf98a20d5d06f479267).
[GitHub] [spark] beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive URL: https://github.com/apache/spark/pull/28042#discussion_r399047144
## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ##
@@ -146,12 +155,14 @@ private[spark] object HiveUtils extends Logging {
.doc("A comma separated list of class prefixes that should explicitly be reloaded for each " +
"version of Hive that Spark SQL is communicating with. For example, Hive UDFs that are " +
"declared in a prefix that typically would be shared (i.e. org.apache.spark.*).")
+.version("1.4.0")
.stringConf
.toSequence
.createWithDefault(Nil)
val HIVE_THRIFT_SERVER_ASYNC = buildConf("spark.sql.hive.thriftServer.async")
.doc("When set to true, Hive Thrift server executes SQL queries in an asynchronous way.")
+.version("1.5.0")
Review comment:
SPARK-6964, commit ID: eb19d3f75cbd002f7e72ce02017a8de67f562792#diff-ff50aea397a607b79df9bec6f2a841db
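The `.version(...)` calls under review attach to each config definition the Spark release in which that config first appeared. Each review comment cites the JIRA and commit behind one version. The minimal, immutable Scala sketch below shows how a fluent builder can carry that metadata. `ConfBuilder`, `ConfEntry` and this `buildConf` are hypothetical stand-ins, not Spark's internal config-builder API. The typed `.booleanConf` step is omitted, and the default value is illustrative.

```scala
// Hypothetical stand-ins for a config builder; not Spark's internal API.
final case class ConfEntry[T](key: String, doc: String, version: String, default: T)

final case class ConfBuilder(key: String, docText: String = "", sinceVersion: String = "") {
  // Each step returns a new immutable builder, so calls can be chained.
  def doc(text: String): ConfBuilder = copy(docText = text)
  // Records the release in which the config first appeared, e.g. "1.5.0".
  def version(v: String): ConfBuilder = copy(sinceVersion = v)
  def createWithDefault[T](default: T): ConfEntry[T] =
    ConfEntry(key, docText, sinceVersion, default)
}

object ConfVersionSketch {
  def buildConf(key: String): ConfBuilder = ConfBuilder(key)

  // Mirrors the shape of HIVE_THRIFT_SERVER_ASYNC in the diff; the default is illustrative.
  val HIVE_THRIFT_SERVER_ASYNC: ConfEntry[Boolean] =
    buildConf("spark.sql.hive.thriftServer.async")
      .doc("When set to true, Hive Thrift server executes SQL queries in an asynchronous way.")
      .version("1.5.0")
      .createWithDefault(true)

  def main(args: Array[String]): Unit = {
    val e = HIVE_THRIFT_SERVER_ASYNC
    println(s"${e.key} (since ${e.version}), default = ${e.default}")
  }
}
```

Because the builder is immutable, a partially configured builder can be reused without one definition's version leaking into another.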
[GitHub] [spark] yaooqinn commented on issue #27985: [SPARK-31225][SQL] Override sql method of OuterReference
yaooqinn commented on issue #27985: [SPARK-31225][SQL] Override sql method of OuterReference URL: https://github.com/apache/spark/pull/27985#issuecomment-604826277 gentle ping @maropu
[GitHub] [spark] beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive URL: https://github.com/apache/spark/pull/28042#discussion_r399046813
## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ##
@@ -118,6 +124,7 @@ private[spark] object HiveUtils extends Logging {
"`spark.sql.hive.convertMetastoreOrc` is true, the built-in ORC/Parquet writer is used" +
"to process inserting into partitioned ORC/Parquet tables created by using the HiveSQL " +
"syntax.")
+ .version("3.0.0")
Review comment:
SPARK-28573, commit ID: d5688dc732890923c326f272b0c18c329a69459a#diff-842e3447fc453de26c706db1cac8f2c4
[GitHub] [spark] beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive URL: https://github.com/apache/spark/pull/28042#discussion_r399046720
## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ##
@@ -103,12 +107,14 @@ private[spark] object HiveUtils extends Logging {
.doc("When true, also tries to merge possibly different but compatible Parquet schemas in " +
"different Parquet data files. This configuration is only effective " +
"when \"spark.sql.hive.convertMetastoreParquet\" is true.")
+ .version("1.3.1")
.booleanConf
.createWithDefault(false)
val CONVERT_METASTORE_ORC = buildConf("spark.sql.hive.convertMetastoreOrc")
.doc("When set to true, the built-in ORC reader and writer are used to process " +
"ORC tables created by using the HiveQL syntax, instead of Hive serde.")
+.version("2.0.0")
Review comment:
SPARK-14070, commit ID: 1e886159849e3918445d3fdc3c4cef86c6c1a236#diff-ff50aea397a607b79df9bec6f2a841db
[GitHub] [spark] beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive URL: https://github.com/apache/spark/pull/28042#discussion_r399046930 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ## @@ -126,6 +133,7 @@ private[spark] object HiveUtils extends Logging { "instead of Hive serde in CTAS. This flag is effective only if " + "`spark.sql.hive.convertMetastoreParquet` or `spark.sql.hive.convertMetastoreOrc` is " + "enabled respectively for Parquet and ORC formats") +.version("3.0.0") Review comment: SPARK-25271, commit ID: 5ad03607d1487e7ab3e3b6d00eef9c4028ed4975#diff-842e3447fc453de26c706db1cac8f2c4
[GitHub] [spark] beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive URL: https://github.com/apache/spark/pull/28042#discussion_r399046428 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ## @@ -89,12 +91,14 @@ private[spark] object HiveUtils extends Logging { | Use Hive jars of specified version downloaded from Maven repositories. | 3. A classpath in the standard format for both Hive and Hadoop. """.stripMargin) +.version("1.4.0") Review comment: SPARK-6908, commit ID: 05454fd8aef75b129cbbd0288f5089c5259f4a15#diff-ff50aea397a607b79df9bec6f2a841db
[GitHub] [spark] beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive URL: https://github.com/apache/spark/pull/28042#discussion_r399046516 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ## @@ -89,12 +91,14 @@ private[spark] object HiveUtils extends Logging { | Use Hive jars of specified version downloaded from Maven repositories. | 3. A classpath in the standard format for both Hive and Hadoop. """.stripMargin) +.version("1.4.0") .stringConf .createWithDefault("builtin") val CONVERT_METASTORE_PARQUET = buildConf("spark.sql.hive.convertMetastoreParquet") .doc("When set to true, the built-in Parquet reader and writer are used to process " + "parquet tables created by using the HiveQL syntax, instead of Hive serde.") +.version("1.1.1") Review comment: SPARK-2406, commit ID: cc4015d2fa3785b92e6ab079b3abcf17627f7c56#diff-ff50aea397a607b79df9bec6f2a841db
[GitHub] [spark] beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive URL: https://github.com/apache/spark/pull/28042#discussion_r399046631 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ## @@ -103,12 +107,14 @@ private[spark] object HiveUtils extends Logging { .doc("When true, also tries to merge possibly different but compatible Parquet schemas in " + "different Parquet data files. This configuration is only effective " + "when \"spark.sql.hive.convertMetastoreParquet\" is true.") + .version("1.3.1") Review comment: SPARK-6575, commit ID: 778c87686af0c04df9dfe144b8f744f271a988ad#diff-ff50aea397a607b79df9bec6f2a841db
[GitHub] [spark] beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive URL: https://github.com/apache/spark/pull/28042#discussion_r399046266 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ## @@ -65,6 +65,7 @@ private[spark] object HiveUtils extends Logging { .doc("Version of the Hive metastore. Available options are " + "0.12.0 through 2.3.6 and " + "3.0.0 through 3.1.2.") +.version("1.4.0") Review comment: SPARK-6908, commit ID: 05454fd8aef75b129cbbd0288f5089c5259f4a15#diff-ff50aea397a607b79df9bec6f2a841db
[GitHub] [spark] beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
beliefer commented on a change in pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive URL: https://github.com/apache/spark/pull/28042#discussion_r399046361 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ## @@ -73,6 +74,7 @@ private[spark] object HiveUtils extends Logging { // already rely on this config. val FAKE_HIVE_VERSION = buildConf("spark.sql.hive.version") .doc(s"deprecated, please use ${HIVE_METASTORE_VERSION.key} to get the Hive version in Spark.") +.version("1.1.1") Review comment: SPARK-3971, commit ID: 64945f868443fbc59cb34b34c16d782dda0fb63d#diff-12fa2178364a810b3262b30d8d48aa2d
[GitHub] [spark] beliefer opened a new pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive
beliefer opened a new pull request #28042: [SPARK-31279][SQL][DOC] Add version information to the configuration of Hive URL: https://github.com/apache/spark/pull/28042
### What changes were proposed in this pull request?
Add version information to the configuration of `Hive`. I sorted out the information shown below.

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.sql.hive.metastore.version | 1.4.0 | SPARK-6908 | 05454fd8aef75b129cbbd0288f5089c5259f4a15#diff-ff50aea397a607b79df9bec6f2a841db |
spark.sql.hive.version | 1.1.1 | SPARK-3971 | 64945f868443fbc59cb34b34c16d782dda0fb63d#diff-12fa2178364a810b3262b30d8d48aa2d |
spark.sql.hive.metastore.jars | 1.4.0 | SPARK-6908 | 05454fd8aef75b129cbbd0288f5089c5259f4a15#diff-ff50aea397a607b79df9bec6f2a841db |
spark.sql.hive.convertMetastoreParquet | 1.1.1 | SPARK-2406 | cc4015d2fa3785b92e6ab079b3abcf17627f7c56#diff-ff50aea397a607b79df9bec6f2a841db |
spark.sql.hive.convertMetastoreParquet.mergeSchema | 1.3.1 | SPARK-6575 | 778c87686af0c04df9dfe144b8f744f271a988ad#diff-ff50aea397a607b79df9bec6f2a841db |
spark.sql.hive.convertMetastoreOrc | 2.0.0 | SPARK-14070 | 1e886159849e3918445d3fdc3c4cef86c6c1a236#diff-ff50aea397a607b79df9bec6f2a841db |
spark.sql.hive.convertInsertingPartitionedTable | 3.0.0 | SPARK-28573 | d5688dc732890923c326f272b0c18c329a69459a#diff-842e3447fc453de26c706db1cac8f2c4 |
spark.sql.hive.convertMetastoreCtas | 3.0.0 | SPARK-25271 | 5ad03607d1487e7ab3e3b6d00eef9c4028ed4975#diff-842e3447fc453de26c706db1cac8f2c4 |
spark.sql.hive.metastore.sharedPrefixes | 1.4.0 | SPARK-7491 | a8556086d33cb993fab0ae2751e31455e6c664ab#diff-ff50aea397a607b79df9bec6f2a841db |
spark.sql.hive.metastore.barrierPrefixes | 1.4.0 | SPARK-7491 | a8556086d33cb993fab0ae2751e31455e6c664ab#diff-ff50aea397a607b79df9bec6f2a841db |
spark.sql.hive.thriftServer.async | 1.5.0 | SPARK-6964 | eb19d3f75cbd002f7e72ce02017a8de67f562792#diff-ff50aea397a607b79df9bec6f2a841db |

### Why are the changes needed?
Supplement the configurations with version information.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing UT.
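The change in this PR amounts to adding one chained `.version(...)` call to each config definition in `HiveUtils.scala`. A rough Python analogue of that chained-builder shape follows. `ConfigBuilder` and `ConfigEntry` here are hypothetical stand-ins, not Spark's Scala API, and the rule that a missing version raises an error is an illustrative assumption, not a claim about Spark's behavior. The keys, versions, doc text, and defaults are taken from the quoted diffs above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ConfigEntry:
    key: str
    doc: str
    version: str  # release in which the config first appeared
    default: Any


class ConfigBuilder:
    """Hypothetical stand-in for the chained Scala builder calls in HiveUtils.scala."""

    def __init__(self, key: str) -> None:
        self._key = key
        self._doc = ""
        self._version: Optional[str] = None

    def doc(self, text: str) -> "ConfigBuilder":
        self._doc = text
        return self

    def version(self, since: str) -> "ConfigBuilder":
        self._version = since
        return self

    def create_with_default(self, default: Any) -> ConfigEntry:
        # Assumption for illustration only: refuse to build an entry without a version.
        if self._version is None:
            raise ValueError(f"{self._key} has no .version(...)")
        return ConfigEntry(self._key, self._doc, self._version, default)


MERGE_SCHEMA = (
    ConfigBuilder("spark.sql.hive.convertMetastoreParquet.mergeSchema")
    .doc("When true, also tries to merge possibly different but compatible "
         "Parquet schemas in different Parquet data files.")
    .version("1.3.1")
    .create_with_default(False)
)

HIVE_METASTORE_JARS = (
    ConfigBuilder("spark.sql.hive.metastore.jars")
    .version("1.4.0")
    .create_with_default("builtin")
)
```

Keeping the version on the entry itself (rather than in a separate table like the one above) means documentation can be generated from the same object that defines the config.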
[GitHub] [spark] cloud-fan commented on issue #28039: [SPARK-31275][WEBUI] Improve the metrics format in ExecutionPage for StageId.
cloud-fan commented on issue #28039: [SPARK-31275][WEBUI] Improve the metrics format in ExecutionPage for StageId. URL: https://github.com/apache/spark/pull/28039#issuecomment-604824078 I'm merging it (master and 3.0) to unblock https://github.com/apache/spark/pull/28037 , thanks!
[GitHub] [spark] cloud-fan closed pull request #28039: [SPARK-31275][WEBUI] Improve the metrics format in ExecutionPage for StageId.
cloud-fan closed pull request #28039: [SPARK-31275][WEBUI] Improve the metrics format in ExecutionPage for StageId. URL: https://github.com/apache/spark/pull/28039
[GitHub] [spark] jiangxb1987 commented on issue #28009: [SPARK-31235][YARN] Separates different categories of applications
jiangxb1987 commented on issue #28009: [SPARK-31235][YARN] Separates different categories of applications URL: https://github.com/apache/spark/pull/28009#issuecomment-604822612 Please add a test case as Thomas suggested, thanks!
[GitHub] [spark] wang-zhun commented on issue #28009: [SPARK-31235][YARN] Separates different categories of applications
wang-zhun commented on issue #28009: [SPARK-31235][YARN] Separates different categories of applications URL: https://github.com/apache/spark/pull/28009#issuecomment-604822007 Hi @jiangxb1987 @tgravescs, could you help review this?
[GitHub] [spark] jiangxb1987 commented on a change in pull request #28002: [SPARK-31233][Core] Enhance RpcTimeoutException Log Message
jiangxb1987 commented on a change in pull request #28002: [SPARK-31233][Core] Enhance RpcTimeoutException Log Message URL: https://github.com/apache/spark/pull/28002#discussion_r399039021 ## File path: core/src/test/scala/org/apache/spark/rpc/netty/NettyRpcEnvSuite.scala ## @@ -42,6 +50,34 @@ class NettyRpcEnvSuite extends RpcEnvSuite with MockitoSugar with TimeLimits { new NettyRpcEnvFactory().create(config) } + test("SPARK-31233: Send message to clientMode RpcEnv with timeout") { Review comment: Maybe I'm missing something, but why not create an RpcEnv with `clientMode = true`?
[GitHub] [spark] ScrapCodes commented on a change in pull request #27966: [SPARK-31200][k8s] Switch https for debian mirrors, to avoid Mirror sync i…
ScrapCodes commented on a change in pull request #27966: [SPARK-31200][k8s] Switch https for debian mirrors, to avoid Mirror sync i… URL: https://github.com/apache/spark/pull/27966#discussion_r399038061 ## File path: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/sources.list ## @@ -0,0 +1,20 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# This file is required for switching to https url based mirrors, See SPARK-31200, for more info. +deb https://deb.debian.org/debian buster main Review comment: Ah, got it. In the future Oracle could switch to some other base image, which might not even be Debian-based. In that case the fix would need to do a bit more than this: first check whether it is a Debian image that uses mirrors with http URLs, and only then replace those with the `https` ones.
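The check described in the review (first confirm the image is Debian-based and uses http mirrors, then switch those to https) can be sketched as a plain text transformation. This is a hedged sketch, not what the PR does (the PR ships a fixed `sources.list`): it assumes the image exposes `/etc/os-release` with `ID`/`ID_LIKE` fields and that mirrors are listed on `deb`/`deb-src` lines, and all function names are hypothetical.

```python
import re
from typing import Dict

# deb / deb-src lines whose mirror URL is plain http; an optional [options] block is kept.
_HTTP_DEB_LINE = re.compile(r"^(\s*deb(?:-src)?\s+(?:\[[^\]]*\]\s+)?)http://",
                            re.MULTILINE)


def is_debian_based(os_release: str) -> bool:
    """Parse /etc/os-release text and look for 'debian' in ID or ID_LIKE."""
    fields: Dict[str, str] = {}
    for line in os_release.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    ids = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    return "debian" in ids


def switch_mirrors_to_https(sources_list: str) -> str:
    """Rewrite http:// mirrors on deb/deb-src lines to https://; comment lines are untouched."""
    return _HTTP_DEB_LINE.sub(r"\1https://", sources_list)


def maybe_switch_to_https(os_release: str, sources_list: str) -> str:
    """Only touch the mirror list when the image is Debian-based."""
    if not is_debian_based(os_release):
        return sources_list
    return switch_mirrors_to_https(sources_list)
```

Checking `ID_LIKE` as well as `ID` is a design choice so that derivatives such as Ubuntu, which declare `ID_LIKE=debian`, are also treated as "of debian type".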
[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric URL: https://github.com/apache/spark/pull/28040#issuecomment-604815707 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120444/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric URL: https://github.com/apache/spark/pull/28040#issuecomment-604815704 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric URL: https://github.com/apache/spark/pull/28040#issuecomment-604815704 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric URL: https://github.com/apache/spark/pull/28040#issuecomment-604815707 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120444/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric URL: https://github.com/apache/spark/pull/28040#issuecomment-604763466 **[Test build #120444 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120444/testReport)** for PR 28040 at commit [`4655611`](https://github.com/apache/spark/commit/465561182d48632ad6ea1b4e4079532a8cc5ac69).
[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric URL: https://github.com/apache/spark/pull/28040#issuecomment-604815495 **[Test build #120444 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120444/testReport)** for PR 28040 at commit [`4655611`](https://github.com/apache/spark/commit/465561182d48632ad6ea1b4e4079532a8cc5ac69). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on issue #28004: [SPARK-31204][SQL] HiveResult compatibility for DatasourceV2 command
cloud-fan commented on issue #28004: [SPARK-31204][SQL] HiveResult compatibility for DatasourceV2 command URL: https://github.com/apache/spark/pull/28004#issuecomment-604813272 thanks, merging to master/3.0!
[GitHub] [spark] cloud-fan closed pull request #28004: [SPARK-31204][SQL] HiveResult compatibility for DatasourceV2 command
cloud-fan closed pull request #28004: [SPARK-31204][SQL] HiveResult compatibility for DatasourceV2 command URL: https://github.com/apache/spark/pull/28004
[GitHub] [spark] viirya commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
viirya commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604809085 Thanks @HyukjinKwon @BryanCutler
[GitHub] [spark] viirya commented on a change in pull request #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
viirya commented on a change in pull request #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#discussion_r398978882 ## File path: python/pyspark/sql/pandas/conversion.py ## @@ -132,25 +132,35 @@ def toPandas(self): # Below is toPandas without Arrow optimization. pdf = pd.DataFrame.from_records(self.collect(), columns=self.columns) -dtype = {} -for field in self.schema: +dtype = [None] * len(self.schema) +for fieldIdx in range(len(self.schema)): +field = self.schema[fieldIdx] +pandas_col = pdf.iloc[:, fieldIdx] + pandas_type = PandasConversionMixin._to_corrected_pandas_type(field.dataType) # SPARK-21766: if an integer field is nullable and has null values, it can be # inferred by pandas as float column. Once we convert the column with NaN back # to integer type e.g., np.int16, we will hit exception. So we use the inferred # float type, not the corrected type from the schema in this case. if pandas_type is not None and \ not(isinstance(field.dataType, IntegralType) and field.nullable and -pdf[field.name].isnull().any()): -dtype[field.name] = pandas_type +pandas_col.isnull().any()): +dtype[fieldIdx] = pandas_type # Ensure we fall back to nullable numpy types, even when whole column is null: -if isinstance(field.dataType, IntegralType) and pdf[field.name].isnull().any(): -dtype[field.name] = np.float64 -if isinstance(field.dataType, BooleanType) and pdf[field.name].isnull().any(): -dtype[field.name] = np.object +if isinstance(field.dataType, IntegralType) and pandas_col.isnull().any(): +dtype[fieldIdx] = np.float64 +if isinstance(field.dataType, BooleanType) and pandas_col.isnull().any(): +dtype[fieldIdx] = np.object + +df = pd.DataFrame() +for index, t in enumerate(dtype): +if t is not None: +series = pdf.iloc[:, index].astype(t, copy=False) +else: +series = pdf.iloc[:, index] +df.insert(index, self.schema[index].name, series, allow_duplicates=True) Review comment: Looks like so. 
`insert` calls `_sanitize_column`, which makes a copy of the data. But `pdf.iloc[:, index] = pdf.iloc[:, index].astype(t, copy=False)` doesn't work, as I replied earlier to @HyukjinKwon. It looks like whether `iloc` returns a view or a copy may depend on the context.
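Part of why the patch replaces `dtype = {}` with a positional list: a dict keyed by `field.name` holds only one entry per name, so duplicated column names overwrite each other (and in pandas, `pdf[name]` on a duplicated label returns a DataFrame rather than a Series). The stdlib-only sketch below contrasts the two bookkeeping strategies. It models only the indexing, not pandas itself; dtype names are plain strings and the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def dtypes_by_name(names: Sequence[str],
                   corrected: Sequence[Optional[str]]) -> Dict[str, str]:
    """Old-style bookkeeping: one slot per column *name*."""
    dtype: Dict[str, str] = {}
    for name, t in zip(names, corrected):
        if t is not None:
            dtype[name] = t  # a later duplicate silently overwrites an earlier one
    return dtype


def dtypes_by_position(names: Sequence[str],
                       corrected: Sequence[Optional[str]]) -> List[Optional[str]]:
    """New-style bookkeeping: one slot per column *position*; None keeps the inferred dtype."""
    dtype: List[Optional[str]] = [None] * len(names)
    for idx, t in enumerate(corrected):
        if t is not None:
            dtype[idx] = t
    return dtype


names = ["a", "a", "b"]  # e.g. the output columns of a self-join
corrected = ["int32", "float64", None]
print(dtypes_by_name(names, corrected))      # {'a': 'float64'}
print(dtypes_by_position(names, corrected))  # ['int32', 'float64', None]
```

With positions as keys, each column can then be cast on its own (`pdf.iloc[:, index]`) and re-inserted with `allow_duplicates=True`, which is the shape of the new code quoted above.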
[GitHub] [spark] HeartSaVioR commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
HeartSaVioR commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP) URL: https://github.com/apache/spark/pull/28026#issuecomment-604808159 I don't know who marked the comments as resolved, so please correct me if I'm wrong, but given that @cloud-fan replied to a resolved comment, it doesn't seem that @cloud-fan marked them as resolved. Could we make sure comments are marked as resolved only when both sides agree?
[GitHub] [spark] cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP) URL: https://github.com/apache/spark/pull/28026#issuecomment-604805210 I understand that we want to make Hive implement the v2 API eventually, but can we focus on syntax unification right now? Let's not change the behavior; for example, EXTERNAL should still be disallowed when creating native data source tables.
[GitHub] [spark] cloud-fan commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
cloud-fan commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP) URL: https://github.com/apache/spark/pull/28026#discussion_r399024977 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java ## @@ -46,6 +46,11 @@ */ String PROP_LOCATION = "location"; + /** + * A reserved property to specify a table was created with EXTERNAL. + */ + String PROP_EXTERNAL = "external"; Review comment: The parser doesn't accept `EXTERNAL` except for Hive serde tables. I don't agree that all Spark data sources should accept it just because Hive does. Why not forbid EXTERNAL for hive serde tables? It can be an option as well.
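The behavior the review asks to preserve can be stated as a small predicate: `EXTERNAL` is accepted only for Hive serde tables, and native data source tables reject it. A minimal sketch, assuming (for illustration only) that a Hive serde table is identified by the provider string `"hive"`; `external_allowed` is a hypothetical helper, not Spark's parser code.

```python
HIVE_PROVIDER = "hive"  # assumption: Hive serde tables are identified by provider "hive"


def external_allowed(provider: str, external: bool) -> bool:
    """Return True if a CREATE TABLE with the given provider/EXTERNAL combination is accepted.

    Hypothetical rule taken from the review: EXTERNAL is accepted only for
    Hive serde tables; native data source tables (parquet, csv, ...) reject it.
    """
    if not external:
        return True  # plain CREATE TABLE is fine for every provider
    return provider.lower() == HIVE_PROVIDER
```

Keeping the rule as a single predicate makes it easy to unify the syntax while leaving this validation, and therefore the user-visible behavior, unchanged.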
[GitHub] [spark] cloud-fan commented on issue #27969: [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir
cloud-fan commented on issue #27969: [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir URL: https://github.com/apache/spark/pull/27969#issuecomment-604803964 thanks, merging to master/3.0!
[GitHub] [spark] cloud-fan closed pull request #27969: [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir
cloud-fan closed pull request #27969: [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir URL: https://github.com/apache/spark/pull/27969
[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window URL: https://github.com/apache/spark/pull/27943#issuecomment-604792486 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
AmplabJenkins removed a comment on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window URL: https://github.com/apache/spark/pull/27943#issuecomment-604792489 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25156/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window URL: https://github.com/apache/spark/pull/27943#issuecomment-604792486 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
AmplabJenkins commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window URL: https://github.com/apache/spark/pull/27943#issuecomment-604792489 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25156/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window
SparkQA commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in fast fail time window URL: https://github.com/apache/spark/pull/27943#issuecomment-604792133 **[Test build #120448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120448/testReport)** for PR 27943 at commit [`66adc09`](https://github.com/apache/spark/commit/66adc0983a5011b077096e5f9e62c04844793bde).
[GitHub] [spark] HyukjinKwon commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
HyukjinKwon commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604791846 Merged to master, and branch-3.0.
[GitHub] [spark] HyukjinKwon closed pull request #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
HyukjinKwon closed pull request #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27665: [SPARK-30623][Core] Spark external shuffle allow disable of separate event loop group
dongjoon-hyun commented on a change in pull request #27665: [SPARK-30623][Core] Spark external shuffle allow disable of separate event loop group URL: https://github.com/apache/spark/pull/27665#discussion_r399010040
## File path: common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java ##
@@ -339,12 +341,25 @@ public int chunkFetchHandlerThreads() {
       return 0;
     }
     int chunkFetchHandlerThreadsPercent =
-      conf.getInt("spark.shuffle.server.chunkFetchHandlerThreadsPercent", 100);
Review comment: Thank you, @xuanyuanking and @cloud-fan.
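The hunk reads `spark.shuffle.server.chunkFetchHandlerThreadsPercent` with a default of 100 after an early `return 0`. The Scala sketch below illustrates how such a percentage could translate into a thread count. The object `ChunkFetchThreads`, its parameters, the shuffle-module guard, the twice-the-cores fallback, and the ceiling rounding are all assumptions for illustration and are not taken from the diff or from `TransportConf`.

```scala
// Hypothetical sketch, not Spark's TransportConf: maps a percentage setting to a
// number of chunk-fetch handler threads.
object ChunkFetchThreads {
  /**
   * @param isShuffleModule whether the transport belongs to the shuffle module
   * @param serverThreads   configured server threads; a value <= 0 means "unset"
   * @param availableCores  processor count used for the fallback thread count
   * @param percent         share of the server threads to use, in percent
   */
  def handlerThreads(
      isShuffleModule: Boolean,
      serverThreads: Int,
      availableCores: Int,
      percent: Int = 100): Int = {
    if (!isShuffleModule) {
      // Assumption: no dedicated handler pool outside the shuffle module.
      0
    } else {
      // Assumption: fall back to twice the core count when serverThreads is unset.
      val base = if (serverThreads > 0) serverThreads else 2 * availableCores
      // Round up so that any positive percentage yields at least one thread.
      math.ceil(base * (percent / 100.0)).toInt
    }
  }
}
```

Rounding up rather than down is a design choice in this sketch: with 3 server threads and 50 percent it gives 2 threads instead of 1, and a small percentage never silently produces zero threads.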
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27966: [SPARK-31200][k8s] Switch https for debian mirrors, to avoid Mirror sync i…
dongjoon-hyun commented on a change in pull request #27966: [SPARK-31200][k8s] Switch https for debian mirrors, to avoid Mirror sync i… URL: https://github.com/apache/spark/pull/27966#discussion_r399005615
## File path: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile ##
@@ -27,6 +27,9 @@ ARG spark_uid=185
 # of the Spark distribution. E.g.:
 # docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
+RUN echo 'deb https://deb.debian.org/debian buster main' >/etc/apt/sources.list && \
+    echo 'deb-src https://deb.debian.org/debian buster main' >>/etc/apt/sources.list
+
 RUN set -ex && \
Review comment: Could you move these two new lines after this line? Then we can have a single `RUN` command.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27966: [SPARK-31200][k8s] Switch https for debian mirrors, to avoid Mirror sync i…
dongjoon-hyun commented on a change in pull request #27966: [SPARK-31200][k8s] Switch https for debian mirrors, to avoid Mirror sync i… URL: https://github.com/apache/spark/pull/27966#discussion_r399004900
## File path: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile ##
@@ -27,6 +27,9 @@ ARG spark_uid=185
 # of the Spark distribution. E.g.:
 # docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
+RUN echo 'deb https://deb.debian.org/debian buster main' >/etc/apt/sources.list && \
+    echo 'deb-src https://deb.debian.org/debian buster main' >>/etc/apt/sources.list
Review comment: nit.
- `>/etc/apt/sources.list` -> `> /etc/apt/sources.list`
- `>>/etc/apt/sources.list` -> `>> /etc/apt/sources.list`
[GitHub] [spark] AmplabJenkins removed a comment on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
AmplabJenkins removed a comment on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604786087 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
AmplabJenkins commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604786092 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120447/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
AmplabJenkins removed a comment on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604786092 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120447/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
AmplabJenkins commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604786087 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
SparkQA removed a comment on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604777972 **[Test build #120447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120447/testReport)** for PR 28025 at commit [`1cf1f12`](https://github.com/apache/spark/commit/1cf1f121f346c93f944feff97e56db0a1a9f7cea).
[GitHub] [spark] SparkQA commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
SparkQA commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604785811 **[Test build #120447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120447/testReport)** for PR 28025 at commit [`1cf1f12`](https://github.com/apache/spark/commit/1cf1f121f346c93f944feff97e56db0a1a9f7cea).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #28033: [SPARK-31277][SQL][TESTS] Migrate `DateTimeTestUtils` from `TimeZone` to `ZoneId`
cloud-fan commented on a change in pull request #28033: [SPARK-31277][SQL][TESTS] Migrate `DateTimeTestUtils` from `TimeZone` to `ZoneId` URL: https://github.com/apache/spark/pull/28033#discussion_r399003676
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeTestUtils.scala ##
@@ -39,7 +41,7 @@ object DateTimeTestUtils {
   val UTC_OPT = Option("UTC")
-  val ALL_TIMEZONES: Seq[TimeZone] = TimeZone.getAvailableIDs.toSeq.map(TimeZone.getTimeZone)
+  val ALL_TIMEZONES: Seq[ZoneId] = ZoneId.getAvailableZoneIds.asScala.map(getZoneId).toSeq
Review comment: I'm wondering if we should test only the `outstandingZoneIds` instead of `ALL_TIMEZONES`. The test can be very slow if we test all time zones.
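The trade-off in this comment can be sketched as choosing between every zone the JVM knows and a small representative list. The Scala sketch below uses `ZoneId.of` in place of Spark's `getZoneId`, and the object `ZoneSelection`, its seven-zone list, and `zonesUnderTest` are hypothetical stand-ins; the real `outstandingZoneIds` in `DateTimeTestUtils` may contain different zones.

```scala
import java.time.ZoneId
import scala.collection.JavaConverters._

object ZoneSelection {
  // Every region-based zone ID known to the running JVM (typically several hundred).
  val allZoneIds: Seq[ZoneId] =
    ZoneId.getAvailableZoneIds.asScala.toSeq.sorted.map(id => ZoneId.of(id))

  // Hypothetical hand-picked subset standing in for `outstandingZoneIds`.
  val representativeZoneIds: Seq[ZoneId] = Seq(
    "UTC", "America/Los_Angeles", "Europe/Brussels", "Africa/Dakar",
    "Asia/Urumqi", "Asia/Hong_Kong", "Pacific/Kiritimati"
  ).map(id => ZoneId.of(id))

  /** Uses the exhaustive list only when explicitly requested; test time scales with its size. */
  def zonesUnderTest(exhaustive: Boolean): Seq[ZoneId] =
    if (exhaustive) allZoneIds else representativeZoneIds
}
```

A per-zone check then runs once per element of `zonesUnderTest(exhaustive = false)`, so its cost depends on the short list rather than on the full time zone database.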
[GitHub] [spark] cloud-fan commented on a change in pull request #28033: [SPARK-31277][SQL][TESTS] Migrate `DateTimeTestUtils` from `TimeZone` to `ZoneId`
cloud-fan commented on a change in pull request #28033: [SPARK-31277][SQL][TESTS] Migrate `DateTimeTestUtils` from `TimeZone` to `ZoneId` URL: https://github.com/apache/spark/pull/28033#discussion_r399003351
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ##
@@ -491,7 +490,7 @@ class DateTimeUtilsSuite extends SparkFunSuite with Matchers with SQLHelper {
     }
     val tz = LA.getId
-    withDefaultTimeZone(TimeZone.getTimeZone(tz)) {
+    withDefaultTimeZone(getZoneId(tz)) {
Review comment: nit: just `LA`?
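The nit works because, after the migration, the helper takes a `ZoneId` and `LA` is already one, so the `LA.getId` round trip through a string is unnecessary. Below is a minimal sketch of such a helper, assuming it swaps the JVM default `java.util.TimeZone` for the duration of a block; `TimeZoneHelpers` and its `withDefaultTimeZone` are hypothetical and only modeled on the name in the diff, not Spark's implementation.

```scala
import java.time.ZoneId
import java.util.TimeZone

// Hypothetical helper modeled on the test utility name in the diff; not Spark's code.
object TimeZoneHelpers {
  val LA: ZoneId = ZoneId.of("America/Los_Angeles")

  /** Runs `block` with the JVM default time zone set to `zoneId`, then restores the old default. */
  def withDefaultTimeZone[T](zoneId: ZoneId)(block: => T): T = {
    val original = TimeZone.getDefault
    try {
      TimeZone.setDefault(TimeZone.getTimeZone(zoneId))
      block
    } finally {
      TimeZone.setDefault(original)
    }
  }
}
```

With a `ZoneId` parameter, the call in the hunk could read `withDefaultTimeZone(LA) { ... }`, and the `finally` clause restores the previous default even if the block throws.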
[GitHub] [spark] AmplabJenkins removed a comment on issue #27969: [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir
AmplabJenkins removed a comment on issue #27969: [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir URL: https://github.com/apache/spark/pull/27969#issuecomment-604783298 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120441/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27969: [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir
AmplabJenkins removed a comment on issue #27969: [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir URL: https://github.com/apache/spark/pull/27969#issuecomment-604783295 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27969: [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir
AmplabJenkins commented on issue #27969: [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir URL: https://github.com/apache/spark/pull/27969#issuecomment-604783298 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120441/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27969: [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir
AmplabJenkins commented on issue #27969: [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir URL: https://github.com/apache/spark/pull/27969#issuecomment-604783295 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #27969: [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir
SparkQA removed a comment on issue #27969: [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir URL: https://github.com/apache/spark/pull/27969#issuecomment-604693947 **[Test build #120441 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120441/testReport)** for PR 27969 at commit [`6dddf02`](https://github.com/apache/spark/commit/6dddf02bfbba6ce71fea0b64c8aaf1e6c30a1a63).
[GitHub] [spark] cloud-fan commented on issue #28039: [SPARK-31275][WEBUI] Improve the metrics format in ExecutionPage for StageId.
cloud-fan commented on issue #28039: [SPARK-31275][WEBUI] Improve the metrics format in ExecutionPage for StageId. URL: https://github.com/apache/spark/pull/28039#issuecomment-604782673 LGTM, cc @tgravescs @gengliangwang
[GitHub] [spark] SparkQA commented on issue #27969: [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir
SparkQA commented on issue #27969: [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir URL: https://github.com/apache/spark/pull/27969#issuecomment-604782814 **[Test build #120441 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120441/testReport)** for PR 27969 at commit [`6dddf02`](https://github.com/apache/spark/commit/6dddf02bfbba6ce71fea0b64c8aaf1e6c30a1a63).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
AmplabJenkins commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604779226 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
AmplabJenkins commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604779233 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120446/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
AmplabJenkins removed a comment on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604779233 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120446/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
AmplabJenkins removed a comment on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604779226 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
SparkQA removed a comment on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604771064 **[Test build #120446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120446/testReport)** for PR 28025 at commit [`b8e69e0`](https://github.com/apache/spark/commit/b8e69e0eba86fdf308d708da91fb82107f4f084c).
[GitHub] [spark] SparkQA commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
SparkQA commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604778937 **[Test build #120446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120446/testReport)** for PR 28025 at commit [`b8e69e0`](https://github.com/apache/spark/commit/b8e69e0eba86fdf308d708da91fb82107f4f084c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] vinooganesh commented on issue #24807: [SPARK-27958][SQL] Stopping a SparkSession should not always stop Spark Context
vinooganesh commented on issue #24807: [SPARK-27958][SQL] Stopping a SparkSession should not always stop Spark Context URL: https://github.com/apache/spark/pull/24807#issuecomment-604778033

Hi all, apologies for the delay here (I switched companies since I initially worked on this and realized I no longer had access to the work I had done). I did some brushing up on this PR (it's been a while since I've touched it) and re-found a few things:

1. @HyukjinKwon, to your point about fixing the leak without changing the contract between a `SparkContext` and a `SparkSession`: it's actually not a trivial fix. The leak comes from the fact that the `SparkSession` attaches a listener to the singleton `SparkContext`, and the context holds onto that listener forever. Logically, at the end of the lifecycle of a `SparkSession` instance, I should be able to drop the listener from the context. We can do this on a per-instance basis for a Spark session, but that's where the weirdness comes in (see point 2).
2. The weirdness comes from the fact that lifecycle operations are permitted both on the singleton `SparkSession` object and on the `session` instance created by `SparkSession.getOrCreate(...)`. Specifically, I can call `stop()` on *any instance* of a Spark session and kill the `SparkContext`. That seems wrong, especially given an expected operating model in which the active session and the default session can be different.
3. The concrete problem here is that there isn't a way to clean up an instance of a Spark session without killing the context as a whole.

Here's my proposal:

1. I'll need to introduce a way to "end" an instance of a `SparkSession` (i.e. mark it ready to be GCed) on a per-instance basis (the class, not the singleton) to fix the memory leak. I propose adding a new lifecycle method (maybe `end()`?) that marks an instance of a Spark session as ready to be removed. In this method, I'll also clean up the leaked listener.
2. The singleton (the `SparkSession` object) methods `clearActiveSession()` and `clearDefaultSession()` behave in a somewhat strange way. The latter drops the singleton's reference to the default session but doesn't actually clean up the listener state associated with the context. We've gotten by so far simply because there isn't truly a way to stop a Spark session: stopping the session just stops the context. If these sessions are meant to be lightweight, then we need a way to spin them up and tear them down easily without affecting the underlying context. In a regular operating mode, this means I could mark my Spark session instance for GC (i.e. `spark.end()`) and have the garbage collector clean it up (unless the `SparkSession` object still holds a reference to it, which is expected behavior).
3. The point is valid that folks rely on the (albeit strange) behavior of stopping a session stopping the global context. We can largely leave the current `spark.stop()` method unaffected (though I think we should rename or proxy it to a new `spark.stopContext()` method).

The plan above fixes the leak and changes the operating model without affecting the underlying functionality that people are used to. It still leaves it to the user to stop the context manually once their last `SparkSession` is done, but I think it's an improvement over what we have now. Thoughts? cc @cloud-fan @jiangxb1987 @srowen @rdblue
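A minimal Python sketch of the lifecycle model proposed above, meant only to make the listener bookkeeping concrete. Everything in it is hypothetical and is not a real Spark API: `SharedContext` stands in for the singleton `SparkContext`, `Session` for a `SparkSession` instance, and `end()` / `stop_context()` mirror the proposed `end()` and `stopContext()` methods. Having `stop_context()` also call `end()` is an assumption of this sketch, not something the proposal specifies.

```python
class SharedContext:
    """Hypothetical stand-in for the singleton SparkContext.

    It keeps every registered listener until someone explicitly removes it,
    which is the source of the leak described in the comment.
    """

    def __init__(self):
        self.listeners = []
        self.stopped = False

    def add_listener(self, listener):
        self.listeners.append(listener)

    def remove_listener(self, listener):
        self.listeners.remove(listener)

    def stop(self):
        self.stopped = True


class Session:
    """Hypothetical lightweight session that shares one context."""

    def __init__(self, context):
        self.context = context
        # Placeholder for the per-session listener attached to the shared context.
        self._listener = object()
        context.add_listener(self._listener)
        self.ended = False

    def end(self):
        """Proposed per-instance teardown: drop this session's listener,
        leave the shared context running. Safe to call more than once."""
        if not self.ended:
            self.context.remove_listener(self._listener)
            self.ended = True

    def stop_context(self):
        """Proposed explicit name for today's stop(): stops the shared context.
        Assumption: it also ends this session so its listener is released."""
        self.end()
        self.context.stop()

    # Existing behavior kept for compatibility: stop() proxies to stop_context().
    stop = stop_context


# Usage: two sessions share a context; ending one does not stop the context.
ctx = SharedContext()
first, second = Session(ctx), Session(ctx)
first.end()
second.stop()
```

The design choice being illustrated is the split between per-instance cleanup (`end()`, which only touches the session's own listener) and the global, opt-in `stop_context()`, so that tearing down a lightweight session no longer implies shutting down the shared context.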
[GitHub] [spark] SparkQA commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
SparkQA commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604777972 **[Test build #120447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120447/testReport)** for PR 28025 at commit [`1cf1f12`](https://github.com/apache/spark/commit/1cf1f121f346c93f944feff97e56db0a1a9f7cea).
[GitHub] [spark] AmplabJenkins commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
AmplabJenkins commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604776695 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
AmplabJenkins commented on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604776703 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25155/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names
AmplabJenkins removed a comment on issue #28025: [SPARK-31186][PySpark][SQL] toPandas should not fail on duplicate column names URL: https://github.com/apache/spark/pull/28025#issuecomment-604776703 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25155/ Test PASSed.