[GitHub] [spark] AmplabJenkins removed a comment on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events
AmplabJenkins removed a comment on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events URL: https://github.com/apache/spark/pull/27164#issuecomment-573545599 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116605/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] uncleGen commented on issue #26201: [SPARK-29543][SS][UI] Init structured streaming ui
uncleGen commented on issue #26201: [SPARK-29543][SS][UI] Init structured streaming ui URL: https://github.com/apache/spark/pull/26201#issuecomment-573546090 retest this please.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
AmplabJenkins removed a comment on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework URL: https://github.com/apache/spark/pull/27187#issuecomment-573545480 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21393/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events
AmplabJenkins removed a comment on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events URL: https://github.com/apache/spark/pull/27164#issuecomment-573545589 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
AmplabJenkins removed a comment on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework URL: https://github.com/apache/spark/pull/27187#issuecomment-573545475 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
cloud-fan commented on a change in pull request #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework URL: https://github.com/apache/spark/pull/27187#discussion_r365675326

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2Unresolved.scala ##
@@ -19,15 +19,49 @@ package org.apache.spark.sql.catalyst.analysis
 import org.apache.spark.sql.catalyst.expressions.Attribute
 import org.apache.spark.sql.catalyst.plans.logical.LeafNode
-import org.apache.spark.sql.connector.catalog.SupportsNamespaces
+import org.apache.spark.sql.connector.catalog.{Identifier, SupportsNamespaces, Table, TableCatalog}
+
+/**
+ * Holds the name of a namespace that has yet to be looked up in a catalog. It will be resolved to
+ * [[ResolvedNamespace]] during analysis.
+ */
+case class UnresolvedNamespace(multipartIdentifier: Seq[String]) extends LeafNode {

Review comment: how about `v2ResolutionPlans`?
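
The doc comment in the hunk above describes a two-phase design: a placeholder node holds a name until analysis replaces it with a resolved node. A minimal sketch of that idea follows, assuming a made-up `resolve` rule and a simplified `ResolvedNamespace` that stores the catalog as a plain string. Only the name `UnresolvedNamespace` and its `multipartIdentifier` field come from the diff; everything else is hypothetical and is not Spark's analyzer code.

```scala
// Hypothetical sketch; these are simplified stand-ins, not Spark classes.
object NamespaceResolutionSketch {
  sealed trait Plan
  // Holds only the name parts; nothing has been looked up yet.
  final case class UnresolvedNamespace(multipartIdentifier: Seq[String]) extends Plan
  // Assumed shape of the resolved node: a catalog name plus the remaining parts.
  final case class ResolvedNamespace(catalog: String, namespace: Seq[String]) extends Plan

  // Assumed rule: if the first name part is a known catalog, use it;
  // otherwise fall back to the default catalog and keep all parts.
  def resolve(plan: Plan, catalogs: Set[String], defaultCatalog: String): Plan =
    plan match {
      case UnresolvedNamespace(head +: rest) if catalogs.contains(head) =>
        ResolvedNamespace(head, rest)
      case UnresolvedNamespace(parts) =>
        ResolvedNamespace(defaultCatalog, parts)
      case other =>
        other // already resolved; leave untouched
    }
}
```

Keeping the unresolved node as a leaf that carries only the name lets the parser stay catalog-agnostic, with the lookup deferred to a single analysis step.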
[GitHub] [spark] yaooqinn commented on a change in pull request #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
yaooqinn commented on a change in pull request #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework URL: https://github.com/apache/spark/pull/27187#discussion_r365674995

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2Unresolved.scala ##
@@ -19,15 +19,49 @@ package org.apache.spark.sql.catalyst.analysis
 import org.apache.spark.sql.catalyst.expressions.Attribute
 import org.apache.spark.sql.catalyst.plans.logical.LeafNode
-import org.apache.spark.sql.connector.catalog.SupportsNamespaces
+import org.apache.spark.sql.connector.catalog.{Identifier, SupportsNamespaces, Table, TableCatalog}
+
+/**
+ * Holds the name of a namespace that has yet to be looked up in a catalog. It will be resolved to
+ * [[ResolvedNamespace]] during analysis.
+ */
+case class UnresolvedNamespace(multipartIdentifier: Seq[String]) extends LeafNode {

Review comment: nit: `v2Unresolved` seems not to fit all
[GitHub] [spark] AmplabJenkins commented on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events
AmplabJenkins commented on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events URL: https://github.com/apache/spark/pull/27164#issuecomment-573545599 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116605/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events
SparkQA removed a comment on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events URL: https://github.com/apache/spark/pull/27164#issuecomment-573519700 **[Test build #116605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116605/testReport)** for PR 27164 at commit [`5f37b64`](https://github.com/apache/spark/commit/5f37b64010bf669d1426db53e9e4c35770cf37b4).
[GitHub] [spark] AmplabJenkins commented on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events
AmplabJenkins commented on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events URL: https://github.com/apache/spark/pull/27164#issuecomment-573545589 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
AmplabJenkins commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework URL: https://github.com/apache/spark/pull/27187#issuecomment-573545475 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
AmplabJenkins commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework URL: https://github.com/apache/spark/pull/27187#issuecomment-573545480 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21393/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
SparkQA commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework URL: https://github.com/apache/spark/pull/27187#issuecomment-573545090 **[Test build #116611 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116611/testReport)** for PR 27187 at commit [`4e0e8c1`](https://github.com/apache/spark/commit/4e0e8c11031cf9b38f8970996c08190b5c2eeac7).
[GitHub] [spark] SparkQA commented on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events
SparkQA commented on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events URL: https://github.com/apache/spark/pull/27164#issuecomment-573545284 **[Test build #116605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116605/testReport)** for PR 27164 at commit [`5f37b64`](https://github.com/apache/spark/commit/5f37b64010bf669d1426db53e9e4c35770cf37b4).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] Resolve the failed unit tests when enable AQE
SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-573545115 **[Test build #116612 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116612/testReport)** for PR 26813 at commit [`8b5e744`](https://github.com/apache/spark/commit/8b5e7442c63fe326db7c7f46f7a194fbae8f0d46).
[GitHub] [spark] cloud-fan commented on a change in pull request #26921: [SPARK-30282][SQL] Integrate V2 commands with UnresolvedV2Relation into new resolution framework
cloud-fan commented on a change in pull request #26921: [SPARK-30282][SQL] Integrate V2 commands with UnresolvedV2Relation into new resolution framework URL: https://github.com/apache/spark/pull/26921#discussion_r365673895

## File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala ##
@@ -201,22 +201,21 @@ class ResolveSessionCatalog(
     case RenameTableStatement(SessionCatalogAndTable(_, oldName), newNameParts, isView) =>
       AlterTableRenameCommand(oldName.asTableIdentifier, newNameParts.asTableIdentifier, isView)

-    case DescribeTableStatement(
-         nameParts @ SessionCatalogAndTable(catalog, tbl), partitionSpec, isExtended) =>
-      loadTable(catalog, tbl.asIdentifier).collect {
-        case v1Table: V1Table =>
-          DescribeTableCommand(tbl.asTableIdentifier, partitionSpec, isExtended)
-      }.getOrElse {
-        // The v1 `DescribeTableCommand` can describe view as well.
-        if (isView(tbl)) {
-          DescribeTableCommand(tbl.asTableIdentifier, partitionSpec, isExtended)
-        } else {
-          if (partitionSpec.nonEmpty) {
-            throw new AnalysisException("DESCRIBE TABLE does not support partition for v2 tables.")
+    case d @ DescribeTable(SessionCatalogAndResolvedTable(resolved), partitionSpec, isExtended) =>
+      resolved.table match {
+        case _: V1Table =>
+          DescribeTableCommand(getTableIdentifier(resolved), partitionSpec, isExtended)
+        case _ =>
+          // The v1 `DescribeTableCommand` can describe view as well.
+          if (isView(resolved.identifier.asMultipartIdentifier)) {

Review comment: I've opened https://github.com/apache/spark/pull/27187 to refine it. This is a feature we missed when designing the new framework, so I opened a separate PR to update the framework to support commands that accept both tables and views, like the DESCRIBE command.
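
The dispatch in the hunk above (v1 tables and views go to the v1 `DescribeTableCommand`, everything else takes another path) can be illustrated in isolation. This is a minimal sketch with simplified, hypothetical stand-in types; `resolveDescribe`, `DescribeV1`, and `DescribeV2` are made up for illustration, and the final fallback branch is an assumption because the diff is cut off before it.

```scala
// Hypothetical sketch; not the ResolveSessionCatalog implementation.
object DescribeResolutionSketch {
  sealed trait Table
  case object V1Table extends Table
  case object V2Table extends Table

  // A table that has already been looked up in a catalog.
  final case class ResolvedTable(name: String, table: Table)

  sealed trait Command
  final case class DescribeV1(name: String) extends Command
  final case class DescribeV2(name: String) extends Command

  // v1 tables use the v1 command; views also use it, since the v1 command
  // can describe views; anything else is assumed to take the v2 path.
  def resolveDescribe(resolved: ResolvedTable, isView: String => Boolean): Command =
    resolved.table match {
      case V1Table                    => DescribeV1(resolved.name)
      case _ if isView(resolved.name) => DescribeV1(resolved.name)
      case _                          => DescribeV2(resolved.name)
    }
}
```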
[GitHub] [spark] guykhazma edited a comment on issue #27157: [SPARK-30475][SQL] File source V2: Push data filters for file listing
guykhazma edited a comment on issue #27157: [SPARK-30475][SQL] File source V2: Push data filters for file listing URL: https://github.com/apache/spark/pull/27157#issuecomment-573543733 @gengliangwang by `"data skipping uniformly for all file based data sources"` I mean that the above approach works uniformly for all formats, whether or not they support pushdown. (It also has benefits for formats that support pushdown, such as Parquet, by avoiding the need to read the footer of each file.) See for example this [Spark Summit talk](https://databricks.com/session/using-pluggable-apache-spark-sql-filters-to-help-gridpocket-users-keep-up-with-the-jones-and-save-the-planet). Note that in datasource v1 the `dataFilters` are also passed to the `listFiles` method in the [`FileSourceScanExec`](https://github.com/apache/spark/blob/eefcc7d762a627bf19cab7041a1a82f88862e7e1/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L210) case class, which is used by all of the file-based data sources.
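
The effect of passing data filters to file listing can be sketched without Spark. The following is a minimal, hypothetical illustration: `FileStats`, `GreaterThan`, `LessThan`, and this `listFiles` are made-up names for the sketch, not Spark's API, and it assumes per-file min/max metadata is available from some external index.

```scala
// Hypothetical sketch of data skipping at file-listing time; none of these
// types are Spark classes.
object DataSkippingSketch {
  // Min/max statistics for one integer column of a file, assumed to come
  // from an external metadata store so that file footers need not be read.
  final case class FileStats(path: String, min: Int, max: Int)

  // A tiny filter language over that column.
  sealed trait DataFilter
  final case class GreaterThan(value: Int) extends DataFilter
  final case class LessThan(value: Int) extends DataFilter

  // True if the file may contain at least one row satisfying the filter.
  def mightMatch(stats: FileStats, filter: DataFilter): Boolean = filter match {
    case GreaterThan(v) => stats.max > v
    case LessThan(v)    => stats.min < v
  }

  // Keep only files that may match every filter (filters are ANDed).
  def listFiles(files: Seq[FileStats], dataFilters: Seq[DataFilter]): Seq[String] =
    files.filter(f => dataFilters.forall(mightMatch(f, _))).map(_.path)
}
```

Because the pruning decision uses only the listing-time metadata, it is independent of the file format, which is the sense in which such skipping would apply uniformly to formats with and without native pushdown.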
[GitHub] [spark] HeartSaVioR edited a comment on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration
HeartSaVioR edited a comment on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration URL: https://github.com/apache/spark/pull/27146#issuecomment-573542906 Btw, I'm also seeing different understandings of the section "Does this PR introduce any user-facing change?" across many open PRs. My understanding is that the section is intended to highlight and enumerate any behavioral or API changes that would likely require end users to change their queries/code. (So if the answer is yes, the patch should be reviewed carefully.) Expanding this to anything end users are facing would make the answer "yes" most of the time, which dilutes the meaning of the section. I might be missing something; discussion around this is welcome.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27176: [WIP][SPIP]Support year-month and day-time interval types
AmplabJenkins removed a comment on issue #27176: [WIP][SPIP]Support year-month and day-time interval types URL: https://github.com/apache/spark/pull/27176#issuecomment-573543302 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116603/ Test FAILed.
[GitHub] [spark] cloud-fan commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
cloud-fan commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework URL: https://github.com/apache/spark/pull/27187#issuecomment-573543790 cc @imback82 @yaooqinn @viirya @HyukjinKwon
[GitHub] [spark] guykhazma commented on issue #27157: [SPARK-30475][SQL] File source V2: Push data filters for file listing
guykhazma commented on issue #27157: [SPARK-30475][SQL] File source V2: Push data filters for file listing URL: https://github.com/apache/spark/pull/27157#issuecomment-573543733 @gengliangwang by `"data skipping uniformly for all file based data sources"` I mean that the above approach works uniformly for all formats, whether or not they support pushdown. (It also has benefits for formats that support pushdown, such as Parquet, by avoiding the need to read the footer of each file.) See for example this [Spark Summit talk](https://databricks.com/session/using-pluggable-apache-spark-sql-filters-to-help-gridpocket-users-keep-up-with-the-jones-and-save-the-planet). Note that in datasource v1 the `dataFilters` are also passed to the `listFiles` method in the [`FileSourceScanExec`](https://github.com/apache/spark/blob/eefcc7d762a627bf19cab7041a1a82f88862e7e1/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L210) case class, which is used by all of the file-based data sources.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] Resolve the failed unit tests when enable AQE
AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-573543303 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
cloud-fan commented on a change in pull request #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework URL: https://github.com/apache/spark/pull/27187#discussion_r365673286

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2Unresolved.scala ##
@@ -19,15 +19,49 @@ package org.apache.spark.sql.catalyst.analysis
 import org.apache.spark.sql.catalyst.expressions.Attribute
 import org.apache.spark.sql.catalyst.plans.logical.LeafNode
-import org.apache.spark.sql.connector.catalog.SupportsNamespaces
+import org.apache.spark.sql.connector.catalog.{Identifier, SupportsNamespaces, Table, TableCatalog}
+
+/**
+ * Holds the name of a namespace that has yet to be looked up in a catalog. It will be resolved to
+ * [[ResolvedNamespace]] during analysis.
+ */
+case class UnresolvedNamespace(multipartIdentifier: Seq[String]) extends LeafNode {

Review comment: move these new unresolved plans to one file.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-573543313 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21392/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] Resolve the failed unit tests when enable AQE
AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-573543311 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21391/ Test PASSed.
[GitHub] [spark] HeartSaVioR edited a comment on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration
HeartSaVioR edited a comment on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration URL: https://github.com/apache/spark/pull/27146#issuecomment-573542906 Btw, I'm also seeing different understandings of the section "Does this PR introduce any user-facing change?". My understanding is that the section is intended to highlight and enumerate any behavioral or API changes that would likely require end users to change their queries/code. (So if the answer is yes, the patch should be reviewed carefully.) Expanding this to anything end users are facing would make the answer "yes" most of the time, which dilutes the meaning of the section. I might be missing something; discussion around this is welcome.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27176: [WIP][SPIP]Support year-month and day-time interval types
AmplabJenkins removed a comment on issue #27176: [WIP][SPIP]Support year-month and day-time interval types URL: https://github.com/apache/spark/pull/27176#issuecomment-573543296 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-573543305 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27176: [WIP][SPIP]Support year-month and day-time interval types
AmplabJenkins commented on issue #27176: [WIP][SPIP]Support year-month and day-time interval types URL: https://github.com/apache/spark/pull/27176#issuecomment-573543296 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] Resolve the failed unit tests when enable AQE
AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-573543311 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21391/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-573543305 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-573543313 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21392/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] Resolve the failed unit tests when enable AQE
AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-573543303 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24990: [SPARK-28191][SS] New data source - state - reader part
AmplabJenkins commented on issue #24990: [SPARK-28191][SS] New data source - state - reader part URL: https://github.com/apache/spark/pull/24990#issuecomment-573543177 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27176: [WIP][SPIP]Support year-month and day-time interval types
AmplabJenkins commented on issue #27176: [WIP][SPIP]Support year-month and day-time interval types URL: https://github.com/apache/spark/pull/27176#issuecomment-573543302 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116603/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24990: [SPARK-28191][SS] New data source - state - reader part
AmplabJenkins removed a comment on issue #24990: [SPARK-28191][SS] New data source - state - reader part URL: https://github.com/apache/spark/pull/24990#issuecomment-573543177 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24990: [SPARK-28191][SS] New data source - state - reader part
AmplabJenkins removed a comment on issue #24990: [SPARK-28191][SS] New data source - state - reader part URL: https://github.com/apache/spark/pull/24990#issuecomment-573543186 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116588/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
AmplabJenkins commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework URL: https://github.com/apache/spark/pull/27187#issuecomment-573543237 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24990: [SPARK-28191][SS] New data source - state - reader part
AmplabJenkins commented on issue #24990: [SPARK-28191][SS] New data source - state - reader part URL: https://github.com/apache/spark/pull/24990#issuecomment-573543186 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116588/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
AmplabJenkins removed a comment on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework URL: https://github.com/apache/spark/pull/27187#issuecomment-573543237 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #27176: [WIP][SPIP]Support year-month and day-time interval types
SparkQA removed a comment on issue #27176: [WIP][SPIP]Support year-month and day-time interval types URL: https://github.com/apache/spark/pull/27176#issuecomment-573516694 **[Test build #116603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116603/testReport)** for PR 27176 at commit [`7ad45b4`](https://github.com/apache/spark/commit/7ad45b484c02b5a1e893c956e309a45f7d28e87b).
[GitHub] [spark] AmplabJenkins commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
AmplabJenkins commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework URL: https://github.com/apache/spark/pull/27187#issuecomment-573543246 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21390/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
AmplabJenkins removed a comment on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework URL: https://github.com/apache/spark/pull/27187#issuecomment-573543246 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21390/ Test PASSed.
[GitHub] [spark] HeartSaVioR commented on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration
HeartSaVioR commented on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration URL: https://github.com/apache/spark/pull/27146#issuecomment-573542906 Btw, I'm also seeing different understandings of the section "Does this PR introduce any user-facing change?". My understanding is that the section is meant to highlight and enumerate any behavioral or API changes that would likely require end users to change their queries or code. Expanding it to cover anything end users see would make the answer "yes" most of the time, which dilutes the meaning of the section. I might be missing something; discussion around this is welcome.
[GitHub] [spark] SparkQA commented on issue #27176: [WIP][SPIP]Support year-month and day-time interval types
SparkQA commented on issue #27176: [WIP][SPIP]Support year-month and day-time interval types URL: https://github.com/apache/spark/pull/27176#issuecomment-573542945 **[Test build #116603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116603/testReport)** for PR 27176 at commit [`7ad45b4`](https://github.com/apache/spark/commit/7ad45b484c02b5a1e893c956e309a45f7d28e87b). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #24990: [SPARK-28191][SS] New data source - state - reader part
SparkQA removed a comment on issue #24990: [SPARK-28191][SS] New data source - state - reader part URL: https://github.com/apache/spark/pull/24990#issuecomment-573492657 **[Test build #116588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116588/testReport)** for PR 24990 at commit [`64a08b9`](https://github.com/apache/spark/commit/64a08b957e085b5489b0278d0972889d6e1f1e20).
[GitHub] [spark] SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-573542861 **[Test build #116610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116610/testReport)** for PR 26434 at commit [`cee1c8c`](https://github.com/apache/spark/commit/cee1c8cb7b4c4714dd3d59fbaa693d1dab217767).
[GitHub] [spark] SparkQA commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
SparkQA commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework URL: https://github.com/apache/spark/pull/27187#issuecomment-573542839 **[Test build #116609 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116609/testReport)** for PR 27187 at commit [`0b729b9`](https://github.com/apache/spark/commit/0b729b9b2913e03ca9eefa4a3c58f1b990fa2e24).
[GitHub] [spark] SparkQA commented on issue #24990: [SPARK-28191][SS] New data source - state - reader part
SparkQA commented on issue #24990: [SPARK-28191][SS] New data source - state - reader part URL: https://github.com/apache/spark/pull/24990#issuecomment-573542708 **[Test build #116588 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116588/testReport)** for PR 24990 at commit [`64a08b9`](https://github.com/apache/spark/commit/64a08b957e085b5489b0278d0972889d6e1f1e20). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types
HyukjinKwon commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types URL: https://github.com/apache/spark/pull/27165#issuecomment-573542451 Should be ready for a look.
[GitHub] [spark] JkSelf commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
JkSelf commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-573542055 retest this please
[GitHub] [spark] cloud-fan opened a new pull request #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
cloud-fan opened a new pull request #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework URL: https://github.com/apache/spark/pull/27187

### What changes were proposed in this pull request?

Use the new framework to resolve the DESCRIBE TABLE command.

The v1 DESCRIBE TABLE command supports both tables and views. I checked Hive and Presto: they don't have DESCRIBE TABLE, only DESCRIBE, which supports both tables and views:
1. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DescribeTable/View/MaterializedView/Column
2. https://prestodb.io/docs/current/sql/describe.html

We should make it clear that DESCRIBE supports both tables and views by renaming the command to `DescribeRelation`.

This PR also tunes the framework a little to support the case where a command accepts both a table and a view.

### Why are the changes needed?

This is part of the effort to make the relation lookup behavior consistent: SPARK-2990.

Note that I made a separate PR here, as I need to update the framework to support a new use case: accepting both a table and a view.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

existing tests
[GitHub] [spark] HyukjinKwon commented on issue #27181: [SPARK-30493][PYTHON][ML] Remove OneVsRestModel setClassifier, setLabelCol and setWeightCol methods
HyukjinKwon commented on issue #27181: [SPARK-30493][PYTHON][ML] Remove OneVsRestModel setClassifier, setLabelCol and setWeightCol methods URL: https://github.com/apache/spark/pull/27181#issuecomment-573541927 Before merging it, can we update the PR description properly, please? It's best to avoid having bad PR examples.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'
AmplabJenkins removed a comment on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit' URL: https://github.com/apache/spark/pull/27186#issuecomment-573541230 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'
AmplabJenkins removed a comment on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit' URL: https://github.com/apache/spark/pull/27186#issuecomment-573541231 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21389/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'
AmplabJenkins commented on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit' URL: https://github.com/apache/spark/pull/27186#issuecomment-573541230 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'
AmplabJenkins commented on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit' URL: https://github.com/apache/spark/pull/27186#issuecomment-573541231 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21389/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'
SparkQA commented on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit' URL: https://github.com/apache/spark/pull/27186#issuecomment-573540951 **[Test build #116608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116608/testReport)** for PR 27186 at commit [`057e4e9`](https://github.com/apache/spark/commit/057e4e99c782aafa235b081ac4247d1a0314a1a1).
[GitHub] [spark] HeartSaVioR edited a comment on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration
HeartSaVioR edited a comment on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration URL: https://github.com/apache/spark/pull/27146#issuecomment-573540460 Thanks all for reviewing and merging! @dongjoon-hyun What about enhancing the steps on contribution for documentation? (Either adding this to 'contributing' page or even adding this to PR template.) I'm not sure we explicitly have some requirements, and it would be helpful if we standardize this.
[GitHub] [spark] HeartSaVioR commented on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration
HeartSaVioR commented on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration URL: https://github.com/apache/spark/pull/27146#issuecomment-573540460 Thanks all for reviewing and merging! @dongjoon-hyun What about enhancing the steps on contribution for documentation? I'm not sure we explicitly have some requirements, and it would be helpful if we standardize this.
[GitHub] [spark] HyukjinKwon commented on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'
HyukjinKwon commented on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit' URL: https://github.com/apache/spark/pull/27186#issuecomment-573539570 cc @HeartSaVioR, @dongjoon-hyun, @xuanyuanking
[GitHub] [spark] HyukjinKwon edited a comment on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'
HyukjinKwon edited a comment on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit' URL: https://github.com/apache/spark/pull/27186#issuecomment-573539570 cc @HeartSaVioR, @dongjoon-hyun, @xuanyuanking, @MaxGekk
[GitHub] [spark] HyukjinKwon closed pull request #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure
HyukjinKwon closed pull request #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure URL: https://github.com/apache/spark/pull/27183
[GitHub] [spark] HyukjinKwon opened a new pull request #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'
HyukjinKwon opened a new pull request #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit' URL: https://github.com/apache/spark/pull/27186

### What changes were proposed in this pull request?

This PR proposes to increase the memory limit in `WorkerMemoryTest.test_memory_limit` so that the test passes with PyPy. The test currently fails only in PyPy, as below:

```
Current mem limits: 18446744073709551615 of max 18446744073709551615
Setting mem limits to 1048576 of max 1048576
RPython traceback:
  File "pypy_module_pypyjit_interp_jit.c", line 289, in portal_5
  File "pypy_interpreter_pyopcode.c", line 3468, in handle_bytecode__AccessDirect_None
  File "pypy_interpreter_pyopcode.c", line 5558, in dispatch_bytecode__AccessDirect_None
out of memory: couldn't allocate the next arena
ERROR
```

It seems related to how PyPy allocates memory and how its GC works. There seems to be nothing wrong with the configuration implementation itself on the PySpark side. I roughly tested with higher PyPy versions on Ubuntu and the test seems to pass fine, so I suspect this might be an issue with old PyPy behaviour.

The change only increases the limit, so it would not affect actual memory allocations. The test only needs to check that the limit is properly set on the worker side. For clarification, this limit is the maximum memory in the machine if not set.

### Why are the changes needed?

To make the tests pass and unblock other PRs.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually, and Jenkins should test it out.
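To make the numbers in the log of the PR description above concrete, here is a minimal sketch; it is not PySpark's worker code. `1048576` is 1 MiB expressed in bytes, and `18446744073709551615` equals 2^64 - 1, which is likely how an unset limit shows up when printed as an unsigned 64-bit value. The helper names are hypothetical, and reading the limit through Python's standard `resource` module with `RLIMIT_AS` is an assumption about which limit the test exercises.

```python
import resource  # Unix-only standard library module

# Value printed in the log when no limit is set (2**64 - 1).
UNSIGNED_64_MAX = 2**64 - 1


def mib_to_bytes(mib):
    """Convert mebibytes to bytes (hypothetical helper)."""
    return mib * 1024 * 1024


def is_unlimited(limit):
    """Return True if a limit value means 'no limit'.

    Assumption: depending on platform and interpreter, an unset limit may be
    reported as resource.RLIM_INFINITY or as the unsigned 64-bit maximum.
    """
    return limit in (resource.RLIM_INFINITY, UNSIGNED_64_MAX)


def describe_limits():
    """Format the current address-space limits in the style of the log."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return "Current mem limits: {0} of max {1}".format(soft, hard)


if __name__ == "__main__":
    print(describe_limits())
    print("1 MiB in bytes:", mib_to_bytes(1))
```

Under these assumptions, a 1 MiB cap leaves very little room for an interpreter to allocate anything, which is consistent with the description's point that raising the tested limit does not change what the test verifies.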
[GitHub] [spark] HyukjinKwon commented on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure
HyukjinKwon commented on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure URL: https://github.com/apache/spark/pull/27183#issuecomment-573539312 Made a PR at https://github.com/apache/spark/pull/27186
[GitHub] [spark] AmplabJenkins removed a comment on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart
AmplabJenkins removed a comment on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart URL: https://github.com/apache/spark/pull/24173#issuecomment-573538923 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116590/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption
AmplabJenkins removed a comment on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption URL: https://github.com/apache/spark/pull/25965#issuecomment-573539020 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption
AmplabJenkins commented on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption URL: https://github.com/apache/spark/pull/25965#issuecomment-573539020 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart
AmplabJenkins removed a comment on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart URL: https://github.com/apache/spark/pull/24173#issuecomment-573538913 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption
AmplabJenkins commented on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption URL: https://github.com/apache/spark/pull/25965#issuecomment-573539029 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116587/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption
AmplabJenkins removed a comment on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption URL: https://github.com/apache/spark/pull/25965#issuecomment-573539029 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116587/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart
AmplabJenkins commented on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart URL: https://github.com/apache/spark/pull/24173#issuecomment-573538923 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116590/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart
AmplabJenkins commented on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart URL: https://github.com/apache/spark/pull/24173#issuecomment-573538913 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption
SparkQA commented on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption
URL: https://github.com/apache/spark/pull/25965#issuecomment-573538544

**[Test build #116587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116587/testReport)** for PR 25965 at commit [`726d920`](https://github.com/apache/spark/commit/726d920db6969a46174f4dac96db8af1a9991cad).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart
SparkQA removed a comment on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart URL: https://github.com/apache/spark/pull/24173#issuecomment-573492660 **[Test build #116590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116590/testReport)** for PR 24173 at commit [`1fcfff5`](https://github.com/apache/spark/commit/1fcfff5c2ca78049eb38cf4ef7c041d0005ab9b3).
[GitHub] [spark] SparkQA removed a comment on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption
SparkQA removed a comment on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption URL: https://github.com/apache/spark/pull/25965#issuecomment-573492649 **[Test build #116587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116587/testReport)** for PR 25965 at commit [`726d920`](https://github.com/apache/spark/commit/726d920db6969a46174f4dac96db8af1a9991cad).
[GitHub] [spark] SparkQA commented on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart
SparkQA commented on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart
URL: https://github.com/apache/spark/pull/24173#issuecomment-573538431

**[Test build #116590 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116590/testReport)** for PR 24173 at commit [`1fcfff5`](https://github.com/apache/spark/commit/1fcfff5c2ca78049eb38cf4ef7c041d0005ab9b3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run
AmplabJenkins removed a comment on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run URL: https://github.com/apache/spark/pull/25987#issuecomment-573537980 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116586/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run
AmplabJenkins removed a comment on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run URL: https://github.com/apache/spark/pull/25987#issuecomment-573537977 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run
AmplabJenkins commented on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run URL: https://github.com/apache/spark/pull/25987#issuecomment-573537980 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116586/ Test PASSed.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui
HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui
URL: https://github.com/apache/spark/pull/26201#discussion_r365650148

## File path: core/src/main/scala/org/apache/spark/ui/UIUtils.scala ##
@@ -572,4 +628,23 @@ private[spark] object UIUtils extends Logging {
   def buildErrorResponse(status: Response.Status, msg: String): Response = {
     Response.status(status).entity(msg).`type`(MediaType.TEXT_PLAIN).build()
   }
+
+  /**
+   * There may be different duration labels in each batch. So we need to
+   * mark those missing duration label as '0d' to avoid UI rending error.
+   */
+  def durationDataPadding(
+      values: Array[(Long, ju.Map[String, JLong])]): Array[(Long, Map[String, Double])] = {
+    val operationLabels = values.flatMap(_._2.keySet().asScala).toSet
+    values.map { case (x, y) =>
+      val dataPadding = operationLabels.map(d =>

Review comment:
nit: let's be consistent on style

```
val dataPadding = operationLabels.map { d =>
  ...
}
```
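The doc comment in the hunk above describes padding missing duration labels so that every batch exposes the same set of keys. A minimal sketch of that idea in Python, assuming missing labels are filled with `0.0`. The function name `pad_duration_labels` and the sample label names are hypothetical, and this is an illustration rather than the code under review, whose body is cut off in the quoted hunk:

```python
def pad_duration_labels(values):
    """Give every batch the same labels, filling the gaps with 0.0.

    `values` is a list of (timestamp, {label: duration}) pairs.
    """
    # Union of all labels seen in any batch.
    labels = set()
    for _, durations in values:
        labels.update(durations)

    # Rebuild each batch with every label present, as floats.
    return [
        (ts, {label: float(durations.get(label, 0)) for label in sorted(labels)})
        for ts, durations in values
    ]


# Illustrative input: the second batch lacks the "walCommit" label.
batches = [(1, {"addBatch": 120, "walCommit": 15}), (2, {"addBatch": 90})]
padded = pad_duration_labels(batches)
print(padded)
```

Computing the label set once up front keeps the per-batch work to a simple lookup with a default.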
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui
HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui
URL: https://github.com/apache/spark/pull/26201#discussion_r365654786

## File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala ##
@@ -68,6 +69,9 @@ class StreamingQueryManager private[sql] (sparkSession: SparkSession) extends Lo
         logInfo(s"Registered listener ${listener.getClass.getName}")
       })
     }
+    if (sparkSession.sparkContext.conf.get(UI_ENABLED)) {

Review comment:
`UI_ENABLED` is being checked twice, and `.get` is called anyway, which defeats the purpose of Option.

The following would behave similarly, but checks `UI_ENABLED` only once and uses Option properly.

```
sparkSession.sharedState.streamingQueryStatusListener.foreach { listener =>
  addListener(listener)
}
```
[GitHub] [spark] AmplabJenkins commented on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run
AmplabJenkins commented on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run URL: https://github.com/apache/spark/pull/25987#issuecomment-573537977 Merged build finished. Test PASSed.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui
HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui
URL: https://github.com/apache/spark/pull/26201#discussion_r365656488

## File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala ##
@@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming.ui
+
+import java.text.SimpleDateFormat
+import java.util.TimeZone
+import javax.servlet.http.HttpServletRequest
+
+import scala.xml.Node
+
+import org.apache.commons.lang3.StringEscapeUtils
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.execution.streaming.{QuerySummary, StreamQueryStore}
+import org.apache.spark.sql.execution.ui.SQLTab
+import org.apache.spark.sql.streaming.StreamingQuery
+import org.apache.spark.sql.streaming.ui.UIUtils._
+import org.apache.spark.ui.{UIUtils => SparkUIUtils, WebUIPage}
+
+class StreamingQueryPage(parent: SQLTab, store: Option[StreamQueryStore])
+  extends WebUIPage("streaming") with Logging {
+  val df = new SimpleDateFormat("-MM-dd'T'HH:mm:ss.SSS'Z'")
+  df.setTimeZone(TimeZone.getDefault)
+
+  override def render(request: HttpServletRequest): Seq[Node] = {
+    val content = store.synchronized {
+      generateStreamingQueryTable(request)
+    }
+    SparkUIUtils.headerSparkPage(request, "Streaming Query", content, parent)
+  }
+
+  def generateDataRow(request: HttpServletRequest, isActive: Boolean)
+    (streamQuery: (StreamingQuery, Long)): Seq[Node] = {
+
+    val (query, timeSinceStart) = streamQuery
+    def details(detail: Any): Seq[Node] = {

Review comment:
Just a note: the review comment is not addressed yet. Most of the parts are the same, and they don't seem likely to diverge.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui
HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui
URL: https://github.com/apache/spark/pull/26201#discussion_r365660047

## File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala ##
@@ -0,0 +1,273 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming.ui
+
+import java.{util => ju}
+import java.lang.{Long => JLong}
+import java.text.SimpleDateFormat
+import java.util.UUID
+import javax.servlet.http.HttpServletRequest
+
+import scala.xml.{Node, Unparsed}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.util.DateTimeUtils.getTimeZone
+import org.apache.spark.sql.streaming.ui.UIUtils._
+import org.apache.spark.ui.{GraphUIData, JsCollector, UIUtils => SparkUIUtils, WebUIPage}
+
+class StreamingQueryStatisticsPage(
+    parent: StreamingQueryTab,
+    statusListener: StreamingQueryStatusListener)

Review comment:
`statusListener` is also available in `StreamingQueryTab`.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui
HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui
URL: https://github.com/apache/spark/pull/26201#discussion_r365658608

## File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala ##
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming.ui
+
+import java.text.SimpleDateFormat
+import javax.servlet.http.HttpServletRequest
+
+import scala.xml.Node
+
+import org.apache.commons.lang3.StringEscapeUtils
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.util.DateTimeUtils.getTimeZone
+import org.apache.spark.sql.streaming.ui.UIUtils._
+import org.apache.spark.ui.{UIUtils => SparkUIUtils, WebUIPage}
+
+class StreamingQueryPage(parent: StreamingQueryTab, statusListener: StreamingQueryStatusListener)
+    extends WebUIPage("") with Logging {
+  val df = new SimpleDateFormat("-MM-dd'T'HH:mm:ss.SSS'Z'")
+  df.setTimeZone(getTimeZone("UTC"))
+
+  override def render(request: HttpServletRequest): Seq[Node] = {
+    val content = generateStreamingQueryTable(request)
+    SparkUIUtils.headerSparkPage(request, "Streaming Query", content, parent)
+  }
+
+  def generateDataRow(request: HttpServletRequest, queryActive: Boolean)
+    (query: StreamingQueryUIData): Seq[Node] = {
+
+    def details(detail: Any): Seq[Node] = {
+      if (queryActive) {
+        return Seq.empty[Node]
+      }
+      val s = detail.asInstanceOf[String]
+      val isMultiline = s.indexOf('\n') >= 0
+      val summary = StringEscapeUtils.escapeHtml4(
+        if (isMultiline) s.substring(0, s.indexOf('\n')) else s
+      )
+      val details = if (isMultiline) {
+        // scalastyle:off
+
+
+        details
+
+        +
+
+        {s}
+
+        // scalastyle:on
+      } else {
+        ""
+      }
+      {summary}{details}
+    }
+
+    val statisticsLink = "%s/%s/statistics?id=%s"
+      .format(SparkUIUtils.prependBaseUri(request, parent.basePath), parent.prefix, query.runId)
+
+    val name = UIUtils.getQueryName(query)
+    val status = UIUtils.getQueryStatus(query)
+    val duration = if (queryActive) {
+      SparkUIUtils.formatDurationVerbose(System.currentTimeMillis() - query.submitTime)
+    } else {
+      withNoProgress(query, {
+        val endTimeMs = query.lastProgress.timestamp
+        SparkUIUtils.formatDurationVerbose(df.parse(endTimeMs).getTime - query.submitTime)
+      }, "-")
+    }
+
+
+      {name}
+      {status}
+      {query.id}
+      {query.runId}
+      {SparkUIUtils.formatDate(query.submitTime)}
+      {duration}
+      {withNoProgress(query, {
+        (query.recentProgress.map(p => withNumberInvalid(p.inputRowsPerSecond)).sum /

Review comment:
Given that we have a function for constructing the exception message, why not add another one for constructing the average message, for this and the ones below?
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui
HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui
URL: https://github.com/apache/spark/pull/26201#discussion_r365662919

## File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala ##
@@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming.ui
+
+import java.util.UUID
+import java.util.concurrent.ConcurrentHashMap
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+
+/**
+ * A customized StreamingQueryListener used in structured streaming UI, which contains all
+ * UI data for both active and inactive query.
+ * TODO: Add support for history server.
+ */
+class StreamingQueryStatusListener(sqlConf: SQLConf) extends StreamingQueryListener {
+
+  /**
+   * We use runId as the key here instead of id in active query status map,
+   * because the runId is unique for every started query, even it its a restart.
+   */
+  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
+  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+
+  private val streamingProgressRetention = sqlConf.streamingProgressRetention
+  private val inactiveQueryStatusRetention = sqlConf.streamingUIInactiveQueryRetention
+
+  override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
+    activeQueryStatus.putIfAbsent(event.runId,
+      new StreamingQueryUIData(event.name, event.id, event.runId))
+  }
+
+  override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
+    val queryStatus = activeQueryStatus.getOrDefault(
+      event.progress.runId,
+      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId))
+    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+  }
+
+  override def onQueryTerminated(event: StreamingQueryListener.QueryTerminatedEvent): Unit = {
+    val queryStatus = activeQueryStatus.remove(event.runId)
+    if (queryStatus != null) {
+      queryStatus.queryTerminated(event)
+      inactiveQueryStatus.synchronized {
+        inactiveQueryStatus += queryStatus
+        while (inactiveQueryStatus.length >= inactiveQueryStatusRetention) {
+          inactiveQueryStatus.dequeue()
+        }
+      }
+    }
+  }
+
+  def allQueryStatus: Seq[StreamingQueryUIData] = inactiveQueryStatus.synchronized {
+    activeQueryStatus.values().asScala.toSeq ++ inactiveQueryStatus
+  }
+}
+
+/**
+ * This class contains all message related to UI display, each instance corresponds to a single
+ * [[org.apache.spark.sql.streaming.StreamingQuery]].
+ */
+private[ui] class StreamingQueryUIData(
+    val name: String,
+    val id: UUID,
+    val runId: UUID) {
+  val submitTime: Long = System.currentTimeMillis()

Review comment:
Ideally we may want to take the timestamp of the batch if this StreamingQueryUIData is constructed from onQueryProgress; otherwise the submit time will be greater than the batch timestamp.
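The concern in the review comment above can be made concrete with a little arithmetic: if the UI record is first created when a progress event arrives, a wall-clock submit time is later than the batch timestamp, so a duration computed as "batch timestamp minus submit time" goes negative. A minimal sketch in Python, assuming millisecond timestamps. The helper `pick_submit_time_ms` and the sample values are hypothetical and only illustrate the reviewer's suggestion:

```python
import time


def pick_submit_time_ms(batch_timestamp_ms=None, now_ms=None):
    """Choose a submit time for a UI record (hypothetical helper).

    Prefer the batch timestamp when the record is created from a progress
    event; otherwise fall back to the current wall-clock time.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return batch_timestamp_ms if batch_timestamp_ms is not None else now_ms


# Illustrative values: the batch was stamped at 1000 ms, the record is built at 1500 ms.
batch_ts, created_at = 1000, 1500

wall_clock_duration = batch_ts - pick_submit_time_ms(None, created_at)
batch_based_duration = batch_ts - pick_submit_time_ms(batch_ts, created_at)
print(wall_clock_duration, batch_based_duration)
```

With the wall-clock choice the computed duration is negative, while the batch-based choice keeps it at zero or above.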
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui
HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui
URL: https://github.com/apache/spark/pull/26201#discussion_r365660084

## File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala ##
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming.ui
+
+import java.text.SimpleDateFormat
+import javax.servlet.http.HttpServletRequest
+
+import scala.xml.Node
+
+import org.apache.commons.lang3.StringEscapeUtils
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.util.DateTimeUtils.getTimeZone
+import org.apache.spark.sql.streaming.ui.UIUtils._
+import org.apache.spark.ui.{UIUtils => SparkUIUtils, WebUIPage}
+
+class StreamingQueryPage(parent: StreamingQueryTab, statusListener: StreamingQueryStatusListener)

Review comment:
`statusListener` is also available in `StreamingQueryTab`.
[GitHub] [spark] SparkQA removed a comment on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run
SparkQA removed a comment on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run URL: https://github.com/apache/spark/pull/25987#issuecomment-573492644 **[Test build #116586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116586/testReport)** for PR 25987 at commit [`0f59ee2`](https://github.com/apache/spark/commit/0f59ee25818b2323199c3789fe4c45ab97326031).
[GitHub] [spark] SparkQA commented on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run
SparkQA commented on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run URL: https://github.com/apache/spark/pull/25987#issuecomment-573537576 **[Test build #116586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116586/testReport)** for PR 25987 at commit [`0f59ee2`](https://github.com/apache/spark/commit/0f59ee25818b2323199c3789fe4c45ab97326031). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#discussion_r365667390 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVFilters.scala ## @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.csv + +import scala.util.Try + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.sources +import org.apache.spark.sql.types.{BooleanType, StructType} + +/** + * An instance of the class compiles filters to predicates and allows to + * apply the predicates to an internal row with partially initialized values + * converted from parsed CSV fields. + * + * @param filters The filters pushed down to CSV datasource. + * @param dataSchema The full schema with all fields in CSV files. + * @param requiredSchema The schema with only fields requested by the upper layer. + * @param columnPruning true if CSV parser can read sub-set of columns otherwise false. 
+ */ +class CSVFilters( +filters: Seq[sources.Filter], +dataSchema: StructType, +requiredSchema: StructType, +columnPruning: Boolean) { + require(checkFilters(), "All filters must be applicable to the data schema.") + + /** + * The schema to read from the underlying CSV parser. + * It combines the required schema and the fields referenced by filters. + */ + val readSchema: StructType = { +if (columnPruning) { + val refs = filters.flatMap(_.references).toSet + val readFields = dataSchema.filter { field => +requiredSchema.contains(field) || refs.contains(field.name) + } + StructType(readFields) +} else { + dataSchema +} + } + + /** + * Converted filters to predicates and grouped by maximum field index + * in the read schema. For example, if an filter refers to 2 attributes + * attrA with field index 5 and attrB with field index 10 in the read schema: + * 0 === $"attrA" or $"attrB" < 100 + * the filter is compiled to a predicate, and placed to the `predicates` + * array at the position 10. In this way, if there is a row with initialized + * fields from the 0 to 10 index, the predicate can be applied to the row + * to check that the row should be skipped or not. + * Multiple predicates with the same maximum reference index are combined + * by the `And` expression. + */ + private val predicates: Array[BasePredicate] = { +val len = readSchema.fields.length +val groupedPredicates = Array.fill[BasePredicate](len)(null) +if (SQLConf.get.csvFilterPushDown) { + val groupedExprs = Array.fill(len)(Seq.empty[Expression]) + for (filter <- filters) { +val expr = CSVFilters.filterToExpression(filter, toRef) +val refs = filter.references +if (refs.isEmpty) { + // For example, AlwaysTrue and AlwaysFalse doesn't have any references + for (i <- 0 until len) { +groupedExprs(i) ++= expr Review comment: Even more, `AlwaysTrue` could be removed because it does not impact on the result. `AlwaysFalse` could be put at index 0, and other filters can be ignored. 
However, this would be an ad-hoc optimization; the optimization suggested above also works for other literal filters.
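The grouping described in the doc comment of `predicates`, together with the handling of `AlwaysTrue` and `AlwaysFalse` suggested in this review comment, can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the `Filter` hierarchy below is a hypothetical stand-in for Spark's `sources.Filter`, the schema is a plain list of field names, and `GroupByMaxIndex` is an invented name that does not appear in the pull request.

```scala
// Simplified stand-ins for Spark's sources.Filter hierarchy (hypothetical, not the real API).
sealed trait Filter { def references: Set[String] }
case object AlwaysTrue extends Filter { val references: Set[String] = Set.empty }
case object AlwaysFalse extends Filter { val references: Set[String] = Set.empty }
final case class EqualTo(attr: String, value: Int) extends Filter {
  val references: Set[String] = Set(attr)
}
final case class LessThan(attr: String, value: Int) extends Filter {
  val references: Set[String] = Set(attr)
}
final case class Or(left: Filter, right: Filter) extends Filter {
  val references: Set[String] = left.references ++ right.references
}

object GroupByMaxIndex {
  /** Places each filter at the largest schema index that it references. */
  def group(filters: Seq[Filter], schema: Seq[String]): Array[Seq[Filter]] = {
    require(schema.nonEmpty, "The schema must have at least one field.")
    require(
      filters.forall(_.references.forall(name => schema.contains(name))),
      "All filters must be applicable to the schema.")

    val grouped = Array.fill(schema.length)(Seq.empty[Filter])
    if (filters.contains(AlwaysFalse)) {
      // No row can pass, so a single check at index 0 is enough and
      // all other filters can be ignored.
      grouped(0) = Seq(AlwaysFalse)
    } else {
      // AlwaysTrue never changes the result of a conjunction, so it is dropped.
      for (f <- filters if f != AlwaysTrue) {
        // Any other filter without references (a literal) goes to index 0.
        val idx =
          if (f.references.isEmpty) 0
          else f.references.map(name => schema.indexOf(name)).max
        grouped(idx) = grouped(idx) :+ f
      }
    }
    grouped
  }
}
```

With a schema of `Seq("a", "b", "c")`, a filter that refers to both `a` and `c` lands at index 2, so it can only be applied once the first three fields of a row have been initialized.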
[GitHub] [spark] AmplabJenkins removed a comment on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure
AmplabJenkins removed a comment on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure URL: https://github.com/apache/spark/pull/27183#issuecomment-573536618 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116606/ Test PASSed.
[GitHub] [spark] dongjoon-hyun closed pull request #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration
dongjoon-hyun closed pull request #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration URL: https://github.com/apache/spark/pull/27146
[GitHub] [spark] AmplabJenkins removed a comment on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure
AmplabJenkins removed a comment on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure URL: https://github.com/apache/spark/pull/27183#issuecomment-573536614 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure
AmplabJenkins commented on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure URL: https://github.com/apache/spark/pull/27183#issuecomment-573536614 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure
AmplabJenkins commented on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure URL: https://github.com/apache/spark/pull/27183#issuecomment-573536618 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116606/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure
SparkQA removed a comment on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure URL: https://github.com/apache/spark/pull/27183#issuecomment-573528226 **[Test build #116606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116606/testReport)** for PR 27183 at commit [`f46e394`](https://github.com/apache/spark/commit/f46e3948ce3b88a255a3b0e5560eb0550c84b37c).
[GitHub] [spark] SparkQA commented on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure
SparkQA commented on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure URL: https://github.com/apache/spark/pull/27183#issuecomment-573536326 **[Test build #116606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116606/testReport)** for PR 27183 at commit [`f46e394`](https://github.com/apache/spark/commit/f46e3948ce3b88a255a3b0e5560eb0550c84b37c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#discussion_r365666768 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVFilters.scala ## @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.csv + +import scala.util.Try + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.sources +import org.apache.spark.sql.types.{BooleanType, StructType} + +/** + * An instance of the class compiles filters to predicates and allows to + * apply the predicates to an internal row with partially initialized values + * converted from parsed CSV fields. + * + * @param filters The filters pushed down to CSV datasource. + * @param dataSchema The full schema with all fields in CSV files. + * @param requiredSchema The schema with only fields requested by the upper layer. + * @param columnPruning true if CSV parser can read sub-set of columns otherwise false. 
+ */ +class CSVFilters( +filters: Seq[sources.Filter], +dataSchema: StructType, +requiredSchema: StructType, +columnPruning: Boolean) { + require(checkFilters(), "All filters must be applicable to the data schema.") + + /** + * The schema to read from the underlying CSV parser. + * It combines the required schema and the fields referenced by filters. + */ + val readSchema: StructType = { +if (columnPruning) { + val refs = filters.flatMap(_.references).toSet + val readFields = dataSchema.filter { field => +requiredSchema.contains(field) || refs.contains(field.name) + } + StructType(readFields) +} else { + dataSchema +} + } + + /** + * Converted filters to predicates and grouped by maximum field index + * in the read schema. For example, if an filter refers to 2 attributes + * attrA with field index 5 and attrB with field index 10 in the read schema: + * 0 === $"attrA" or $"attrB" < 100 + * the filter is compiled to a predicate, and placed to the `predicates` + * array at the position 10. In this way, if there is a row with initialized + * fields from the 0 to 10 index, the predicate can be applied to the row + * to check that the row should be skipped or not. + * Multiple predicates with the same maximum reference index are combined + * by the `And` expression. + */ + private val predicates: Array[BasePredicate] = { +val len = readSchema.fields.length +val groupedPredicates = Array.fill[BasePredicate](len)(null) +if (SQLConf.get.csvFilterPushDown) { + val groupedExprs = Array.fill(len)(Seq.empty[Expression]) + for (filter <- filters) { +val expr = CSVFilters.filterToExpression(filter, toRef) +val refs = filter.references +if (refs.isEmpty) { + // For example, AlwaysTrue and AlwaysFalse doesn't have any references + for (i <- 0 until len) { +groupedExprs(i) ++= expr Review comment: You are right since we combine all pushed filters via `And`. 
Also, I think all filters without references (literals) could be put at the beginning of the group before reducing here https://github.com/apache/spark/pull/26973/files#diff-44a98c4a53980cb04e57f0489b257a37R95 So, if the pushed filters are `Seq(AlwaysFalse, StringContains(ref0, "abc"))`, they are reduced to `And(AlwaysFalse, StringContains(ref0, "abc"))`, and the second filter `StringContains(ref0, "abc")` will not be evaluated at all.
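The effect described in this comment, where a literal filter placed first keeps the remaining filters from being evaluated, can be illustrated with a small sketch. It assumes that the conjunction short-circuits on its left operand, as the comment implies. The `Pred` types, the `evaluations` counter, and `reduceLiteralsFirst` are hypothetical names made up for this illustration and are not part of Spark.

```scala
object ShortCircuitDemo {
  // Counts how many non-literal predicates were actually evaluated (demo only).
  var evaluations = 0

  // Hypothetical mini predicate model; not Spark's Expression API.
  sealed trait Pred {
    def isLiteral: Boolean
    def eval(row: Map[String, String]): Boolean
  }
  case object AlwaysFalsePred extends Pred {
    val isLiteral = true
    def eval(row: Map[String, String]): Boolean = false
  }
  final case class StringContainsPred(attr: String, sub: String) extends Pred {
    val isLiteral = false
    def eval(row: Map[String, String]): Boolean = {
      evaluations += 1
      row.get(attr).exists(_.contains(sub))
    }
  }
  final case class AndPred(left: Pred, right: Pred) extends Pred {
    val isLiteral: Boolean = left.isLiteral && right.isLiteral
    // `&&` skips the right side whenever the left side is false.
    def eval(row: Map[String, String]): Boolean = left.eval(row) && right.eval(row)
  }

  /** Moves literal predicates to the front, then combines everything with AndPred. */
  def reduceLiteralsFirst(preds: Seq[Pred]): Option[Pred] = {
    val (literals, others) = preds.partition(_.isLiteral)
    (literals ++ others).reduceLeftOption[Pred]((l, r) => AndPred(l, r))
  }
}
```

Because the literal ends up as the leftmost operand, evaluating the combined predicate returns `false` without ever touching `StringContainsPred`, regardless of the order in which the filters were pushed.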