date:20200112

[GitHub] [spark] AmplabJenkins removed a comment on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #27164: [SPARK-30479][SQL] Apply 
compaction of event log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-573545599
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116605/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] uncleGen commented on issue #26201: [SPARK-29543][SS][UI] Init structured streaming ui

2020-01-12 Thread GitBox

uncleGen commented on issue #26201: [SPARK-29543][SS][UI] Init structured 
streaming ui
URL: https://github.com/apache/spark/pull/26201#issuecomment-573546090
 
 
   retest this please.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #27187: [SPARK-30497][SQL]  migrate 
DESCRIBE TABLE to the new framework
URL: https://github.com/apache/spark/pull/27187#issuecomment-573545480
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21393/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #27164: [SPARK-30479][SQL] Apply 
compaction of event log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-573545589
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #27187: [SPARK-30497][SQL]  migrate 
DESCRIBE TABLE to the new framework
URL: https://github.com/apache/spark/pull/27187#issuecomment-573545475
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] cloud-fan commented on a change in pull request #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-12 Thread GitBox

cloud-fan commented on a change in pull request #27187: [SPARK-30497][SQL]  
migrate DESCRIBE TABLE to the new framework
URL: https://github.com/apache/spark/pull/27187#discussion_r365675326
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2Unresolved.scala
 ##
 @@ -19,15 +19,49 @@ package org.apache.spark.sql.catalyst.analysis
 
 import org.apache.spark.sql.catalyst.expressions.Attribute
 import org.apache.spark.sql.catalyst.plans.logical.LeafNode
-import org.apache.spark.sql.connector.catalog.SupportsNamespaces
+import org.apache.spark.sql.connector.catalog.{Identifier, SupportsNamespaces, Table, TableCatalog}
+
+/**
+ * Holds the name of a namespace that has yet to be looked up in a catalog. It will be resolved to
+ * [[ResolvedNamespace]] during analysis.
+ */
+case class UnresolvedNamespace(multipartIdentifier: Seq[String]) extends LeafNode {
 
 Review comment:
   how about `v2ResolutionPlans`?



[GitHub] [spark] yaooqinn commented on a change in pull request #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-12 Thread GitBox

yaooqinn commented on a change in pull request #27187: [SPARK-30497][SQL]  
migrate DESCRIBE TABLE to the new framework
URL: https://github.com/apache/spark/pull/27187#discussion_r365674995
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2Unresolved.scala
 ##
 @@ -19,15 +19,49 @@ package org.apache.spark.sql.catalyst.analysis
 
 import org.apache.spark.sql.catalyst.expressions.Attribute
 import org.apache.spark.sql.catalyst.plans.logical.LeafNode
-import org.apache.spark.sql.connector.catalog.SupportsNamespaces
+import org.apache.spark.sql.connector.catalog.{Identifier, SupportsNamespaces, Table, TableCatalog}
+
+/**
+ * Holds the name of a namespace that has yet to be looked up in a catalog. It will be resolved to
+ * [[ResolvedNamespace]] during analysis.
+ */
+case class UnresolvedNamespace(multipartIdentifier: Seq[String]) extends LeafNode {
 
 Review comment:
   nit: the name `v2Unresolved` does not seem to fit everything in this file



[GitHub] [spark] AmplabJenkins commented on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #27164: [SPARK-30479][SQL] Apply compaction of 
event log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-573545599
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116605/
   Test FAILed.



[GitHub] [spark] SparkQA removed a comment on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-12 Thread GitBox

SparkQA removed a comment on issue #27164: [SPARK-30479][SQL] Apply compaction 
of event log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-573519700
 
 
   **[Test build #116605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116605/testReport)** for PR 27164 at commit [`5f37b64`](https://github.com/apache/spark/commit/5f37b64010bf669d1426db53e9e4c35770cf37b4).



[GitHub] [spark] AmplabJenkins commented on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #27164: [SPARK-30479][SQL] Apply compaction of 
event log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-573545589
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #27187: [SPARK-30497][SQL]  migrate DESCRIBE 
TABLE to the new framework
URL: https://github.com/apache/spark/pull/27187#issuecomment-573545475
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #27187: [SPARK-30497][SQL]  migrate DESCRIBE 
TABLE to the new framework
URL: https://github.com/apache/spark/pull/27187#issuecomment-573545480
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21393/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-12 Thread GitBox

SparkQA commented on issue #27187: [SPARK-30497][SQL]  migrate DESCRIBE TABLE 
to the new framework
URL: https://github.com/apache/spark/pull/27187#issuecomment-573545090
 
 
   **[Test build #116611 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116611/testReport)** for PR 27187 at commit [`4e0e8c1`](https://github.com/apache/spark/commit/4e0e8c11031cf9b38f8970996c08190b5c2eeac7).



[GitHub] [spark] SparkQA commented on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-12 Thread GitBox

SparkQA commented on issue #27164: [SPARK-30479][SQL] Apply compaction of event 
log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-573545284
 
 
   **[Test build #116605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116605/testReport)** for PR 27164 at commit [`5f37b64`](https://github.com/apache/spark/commit/5f37b64010bf669d1426db53e9e4c35770cf37b4).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] Resolve the failed unit tests when enable AQE

2020-01-12 Thread GitBox

SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] 
Resolve the failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-573545115
 
 
   **[Test build #116612 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116612/testReport)** for PR 26813 at commit [`8b5e744`](https://github.com/apache/spark/commit/8b5e7442c63fe326db7c7f46f7a194fbae8f0d46).



[GitHub] [spark] cloud-fan commented on a change in pull request #26921: [SPARK-30282][SQL] Integrate V2 commands with UnresolvedV2Relation into new resolution framework

2020-01-12 Thread GitBox

cloud-fan commented on a change in pull request #26921: [SPARK-30282][SQL] 
Integrate V2 commands with UnresolvedV2Relation into new resolution framework
URL: https://github.com/apache/spark/pull/26921#discussion_r365673895
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
 ##
 @@ -201,22 +201,21 @@ class ResolveSessionCatalog(
 case RenameTableStatement(SessionCatalogAndTable(_, oldName), newNameParts, isView) =>
   AlterTableRenameCommand(oldName.asTableIdentifier, newNameParts.asTableIdentifier, isView)
 
-case DescribeTableStatement(
- nameParts @ SessionCatalogAndTable(catalog, tbl), partitionSpec, isExtended) =>
-  loadTable(catalog, tbl.asIdentifier).collect {
-case v1Table: V1Table =>
-  DescribeTableCommand(tbl.asTableIdentifier, partitionSpec, isExtended)
-  }.getOrElse {
-// The v1 `DescribeTableCommand` can describe view as well.
-if (isView(tbl)) {
-  DescribeTableCommand(tbl.asTableIdentifier, partitionSpec, isExtended)
-} else {
-  if (partitionSpec.nonEmpty) {
-throw new AnalysisException("DESCRIBE TABLE does not support partition for v2 tables.")
+case d @ DescribeTable(SessionCatalogAndResolvedTable(resolved), partitionSpec, isExtended) =>
+  resolved.table match {
+case _: V1Table =>
+  DescribeTableCommand(getTableIdentifier(resolved), partitionSpec, isExtended)
+case _ =>
+  // The v1 `DescribeTableCommand` can describe view as well.
+  if (isView(resolved.identifier.asMultipartIdentifier)) {
 
 Review comment:
   I've opened https://github.com/apache/spark/pull/27187 to refine it.
   
   This is a feature we missed when designing the new framework, so I opened a separate PR to update the framework to support commands that accept both tables and views, like the DESCRIBE command.
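
For readers following the thread, the idea of a command that accepts either a table or a view can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the PR: `Resolved`, `ResolvedTableLike`, `ResolvedViewLike`, `resolveTableOrView`, and `describe` are hypothetical names, and the "views first, then tables" lookup order is an assumption made for the example.

```scala
// Hypothetical stand-ins for analyzer nodes; these are NOT Spark's real classes.
sealed trait Resolved
final case class ResolvedTableLike(name: Seq[String]) extends Resolved
final case class ResolvedViewLike(name: Seq[String]) extends Resolved

object DescribeResolution {
  // Assumed lookup order for this sketch: check views first, then tables.
  def resolveTableOrView(
      name: Seq[String],
      views: Set[Seq[String]],
      tables: Set[Seq[String]]): Option[Resolved] =
    if (views.contains(name)) Some(ResolvedViewLike(name))
    else if (tables.contains(name)) Some(ResolvedTableLike(name))
    else None

  // The command handles both resolved shapes instead of assuming a table.
  def describe(resolved: Resolved): String = resolved match {
    case ResolvedTableLike(n) => s"DESCRIBE TABLE ${n.mkString(".")}"
    case ResolvedViewLike(n)  => s"DESCRIBE VIEW ${n.mkString(".")}"
  }
}
```

The point of the sealed trait is that a command such as DESCRIBE can pattern match on both outcomes in one place, so a name that resolves to a view does not need a separate fallback branch.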



[GitHub] [spark] guykhazma edited a comment on issue #27157: [SPARK-30475][SQL] File source V2: Push data filters for file listing

2020-01-12 Thread GitBox

guykhazma edited a comment on issue #27157: [SPARK-30475][SQL] File source V2: 
Push data filters for file listing
URL: https://github.com/apache/spark/pull/27157#issuecomment-573543733
 
 
   @gengliangwang by `"data skipping uniformly for all file based data sources"` I mean that the above approach works uniformly for all formats, whether or not they support pushdown. 
   (It also benefits formats which support pushdown, such as Parquet, by avoiding the need to read the footer of each file.) See for example this [Spark Summit talk](https://databricks.com/session/using-pluggable-apache-spark-sql-filters-to-help-gridpocket-users-keep-up-with-the-jones-and-save-the-planet).
   
   Note that in datasource v1 the `dataFilters` are also passed to the `listFiles` method in the [`FileSourceScanExec`](https://github.com/apache/spark/blob/eefcc7d762a627bf19cab7041a1a82f88862e7e1/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L210) case class, which is used by all of the file-based datasources.
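
To make the idea concrete, here is a minimal, self-contained Scala sketch of data skipping at file-listing time. It does not use Spark's actual API: `FileStats`, `DataFilter`, `GreaterThan`, `LessThan`, and this `listFiles` are hypothetical names for illustration, and the per-file min/max statistics are assumed to come from some external metadata store rather than from the files themselves.

```scala
// Hypothetical per-file metadata: min/max of one numeric column, kept outside the file
// (assumption: some external index provides these values).
final case class FileStats(path: String, min: Long, max: Long)

// Hypothetical, simplified data filters on that column.
sealed trait DataFilter
final case class GreaterThan(value: Long) extends DataFilter
final case class LessThan(value: Long) extends DataFilter

object DataSkipping {
  // A file may contain matching rows only if its [min, max] range overlaps the filter.
  def mightMatch(stats: FileStats, filter: DataFilter): Boolean = filter match {
    case GreaterThan(v) => stats.max > v
    case LessThan(v)    => stats.min < v
  }

  // Keep only files that could satisfy every filter. Nothing here opens a file,
  // so the pruning does not depend on the file format.
  def listFiles(files: Seq[FileStats], dataFilters: Seq[DataFilter]): Seq[FileStats] =
    files.filter(f => dataFilters.forall(mightMatch(f, _)))
}
```

Because the pruning relies only on metadata held outside the files, it behaves the same for formats with and without pushdown support, which is the sense of "uniformly" above.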
   



[GitHub] [spark] HeartSaVioR edited a comment on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration

2020-01-12 Thread GitBox

HeartSaVioR edited a comment on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] 
Document Kafka producer pool configuration
URL: https://github.com/apache/spark/pull/27146#issuecomment-573542906
 
 
   Btw, I'm also seeing different understandings of the section "Does this PR introduce any user-facing change?" across many open PRs. 
   
   My understanding is that the section is meant to call out and enumerate any behavioral or API changes that would likely require end users to change their queries or code. (So if the answer is yes, the patch should be reviewed carefully.) Expanding this to anything end users might see would make the answer "yes" most of the time, which dilutes the meaning of the section. I might be missing something; discussion is welcome.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27176: [WIP][SPIP]Support year-month and day-time interval types

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #27176: [WIP][SPIP]Support year-month 
and day-time interval types
URL: https://github.com/apache/spark/pull/27176#issuecomment-573543302
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116603/
   Test FAILed.



[GitHub] [spark] cloud-fan commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-12 Thread GitBox

cloud-fan commented on issue #27187: [SPARK-30497][SQL]  migrate DESCRIBE TABLE 
to the new framework
URL: https://github.com/apache/spark/pull/27187#issuecomment-573543790
 
 
   cc @imback82 @yaooqinn @viirya  @HyukjinKwon 



[GitHub] [spark] guykhazma commented on issue #27157: [SPARK-30475][SQL] File source V2: Push data filters for file listing

2020-01-12 Thread GitBox

guykhazma commented on issue #27157: [SPARK-30475][SQL] File source V2: Push 
data filters for file listing
URL: https://github.com/apache/spark/pull/27157#issuecomment-573543733
 
 
   @gengliangwang by `"data skipping uniformly for all file based data sources"` I mean that the above approach works uniformly for all formats, whether or not they support pushdown. 
   (It also benefits formats which support pushdown, such as Parquet, by avoiding the need to read the footer of each file.)
   See for example this [Spark Summit talk](https://databricks.com/session/using-pluggable-apache-spark-sql-filters-to-help-gridpocket-users-keep-up-with-the-jones-and-save-the-planet).
   
   Note that in datasource v1 the `dataFilters` are also passed to the `listFiles` method in the [`FileSourceScanExec`](https://github.com/apache/spark/blob/eefcc7d762a627bf19cab7041a1a82f88862e7e1/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L210) case class, which is used by all of the file-based datasources.
   



[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] Resolve the failed unit tests when enable AQE

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #26813: 
[SPARK-30188][SQL][WIP][test-hive1.2] Resolve the failed unit tests when enable 
AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-573543303
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] cloud-fan commented on a change in pull request #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-12 Thread GitBox

cloud-fan commented on a change in pull request #27187: [SPARK-30497][SQL]  
migrate DESCRIBE TABLE to the new framework
URL: https://github.com/apache/spark/pull/27187#discussion_r365673286
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2Unresolved.scala
 ##
 @@ -19,15 +19,49 @@ package org.apache.spark.sql.catalyst.analysis
 
 import org.apache.spark.sql.catalyst.expressions.Attribute
 import org.apache.spark.sql.catalyst.plans.logical.LeafNode
-import org.apache.spark.sql.connector.catalog.SupportsNamespaces
+import org.apache.spark.sql.connector.catalog.{Identifier, SupportsNamespaces, Table, TableCatalog}
+
+/**
+ * Holds the name of a namespace that has yet to be looked up in a catalog. It will be resolved to
+ * [[ResolvedNamespace]] during analysis.
+ */
+case class UnresolvedNamespace(multipartIdentifier: Seq[String]) extends LeafNode {
 
 Review comment:
   Move these new unresolved plans to one file.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize 
skewed partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-573543313
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21392/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] Resolve the failed unit tests when enable AQE

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #26813: 
[SPARK-30188][SQL][WIP][test-hive1.2] Resolve the failed unit tests when enable 
AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-573543311
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21391/
   Test PASSed.



[GitHub] [spark] HeartSaVioR edited a comment on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration

2020-01-12 Thread GitBox

HeartSaVioR edited a comment on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] 
Document Kafka producer pool configuration
URL: https://github.com/apache/spark/pull/27146#issuecomment-573542906
 
 
   Btw, I'm also seeing different understandings of the section "Does this PR introduce any user-facing change?". 
   
   My understanding is that the section is meant to call out and enumerate any behavioral or API changes that would likely require end users to change their queries or code. (So if the answer is yes, the patch should be reviewed carefully.) Expanding this to anything end users might see would make the answer "yes" most of the time, which dilutes the meaning of the section. I might be missing something; discussion is welcome.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27176: [WIP][SPIP]Support year-month and day-time interval types

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #27176: [WIP][SPIP]Support year-month 
and day-time interval types
URL: https://github.com/apache/spark/pull/27176#issuecomment-573543296
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize 
skewed partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-573543305
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27176: [WIP][SPIP]Support year-month and day-time interval types

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #27176: [WIP][SPIP]Support year-month and 
day-time interval types
URL: https://github.com/apache/spark/pull/27176#issuecomment-573543296
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] Resolve the failed unit tests when enable AQE

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] 
Resolve the failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-573543311
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21391/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed 
partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-573543305
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed 
partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-573543313
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21392/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] Resolve the failed unit tests when enable AQE

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP][test-hive1.2] 
Resolve the failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-573543303
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24990: [SPARK-28191][SS] New data source - state - reader part

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #24990: [SPARK-28191][SS] New data source - 
state - reader part
URL: https://github.com/apache/spark/pull/24990#issuecomment-573543177
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27176: [WIP][SPIP]Support year-month and day-time interval types

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #27176: [WIP][SPIP]Support year-month and 
day-time interval types
URL: https://github.com/apache/spark/pull/27176#issuecomment-573543302
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116603/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24990: [SPARK-28191][SS] New data source - state - reader part

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #24990: [SPARK-28191][SS] New data 
source - state - reader part
URL: https://github.com/apache/spark/pull/24990#issuecomment-573543177
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24990: [SPARK-28191][SS] New data source - state - reader part

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #24990: [SPARK-28191][SS] New data 
source - state - reader part
URL: https://github.com/apache/spark/pull/24990#issuecomment-573543186
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116588/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #27187: [SPARK-30497][SQL]  migrate DESCRIBE 
TABLE to the new framework
URL: https://github.com/apache/spark/pull/27187#issuecomment-573543237
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24990: [SPARK-28191][SS] New data source - state - reader part

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #24990: [SPARK-28191][SS] New data source - 
state - reader part
URL: https://github.com/apache/spark/pull/24990#issuecomment-573543186
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116588/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #27187: [SPARK-30497][SQL]  migrate 
DESCRIBE TABLE to the new framework
URL: https://github.com/apache/spark/pull/27187#issuecomment-573543237
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27176: [WIP][SPIP]Support year-month and day-time interval types

2020-01-12 Thread GitBox

SparkQA removed a comment on issue #27176: [WIP][SPIP]Support year-month and 
day-time interval types
URL: https://github.com/apache/spark/pull/27176#issuecomment-573516694
 
 
   **[Test build #116603 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116603/testReport)**
 for PR 27176 at commit 
[`7ad45b4`](https://github.com/apache/spark/commit/7ad45b484c02b5a1e893c956e309a45f7d28e87b).



[GitHub] [spark] AmplabJenkins commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #27187: [SPARK-30497][SQL]  migrate DESCRIBE 
TABLE to the new framework
URL: https://github.com/apache/spark/pull/27187#issuecomment-573543246
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21390/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #27187: [SPARK-30497][SQL]  migrate 
DESCRIBE TABLE to the new framework
URL: https://github.com/apache/spark/pull/27187#issuecomment-573543246
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21390/
   Test PASSed.



[GitHub] [spark] HeartSaVioR commented on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration

2020-01-12 Thread GitBox

HeartSaVioR commented on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] 
Document Kafka producer pool configuration
URL: https://github.com/apache/spark/pull/27146#issuecomment-573542906
 
 
   Btw, I'm also seeing different understandings of the section "Does this PR 
introduce any user-facing change?". 
   
   My understanding is that the section is meant to highlight and enumerate any 
behavioral or API changes that are likely to require end users to change their 
queries or code. Expanding it to cover anything end users see would make the 
answer "yes" in most cases, which dilutes the meaning of the section. I might 
be missing something, so discussion on this is welcome.



[GitHub] [spark] SparkQA commented on issue #27176: [WIP][SPIP]Support year-month and day-time interval types

2020-01-12 Thread GitBox

SparkQA commented on issue #27176: [WIP][SPIP]Support year-month and day-time 
interval types
URL: https://github.com/apache/spark/pull/27176#issuecomment-573542945
 
 
   **[Test build #116603 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116603/testReport)**
 for PR 27176 at commit 
[`7ad45b4`](https://github.com/apache/spark/commit/7ad45b484c02b5a1e893c956e309a45f7d28e87b).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on issue #24990: [SPARK-28191][SS] New data source - state - reader part

2020-01-12 Thread GitBox

SparkQA removed a comment on issue #24990: [SPARK-28191][SS] New data source - 
state - reader part
URL: https://github.com/apache/spark/pull/24990#issuecomment-573492657
 
 
   **[Test build #116588 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116588/testReport)**
 for PR 24990 at commit 
[`64a08b9`](https://github.com/apache/spark/commit/64a08b957e085b5489b0278d0972889d6e1f1e20).



[GitHub] [spark] SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2020-01-12 Thread GitBox

SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed 
partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-573542861
 
 
   **[Test build #116610 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116610/testReport)**
 for PR 26434 at commit 
[`cee1c8c`](https://github.com/apache/spark/commit/cee1c8cb7b4c4714dd3d59fbaa693d1dab217767).



[GitHub] [spark] SparkQA commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-12 Thread GitBox

SparkQA commented on issue #27187: [SPARK-30497][SQL]  migrate DESCRIBE TABLE 
to the new framework
URL: https://github.com/apache/spark/pull/27187#issuecomment-573542839
 
 
   **[Test build #116609 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116609/testReport)**
 for PR 27187 at commit 
[`0b729b9`](https://github.com/apache/spark/commit/0b729b9b2913e03ca9eefa4a3c58f1b990fa2e24).



[GitHub] [spark] SparkQA commented on issue #24990: [SPARK-28191][SS] New data source - state - reader part

2020-01-12 Thread GitBox

SparkQA commented on issue #24990: [SPARK-28191][SS] New data source - state - 
reader part
URL: https://github.com/apache/spark/pull/24990#issuecomment-573542708
 
 
   **[Test build #116588 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116588/testReport)**
 for PR 24990 at commit 
[`64a08b9`](https://github.com/apache/spark/commit/64a08b957e085b5489b0278d0972889d6e1f1e20).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] HyukjinKwon commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types

2020-01-12 Thread GitBox

HyukjinKwon commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type 
hints in pandas UDF and rename/move inconsistent pandas UDF types
URL: https://github.com/apache/spark/pull/27165#issuecomment-573542451
 
 
   Should be ready for a look.



[GitHub] [spark] JkSelf commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2020-01-12 Thread GitBox

JkSelf commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition 
based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-573542055
 
 
   retest this please



[GitHub] [spark] cloud-fan opened a new pull request #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-12 Thread GitBox

cloud-fan opened a new pull request #27187: [SPARK-30497][SQL] migrate
DESCRIBE TABLE to the new framework
URL: https://github.com/apache/spark/pull/27187

### What changes were proposed in this pull request?

Use the new framework to resolve the DESCRIBE TABLE command.

The v1 DESCRIBE TABLE command supports both tables and views. I checked Hive
and Presto: neither has DESCRIBE TABLE, only DESCRIBE, which supports both
tables and views:
1. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DescribeTable/View/MaterializedView/Column
2. https://prestodb.io/docs/current/sql/describe.html

We should make it clear that DESCRIBE supports both tables and views by
renaming the command to `DescribeRelation`.

This PR also tunes the framework slightly to support commands that accept
both tables and views.
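
As a rough illustration only, the idea of one command that accepts both tables and views can be modelled in a few lines of Python. This is a toy sketch, not Spark's Scala implementation: `Table`, `View`, `resolve_relation`, and `describe_relation` are hypothetical names, and the catalog is a plain dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Table:
    name: str


@dataclass(frozen=True)
class View:
    name: str


Relation = Union[Table, View]


def resolve_relation(catalog: Dict[str, Relation], name: str, allow_view: bool) -> Relation:
    """Look up `name`; reject views unless the calling command opts in."""
    if name not in catalog:
        raise LookupError(f"Table or view not found: {name}")
    relation = catalog[name]
    if isinstance(relation, View) and not allow_view:
        raise TypeError(f"{name} is a view, but a table was expected")
    return relation


def describe_relation(catalog: Dict[str, Relation], name: str) -> str:
    """Hypothetical DESCRIBE: resolves once and works for tables and views alike."""
    relation = resolve_relation(catalog, name, allow_view=True)
    kind = "view" if isinstance(relation, View) else "table"
    return f"{relation.name} ({kind})"
```

In this sketch a table-only command would call `resolve_relation` with `allow_view=False`, while a command such as DESCRIBE passes `allow_view=True`.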

### Why are the changes needed?

This is part of the effort to make the relation lookup behavior consistent:
SPARK-2990.

Note that I made this a separate PR because I need to update the framework to
support a new use case: accepting both tables and views.

### Does this PR introduce any user-facing change?

### How was this patch tested?

existing tests


[GitHub] [spark] HyukjinKwon commented on issue #27181: [SPARK-30493][PYTHON][ML] Remove OneVsRestModel setClassifier, setLabelCol and setWeightCol methods

2020-01-12 Thread GitBox

HyukjinKwon commented on issue #27181: [SPARK-30493][PYTHON][ML] Remove 
OneVsRestModel setClassifier, setLabelCol and setWeightCol methods
URL: https://github.com/apache/spark/pull/27181#issuecomment-573541927
 
 
   Before merging, can we please update the PR description properly? It's best 
to avoid leaving bad PR examples.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #27186: [SPARK-30480][PYTHON][TESTS] 
Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'
URL: https://github.com/apache/spark/pull/27186#issuecomment-573541230
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #27186: [SPARK-30480][PYTHON][TESTS] 
Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'
URL: https://github.com/apache/spark/pull/27186#issuecomment-573541231
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21389/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases 
the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'
URL: https://github.com/apache/spark/pull/27186#issuecomment-573541230
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases 
the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'
URL: https://github.com/apache/spark/pull/27186#issuecomment-573541231
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21389/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'

2020-01-12 Thread GitBox

SparkQA commented on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the 
memory limit being tested in 'WorkerMemoryTest.test_memory_limit'
URL: https://github.com/apache/spark/pull/27186#issuecomment-573540951
 
 
   **[Test build #116608 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116608/testReport)**
 for PR 27186 at commit 
[`057e4e9`](https://github.com/apache/spark/commit/057e4e99c782aafa235b081ac4247d1a0314a1a1).



[GitHub] [spark] HeartSaVioR edited a comment on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration

2020-01-12 Thread GitBox

HeartSaVioR edited a comment on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] 
Document Kafka producer pool configuration
URL: https://github.com/apache/spark/pull/27146#issuecomment-573540460
 
 
   Thanks all for reviewing and merging!
   
   @dongjoon-hyun What about improving the contribution steps for 
documentation changes (either by adding them to the 'contributing' page or 
even to the PR template)? I'm not sure we have any explicit requirements, and 
it would be helpful to standardize this.



[GitHub] [spark] HeartSaVioR commented on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration

2020-01-12 Thread GitBox

HeartSaVioR commented on issue #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] 
Document Kafka producer pool configuration
URL: https://github.com/apache/spark/pull/27146#issuecomment-573540460
 
 
   Thanks all for reviewing and merging!
   
   @dongjoon-hyun What about improving the contribution steps for 
documentation changes? I'm not sure we have any explicit requirements, and it 
would be helpful to standardize this.



[GitHub] [spark] HyukjinKwon commented on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'

2020-01-12 Thread GitBox

HyukjinKwon commented on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases 
the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'
URL: https://github.com/apache/spark/pull/27186#issuecomment-573539570
 
 
   cc @HeartSaVioR, @dongjoon-hyun, @xuanyuanking 



[GitHub] [spark] HyukjinKwon edited a comment on issue #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'

2020-01-12 Thread GitBox

HyukjinKwon edited a comment on issue #27186: [SPARK-30480][PYTHON][TESTS] 
Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'
URL: https://github.com/apache/spark/pull/27186#issuecomment-573539570
 
 
   cc @HeartSaVioR, @dongjoon-hyun, @xuanyuanking, @MaxGekk 



[GitHub] [spark] HyukjinKwon closed pull request #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure

2020-01-12 Thread GitBox

HyukjinKwon closed pull request #27183: [DO-NOT-MERGE] Investigate 
'WorkerMemoryTest.test_memory_limit' failure
URL: https://github.com/apache/spark/pull/27183
 
 
   



[GitHub] [spark] HyukjinKwon opened a new pull request #27186: [SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'

2020-01-12 Thread GitBox

HyukjinKwon opened a new pull request #27186: [SPARK-30480][PYTHON][TESTS] 
Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'
URL: https://github.com/apache/spark/pull/27186
 
 
   ### What changes were proposed in this pull request?
   
   This PR proposes to increase the memory limit in 
`WorkerMemoryTest.test_memory_limit` in order to make the test pass with PyPy.
   
   The test currently fails only on PyPy, as shown below:
   
   ```
   Current mem limits: 18446744073709551615 of max 18446744073709551615
   
   Setting mem limits to 1048576 of max 1048576
   
   RPython traceback:
     File "pypy_module_pypyjit_interp_jit.c", line 289, in portal_5
     File "pypy_interpreter_pyopcode.c", line 3468, in handle_bytecode__AccessDirect_None
     File "pypy_interpreter_pyopcode.c", line 5558, in dispatch_bytecode__AccessDirect_None
   out of memory: couldn't allocate the next arena
   ERROR
   ```
   
   It seems related to how PyPy allocates memory and how its GC works. 
There seems to be nothing wrong with the configuration implementation itself 
on the PySpark side.
   
   I roughly tested with newer PyPy versions on Ubuntu and this test seems to 
pass fine, so I suspect this is an issue with old PyPy behaviour.
   
   The change only increases the limit, so it does not affect actual memory 
allocations. The test only needs to check whether the limit is properly set on 
the worker side. For clarification, when the limit is not set it defaults to 
the maximum memory of the machine.
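
As a rough illustration of the behaviour described above, the following Python sketch mirrors the decision visible in the log output (an unlimited current limit being lowered to 1048576 bytes, i.e. 1 MiB). The helper name `plan_memory_limit` is hypothetical and this is not PySpark's worker code; it only uses the standard `resource` module and assumes a Unix-like platform.

```python
import resource


def plan_memory_limit(soft_limit, requested_mb):
    """Return the (soft, hard) pair to apply, or None to keep the current limit.

    Hypothetical helper: lower the address-space limit only when the process
    is currently unlimited or the requested value is stricter.
    """
    new_limit = requested_mb * 1024 * 1024  # MiB -> bytes
    if soft_limit == resource.RLIM_INFINITY or new_limit < soft_limit:
        return (new_limit, new_limit)
    return None


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    print("Current mem limits: {0} of max {1}".format(soft, hard))
    # Only report the plan; actually calling resource.setrlimit with a tiny
    # value could make the interpreter itself run out of memory.
    print("Planned limits for 1 MiB:", plan_memory_limit(soft, 1))
```

The sketch deliberately stops short of calling `resource.setrlimit`, since a limit as small as the one in the log may leave some interpreters unable to allocate memory at all, which appears to be what the traceback above shows.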
   
   ### Why are the changes needed?
   
   To make the tests pass and unblock other PRs.
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Tested manually; Jenkins should verify it as well.



[GitHub] [spark] HyukjinKwon commented on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure

2020-01-12 Thread GitBox

HyukjinKwon commented on issue #27183: [DO-NOT-MERGE] Investigate 
'WorkerMemoryTest.test_memory_limit' failure
URL: https://github.com/apache/spark/pull/27183#issuecomment-573539312
 
 
   Made a PR at https://github.com/apache/spark/pull/27186



[GitHub] [spark] AmplabJenkins removed a comment on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #24173: [SPARK-27237][SS] Introduce 
State schema validation among query restart
URL: https://github.com/apache/spark/pull/24173#issuecomment-573538923
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116590/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #25965: [SPARK-26425][SS] Add more 
constraint checks in file streaming source to avoid checkpoint corruption
URL: https://github.com/apache/spark/pull/25965#issuecomment-573539020
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #25965: [SPARK-26425][SS] Add more constraint 
checks in file streaming source to avoid checkpoint corruption
URL: https://github.com/apache/spark/pull/25965#issuecomment-573539020
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #24173: [SPARK-27237][SS] Introduce 
State schema validation among query restart
URL: https://github.com/apache/spark/pull/24173#issuecomment-573538913
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #25965: [SPARK-26425][SS] Add more constraint 
checks in file streaming source to avoid checkpoint corruption
URL: https://github.com/apache/spark/pull/25965#issuecomment-573539029
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116587/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #25965: [SPARK-26425][SS] Add more 
constraint checks in file streaming source to avoid checkpoint corruption
URL: https://github.com/apache/spark/pull/25965#issuecomment-573539029
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116587/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #24173: [SPARK-27237][SS] Introduce State 
schema validation among query restart
URL: https://github.com/apache/spark/pull/24173#issuecomment-573538923
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116590/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #24173: [SPARK-27237][SS] Introduce State 
schema validation among query restart
URL: https://github.com/apache/spark/pull/24173#issuecomment-573538913
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption

2020-01-12 Thread GitBox

SparkQA commented on issue #25965: [SPARK-26425][SS] Add more constraint checks 
in file streaming source to avoid checkpoint corruption
URL: https://github.com/apache/spark/pull/25965#issuecomment-573538544
 
 
   **[Test build #116587 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116587/testReport)**
 for PR 25965 at commit 
[`726d920`](https://github.com/apache/spark/commit/726d920db6969a46174f4dac96db8af1a9991cad).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-01-12 Thread GitBox

SparkQA removed a comment on issue #24173: [SPARK-27237][SS] Introduce State 
schema validation among query restart
URL: https://github.com/apache/spark/pull/24173#issuecomment-573492660
 
 
   **[Test build #116590 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116590/testReport)**
 for PR 24173 at commit 
[`1fcfff5`](https://github.com/apache/spark/commit/1fcfff5c2ca78049eb38cf4ef7c041d0005ab9b3).



[GitHub] [spark] SparkQA removed a comment on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption

2020-01-12 Thread GitBox

SparkQA removed a comment on issue #25965: [SPARK-26425][SS] Add more 
constraint checks in file streaming source to avoid checkpoint corruption
URL: https://github.com/apache/spark/pull/25965#issuecomment-573492649
 
 
   **[Test build #116587 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116587/testReport)**
 for PR 25965 at commit 
[`726d920`](https://github.com/apache/spark/commit/726d920db6969a46174f4dac96db8af1a9991cad).



[GitHub] [spark] SparkQA commented on issue #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-01-12 Thread GitBox

SparkQA commented on issue #24173: [SPARK-27237][SS] Introduce State schema 
validation among query restart
URL: https://github.com/apache/spark/pull/24173#issuecomment-573538431
 
 
   **[Test build #116590 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116590/testReport)**
 for PR 24173 at commit 
[`1fcfff5`](https://github.com/apache/spark/commit/1fcfff5c2ca78049eb38cf4ef7c041d0005ab9b3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #25987: [SPARK-29314][SS] Don't 
overwrite the metric "updated" of state operator to 0 if empty batch is run
URL: https://github.com/apache/spark/pull/25987#issuecomment-573537980
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116586/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #25987: [SPARK-29314][SS] Don't 
overwrite the metric "updated" of state operator to 0 if empty batch is run
URL: https://github.com/apache/spark/pull/25987#issuecomment-573537977
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #25987: [SPARK-29314][SS] Don't overwrite the 
metric "updated" of state operator to 0 if empty batch is run
URL: https://github.com/apache/spark/pull/25987#issuecomment-573537980
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116586/
   Test PASSed.



[GitHub] [spark] HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui

2020-01-12 Thread GitBox

HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] 
Init structured streaming ui
URL: https://github.com/apache/spark/pull/26201#discussion_r365650148
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/ui/UIUtils.scala
 ##
 @@ -572,4 +628,23 @@ private[spark] object UIUtils extends Logging {
   def buildErrorResponse(status: Response.Status, msg: String): Response = {
 Response.status(status).entity(msg).`type`(MediaType.TEXT_PLAIN).build()
   }
+
+  /**
+   * There may be different duration labels in each batch. So we need to
+   * mark those missing duration labels as '0d' to avoid UI rendering errors.
+   */
+  def durationDataPadding(
+  values: Array[(Long, ju.Map[String, JLong])]): Array[(Long, Map[String, 
Double])] = {
+val operationLabels = values.flatMap(_._2.keySet().asScala).toSet
+values.map { case (x, y) =>
+  val dataPadding = operationLabels.map(d =>
 
 Review comment:
   nit: let's be consistent on style
   
   ```
   val dataPadding = operationLabels.map { d =>
 ...
   }
   ```
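
The padding idea described in the doc comment of the diff above (each batch may report a different set of duration labels, so missing ones are filled with zero) can be sketched in Python as follows. This is an illustration only, not the Scala method under review; the function name and the sample data are hypothetical.

```python
def pad_duration_data(values):
    """Fill in 0.0 for any label missing from a batch.

    `values` is a list of (batch_time, {label: duration_ms}) pairs.
    """
    # Collect every label that appears in at least one batch.
    labels = set()
    for _, durations in values:
        labels.update(durations)

    padded = []
    for batch_time, durations in values:
        # Every batch ends up with the same keys, in a stable order.
        row = {label: float(durations.get(label, 0)) for label in sorted(labels)}
        padded.append((batch_time, row))
    return padded


# Made-up sample data: the second batch lacks one label, the first lacks another.
batches = [
    (1000, {"addBatch": 120, "getBatch": 5}),
    (2000, {"addBatch": 90, "walCommit": 12}),
]
for batch_time, durations in pad_duration_data(batches):
    print(batch_time, durations)
```

Giving every batch the same key set is what lets a stacked chart draw one series per label without gaps.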



[GitHub] [spark] HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui

2020-01-12 Thread GitBox

HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] 
Init structured streaming ui
URL: https://github.com/apache/spark/pull/26201#discussion_r365654786
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala
 ##
 @@ -68,6 +69,9 @@ class StreamingQueryManager private[sql] (sparkSession: 
SparkSession) extends Lo
 logInfo(s"Registered listener ${listener.getClass.getName}")
   })
 }
+if (sparkSession.sparkContext.conf.get(UI_ENABLED)) {
 
 Review comment:
   `UI_ENABLED` is being checked twice, and `.get` is called anyway, defeating 
the purpose of Option.
   
   This would behave similarly but checks `UI_ENABLED` only once and uses 
Option properly.
   ```
   sparkSession.sharedState.streamingQueryStatusListener.foreach { listener =>
 addListener(listener)
   }
   ```
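
The point of the suggestion above is that the presence of the listener already encodes whether the UI is enabled, so the flag only needs to be checked where the listener is created. Below is a small Python sketch of that pattern, with hypothetical class and attribute names standing in for the Scala ones; Python's `Optional`/`None` plays the role of Scala's `Option` here.

```python
from typing import List, Optional


class SharedState:
    """Hypothetical stand-in: creates the listener only when the UI is enabled."""

    def __init__(self, ui_enabled: bool) -> None:
        # The flag is consulted exactly once, here.
        self.status_listener: Optional[str] = (
            "status-listener" if ui_enabled else None
        )


class QueryManager:
    """Hypothetical stand-in: registers the listener if, and only if, it exists."""

    def __init__(self, shared_state: SharedState) -> None:
        self.listeners: List[str] = []
        listener = shared_state.status_listener
        if listener is not None:  # analogous to Option.foreach in Scala
            self.listeners.append(listener)


print(QueryManager(SharedState(ui_enabled=True)).listeners)
print(QueryManager(SharedState(ui_enabled=False)).listeners)
```

The manager never looks at the flag itself, so the two places cannot disagree about whether the UI is on.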



[GitHub] [spark] AmplabJenkins commented on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #25987: [SPARK-29314][SS] Don't overwrite the 
metric "updated" of state operator to 0 if empty batch is run
URL: https://github.com/apache/spark/pull/25987#issuecomment-573537977
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui

2020-01-12 Thread GitBox

HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] 
Init structured streaming ui
URL: https://github.com/apache/spark/pull/26201#discussion_r365656488
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
 ##
 @@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming.ui
+
+import java.text.SimpleDateFormat
+import java.util.TimeZone
+import javax.servlet.http.HttpServletRequest
+
+import scala.xml.Node
+
+import org.apache.commons.lang3.StringEscapeUtils
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.execution.streaming.{QuerySummary, 
StreamQueryStore}
+import org.apache.spark.sql.execution.ui.SQLTab
+import org.apache.spark.sql.streaming.StreamingQuery
+import org.apache.spark.sql.streaming.ui.UIUtils._
+import org.apache.spark.ui.{UIUtils => SparkUIUtils, WebUIPage}
+
+class StreamingQueryPage(parent: SQLTab, store: Option[StreamQueryStore])
+  extends WebUIPage("streaming") with Logging {
+  val df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
+  df.setTimeZone(TimeZone.getDefault)
+
+  override def render(request: HttpServletRequest): Seq[Node] = {
+val content = store.synchronized {
+  generateStreamingQueryTable(request)
+}
+SparkUIUtils.headerSparkPage(request, "Streaming Query", content, parent)
+  }
+
+  def generateDataRow(request: HttpServletRequest, isActive: Boolean)
+(streamQuery: (StreamingQuery, Long)): Seq[Node] = {
+
+val (query, timeSinceStart) = streamQuery
+def details(detail: Any): Seq[Node] = {
 
 Review comment:
   Just a note: the review comment is not addressed yet. Most parts are the 
same, and they don't seem likely to diverge.



[GitHub] [spark] HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui

2020-01-12 Thread GitBox

HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] 
Init structured streaming ui
URL: https://github.com/apache/spark/pull/26201#discussion_r365660047
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
 ##
 @@ -0,0 +1,273 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming.ui
+
+import java.{util => ju}
+import java.lang.{Long => JLong}
+import java.text.SimpleDateFormat
+import java.util.UUID
+import javax.servlet.http.HttpServletRequest
+
+import scala.xml.{Node, Unparsed}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.util.DateTimeUtils.getTimeZone
+import org.apache.spark.sql.streaming.ui.UIUtils._
+import org.apache.spark.ui.{GraphUIData, JsCollector, UIUtils => SparkUIUtils, 
WebUIPage}
+
+class StreamingQueryStatisticsPage(
+parent: StreamingQueryTab,
+statusListener: StreamingQueryStatusListener)
 
 Review comment:
   `statusListener` is also available in `StreamingQueryTab`.



[GitHub] [spark] HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui

2020-01-12 Thread GitBox

HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] 
Init structured streaming ui
URL: https://github.com/apache/spark/pull/26201#discussion_r365658608
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
 ##
 @@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming.ui
+
+import java.text.SimpleDateFormat
+import javax.servlet.http.HttpServletRequest
+
+import scala.xml.Node
+
+import org.apache.commons.lang3.StringEscapeUtils
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.util.DateTimeUtils.getTimeZone
+import org.apache.spark.sql.streaming.ui.UIUtils._
+import org.apache.spark.ui.{UIUtils => SparkUIUtils, WebUIPage}
+
+class StreamingQueryPage(parent: StreamingQueryTab, statusListener: 
StreamingQueryStatusListener)
+extends WebUIPage("") with Logging {
+  val df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
+  df.setTimeZone(getTimeZone("UTC"))
+
+  override def render(request: HttpServletRequest): Seq[Node] = {
+val content = generateStreamingQueryTable(request)
+SparkUIUtils.headerSparkPage(request, "Streaming Query", content, parent)
+  }
+
+  def generateDataRow(request: HttpServletRequest, queryActive: Boolean)
+(query: StreamingQueryUIData): Seq[Node] = {
+
+def details(detail: Any): Seq[Node] = {
+  if (queryActive) {
+return Seq.empty[Node]
+  }
+  val s = detail.asInstanceOf[String]
+  val isMultiline = s.indexOf('\n') >= 0
+  val summary = StringEscapeUtils.escapeHtml4(
+if (isMultiline) s.substring(0, s.indexOf('\n')) else s
+  )
+  val details = if (isMultiline) {
+// scalastyle:off
+
+  +details
+ ++
+  
+{s}
+  
+// scalastyle:on
+  } else {
+""
+  }
+  {summary}{details}
+}
+
+val statisticsLink = "%s/%s/statistics?id=%s"
+  .format(SparkUIUtils.prependBaseUri(request, parent.basePath), 
parent.prefix, query.runId)
+
+val name = UIUtils.getQueryName(query)
+val status = UIUtils.getQueryStatus(query)
+val duration = if (queryActive) {
+  SparkUIUtils.formatDurationVerbose(System.currentTimeMillis() - 
query.submitTime)
+} else {
+  withNoProgress(query, {
+val endTimeMs = query.lastProgress.timestamp
+SparkUIUtils.formatDurationVerbose(df.parse(endTimeMs).getTime - 
query.submitTime)
+  }, "-")
+}
+
+
+   {name} 
+   {status} 
+   {query.id} 
+{query.runId}  
+   {SparkUIUtils.formatDate(query.submitTime)} 
+   {duration} 
+   {withNoProgress(query, {
+(query.recentProgress.map(p => 
withNumberInvalid(p.inputRowsPerSecond)).sum /
 
 Review comment:
   Given we have a function for constructing the exception message, why not 
add another one for constructing the average message, for this and the ones 
below?
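
One way to read this suggestion is to factor the repeated "sum of per-progress rates divided by the number of progress entries" expression into a single helper. The Python sketch below illustrates that refactoring under assumptions: `average_rate` and `number_or_zero` are hypothetical names, and treating NaN or infinite values as 0 is a guess at what `withNumberInvalid` does, since its body is not shown in the diff. Returning "-" when there is no progress mirrors the fallback passed to `withNoProgress` in the quoted code.

```python
import math


def number_or_zero(value):
    """Assumed behaviour: map NaN and infinite rates to 0.0."""
    return 0.0 if math.isnan(value) or math.isinf(value) else value


def average_rate(rates):
    """Format the mean of the given rates, or "-" when there is no progress yet."""
    if not rates:
        return "-"
    mean = sum(number_or_zero(r) for r in rates) / len(rates)
    return "{:.2f}".format(mean)


# Made-up sample rates; the NaN entry counts as 0 in the sum.
input_rates = [100.0, float("nan"), 50.0]
print(average_rate(input_rates))
print(average_rate([]))
```

With such a helper, the input-rate and process-rate cells would each shrink to a single call that differs only in which field is extracted from the progress entries.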



[GitHub] [spark] HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui

2020-01-12 Thread GitBox

HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] 
Init structured streaming ui
URL: https://github.com/apache/spark/pull/26201#discussion_r365662919
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
 ##
 @@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming.ui
+
+import java.util.UUID
+import java.util.concurrent.ConcurrentHashMap
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+
+/**
+ * A customized StreamingQueryListener used in structured streaming UI, which contains all
+ * UI data for both active and inactive query.
+ * TODO: Add support for history server.
+ */
+class StreamingQueryStatusListener(sqlConf: SQLConf) extends StreamingQueryListener {
+
+  /**
+   * We use runId as the key here instead of id in active query status map,
+   * because the runId is unique for every started query, even if it's a restart.
+   */
+  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
+  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+
+  private val streamingProgressRetention = sqlConf.streamingProgressRetention
+  private val inactiveQueryStatusRetention = sqlConf.streamingUIInactiveQueryRetention
+
+  override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
+activeQueryStatus.putIfAbsent(event.runId,
+  new StreamingQueryUIData(event.name, event.id, event.runId))
+  }
+
+  override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
+val queryStatus = activeQueryStatus.getOrDefault(
+  event.progress.runId,
+  new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId))
+queryStatus.updateProcess(event.progress, streamingProgressRetention)
+  }
+
+  override def onQueryTerminated(event: StreamingQueryListener.QueryTerminatedEvent): Unit = {
+val queryStatus = activeQueryStatus.remove(event.runId)
+if (queryStatus != null) {
+  queryStatus.queryTerminated(event)
+  inactiveQueryStatus.synchronized {
+inactiveQueryStatus += queryStatus
+while (inactiveQueryStatus.length >= inactiveQueryStatusRetention) {
+  inactiveQueryStatus.dequeue()
+}
+  }
+}
+  }
+
+  def allQueryStatus: Seq[StreamingQueryUIData] = inactiveQueryStatus.synchronized {
+activeQueryStatus.values().asScala.toSeq ++ inactiveQueryStatus
+  }
+}
+
+/**
+ * This class contains all messages related to UI display; each instance corresponds to a single
+ * [[org.apache.spark.sql.streaming.StreamingQuery]].
+ */
+private[ui] class StreamingQueryUIData(
+val name: String,
+val id: UUID,
+val runId: UUID) {
+  val submitTime: Long = System.currentTimeMillis()
 
 Review comment:
   Ideally, we should take the batch timestamp when this 
`StreamingQueryUIData` is constructed from `onQueryProgress`; otherwise the 
submit time will be later than the batch timestamp.
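
   One way to picture the suggestion, as a sketch only: pass the submit time into the data holder instead of reading the clock inside it. `QueryUIDataSketch`, `fromStart` and `fromProgress` are hypothetical names, and the code assumes the progress timestamp is an ISO-8601 UTC string.

```scala
import java.time.Instant
import java.util.UUID

// Hypothetical variant of the UI data holder: submitTime is a constructor
// argument rather than a clock read inside the class.
final case class QueryUIDataSketch(name: String, id: UUID, runId: UUID, submitTime: Long)

object QueryUIDataSketch {
  // Query start event: the wall clock is the best information available.
  def fromStart(name: String, id: UUID, runId: UUID): QueryUIDataSketch =
    QueryUIDataSketch(name, id, runId, System.currentTimeMillis())

  // Query first seen in a progress event: reuse the batch timestamp.
  // Assumption: the timestamp looks like "2020-01-12T10:00:00.000Z".
  def fromProgress(
      name: String,
      id: UUID,
      runId: UUID,
      batchTimestamp: String): QueryUIDataSketch =
    QueryUIDataSketch(name, id, runId, Instant.parse(batchTimestamp).toEpochMilli)
}
```

   With this shape the submit time can never be later than the first batch timestamp shown for the same query, which is the inconsistency the comment points at.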


[GitHub] [spark] HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] Init structured streaming ui

2020-01-12 Thread GitBox

HeartSaVioR commented on a change in pull request #26201: [SPARK-29543][SS][UI] 
Init structured streaming ui
URL: https://github.com/apache/spark/pull/26201#discussion_r365660084
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
 ##
 @@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming.ui
+
+import java.text.SimpleDateFormat
+import javax.servlet.http.HttpServletRequest
+
+import scala.xml.Node
+
+import org.apache.commons.lang3.StringEscapeUtils
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.util.DateTimeUtils.getTimeZone
+import org.apache.spark.sql.streaming.ui.UIUtils._
+import org.apache.spark.ui.{UIUtils => SparkUIUtils, WebUIPage}
+
+class StreamingQueryPage(parent: StreamingQueryTab, statusListener: StreamingQueryStatusListener)
 
 Review comment:
   `statusListener` is also available in `StreamingQueryTab`.


[GitHub] [spark] SparkQA removed a comment on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run

2020-01-12 Thread GitBox

SparkQA removed a comment on issue #25987: [SPARK-29314][SS] Don't overwrite 
the metric "updated" of state operator to 0 if empty batch is run
URL: https://github.com/apache/spark/pull/25987#issuecomment-573492644
 
 
   **[Test build #116586 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116586/testReport)**
 for PR 25987 at commit 
[`0f59ee2`](https://github.com/apache/spark/commit/0f59ee25818b2323199c3789fe4c45ab97326031).


[GitHub] [spark] SparkQA commented on issue #25987: [SPARK-29314][SS] Don't overwrite the metric "updated" of state operator to 0 if empty batch is run

2020-01-12 Thread GitBox

SparkQA commented on issue #25987: [SPARK-29314][SS] Don't overwrite the metric 
"updated" of state operator to 0 if empty batch is run
URL: https://github.com/apache/spark/pull/25987#issuecomment-573537576
 
 
   **[Test build #116586 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116586/testReport)**
 for PR 25987 at commit 
[`0f59ee2`](https://github.com/apache/spark/commit/0f59ee25818b2323199c3789fe4c45ab97326031).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


[GitHub] [spark] MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-12 Thread GitBox

MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] 
Support filters pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#discussion_r365667390
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVFilters.scala
 ##
 @@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.csv
+
+import scala.util.Try
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources
+import org.apache.spark.sql.types.{BooleanType, StructType}
+
+/**
+ * An instance of the class compiles filters to predicates and allows to
+ * apply the predicates to an internal row with partially initialized values
+ * converted from parsed CSV fields.
+ *
+ * @param filters The filters pushed down to CSV datasource.
+ * @param dataSchema The full schema with all fields in CSV files.
+ * @param requiredSchema The schema with only fields requested by the upper layer.
+ * @param columnPruning true if CSV parser can read sub-set of columns otherwise false.
+ */
+class CSVFilters(
+filters: Seq[sources.Filter],
+dataSchema: StructType,
+requiredSchema: StructType,
+columnPruning: Boolean) {
+  require(checkFilters(), "All filters must be applicable to the data schema.")
+
+  /**
+   * The schema to read from the underlying CSV parser.
+   * It combines the required schema and the fields referenced by filters.
+   */
+  val readSchema: StructType = {
+if (columnPruning) {
+  val refs = filters.flatMap(_.references).toSet
+  val readFields = dataSchema.filter { field =>
+requiredSchema.contains(field) || refs.contains(field.name)
+  }
+  StructType(readFields)
+} else {
+  dataSchema
+}
+  }
+
+  /**
+   * Converted filters to predicates and grouped by maximum field index
+   * in the read schema. For example, if a filter refers to 2 attributes
+   * attrA with field index 5 and attrB with field index 10 in the read schema:
+   *   0 === $"attrA" or $"attrB" < 100
+   * the filter is compiled to a predicate, and placed to the `predicates`
+   * array at the position 10. In this way, if there is a row with initialized
+   * fields from the 0 to 10 index, the predicate can be applied to the row
+   * to check that the row should be skipped or not.
+   * Multiple predicates with the same maximum reference index are combined
+   * by the `And` expression.
+   */
+  private val predicates: Array[BasePredicate] = {
+val len = readSchema.fields.length
+val groupedPredicates = Array.fill[BasePredicate](len)(null)
+if (SQLConf.get.csvFilterPushDown) {
+  val groupedExprs = Array.fill(len)(Seq.empty[Expression])
+  for (filter <- filters) {
+val expr = CSVFilters.filterToExpression(filter, toRef)
+val refs = filter.references
+if (refs.isEmpty) {
+  // For example, AlwaysTrue and AlwaysFalse don't have any references
+  for (i <- 0 until len) {
+groupedExprs(i) ++= expr
 
 Review comment:
   Moreover, `AlwaysTrue` could be removed because it has no impact on the 
result. `AlwaysFalse` could be placed at index 0, and the other filters could 
then be ignored.
   
   But this is a kind of ad-hoc optimization. The optimization above can also 
work for other literal filters.
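
   The ad-hoc variant could be sketched roughly as follows. The types here are small stand-ins defined only for this illustration, not Spark's `sources.Filter` classes, and `simplify` is a hypothetical name.

```scala
object LiteralFilterSketch {
  // Stand-ins for the pushed-down filter types; not Spark's own classes.
  sealed trait Filter
  case object AlwaysTrue extends Filter
  case object AlwaysFalse extends Filter
  final case class StringContains(attribute: String, value: String) extends Filter

  // All filters are combined via And, so AlwaysTrue is neutral and
  // AlwaysFalse makes every other filter irrelevant.
  def simplify(filters: Seq[Filter]): Seq[Filter] =
    if (filters.contains(AlwaysFalse)) Seq(AlwaysFalse)
    else filters.filterNot(_ == AlwaysTrue)
}
```

   This only recognizes the two named literals, which is why it counts as ad hoc; a filter that is constant for some other reason would pass through unchanged.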


[GitHub] [spark] AmplabJenkins removed a comment on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #27183: [DO-NOT-MERGE] Investigate 
'WorkerMemoryTest.test_memory_limit' failure
URL: https://github.com/apache/spark/pull/27183#issuecomment-573536618
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116606/
   Test PASSed.


[GitHub] [spark] dongjoon-hyun closed pull request #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration

2020-01-12 Thread GitBox

dongjoon-hyun closed pull request #27146: [SPARK-21869][SS][DOCS][FOLLOWUP] 
Document Kafka producer pool configuration
URL: https://github.com/apache/spark/pull/27146
 
 
   


[GitHub] [spark] AmplabJenkins removed a comment on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure

2020-01-12 Thread GitBox

AmplabJenkins removed a comment on issue #27183: [DO-NOT-MERGE] Investigate 
'WorkerMemoryTest.test_memory_limit' failure
URL: https://github.com/apache/spark/pull/27183#issuecomment-573536614
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #27183: [DO-NOT-MERGE] Investigate 
'WorkerMemoryTest.test_memory_limit' failure
URL: https://github.com/apache/spark/pull/27183#issuecomment-573536614
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure

2020-01-12 Thread GitBox

AmplabJenkins commented on issue #27183: [DO-NOT-MERGE] Investigate 
'WorkerMemoryTest.test_memory_limit' failure
URL: https://github.com/apache/spark/pull/27183#issuecomment-573536618
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116606/
   Test PASSed.


[GitHub] [spark] SparkQA removed a comment on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure

2020-01-12 Thread GitBox

SparkQA removed a comment on issue #27183: [DO-NOT-MERGE] Investigate 
'WorkerMemoryTest.test_memory_limit' failure
URL: https://github.com/apache/spark/pull/27183#issuecomment-573528226
 
 
   **[Test build #116606 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116606/testReport)**
 for PR 27183 at commit 
[`f46e394`](https://github.com/apache/spark/commit/f46e3948ce3b88a255a3b0e5560eb0550c84b37c).


[GitHub] [spark] SparkQA commented on issue #27183: [DO-NOT-MERGE] Investigate 'WorkerMemoryTest.test_memory_limit' failure

2020-01-12 Thread GitBox

SparkQA commented on issue #27183: [DO-NOT-MERGE] Investigate 
'WorkerMemoryTest.test_memory_limit' failure
URL: https://github.com/apache/spark/pull/27183#issuecomment-573536326
 
 
   **[Test build #116606 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116606/testReport)**
 for PR 27183 at commit 
[`f46e394`](https://github.com/apache/spark/commit/f46e3948ce3b88a255a3b0e5560eb0550c84b37c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


[GitHub] [spark] MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-12 Thread GitBox

MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] 
Support filters pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#discussion_r365666768
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVFilters.scala
 ##
 @@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.csv
+
+import scala.util.Try
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources
+import org.apache.spark.sql.types.{BooleanType, StructType}
+
+/**
+ * An instance of the class compiles filters to predicates and allows to
+ * apply the predicates to an internal row with partially initialized values
+ * converted from parsed CSV fields.
+ *
+ * @param filters The filters pushed down to CSV datasource.
+ * @param dataSchema The full schema with all fields in CSV files.
+ * @param requiredSchema The schema with only fields requested by the upper layer.
+ * @param columnPruning true if CSV parser can read sub-set of columns otherwise false.
+ */
+class CSVFilters(
+filters: Seq[sources.Filter],
+dataSchema: StructType,
+requiredSchema: StructType,
+columnPruning: Boolean) {
+  require(checkFilters(), "All filters must be applicable to the data schema.")
+
+  /**
+   * The schema to read from the underlying CSV parser.
+   * It combines the required schema and the fields referenced by filters.
+   */
+  val readSchema: StructType = {
+if (columnPruning) {
+  val refs = filters.flatMap(_.references).toSet
+  val readFields = dataSchema.filter { field =>
+requiredSchema.contains(field) || refs.contains(field.name)
+  }
+  StructType(readFields)
+} else {
+  dataSchema
+}
+  }
+
+  /**
+   * Converted filters to predicates and grouped by maximum field index
+   * in the read schema. For example, if a filter refers to 2 attributes
+   * attrA with field index 5 and attrB with field index 10 in the read schema:
+   *   0 === $"attrA" or $"attrB" < 100
+   * the filter is compiled to a predicate, and placed to the `predicates`
+   * array at the position 10. In this way, if there is a row with initialized
+   * fields from the 0 to 10 index, the predicate can be applied to the row
+   * to check that the row should be skipped or not.
+   * Multiple predicates with the same maximum reference index are combined
+   * by the `And` expression.
+   */
+  private val predicates: Array[BasePredicate] = {
+val len = readSchema.fields.length
+val groupedPredicates = Array.fill[BasePredicate](len)(null)
+if (SQLConf.get.csvFilterPushDown) {
+  val groupedExprs = Array.fill(len)(Seq.empty[Expression])
+  for (filter <- filters) {
+val expr = CSVFilters.filterToExpression(filter, toRef)
+val refs = filter.references
+if (refs.isEmpty) {
+  // For example, AlwaysTrue and AlwaysFalse don't have any references
+  for (i <- 0 until len) {
+groupedExprs(i) ++= expr
 
 Review comment:
   You are right, since we combine all pushed filters via `And`. Also, I think 
all filters without references (literals) could be put at the beginning of the 
group before reducing here: 
https://github.com/apache/spark/pull/26973/files#diff-44a98c4a53980cb04e57f0489b257a37R95
   So, if we have the pushed filters 
   `Seq(AlwaysFalse, StringContains(ref0, "abc"))` and they are reduced to
   `And(AlwaysFalse, StringContains(ref0, "abc"))`, the second filter 
`StringContains(ref0, "abc")` will not be evaluated at all.
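
   A self-contained sketch of that ordering idea, using toy predicate types rather than Catalyst expressions. Everything named here (`Pred`, `combine`, the `evaluations` counter) is hypothetical, and the short circuit is assumed to behave like Scala's `&&`.

```scala
object AndShortCircuitSketch {
  type Row = Map[String, String]

  sealed trait Pred {
    def references: Seq[String]
    def eval(row: Row): Boolean
  }

  case object AlwaysFalse extends Pred {
    val references: Seq[String] = Seq.empty
    def eval(row: Row): Boolean = false
  }

  // Counts its own evaluations so the short circuit is observable.
  final class StringContains(attribute: String, value: String) extends Pred {
    var evaluations: Int = 0
    val references: Seq[String] = Seq(attribute)
    def eval(row: Row): Boolean = {
      evaluations += 1
      row.get(attribute).exists(_.contains(value))
    }
  }

  final case class And(left: Pred, right: Pred) extends Pred {
    val references: Seq[String] = left.references ++ right.references
    // && skips the right side whenever the left side is false.
    def eval(row: Row): Boolean = left.eval(row) && right.eval(row)
  }

  // Literal predicates (no references) go first, then everything is folded into And.
  def combine(preds: Seq[Pred]): Option[Pred] = {
    val (literals, withRefs) = preds.partition(_.references.isEmpty)
    (literals ++ withRefs).reduceOption[Pred]((l, r) => And(l, r))
  }
}
```

   Because `AlwaysFalse` ends up on the left of the `And`, the string comparison on the right is never run, whatever order the filters were pushed in.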

