[GitHub] [spark] HyukjinKwon closed pull request #34687: [SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series

2021-11-23 Thread GitBox
HyukjinKwon closed pull request #34687: URL: https://github.com/apache/spark/pull/34687 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [spark] HyukjinKwon commented on pull request #34687: [SPARK-36231][PYTHON] Support arithmetic operations of decimal(nan) series

2021-11-23 Thread GitBox
HyukjinKwon commented on pull request #34687: URL: https://github.com/apache/spark/pull/34687#issuecomment-976348748 Merged to master.

[GitHub] [spark] SparkQA commented on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-23 Thread GitBox
SparkQA commented on pull request #34677: URL: https://github.com/apache/spark/pull/34677#issuecomment-976432192 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50019/

[GitHub] [spark] peter-toth opened a new pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

2021-11-23 Thread GitBox
peter-toth opened a new pull request #34693: URL: https://github.com/apache/spark/pull/34693 ### What changes were proposed in this pull request? CTE queries are not supported with MSSQL server via JDBC as MSSQL server doesn't support statements with nested CTEs. When Spark builds the

[GitHub] [spark] SparkQA commented on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

2021-11-23 Thread GitBox
SparkQA commented on pull request #34693: URL: https://github.com/apache/spark/pull/34693#issuecomment-976489807 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50020/

[GitHub] [spark] AmplabJenkins commented on pull request #34611: [SPARK-35867][SQL] Enable vectorized read for VectorizedPlainValuesReader.readBooleans

2021-11-23 Thread GitBox
AmplabJenkins commented on pull request #34611: URL: https://github.com/apache/spark/pull/34611#issuecomment-976490264 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145540/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-23 Thread GitBox
AmplabJenkins removed a comment on pull request #34677: URL: https://github.com/apache/spark/pull/34677#issuecomment-976494224 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145547/

[GitHub] [spark] SparkQA removed a comment on pull request #34691: [SPARK-37447][SQL] Cache LogicalPlan.isStreaming() result in a lazy val

2021-11-23 Thread GitBox
SparkQA removed a comment on pull request #34691: URL: https://github.com/apache/spark/pull/34691#issuecomment-976296694 **[Test build #145544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145544/testReport)** for PR 34691 at commit

[GitHub] [spark] cloud-fan commented on a change in pull request #34668: [SPARK-37389][SQL] Check unclosed bracketed comments

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34668: URL: https://github.com/apache/spark/pull/34668#discussion_r755141437 ## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ## @@ -73,6 +78,19 @@ grammar SqlBase; return false;

[GitHub] [spark] cloud-fan commented on a change in pull request #34575: [SPARK-37273][SQL] Support hidden file metadata columns in Spark SQL

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34575: URL: https://github.com/apache/spark/pull/34575#discussion_r755165986 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ## @@ -355,7 +377,14 @@ case class FileSourceScanExec(

[GitHub] [spark] cloud-fan commented on a change in pull request #34575: [SPARK-37273][SQL] Support hidden file metadata columns in Spark SQL

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34575: URL: https://github.com/apache/spark/pull/34575#discussion_r755170678 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala ## @@ -103,6 +116,135 @@ class FileScanRDD(

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34668: [SPARK-37389][SQL] Check unclosed bracketed comments

2021-11-23 Thread GitBox
AmplabJenkins removed a comment on pull request #34668: URL: https://github.com/apache/spark/pull/34668#issuecomment-97983 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50021/

[GitHub] [spark] cloud-fan commented on a change in pull request #34504: [SPARK-37226][SQL] Filter push down through window if partitionSpec isEmpty

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34504: URL: https://github.com/apache/spark/pull/34504#discussion_r755214307 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -1548,6 +1548,31 @@ object

[GitHub] [spark] cloud-fan closed pull request #34454: [SPARK-37013][CORE][SQL][FOLLOWUP] Use the new error framework to throw error in `FormatString`

2021-11-23 Thread GitBox
cloud-fan closed pull request #34454: URL: https://github.com/apache/spark/pull/34454

[GitHub] [spark] cloud-fan commented on pull request #34454: [SPARK-37013][CORE][SQL][FOLLOWUP] Use the new error framework to throw error in `FormatString`

2021-11-23 Thread GitBox
cloud-fan commented on pull request #34454: URL: https://github.com/apache/spark/pull/34454#issuecomment-976686019 Sorry I missed the ping. Merging to master, thanks!

[GitHub] [spark] srowen commented on a change in pull request #34679: [SPARK-37437][BUILD] Remove unused hive profile

2021-11-23 Thread GitBox
srowen commented on a change in pull request #34679: URL: https://github.com/apache/spark/pull/34679#discussion_r755238638 ## File path: pom.xml ## @@ -3353,11 +3353,6 @@ - Review comment: We could possibly leave the profile in and have it still do

[GitHub] [spark] cloud-fan commented on a change in pull request #34668: [SPARK-37389][SQL] Check unclosed bracketed comments

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34668: URL: https://github.com/apache/spark/pull/34668#discussion_r755142758 ## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ## @@ -73,6 +78,19 @@ grammar SqlBase; return false;

[GitHub] [spark] cloud-fan commented on a change in pull request #34668: [SPARK-37389][SQL] Check unclosed bracketed comments

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34668: URL: https://github.com/apache/spark/pull/34668#discussion_r755141872 ## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ## @@ -73,6 +78,19 @@ grammar SqlBase; return false;

[GitHub] [spark] cloud-fan commented on a change in pull request #34575: [SPARK-37273][SQL] Support hidden file metadata columns in Spark SQL

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34575: URL: https://github.com/apache/spark/pull/34575#discussion_r755178439 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala ## @@ -103,6 +116,135 @@ class FileScanRDD(

[GitHub] [spark] cloud-fan commented on pull request #34642: [SPARK-37369][SQL] Avoid redundant ColumnarToRow transition on InMemoryTableScan

2021-11-23 Thread GitBox
cloud-fan commented on pull request #34642: URL: https://github.com/apache/spark/pull/34642#issuecomment-976690695 I'm trying to understand the motivation. Is it because in-memory table can output rows efficiently? Parquet scan can also output rows but we try our best to output columnar

[GitHub] [spark] srowen commented on pull request #34692: [SPARK-11792][FOLLOWUP] Update scaladoc of KnownSizeEstimation

2021-11-23 Thread GitBox
srowen commented on pull request #34692: URL: https://github.com/apache/spark/pull/34692#issuecomment-976711598 Because that JIRA is so old, we wouldn't really treat this as part of that JIRA. I'll just make it a minor docs PR.

[GitHub] [spark] gengliangwang commented on a change in pull request #34596: [SPARK-37326][SQL] Support TimestampNTZ in CSV data source

2021-11-23 Thread GitBox
gengliangwang commented on a change in pull request #34596: URL: https://github.com/apache/spark/pull/34596#discussion_r755249223 ## File path: sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out ## @@ -373,17 +374,19 @@

[GitHub] [spark] peter-toth commented on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

2021-11-23 Thread GitBox
peter-toth commented on pull request #34693: URL: https://github.com/apache/spark/pull/34693#issuecomment-976828400 Hmm, failures in `ExpressionsSchemaSuite` look unrelated...

[GitHub] [spark] SparkQA commented on pull request #34691: [SPARK-37447][SQL] Cache LogicalPlan.isStreaming() result in a lazy val

2021-11-23 Thread GitBox
SparkQA commented on pull request #34691: URL: https://github.com/apache/spark/pull/34691#issuecomment-976549263 **[Test build #145544 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145544/testReport)** for PR 34691 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #34070: [SPARK-36840][SQL] Support DPP if there is no selective predicate on the filtering side

2021-11-23 Thread GitBox
AmplabJenkins commented on pull request #34070: URL: https://github.com/apache/spark/pull/34070#issuecomment-976570564 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145545/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34070: [SPARK-36840][SQL] Support DPP if there is no selective predicate on the filtering side

2021-11-23 Thread GitBox
AmplabJenkins removed a comment on pull request #34070: URL: https://github.com/apache/spark/pull/34070#issuecomment-976570564 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145545/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

2021-11-23 Thread GitBox
AmplabJenkins removed a comment on pull request #34693: URL: https://github.com/apache/spark/pull/34693#issuecomment-976570563 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50020/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34691: [SPARK-37447][SQL] Cache LogicalPlan.isStreaming() result in a lazy val

2021-11-23 Thread GitBox
AmplabJenkins removed a comment on pull request #34691: URL: https://github.com/apache/spark/pull/34691#issuecomment-976570562 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145544/

[GitHub] [spark] cloud-fan commented on a change in pull request #34575: [SPARK-37273][SQL] Support hidden file metadata columns in Spark SQL

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34575: URL: https://github.com/apache/spark/pull/34575#discussion_r755163320 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ## @@ -194,10 +195,22 @@ case class FileSourceScanExec(

[GitHub] [spark] cloud-fan commented on a change in pull request #34575: [SPARK-37273][SQL] Support hidden file metadata columns in Spark SQL

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34575: URL: https://github.com/apache/spark/pull/34575#discussion_r755167684 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala ## @@ -171,6 +171,28 @@ trait FileFormat { def

[GitHub] [spark] SparkQA removed a comment on pull request #34070: [SPARK-36840][SQL] Support DPP if there is no selective predicate on the filtering side

2021-11-23 Thread GitBox
SparkQA removed a comment on pull request #34070: URL: https://github.com/apache/spark/pull/34070#issuecomment-976297392 **[Test build #145545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145545/testReport)** for PR 34070 at commit

[GitHub] [spark] SparkQA commented on pull request #34070: [SPARK-36840][SQL] Support DPP if there is no selective predicate on the filtering side

2021-11-23 Thread GitBox
SparkQA commented on pull request #34070: URL: https://github.com/apache/spark/pull/34070#issuecomment-976551994 **[Test build #145545 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145545/testReport)** for PR 34070 at commit

[GitHub] [spark] cloud-fan commented on a change in pull request #34668: [SPARK-37389][SQL] Check unclosed bracketed comments

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34668: URL: https://github.com/apache/spark/pull/34668#discussion_r755150850 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala ## @@ -78,9 +79,30 @@ abstract class

[GitHub] [spark] cloud-fan commented on a change in pull request #34575: [SPARK-37273][SQL] Support hidden file metadata columns in Spark SQL

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34575: URL: https://github.com/apache/spark/pull/34575#discussion_r755164613 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ## @@ -212,7 +225,16 @@ case class FileSourceScanExec(

[GitHub] [spark] cloud-fan commented on a change in pull request #34575: [SPARK-37273][SQL] Support hidden file metadata columns in Spark SQL

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34575: URL: https://github.com/apache/spark/pull/34575#discussion_r755183339 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala ## @@ -103,6 +116,135 @@ class FileScanRDD(

[GitHub] [spark] cloud-fan commented on a change in pull request #34575: [SPARK-37273][SQL] Support hidden file metadata columns in Spark SQL

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34575: URL: https://github.com/apache/spark/pull/34575#discussion_r755188315 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataColumnsSuite.scala ## @@ -0,0 +1,481 @@ +/* + * Licensed

[GitHub] [spark] cloud-fan commented on a change in pull request #34504: [SPARK-37226][SQL] Filter push down through window if partitionSpec isEmpty

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34504: URL: https://github.com/apache/spark/pull/34504#discussion_r755213627 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -1548,6 +1548,31 @@ object

[GitHub] [spark] peter-toth commented on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

2021-11-23 Thread GitBox
peter-toth commented on pull request #34693: URL: https://github.com/apache/spark/pull/34693#issuecomment-97672 This change also seems to work with MSSQL's temp table syntax: ``` val withClause = "(SELECT * INTO #TempTable FROM (SELECT * FROM tbl) t)" val query = "SELECT * FROM

[GitHub] [spark] SparkQA commented on pull request #34671: [SPARK-37399][SPARK-37403][PySpark][ML] Merge {ml, mllib}/common.pyi into common.py

2021-11-23 Thread GitBox
SparkQA commented on pull request #34671: URL: https://github.com/apache/spark/pull/34671#issuecomment-976793578 **[Test build #145550 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145550/testReport)** for PR 34671 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-23 Thread GitBox
AmplabJenkins commented on pull request #34677: URL: https://github.com/apache/spark/pull/34677#issuecomment-976494224 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145547/

[GitHub] [spark] thejdeep commented on pull request #34607: [SPARK-36038][CORE] Speculation metrics summary at stage level

2021-11-23 Thread GitBox
thejdeep commented on pull request #34607: URL: https://github.com/apache/spark/pull/34607#issuecomment-976532846 > > I changed `maybeUpdate` to `update` when we encounter a speculative task. The problem with the previous approach was that, if we have a speculative task, then future task end

[GitHub] [spark] cloud-fan commented on a change in pull request #34575: [SPARK-37273][SQL] Support hidden file metadata columns in Spark SQL

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34575: URL: https://github.com/apache/spark/pull/34575#discussion_r755180169 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala ## @@ -103,6 +116,135 @@ class FileScanRDD(

[GitHub] [spark] nchammas commented on pull request #34655: [SPARK-37380][PYTHON] Miscellaneous Python lint infra cleanup

2021-11-23 Thread GitBox
nchammas commented on pull request #34655: URL: https://github.com/apache/spark/pull/34655#issuecomment-976699676 @HyukjinKwon - Is there anyone else you think should review this?

[GitHub] [spark] cloud-fan commented on pull request #34684: [SPARK-37442][SQL] InMemoryRelation statistics bug causing broadcast join failures with AQE enabled

2021-11-23 Thread GitBox
cloud-fan commented on pull request #34684: URL: https://github.com/apache/spark/pull/34684#issuecomment-976714972 Why is there no problem with AQE off?

[GitHub] [spark] srowen commented on a change in pull request #34692: [MINOR][DOCS] Update scaladoc of KnownSizeEstimation

2021-11-23 Thread GitBox
srowen commented on a change in pull request #34692: URL: https://github.com/apache/spark/pull/34692#discussion_r755242633 ## File path: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ## @@ -33,10 +33,9 @@ import org.apache.spark.util.collection.OpenHashSet

[GitHub] [spark] cloud-fan commented on a change in pull request #34596: [SPARK-37326][SQL] Support TimestampNTZ in CSV data source

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34596: URL: https://github.com/apache/spark/pull/34596#discussion_r755252311 ## File path: sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out ## @@ -373,17 +374,19 @@ struct

[GitHub] [spark] SparkQA removed a comment on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

2021-11-23 Thread GitBox
SparkQA removed a comment on pull request #34693: URL: https://github.com/apache/spark/pull/34693#issuecomment-976453455 **[Test build #145548 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145548/testReport)** for PR 34693 at commit

[GitHub] [spark] SparkQA commented on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

2021-11-23 Thread GitBox
SparkQA commented on pull request #34693: URL: https://github.com/apache/spark/pull/34693#issuecomment-976827223 **[Test build #145548 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145548/testReport)** for PR 34693 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34611: [SPARK-35867][SQL] Enable vectorized read for VectorizedPlainValuesReader.readBooleans

2021-11-23 Thread GitBox
AmplabJenkins removed a comment on pull request #34611: URL: https://github.com/apache/spark/pull/34611#issuecomment-976490264 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145540/

[GitHub] [spark] SparkQA commented on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-23 Thread GitBox
SparkQA commented on pull request #34677: URL: https://github.com/apache/spark/pull/34677#issuecomment-976491507 **[Test build #145547 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145547/testReport)** for PR 34677 at commit

[GitHub] [spark] thejdeep commented on a change in pull request #34607: [SPARK-36038][CORE] Speculation metrics summary at stage level

2021-11-23 Thread GitBox
thejdeep commented on a change in pull request #34607: URL: https://github.com/apache/spark/pull/34607#discussion_r755122578 ## File path: core/src/main/scala/org/apache/spark/status/storeTypes.scala ## @@ -399,6 +399,20 @@ private[spark] class ExecutorStageSummaryWrapper(

[GitHub] [spark] SparkQA commented on pull request #34668: [SPARK-37389][SQL] Check unclosed bracketed comments

2021-11-23 Thread GitBox
SparkQA commented on pull request #34668: URL: https://github.com/apache/spark/pull/34668#issuecomment-976553555 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50021/

[GitHub] [spark] cloud-fan commented on a change in pull request #34575: [SPARK-37273][SQL] Support hidden file metadata columns in Spark SQL

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34575: URL: https://github.com/apache/spark/pull/34575#discussion_r755161598 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ## @@ -194,10 +195,22 @@ case class FileSourceScanExec(

[GitHub] [spark] sarutak commented on pull request #34607: [SPARK-36038][CORE] Speculation metrics summary at stage level

2021-11-23 Thread GitBox
sarutak commented on pull request #34607: URL: https://github.com/apache/spark/pull/34607#issuecomment-976610243 > since it would require changing a class val to a var. What's the problem?

[GitHub] [spark] cloud-fan commented on a change in pull request #34575: [SPARK-37273][SQL] Support hidden file metadata columns in Spark SQL

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34575: URL: https://github.com/apache/spark/pull/34575#discussion_r755175121 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala ## @@ -103,6 +116,135 @@ class FileScanRDD(

[GitHub] [spark] cloud-fan commented on a change in pull request #34504: [SPARK-37226][SQL] Filter push down through window if partitionSpec isEmpty

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34504: URL: https://github.com/apache/spark/pull/34504#discussion_r755210674 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -1548,6 +1548,31 @@ object

[GitHub] [spark] SparkQA commented on pull request #34671: [SPARK-37399][SPARK-37403][PySpark][ML] Merge {ml, mllib}/common.pyi into common.py

2021-11-23 Thread GitBox
SparkQA commented on pull request #34671: URL: https://github.com/apache/spark/pull/34671#issuecomment-976763063 **[Test build #145550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145550/testReport)** for PR 34671 at commit

[GitHub] [spark] SparkQA commented on pull request #34671: [SPARK-37399][SPARK-37403][PySpark][ML] Merge {ml, mllib}/common.pyi into common.py

2021-11-23 Thread GitBox
SparkQA commented on pull request #34671: URL: https://github.com/apache/spark/pull/34671#issuecomment-976826549 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50022/

[GitHub] [spark] SparkQA removed a comment on pull request #34677: [SPARK-37436][PYTHON] Uses Python's standard string formatter for SQL API in pandas API on Spark

2021-11-23 Thread GitBox
SparkQA removed a comment on pull request #34677: URL: https://github.com/apache/spark/pull/34677#issuecomment-976356000 **[Test build #145547 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145547/testReport)** for PR 34677 at commit

[GitHub] [spark] SparkQA commented on pull request #34668: [SPARK-37389][SQL] Check unclosed bracketed comments

2021-11-23 Thread GitBox
SparkQA commented on pull request #34668: URL: https://github.com/apache/spark/pull/34668#issuecomment-976492795 **[Test build #145549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145549/testReport)** for PR 34668 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

2021-11-23 Thread GitBox
AmplabJenkins commented on pull request #34693: URL: https://github.com/apache/spark/pull/34693#issuecomment-976570563 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50020/

[GitHub] [spark] AmplabJenkins commented on pull request #34691: [SPARK-37447][SQL] Cache LogicalPlan.isStreaming() result in a lazy val

2021-11-23 Thread GitBox
AmplabJenkins commented on pull request #34691: URL: https://github.com/apache/spark/pull/34691#issuecomment-976570562 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145544/

[GitHub] [spark] SparkQA commented on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

2021-11-23 Thread GitBox
SparkQA commented on pull request #34693: URL: https://github.com/apache/spark/pull/34693#issuecomment-976568407 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50020/

[GitHub] [spark] cloud-fan commented on a change in pull request #34668: [SPARK-37389][SQL] Check unclosed bracketed comments

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34668: URL: https://github.com/apache/spark/pull/34668#discussion_r755144886 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala ## @@ -78,9 +79,30 @@ abstract class

[GitHub] [spark] cloud-fan commented on a change in pull request #34575: [SPARK-37273][SQL] Support hidden file metadata columns in Spark SQL

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34575: URL: https://github.com/apache/spark/pull/34575#discussion_r755169306 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala ## @@ -57,11 +66,15 @@ case class PartitionedFile(

[GitHub] [spark] cloud-fan commented on a change in pull request #34575: [SPARK-37273][SQL] Support hidden file metadata columns in Spark SQL

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34575: URL: https://github.com/apache/spark/pull/34575#discussion_r755169884 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala ## @@ -57,11 +66,15 @@ case class PartitionedFile(

[GitHub] [spark] cloud-fan commented on a change in pull request #34575: [SPARK-37273][SQL] Support hidden file metadata columns in Spark SQL

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34575: URL: https://github.com/apache/spark/pull/34575#discussion_r755176293 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala ## @@ -103,6 +116,135 @@ class FileScanRDD(

[GitHub] [spark] cloud-fan commented on a change in pull request #34575: [SPARK-37273][SQL] Support hidden file metadata columns in Spark SQL

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34575: URL: https://github.com/apache/spark/pull/34575#discussion_r755181619 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala ## @@ -103,6 +116,135 @@ class FileScanRDD(

[GitHub] [spark] cloud-fan commented on a change in pull request #34575: [SPARK-37273][SQL] Support hidden file metadata columns in Spark SQL

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34575: URL: https://github.com/apache/spark/pull/34575#discussion_r755185991 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala ## @@ -212,7 +212,9 @@ object

[GitHub] [spark] SparkQA commented on pull request #34668: [SPARK-37389][SQL] Check unclosed bracketed comments

2021-11-23 Thread GitBox
SparkQA commented on pull request #34668: URL: https://github.com/apache/spark/pull/34668#issuecomment-976633320 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50021/

[GitHub] [spark] AmplabJenkins commented on pull request #34668: [SPARK-37389][SQL] Check unclosed bracketed comments

2021-11-23 Thread GitBox
AmplabJenkins commented on pull request #34668: URL: https://github.com/apache/spark/pull/34668#issuecomment-97983 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50021/

[GitHub] [spark] srowen commented on a change in pull request #34689: [SPARK-37445][BUILD] Upgrade hadoop profile to hadoop-3.3 since we support hadoop-3.3 as default now

2021-11-23 Thread GitBox
srowen commented on a change in pull request #34689: URL: https://github.com/apache/spark/pull/34689#discussion_r755239983 ## File path: hadoop-cloud/pom.xml ## @@ -201,7 +201,7 @@ enables store-specific committers. --> - hadoop-3.2 + hadoop-3.3

[GitHub] [spark] cloud-fan commented on a change in pull request #34634: [SPARK-37357][SQL] Create skew partition specs should respect min partition size

2021-11-23 Thread GitBox
cloud-fan commented on a change in pull request #34634: URL: https://github.com/apache/spark/pull/34634#discussion_r755239272 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ShufflePartitionsUtil.scala ## @@ -316,21 +316,25 @@ object

[GitHub] [spark] peter-toth edited a comment on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

2021-11-23 Thread GitBox
peter-toth edited a comment on pull request #34693: URL: https://github.com/apache/spark/pull/34693#issuecomment-97672 This change also seems to work with MSSQL's temp table syntax: ``` val withClause = "(SELECT * INTO #TempTable FROM (SELECT * FROM tbl WHERE x > 10) t)" val

[GitHub] [spark] tgravescs commented on pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

2021-11-23 Thread GitBox
tgravescs commented on pull request #34622: URL: https://github.com/apache/spark/pull/34622#issuecomment-976749417 Yes, it would be nice to have the actual stage IDs in the UI. I'll need to look closer at the logic though, which likely won't be until next week. -- This is an automated

[GitHub] [spark] SparkQA removed a comment on pull request #34671: [SPARK-37399][SPARK-37403][PySpark][ML] Merge {ml, mllib}/common.pyi into common.py

2021-11-23 Thread GitBox
SparkQA removed a comment on pull request #34671: URL: https://github.com/apache/spark/pull/34671#issuecomment-976763063 **[Test build #145550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145550/testReport)** for PR 34671 at commit

[GitHub] [spark] tdg5 commented on a change in pull request #29024: [SPARK-32001][SQL]Create JDBC authentication provider developer API

2021-11-23 Thread GitBox
tdg5 commented on a change in pull request #29024: URL: https://github.com/apache/spark/pull/29024#discussion_r755337180 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/ConnectionProvider.scala ## @@ -18,60 +18,45 @@ package

[GitHub] [spark] ChenMichael edited a comment on pull request #34684: [SPARK-37442][SQL] InMemoryRelation statistics bug causing broadcast join failures with AQE enabled

2021-11-23 Thread GitBox
ChenMichael edited a comment on pull request #34684: URL: https://github.com/apache/spark/pull/34684#issuecomment-976849576 In order for this problem to manifest, we have to do join planning in between the time an InMemoryRelation is converted to an RDD and the time where the job executing

[GitHub] [spark] SparkQA commented on pull request #34668: [SPARK-37389][SQL] Check unclosed bracketed comments

2021-11-23 Thread GitBox
SparkQA commented on pull request #34668: URL: https://github.com/apache/spark/pull/34668#issuecomment-976912054 **[Test build #145549 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145549/testReport)** for PR 34668 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #34671: [SPARK-37399][SPARK-37403][PySpark][ML] Merge {ml, mllib}/common.pyi into common.py

2021-11-23 Thread GitBox
AmplabJenkins commented on pull request #34671: URL: https://github.com/apache/spark/pull/34671#issuecomment-976940851 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50022/ --

[GitHub] [spark] sadikovi commented on a change in pull request #34596: [SPARK-37326][SQL] Support TimestampNTZ in CSV data source

2021-11-23 Thread GitBox
sadikovi commented on a change in pull request #34596: URL: https://github.com/apache/spark/pull/34596#discussion_r755539240 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ## @@ -442,17 +442,22 @@ object DateTimeUtils {

[GitHub] [spark] SparkQA commented on pull request #34593: [SPARK-37324][SQL] Adds support for decimal rounding mode up, down, half_down

2021-11-23 Thread GitBox
SparkQA commented on pull request #34593: URL: https://github.com/apache/spark/pull/34593#issuecomment-977249138 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50023/ -- This is an automated message from the Apache

[GitHub] [spark] SparkQA commented on pull request #34593: [SPARK-37324][SQL] Adds support for decimal rounding mode up, down, half_down

2021-11-23 Thread GitBox
SparkQA commented on pull request #34593: URL: https://github.com/apache/spark/pull/34593#issuecomment-977252277 **[Test build #145552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145552/testReport)** for PR 34593 at commit

[GitHub] [spark] ChenMichael edited a comment on pull request #34684: [SPARK-37442][SQL] InMemoryRelation statistics bug causing broadcast join failures with AQE enabled

2021-11-23 Thread GitBox
ChenMichael edited a comment on pull request #34684: URL: https://github.com/apache/spark/pull/34684#issuecomment-976849576 In order for this problem to manifest, we have to do join planning between the time an InMemoryRelation is converted to an RDD and the time where the job executing

[GitHub] [spark] ChenMichael edited a comment on pull request #34684: [SPARK-37442][SQL] InMemoryRelation statistics bug causing broadcast join failures with AQE enabled

2021-11-23 Thread GitBox
ChenMichael edited a comment on pull request #34684: URL: https://github.com/apache/spark/pull/34684#issuecomment-976849576 In order for this problem to manifest, we have to do join planning in between the time an InMemoryRelation is converted to an RDD and the time where the job executing

[GitHub] [spark] SparkQA commented on pull request #34671: [SPARK-37399][SPARK-37403][PySpark][ML] Merge {ml, mllib}/common.pyi into common.py

2021-11-23 Thread GitBox
SparkQA commented on pull request #34671: URL: https://github.com/apache/spark/pull/34671#issuecomment-976884853 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50022/ -- This is an automated message from the

[GitHub] [spark] sadikovi commented on a change in pull request #34596: [SPARK-37326][SQL] Support TimestampNTZ in CSV data source

2021-11-23 Thread GitBox
sadikovi commented on a change in pull request #34596: URL: https://github.com/apache/spark/pull/34596#discussion_r755538280 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ## @@ -164,6 +164,10 @@ class CSVOptions(

[GitHub] [spark] sadikovi commented on a change in pull request #34596: [SPARK-37326][SQL] Support TimestampNTZ in CSV data source

2021-11-23 Thread GitBox
sadikovi commented on a change in pull request #34596: URL: https://github.com/apache/spark/pull/34596#discussion_r755538940 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala ## @@ -66,10 +68,23 @@ sealed trait

[GitHub] [spark] SparkQA commented on pull request #34593: [SPARK-37324][SQL] Adds support for decimal rounding mode up, down, half_down

2021-11-23 Thread GitBox
SparkQA commented on pull request #34593: URL: https://github.com/apache/spark/pull/34593#issuecomment-977218053 **[Test build #145551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145551/testReport)** for PR 34593 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34671: [SPARK-37399][SPARK-37403][PySpark][ML] Merge {ml, mllib}/common.pyi into common.py

2021-11-23 Thread GitBox
AmplabJenkins removed a comment on pull request #34671: URL: https://github.com/apache/spark/pull/34671#issuecomment-976848362 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145550/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

2021-11-23 Thread GitBox
AmplabJenkins removed a comment on pull request #34693: URL: https://github.com/apache/spark/pull/34693#issuecomment-976848358 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145548/

[GitHub] [spark] viirya commented on pull request #34642: [SPARK-37369][SQL] Avoid redundant ColumnarToRow transistion on InMemoryTableScan

2021-11-23 Thread GitBox
viirya commented on pull request #34642: URL: https://github.com/apache/spark/pull/34642#issuecomment-977016485 > I'm trying to understand the motivation. Is it because in-memory table can output rows efficiently? Parquet scan can also output rows but we try our best to output columnar

[GitHub] [spark] SparkQA commented on pull request #34386: [WIP] - Changes to PySpark doc homepage and User Guide

2021-11-23 Thread GitBox
SparkQA commented on pull request #34386: URL: https://github.com/apache/spark/pull/34386#issuecomment-977462625 **[Test build #145553 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145553/testReport)** for PR 34386 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #34386: [WIP] - Changes to PySpark doc homepage and User Guide

2021-11-23 Thread GitBox
AmplabJenkins commented on pull request #34386: URL: https://github.com/apache/spark/pull/34386#issuecomment-977463411 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145553/ -- This

[GitHub] [spark] srowen commented on a change in pull request #34689: [SPARK-37445][BUILD] Upgrade hadoop profile to hadoop-3.3 since we support hadoop-3.3 as default now

2021-11-23 Thread GitBox
srowen commented on a change in pull request #34689: URL: https://github.com/apache/spark/pull/34689#discussion_r755671589 ## File path: hadoop-cloud/pom.xml ## @@ -201,7 +201,7 @@ enables store-specific committers. --> - hadoop-3.2 + hadoop-3.3

[GitHub] [spark] AmplabJenkins commented on pull request #34593: [SPARK-37324][SQL] Adds support for decimal rounding mode up, down, half_down

2021-11-23 Thread GitBox
AmplabJenkins commented on pull request #34593: URL: https://github.com/apache/spark/pull/34593#issuecomment-977464156 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145551/ -- This

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #34679: [SPARK-37437][BUILD] Remove unused hive profile

2021-11-23 Thread GitBox
AngersZhuuuu commented on a change in pull request #34679: URL: https://github.com/apache/spark/pull/34679#discussion_r755672634 ## File path: pom.xml ## @@ -3353,11 +3353,6 @@ - Review comment: > We could possibly leave the profile in and have it

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #34679: [SPARK-37437][BUILD] Remove unused hive profile

2021-11-23 Thread GitBox
AngersZhuuuu commented on a change in pull request #34679: URL: https://github.com/apache/spark/pull/34679#discussion_r755674382 ## File path: pom.xml ## @@ -3353,11 +3353,6 @@ - Review comment: > No, I just mean do not remove the profile, so that
