[GitHub] [spark] AmplabJenkins commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-20 Thread GitBox


AmplabJenkins commented on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-845675575


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43310/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-20 Thread GitBox


SparkQA commented on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-845672347


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43310/
   





[GitHub] [spark] SparkQA commented on pull request #32609: [SPARK-29223][SQL][SS] New option to specify timestamp on all subscribing topic-partitions in Kafka source

2021-05-20 Thread GitBox


SparkQA commented on pull request #32609:
URL: https://github.com/apache/spark/pull/32609#issuecomment-845671780


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43309/
   





[GitHub] [spark] cloud-fan closed pull request #32615: [SPARK-35479][SQL] Format PartitionFilters IN strings in scan nodes

2021-05-20 Thread GitBox


cloud-fan closed pull request #32615:
URL: https://github.com/apache/spark/pull/32615


   





[GitHub] [spark] cloud-fan commented on pull request #32615: [SPARK-35479][SQL] Format PartitionFilters IN strings in scan nodes

2021-05-20 Thread GitBox


cloud-fan commented on pull request #32615:
URL: https://github.com/apache/spark/pull/32615#issuecomment-845671031


   thanks, merging to master!





[GitHub] [spark] Ngone51 commented on pull request #32590: [SPARK-35445][SQL] Reduce the execution time of DeduplicateRelations

2021-05-20 Thread GitBox


Ngone51 commented on pull request #32590:
URL: https://github.com/apache/spark/pull/32590#issuecomment-845665122


   thanks all!





[GitHub] [spark] gengliangwang closed pull request #32590: [SPARK-35445][SQL] Reduce the execution time of DeduplicateRelations

2021-05-20 Thread GitBox


gengliangwang closed pull request #32590:
URL: https://github.com/apache/spark/pull/32590


   





[GitHub] [spark] gengliangwang commented on pull request #32590: [SPARK-35445][SQL] Reduce the execution time of DeduplicateRelations

2021-05-20 Thread GitBox


gengliangwang commented on pull request #32590:
URL: https://github.com/apache/spark/pull/32590#issuecomment-845662291


   Thanks, merging to master





[GitHub] [spark] HyukjinKwon closed pull request #32600: [SPARK-35456][CORE] Print the invalid value in config validation error message

2021-05-20 Thread GitBox


HyukjinKwon closed pull request #32600:
URL: https://github.com/apache/spark/pull/32600


   





[GitHub] [spark] HyukjinKwon commented on pull request #32600: [SPARK-35456][CORE] Print the invalid value in config validation error message

2021-05-20 Thread GitBox


HyukjinKwon commented on pull request #32600:
URL: https://github.com/apache/spark/pull/32600#issuecomment-845661750


   Merged to master.





[GitHub] [spark] AmplabJenkins commented on pull request #32600: [SPARK-35456][CORE] Print the invalid value in config validation error message

2021-05-20 Thread GitBox


AmplabJenkins commented on pull request #32600:
URL: https://github.com/apache/spark/pull/32600#issuecomment-845658495


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138781/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32600: [SPARK-35456][CORE] Print the invalid value in config validation error message

2021-05-20 Thread GitBox


AmplabJenkins removed a comment on pull request #32600:
URL: https://github.com/apache/spark/pull/32600#issuecomment-845658495


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138781/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32600: [SPARK-35456][CORE] Print the invalid value in config validation error message

2021-05-20 Thread GitBox


SparkQA removed a comment on pull request #32600:
URL: https://github.com/apache/spark/pull/32600#issuecomment-845597607


   **[Test build #138781 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138781/testReport)** for PR 32600 at commit [`d41a1be`](https://github.com/apache/spark/commit/d41a1be2a50bd9b7c7afb49f3e6e7adc12860d78).





[GitHub] [spark] SparkQA commented on pull request #32600: [SPARK-35456][CORE] Print the invalid value in config validation error message

2021-05-20 Thread GitBox


SparkQA commented on pull request #32600:
URL: https://github.com/apache/spark/pull/32600#issuecomment-845657587


   **[Test build #138781 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138781/testReport)** for PR 32600 at commit [`d41a1be`](https://github.com/apache/spark/commit/d41a1be2a50bd9b7c7afb49f3e6e7adc12860d78).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] MaxGekk commented on pull request #32574: [SPARK-35427][SQL][TESTS] Check the `EXCEPTION` rebase mode for Avro/Parquet

2021-05-20 Thread GitBox


MaxGekk commented on pull request #32574:
URL: https://github.com/apache/spark/pull/32574#issuecomment-845655136


   @gengliangwang @HyukjinKwon Could you take a look at the PR, please?





[GitHub] [spark] allisonwang-db commented on a change in pull request #32606: [SPARK-35287][SQL] Allow RemoveRedundantProjects to preserve ProjectExec which generates UnsafeRow for DataSourceV2ScanRel

2021-05-20 Thread GitBox


allisonwang-db commented on a change in pull request #32606:
URL: https://github.com/apache/spark/pull/32606#discussion_r636635430



##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/RemoveRedundantProjectsSuite.scala
##
@@ -215,6 +217,27 @@ abstract class RemoveRedundantProjectsSuiteBase
         |LIMIT 10
         |""".stripMargin
     assertProjectExec(query, 0, 3)
+
+  }
+  Seq("true", "false").foreach { codegenEnabled =>
+    test("SPARK-35287: project generating unsafe row " +

Review comment:
   "project generating unsafe row for DataSourceV2ScanRelation should not be removed"

##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/RemoveRedundantProjectsSuite.scala
##
@@ -215,6 +217,27 @@ abstract class RemoveRedundantProjectsSuiteBase
         |LIMIT 10
         |""".stripMargin
     assertProjectExec(query, 0, 3)
+
+  }
+  Seq("true", "false").foreach { codegenEnabled =>
+    test("SPARK-35287: project generating unsafe row " +
+      s"should not be removed (codegen=$codegenEnabled)") {
+      withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1",

Review comment:
   Why do we need to set the broadcast hash join threshold and the leaf node default parallelism?

##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/RemoveRedundantProjectsSuite.scala
##
@@ -215,6 +217,27 @@ abstract class RemoveRedundantProjectsSuiteBase
         |LIMIT 10
         |""".stripMargin
     assertProjectExec(query, 0, 3)
+
+  }
+  Seq("true", "false").foreach { codegenEnabled =>
+    test("SPARK-35287: project generating unsafe row " +
+      s"should not be removed (codegen=$codegenEnabled)") {
+      withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1",
+        SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> codegenEnabled,
+        SQLConf.LEAF_NODE_DEFAULT_PARALLELISM.key -> "1") {
+        withTempPath { path =>
+          val format = classOf[SimpleWritableDataSource].getName
+          spark.range(3).select($"id" as "i", $"id" as "j")
+            .write.format(format).mode("overwrite").save(path.getCanonicalPath)
+
+          val df = spark.read.format(format).load(path.getCanonicalPath)
+          val dfLeft = df.as("x")
+          val dfRight = df.as("y")
+          val join = dfLeft.filter(dfLeft("i") > 0).join(dfRight, "i")
+          assert(join.collect === Array(Row(1, 1, 1), Row(2, 2, 2)))

Review comment:
   Instead of having two tests, how about providing a more meaningful error message here that includes the codegenEnabled value? Then it's easy to tell which case fails from the failure message. Also, let's assert the number of project nodes in the plan using `assertProjectExecCount`.
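
   A rough sketch of the suggestion above, assuming a local `SparkSession` and a plain `spark.range` input in place of the test-only `SimpleWritableDataSource`. One loop covers both codegen settings, and the assertion message carries `codegenEnabled` so a failure names the case. The object and method names are hypothetical, and the project-node count check is left out because `assertProjectExecCount` is a helper of the suite itself.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone sketch; this is not the code from the PR.
object CodegenLabelledAssertSketch {
  /** Runs the filtered self-join under the given whole-stage codegen setting. */
  def joinRows(spark: SparkSession, codegenEnabled: String): Seq[(Long, Long, Long)] = {
    import spark.implicits._
    // SQL conf key behind SQLConf.WHOLESTAGE_CODEGEN_ENABLED.
    spark.conf.set("spark.sql.codegen.wholeStage", codegenEnabled)
    val df = spark.range(3).select($"id" as "i", $"id" as "j")
    val dfLeft = df.as("x")
    val dfRight = df.as("y")
    dfLeft.filter($"x.i" > 0).join(dfRight, "i")
      .collect()
      .map(r => (r.getLong(0), r.getLong(1), r.getLong(2)))
      .toSeq
      .sorted // collect() order is not guaranteed, so sort before comparing
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
    try {
      Seq("true", "false").foreach { codegenEnabled =>
        val rows = joinRows(spark, codegenEnabled)
        // The message names the failing configuration.
        assert(rows == Seq((1L, 1L, 1L), (2L, 2L, 2L)),
          s"unexpected join result with codegen=$codegenEnabled: $rows")
      }
    } finally spark.stop()
  }
}
```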







[GitHub] [spark] AmplabJenkins removed a comment on pull request #31830: [SPARK-34735][SQL][UI] Add modified configs for SQL execution in UI

2021-05-20 Thread GitBox


AmplabJenkins removed a comment on pull request #31830:
URL: https://github.com/apache/spark/pull/31830#issuecomment-845652568


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43304/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32611: [SPARK-35314][PYTHON] Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread GitBox


AmplabJenkins removed a comment on pull request #32611:
URL: https://github.com/apache/spark/pull/32611#issuecomment-845652569


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43308/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32611: [SPARK-35314][PYTHON] Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread GitBox


AmplabJenkins commented on pull request #32611:
URL: https://github.com/apache/spark/pull/32611#issuecomment-845652569


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43308/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31830: [SPARK-34735][SQL][UI] Add modified configs for SQL execution in UI

2021-05-20 Thread GitBox


AmplabJenkins commented on pull request #31830:
URL: https://github.com/apache/spark/pull/31830#issuecomment-845652568


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43304/
   





[GitHub] [spark] SparkQA commented on pull request #32611: [SPARK-35314][PYTHON] Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread GitBox


SparkQA commented on pull request #32611:
URL: https://github.com/apache/spark/pull/32611#issuecomment-845652182


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43308/
   





[GitHub] [spark] MaxGekk commented on pull request #32574: [SPARK-35427][SQL][TESTS] Check the `EXCEPTION` rebase mode for Avro/Parquet

2021-05-20 Thread GitBox


MaxGekk commented on pull request #32574:
URL: https://github.com/apache/spark/pull/32574#issuecomment-845649011


   @cloud-fan Any objections to the changes?





[GitHub] [spark] SparkQA commented on pull request #31830: [SPARK-34735][SQL][UI] Add modified configs for SQL execution in UI

2021-05-20 Thread GitBox


SparkQA commented on pull request #31830:
URL: https://github.com/apache/spark/pull/31830#issuecomment-845645932


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43304/
   





[GitHub] [spark] itholic commented on a change in pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


itholic commented on a change in pull request #32546:
URL: https://github.com/apache/spark/pull/32546#discussion_r636625094



##
File path: docs/sql-data-sources-orc.md
##
@@ -172,3 +172,29 @@ When reading from Hive metastore ORC tables and inserting to Hive metastore ORC
   2.0.0
   
 
+
+## Data Source Option
+
+Data source options of ORC can be set via:
+* the `.option`/`.options` methods of
+  *  `DataFrameReader`
+  *  `DataFrameWriter`
+  *  `DataStreamReader`
+  *  `DataStreamWriter`
+
+<table>
+  <tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Scope</th></tr>
+  <tr>
+    <td>mergeSchema</td>
+    <td>None</td>
+    <td>sets whether we should merge schemas collected from all ORC part-files. This will override spark.sql.orc.mergeSchema. The default value is specified in spark.sql.orc.mergeSchema.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td>compression</td>
+    <td>None</td>
+    <td>compression codec to use when saving to file. This can be one of the known case-insensitive shorten names (none, snappy, zlib, lzo, and zstd). This will override orc.compress and spark.sql.orc.compression.codec. If None is set, it uses the value specified in spark.sql.orc.compression.codec.</td>
+    <td>write</td>
+  </tr>
+</table>
+Other generic options can be found in <a href="https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html"> Generic File Source Options</a>.

Review comment:
   Thanks, @dongjoon-hyun.
   I looked into that, but it seems tricky to create a link for each release in Scaladoc.
   I created a JIRA to track it separately: SPARK-35481.
   I will look into it separately, if that's fine with you too!
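
   For reference, the two options documented in the diff above might be used roughly as follows. This is a minimal sketch assuming Spark 3.0 or later on the classpath and a local session; the object name, column names, and temporary paths are made up for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical sketch; names and paths are illustrative only.
object OrcOptionsSketch {
  /** Writes two ORC directories with different schemas, then reads them back merged. */
  def mergedColumns(spark: SparkSession): Set[String] = {
    import spark.implicits._
    val base = Files.createTempDirectory("orc-options-sketch").toString

    // `compression` has write scope: set it through DataFrameWriter.option.
    Seq((1, "x")).toDF("id", "a").write.option("compression", "zlib").orc(s"$base/part=1")
    Seq((2, "y")).toDF("id", "b").write.option("compression", "snappy").orc(s"$base/part=2")

    // `mergeSchema` has read scope: set it through DataFrameReader.option.
    spark.read.option("mergeSchema", "true").orc(base).columns.toSet
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("orc-options-sketch").getOrCreate()
    try println(mergedColumns(spark).toSeq.sorted.mkString(", "))
    finally spark.stop()
  }
}
```

   With `mergeSchema` enabled, the result should contain the union of the columns from both directories plus the discovered `part` partition column.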







[GitHub] [spark] chrismbryant commented on pull request #27278: [SPARK-30569][SQL][PYSPARK][SPARKR] Add percentile_approx DSL functions.

2021-05-20 Thread GitBox


chrismbryant commented on pull request #27278:
URL: https://github.com/apache/spark/pull/27278#issuecomment-845642088


   @HyukjinKwon Thanks, here's that ticket: 
https://issues.apache.org/jira/browse/SPARK-35480





[GitHub] [spark] HyukjinKwon commented on pull request #32577: [SPARK-35422][SQL] Fix plan-printing issues to pass the TPCDS plan stability tests in Scala v2.13

2021-05-20 Thread GitBox


HyukjinKwon commented on pull request #32577:
URL: https://github.com/apache/spark/pull/32577#issuecomment-845639671


   Nice, LGTM2





[GitHub] [spark] yaooqinn commented on a change in pull request #32600: [SPARK-35456][CORE] Print the invalid value in config validation error message

2021-05-20 Thread GitBox


yaooqinn commented on a change in pull request #32600:
URL: https://github.com/apache/spark/pull/32600#discussion_r636621258



##
File path: core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
##
@@ -104,7 +104,7 @@ private[spark] class TypedConfigBuilder[T](
   /** Checks if the user-provided value for the config matches the validator. */
   def checkValue(validator: T => Boolean, errorMsg: String): TypedConfigBuilder[T] = {
     transform { v =>
-      if (!validator(v)) throw new IllegalArgumentException(errorMsg)
+      if (!validator(v)) throw new IllegalArgumentException(s"'$v' is invalid because: $errorMsg")

Review comment:
   this is better. updated. thanks @HyukjinKwon 
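
   To illustrate the effect of the changed line, here is a small standalone sketch. `TypedConfigSketch` is a hypothetical, simplified stand-in for Spark's private `TypedConfigBuilder`, and the port config is invented for the example; only the error-message format is taken from the diff.

```scala
import scala.util.Try

// Hypothetical stand-in for TypedConfigBuilder; only the validation step is modelled.
final class TypedConfigSketch[T](converter: String => T) {
  def checkValue(validator: T => Boolean, errorMsg: String): TypedConfigSketch[T] =
    new TypedConfigSketch[T]({ raw =>
      val v = converter(raw)
      // Same message shape as the patched line: the offending value comes first.
      if (!validator(v)) throw new IllegalArgumentException(s"'$v' is invalid because: $errorMsg")
      v
    })

  def parse(raw: String): T = converter(raw)
}

object ConfigValidationSketch {
  // Invented config: a port number that must fall in the valid range.
  val port: TypedConfigSketch[Int] =
    new TypedConfigSketch[Int](_.trim.toInt)
      .checkValue(p => p > 0 && p < 65536, "port must be in (0, 65535]")

  def main(args: Array[String]): Unit = {
    println(port.parse("8080"))
    // prints: '70000' is invalid because: port must be in (0, 65535]
    Try(port.parse("70000")).failed.foreach(e => println(e.getMessage))
  }
}
```

   Before the change, the same failure would have reported only the `errorMsg` text, leaving the user to guess which value was rejected.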







[GitHub] [spark] AmplabJenkins removed a comment on pull request #32600: [SPARK-35456][CORE] Print the invalid value in config validation error message

2021-05-20 Thread GitBox


AmplabJenkins removed a comment on pull request #32600:
URL: https://github.com/apache/spark/pull/32600#issuecomment-845638217


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43305/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32600: [SPARK-35456][CORE] Print the invalid value in config validation error message

2021-05-20 Thread GitBox


AmplabJenkins commented on pull request #32600:
URL: https://github.com/apache/spark/pull/32600#issuecomment-845638217


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43305/
   





[GitHub] [spark] SparkQA commented on pull request #32600: [SPARK-35456][CORE] Print the invalid value in config validation error message

2021-05-20 Thread GitBox


SparkQA commented on pull request #32600:
URL: https://github.com/apache/spark/pull/32600#issuecomment-845638203


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43305/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32586: [SPARK-35439][SQL] Children subexpr should come first than parent subexpr

2021-05-20 Thread GitBox


AmplabJenkins removed a comment on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-845634701


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138770/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32586: [SPARK-35439][SQL] Children subexpr should come first than parent subexpr

2021-05-20 Thread GitBox


SparkQA removed a comment on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-845535109


   **[Test build #138770 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138770/testReport)** for PR 32586 at commit [`3819bf3`](https://github.com/apache/spark/commit/3819bf3e544a316234f94292a2acdb8aae1d9ab1).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32611: [SPARK-35314][PYTHON] Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread GitBox


AmplabJenkins removed a comment on pull request #32611:
URL: https://github.com/apache/spark/pull/32611#issuecomment-845635353


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138788/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32609: [SPARK-29223][SQL][SS] New option to specify timestamp on all subscribing topic-partitions in Kafka source

2021-05-20 Thread GitBox


AmplabJenkins removed a comment on pull request #32609:
URL: https://github.com/apache/spark/pull/32609#issuecomment-845634865


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138786/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32611: [SPARK-35314][PYTHON] Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread GitBox


SparkQA removed a comment on pull request #32611:
URL: https://github.com/apache/spark/pull/32611#issuecomment-845635027


   **[Test build #138788 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138788/testReport)** for PR 32611 at commit [`e626c52`](https://github.com/apache/spark/commit/e626c52245521fdeb0ee46c68950605f87a987bd).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


AmplabJenkins removed a comment on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-845634702


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43302/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


AmplabJenkins removed a comment on pull request #32546:
URL: https://github.com/apache/spark/pull/32546#issuecomment-845634703


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43300/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32609: [SPARK-29223][SQL][SS] New option to specify timestamp on all subscribing topic-partitions in Kafka source

2021-05-20 Thread GitBox


SparkQA removed a comment on pull request #32609:
URL: https://github.com/apache/spark/pull/32609#issuecomment-845619416


   **[Test build #138786 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138786/testReport)**
 for PR 32609 at commit 
[`ec1f662`](https://github.com/apache/spark/commit/ec1f662234dc3e986ae460077e16a6c557f03cf7).





[GitHub] [spark] AmplabJenkins commented on pull request #32611: [SPARK-35314][PYTHON] Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread GitBox


AmplabJenkins commented on pull request #32611:
URL: https://github.com/apache/spark/pull/32611#issuecomment-845635353


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138788/
   





[GitHub] [spark] SparkQA commented on pull request #32611: [SPARK-35314][PYTHON] Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread GitBox


SparkQA commented on pull request #32611:
URL: https://github.com/apache/spark/pull/32611#issuecomment-845635337


   **[Test build #138788 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138788/testReport)**
 for PR 32611 at commit 
[`e626c52`](https://github.com/apache/spark/commit/e626c52245521fdeb0ee46c68950605f87a987bd).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #32587: [SPARK-35440][SQL] Add function type to `ExpressionInfo` for UDF

2021-05-20 Thread GitBox


SparkQA commented on pull request #32587:
URL: https://github.com/apache/spark/pull/32587#issuecomment-845635076


   **[Test build #138789 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138789/testReport)**
 for PR 32587 at commit 
[`f7d01f1`](https://github.com/apache/spark/commit/f7d01f113ae6b999bbda39a521b24dfe3747ebf8).





[GitHub] [spark] SparkQA commented on pull request #31830: [SPARK-34735][SQL][UI] Add modified configs for SQL execution in UI

2021-05-20 Thread GitBox


SparkQA commented on pull request #31830:
URL: https://github.com/apache/spark/pull/31830#issuecomment-845635070


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43304/
   





[GitHub] [spark] SparkQA commented on pull request #32611: [SPARK-35314][PYTHON] Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread GitBox


SparkQA commented on pull request #32611:
URL: https://github.com/apache/spark/pull/32611#issuecomment-845635027


   **[Test build #138788 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138788/testReport)**
 for PR 32611 at commit 
[`e626c52`](https://github.com/apache/spark/commit/e626c52245521fdeb0ee46c68950605f87a987bd).





[GitHub] [spark] AmplabJenkins commented on pull request #32609: [SPARK-29223][SQL][SS] New option to specify timestamp on all subscribing topic-partitions in Kafka source

2021-05-20 Thread GitBox


AmplabJenkins commented on pull request #32609:
URL: https://github.com/apache/spark/pull/32609#issuecomment-845634865


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138786/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


AmplabJenkins commented on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-845634702


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43302/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32586: [SPARK-35439][SQL] Children subexpr should come first than parent subexpr

2021-05-20 Thread GitBox


AmplabJenkins commented on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-845634701


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138770/
   





[GitHub] [spark] SparkQA commented on pull request #32609: [SPARK-29223][SQL][SS] New option to specify timestamp on all subscribing topic-partitions in Kafka source

2021-05-20 Thread GitBox


SparkQA commented on pull request #32609:
URL: https://github.com/apache/spark/pull/32609#issuecomment-845634723


   **[Test build #138786 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138786/testReport)**
 for PR 32609 at commit 
[`ec1f662`](https://github.com/apache/spark/commit/ec1f662234dc3e986ae460077e16a6c557f03cf7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins commented on pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


AmplabJenkins commented on pull request #32546:
URL: https://github.com/apache/spark/pull/32546#issuecomment-845634703


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43300/
   





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32610: [SPARK-35460][K8S] invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang

2021-05-20 Thread GitBox


dongjoon-hyun commented on a change in pull request #32610:
URL: https://github.com/apache/spark/pull/32610#discussion_r636616275



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##
@@ -250,11 +250,21 @@ private[spark] object Config extends Logging {
   .stringConf
   .createOptional
 
+  private val podConfValidator =
+"^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$".r.pattern
+
   val KUBERNETES_EXECUTOR_POD_NAME_PREFIX =
 ConfigBuilder("spark.kubernetes.executor.podNamePrefix")
-  .doc("Prefix to use in front of the executor pod names.")
+  .doc("Prefix to use in front of the executor pod names. Note that pod names must consist" +
+" of lower case alphanumeric characters, '-' or '.', and must start and end with an" +
+" alphanumeric character (e.g. 'example.com', regex used for validation is:" +
+s" ${podConfValidator.toString}")

Review comment:
   +1 for following their implementation.







[GitHub] [spark] dongjoon-hyun edited a comment on pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


dongjoon-hyun edited a comment on pull request #32546:
URL: https://github.com/apache/spark/pull/32546#issuecomment-845632925


   Thank you, @itholic and @HyukjinKwon . The refactoring idea looks good to 
me. I only commented on one technical issue about the link usage. I'll leave this to 
@HyukjinKwon .





[GitHub] [spark] dongjoon-hyun commented on pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


dongjoon-hyun commented on pull request #32546:
URL: https://github.com/apache/spark/pull/32546#issuecomment-845632925


   Thank you, @itholic and @HyukjinKwon . The refactoring idea looks good to 
me. I only commented on one technical issue about the link usage.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


dongjoon-hyun commented on a change in pull request #32546:
URL: https://github.com/apache/spark/pull/32546#discussion_r636615565



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##
@@ -874,23 +874,10 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
   /**
* Loads ORC files and returns the result as a `DataFrame`.
*
-   * You can set the following ORC-specific option(s) for reading ORC files:
-   * 
-   * `mergeSchema` (default is the value specified in `spark.sql.orc.mergeSchema`): sets whether
-   * we should merge schemas collected from all ORC part-files. This will override
-   * `spark.sql.orc.mergeSchema`.
-   * `pathGlobFilter`: an optional glob pattern to only include files with paths matching
-   * the pattern. The syntax follows org.apache.hadoop.fs.GlobFilter.
-   * It does not change the behavior of partition discovery.
-   * `modifiedBefore` (batch only): an optional timestamp to only include files with
-   * modification times  occurring before the specified Time. The provided timestamp
-   * must be in the following form: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
-   * `modifiedAfter` (batch only): an optional timestamp to only include files with
-   * modification times occurring after the specified Time. The provided timestamp
-   * must be in the following form: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
-   * `recursiveFileLookup`: recursively scan a directory for files. Using this option
-   * disables partition discovery
-   * 
+   * ORC-specific option(s) for reading ORC files can be found in
+   * https://spark.apache.org/docs/latest/sql-data-sources-orc.html#data-source-option;>

Review comment:
   Ditto.







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


dongjoon-hyun commented on a change in pull request #32546:
URL: https://github.com/apache/spark/pull/32546#discussion_r636614962



##
File path: docs/sql-data-sources-orc.md
##
@@ -172,3 +172,29 @@ When reading from Hive metastore ORC tables and inserting 
to Hive metastore ORC
   2.0.0
   
 
+
+## Data Source Option
+
+Data source options of ORC can be set via:
+* the `.option`/`.options` methods of
+  *  `DataFrameReader`
+  *  `DataFrameWriter`
+  *  `DataStreamReader`
+  *  `DataStreamWriter`
+
+<table>
+  <tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Scope</th></tr>
+  <tr>
+    <td>mergeSchema</td>
+    <td>None</td>
+    <td>sets whether we should merge schemas collected from all ORC part-files. This will override spark.sql.orc.mergeSchema. The default value is specified in spark.sql.orc.mergeSchema.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td>compression</td>
+    <td>None</td>
+    <td>compression codec to use when saving to file. This can be one of the known case-insensitive shorten names (none, snappy, zlib, lzo, and zstd). This will override orc.compress and spark.sql.orc.compression.codec. If None is set, it uses the value specified in spark.sql.orc.compression.codec.</td>
+    <td>write</td>
+  </tr>
+</table>
+Other generic options can be found in <a href="https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html"> Generic File Source Options</a>.

Review comment:
   Although I know that this is inherited, 
`https://spark.apache.org/docs/latest/` looks fragile to me because it is going 
to be a broken link when we cut `branch-3.2` on July 1st. In `branch-3.2`, it 
should point to the `3.2` documentation only. Shall we use a relative link instead of 
`/latest/`?
   
   As this PR shows, we don't know what refactoring will happen in the future.
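
The difference between the two link styles can be illustrated with a small sketch using Python's standard `urllib.parse.urljoin`. The `3.2.0` directory below is only an assumed example of where a `branch-3.2` docs build might be published; it is not stated anywhere in this thread.

```python
from urllib.parse import urljoin

# Hypothetical location of the ORC page inside a versioned docs build.
# The "3.2.0" directory name is an assumption made for illustration only.
VERSIONED_PAGE = "https://spark.apache.org/docs/3.2.0/sql-data-sources-orc.html"

# The absolute link currently used in the PR, and a relative alternative.
ABSOLUTE_LINK = "https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html"
RELATIVE_LINK = "sql-data-sources-generic-options.html"


def resolve(page_url: str, href: str) -> str:
    """Resolve an href the way a browser would when it appears on page_url."""
    return urljoin(page_url, href)


# The absolute link ignores the page it appears on and always targets /latest/.
absolute_target = resolve(VERSIONED_PAGE, ABSOLUTE_LINK)

# The relative link stays inside the docs version that contains the page.
relative_target = resolve(VERSIONED_PAGE, RELATIVE_LINK)

print(absolute_target)
print(relative_target)
```

With a relative link, each published docs version points at its own copy of the generic options page, so a later refactoring of `/latest/` would not affect older versions.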







[GitHub] [spark] SparkQA commented on pull request #32586: [SPARK-35439][SQL] Children subexpr should come first than parent subexpr

2021-05-20 Thread GitBox


SparkQA commented on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-845632152


   **[Test build #138770 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138770/testReport)**
 for PR 32586 at commit 
[`3819bf3`](https://github.com/apache/spark/commit/3819bf3e544a316234f94292a2acdb8aae1d9ab1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


dongjoon-hyun commented on a change in pull request #32546:
URL: https://github.com/apache/spark/pull/32546#discussion_r636615149



##
File path: python/pyspark/sql/readwriter.py
##
@@ -793,28 +793,13 @@ def orc(self, path, mergeSchema=None, pathGlobFilter=None, recursiveFileLookup=N
 Parameters
 ----------
 path : str or list
-mergeSchema : str or bool, optional
-sets whether we should merge schemas collected from all
-ORC part-files. This will override ``spark.sql.orc.mergeSchema``.
-The default value is specified in ``spark.sql.orc.mergeSchema``.
-pathGlobFilter : str or bool
-an optional glob pattern to only include files with paths matching
-the pattern. The syntax follows `org.apache.hadoop.fs.GlobFilter`.
-It does not change the behavior of
-`partition discovery `_.  # noqa
-recursiveFileLookup : str or bool
-recursively scan a directory for files. Using this option
-disables
-`partition discovery `_.  # noqa
 
-modification times occurring before the specified time. The provided timestamp
-must be in the following format: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
-modifiedBefore : an optional timestamp to only include files with
-modification times occurring before the specified time. The provided timestamp
-must be in the following format: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
-modifiedAfter : an optional timestamp to only include files with
-modification times occurring after the specified time. The provided timestamp
-must be in the following format: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
+Other Parameters
+----------------
+Extra options
+For the extra options, refer to
+`Data Source Option `_  # noqa

Review comment:
   Ditto. Can we have a more robust link here?







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


dongjoon-hyun commented on a change in pull request #32546:
URL: https://github.com/apache/spark/pull/32546#discussion_r636614962



##
File path: docs/sql-data-sources-orc.md
##
@@ -172,3 +172,29 @@ When reading from Hive metastore ORC tables and inserting 
to Hive metastore ORC
   2.0.0
   
 
+
+## Data Source Option
+
+Data source options of ORC can be set via:
+* the `.option`/`.options` methods of
+  *  `DataFrameReader`
+  *  `DataFrameWriter`
+  *  `DataStreamReader`
+  *  `DataStreamWriter`
+
+<table>
+  <tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Scope</th></tr>
+  <tr>
+    <td>mergeSchema</td>
+    <td>None</td>
+    <td>sets whether we should merge schemas collected from all ORC part-files. This will override spark.sql.orc.mergeSchema. The default value is specified in spark.sql.orc.mergeSchema.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td>compression</td>
+    <td>None</td>
+    <td>compression codec to use when saving to file. This can be one of the known case-insensitive shorten names (none, snappy, zlib, lzo, and zstd). This will override orc.compress and spark.sql.orc.compression.codec. If None is set, it uses the value specified in spark.sql.orc.compression.codec.</td>
+    <td>write</td>
+  </tr>
+</table>
+Other generic options can be found in <a href="https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html"> Generic File Source Options</a>.

Review comment:
   Although I know that this is inherited, 
`https://spark.apache.org/docs/latest/` looks fragile to me because it is going 
to be a broken link when we cut `branch-3.2` on July 1st. In `branch-3.2`, it 
should point to the `3.2` documentation only. Shall we use a relative link instead of 
`/latest/`?







[GitHub] [spark] yaooqinn commented on a change in pull request #32610: [SPARK-35460][K8S] invalid `spark.kubernetes.executor.podNamePrefix` causes app to hang

2021-05-20 Thread GitBox


yaooqinn commented on a change in pull request #32610:
URL: https://github.com/apache/spark/pull/32610#discussion_r636615055



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##
@@ -250,11 +250,21 @@ private[spark] object Config extends Logging {
   .stringConf
   .createOptional
 
+  private val podConfValidator =
+"^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$".r.pattern
+
   val KUBERNETES_EXECUTOR_POD_NAME_PREFIX =
 ConfigBuilder("spark.kubernetes.executor.podNamePrefix")
-  .doc("Prefix to use in front of the executor pod names.")
+  .doc("Prefix to use in front of the executor pod names. Note that pod names must consist" +
+" of lower case alphanumeric characters, '-' or '.', and must start and end with an" +
+" alphanumeric character (e.g. 'example.com', regex used for validation is:" +
+s" ${podConfValidator.toString}")

Review comment:
   > Hi, @yaooqinn . Thank you, but this is still incomplete because we 
don't check contain at most 63 characters rule.
   
   Yes, we should do the length check too.
   
   > If they change, this will become wrong again. So, we had better give the 
official reference because this is not defined by Apache Spark. We just follow 
K8s's rule. Technically, the official definition is RFC 1123 in K8s document, 
https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-subdomain-names,
 isn't it?
   
   Makes sense. At least they seem to have already changed the error message :) 
https://github.com/kubernetes/kubernetes/pull/94182
   
   FYI, I just found a `go` implementation of the k8s API validation: 
https://github.com/kubernetes/apimachinery/blob/master/pkg/util/validation/validation.go. 
I think we can follow it in this PR and later for other resource checks.
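
A minimal sketch of the combined check discussed above (pattern plus length), written in Python for brevity. The regex is the one quoted in the diff; the function name `is_valid_pod_name_prefix` and the default limit of 63 (the figure mentioned in this thread) are assumptions for illustration, not the implementation in this PR or in Kubernetes.

```python
import re

# Regex quoted in the diff above; Scala's "\\." is written r"\." in a raw string.
POD_NAME_PATTERN = re.compile(
    r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"
)

# Limit mentioned in the review thread; kept configurable because it is an assumption here.
MAX_LENGTH = 63


def is_valid_pod_name_prefix(prefix: str, max_length: int = MAX_LENGTH) -> bool:
    """Return True if prefix passes both the length check and the pattern check."""
    if len(prefix) > max_length:
        return False
    # fullmatch requires the whole string to match, including any trailing newline.
    return POD_NAME_PATTERN.fullmatch(prefix) is not None


# Hypothetical sample values showing which prefixes each rule rejects.
for candidate in ["spark-exec", "example.com", "Spark", "-abc", "abc-", "a" * 64]:
    print(repr(candidate[:20]), is_valid_pod_name_prefix(candidate))
```

Checking the length separately from the pattern keeps the regex identical to the upstream one, so only the limit needs to change if the rule is revised.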







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


dongjoon-hyun commented on a change in pull request #32546:
URL: https://github.com/apache/spark/pull/32546#discussion_r636614962



##
File path: docs/sql-data-sources-orc.md
##
@@ -172,3 +172,29 @@ When reading from Hive metastore ORC tables and inserting 
to Hive metastore ORC
   2.0.0
   
 
+
+## Data Source Option
+
+Data source options of ORC can be set via:
+* the `.option`/`.options` methods of
+  *  `DataFrameReader`
+  *  `DataFrameWriter`
+  *  `DataStreamReader`
+  *  `DataStreamWriter`
+
+<table>
+  <tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Scope</th></tr>
+  <tr>
+    <td>mergeSchema</td>
+    <td>None</td>
+    <td>sets whether we should merge schemas collected from all ORC part-files. This will override spark.sql.orc.mergeSchema. The default value is specified in spark.sql.orc.mergeSchema.</td>
+    <td>read</td>
+  </tr>
+  <tr>
+    <td>compression</td>
+    <td>None</td>
+    <td>compression codec to use when saving to file. This can be one of the known case-insensitive shorten names (none, snappy, zlib, lzo, and zstd). This will override orc.compress and spark.sql.orc.compression.codec. If None is set, it uses the value specified in spark.sql.orc.compression.codec.</td>
+    <td>write</td>
+  </tr>
+</table>
+Other generic options can be found in <a href="https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html"> Generic File Source Options</a>.

Review comment:
   `https://spark.apache.org/docs/latest/` looks fragile to me because it 
is going to be a broken link when we cut `branch-3.2` on July 1st. In 
`branch-3.2`, it should point to the `3.2` documentation only. Shall we use a relative link 
instead of `/latest/`?







[GitHub] [spark] dongjoon-hyun commented on pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


dongjoon-hyun commented on pull request #32546:
URL: https://github.com/apache/spark/pull/32546#issuecomment-845630052


   Thank you for pinging me, @HyukjinKwon .





[GitHub] [spark] SparkQA commented on pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


SparkQA commented on pull request #32546:
URL: https://github.com/apache/spark/pull/32546#issuecomment-845629618


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43300/
   





[GitHub] [spark] HyukjinKwon commented on a change in pull request #32589: [SPARK-35444][SQL] Imporve the logic of createTable if table already exist and ignoreIfExists=true

2021-05-20 Thread GitBox


HyukjinKwon commented on a change in pull request #32589:
URL: https://github.com/apache/spark/pull/32589#discussion_r636613685



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
##
@@ -367,6 +367,7 @@ class SessionCatalog(
   if (!ignoreIfExists) {
 throw new TableAlreadyExistsException(db = db, table = table)
   }
+  return

Review comment:
   Hm, this will prevent external catalogs from handling `ignoreIfExists=true` themselves. Some external catalogs might have their own logic on this call, e.g. sending a create table event.
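
To make the concern concrete, here is a minimal toy model in Python. It is not Spark code: `RecordingExternalCatalog`, `create_table`, and `run` are hypothetical names, and the external catalog is assumed to record an event on every call. It only illustrates that an early `return` at the session level means the second call never reaches the external catalog.

```python
class RecordingExternalCatalog:
    """Hypothetical stand-in for an external catalog that emits events."""

    def __init__(self):
        self.tables = set()
        self.events = []

    def create_table(self, name, ignore_if_exists):
        # This catalog records every call, even for tables that already exist.
        self.events.append(("create_table", name, ignore_if_exists))
        if name in self.tables:
            if not ignore_if_exists:
                raise ValueError(f"table already exists: {name}")
            return
        self.tables.add(name)


def create_table(catalog, name, ignore_if_exists, early_return):
    """Toy session-level createTable; early_return models the added `return`."""
    if name in catalog.tables:
        if not ignore_if_exists:
            raise ValueError(f"table already exists: {name}")
        if early_return:
            return  # the external catalog is never called on this path
    catalog.create_table(name, ignore_if_exists)


def run(early_return):
    """Create the same table twice and count the events the catalog saw."""
    catalog = RecordingExternalCatalog()
    create_table(catalog, "t", True, early_return)
    create_table(catalog, "t", True, early_return)
    return len(catalog.events)
```

In this model `run(False)` lets the catalog observe both calls, while `run(True)` hides the second one from it.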







[GitHub] [spark] SparkQA commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


SparkQA commented on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-845629104


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43302/
   





[GitHub] [spark] xinrong-databricks commented on a change in pull request #32611: [SPARK-35314][PYTHON] Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread GitBox


xinrong-databricks commented on a change in pull request #32611:
URL: https://github.com/apache/spark/pull/32611#discussion_r636607352



##
File path: python/pyspark/pandas/data_type_ops/num_ops.py
##
@@ -16,31 +16,46 @@
 #
 
 import numbers
-from typing import TYPE_CHECKING, Union
+from typing import Any, TYPE_CHECKING, Union
 
 import numpy as np
 from pandas.api.types import CategoricalDtype
 
 from pyspark.sql import Column, functions as F
 from pyspark.sql.types import (
+    BooleanType,
     NumericType,
     StringType,
     TimestampType,
 )
 
 from pyspark.pandas.base import column_op, IndexOpsMixin, numpy_column_op
-from pyspark.pandas.data_type_ops.base import DataTypeOps
+from pyspark.pandas.data_type_ops.base import DataTypeOps, transform_boolean_operand_to_numeric
 from pyspark.pandas.spark import functions as SF
+from pyspark.sql.column import Column
+
 
 if TYPE_CHECKING:
     from pyspark.pandas.indexes import Index  # noqa: F401 (SPARK-34943)
     from pyspark.pandas.series import Series  # noqa: F401 (SPARK-34943)
 
 
+def is_valid_operand_for_numeric_arithmetic(operand: Any) -> bool:
+    """Check whether the operand is valid for arithmetic operations against numerics."""
+    if isinstance(operand, numbers.Number):
+        return True
+    elif isinstance(operand, IndexOpsMixin):
+        if isinstance(operand.dtype, CategoricalDtype):
+            return False
+        else:
+            return isinstance(operand.spark.data_type, NumericType) or isinstance(
+                operand.spark.data_type, BooleanType)
+    else:
+        return isinstance(operand, Column)

Review comment:
   Returning True when the operand is a Column covers cases like the one below:
   
   `left.astype("long") - F.lit(right).cast(as_spark_type("long"))` (a datetime series minus a datetime)
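
A minimal sketch of the check's branches in plain Python. It uses hypothetical stand-ins (`Column`, `FakeSeries`, and string type names) instead of the real pyspark classes, so it only illustrates the control flow and is not the code in the PR.

```python
import numbers


class Column:
    """Hypothetical stand-in for pyspark.sql.Column."""


class FakeSeries:
    """Hypothetical stand-in for an IndexOpsMixin with a Spark data type."""

    def __init__(self, spark_type, categorical=False):
        self.spark_type = spark_type    # e.g. "numeric", "boolean", "string"
        self.categorical = categorical  # models a CategoricalDtype


def is_valid_operand(operand):
    """Mirror the three branches: plain number, series-like, anything else."""
    if isinstance(operand, numbers.Number):
        return True
    if isinstance(operand, FakeSeries):
        if operand.categorical:
            return False
        return operand.spark_type in ("numeric", "boolean")
    # Fallback branch: only a Column is accepted, which covers an operand
    # that has already been turned into a Spark column expression.
    return isinstance(operand, Column)
```

In this model a bare `Column()` is accepted by the last branch, while a string or a categorical series is rejected.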







[GitHub] [spark] SparkQA commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-20 Thread GitBox


SparkQA commented on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-845620740


   **[Test build #138787 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138787/testReport)**
 for PR 32161 at commit 
[`d6417a8`](https://github.com/apache/spark/commit/d6417a8124eb61390089313d108eff18fd89e412).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32236: [WIP][SPARK-35137][SQL] Revise outputpartitioning in some SparkPlan

2021-05-20 Thread GitBox


AmplabJenkins removed a comment on pull request #32236:
URL: https://github.com/apache/spark/pull/32236#issuecomment-845620125


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43301/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32236: [WIP][SPARK-35137][SQL] Revise outputpartitioning in some SparkPlan

2021-05-20 Thread GitBox


AmplabJenkins commented on pull request #32236:
URL: https://github.com/apache/spark/pull/32236#issuecomment-845620125


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43301/
   





[GitHub] [spark] SparkQA commented on pull request #32236: [WIP][SPARK-35137][SQL] Revise outputpartitioning in some SparkPlan

2021-05-20 Thread GitBox


SparkQA commented on pull request #32236:
URL: https://github.com/apache/spark/pull/32236#issuecomment-845620098


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43301/
   





[GitHub] [spark] xinrong-databricks commented on pull request #32611: [SPARK-35314][PYTHON] Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread GitBox


xinrong-databricks commented on pull request #32611:
URL: https://github.com/apache/spark/pull/32611#issuecomment-845619671


   @HyukjinKwon Certainly, examples are added.





[GitHub] [spark] SparkQA commented on pull request #32609: [SPARK-29223][SQL][SS] New option to specify timestamp on all subscribing topic-partitions in Kafka source

2021-05-20 Thread GitBox


SparkQA commented on pull request #32609:
URL: https://github.com/apache/spark/pull/32609#issuecomment-845619416


   **[Test build #138786 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138786/testReport)**
 for PR 32609 at commit 
[`ec1f662`](https://github.com/apache/spark/commit/ec1f662234dc3e986ae460077e16a6c557f03cf7).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32611: [SPARK-35314][PYTHON] Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread GitBox


AmplabJenkins removed a comment on pull request #32611:
URL: https://github.com/apache/spark/pull/32611#issuecomment-845618643


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138785/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32611: [SPARK-35314][PYTHON] Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread GitBox


SparkQA removed a comment on pull request #32611:
URL: https://github.com/apache/spark/pull/32611#issuecomment-845618215


   **[Test build #138785 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138785/testReport)**
 for PR 32611 at commit 
[`3fef44b`](https://github.com/apache/spark/commit/3fef44bea9b90463b01eda946939faddd88740a5).





[GitHub] [spark] AmplabJenkins commented on pull request #32611: [SPARK-35314][PYTHON] Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread GitBox


AmplabJenkins commented on pull request #32611:
URL: https://github.com/apache/spark/pull/32611#issuecomment-845618643


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138785/
   





[GitHub] [spark] SparkQA commented on pull request #32611: [SPARK-35314][PYTHON] Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread GitBox


SparkQA commented on pull request #32611:
URL: https://github.com/apache/spark/pull/32611#issuecomment-845618630


   **[Test build #138785 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138785/testReport)**
 for PR 32611 at commit 
[`3fef44b`](https://github.com/apache/spark/commit/3fef44bea9b90463b01eda946939faddd88740a5).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #32611: [SPARK-35314][PYTHON] Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread GitBox


SparkQA commented on pull request #32611:
URL: https://github.com/apache/spark/pull/32611#issuecomment-845618215


   **[Test build #138785 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138785/testReport)**
 for PR 32611 at commit 
[`3fef44b`](https://github.com/apache/spark/commit/3fef44bea9b90463b01eda946939faddd88740a5).





[GitHub] [spark] SparkQA removed a comment on pull request #32236: [WIP][SPARK-35137][SQL] Revise outputpartitioning in some SparkPlan

2021-05-20 Thread GitBox


SparkQA removed a comment on pull request #32236:
URL: https://github.com/apache/spark/pull/32236#issuecomment-845596330


   **[Test build #138777 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138777/testReport)**
 for PR 32236 at commit 
[`a33157b`](https://github.com/apache/spark/commit/a33157b441b38a6ae697eb6c2a34bf4927e1b1e7).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32236: [WIP][SPARK-35137][SQL] Revise outputpartitioning in some SparkPlan

2021-05-20 Thread GitBox


AmplabJenkins removed a comment on pull request #32236:
URL: https://github.com/apache/spark/pull/32236#issuecomment-845617484


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138777/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32236: [WIP][SPARK-35137][SQL] Revise outputpartitioning in some SparkPlan

2021-05-20 Thread GitBox


AmplabJenkins commented on pull request #32236:
URL: https://github.com/apache/spark/pull/32236#issuecomment-845617484


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138777/
   





[GitHub] [spark] SparkQA commented on pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


SparkQA commented on pull request #32546:
URL: https://github.com/apache/spark/pull/32546#issuecomment-845617446


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43300/
   





[GitHub] [spark] SparkQA commented on pull request #32236: [WIP][SPARK-35137][SQL] Revise outputpartitioning in some SparkPlan

2021-05-20 Thread GitBox


SparkQA commented on pull request #32236:
URL: https://github.com/apache/spark/pull/32236#issuecomment-845617297


   **[Test build #138777 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138777/testReport)**
 for PR 32236 at commit 
[`a33157b`](https://github.com/apache/spark/commit/a33157b441b38a6ae697eb6c2a34bf4927e1b1e7).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


SparkQA commented on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-845617127


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43302/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32586: [SPARK-35439][SQL] Children subexpr should come first than parent subexpr

2021-05-20 Thread GitBox


AmplabJenkins removed a comment on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-845615196


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138767/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32611: [SPARK-35314][PYTHON] Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread GitBox


AmplabJenkins removed a comment on pull request #32611:
URL: https://github.com/apache/spark/pull/32611#issuecomment-845615298


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43288/
   





[GitHub] [spark] HyukjinKwon commented on pull request #32595: [SPARK-35449][SQL] Only extract common expressions from CaseWhen values if elseValue is set

2021-05-20 Thread GitBox


HyukjinKwon commented on pull request #32595:
URL: https://github.com/apache/spark/pull/32595#issuecomment-845616346


   @Kimahriman would you mind describing the behaviour change (bug fix) under "Does this PR introduce any user-facing change?"? The fix itself makes sense, but it would be great to clarify which bug it fixes too.





[GitHub] [spark] SparkQA commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-20 Thread GitBox


SparkQA commented on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-845616015


   **[Test build #138784 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138784/testReport)**
 for PR 32204 at commit 
[`a10586c`](https://github.com/apache/spark/commit/a10586c3d2887463de16984adb72d205f85f3796).





[GitHub] [spark] cloud-fan commented on pull request #32391: [SPARK-35264][SQL] Support AQE side broadcastJoin threshold

2021-05-20 Thread GitBox


cloud-fan commented on pull request #32391:
URL: https://github.com/apache/spark/pull/32391#issuecomment-845616002


   To add a bit more color: Spark's static size estimation usually underestimates the actual size, due to things like file compression. We can set the AQE broadcast threshold a bit higher because AQE's size estimation is more precise.





[GitHub] [spark] SparkQA commented on pull request #32616: [SPARK-35454][SQL] One LogicalPlan can match multiple dataset ids

2021-05-20 Thread GitBox


SparkQA commented on pull request #32616:
URL: https://github.com/apache/spark/pull/32616#issuecomment-845615736


   **[Test build #138783 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138783/testReport)**
 for PR 32616 at commit 
[`423f2b5`](https://github.com/apache/spark/commit/423f2b5ee2cd63bc4d0853305f7219d7891a1848).





[GitHub] [spark] AmplabJenkins commented on pull request #32611: [SPARK-35314][PYTHON] Support arithmetic operations against bool IndexOpsMixin

2021-05-20 Thread GitBox


AmplabJenkins commented on pull request #32611:
URL: https://github.com/apache/spark/pull/32611#issuecomment-845615298


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43288/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32586: [SPARK-35439][SQL] Children subexpr should come first than parent subexpr

2021-05-20 Thread GitBox


AmplabJenkins commented on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-845615196


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138767/
   





[GitHub] [spark] weixiuli commented on pull request #32571: [SPARK-35424][SHUFFLE] Remove some useless code in the ExternalBlockHandler.

2021-05-20 Thread GitBox


weixiuli commented on pull request #32571:
URL: https://github.com/apache/spark/pull/32571#issuecomment-845613438


   Thank you so much! @HyukjinKwon @srowen @mridulm





[GitHub] [spark] maropu commented on a change in pull request #32616: [SPARK-35454][SQL] One LogicalPlan can match multiple dataset ids

2021-05-20 Thread GitBox


maropu commented on a change in pull request #32616:
URL: https://github.com/apache/spark/pull/32616#discussion_r636598360



##
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##
@@ -231,9 +231,10 @@ class Dataset[T] private[sql](
       case _ =>
         queryExecution.analyzed
     }
-    if (sparkSession.sessionState.conf.getConf(SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED) &&
-        plan.getTagValue(Dataset.DATASET_ID_TAG).isEmpty) {
-      plan.setTagValue(Dataset.DATASET_ID_TAG, id)
+    if (sparkSession.sessionState.conf.getConf(SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED)) {
+      val dsIds = plan.getTagValue(Dataset.DATASET_ID_TAG).getOrElse(new HashSet[Long])
+      dsIds.add(id)

Review comment:
   Q: Is there a possibility that the set will keep growing in a long-running application?







[GitHub] [spark] SparkQA removed a comment on pull request #32586: [SPARK-35439][SQL] Children subexpr should come first than parent subexpr

2021-05-20 Thread GitBox


SparkQA removed a comment on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-845504266


   **[Test build #138767 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138767/testReport)**
 for PR 32586 at commit 
[`b517df8`](https://github.com/apache/spark/commit/b517df8f5d755000849d1a45e3ae3d3f5e0a51d6).





[GitHub] [spark] SparkQA commented on pull request #32586: [SPARK-35439][SQL] Children subexpr should come first than parent subexpr

2021-05-20 Thread GitBox


SparkQA commented on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-845610984


   **[Test build #138767 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138767/testReport)**
 for PR 32586 at commit 
[`b517df8`](https://github.com/apache/spark/commit/b517df8f5d755000849d1a45e3ae3d3f5e0a51d6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] itholic commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes

2021-05-20 Thread GitBox


itholic commented on pull request #32516:
URL: https://github.com/apache/spark/pull/32516#issuecomment-845603789


   Thanks for all the reviews :-)





[GitHub] [spark] ulysses-you commented on pull request #32391: [SPARK-35264][SQL] Support AQE side broadcastJoin threshold

2021-05-20 Thread GitBox


ulysses-you commented on pull request #32391:
URL: https://github.com/apache/spark/pull/32391#issuecomment-845603097


   @Gabriel39 I think you may have misunderstood the logic of AQE.
   
   > AQE should not optimize it to other join type since static stats (e.g 
sizeInBytes) is always larger or equal the actual value
   
   That's wrong: AQE can never change a BHJ that was decided on the normal planner side into another join strategy. It's not about the stats; you can see the key code in `LogicalQueryStageStrategy`.
   
   This new config assumes that a join is not a BHJ before AQE, so that AQE can use the new config and runtime stats to convert the join (mostly an SMJ) into a BHJ.
   
   So the usual way to use this new config is to 1) disable the normal auto broadcast or reduce its value, and 2) tune the new config value.
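
The two steps above could look roughly like the sketch below. It assumes the new config is exposed as `spark.sql.adaptive.autoBroadcastJoinThreshold` (an assumption here, so check the merged PR for the final key name), and the `64MB` default is an arbitrary placeholder rather than a recommendation. `prefer_aqe_broadcast` is a hypothetical helper, and `spark` is expected to be a SparkSession.

```python
def prefer_aqe_broadcast(spark, aqe_threshold="64MB"):
    """Let AQE, rather than the static planner, decide on broadcast joins."""
    # 1) Forbid the planner-side auto broadcast (-1 disables it).
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
    # AQE has to be enabled for the runtime decision to apply at all.
    spark.conf.set("spark.sql.adaptive.enabled", "true")
    # 2) Tune the AQE-side threshold. The key name is an assumption based on
    #    this PR's title; the value is a placeholder.
    spark.conf.set("spark.sql.adaptive.autoBroadcastJoinThreshold", aqe_threshold)
```

Reducing `spark.sql.autoBroadcastJoinThreshold` instead of disabling it is the other variant of step 1 mentioned above.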





[GitHub] [spark] Ngone51 commented on pull request #32616: [SPARK-35454][SQL] One LogicalPlan can match multiple dataset ids

2021-05-20 Thread GitBox


Ngone51 commented on pull request #32616:
URL: https://github.com/apache/spark/pull/32616#issuecomment-845602413


   cc @cloud-fan @maropu Please take a look, thanks!





[GitHub] [spark] Ngone51 opened a new pull request #32616: [SPARK-35454][SQL] One LogicalPlan can match multiple dataset ids

2021-05-20 Thread GitBox


Ngone51 opened a new pull request #32616:
URL: https://github.com/apache/spark/pull/32616


   
   
   ### What changes were proposed in this pull request?
   
   
   Change the type of `DATASET_ID_TAG` from `Long` to `HashSet[Long]` to allow 
the logical plan to match multiple datasets.
   
   
   
   ### Why are the changes needed?
   
   
   During the transformation from one Dataset to another, the 
`DATASET_ID_TAG` of the logical plan doesn't change if the plan itself doesn't change:
   
   
https://github.com/apache/spark/blob/b5241c97b17a1139a4ff719bfce7f68aef094d95/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L234-L237
   
   However, the dataset id always changes, even if the logical plan doesn't change:
   
https://github.com/apache/spark/blob/b5241c97b17a1139a4ff719bfce7f68aef094d95/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L207-L208
   
   This can lead to a mismatch between the dataset's id and the column's 
`__dataset_id`. E.g.,
   
   ```scala
   test("SPARK-28344: fail ambiguous self join - Dataset.colRegex as column ref") {
     // The test can fail if we change it to:
     // val df1 = spark.range(3).toDF()
     // val df2 = df1.filter($"id" > 0).toDF()
     val df1 = spark.range(3)
     val df2 = df1.filter($"id" > 0)

     withSQLConf(
       SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "true",
       SQLConf.CROSS_JOINS_ENABLED.key -> "true") {
       assertAmbiguousSelfJoin(df1.join(df2, df1.colRegex("id") > df2.colRegex("id")))
     }
   }
   ```
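
The mismatch and the proposed fix can also be pictured with a small self-contained model. This is a sketch under assumptions, not Spark's code: `DatasetIdSketch`, `Plan`, `Ds`, and `newDataset` are hypothetical names, and the plan's `datasetIds` field merely stands in for the `DATASET_ID_TAG` after its type becomes `HashSet[Long]`.

```scala
import scala.collection.mutable.HashSet

// Toy model only; these are hypothetical stand-ins, not Spark's classes.
object DatasetIdSketch {
  // Stands in for a logical plan carrying DATASET_ID_TAG as a set of ids.
  final class Plan(val name: String) {
    val datasetIds: HashSet[Long] = HashSet.empty[Long]
  }

  // Stands in for a Dataset: a fresh id plus a (possibly shared) plan.
  final case class Ds(id: Long, plan: Plan)

  private var nextId = 0L

  // Every new dataset gets a new id, even when the plan is unchanged.
  // With a single Long tag written only once, the second dataset's id
  // would not be recorded; with a set, each id is added.
  def newDataset(plan: Plan): Ds = {
    val id = nextId
    nextId += 1
    plan.datasetIds += id
    Ds(id, plan)
  }
}
```

Two datasets created over the same `Plan` instance get different ids, and both ids end up in the plan's set, which is the property the type change is meant to provide.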
   
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No.
   
   ### How was this patch tested?
   
   
   
   Added unit tests.





[GitHub] [spark] SparkQA removed a comment on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-20 Thread GitBox


SparkQA removed a comment on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-845596403


   **[Test build #138779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138779/testReport)** for PR 32161 at commit [`ead523d`](https://github.com/apache/spark/commit/ead523de8cc13c8165e98a1b240bf17d782a2b66).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-20 Thread GitBox


AmplabJenkins removed a comment on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-845597711


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138779/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-20 Thread GitBox


AmplabJenkins commented on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-845597711


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138779/
   





[GitHub] [spark] SparkQA commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-20 Thread GitBox


SparkQA commented on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-845597698


   **[Test build #138779 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138779/testReport)** for PR 32161 at commit [`ead523d`](https://github.com/apache/spark/commit/ead523de8cc13c8165e98a1b240bf17d782a2b66).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.




