date:20210531

[GitHub] [spark] HyukjinKwon commented on a change in pull request #32725: [SPARK-33933][FOLLOW-UP][SQL] Fix a flaky test case in AdaptiveQueryExecSuite

2021-05-31 Thread GitBox



HyukjinKwon commented on a change in pull request #32725:
URL: https://github.com/apache/spark/pull/32725#discussion_r642801775



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##
@@ -1605,7 +1605,7 @@ class AdaptiveQueryExecSuite
   }
 
   test("SPARK-33933: Materialize BroadcastQueryStage first in AQE") {
-val testAppender = new LogAppender("aqe query stage materialization order test")
+val testAppender = new LogAppender("aqe query stage materialization order test", 1)

Review comment:
   cc Max @MaxGekk since this max is from Max 
   
(https://github.com/apache/spark/commit/88fc8dbc09c5d24ae89413ab1e1fbabdf1fd8028#diff-1c74a76903c7da8f8424992b46b2f99157609726bc580d60e2d0858ea11c2aecR197)
 :D




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun opened a new pull request #32727: [SPARK-35589][CORE] BlockManagerMasterEndpoint should not ignore index-only shuffle file during updating

2021-05-31 Thread GitBox



dongjoon-hyun opened a new pull request #32727:
URL: https://github.com/apache/spark/pull/32727


   …
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



[GitHub] [spark] SparkQA commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-05-31 Thread GitBox



SparkQA commented on pull request #32723:
URL: https://github.com/apache/spark/pull/32723#issuecomment-851839313


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43654/
   



[GitHub] [spark] SparkQA commented on pull request #32725: [SPARK-33933][FollowUp][SQL] Fix a flaky test case in AdaptiveQueryExecSuite

2021-05-31 Thread GitBox



SparkQA commented on pull request #32725:
URL: https://github.com/apache/spark/pull/32725#issuecomment-851839028


   **[Test build #139137 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139137/testReport)**
 for PR 32725 at commit 
[`f890295`](https://github.com/apache/spark/commit/f890295c38f5c85d8b2c145740279af96e1bfae7).



[GitHub] [spark] SparkQA commented on pull request #32726: [SPARK-35587][PYTHON][DOCS] Initial porting of Koalas documentation

2021-05-31 Thread GitBox



SparkQA commented on pull request #32726:
URL: https://github.com/apache/spark/pull/32726#issuecomment-851838979


   **[Test build #139136 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139136/testReport)**
 for PR 32726 at commit 
[`262c5e1`](https://github.com/apache/spark/commit/262c5e1b68f229233c0867c003f14549a6cea6a1).



[GitHub] [spark] SparkQA commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter

2021-05-31 Thread GitBox



SparkQA commented on pull request #32724:
URL: https://github.com/apache/spark/pull/32724#issuecomment-851839022


   **[Test build #139138 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139138/testReport)**
 for PR 32724 at commit 
[`42a217e`](https://github.com/apache/spark/commit/42a217e7c97afc2b74016b9432843ea964f44c7d).



[GitHub] [spark] HyukjinKwon commented on pull request #32726: [SPARK-35587][PYTHON][DOCS] Initial porting of Koalas documentation

2021-05-31 Thread GitBox



HyukjinKwon commented on pull request #32726:
URL: https://github.com/apache/spark/pull/32726#issuecomment-851837787


   cc @ueshin @xinrong-databricks @itholic FYI



[GitHub] [spark] HyukjinKwon opened a new pull request #32726: [SPARK-35587][PYTHON][DOCS] Initial porting of Koalas documentation

2021-05-31 Thread GitBox



HyukjinKwon opened a new pull request #32726:
URL: https://github.com/apache/spark/pull/32726


   ### What changes were proposed in this pull request?
   
   This PR proposes to port the Koalas documentation to the PySpark documentation as an initial step.
   The pages are ported almost as-is, except that the import is renamed from `databricks.koalas` to `pyspark.pandas`.
   
   ### Why are the changes needed?
   
   To document pandas APIs on Spark.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it adds new documentation.
   
   ### How was this patch tested?
   
   Manually built the docs and checked the output.
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules

2021-05-31 Thread GitBox



AmplabJenkins removed a comment on pull request #32721:
URL: https://github.com/apache/spark/pull/32721#issuecomment-851836879


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43653/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite

2021-05-31 Thread GitBox



AmplabJenkins removed a comment on pull request #32719:
URL: https://github.com/apache/spark/pull/32719#issuecomment-851836878


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139129/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite

2021-05-31 Thread GitBox



AmplabJenkins commented on pull request #32719:
URL: https://github.com/apache/spark/pull/32719#issuecomment-851836878


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139129/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules

2021-05-31 Thread GitBox



AmplabJenkins commented on pull request #32721:
URL: https://github.com/apache/spark/pull/32721#issuecomment-851836879


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43653/
   



[GitHub] [spark] gengliangwang opened a new pull request #32725: [SPARK-33933][FollowUp][SQL]

2021-05-31 Thread GitBox



gengliangwang opened a new pull request #32725:
URL: https://github.com/apache/spark/pull/32725


   
   
   ### What changes were proposed in this pull request?
   
   Fix a flaky test case in AdaptiveQueryExecSuite
   
   ### Why are the changes needed?
   
   The test case becomes flaky since there are too many debug logs:
   https://github.com/Yikun/spark/runs/2715222392?check_suite_focus=true
   
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139125/testReport/
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   Unit test



[GitHub] [spark] MaxGekk commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser

2021-05-31 Thread GitBox



MaxGekk commented on pull request #32506:
URL: https://github.com/apache/spark/pull/32506#issuecomment-851835967


   cc @HyukjinKwon 



[GitHub] [spark] ulysses-you commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter

2021-05-31 Thread GitBox



ulysses-you commented on pull request #32724:
URL: https://github.com/apache/spark/pull/32724#issuecomment-851834150


   cc @maropu @cloud-fan @yaooqinn 



[GitHub] [spark] ulysses-you commented on a change in pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter

2021-05-31 Thread GitBox



ulysses-you commented on a change in pull request #32724:
URL: https://github.com/apache/spark/pull/32724#discussion_r642797088



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##
@@ -1711,7 +1711,7 @@ object DecimalAggregates extends Rule[LogicalPlan] {
  * Converts local operations (i.e. ones that don't require data exchange) on 
`LocalRelation` to
  * another `LocalRelation`.
  */
-object ConvertToLocalRelation extends Rule[LogicalPlan] {
+trait ConvertToLocalRelationBase extends Rule[LogicalPlan] {

Review comment:
   this is for isolation between normal and AQE optimizer





[GitHub] [spark] SparkQA commented on pull request #32722: [SPARK-35586][K8S][TESTS] Set a default value for spark.kubernetes.test.sparkTgz in pom.xml for Kubernetes integration tests

2021-05-31 Thread GitBox



SparkQA commented on pull request #32722:
URL: https://github.com/apache/spark/pull/32722#issuecomment-851832519


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43652/
   



[GitHub] [spark] yaooqinn commented on a change in pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions

2021-05-31 Thread GitBox



yaooqinn commented on a change in pull request #32718:
URL: https://github.com/apache/spark/pull/32718#discussion_r642796052



##
File path: sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala
##
@@ -40,6 +40,12 @@ class MiscFunctionsSuite extends QueryTest with 
SharedSparkSession {
   Row(SPARK_VERSION_SHORT + " " + SPARK_REVISION))
 assert(df.schema.fieldNames === Seq("version()"))
   }
+
+  test("get current_user and session_user in normal spark apps") {

Review comment:
   my bad, I forgot it





[GitHub] [spark] SparkQA commented on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules

2021-05-31 Thread GitBox



SparkQA commented on pull request #32721:
URL: https://github.com/apache/spark/pull/32721#issuecomment-851830047


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43653/
   



[GitHub] [spark] MaxGekk commented on a change in pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only

2021-05-31 Thread GitBox



MaxGekk commented on a change in pull request #32714:
URL: https://github.com/apache/spark/pull/32714#discussion_r642795083



##
File path: docs/sql-migration-guide.md
##
@@ -91,6 +91,8 @@ license: |
 
   - In Spark 3.2, `CREATE TABLE AS SELECT` with non-empty `LOCATION` will 
throw `AnalysisException`. To restore the behavior before Spark 3.2, you can 
set `spark.sql.legacy.allowNonEmptyLocationInCTAS` to `true`.
 
+  - In Spark 3.2, the special datetime values such as `epoch`, `today`, 
`yesterday`, `tomorrow` and `now` are supported in typed literals only, for 
instance `select timestamp'now'`. In Spark 3.1 and earlier, such special values 
are supported in any casts of strings to dates/timestamps. To restore the 
behavior before Spark 3.2, you should preprocess string columns and convert the 
strings to desired timestamps explicitly using UDF for instance.

Review comment:
   @yaooqinn What do you mean by "for instance **(add ',')** `select timestamp'now'`"? I didn't get the problem. BTW, you could use the suggestion feature, so that I can just commit your suggestions.
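
The migration note quoted above suggests preprocessing string values explicitly (for instance in a UDF) to restore the pre-3.2 behavior. A minimal plain-Python sketch of such a preprocessing function follows. The function name `resolve_special_datetime`, the trimming and lower-casing of the input, and the use of UTC for `epoch` are assumptions made for illustration; this is not Spark's implementation, and wrapping it in an actual UDF is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def resolve_special_datetime(value: str, now: datetime) -> Optional[datetime]:
    """Map a special datetime string to a concrete timestamp.

    Returns None for any other string, so the caller can fall back to
    regular date/timestamp parsing. `now` is passed in explicitly so the
    function stays deterministic and easy to test.
    """
    key = value.strip().lower()  # assumption: ignore case and surrounding spaces
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if key == "epoch":
        return datetime(1970, 1, 1, tzinfo=timezone.utc)
    if key == "now":
        return now
    if key == "today":
        return midnight
    if key == "yesterday":
        return midnight - timedelta(days=1)
    if key == "tomorrow":
        return midnight + timedelta(days=1)
    return None
```

Taking `now` as a parameter means every row in a batch can be resolved against the same instant, which is one possible way to keep results consistent within a query.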





[GitHub] [spark] ulysses-you opened a new pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter

2021-05-31 Thread GitBox



ulysses-you opened a new pull request #32724:
URL: https://github.com/apache/spark/pull/32724


   
   
   ### What changes were proposed in this pull request?
   
   Add the rule `ConvertToLocalRelation` to the AQE optimizer.
   
   ### Why are the changes needed?
   
   Support propagating an empty local relation through project and filter, for example in a plan such as:
   ```
   Aggregate
     Project
       Join
         ShuffleStage
         ShuffleStage
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Add test.
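
The idea in the PR description above can be illustrated with a toy plan tree: a project or filter over a relation with zero rows also produces zero rows, so the node can be replaced by the empty relation itself. The sketch below is a hypothetical plain-Python model; the classes `LocalRelation`, `ShuffleStage`, `Project`, `Filter` and the function `propagate_empty` are illustrative stand-ins and not Catalyst's API or the rule added by this PR.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class LocalRelation:
    rows: Tuple[tuple, ...] = ()  # an empty tuple models an empty relation


@dataclass(frozen=True)
class ShuffleStage:
    name: str  # opaque leaf whose output is not known to be empty


@dataclass(frozen=True)
class Project:
    child: "Plan"


@dataclass(frozen=True)
class Filter:
    child: "Plan"


Plan = Union[LocalRelation, ShuffleStage, Project, Filter]


def propagate_empty(plan: Plan) -> Plan:
    """Collapse Project/Filter nodes whose (rewritten) child is an empty relation."""
    if isinstance(plan, (Project, Filter)):
        child = propagate_empty(plan.child)
        if isinstance(child, LocalRelation) and not child.rows:
            return child  # zero rows in, zero rows out
        return type(plan)(child)
    return plan  # leaves are returned unchanged
```

For example, `propagate_empty(Project(Filter(LocalRelation())))` collapses to `LocalRelation()`, while a plan whose leaf is a `ShuffleStage` is left as it is because nothing is known about its row count.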


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on a change in pull request #32653: [SPARK-35312][SS] Introduce new Option in Kafka source to specify minimum number of records to read per trigger

2021-05-31 Thread GitBox



HeartSaVioR commented on a change in pull request #32653:
URL: https://github.com/apache/spark/pull/32653#discussion_r642792749



##
File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/CompositeReadLimit.java
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read.streaming;
+
+import org.apache.spark.annotation.Evolving;
+
+import java.util.Objects;
+
+/**
+ /**
+ * Represents a {@link ReadLimit} where the {@link MicroBatchStream} should 
scan approximately
+ * given maximum number of rows with at least the given minimum number of rows.
+ *
+ * @see SupportsAdmissionControl#latestOffset(Offset, ReadLimit)
+ * @since 3.1.2
+ */
+@Evolving
+public final class CompositeReadLimit implements ReadLimit {

Review comment:
   Looks to be addressed. Thanks :) 





[GitHub] [spark] SparkQA removed a comment on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite

2021-05-31 Thread GitBox



SparkQA removed a comment on pull request #32719:
URL: https://github.com/apache/spark/pull/32719#issuecomment-851769807


   **[Test build #139129 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139129/testReport)**
 for PR 32719 at commit 
[`941ee9c`](https://github.com/apache/spark/commit/941ee9c1d04f9951598ed8bfb93b5bdaa2819e18).



[GitHub] [spark] SparkQA commented on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite

2021-05-31 Thread GitBox



SparkQA commented on pull request #32719:
URL: https://github.com/apache/spark/pull/32719#issuecomment-851821942


   **[Test build #139129 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139129/testReport)**
 for PR 32719 at commit 
[`941ee9c`](https://github.com/apache/spark/commit/941ee9c1d04f9951598ed8bfb93b5bdaa2819e18).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-05-31 Thread GitBox



HyukjinKwon commented on a change in pull request #32723:
URL: https://github.com/apache/spark/pull/32723#discussion_r642788618



##
File path: python/pyspark/sql/readwriter.py
##
@@ -627,8 +627,6 @@ def jdbc(self, url, table, column=None, lowerBound=None, 
upperBound=None, numPar
 
 Parameters
 --
-url : str
-a JDBC URL of the form ``jdbc:subprotocol:subname``
 table : str
 the name of the table
 column : str, optional

Review comment:
   I think we can remove `lowerBound`, `upperBound`, and `numPartitions`.
   And, fix the description of `column` to something like:
   
   Alias of `partitionColumn` option. Refer to `partitionColumn` in `Data 
Source Option <...>`_ in the version you use.





[GitHub] [spark] SparkQA commented on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules

2021-05-31 Thread GitBox



SparkQA commented on pull request #32721:
URL: https://github.com/apache/spark/pull/32721#issuecomment-851806715


   **[Test build #139135 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139135/testReport)**
 for PR 32721 at commit 
[`864ee6f`](https://github.com/apache/spark/commit/864ee6fe62400c73ed973c63e67576075c89fa4a).



[GitHub] [spark] SparkQA removed a comment on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules

2021-05-31 Thread GitBox



SparkQA removed a comment on pull request #32721:
URL: https://github.com/apache/spark/pull/32721#issuecomment-851803742


   **[Test build #139133 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139133/testReport)**
 for PR 32721 at commit 
[`d95d332`](https://github.com/apache/spark/commit/d95d3322b0755f065a76aa094bf384dcaa5dec4c).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules

2021-05-31 Thread GitBox



AmplabJenkins removed a comment on pull request #32721:
URL: https://github.com/apache/spark/pull/32721#issuecomment-851806052


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139133/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules

2021-05-31 Thread GitBox



AmplabJenkins commented on pull request #32721:
URL: https://github.com/apache/spark/pull/32721#issuecomment-851806052


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139133/
   



[GitHub] [spark] SparkQA commented on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules

2021-05-31 Thread GitBox



SparkQA commented on pull request #32721:
URL: https://github.com/apache/spark/pull/32721#issuecomment-851806030


   **[Test build #139133 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139133/testReport)**
 for PR 32721 at commit 
[`d95d332`](https://github.com/apache/spark/commit/d95d3322b0755f065a76aa094bf384dcaa5dec4c).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-05-31 Thread GitBox



SparkQA commented on pull request #32723:
URL: https://github.com/apache/spark/pull/32723#issuecomment-851805108


   **[Test build #139134 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139134/testReport)**
 for PR 32723 at commit 
[`5853560`](https://github.com/apache/spark/commit/585356099656c40b43a34cd939d66a6a0fdf9305).



[GitHub] [spark] itholic opened a new pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-05-31 Thread GitBox



itholic opened a new pull request #32723:
URL: https://github.com/apache/spark/pull/32723


   ### What changes were proposed in this pull request?
   
   This PR proposes to move the missing JDBC data source options from Python, Scala and Java into a single page.
   
   ### Why are the changes needed?
   
   So far, the documentation for JDBC data source options has been split across separate pages in each language's API documentation. This makes managing many options inconvenient, so it is more efficient to manage all options in a single page and link to that page from the API documentation of each language.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the documents will appear as shown below after this change:
   
   - "JDBC To Other Databases" page
   https://user-images.githubusercontent.com/44108233/120267176-66505100-c2de-11eb-9a03-df027c27fdd2.png
   
   
   - Python
   https://user-images.githubusercontent.com/44108233/120267196-71a37c80-c2de-11eb-8909-f41cf3ebd470.png
   
   
   - Scala
   https://user-images.githubusercontent.com/44108233/120268675-38204080-c2e1-11eb-94d3-858131799a6b.png
   
   
   - Java
   https://user-images.githubusercontent.com/44108233/120268683-3c4c5e00-c2e1-11eb-9f5a-b95b952bf87a.png
   
   
   
   ### How was this patch tested?
   
   Manually built the docs and confirmed the page.



[GitHub] [spark] SparkQA commented on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules

2021-05-31 Thread GitBox



SparkQA commented on pull request #32721:
URL: https://github.com/apache/spark/pull/32721#issuecomment-851803742


   **[Test build #139133 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139133/testReport)**
 for PR 32721 at commit 
[`d95d332`](https://github.com/apache/spark/commit/d95d3322b0755f065a76aa094bf384dcaa5dec4c).



[GitHub] [spark] SparkQA commented on pull request #32722: [SPARK-35586][K8S][TESTS] Set a default value for spark.kubernetes.test.sparkTgz in pom.xml for Kubernetes integration tests

2021-05-31 Thread GitBox



SparkQA commented on pull request #32722:
URL: https://github.com/apache/spark/pull/32722#issuecomment-851803720


   **[Test build #139132 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139132/testReport)**
 for PR 32722 at commit 
[`f2d9f30`](https://github.com/apache/spark/commit/f2d9f30e2f84fcc3fd692daf31934b568134a56c).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions

2021-05-31 Thread GitBox



AmplabJenkins removed a comment on pull request #32718:
URL: https://github.com/apache/spark/pull/32718#issuecomment-851802395


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43647/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox



AmplabJenkins removed a comment on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851802400


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139125/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command

2021-05-31 Thread GitBox



AmplabJenkins removed a comment on pull request #32720:
URL: https://github.com/apache/spark/pull/32720#issuecomment-851802399


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43651/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when HiveClientImpl.state close

2021-05-31 Thread GitBox



AmplabJenkins removed a comment on pull request #32693:
URL: https://github.com/apache/spark/pull/32693#issuecomment-851802397


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43650/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser

2021-05-31 Thread GitBox



AmplabJenkins removed a comment on pull request #32506:
URL: https://github.com/apache/spark/pull/32506#issuecomment-851802398


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43648/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox



AmplabJenkins commented on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851802400


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139125/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command

2021-05-31 Thread GitBox



AmplabJenkins commented on pull request #32720:
URL: https://github.com/apache/spark/pull/32720#issuecomment-851802399


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43651/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions

2021-05-31 Thread GitBox



AmplabJenkins commented on pull request #32718:
URL: https://github.com/apache/spark/pull/32718#issuecomment-851802395


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43647/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when HiveClientImpl.state close

2021-05-31 Thread GitBox



AmplabJenkins commented on pull request #32693:
URL: https://github.com/apache/spark/pull/32693#issuecomment-851802397


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43650/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser

2021-05-31 Thread GitBox



AmplabJenkins commented on pull request #32506:
URL: https://github.com/apache/spark/pull/32506#issuecomment-851802398


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43648/
   



[GitHub] [spark] viirya commented on pull request #32702: [SPARK-35565][SS] Add config for ignoring metadata directory of FileStreamSink

2021-05-31 Thread GitBox



viirya commented on pull request #32702:
URL: https://github.com/apache/spark/pull/32702#issuecomment-851802006


   Okay, sounds good. Let me change it to use a source option.



[GitHub] [spark] SparkQA commented on pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command

2021-05-31 Thread GitBox



SparkQA commented on pull request #32720:
URL: https://github.com/apache/spark/pull/32720#issuecomment-851801961


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43651/
   



[GitHub] [spark] sarutak opened a new pull request #32722: [SPARK-35586][K8S][TESTS] Set a default value for spark.kubernetes.test.sparkTgz in pom.xml for Kubernetes integration tests

2021-05-31 Thread GitBox



sarutak opened a new pull request #32722:
URL: https://github.com/apache/spark/pull/32722


   ### What changes were proposed in this pull request?
   
   This PR sets a default value for `spark.kubernetes.test.sparkTgz` in 
`kubernetes/integration-tests/pom.xml` for Kubernetes integration tests.
   
   ### Why are the changes needed?
   
   On the current master, running the integration tests with the following 
command fails because no default value is set for the property.
   ```
   build/mvn -Dspark.kubernetes.test.namespace=default -Pkubernetes 
-Pkubernetes-integration-tests -Psparkr  -pl 
resource-managers/kubernetes/integration-tests integration-test
   ```
   ```
   + mkdir -p 
/home/kou/work/oss/spark/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked
   + tar -xzvf --test-exclude-tags --strip-components=1 -C 
/home/kou/work/oss/spark/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked
   tar (child): --test-exclude-tags: Cannot open: No such file or directory
   tar (child): Error is not recoverable: exiting now
   tar: Child returned status 2
   tar: Error is not recoverable: exiting now
   [ERROR] Command execution failed.
   ```
   
   According to `setup-integration-test-env.sh`, `N/A` is intended as the 
default value, so this PR chooses it.
   ```
   SPARK_TGZ="N/A"
   MVN="$TEST_ROOT_DIR/build/mvn"
   EXCLUDE_TAGS=""
   ```
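
One plausible reading of the `tar` log above: with no default, the tarball option expands to nothing, so the script's option parser takes the next flag, `--test-exclude-tags`, as the tarball path. The Scala sketch below only illustrates that failure mode. The parser, the `--spark-tgz` flag name and both helper functions are assumptions made for this example, not the actual shell script.

```scala
object SparkTgzArgSketch {
  // Hypothetical, simplified option parser: every flag consumes the next token as its value.
  def parse(args: List[String]): Map[String, String] = args match {
    case flag :: value :: rest if flag.startsWith("--") => parse(rest) + (flag -> value)
    case _ => Map.empty
  }

  // Builds the argument list the way an unquoted shell expansion would:
  // an empty value disappears from the list entirely.
  def buildArgs(sparkTgz: String, excludeTags: String): List[String] =
    List("--spark-tgz", sparkTgz, "--test-exclude-tags", excludeTags).filter(_.nonEmpty)
}
```

With an empty tarball value, `parse(buildArgs("", ""))` binds `--spark-tgz` to `--test-exclude-tags`, which matches the `tar -xzvf --test-exclude-tags` line in the log. With the `N/A` sentinel, each flag keeps its own value.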
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   The build and tests finish successfully with the command shown above.



[GitHub] [spark] SparkQA commented on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when HiveClientImpl.state close

2021-05-31 Thread GitBox



SparkQA commented on pull request #32693:
URL: https://github.com/apache/spark/pull/32693#issuecomment-851797476


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43650/
   



[GitHub] [spark] sigmod opened a new pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules

2021-05-31 Thread GitBox



sigmod opened a new pull request #32721:
URL: https://github.com/apache/spark/pull/32721


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



[GitHub] [spark] SparkQA commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser

2021-05-31 Thread GitBox



SparkQA commented on pull request #32506:
URL: https://github.com/apache/spark/pull/32506#issuecomment-851795183


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43648/
   



[GitHub] [spark] SparkQA removed a comment on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox



SparkQA removed a comment on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851734216


   **[Test build #139125 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139125/testReport)**
 for PR 32686 at commit 
[`8252a6a`](https://github.com/apache/spark/commit/8252a6a93a05c97ed47e3174be76fe1aeb3f6567).



[GitHub] [spark] SparkQA commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox



SparkQA commented on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851794843


   **[Test build #139125 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139125/testReport)**
 for PR 32686 at commit 
[`8252a6a`](https://github.com/apache/spark/commit/8252a6a93a05c97ed47e3174be76fe1aeb3f6567).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions

2021-05-31 Thread GitBox



SparkQA commented on pull request #32718:
URL: https://github.com/apache/spark/pull/32718#issuecomment-851792608


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43647/
   



[GitHub] [spark] SparkQA commented on pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command

2021-05-31 Thread GitBox



SparkQA commented on pull request #32720:
URL: https://github.com/apache/spark/pull/32720#issuecomment-851790068


   **[Test build #139131 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139131/testReport)**
 for PR 32720 at commit 
[`66536fb`](https://github.com/apache/spark/commit/66536fb5b2d8f1499bd4bdb5a9a31435f637bab8).



[GitHub] [spark] gengliangwang commented on pull request #32712: [SPARK-35576][SQL] Redact the sensitive info in the result of Set command

2021-05-31 Thread GitBox



gengliangwang commented on pull request #32712:
URL: https://github.com/apache/spark/pull/32712#issuecomment-851789021


   @dongjoon-hyun Thanks for merging. I have opened a cherry-pick PR in 
https://github.com/apache/spark/pull/32720



[GitHub] [spark] gengliangwang opened a new pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command

2021-05-31 Thread GitBox



gengliangwang opened a new pull request #32720:
URL: https://github.com/apache/spark/pull/32720


   
   
   ### What changes were proposed in this pull request?
   
   Currently, the results of the following SQL queries are not redacted:
   ```
   SET [KEY];
   SET;
   ```
   For example:
   
   ```
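   scala> spark.sql("set javax.jdo.option.ConnectionPassword=123456").show()
   +--------------------+------+
   |                 key| value|
   +--------------------+------+
   |javax.jdo.option....|123456|
   +--------------------+------+
   
   scala> spark.sql("set javax.jdo.option.ConnectionPassword").show()
   +--------------------+------+
   |                 key| value|
   +--------------------+------+
   |javax.jdo.option....|123456|
   +--------------------+------+
   
   scala> spark.sql("set").show()
   +--------------------+--------+
   |                 key|   value|
   +--------------------+--------+
   |javax.jdo.option....|  123456|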
   
   ```
   
   We should hide the sensitive information and redact the query output.
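
As a rough illustration of the idea, redaction can be sketched as masking the value of every key that matches a sensitivity pattern. The pattern, the placeholder text and the `redact` helper below are assumptions chosen for this sketch. They are not taken from this PR or from Spark's configuration.

```scala
import scala.util.matching.Regex

object RedactSketch {
  // Hypothetical pattern and placeholder; a real deployment would read these from configuration.
  val sensitiveKey: Regex = "(?i)secret|password|token".r
  val placeholder = "*********(redacted)"

  /** Replaces the value of every pair whose key matches the sensitive pattern. */
  def redact(kvs: Seq[(String, String)]): Seq[(String, String)] =
    kvs.map { case (k, v) =>
      if (sensitiveKey.findFirstIn(k).isDefined) (k, placeholder) else (k, v)
    }
}
```

Applied to the example above, the pair for `javax.jdo.option.ConnectionPassword` would carry the placeholder instead of `123456`, while non-sensitive keys pass through unchanged.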
   
   ### Why are the changes needed?
   
   Security.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the sensitive information in the output of Set commands is redacted.
   
   
   ### How was this patch tested?
   
   Unit test



[GitHub] [spark] viirya commented on pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+

2021-05-31 Thread GitBox



viirya commented on pull request #32709:
URL: https://github.com/apache/spark/pull/32709#issuecomment-851788514


   Cool! Thanks @HyukjinKwon!



[GitHub] [spark] HeartSaVioR commented on a change in pull request #32653: [SPARK-35312][SS] Introduce new Option in Kafka source to specify minimum number of records to read per trigger

2021-05-31 Thread GitBox



HeartSaVioR commented on a change in pull request #32653:
URL: https://github.com/apache/spark/pull/32653#discussion_r642765673



##
File path: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
##
@@ -139,26 +156,78 @@ private[kafka010] class KafkaSource(
   override def latestOffset(startOffset: streaming.Offset, limit: ReadLimit): streaming.Offset = {
     // Make sure initialPartitionOffsets is initialized
     initialPartitionOffsets
-
-    val latest = kafkaReader.fetchLatestOffsets(
-      currentPartitionOffsets.orElse(Some(initialPartitionOffsets)))
+    val currentOffsets = currentPartitionOffsets.orElse(Some(initialPartitionOffsets))
+    val latest = kafkaReader.fetchLatestOffsets(currentOffsets)
+    var skipBatch = false

Review comment:
   Same here as well.





[GitHub] [spark] HeartSaVioR commented on a change in pull request #32653: [SPARK-35312][SS] Introduce new Option in Kafka source to specify minimum number of records to read per trigger

2021-05-31 Thread GitBox



HeartSaVioR commented on a change in pull request #32653:
URL: https://github.com/apache/spark/pull/32653#discussion_r642765440



##
File path: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala
##
@@ -95,15 +114,62 @@ private[kafka010] class KafkaMicroBatchStream(
   override def latestOffset(start: Offset, readLimit: ReadLimit): Offset = {
     val startPartitionOffsets = start.asInstanceOf[KafkaSourceOffset].partitionToOffsets
     latestPartitionOffsets = kafkaOffsetReader.fetchLatestOffsets(Some(startPartitionOffsets))
+    var skipBatch = false

Review comment:
   Now I see duplicated code across the branches that handle each limit type, 
including CompositeReadLimit, which handles both the lower and the upper limit 
and therefore repeats the same code.
   
   How about changing it as below:
   
   ```
   val limits: Seq[ReadLimit] = readLimit match {
     case rows: CompositeReadLimit => rows.getReadLimits
     case rows => Seq(rows)
   }
   
   val offsets = if (limits.exists(_.isInstanceOf[ReadAllAvailable])) {
     // ReadAllAvailable has the highest priority
     latestPartitionOffsets
   } else {
     val lowerLimit = limits.find(_.isInstanceOf[ReadMinRows]).map(_.asInstanceOf[ReadMinRows])
     val upperLimit = limits.find(_.isInstanceOf[ReadMaxRows]).map(_.asInstanceOf[ReadMaxRows])
   
     lowerLimit.flatMap { limit =>
       // checking if we need to skip batch based on minOffsetsPerTrigger criteria
       val skipBatch = delayBatch(
         limit.minRows, latestPartitionOffsets, startPartitionOffsets, limit.maxTriggerDelayMs)
       if (skipBatch) {
         logDebug(
           s"Delaying batch as number of records available is less than minOffsetsPerTrigger")
         Some(startPartitionOffsets)
       } else {
         None
       }
     }.orElse {
       // checking if we need to adjust a range of offsets based on maxOffsetsPerTrigger criteria
       upperLimit.map { limit =>
         rateLimit(limit.maxRows(), startPartitionOffsets, latestPartitionOffsets)
       }
     }.getOrElse(latestPartitionOffsets)
   }
   
   endPartitionOffsets = KafkaSourceOffset(offsets)
   endPartitionOffsets
   ```
   
   This would require fewer changes when we want to add more read limits in the 
future.
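
The priority order in this suggestion (read everything, then the lower limit, then the upper limit) can be tried in isolation with the self-contained sketch below. All types here are hypothetical stand-ins for the `ReadLimit` classes, a plain row count replaces the partition offsets, and the trigger-delay check is left out, so this is a simplification and not the code proposed for the PR.

```scala
object ReadLimitSketch {
  // Hypothetical stand-ins for the ReadLimit types named in the review comment.
  sealed trait Limit
  case object AllAvailable extends Limit
  final case class MinRows(rows: Long) extends Limit
  final case class MaxRows(rows: Long) extends Limit
  final case class Composite(limits: Seq[Limit]) extends Limit

  // What the source should do for the next batch.
  sealed trait Decision
  case object ReadToLatest extends Decision
  case object DelayBatch extends Decision
  final case class CapAt(rows: Long) extends Decision

  def decide(limit: Limit, available: Long): Decision = {
    // Flatten a composite limit so that every case goes through one code path.
    val limits: Seq[Limit] = limit match {
      case Composite(ls) => ls
      case single => Seq(single)
    }
    if (limits.contains(AllAvailable)) {
      ReadToLatest // reading everything has the highest priority
    } else {
      val lower = limits.collectFirst { case m: MinRows => m }
      val upper = limits.collectFirst { case m: MaxRows => m }
      lower
        .filter(m => available < m.rows)    // too few rows: delay the batch
        .map[Decision](_ => DelayBatch)
        .orElse(upper.map(m => CapAt(m.rows))) // otherwise apply the upper cap, if any
        .getOrElse(ReadToLatest)
    }
  }
}
```

Adding a new limit type then means adding one more `collectFirst` and one more step in the chain, without duplicating the branches for the composite case.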





[GitHub] [spark] SparkQA commented on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when HiveClientImpl.state close

2021-05-31 Thread GitBox



SparkQA commented on pull request #32693:
URL: https://github.com/apache/spark/pull/32693#issuecomment-851785773


   **[Test build #139130 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139130/testReport)**
 for PR 32693 at commit 
[`698bea5`](https://github.com/apache/spark/commit/698bea5d49986f955c0736bff59ceb0c7c6051e8).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-31 Thread GitBox



AmplabJenkins removed a comment on pull request #32658:
URL: https://github.com/apache/spark/pull/32658#issuecomment-851784991


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43646/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite

2021-05-31 Thread GitBox



AmplabJenkins commented on pull request #32719:
URL: https://github.com/apache/spark/pull/32719#issuecomment-851784992


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43649/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-31 Thread GitBox



AmplabJenkins commented on pull request #32658:
URL: https://github.com/apache/spark/pull/32658#issuecomment-851784991


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43646/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite

2021-05-31 Thread GitBox



AmplabJenkins removed a comment on pull request #32719:
URL: https://github.com/apache/spark/pull/32719#issuecomment-851784992


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43649/
   



[GitHub] [spark] SparkQA commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser

2021-05-31 Thread GitBox



SparkQA commented on pull request #32506:
URL: https://github.com/apache/spark/pull/32506#issuecomment-851784737


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43648/
   



[GitHub] [spark] SparkQA commented on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite

2021-05-31 Thread GitBox



SparkQA commented on pull request #32719:
URL: https://github.com/apache/spark/pull/32719#issuecomment-851784608


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43649/
   



[GitHub] [spark] SparkQA commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions

2021-05-31 Thread GitBox



SparkQA commented on pull request #32718:
URL: https://github.com/apache/spark/pull/32718#issuecomment-851782661


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43647/
   



[GitHub] [spark] gengliangwang closed pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox



gengliangwang closed pull request #32686:
URL: https://github.com/apache/spark/pull/32686


   



[GitHub] [spark] gengliangwang commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox



gengliangwang commented on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851781327


   Thanks, merging to master



[GitHub] [spark] HyukjinKwon commented on a change in pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions

2021-05-31 Thread GitBox



HyukjinKwon commented on a change in pull request #32718:
URL: https://github.com/apache/spark/pull/32718#discussion_r642758972



##
File path: sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala
##
@@ -40,6 +40,12 @@ class MiscFunctionsSuite extends QueryTest with 
SharedSparkSession {
   Row(SPARK_VERSION_SHORT + " " + SPARK_REVISION))
 assert(df.schema.fieldNames === Seq("version()"))
   }
+
+  test("get current_user and session_user in normal spark apps") {

Review comment:
   shall we add the JIRA prefix?





[GitHub] [spark] HyukjinKwon commented on pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+

2021-05-31 Thread GitBox



HyukjinKwon commented on pull request #32709:
URL: https://github.com/apache/spark/pull/32709#issuecomment-851778790


   The CRAN failure was an issue with my environment. The tests and the CRAN 
check should now work with R 4.1+ too.



[GitHub] [spark] yaooqinn commented on a change in pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only

2021-05-31 Thread GitBox



yaooqinn commented on a change in pull request #32714:
URL: https://github.com/apache/spark/pull/32714#discussion_r642757369



##
File path: docs/sql-migration-guide.md
##
@@ -91,6 +91,8 @@ license: |
 
   - In Spark 3.2, `CREATE TABLE AS SELECT` with non-empty `LOCATION` will 
throw `AnalysisException`. To restore the behavior before Spark 3.2, you can 
set `spark.sql.legacy.allowNonEmptyLocationInCTAS` to `true`.
 
+  - In Spark 3.2, the special datetime values such as `epoch`, `today`, 
`yesterday`, `tomorrow` and `now` are supported in typed literals only, for 
instance `select timestamp'now'`. In Spark 3.1 and earlier, such special values 
are supported in any casts of strings to dates/timestamps. To restore the 
behavior before Spark 3.2, you should preprocess string columns and convert the 
strings to desired timestamps explicitly using UDF for instance.

Review comment:
   Suggested wording: "In Spark 3.2, ~the~ special datetime values ... in typed 
literals only, for instance, `select timestamp'now'`" (drop "the" and add a 
comma after "for instance"). Also, should "In Spark 3.1 and ~earlier~" be 
"In Spark 3.1 and 3.0"?
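
As a rough illustration of the explicit preprocessing that the migration note under review recommends, the conversion could look like the plain-Scala sketch below. `SpecialDatetime` and `resolve` are hypothetical names, not Spark APIs, and the UTC default zone is an assumption; in a real job the function would be wrapped in a UDF.

```scala
import java.time.{Instant, LocalDate, ZoneId, ZoneOffset}

// Hypothetical helper (not part of Spark): maps the special strings named in
// the migration note to concrete instants, relative to a supplied "now".
object SpecialDatetime {
  def resolve(s: String, now: Instant, zone: ZoneId = ZoneOffset.UTC): Option[Instant] = {
    val today: LocalDate = now.atZone(zone).toLocalDate
    def startOf(d: LocalDate): Instant = d.atStartOfDay(zone).toInstant
    s.trim.toLowerCase match {
      case "epoch"     => Some(Instant.EPOCH)
      case "now"       => Some(now)
      case "today"     => Some(startOf(today))
      case "yesterday" => Some(startOf(today.minusDays(1)))
      case "tomorrow"  => Some(startOf(today.plusDays(1)))
      case _           => None // not a special value; parse as a regular timestamp elsewhere
    }
  }
}
```

Returning `None` for non-special input keeps the regular timestamp parsing path separate from the special-value handling.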





[GitHub] [spark] ulysses-you commented on a change in pull request #32602: [SPARK-35455][SQL] Unify empty relation optimization between normal and AQE optimizer

2021-05-31 Thread GitBox



ulysses-you commented on a change in pull request #32602:
URL: https://github.com/apache/spark/pull/32602#discussion_r642757227



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala
##
@@ -27,7 +28,9 @@ import org.apache.spark.util.Utils
  */
 class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] {
   private val defaultBatches = Seq(
-Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin),
+Batch("Propagate Empty Relations", Once,
+  AQEPropagateEmptyRelation,
+  UpdateAttributeNullability),

Review comment:
   ah I see, will do this.





[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-31 Thread GitBox



SparkQA commented on pull request #32658:
URL: https://github.com/apache/spark/pull/32658#issuecomment-851775047


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43646/
   



[GitHub] [spark] HyukjinKwon commented on pull request #32719: [SPARK-34059][TESTS]Increase the timeout in FallbackStorageSuite

2021-05-31 Thread GitBox



HyukjinKwon commented on pull request #32719:
URL: https://github.com/apache/spark/pull/32719#issuecomment-851771579


   seems like the JIRA number is wrong in the title



[GitHub] [spark] SparkQA commented on pull request #32719: [SPARK-34059][TESTS]Increase the timeout in FallbackStorageSuite

2021-05-31 Thread GitBox



SparkQA commented on pull request #32719:
URL: https://github.com/apache/spark/pull/32719#issuecomment-851769807


   **[Test build #139129 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139129/testReport)**
 for PR 32719 at commit 
[`941ee9c`](https://github.com/apache/spark/commit/941ee9c1d04f9951598ed8bfb93b5bdaa2819e18).



[GitHub] [spark] Yikun opened a new pull request #32719: [SPARK-34059][TESTS]Increase the timeout in FallbackStorageSuite

2021-05-31 Thread GitBox



Yikun opened a new pull request #32719:
URL: https://github.com/apache/spark/pull/32719


   ### What changes were proposed in this pull request?
   ```
   - Upload multi stages *** FAILED ***
   {{ The code passed to eventually never returned normally. Attempted 20 times 
over 10.011176743 seconds. Last failure message: fallbackStorage.exists(0, 
file) was false. (FallbackStorageSuite.scala:243)}}
   ```
   Errors like the one above occur randomly on aarch64 and also in GitHub 
Actions tests [1][2].
   
   [1] https://github.com/apache/spark/actions/runs/489319612
   [2] https://github.com/apache/spark/actions/runs/479317320
   
   ### Why are the changes needed?
   The timeout is too short and needs to be increased so that the test case can complete.
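
To illustrate why the timeout matters, here is a minimal plain-Scala sketch of the retry-until-deadline pattern that the failure message above describes. `Retry.eventually` and its parameters are hypothetical stand-ins, not ScalaTest's API; the sketch only shows that a condition which becomes true after the deadline fails the check, while a longer timeout lets it pass.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._

object Retry {
  // Hypothetical helper: re-evaluates `condition` every `interval` until it
  // holds (Right with the attempt count) or `timeout` has elapsed (Left).
  def eventually(timeout: FiniteDuration, interval: FiniteDuration)(
      condition: => Boolean): Either[String, Int] = {
    val deadline = timeout.fromNow

    @tailrec
    def loop(attempt: Int): Either[String, Int] =
      if (condition) Right(attempt)
      else if (deadline.isOverdue()) Left(s"never returned normally; attempted $attempt times")
      else {
        Thread.sleep(interval.toMillis)
        loop(attempt + 1)
      }

    loop(1)
  }
}
```

With a condition that only becomes true after some delay, a timeout shorter than that delay yields `Left`, and a sufficiently longer one yields `Right`.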
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   build/mvn test -Dtest=none -DwildcardSuites=org.apache.spark.storage.FallbackStorageSuite -pl :spark-core_2.12



[GitHub] [spark] cloud-fan commented on a change in pull request #32602: [SPARK-35455][SQL] Unify empty relation optimization between normal and AQE optimizer

2021-05-31 Thread GitBox



cloud-fan commented on a change in pull request #32602:
URL: https://github.com/apache/spark/pull/32602#discussion_r642749109



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala
##
@@ -27,7 +28,9 @@ import org.apache.spark.util.Utils
  */
 class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] {
   private val defaultBatches = Seq(
-Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin),
+Batch("Propagate Empty Relations", Once,
+  AQEPropagateEmptyRelation,
+  UpdateAttributeNullability),

Review comment:
   It's a bit different:
   ```
   Project
     Shuffle Stage
   ```
   For the above case, we don't want to optimize it, as the benefit is too small 
(removing a shuffle stage may cause a regression).
   
   ```
   Project
     Sort
       Shuffle Stage
   ```
   For the above case, we will optimize Sort -> Shuffle Stage to an empty relation 
first. Then it makes sense to optimize further and optimize out the Project, as the 
shuffle stage is already gone.
   
   So adding `ConvertToLocalRelation` looks like the best solution here.
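
The distinction drawn in this comment can be sketched with a toy plan tree. This is a simplified model, not Spark's `LogicalPlan` or the actual `AQEPropagateEmptyRelation` rule: the node types and `propagateEmpty` are hypothetical, and the sketch assumes the shuffle stage has already materialized with zero rows.

```scala
// Toy plan nodes (hypothetical; not Spark classes).
sealed trait Plan
case object EmptyRelation extends Plan
final case class ShuffleStage(rowCount: Long) extends Plan
final case class Sort(child: Plan) extends Plan
final case class Project(child: Plan) extends Plan

object ToyOptimizer {
  // Bottom-up rewrite: a Sort over an empty stage becomes an empty relation,
  // and a Project is dropped only when its child is already an empty relation.
  def propagateEmpty(plan: Plan): Plan = plan match {
    case Sort(child) =>
      propagateEmpty(child) match {
        case EmptyRelation | ShuffleStage(0L) => EmptyRelation
        case other                            => Sort(other)
      }
    case Project(child) =>
      propagateEmpty(child) match {
        case EmptyRelation => EmptyRelation
        case other         => Project(other) // Project directly over a stage is left as-is
      }
    case leaf => leaf
  }
}
```

In this model, `Project(ShuffleStage(0L))` is returned unchanged, while `Project(Sort(ShuffleStage(0L)))` collapses to `EmptyRelation`.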





[GitHub] [spark] SparkQA commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser

2021-05-31 Thread GitBox



SparkQA commented on pull request #32506:
URL: https://github.com/apache/spark/pull/32506#issuecomment-851767556


   **[Test build #139128 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139128/testReport)**
 for PR 32506 at commit 
[`a361275`](https://github.com/apache/spark/commit/a36127512f4f5eadd9f0b9c9f9b0c3ef90b155e3).



[GitHub] [spark] SparkQA commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions

2021-05-31 Thread GitBox



SparkQA commented on pull request #32718:
URL: https://github.com/apache/spark/pull/32718#issuecomment-851767500


   **[Test build #139127 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139127/testReport)**
 for PR 32718 at commit 
[`ae337c1`](https://github.com/apache/spark/commit/ae337c13b7648c2011976eb8bef4fd8e67fcf44d).



[GitHub] [spark] cloud-fan commented on a change in pull request #32602: [SPARK-35455][SQL] Unify empty relation optimization between normal and AQE optimizer

2021-05-31 Thread GitBox



cloud-fan commented on a change in pull request #32602:
URL: https://github.com/apache/spark/pull/32602#discussion_r642749109



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala
##
@@ -27,7 +28,9 @@ import org.apache.spark.util.Utils
  */
 class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] {
   private val defaultBatches = Seq(
-Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin),
+Batch("Propagate Empty Relations", Once,
+  AQEPropagateEmptyRelation,
+  UpdateAttributeNullability),

Review comment:
   It's a bit different:
   ```
   Project
     Shuffle Stage
   ```
   For the above case, we don't want to optimize it, as the benefit is too small.
   
   ```
   Project
     Sort
       Shuffle Stage
   ```
   For the above case, we will optimize Sort -> Shuffle Stage to an empty relation 
first. Then it makes sense to optimize further and optimize out the Project, as the 
shuffle stage is already gone.
   
   So adding `ConvertToLocalRelation` looks like the best solution here.





[GitHub] [spark] yaooqinn commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions

2021-05-31 Thread GitBox



yaooqinn commented on pull request #32718:
URL: https://github.com/apache/spark/pull/32718#issuecomment-851766836


   cc @cloud-fan @wangyum @maropu thanks very much



[GitHub] [spark] yaooqinn opened a new pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions

2021-05-31 Thread GitBox



yaooqinn opened a new pull request #32718:
URL: https://github.com/apache/spark/pull/32718


   ### What changes were proposed in this pull request?
   
   Currently, we do not have a suitable definition of the `user` concept in 
Spark. We only have an application-wide `sparkUser`, and do not support identifying 
or retrieving the user information from a session in STS or during runtime query 
execution.
   
   These SQL functions are very popular and are supported by plenty of other 
databases, modern and old-school alike, and adding them also helps with compliance.
   
   This PR adds `current_user()` and `session_user()` as SQL functions. The two 
are equivalent. In this PR, we add these functions without ambiguity:
   1. For a normal single-threaded Spark application, the `sparkUser` 
is always equivalent to `current_user()` and `session_user()`.
   2. For a multi-threaded Spark application, e.g. the Spark Thrift Server, we use 
a `ThreadLocal` variable to store the client-side user (after authentication) 
before running the query and retrieve it in the parser.
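
The second point can be sketched in plain Scala as follows. `CurrentUserContext`, `withUser`, and the fallback to the `user.name` system property are hypothetical illustrations of the thread-local pattern described above, not the code in this PR.

```scala
object CurrentUserContext {
  // Assumption: stands in for the application-wide sparkUser.
  private val appUser: String = sys.props.getOrElse("user.name", "unknown")

  // Per-thread slot for the authenticated client-side user, empty by default.
  private val sessionUser = new ThreadLocal[Option[String]] {
    override def initialValue(): Option[String] = None
  }

  // Runs `body` with `user` bound on the current thread, restoring the
  // previous binding afterwards even if `body` throws.
  def withUser[T](user: String)(body: => T): T = {
    val previous = sessionUser.get()
    sessionUser.set(Some(user))
    try body
    finally sessionUser.set(previous)
  }

  // What a current_user()-style lookup would return on this thread.
  def currentUser: String = sessionUser.get().getOrElse(appUser)
}
```

Because the binding is per thread, concurrent sessions served by different threads see their own user, and a thread with no binding falls back to the application-wide value.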
   
   ### Why are the changes needed?
   
   These SQL functions are very popular and are supported by plenty of other 
databases, modern and old-school alike, and adding them also helps with compliance.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, this adds `current_user()` and `session_user()` as SQL functions.
   ### How was this patch tested?
   
   
   new tests in thrift server and sql/catalyst



[GitHub] [spark] ulysses-you commented on a change in pull request #32602: [SPARK-35455][SQL] Unify empty relation optimization between normal and AQE optimizer

2021-05-31 Thread GitBox



ulysses-you commented on a change in pull request #32602:
URL: https://github.com/apache/spark/pull/32602#discussion_r642747242



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala
##
@@ -27,7 +28,9 @@ import org.apache.spark.util.Utils
  */
 class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] {
   private val defaultBatches = Seq(
-Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin),
+Batch("Propagate Empty Relations", Once,
+  AQEPropagateEmptyRelation,
+  UpdateAttributeNullability),

Review comment:
   Yeah, I noticed it. We can add it so that we can propagate empty relations through 
`project/filter`, as in this case:
   ```
   Aggregate
     Project
       Join
         Shuffle
   ```
   But it needs to isolate the normal and AQE optimizers due to `transformWithPruning`.
   
   On the other hand, I feel it is similar if we just let 
`AQEPropagateEmptyRelation` support propagating through `project/filter`, and the 
latter is simpler.





[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-31 Thread GitBox



SparkQA commented on pull request #32658:
URL: https://github.com/apache/spark/pull/32658#issuecomment-851763525


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43646/
   



[GitHub] [spark] HyukjinKwon commented on pull request #32715: [SPARK-35577][TESTS] Allow to log container output for docker integration tests

2021-05-31 Thread GitBox



HyukjinKwon commented on pull request #32715:
URL: https://github.com/apache/spark/pull/32715#issuecomment-851751136


   Looks fine. cc @maropu



[GitHub] [spark] HyukjinKwon closed pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-31 Thread GitBox



HyukjinKwon closed pull request #32658:
URL: https://github.com/apache/spark/pull/32658


   



[GitHub] [spark] HyukjinKwon commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-31 Thread GitBox



HyukjinKwon commented on pull request #32658:
URL: https://github.com/apache/spark/pull/32658#issuecomment-851750789


   Merged to master.



[GitHub] [spark] HyukjinKwon commented on pull request #32558: [SPARK-34953][CORE][SQL] Add the code change for adding the DateType in the infer schema while reading in CSV and JSON

2021-05-31 Thread GitBox



HyukjinKwon commented on pull request #32558:
URL: https://github.com/apache/spark/pull/32558#issuecomment-851750660


   Oh, I meant this: 
https://github.com/apache/spark/blob/master/python/pyspark/sql/readwriter.py#L342-L350
   These options are listed as parameters on the Python side specifically. The 
CSV documentation change was merged in https://github.com/apache/spark/pull/32658, so 
you could add the option to that page.
   



[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page

2021-05-31 Thread GitBox



SparkQA commented on pull request #32658:
URL: https://github.com/apache/spark/pull/32658#issuecomment-851749314


   **[Test build #139126 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139126/testReport)**
 for PR 32658 at commit 
[`f55a2fa`](https://github.com/apache/spark/commit/f55a2fa22efd4ac7611d0483b82dd73596bccce7).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox



AmplabJenkins removed a comment on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851748863


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43645/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox



AmplabJenkins commented on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851748863


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43645/
   



[GitHub] [spark] HyukjinKwon closed pull request #32716: [SPARK-35578][SQL][TEST] Add a test case for a bug in janino

2021-05-31 Thread GitBox



HyukjinKwon closed pull request #32716:
URL: https://github.com/apache/spark/pull/32716


   



[GitHub] [spark] HyukjinKwon commented on pull request #32716: [SPARK-35578][SQL][TEST] Add a test case for a bug in janino

2021-05-31 Thread GitBox



HyukjinKwon commented on pull request #32716:
URL: https://github.com/apache/spark/pull/32716#issuecomment-851748664


   Merged to master.



[GitHub] [spark] HyukjinKwon commented on pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+

2021-05-31 Thread GitBox



HyukjinKwon commented on pull request #32709:
URL: https://github.com/apache/spark/pull/32709#issuecomment-851745373


   I have backported it to branch-3.1 and branch-3.0 too because this is a 
test-only change, and in case other people run the tests with newer R versions.



[GitHub] [spark] HyukjinKwon closed pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+

2021-05-31 Thread GitBox



HyukjinKwon closed pull request #32709:
URL: https://github.com/apache/spark/pull/32709


   



[GitHub] [spark] HyukjinKwon commented on pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+

2021-05-31 Thread GitBox



HyukjinKwon commented on pull request #32709:
URL: https://github.com/apache/spark/pull/32709#issuecomment-851744847







[GitHub] [spark] HyukjinKwon closed pull request #32674: [SPARK-35453][PYTHON] Move Koalas accessor to pandas_on_spark accessor

2021-05-31 Thread GitBox



HyukjinKwon closed pull request #32674:
URL: https://github.com/apache/spark/pull/32674


   



[GitHub] [spark] HyukjinKwon commented on pull request #32674: [SPARK-35453][PYTHON] Move Koalas accessor to pandas_on_spark accessor

2021-05-31 Thread GitBox



HyukjinKwon commented on pull request #32674:
URL: https://github.com/apache/spark/pull/32674#issuecomment-851744212


   Merged to master.



[GitHub] [spark] SparkQA commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

2021-05-31 Thread GitBox



SparkQA commented on pull request #32686:
URL: https://github.com/apache/spark/pull/32686#issuecomment-851743806


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43645/
   


