[GitHub] [spark] HyukjinKwon commented on a change in pull request #32725: [SPARK-33933][FOLLOW-UP][SQL] Fix a flaky test case in AdaptiveQueryExecSuite
HyukjinKwon commented on a change in pull request #32725: URL: https://github.com/apache/spark/pull/32725#discussion_r642801775

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala ##
@@ -1605,7 +1605,7 @@ class AdaptiveQueryExecSuite
 }
 test("SPARK-33933: Materialize BroadcastQueryStage first in AQE") {
-val testAppender = new LogAppender("aqe query stage materialization order test")
+val testAppender = new LogAppender("aqe query stage materialization order test", 1)

Review comment:
cc Max @MaxGekk since this max is from Max (https://github.com/apache/spark/commit/88fc8dbc09c5d24ae89413ab1e1fbabdf1fd8028#diff-1c74a76903c7da8f8424992b46b2f99157609726bc580d60e2d0858ea11c2aecR197) :D

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun opened a new pull request #32727: [SPARK-35589][CORE] BlockManagerMasterEndpoint should not ignore index-only shuffle file during updating
dongjoon-hyun opened a new pull request #32727: URL: https://github.com/apache/spark/pull/32727

…

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
[GitHub] [spark] SparkQA commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page
SparkQA commented on pull request #32723: URL: https://github.com/apache/spark/pull/32723#issuecomment-851839313 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43654/
[GitHub] [spark] SparkQA commented on pull request #32725: [SPARK-33933][FollowUp][SQL] Fix a flaky test case in AdaptiveQueryExecSuite
SparkQA commented on pull request #32725: URL: https://github.com/apache/spark/pull/32725#issuecomment-851839028 **[Test build #139137 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139137/testReport)** for PR 32725 at commit [`f890295`](https://github.com/apache/spark/commit/f890295c38f5c85d8b2c145740279af96e1bfae7).
[GitHub] [spark] SparkQA commented on pull request #32726: [SPARK-35587][PYTHON][DOCS] Initial porting of Koalas documentation
SparkQA commented on pull request #32726: URL: https://github.com/apache/spark/pull/32726#issuecomment-851838979 **[Test build #139136 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139136/testReport)** for PR 32726 at commit [`262c5e1`](https://github.com/apache/spark/commit/262c5e1b68f229233c0867c003f14549a6cea6a1).
[GitHub] [spark] SparkQA commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter
SparkQA commented on pull request #32724: URL: https://github.com/apache/spark/pull/32724#issuecomment-851839022 **[Test build #139138 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139138/testReport)** for PR 32724 at commit [`42a217e`](https://github.com/apache/spark/commit/42a217e7c97afc2b74016b9432843ea964f44c7d).
[GitHub] [spark] HyukjinKwon commented on pull request #32726: [SPARK-35587][PYTHON][DOCS] Initial porting of Koalas documentation
HyukjinKwon commented on pull request #32726: URL: https://github.com/apache/spark/pull/32726#issuecomment-851837787 cc @ueshin @xinrong-databricks @itholic FYI
[GitHub] [spark] HyukjinKwon opened a new pull request #32726: [SPARK-35587][PYTHON][DOCS] Initial porting of Koalas documentation
HyukjinKwon opened a new pull request #32726: URL: https://github.com/apache/spark/pull/32726

### What changes were proposed in this pull request?
This PR proposes to port the Koalas documentation to the PySpark documentation as an initial step. It is ported almost as is, except that the import was renamed from `databricks.koalas` to `pyspark.pandas`.

### Why are the changes needed?
To document pandas APIs on Spark.

### Does this PR introduce _any_ user-facing change?
Yes, it adds new documentation.

### How was this patch tested?
Manually built the docs and checked the output.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules
AmplabJenkins removed a comment on pull request #32721: URL: https://github.com/apache/spark/pull/32721#issuecomment-851836879 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43653/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite
AmplabJenkins removed a comment on pull request #32719: URL: https://github.com/apache/spark/pull/32719#issuecomment-851836878 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139129/
[GitHub] [spark] AmplabJenkins commented on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite
AmplabJenkins commented on pull request #32719: URL: https://github.com/apache/spark/pull/32719#issuecomment-851836878 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139129/
[GitHub] [spark] AmplabJenkins commented on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules
AmplabJenkins commented on pull request #32721: URL: https://github.com/apache/spark/pull/32721#issuecomment-851836879 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43653/
[GitHub] [spark] gengliangwang opened a new pull request #32725: [SPARK-33933][FollowUp][SQL]
gengliangwang opened a new pull request #32725: URL: https://github.com/apache/spark/pull/32725

### What changes were proposed in this pull request?
Fix a flaky test case in AdaptiveQueryExecSuite

### Why are the changes needed?
The test case becomes flaky since there are too many debug logs:
https://github.com/Yikun/spark/runs/2715222392?check_suite_focus=true
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139125/testReport/

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test
[GitHub] [spark] MaxGekk commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser
MaxGekk commented on pull request #32506: URL: https://github.com/apache/spark/pull/32506#issuecomment-851835967 cc @HyukjinKwon
[GitHub] [spark] ulysses-you commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter
ulysses-you commented on pull request #32724: URL: https://github.com/apache/spark/pull/32724#issuecomment-851834150 cc @maropu @cloud-fan @yaooqinn
[GitHub] [spark] ulysses-you commented on a change in pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter
ulysses-you commented on a change in pull request #32724: URL: https://github.com/apache/spark/pull/32724#discussion_r642797088

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ##
@@ -1711,7 +1711,7 @@ object DecimalAggregates extends Rule[LogicalPlan] {
  * Converts local operations (i.e. ones that don't require data exchange) on `LocalRelation` to
  * another `LocalRelation`.
  */
-object ConvertToLocalRelation extends Rule[LogicalPlan] {
+trait ConvertToLocalRelationBase extends Rule[LogicalPlan] {

Review comment:
this is for isolation between normal and AQE optimizer
[GitHub] [spark] SparkQA commented on pull request #32722: [SPARK-35586][[K8S][TESTS] Set a default value for spark.kubernetes.test.sparkTgz in pom.xml for Kubernetes integration tests
SparkQA commented on pull request #32722: URL: https://github.com/apache/spark/pull/32722#issuecomment-851832519 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43652/
[GitHub] [spark] yaooqinn commented on a change in pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions
yaooqinn commented on a change in pull request #32718: URL: https://github.com/apache/spark/pull/32718#discussion_r642796052

## File path: sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala ##
@@ -40,6 +40,12 @@ class MiscFunctionsSuite extends QueryTest with SharedSparkSession {
 Row(SPARK_VERSION_SHORT + " " + SPARK_REVISION))
 assert(df.schema.fieldNames === Seq("version()"))
 }
+
+ test("get current_user and session_user in normal spark apps") {

Review comment:
my bad, I forgot it
[GitHub] [spark] SparkQA commented on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules
SparkQA commented on pull request #32721: URL: https://github.com/apache/spark/pull/32721#issuecomment-851830047 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43653/
[GitHub] [spark] MaxGekk commented on a change in pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only
MaxGekk commented on a change in pull request #32714: URL: https://github.com/apache/spark/pull/32714#discussion_r642795083

## File path: docs/sql-migration-guide.md ##
@@ -91,6 +91,8 @@ license: |
 - In Spark 3.2, `CREATE TABLE AS SELECT` with non-empty `LOCATION` will throw `AnalysisException`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.allowNonEmptyLocationInCTAS` to `true`.
+ - In Spark 3.2, the special datetime values such as `epoch`, `today`, `yesterday`, `tomorrow` and `now` are supported in typed literals only, for instance `select timestamp'now'`. In Spark 3.1 and earlier, such special values are supported in any casts of strings to dates/timestamps. To restore the behavior before Spark 3.2, you should preprocess string columns and convert the strings to desired timestamps explicitly using UDF for instance.

Review comment:
@yaooqinn What do you mean by: for instance **(add',')** `select timestamp'now'`. I didn't get the problem. BTW, you could use the suggestion feature, so, I would just commit your suggestions.
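The migration note quoted in this diff suggests preprocessing string columns with a UDF to keep the pre-3.2 behavior. A minimal sketch of such a preprocessing function in plain Python is given here. The name `resolve_special_datetime`, the explicit `now` parameter, and the choice of UTC for `epoch` are assumptions made for illustration; the mapping is a simplified reading of the keywords listed in the note, not Spark's own implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def resolve_special_datetime(value: str, now: datetime) -> Optional[datetime]:
    """Map a special datetime keyword to a concrete timestamp.

    Hypothetical helper: returns None for strings that are not one of the
    special keywords, so the caller can fall back to regular parsing.
    """
    key = value.strip().lower()
    # Midnight of the day that `now` falls on, in the timezone of `now`.
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if key == "epoch":
        return datetime(1970, 1, 1, tzinfo=timezone.utc)
    if key == "now":
        return now
    if key == "today":
        return midnight
    if key == "yesterday":
        return midnight - timedelta(days=1)
    if key == "tomorrow":
        return midnight + timedelta(days=1)
    return None
```

A function along these lines could be wrapped as a UDF (for example with `pyspark.sql.functions.udf`) and applied to the string column before any cast, with `now` fixed once per query so that all rows see the same instant.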
[GitHub] [spark] ulysses-you opened a new pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter
ulysses-you opened a new pull request #32724: URL: https://github.com/apache/spark/pull/32724

### What changes were proposed in this pull request?
Add rule `ConvertToLocalRelation` into AQE Optimizer.

### Why are the changes needed?
Support propagating an empty local relation through project and filter, as in a plan like this:
```
Aggregate
  Project
    Join
      ShuffleStage
      ShuffleStage
```

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Add test.
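The idea of propagating an empty relation upward through a plan like the one in this description can be illustrated with a toy rewrite in plain Python. Everything here (`Plan`, `EMPTY`, `propagate_empty`, and the assumption that every join is an inner join) is hypothetical and only models the concept; it is not how Spark's optimizer rules are written.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Plan:
    """Toy logical plan node: an operator name plus child plans."""
    op: str
    children: Tuple["Plan", ...] = ()


# Marker node standing in for a relation known to contain no rows.
EMPTY = Plan("EmptyRelation")


def propagate_empty(plan: Plan) -> Plan:
    """Bottom-up rewrite that pushes emptiness toward the root."""
    children = tuple(propagate_empty(c) for c in plan.children)
    # Projecting or filtering zero rows still yields zero rows.
    if plan.op in ("Project", "Filter") and children == (EMPTY,):
        return EMPTY
    # Toy assumption: all joins are inner, so one empty side empties the join.
    if plan.op == "Join" and EMPTY in children:
        return EMPTY
    # Other operators (e.g. Aggregate) are kept: a global aggregate over
    # no rows can still produce a row.
    return Plan(plan.op, children)
```

For the plan shape shown in the description, replacing one `ShuffleStage` with `EMPTY` collapses the `Join` and then the `Project`, leaving `Aggregate` over an empty input.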
[GitHub] [spark] HeartSaVioR commented on a change in pull request #32653: [SPARK-35312][SS] Introduce new Option in Kafka source to specify minimum number of records to read per trigger
HeartSaVioR commented on a change in pull request #32653: URL: https://github.com/apache/spark/pull/32653#discussion_r642792749

## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/CompositeReadLimit.java ##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read.streaming;
+
+import org.apache.spark.annotation.Evolving;
+
+import java.util.Objects;
+
+/**
+ /**
+ * Represents a {@link ReadLimit} where the {@link MicroBatchStream} should scan approximately
+ * given maximum number of rows with at least the given minimum number of rows.
+ *
+ * @see SupportsAdmissionControl#latestOffset(Offset, ReadLimit)
+ * @since 3.1.2
+ */
+@Evolving
+public final class CompositeReadLimit implements ReadLimit {

Review comment:
Looks to be addressed. Thanks :)
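The Javadoc in this hunk describes a limit that combines a minimum and a maximum row count. One plausible reading of that description is sketched here in plain Python. `CompositeLimit` and `rows_to_scan` are hypothetical names, and the policy (skip the batch when fewer than the minimum are available, otherwise cap at the maximum) is an assumption; the sketch also ignores any timeout that might force a batch through while data stays below the minimum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompositeLimit:
    """Hypothetical stand-in for a read limit with a lower and an upper bound."""
    min_rows: int
    max_rows: int


def rows_to_scan(available: int, limit: CompositeLimit) -> int:
    """Return how many rows one micro-batch would read under the toy policy."""
    if available < limit.min_rows:
        # Not enough data yet: read nothing and wait for a later trigger.
        return 0
    # Enough data: read everything available, capped at the maximum.
    return min(available, limit.max_rows)
```

With a minimum of 10 and a maximum of 100, this policy reads nothing when 5 rows are available, all 50 when 50 are available, and 100 when 500 are available.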
[GitHub] [spark] SparkQA removed a comment on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite
SparkQA removed a comment on pull request #32719: URL: https://github.com/apache/spark/pull/32719#issuecomment-851769807 **[Test build #139129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139129/testReport)** for PR 32719 at commit [`941ee9c`](https://github.com/apache/spark/commit/941ee9c1d04f9951598ed8bfb93b5bdaa2819e18).
[GitHub] [spark] SparkQA commented on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite
SparkQA commented on pull request #32719: URL: https://github.com/apache/spark/pull/32719#issuecomment-851821942

**[Test build #139129 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139129/testReport)** for PR 32719 at commit [`941ee9c`](https://github.com/apache/spark/commit/941ee9c1d04f9951598ed8bfb93b5bdaa2819e18).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page
HyukjinKwon commented on a change in pull request #32723: URL: https://github.com/apache/spark/pull/32723#discussion_r642788618

## File path: python/pyspark/sql/readwriter.py ##
@@ -627,8 +627,6 @@ def jdbc(self, url, table, column=None, lowerBound=None, upperBound=None, numPar
 Parameters
 --
-url : str
-a JDBC URL of the form ``jdbc:subprotocol:subname``
 table : str
 the name of the table
 column : str, optional

Review comment:
I think we can remove `lowerBound`, `upperBound`, and `numPartitions`. And, fix the description of `column` to something like: Alias of `partitionColumn` option. Refer to `partitionColumn` in `Data Source Option <...>`_ in the version you use.
[GitHub] [spark] SparkQA commented on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules
SparkQA commented on pull request #32721: URL: https://github.com/apache/spark/pull/32721#issuecomment-851806715 **[Test build #139135 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139135/testReport)** for PR 32721 at commit [`864ee6f`](https://github.com/apache/spark/commit/864ee6fe62400c73ed973c63e67576075c89fa4a).
[GitHub] [spark] SparkQA removed a comment on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules
SparkQA removed a comment on pull request #32721: URL: https://github.com/apache/spark/pull/32721#issuecomment-851803742 **[Test build #139133 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139133/testReport)** for PR 32721 at commit [`d95d332`](https://github.com/apache/spark/commit/d95d3322b0755f065a76aa094bf384dcaa5dec4c).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules
AmplabJenkins removed a comment on pull request #32721: URL: https://github.com/apache/spark/pull/32721#issuecomment-851806052 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139133/
[GitHub] [spark] AmplabJenkins commented on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules
AmplabJenkins commented on pull request #32721: URL: https://github.com/apache/spark/pull/32721#issuecomment-851806052 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139133/
[GitHub] [spark] SparkQA commented on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules
SparkQA commented on pull request #32721: URL: https://github.com/apache/spark/pull/32721#issuecomment-851806030

**[Test build #139133 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139133/testReport)** for PR 32721 at commit [`d95d332`](https://github.com/apache/spark/commit/d95d3322b0755f065a76aa094bf384dcaa5dec4c).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page
SparkQA commented on pull request #32723: URL: https://github.com/apache/spark/pull/32723#issuecomment-851805108 **[Test build #139134 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139134/testReport)** for PR 32723 at commit [`5853560`](https://github.com/apache/spark/commit/585356099656c40b43a34cd939d66a6a0fdf9305).
[GitHub] [spark] itholic opened a new pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page
itholic opened a new pull request #32723: URL: https://github.com/apache/spark/pull/32723

### What changes were proposed in this pull request?
This PR proposes to move the missing JDBC data source options from Python, Scala and Java into a single page.

### Why are the changes needed?
So far, the documentation for JDBC data source options has been split across separate pages in each language's API documentation. This makes managing the many options inconvenient, so it is more efficient to manage all options in a single page and link to that page from each language's API.

### Does this PR introduce _any_ user-facing change?
Yes, the documents will be shown as below after this change:
- "JDBC To Other Databases" page: https://user-images.githubusercontent.com/44108233/120267176-66505100-c2de-11eb-9a03-df027c27fdd2.png
- Python: https://user-images.githubusercontent.com/44108233/120267196-71a37c80-c2de-11eb-8909-f41cf3ebd470.png
- Scala: https://user-images.githubusercontent.com/44108233/120268675-38204080-c2e1-11eb-94d3-858131799a6b.png
- Java: https://user-images.githubusercontent.com/44108233/120268683-3c4c5e00-c2e1-11eb-9f5a-b95b952bf87a.png

### How was this patch tested?
Manually built the docs and confirmed the page.
[GitHub] [spark] SparkQA commented on pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules
SparkQA commented on pull request #32721: URL: https://github.com/apache/spark/pull/32721#issuecomment-851803742 **[Test build #139133 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139133/testReport)** for PR 32721 at commit [`d95d332`](https://github.com/apache/spark/commit/d95d3322b0755f065a76aa094bf384dcaa5dec4c).
[GitHub] [spark] SparkQA commented on pull request #32722: [SPARK-35586][[K8S][TESTS] Set a default value for spark.kubernetes.test.sparkTgz in pom.xml for Kubernetes integration tests
SparkQA commented on pull request #32722: URL: https://github.com/apache/spark/pull/32722#issuecomment-851803720 **[Test build #139132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139132/testReport)** for PR 32722 at commit [`f2d9f30`](https://github.com/apache/spark/commit/f2d9f30e2f84fcc3fd692daf31934b568134a56c).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions
AmplabJenkins removed a comment on pull request #32718: URL: https://github.com/apache/spark/pull/32718#issuecomment-851802395 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43647/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
AmplabJenkins removed a comment on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851802400 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139125/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command
AmplabJenkins removed a comment on pull request #32720: URL: https://github.com/apache/spark/pull/32720#issuecomment-851802399 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43651/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when HiveClientImpl.state close
AmplabJenkins removed a comment on pull request #32693: URL: https://github.com/apache/spark/pull/32693#issuecomment-851802397 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43650/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser
AmplabJenkins removed a comment on pull request #32506: URL: https://github.com/apache/spark/pull/32506#issuecomment-851802398 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43648/
[GitHub] [spark] AmplabJenkins commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
AmplabJenkins commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851802400 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139125/
[GitHub] [spark] AmplabJenkins commented on pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command
AmplabJenkins commented on pull request #32720: URL: https://github.com/apache/spark/pull/32720#issuecomment-851802399 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43651/
[GitHub] [spark] AmplabJenkins commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions
AmplabJenkins commented on pull request #32718: URL: https://github.com/apache/spark/pull/32718#issuecomment-851802395 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43647/
[GitHub] [spark] AmplabJenkins commented on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when HiveClientImpl.state close
AmplabJenkins commented on pull request #32693: URL: https://github.com/apache/spark/pull/32693#issuecomment-851802397 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43650/
[GitHub] [spark] AmplabJenkins commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser
AmplabJenkins commented on pull request #32506: URL: https://github.com/apache/spark/pull/32506#issuecomment-851802398 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43648/
[GitHub] [spark] viirya commented on pull request #32702: [SPARK-35565][SS] Add config for ignoring metadata directory of FileStreamSink
viirya commented on pull request #32702: URL: https://github.com/apache/spark/pull/32702#issuecomment-851802006 Okay, sounds good. Let me change to using a source option.
[GitHub] [spark] SparkQA commented on pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command
SparkQA commented on pull request #32720: URL: https://github.com/apache/spark/pull/32720#issuecomment-851801961 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43651/
[GitHub] [spark] sarutak opened a new pull request #32722: [SPARK-35586][[K8S][TESTS] Set a default value for spark.kubernetes.test.sparkTgz in pom.xml for Kubernetes integration tests
sarutak opened a new pull request #32722: URL: https://github.com/apache/spark/pull/32722

### What changes were proposed in this pull request?

This PR sets a default value for `spark.kubernetes.test.sparkTgz` in `kubernetes/integration-tests/pom.xml` for the Kubernetes integration tests.

### Why are the changes needed?

In the current master, running the integration tests with the following command fails because no default value is set for the property.

```
build/mvn -Dspark.kubernetes.test.namespace=default -Pkubernetes -Pkubernetes-integration-tests -Psparkr -pl resource-managers/kubernetes/integration-tests integration-test
```

```
+ mkdir -p /home/kou/work/oss/spark/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked
+ tar -xzvf --test-exclude-tags --strip-components=1 -C /home/kou/work/oss/spark/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked
tar (child): --test-exclude-tags: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
[ERROR] Command execution failed.
```

According to `setup-integration-test-env.sh`, `N/A` is intended as the default value, so this PR chooses it.

```
SPARK_TGZ="N/A"
MVN="$TEST_ROOT_DIR/build/mvn"
EXCLUDE_TAGS=""
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The build and tests finish successfully with the command shown above.
[GitHub] [spark] SparkQA commented on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when HiveClientImpl.state close
SparkQA commented on pull request #32693: URL: https://github.com/apache/spark/pull/32693#issuecomment-851797476 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43650/
[GitHub] [spark] sigmod opened a new pull request #32721: [WIP][SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules
sigmod opened a new pull request #32721: URL: https://github.com/apache/spark/pull/32721 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?
[GitHub] [spark] SparkQA commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser
SparkQA commented on pull request #32506: URL: https://github.com/apache/spark/pull/32506#issuecomment-851795183 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43648/
[GitHub] [spark] SparkQA removed a comment on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
SparkQA removed a comment on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851734216 **[Test build #139125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139125/testReport)** for PR 32686 at commit [`8252a6a`](https://github.com/apache/spark/commit/8252a6a93a05c97ed47e3174be76fe1aeb3f6567).
[GitHub] [spark] SparkQA commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
SparkQA commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851794843 **[Test build #139125 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139125/testReport)** for PR 32686 at commit [`8252a6a`](https://github.com/apache/spark/commit/8252a6a93a05c97ed47e3174be76fe1aeb3f6567).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions
SparkQA commented on pull request #32718: URL: https://github.com/apache/spark/pull/32718#issuecomment-851792608 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43647/
[GitHub] [spark] SparkQA commented on pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command
SparkQA commented on pull request #32720: URL: https://github.com/apache/spark/pull/32720#issuecomment-851790068 **[Test build #139131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139131/testReport)** for PR 32720 at commit [`66536fb`](https://github.com/apache/spark/commit/66536fb5b2d8f1499bd4bdb5a9a31435f637bab8).
[GitHub] [spark] gengliangwang commented on pull request #32712: [SPARK-35576][SQL] Redact the sensitive info in the result of Set command
gengliangwang commented on pull request #32712: URL: https://github.com/apache/spark/pull/32712#issuecomment-851789021 @dongjoon-hyun Thanks for merging. I have opened a cherry-pick PR in https://github.com/apache/spark/pull/32720
[GitHub] [spark] gengliangwang opened a new pull request #32720: [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command
gengliangwang opened a new pull request #32720: URL: https://github.com/apache/spark/pull/32720

### What changes were proposed in this pull request?

Currently, the results of the following SQL queries are not redacted:

```
SET [KEY];
SET;
```

For example:

```
scala> spark.sql("set javax.jdo.option.ConnectionPassword=123456").show()
++--+
| key| value|
++--+
|javax.jdo.option|123456|
++--+

scala> spark.sql("set javax.jdo.option.ConnectionPassword").show()
++--+
| key| value|
++--+
|javax.jdo.option|123456|
++--+

scala> spark.sql("set").show()
+++
| key| value|
+++
|javax.jdo.option| 123456|
```

We should hide the sensitive information and redact the query output.

### Why are the changes needed?

Security.

### Does this PR introduce _any_ user-facing change?

Yes, the sensitive information in the output of Set commands is redacted.

### How was this patch tested?

Unit test
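A minimal sketch of the kind of key-based redaction described in the PR above, in the spirit of a regex setting such as `spark.redaction.regex`. The `SetRedactionSketch` object, its `sensitiveKey` pattern, and the `Replacement` text are illustrative assumptions and are not taken from the patch itself.

```scala
object SetRedactionSketch {
  // Hypothetical pattern: keys that look like they carry credentials.
  private val sensitiveKey = "(?i)secret|password|token".r

  // Hypothetical placeholder shown instead of the real value.
  val Replacement = "*********(redacted)"

  /** Replace the value of every entry whose key matches the sensitive pattern. */
  def redact(entries: Seq[(String, String)]): Seq[(String, String)] =
    entries.map { case (key, value) =>
      if (sensitiveKey.findFirstIn(key).isDefined) (key, Replacement) else (key, value)
    }
}
```

Applied to `Seq("javax.jdo.option.ConnectionPassword" -> "123456", "spark.app.name" -> "demo")`, this replaces only the password value and leaves the other entry untouched. Matching on the key rather than the value keeps the check independent of what the secret looks like.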
[GitHub] [spark] viirya commented on pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+
viirya commented on pull request #32709: URL: https://github.com/apache/spark/pull/32709#issuecomment-851788514 Cool! Thanks @HyukjinKwon!
[GitHub] [spark] HeartSaVioR commented on a change in pull request #32653: [SPARK-35312][SS] Introduce new Option in Kafka source to specify minimum number of records to read per trigger
HeartSaVioR commented on a change in pull request #32653: URL: https://github.com/apache/spark/pull/32653#discussion_r642765673 ## File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala ## @@ -139,26 +156,78 @@ private[kafka010] class KafkaSource( override def latestOffset(startOffset: streaming.Offset, limit: ReadLimit): streaming.Offset = { // Make sure initialPartitionOffsets is initialized initialPartitionOffsets - -val latest = kafkaReader.fetchLatestOffsets( - currentPartitionOffsets.orElse(Some(initialPartitionOffsets))) +val currentOffsets = currentPartitionOffsets.orElse(Some(initialPartitionOffsets)) +val latest = kafkaReader.fetchLatestOffsets(currentOffsets) +var skipBatch = false Review comment: Same here as well.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #32653: [SPARK-35312][SS] Introduce new Option in Kafka source to specify minimum number of records to read per trigger
HeartSaVioR commented on a change in pull request #32653: URL: https://github.com/apache/spark/pull/32653#discussion_r642765440

## File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala ##

@@ -95,15 +114,62 @@ private[kafka010] class KafkaMicroBatchStream(
 override def latestOffset(start: Offset, readLimit: ReadLimit): Offset = {
 val startPartitionOffsets = start.asInstanceOf[KafkaSourceOffset].partitionToOffsets
 latestPartitionOffsets = kafkaOffsetReader.fetchLatestOffsets(Some(startPartitionOffsets))
+var skipBatch = false

Review comment: Now I see duplicated code across the branches that handle each type, including CompositeReadLimit, which handles both the lower and the upper limit and therefore repeats the same code. How about changing it like below:

```
val limits: Seq[ReadLimit] = readLimit match {
  case rows: CompositeReadLimit => rows.getReadLimits
  case rows => Seq(rows)
}

val offsets = if (limits.exists(_.isInstanceOf[ReadAllAvailable])) {
  // ReadAllAvailable has the highest priority
  latestPartitionOffsets
} else {
  val lowerLimit = limits.find(_.isInstanceOf[ReadMinRows]).map(_.asInstanceOf[ReadMinRows])
  val upperLimit = limits.find(_.isInstanceOf[ReadMaxRows]).map(_.asInstanceOf[ReadMaxRows])

  lowerLimit.flatMap { limit =>
    // checking if we need to skip batch based on minOffsetPerTrigger criteria
    val skipBatch = delayBatch(
      limit.minRows, latestPartitionOffsets, startPartitionOffsets, limit.maxTriggerDelayMs)
    if (skipBatch) {
      logDebug(
        s"Delaying batch as number of records available is less than minOffsetsPerTrigger")
      Some(startPartitionOffsets)
    } else {
      None
    }
  }.orElse {
    // checking if we need to adjust a range of offsets based on maxOffsetPerTrigger criteria
    upperLimit.map { limit =>
      rateLimit(limit.maxRows(), startPartitionOffsets, latestPartitionOffsets)
    }
  }.getOrElse(latestPartitionOffsets)
}

endPartitionOffsets = KafkaSourceOffset(offsets)
endPartitionOffsets
```

This would require fewer changes when we want to add more read limits in the future.
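The priority order suggested in the review comment above (read-all first, then the lower limit, then the upper limit) can be sketched without any Spark or Kafka dependency. Everything here is a hypothetical simplification: the `Limit` types stand in for Spark's `ReadLimit` classes, offsets are reduced to a single row count, and the trigger-delay check is left out.

```scala
object ReadLimitSketch {
  // Hypothetical, simplified stand-ins for Spark's ReadLimit hierarchy.
  sealed trait Limit
  case object AllAvailable extends Limit
  final case class MinRows(rows: Long) extends Limit
  final case class MaxRows(rows: Long) extends Limit
  final case class Composite(limits: Seq[Limit]) extends Limit

  /** Decide how many of the `available` new rows to read in this batch. */
  def rowsToRead(limit: Limit, available: Long): Long = {
    // Flatten a composite limit so every case goes through one code path.
    val limits: Seq[Limit] = limit match {
      case Composite(ls) => ls
      case single        => Seq(single)
    }
    if (limits.contains(AllAvailable)) {
      available // highest priority: read everything
    } else {
      val lower = limits.collectFirst { case m: MinRows => m }
      val upper = limits.collectFirst { case m: MaxRows => m }
      lower
        .collect { case MinRows(min) if available < min => 0L } // too few rows: skip the batch
        .orElse(upper.map(u => math.min(u.rows, available)))    // otherwise cap at the upper limit
        .getOrElse(available)
    }
  }
}
```

With this shape, a new limit type would only need another `collectFirst` lookup and one more link in the `orElse` chain, rather than a new branch for every combination.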
[GitHub] [spark] SparkQA commented on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when HiveClientImpl.state close
SparkQA commented on pull request #32693: URL: https://github.com/apache/spark/pull/32693#issuecomment-851785773 **[Test build #139130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139130/testReport)** for PR 32693 at commit [`698bea5`](https://github.com/apache/spark/commit/698bea5d49986f955c0736bff59ceb0c7c6051e8).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
AmplabJenkins removed a comment on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851784991 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43646/
[GitHub] [spark] AmplabJenkins commented on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite
AmplabJenkins commented on pull request #32719: URL: https://github.com/apache/spark/pull/32719#issuecomment-851784992 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43649/
[GitHub] [spark] AmplabJenkins commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
AmplabJenkins commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851784991 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43646/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite
AmplabJenkins removed a comment on pull request #32719: URL: https://github.com/apache/spark/pull/32719#issuecomment-851784992 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43649/
[GitHub] [spark] SparkQA commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser
SparkQA commented on pull request #32506: URL: https://github.com/apache/spark/pull/32506#issuecomment-851784737 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43648/
[GitHub] [spark] SparkQA commented on pull request #32719: [SPARK-35584][TESTS] Increase the timeout in FallbackStorageSuite
SparkQA commented on pull request #32719: URL: https://github.com/apache/spark/pull/32719#issuecomment-851784608 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43649/
[GitHub] [spark] SparkQA commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions
SparkQA commented on pull request #32718: URL: https://github.com/apache/spark/pull/32718#issuecomment-851782661 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43647/
[GitHub] [spark] gengliangwang closed pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
gengliangwang closed pull request #32686: URL: https://github.com/apache/spark/pull/32686
[GitHub] [spark] gengliangwang commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
gengliangwang commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851781327 Thanks, merging to master
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions
HyukjinKwon commented on a change in pull request #32718: URL: https://github.com/apache/spark/pull/32718#discussion_r642758972
## File path: sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala ##
@@ -40,6 +40,12 @@ class MiscFunctionsSuite extends QueryTest with SharedSparkSession {
       Row(SPARK_VERSION_SHORT + " " + SPARK_REVISION))
     assert(df.schema.fieldNames === Seq("version()"))
   }
+
+  test("get current_user and session_user in normal spark apps") {
Review comment: shall we add the JIRA prefix?
[GitHub] [spark] HyukjinKwon commented on pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+
HyukjinKwon commented on pull request #32709: URL: https://github.com/apache/spark/pull/32709#issuecomment-851778790 CRAN was my env issue. Now the tests and CRAN check should work with R 4.1+ too.
[GitHub] [spark] yaooqinn commented on a change in pull request #32714: [SPARK-35581][SQL] Support special datetime values in typed literals only
yaooqinn commented on a change in pull request #32714: URL: https://github.com/apache/spark/pull/32714#discussion_r642757369
## File path: docs/sql-migration-guide.md ##
@@ -91,6 +91,8 @@ license: |
   - In Spark 3.2, `CREATE TABLE AS SELECT` with non-empty `LOCATION` will throw `AnalysisException`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.allowNonEmptyLocationInCTAS` to `true`.
+  - In Spark 3.2, the special datetime values such as `epoch`, `today`, `yesterday`, `tomorrow` and `now` are supported in typed literals only, for instance `select timestamp'now'`. In Spark 3.1 and earlier, such special values are supported in any casts of strings to dates/timestamps. To restore the behavior before Spark 3.2, you should preprocess string columns and convert the strings to desired timestamps explicitly using UDF for instance.
Review comment: In Spark 3.2, ~the~ special datetime values. in typed literals only, for instance **(add',')** `select timestamp'now'`. In Spark 3.1 and ~earlier~ (3.0?)
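The migration note quoted above suggests converting special strings to timestamps explicitly. A minimal sketch of such a conversion in plain Python follows; the function name `resolve_special_datetime`, the choice of UTC, and the midnight semantics for `today`, `yesterday` and `tomorrow` are assumptions made for illustration, not something taken from the PR. A function like this could be wrapped in a UDF, but that wiring is left out here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def resolve_special_datetime(value: str, now: Optional[datetime] = None) -> Optional[datetime]:
    """Map a special datetime string to a concrete UTC timestamp.

    Returns None for any string that is not one of the special values,
    so the caller can fall back to ordinary timestamp parsing.
    """
    # Hypothetical helper: `now` is injectable so the result is reproducible.
    now = now or datetime.now(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    special = {
        "epoch": datetime(1970, 1, 1, tzinfo=timezone.utc),
        "now": now,
        "today": midnight,
        "yesterday": midnight - timedelta(days=1),
        "tomorrow": midnight + timedelta(days=1),
    }
    return special.get(value.strip().lower())
```

Passing a fixed `now` keeps the conversion deterministic, which is useful when the same reference time must apply to every row of a column.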
[GitHub] [spark] ulysses-you commented on a change in pull request #32602: [SPARK-35455][SQL] Unify empty relation optimization between normal and AQE optimizer
ulysses-you commented on a change in pull request #32602: URL: https://github.com/apache/spark/pull/32602#discussion_r642757227
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala ##
@@ -27,7 +28,9 @@ import org.apache.spark.util.Utils
  */
 class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] {
   private val defaultBatches = Seq(
-    Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin),
+    Batch("Propagate Empty Relations", Once,
+      AQEPropagateEmptyRelation,
+      UpdateAttributeNullability),
Review comment: ah I see, will do this.
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851775047 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43646/
[GitHub] [spark] HyukjinKwon commented on pull request #32719: [SPARK-34059][TESTS]Increase the timeout in FallbackStorageSuite
HyukjinKwon commented on pull request #32719: URL: https://github.com/apache/spark/pull/32719#issuecomment-851771579 seems like the JIRA number is wrong in the title
[GitHub] [spark] SparkQA commented on pull request #32719: [SPARK-34059][TESTS]Increase the timeout in FallbackStorageSuite
SparkQA commented on pull request #32719: URL: https://github.com/apache/spark/pull/32719#issuecomment-851769807 **[Test build #139129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139129/testReport)** for PR 32719 at commit [`941ee9c`](https://github.com/apache/spark/commit/941ee9c1d04f9951598ed8bfb93b5bdaa2819e18).
[GitHub] [spark] Yikun opened a new pull request #32719: [SPARK-34059][TESTS]Increase the timeout in FallbackStorageSuite
Yikun opened a new pull request #32719: URL: https://github.com/apache/spark/pull/32719
### What changes were proposed in this pull request?
```
- Upload multi stages *** FAILED ***
{{ The code passed to eventually never returned normally. Attempted 20 times over 10.011176743 seconds. Last failure message: fallbackStorage.exists(0, file) was false. (FallbackStorageSuite.scala:243)}}
```
An error like the one above was raised intermittently on aarch64 and also in GitHub Actions tests [1][2].
[1] https://github.com/apache/spark/actions/runs/489319612
[2] https://github.com/apache/spark/actions/runs/479317320
### Why are the changes needed?
The timeout is too short and needs to be increased so that the test case can complete.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
build/mvn test -Dtest=none -DwildcardSuites=org.apache.spark.storage.FallbackStorageSuite -pl :spark-core_2.12
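The failure message above comes from a poll-until-timeout assertion. As a rough illustration of why a longer timeout helps on slow machines, here is a small Python sketch of that pattern. It is not ScalaTest's `eventually`; the names `eventually` and `true_on_call`, and the parameter values, are hypothetical.

```python
import time
from typing import Callable


def eventually(condition: Callable[[], bool], timeout: float, interval: float) -> int:
    """Poll `condition` until it returns True.

    Returns the number of attempts used, or raises TimeoutError once the
    deadline has passed and the condition is still false.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        if condition():
            return attempts
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition still false after {attempts} attempts")
        time.sleep(interval)


def true_on_call(n: int) -> Callable[[], bool]:
    """Build a condition that first becomes True on its n-th call (stand-in for a slow upload)."""
    calls = {"count": 0}

    def condition() -> bool:
        calls["count"] += 1
        return calls["count"] >= n

    return condition
```

With a generous timeout, `eventually(true_on_call(3), timeout=1.0, interval=0.01)` succeeds on the third poll; with a timeout shorter than the time the work needs, the same condition raises `TimeoutError`, which mirrors the intermittent failure in the report.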
[GitHub] [spark] cloud-fan commented on a change in pull request #32602: [SPARK-35455][SQL] Unify empty relation optimization between normal and AQE optimizer
cloud-fan commented on a change in pull request #32602: URL: https://github.com/apache/spark/pull/32602#discussion_r642749109
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala ##
@@ -27,7 +28,9 @@ import org.apache.spark.util.Utils
  */
 class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] {
   private val defaultBatches = Seq(
-    Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin),
+    Batch("Propagate Empty Relations", Once,
+      AQEPropagateEmptyRelation,
+      UpdateAttributeNullability),
Review comment: It's a bit different:
```
Project
Shuffle Stage
```
For the above case, we don't want to optimize it as the benefit is too small (removing a shuffle stage may cause regression).
```
Project
Sort
Shuffle Stage
```
For the above case, we will optimize Sort -> Shuffle Stage to an empty relation first. Then it makes sense to optimize further and optimize out the project, as the shuffle stage is already gone. So adding `ConvertToLocalRelation` looks like the best solution here.
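The two cases in the review comment above can be mimicked with a toy plan rewriter. This Python sketch only encodes the behavior described in the comment; the `Node` class, the `propagate_empty` function and the `row_count` field are hypothetical and do not reflect how `AQEPropagateEmptyRelation` or `ConvertToLocalRelation` are implemented in Spark.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """A single-child plan node in a toy model of a query plan."""
    name: str
    child: Optional["Node"] = None
    row_count: Optional[int] = None  # set only for materialized shuffle stages


EMPTY = Node("EmptyRelation")


def is_empty_stage(node: Node) -> bool:
    return node.name == "ShuffleStage" and node.row_count == 0


def propagate_empty(node: Node) -> Node:
    """Rewrite bottom-up, following the two cases from the review comment."""
    if node.child is None:
        return node
    child = propagate_empty(node.child)
    # Case 2, step 1: Sort over an empty shuffle stage becomes an empty relation.
    if node.name == "Sort" and (child == EMPTY or is_empty_stage(child)):
        return EMPTY
    # Case 2, step 2: once the stage is gone, Project over empty is dropped too.
    if node.name == "Project" and child == EMPTY:
        return EMPTY
    # Case 1: Project directly over a shuffle stage is left untouched.
    return Node(node.name, child, node.row_count)
```

In this model, `Project -> ShuffleStage(0 rows)` is returned unchanged, while `Project -> Sort -> ShuffleStage(0 rows)` collapses to `EmptyRelation`.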
[GitHub] [spark] SparkQA commented on pull request #32506: [SPARK-35374][SQL] Add string-to-number conversion support to JacksonParser
SparkQA commented on pull request #32506: URL: https://github.com/apache/spark/pull/32506#issuecomment-851767556 **[Test build #139128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139128/testReport)** for PR 32506 at commit [`a361275`](https://github.com/apache/spark/commit/a36127512f4f5eadd9f0b9c9f9b0c3ef90b155e3).
[GitHub] [spark] SparkQA commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions
SparkQA commented on pull request #32718: URL: https://github.com/apache/spark/pull/32718#issuecomment-851767500 **[Test build #139127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139127/testReport)** for PR 32718 at commit [`ae337c1`](https://github.com/apache/spark/commit/ae337c13b7648c2011976eb8bef4fd8e67fcf44d).
[GitHub] [spark] cloud-fan commented on a change in pull request #32602: [SPARK-35455][SQL] Unify empty relation optimization between normal and AQE optimizer
cloud-fan commented on a change in pull request #32602: URL: https://github.com/apache/spark/pull/32602#discussion_r642749109
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala ##
@@ -27,7 +28,9 @@ import org.apache.spark.util.Utils
  */
 class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] {
   private val defaultBatches = Seq(
-    Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin),
+    Batch("Propagate Empty Relations", Once,
+      AQEPropagateEmptyRelation,
+      UpdateAttributeNullability),
Review comment: It's a bit different:
```
Project
Shuffle Stage
```
For the above case, we don't want to optimize it as the benefit is too small.
```
Project
Sort
Shuffle Stage
```
For the above case, we will optimize Sort -> Shuffle Stage to an empty relation first. Then it makes sense to optimize further and optimize out the project, as the shuffle stage is already gone. So adding `ConvertToLocalRelation` looks like the best solution here.
[GitHub] [spark] yaooqinn commented on pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions
yaooqinn commented on pull request #32718: URL: https://github.com/apache/spark/pull/32718#issuecomment-851766836 cc @cloud-fan @wangyum @maropu thanks very much
[GitHub] [spark] yaooqinn opened a new pull request #32718: [SPARK-21957][SQL] Support current_user and session_user functions
yaooqinn opened a new pull request #32718: URL: https://github.com/apache/spark/pull/32718
### What changes were proposed in this pull request?
Currently, Spark has no suitable definition of the `user` concept. We only have an app-wide `sparkUser` and cannot identify or retrieve the user information from a session in STS or during runtime query execution. `current_user()` and `session_user()` are very popular SQL functions, supported by plenty of other databases, modern and old-school alike, and are also relevant for compliance. This PR adds `current_user()` and `session_user()` as SQL functions; the two are equivalent. The functions are defined without ambiguity:
1. For a normal single-threaded Spark application, the `sparkUser` is always equivalent to `current_user()` and `session_user()`.
2. For a multi-threaded Spark application, e.g. the Spark thrift server, we use a `ThreadLocal` variable to store the client-side user (after authentication) before running the query and retrieve it in the parser.
### Why are the changes needed?
These SQL functions are very popular, supported by plenty of other databases, modern and old-school alike, and are also relevant for compliance.
### Does this PR introduce _any_ user-facing change?
Yes, it adds `current_user()` and `session_user()` as SQL functions.
### How was this patch tested?
New tests in thrift server and sql/catalyst.
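The thread-local approach described in the PR text above can be illustrated with Python's `threading.local`. This is only a sketch of the general idea, not the PR's Scala code; `APP_USER`, `set_session_user`, `current_user` and `run_as` are hypothetical names, and `APP_USER` merely stands in for the app-wide `sparkUser`.

```python
import threading
from typing import Dict

# Hypothetical stand-in for the application-wide sparkUser.
APP_USER = "spark_app_user"

# Each thread sees its own `user` attribute on this object.
_session = threading.local()


def set_session_user(name: str) -> None:
    """Record the authenticated client user for the current thread only."""
    _session.user = name


def current_user() -> str:
    """Return the per-thread user if one was set, else the app-wide user."""
    return getattr(_session, "user", APP_USER)


def run_as(name: str, results: Dict[str, str]) -> None:
    """Simulate one client session: set the user, then 'run a query'."""
    set_session_user(name)
    results[name] = current_user()
```

Running `run_as` on separate threads for different names gives each thread its own value, while a thread that never calls `set_session_user` falls back to `APP_USER`, which matches the single-threaded case in point 1.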
[GitHub] [spark] ulysses-you commented on a change in pull request #32602: [SPARK-35455][SQL] Unify empty relation optimization between normal and AQE optimizer
ulysses-you commented on a change in pull request #32602: URL: https://github.com/apache/spark/pull/32602#discussion_r642747242
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala ##
@@ -27,7 +28,9 @@ import org.apache.spark.util.Utils
  */
 class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] {
   private val defaultBatches = Seq(
-    Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin),
+    Batch("Propagate Empty Relations", Once,
+      AQEPropagateEmptyRelation,
+      UpdateAttributeNullability),
Review comment: Yeah, I noticed it. We can add it so that we can propagate empty relations through `project/filter`, as in this case:
```
Aggregate
Project
Join
Shuffle
```
But that requires isolating the normal and AQE rules because of `transformWithPruning`. On the other hand, I feel it would be similar if we just let `AQEPropagateEmptyRelation` support propagating through `project/filter`, and the latter is simpler.
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851763525 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43646/
[GitHub] [spark] HyukjinKwon commented on pull request #32715: [SPARK-35577][TESTS] Allow to log container output for docker integration tests
HyukjinKwon commented on pull request #32715: URL: https://github.com/apache/spark/pull/32715#issuecomment-851751136 Looks fine. cc @maropu
[GitHub] [spark] HyukjinKwon closed pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
HyukjinKwon closed pull request #32658: URL: https://github.com/apache/spark/pull/32658
[GitHub] [spark] HyukjinKwon commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
HyukjinKwon commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851750789 Merged to master.
[GitHub] [spark] HyukjinKwon commented on pull request #32558: [SPARK-34953][CORE][SQL] Add the code change for adding the DateType in the infer schema while reading in CSV and JSON
HyukjinKwon commented on pull request #32558: URL: https://github.com/apache/spark/pull/32558#issuecomment-851750660 Oh I meant this: https://github.com/apache/spark/blob/master/python/pyspark/sql/readwriter.py#L342-L350 These options are listed as parameters on the Python side specifically. The CSV documentation was merged in https://github.com/apache/spark/pull/32658 so you could add the option to that page.
[GitHub] [spark] SparkQA commented on pull request #32658: [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page
SparkQA commented on pull request #32658: URL: https://github.com/apache/spark/pull/32658#issuecomment-851749314 **[Test build #139126 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139126/testReport)** for PR 32658 at commit [`f55a2fa`](https://github.com/apache/spark/commit/f55a2fa22efd4ac7611d0483b82dd73596bccce7).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
AmplabJenkins removed a comment on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851748863 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43645/
[GitHub] [spark] AmplabJenkins commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
AmplabJenkins commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851748863 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43645/
[GitHub] [spark] HyukjinKwon closed pull request #32716: [SPARK-35578][SQL][TEST] Add a test case for a bug in janino
HyukjinKwon closed pull request #32716: URL: https://github.com/apache/spark/pull/32716
[GitHub] [spark] HyukjinKwon commented on pull request #32716: [SPARK-35578][SQL][TEST] Add a test case for a bug in janino
HyukjinKwon commented on pull request #32716: URL: https://github.com/apache/spark/pull/32716#issuecomment-851748664 Merged to master.
[GitHub] [spark] HyukjinKwon commented on pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+
HyukjinKwon commented on pull request #32709: URL: https://github.com/apache/spark/pull/32709#issuecomment-851745373 I have backported it to branch-3.1 and branch-3.0 too because this is a test-only change, and in case other people run the tests with newer R versions.
[GitHub] [spark] HyukjinKwon closed pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+
HyukjinKwon closed pull request #32709: URL: https://github.com/apache/spark/pull/32709
[GitHub] [spark] HyukjinKwon commented on pull request #32709: [SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+
HyukjinKwon commented on pull request #32709: URL: https://github.com/apache/spark/pull/32709#issuecomment-851744847
[GitHub] [spark] HyukjinKwon closed pull request #32674: [SPARK-35453][PYTHON] Move Koalas accessor to pandas_on_spark accessor
HyukjinKwon closed pull request #32674: URL: https://github.com/apache/spark/pull/32674
[GitHub] [spark] HyukjinKwon commented on pull request #32674: [SPARK-35453][PYTHON] Move Koalas accessor to pandas_on_spark accessor
HyukjinKwon commented on pull request #32674: URL: https://github.com/apache/spark/pull/32674#issuecomment-851744212 Merged to master.
[GitHub] [spark] SparkQA commented on pull request #32686: [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules
SparkQA commented on pull request #32686: URL: https://github.com/apache/spark/pull/32686#issuecomment-851743806 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43645/