[GitHub] [spark] maropu commented on issue #28306: [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL Reference
maropu commented on issue #28306: URL: https://github.com/apache/spark/pull/28306#issuecomment-618194775 Oops, I see. LGTM. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28306: [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL Reference
AmplabJenkins commented on issue #28306: URL: https://github.com/apache/spark/pull/28306#issuecomment-618194577
[GitHub] [spark] AmplabJenkins removed a comment on issue #28306: [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL Reference
AmplabJenkins removed a comment on issue #28306: URL: https://github.com/apache/spark/pull/28306#issuecomment-618194577
[GitHub] [spark] SparkQA commented on issue #28306: [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL Reference
SparkQA commented on issue #28306: URL: https://github.com/apache/spark/pull/28306#issuecomment-618194437 **[Test build #121651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121651/testReport)** for PR 28306 at commit [`9ab90f2`](https://github.com/apache/spark/commit/9ab90f2a5cb305684746db2ec9ec98b1e8b9921e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #28306: [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL Reference
SparkQA removed a comment on issue #28306: URL: https://github.com/apache/spark/pull/28306#issuecomment-618190388 **[Test build #121651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121651/testReport)** for PR 28306 at commit [`9ab90f2`](https://github.com/apache/spark/commit/9ab90f2a5cb305684746db2ec9ec98b1e8b9921e).
[GitHub] [spark] maropu commented on a change in pull request #28304: [SPARK-31523][SQL] LogicalPlan doCanonicalize should throw exception if not resolved
maropu commented on a change in pull request #28304: URL: https://github.com/apache/spark/pull/28304#discussion_r413525735 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ## @@ -40,6 +40,13 @@ abstract class LogicalPlan super.verboseString(maxFields) + statsCache.map(", " + _.toString).getOrElse("") } + override protected def doCanonicalize(): LogicalPlan = { +if (!resolved) { Review comment: Do you think users use canonicalization? I think it is for internal use only.
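The guard under review can be sketched with simplified stand-in types (hypothetical names; the real code lives in Catalyst's `LogicalPlan`, which this is not):

```scala
// Minimal sketch of the proposed fail-fast check, using a hypothetical
// Plan type in place of Catalyst's LogicalPlan.
abstract class Plan {
  def resolved: Boolean

  // Canonicalization is internal-only, so an unresolved plan reaching
  // this point indicates a bug; fail fast rather than canonicalize it.
  def doCanonicalize(): Plan = {
    if (!resolved) {
      throw new IllegalStateException("plan must be resolved before canonicalization")
    }
    this
  }
}

case class Relation(resolved: Boolean) extends Plan
```

Under this model, canonicalizing a resolved plan is a no-op, while an unresolved plan throws immediately instead of producing a meaningless canonical form.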
[GitHub] [spark] huaxingao commented on issue #28306: [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL Reference
huaxingao commented on issue #28306: URL: https://github.com/apache/spark/pull/28306#issuecomment-618191338 @cloud-fan @maropu Sorry, you guys are too fast for me :)
[GitHub] [spark] AmplabJenkins removed a comment on issue #28306: [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL Reference
AmplabJenkins removed a comment on issue #28306: URL: https://github.com/apache/spark/pull/28306#issuecomment-618190834
[GitHub] [spark] AmplabJenkins commented on issue #28302: [SPARK-31522][SQL] Hive metastore client initialization related configurations should be static
AmplabJenkins commented on issue #28302: URL: https://github.com/apache/spark/pull/28302#issuecomment-618190779
[GitHub] [spark] AmplabJenkins commented on issue #28306: [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL Reference
AmplabJenkins commented on issue #28306: URL: https://github.com/apache/spark/pull/28306#issuecomment-618190834
[GitHub] [spark] AmplabJenkins removed a comment on issue #28302: [SPARK-31522][SQL] Hive metastore client initialization related configurations should be static
AmplabJenkins removed a comment on issue #28302: URL: https://github.com/apache/spark/pull/28302#issuecomment-618190779
[GitHub] [spark] SparkQA commented on issue #28302: [SPARK-31522][SQL] Hive metastore client initialization related configurations should be static
SparkQA commented on issue #28302: URL: https://github.com/apache/spark/pull/28302#issuecomment-618190414 **[Test build #121652 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121652/testReport)** for PR 28302 at commit [`c72ba70`](https://github.com/apache/spark/commit/c72ba701ed5685e89b90fb001dfaf32a4b6a9e4a).
[GitHub] [spark] SparkQA commented on issue #28306: [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL Reference
SparkQA commented on issue #28306: URL: https://github.com/apache/spark/pull/28306#issuecomment-618190388 **[Test build #121651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121651/testReport)** for PR 28306 at commit [`9ab90f2`](https://github.com/apache/spark/commit/9ab90f2a5cb305684746db2ec9ec98b1e8b9921e).
[GitHub] [spark] huaxingao opened a new pull request #28306: [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL Reference
huaxingao opened a new pull request #28306: URL: https://github.com/apache/spark/pull/28306 ### What changes were proposed in this pull request? Need to address a few more comments. ### Why are the changes needed? Fix a few problems. ### Does this PR introduce any user-facing change? Yes. ### How was this patch tested? Manually built and checked.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28302: [SPARK-31522][SQL] Hive metastore client initialization related configurations should be static
AmplabJenkins removed a comment on issue #28302: URL: https://github.com/apache/spark/pull/28302#issuecomment-618188366
[GitHub] [spark] maropu commented on a change in pull request #28294: [SPARK-31519][SQL] Cast in having aggregate expressions returns the wrong result
maropu commented on a change in pull request #28294: URL: https://github.com/apache/spark/pull/28294#discussion_r413521707 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -238,13 +238,13 @@ class Analyzer( ResolveNaturalAndUsingJoin :: ResolveOutputRelation :: ExtractWindowExpressions :: + ResolveTimeZone(conf) :: Review comment: ok, merged in https://github.com/apache/spark/commit/ca90e1932dcdc43748297c627ec857b6ea97dff7
[GitHub] [spark] AmplabJenkins commented on issue #28302: [SPARK-31522][SQL] Hive metastore client initialization related configurations should be static
AmplabJenkins commented on issue #28302: URL: https://github.com/apache/spark/pull/28302#issuecomment-618188366
[GitHub] [spark] huaxingao edited a comment on issue #28237: [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference
huaxingao edited a comment on issue #28237: URL: https://github.com/apache/spark/pull/28237#issuecomment-618185381 Thank you all for the help! Actually I need to address a couple more comments. Sorry I was not fast enough. I will have a follow-up in a few minutes.
[GitHub] [spark] maropu commented on issue #28288: [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone
maropu commented on issue #28288: URL: https://github.com/apache/spark/pull/28288#issuecomment-618187901 Thanks, all! Merged to master/3.0.
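The fix merged above (SPARK-31515) can be illustrated with a simplified model, using hypothetical stand-in types rather than the Catalyst implementation: a cast that does not need a time zone should drop its `timeZoneId` during canonicalization, so two semantically equal casts compare equal.

```scala
// Simplified, hypothetical model of time-zone-aware Cast canonicalization.
case class Cast(dataType: String, timeZoneId: Option[String]) {
  // Only time-zone-sensitive casts actually use the zone; the real check
  // in Spark is more involved, this stand-in keys on the target type only.
  def needsTimeZone: Boolean = dataType == "timestamp"

  // Canonical form: erase the zone whenever it cannot affect the result,
  // so casts that differ only in an irrelevant zone canonicalize equal.
  def canonicalized: Cast =
    if (needsTimeZone) this else copy(timeZoneId = None)
}
```

With this rule, `Cast("int", Some("UTC"))` and `Cast("int", None)` canonicalize to the same value, while timestamp casts with different zones remain distinct.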
[GitHub] [spark] SparkQA commented on issue #28302: [SPARK-31522][SQL] Hive metastore client initialization related configurations should be static
SparkQA commented on issue #28302: URL: https://github.com/apache/spark/pull/28302#issuecomment-618188030 **[Test build #121650 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121650/testReport)** for PR 28302 at commit [`09b87ff`](https://github.com/apache/spark/commit/09b87ff5cfdfc2844f6b94063e1863be71ff5a78).
[GitHub] [spark] AmplabJenkins removed a comment on issue #28288: [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone
AmplabJenkins removed a comment on issue #28288: URL: https://github.com/apache/spark/pull/28288#issuecomment-618184980
[GitHub] [spark] huaxingao commented on issue #28237: [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference
huaxingao commented on issue #28237: URL: https://github.com/apache/spark/pull/28237#issuecomment-618185381 Thank you all for the help! Actually I need to address a couple more comments. Sorry I was not fast enough. I will have a follow-up in a few minutes.
[GitHub] [spark] AmplabJenkins commented on issue #28288: [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone
AmplabJenkins commented on issue #28288: URL: https://github.com/apache/spark/pull/28288#issuecomment-618184980
[GitHub] [spark] SparkQA removed a comment on issue #28288: [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone
SparkQA removed a comment on issue #28288: URL: https://github.com/apache/spark/pull/28288#issuecomment-618115365 **[Test build #121641 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121641/testReport)** for PR 28288 at commit [`73f4694`](https://github.com/apache/spark/commit/73f4694e5cee1aa2f256a90b03d1fb09ee5a295d).
[GitHub] [spark] SparkQA commented on issue #28288: [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone
SparkQA commented on issue #28288: URL: https://github.com/apache/spark/pull/28288#issuecomment-618184359 **[Test build #121641 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121641/testReport)** for PR 28288 at commit [`73f4694`](https://github.com/apache/spark/commit/73f4694e5cee1aa2f256a90b03d1fb09ee5a295d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] maropu commented on issue #28237: [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference
maropu commented on issue #28237: URL: https://github.com/apache/spark/pull/28237#issuecomment-618182481 Thanks, all! Merged to master/3.0.
[GitHub] [spark] xuanyuanking commented on a change in pull request #28294: [SPARK-31519][SQL] Cast in having aggregate expressions returns the wrong result
xuanyuanking commented on a change in pull request #28294: URL: https://github.com/apache/spark/pull/28294#discussion_r413511664 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ## @@ -583,6 +583,16 @@ case class Aggregate( } } +case class AggregateWithHaving( Review comment: ``` move it into unresolved.scala? ``` Yeah, makes sense, will move it to unresolved.scala. ``` Could we rename this into UnresolvedHaving ``` Since `group by` does not always come with `Aggregate` (it can also be `GroupingSets`), and we only handle the Aggregate part with `AggregateWithHaving`, maybe let's keep it `AggregateWithHaving`? WDYT :)
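The node in this discussion can be sketched as a thin placeholder that carries the HAVING predicate only until analysis, whatever plan the grouping resolves to. This is a hypothetical simplified model, not Catalyst's actual classes:

```scala
// Hypothetical simplified model of the wrapper node being discussed.
sealed trait Node { def resolved: Boolean }
case class Aggregate(resolved: Boolean) extends Node
case class GroupingSets(resolved: Boolean) extends Node

// Carries the HAVING condition until the analyzer rewrites it into a
// filter over the resolved child; as a placeholder, it is never itself
// resolved, which is why it belongs with the unresolved nodes.
case class AggregateWithHaving(havingCondition: String, child: Node) extends Node {
  override def resolved: Boolean = false
}
```

Because `resolved` is hard-wired to `false`, the analyzer is forced to eliminate the placeholder before the plan can be considered analyzed, mirroring how other unresolved nodes behave.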
[GitHub] [spark] cloud-fan commented on issue #28276: [SPARK-31476][SQL][FOLLOWUP] Add tests for extract('field', source)
cloud-fan commented on issue #28276: URL: https://github.com/apache/spark/pull/28276#issuecomment-618178541 thanks, merging to master/3.0!
[GitHub] [spark] cloud-fan commented on a change in pull request #28237: [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference
cloud-fan commented on a change in pull request #28237: URL: https://github.com/apache/spark/pull/28237#discussion_r413508809 ## File path: docs/sql-ref-literals.md ## @@ -0,0 +1,532 @@ +--- +layout: global +title: Literals +displayTitle: Literals +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +A literal (also known as a constant) represents a fixed data value. Spark SQL supports the following literals: + + * [String Literal](#string-literal) + * [Binary Literal](#binary-literal) + * [Null Literal](#null-literal) + * [Boolean Literal](#boolean-literal) + * [Numeric Literal](#numeric-literal) + * [Datetime Literal](#datetime-literal) + * [Interval Literal](#interval-literal) + +### String Literal + +A string literal is used to specify a character string value. + + Syntax + +{% highlight sql %} +'c [ ... ]' | "c [ ... ]" +{% endhighlight %} + + Parameters + + + c + +One character from the character set. Use \ to escape special characters (e.g., ' or \). + + + + Examples + +{% highlight sql %} +SELECT 'Hello, World!' AS col; ++-+ +| col| ++-+ +|Hello, World!| ++-+ + +SELECT "SPARK SQL" AS col; ++-+ +| col| ++-+ +|Spark SQL| ++-+ + +SELECT 'it\'s $10.' 
AS col; ++-+ +| col| ++-+ +|It's $10.| ++-+ +{% endhighlight %} + +### Binary Literal + +A binary literal is used to specify a byte sequence value. + + Syntax + +{% highlight sql %} +X { 'c [ ... ]' | "c [ ... ]" } +{% endhighlight %} + + Parameters + + + c + +One character from the character set. + + + + Examples + +{% highlight sql %} +SELECT X'123456' AS col; ++--+ +| col| ++--+ +|[12 34 56]| ++--+ +{% endhighlight %} + +### Null Literal + +A null literal is used to specify a null value. + + Syntax + +{% highlight sql %} +NULL +{% endhighlight %} + + Examples + +{% highlight sql %} +SELECT NULL AS col; +++ +| col| +++ +|NULL| +++ +{% endhighlight %} + +### Boolean Literal + +A boolean literal is used to specify a boolean value. + + Syntax + +{% highlight sql %} +TRUE | FALSE +{% endhighlight %} + + Examples + +{% highlight sql %} +SELECT TRUE AS col; +++ +| col| +++ +|true| +++ +{% endhighlight %} + +### Numeric Literal + +A numeric literal is used to specify a fixed or floating-point number. + + Integral Literal + + Syntax + +{% highlight sql %} +[ + | - ] digit [ ... ] [ L | S | Y ] +{% endhighlight %} + + Parameters + + + digit + +Any numeral from 0 to 9. + + + + L + +Case insensitive, indicates BIGINT, which is a 8-byte signed integer number. + + + + S + +Case insensitive, indicates SMALLINT, which is a 2-byte signed integer number. + + + + Y + +Case insensitive, indicates TINYINT, which is a 1-byte signed integer number. + + + + default (no postfix) + +Indicates a 4-byte signed integer number. + + + + Examples + +{% highlight sql %} +SELECT -2147483648 AS col; ++---+ +|col| ++---+ +|-2147483648| ++---+ + +SELECT 9223372036854775807l AS col; ++---+ +|col| ++---+ +|9223372036854775807| ++---+ + +SELECT -32Y AS col; ++---+ +|col| ++---+ +|-32| ++---+ + +SELECT 482S AS col; ++---+ +|col| ++---+ +|482| ++---+ +{% endhighlight %} + + Fractional Literals + + Syntax + +decimal literals: +{% highlight sql %} +decimal_digits { [ BD ] | [ exponent BD ] } | digit [ ... 
] [ exponent ] BD +{% endhighlight %} + +double literals: +{% highlight sql %} +decimal_digits { D | exponent [ D ] } | digit [ ... ] { exponent [ D ] | [ exponent ] D } +{% endhighlight %} + +While decimal_digits is defined as +{% highlight sql %} +[ + | - ] { digit [ ... ] . [ digit [ ... ] ] | . digit [ ... ] } +{% endhighlight %} + +and exponent is defined as +{% highlight sql %} +E [ + | - ] digit [ ... ] +{% endhighlight %} + + Parameters + + + digit + +Any numeral from 0 to 9. + + + + D + +Case insensitive, indicates DOUBLE, which is a 8-byte double-precision floating point number. + + + + BD + +Case
[GitHub] [spark] cloud-fan commented on a change in pull request #28237: [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference
cloud-fan commented on a change in pull request #28237: URL: https://github.com/apache/spark/pull/28237#discussion_r413508540 ## File path: docs/sql-ref-literals.md (quotes the same docs/sql-ref-literals.md content as the comment above)
[GitHub] [spark] cloud-fan commented on a change in pull request #28237: [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference
cloud-fan commented on a change in pull request #28237: URL: https://github.com/apache/spark/pull/28237#discussion_r413508685 ## File path: docs/sql-ref-literals.md (quotes the same docs/sql-ref-literals.md content as the comment above)
AS col; ++-+ +| col| ++-+ +|It's $10.| ++-+ +{% endhighlight %} + +### Binary Literal + +A binary literal is used to specify a byte sequence value. + + Syntax + +{% highlight sql %} +X { 'c [ ... ]' | "c [ ... ]" } +{% endhighlight %} + + Parameters + + + c + +One character from the character set. + + + + Examples + +{% highlight sql %} +SELECT X'123456' AS col; ++--+ +| col| ++--+ +|[12 34 56]| ++--+ +{% endhighlight %} + +### Null Literal + +A null literal is used to specify a null value. + + Syntax + +{% highlight sql %} +NULL +{% endhighlight %} + + Examples + +{% highlight sql %} +SELECT NULL AS col; +++ +| col| +++ +|NULL| +++ +{% endhighlight %} + +### Boolean Literal + +A boolean literal is used to specify a boolean value. + + Syntax + +{% highlight sql %} +TRUE | FALSE +{% endhighlight %} + + Examples + +{% highlight sql %} +SELECT TRUE AS col; +++ +| col| +++ +|true| +++ +{% endhighlight %} + +### Numeric Literal + +A numeric literal is used to specify a fixed or floating-point number. + + Integral Literal + + Syntax + +{% highlight sql %} +[ + | - ] digit [ ... ] [ L | S | Y ] +{% endhighlight %} + + Parameters + + + digit + +Any numeral from 0 to 9. + + + + L + +Case insensitive, indicates BIGINT, which is a 8-byte signed integer number. + + + + S + +Case insensitive, indicates SMALLINT, which is a 2-byte signed integer number. + + + + Y + +Case insensitive, indicates TINYINT, which is a 1-byte signed integer number. + + + + default (no postfix) + +Indicates a 4-byte signed integer number. + + + + Examples + +{% highlight sql %} +SELECT -2147483648 AS col; ++---+ +|col| ++---+ +|-2147483648| ++---+ + +SELECT 9223372036854775807l AS col; ++---+ +|col| ++---+ +|9223372036854775807| ++---+ + +SELECT -32Y AS col; ++---+ +|col| ++---+ +|-32| ++---+ + +SELECT 482S AS col; ++---+ +|col| ++---+ +|482| ++---+ +{% endhighlight %} + + Fractional Literals + + Syntax + +decimal literals: +{% highlight sql %} +decimal_digits { [ BD ] | [ exponent BD ] } | digit [ ... 
] [ exponent ] BD +{% endhighlight %} + +double literals: +{% highlight sql %} +decimal_digits { D | exponent [ D ] } | digit [ ... ] { exponent [ D ] | [ exponent ] D } +{% endhighlight %} + +While decimal_digits is defined as +{% highlight sql %} +[ + | - ] { digit [ ... ] . [ digit [ ... ] ] | . digit [ ... ] } +{% endhighlight %} + +and exponent is defined as +{% highlight sql %} +E [ + | - ] digit [ ... ] +{% endhighlight %} + + Parameters + + + digit + +Any numeral from 0 to 9. + + + + D + +Case insensitive, indicates DOUBLE, which is a 8-byte double-precision floating point number. + + + + BD + +Case
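The integral-literal postfixes quoted above (Y, S, L, or none) each pick a fixed storage width. As an illustrative aside, not part of the quoted patch, the value ranges those widths imply can be derived mechanically; the helper name `int_range` below is ours, not Spark's:

```python
# Illustrative sketch: the value ranges of Spark SQL's integral literal types,
# derived from their byte widths (Y = TINYINT/1, S = SMALLINT/2,
# no postfix = INT/4, L = BIGINT/8). Two's-complement signed integers.

def int_range(num_bytes: int) -> tuple[int, int]:
    """Return (min, max) of a signed integer stored in num_bytes bytes."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

SUFFIX_WIDTHS = {"Y": 1, "S": 2, "": 4, "L": 8}  # postfix -> byte width

for suffix, width in SUFFIX_WIDTHS.items():
    lo, hi = int_range(width)
    print(f"{suffix or '(none)'}: {lo} .. {hi}")
```

The doc examples above (`-2147483648` for a plain INT, `9223372036854775807l` for a BIGINT) sit exactly at these boundaries.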
[GitHub] [spark] viirya commented on a change in pull request #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
viirya commented on a change in pull request #27207: URL: https://github.com/apache/spark/pull/27207#discussion_r413506421

## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala

@@ -319,20 +336,38 @@ private[spark] class TaskSchedulerImpl(
         taskSetsByStageIdAndAttempt -= manager.taskSet.stageId
       }
     }
+    resetOnPreviousOffer -= manager.taskSet
     manager.parent.removeSchedulable(manager)
     logInfo(s"Removed TaskSet ${manager.taskSet.id}, whose tasks have all completed, from pool" +
       s" ${manager.parent.name}")
   }

+  /**
+   * Offers resources to a single [[TaskSetManager]] at a given max allowed [[TaskLocality]].
+   *
+   * @param taskSet task set manager to offer resources to
+   * @param maxLocality max locality to allow when scheduling
+   * @param shuffledOffers shuffled resource offers to use for scheduling,
+   *                       remaining resources are tracked by below fields as tasks are scheduled
+   * @param availableCpus remaining cpus per offer,
+   *                      value at index 'i' corresponds to shuffledOffers[i]
+   * @param availableResources remaining resources per offer,
+   *                           value at index 'i' corresponds to shuffledOffers[i]
+   * @param tasks tasks scheduled per offer, value at index 'i' corresponds to shuffledOffers[i]
+   * @param addressesWithDescs tasks scheduler per host:port, used for barrier tasks
+   * @return tuple of (had delay schedule rejects?, option of min locality of launched task)

Review comment: If it returns true, I think it means there were no delay schedule rejects, not that there were delay schedule rejects.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
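The review above concerns delay scheduling: a task set may reject an offer that is less local than it prefers, until its locality wait expires. A heavily simplified sketch of that accept/reject decision (our own toy model, not Spark's actual `TaskSchedulerImpl` code; the function and constant names are ours):

```python
# Simplified model of delay scheduling: a task preferring a given locality
# level rejects a less-local offer until its locality wait has elapsed,
# after which it falls back to whatever is offered.

LOCALITY_ORDER = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]

def accept_offer(preferred: str, offered: str,
                 waited_s: float, wait_limit_s: float) -> bool:
    """Accept if the offer is at least as local as preferred, or the wait expired."""
    if LOCALITY_ORDER.index(offered) <= LOCALITY_ORDER.index(preferred):
        return True  # offer meets or beats the preferred locality
    return waited_s >= wait_limit_s  # otherwise, only after the delay timer expires

# A NODE_LOCAL-preferring task rejects an ANY offer early on...
assert not accept_offer("NODE_LOCAL", "ANY", waited_s=1.0, wait_limit_s=3.0)
# ...but accepts it once the locality wait has elapsed.
assert accept_offer("NODE_LOCAL", "ANY", waited_s=3.5, wait_limit_s=3.0)
```

The "had delay schedule rejects?" flag debated in the review corresponds to whether any such rejection happened during a round of offers.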
[GitHub] [spark] viirya commented on a change in pull request #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
viirya commented on a change in pull request #27207: URL: https://github.com/apache/spark/pull/27207#discussion_r413503891

## File path: core/src/main/scala/org/apache/spark/internal/config/package.scala

@@ -543,6 +543,16 @@ package object config {
     .version("1.2.0")
     .fallbackConf(DYN_ALLOCATION_SCHEDULER_BACKLOG_TIMEOUT)

+  private[spark] val LEGACY_LOCALITY_WAIT_RESET =
+    ConfigBuilder("spark.locality.wait.legacyResetOnTaskLaunch")
+      .doc("Whether to use the legacy behavior of locality wait, which resets the delay timer " +
+        "anytime a task is scheduled. See Delay Scheduling section of TaskSchedulerImpl's class " +
+        "documentation for more details.")
+      .internal()
+      .version("3.0.0")

Review comment: I think this was not merged into the 3.0 branch, right?
[GitHub] [spark] AmplabJenkins commented on issue #28264: [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values
AmplabJenkins commented on issue #28264: URL: https://github.com/apache/spark/pull/28264#issuecomment-618173352
[GitHub] [spark] SparkQA commented on issue #28264: [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values
SparkQA commented on issue #28264: URL: https://github.com/apache/spark/pull/28264#issuecomment-618173252

**[Test build #121649 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121649/testReport)** for PR 28264 at commit [`ef7611e`](https://github.com/apache/spark/commit/ef7611e870a3ee3069bccfb0804072eb185a5b34).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on issue #28295: [SPARK-31508][SQL] we need convert the type to double type if one is…
HyukjinKwon commented on issue #28295: URL: https://github.com/apache/spark/pull/28295#issuecomment-618171808

Closing as a duplicate of https://github.com/apache/spark/pull/27150.
[GitHub] [spark] HyukjinKwon commented on issue #28305: [SPARK-31474][SQL][FOLLOWUP] Replace _FUNC_ placeholder with functionname in the note field of expression info
HyukjinKwon commented on issue #28305: URL: https://github.com/apache/spark/pull/28305#issuecomment-618171396

The documentation build passed in GitHub Actions. I am going to merge this.
[GitHub] [spark] HyukjinKwon commented on issue #28305: [SPARK-31474][SQL][FOLLOWUP] Replace _FUNC_ placeholder with functionname in the note field of expression info
HyukjinKwon commented on issue #28305: URL: https://github.com/apache/spark/pull/28305#issuecomment-618171468

Merged to master and branch-3.0.
[GitHub] [spark] AmplabJenkins commented on issue #28264: [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values
AmplabJenkins commented on issue #28264: URL: https://github.com/apache/spark/pull/28264#issuecomment-618170176
[GitHub] [spark] SparkQA commented on issue #28264: [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values
SparkQA commented on issue #28264: URL: https://github.com/apache/spark/pull/28264#issuecomment-618169835

**[Test build #121649 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121649/testReport)** for PR 28264 at commit [`ef7611e`](https://github.com/apache/spark/commit/ef7611e870a3ee3069bccfb0804072eb185a5b34).
[GitHub] [spark] cloud-fan commented on issue #28300: [MINOR][SQL] Add comments for filters values and return values of Row.get()/apply()
cloud-fan commented on issue #28300: URL: https://github.com/apache/spark/pull/28300#issuecomment-618168987

thanks, merging to master/3.0!
[GitHub] [spark] cloud-fan commented on a change in pull request #28303: [SPARK-31495][SQL][FOLLOW-UP][3.0] Fix test failure of explain-aqe.sql
cloud-fan commented on a change in pull request #28303: URL: https://github.com/apache/spark/pull/28303#discussion_r413496559

## File path: sql/core/src/test/resources/sql-tests/results/explain-aqe.sql.out

@@ -314,7 +314,7 @@ Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))
 Left keys [1]: [key#x]
 Right keys [1]: [key#x]
 Join condition: None
-

Review comment: Does this mean the master branch outputs some extra spaces in EXPLAIN FORMATTED?
[GitHub] [spark] AmplabJenkins commented on issue #28302: [SPARK-31522][SQL] Hive metastore client initialization related configurations should be static
AmplabJenkins commented on issue #28302: URL: https://github.com/apache/spark/pull/28302#issuecomment-618164990
[GitHub] [spark] SparkQA commented on issue #28302: [SPARK-31522][SQL] Hive metastore client initialization related configurations should be static
SparkQA commented on issue #28302: URL: https://github.com/apache/spark/pull/28302#issuecomment-618164856

**[Test build #121644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121644/testReport)** for PR 28302 at commit [`5c15a98`](https://github.com/apache/spark/commit/5c15a98270c428b0d9e7bacf553162d650a887b3).

 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27861: [SPARK-30707][SQL]Window function set partitionSpec as order spec when orderSpec is empty
HyukjinKwon commented on a change in pull request #27861: URL: https://github.com/apache/spark/pull/27861#discussion_r413491709

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

@@ -1691,7 +1691,19 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   override def visitWindowDef(ctx: WindowDefContext): WindowSpecDefinition = withOrigin(ctx) {
     // CLUSTER BY ... | PARTITION BY ... ORDER BY ...
     val partition = ctx.partition.asScala.map(expression)
-    val order = ctx.sortItem.asScala.map(visitSortItem)
+    val order = if (ctx.sortItem.asScala.nonEmpty) {
+      ctx.sortItem.asScala.map(visitSortItem)
+    } else if (ctx.windowFrame != null &&
+        ctx.windowFrame().frameType.getType == SqlBaseParser.RANGE) {
+      // for RANGE window frame, we won't add default order spec
+      ctx.sortItem.asScala.map(visitSortItem)
+    } else {
+      // Same default behaviors like hive, when order spec is null
+      // set partition spec expression as order spec
+      ctx.partition.asScala.map { expr =>
+        SortOrder(expression(expr), Ascending, Ascending.defaultNullOrdering, Set.empty)

Review comment: I think we should not fix it, because on the Spark side at least the results will be non-deterministic. I doubt it is good to add this support only for compatibility with other DBMSs when the output is expected to be useless. Maybe disallowing it would be a better idea than finding another problem later caused by the different and non-deterministic data. Do you maybe know of other cases from other distributed DBMSs such as Presto?
[GitHub] [spark] AmplabJenkins commented on issue #28305: [SPARK-31474][SQL][FOLLOWUP] Replace _FUNC_ placeholder with functionname in the note field of expression info
AmplabJenkins commented on issue #28305: URL: https://github.com/apache/spark/pull/28305#issuecomment-618157415
[GitHub] [spark] SparkQA commented on issue #28305: [SPARK-31474][SQL][FOLLOWUP] Replace _FUNC_ placeholder with functionname in the note field of expression info
SparkQA commented on issue #28305: URL: https://github.com/apache/spark/pull/28305#issuecomment-618157211

**[Test build #121648 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121648/testReport)** for PR 28305 at commit [`2a3d5ce`](https://github.com/apache/spark/commit/2a3d5cef9ce2b4a9c4ff6134d1a3ca4350654aa0).
[GitHub] [spark] yaooqinn opened a new pull request #28305: [SPARK-31474][SQL][FOLLOWUP] Replace _FUNC_ placeholder with functionname in the note field of expression info
yaooqinn opened a new pull request #28305: URL: https://github.com/apache/spark/pull/28305

### What changes were proposed in this pull request?

`_FUNC_` has been used in the note() field of `ExpressionDescription` since https://github.com/apache/spark/pull/28248, and there may be more such cases later; we should replace it with the function name for documentation.

### Why are the changes needed?

Doc fix.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Passes Jenkins; also verified locally with `jekyll serve`.
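The substitution this PR describes, replacing the `_FUNC_` placeholder in an expression's documentation fields with the registered function name, can be sketched as below. This is a toy illustration, not Spark's actual `ExpressionInfo` implementation:

```python
# Toy sketch of the _FUNC_ substitution described in the PR above
# (not Spark's actual code): every occurrence of the placeholder in a
# usage/examples/note string is replaced with the function's registered name.

def replace_func(doc: str, function_name: str) -> str:
    """Substitute the _FUNC_ placeholder with the registered function name."""
    return doc.replace("_FUNC_", function_name)

# Hypothetical note text, following the shape of ExpressionDescription notes.
note = "_FUNC_ is non-deterministic; avoid _FUNC_ in partition expressions."
print(replace_func(note, "rand"))
```

The same string appears under different registered names (aliases), which is why the docs store the placeholder rather than a fixed name.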
[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
AmplabJenkins commented on issue #27557: URL: https://github.com/apache/spark/pull/27557#issuecomment-618153699
[GitHub] [spark] SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
SparkQA commented on issue #27557: URL: https://github.com/apache/spark/pull/27557#issuecomment-618153414

**[Test build #121647 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121647/testReport)** for PR 27557 at commit [`648f0dc`](https://github.com/apache/spark/commit/648f0dc2b5b2b53e2c62641ea0da0e04a5ffec0b).
[GitHub] [spark] Ngone51 commented on issue #28303: [SPARK-31495][SQL][FOLLOW-UP][3.0] Fix test failure of explain-aqe.sql
Ngone51 commented on issue #28303: URL: https://github.com/apache/spark/pull/28303#issuecomment-618153067

thanks!
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #27861: [SPARK-30707][SQL]Window function set partitionSpec as order spec when orderSpec is empty
AngersZhuuuu commented on a change in pull request #27861: URL: https://github.com/apache/spark/pull/27861#discussion_r413477419

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

@@ -1691,7 +1691,19 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   override def visitWindowDef(ctx: WindowDefContext): WindowSpecDefinition = withOrigin(ctx) {
     // CLUSTER BY ... | PARTITION BY ... ORDER BY ...
     val partition = ctx.partition.asScala.map(expression)
-    val order = ctx.sortItem.asScala.map(visitSortItem)
+    val order = if (ctx.sortItem.asScala.nonEmpty) {
+      ctx.sortItem.asScala.map(visitSortItem)
+    } else if (ctx.windowFrame != null &&
+        ctx.windowFrame().frameType.getType == SqlBaseParser.RANGE) {
+      // for RANGE window frame, we won't add default order spec
+      ctx.sortItem.asScala.map(visitSortItem)
+    } else {
+      // Same default behaviors like hive, when order spec is null
+      // set partition spec expression as order spec
+      ctx.partition.asScala.map { expr =>
+        SortOrder(expression(expr), Ascending, Ascending.defaultNullOrdering, Set.empty)

Review comment:

> deterministic

For the same SQL, the result is deterministic, and adding the partition columns as the default order-by columns keeps the result deterministic. I met this problem when migrating Hive SQL to Spark SQL.
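The disagreement in this thread can be made concrete: ordering by the partition keys within a partition is a no-op, since every row in a partition shares the same key, so window functions such as `row_number()` still depend on the incoming row order. A small illustration, our own sketch and not Spark's code (the helper `row_numbers` is hypothetical):

```python
# Illustration of the review discussion: using the PARTITION BY keys as a
# default ORDER BY changes nothing inside a partition (all keys are equal
# there), so row_number() still follows the incoming row order.

from itertools import groupby

def row_numbers(rows, part_key):
    """Assign row_number per partition, 'ordering' by the partition key itself."""
    out = []
    for _, group in groupby(sorted(rows, key=part_key), key=part_key):
        # Python's sort is stable, and every row in the group has the same
        # key, so this inner sort leaves the incoming order untouched.
        for i, row in enumerate(sorted(group, key=part_key), start=1):
            out.append((row, i))
    return out

rows = [("a", 3), ("a", 1), ("b", 2)]
print(row_numbers(rows, part_key=lambda r: r[0]))
```

In a distributed engine the incoming order per partition is not fixed across runs, which is the non-determinism HyukjinKwon points at; in a single-process run, as here, it merely mirrors input order.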
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28301: [SPARK-31521][CORE] Correct the fetch size when merging blocks into a merged block
dongjoon-hyun commented on a change in pull request #28301: URL: https://github.com/apache/spark/pull/28301#discussion_r413475681

## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala

@@ -414,21 +414,23 @@ final class ShuffleBlockFetcherIterator(
       def shouldMergeIntoPreviousBatchBlockId =
         mergedBlockInfo.last.blockId.asInstanceOf[ShuffleBlockBatchId].mapId == startBlockId.mapId
-    val startReduceId = if (mergedBlockInfo.nonEmpty && shouldMergeIntoPreviousBatchBlockId) {
-      // Remove the previous batch block id as we will add a new one to replace it.
-      mergedBlockInfo.remove(mergedBlockInfo.length - 1).blockId
-        .asInstanceOf[ShuffleBlockBatchId].startReduceId
-    } else {
-      startBlockId.reduceId
-    }
+    val (startReduceId, size) =
+      if (mergedBlockInfo.nonEmpty && shouldMergeIntoPreviousBatchBlockId) {
+        // Remove the previous batch block id as we will add a new one to replace it.
+        val removed = mergedBlockInfo.remove(mergedBlockInfo.length - 1)
+        (removed.blockId.asInstanceOf[ShuffleBlockBatchId].startReduceId,
+          removed.size + toBeMerged.map(_.size).sum)
+      } else {
+        (startBlockId.reduceId, toBeMerged.map(_.size).sum)
+      }

Review comment: Thank you, @Ngone51 !

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
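The fix above is about bookkeeping: when an existing batch block is removed so a bigger one can replace it, the new batch's fetch size must include the removed batch's size plus the newly merged blocks' sizes. A heavily simplified model of that merge, in Python rather than Spark's Scala and with our own data shape:

```python
# Simplified model of merging contiguous shuffle blocks into batch blocks
# (our sketch, not Spark's ShuffleBlockFetcherIterator): blocks from the same
# map task with consecutive reduce ids collapse into one batch whose size is
# the SUM of the merged blocks' sizes -- losing any addend is the bug fixed above.

def merge_blocks(blocks):
    """blocks: list of (map_id, reduce_id, size) tuples, in fetch order.

    Returns a list of (map_id, start_reduce_id, end_reduce_id, total_size).
    """
    merged = []
    for map_id, reduce_id, size in blocks:
        if merged and merged[-1][0] == map_id and merged[-1][2] + 1 == reduce_id:
            # Remove the previous batch and replace it with a wider one,
            # carrying the previous batch's size forward.
            m, start, _, total = merged.pop()
            merged.append((m, start, reduce_id, total + size))
        else:
            merged.append((map_id, reduce_id, reduce_id, size))
    return merged

# Three contiguous blocks of map 0 collapse into one batch of total size 60.
print(merge_blocks([(0, 0, 10), (0, 1, 20), (0, 2, 30)]))
```

The pop-and-replace step mirrors `mergedBlockInfo.remove(...)` in the patch: the corrected code adds `removed.size` back in, exactly as `total + size` does here.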
[GitHub] [spark] dongjoon-hyun commented on issue #28293: [SPARK-31518][CORE] Expose filterByRange in JavaPairRDD
dongjoon-hyun commented on issue #28293: URL: https://github.com/apache/spark/pull/28293#issuecomment-618149931

@wetneb, you have been added to the Apache Spark contributor group, and SPARK-31518 is assigned to you. Thank you again, @wetneb!
[GitHub] [spark] AmplabJenkins removed a comment on issue #28304: [SPARK-31523][SQL] LogicalPlan doCanonicalize should throw exception if not resolved
AmplabJenkins removed a comment on issue #28304: URL: https://github.com/apache/spark/pull/28304#issuecomment-618149114 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121646/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28304: [SPARK-31523][SQL] LogicalPlan doCanonicalize should throw exception if not resolved
AmplabJenkins removed a comment on issue #28304: URL: https://github.com/apache/spark/pull/28304#issuecomment-618149106 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #28304: [SPARK-31523][SQL] LogicalPlan doCanonicalize should throw exception if not resolved
SparkQA removed a comment on issue #28304: URL: https://github.com/apache/spark/pull/28304#issuecomment-618143562 **[Test build #121646 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121646/testReport)** for PR 28304 at commit [`49b27e1`](https://github.com/apache/spark/commit/49b27e1bc150dfb8a356bc544b7cd247aa1513cc).
[GitHub] [spark] SparkQA commented on issue #28304: [SPARK-31523][SQL] LogicalPlan doCanonicalize should throw exception if not resolved
SparkQA commented on issue #28304: URL: https://github.com/apache/spark/pull/28304#issuecomment-618149079 **[Test build #121646 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121646/testReport)** for PR 28304 at commit [`49b27e1`](https://github.com/apache/spark/commit/49b27e1bc150dfb8a356bc544b7cd247aa1513cc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #28304: [SPARK-31523][SQL] LogicalPlan doCanonicalize should throw exception if not resolved
AmplabJenkins commented on issue #28304: URL: https://github.com/apache/spark/pull/28304#issuecomment-618149106
[GitHub] [spark] dongjoon-hyun commented on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build
dongjoon-hyun commented on issue #28285: URL: https://github.com/apache/spark/pull/28285#issuecomment-618147860 Thank you! Ya. It was really weird.
[GitHub] [spark] maropu commented on issue #28264: [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values
maropu commented on issue #28264: URL: https://github.com/apache/spark/pull/28264#issuecomment-618147127 Could you update the screenshot, too? Looks fine except for the existing comments.
[GitHub] [spark] ulysses-you commented on a change in pull request #28304: [SPARK-31523][SQL] LogicalPlan doCanonicalize should throw exception if not resolved
ulysses-you commented on a change in pull request #28304: URL: https://github.com/apache/spark/pull/28304#discussion_r413470723

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala

```diff
@@ -40,6 +40,13 @@ abstract class LogicalPlan
     super.verboseString(maxFields) + statsCache.map(", " + _.toString).getOrElse("")
   }

+  override protected def doCanonicalize(): LogicalPlan = {
+    if (!resolved) {
```

Review comment: I considered using an assert, but this can be hit in a normal way, e.g.
```scala
spark.sql("select id, count(*) from t1 group by id limit 1").queryExecution.logical.canonicalized
```
So throwing an exception may be better.
[GitHub] [spark] maropu commented on a change in pull request #28264: [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values
maropu commented on a change in pull request #28264: URL: https://github.com/apache/spark/pull/28264#discussion_r413469950

## File path: docs/sql-ref-datatypes.md

```diff
@@ -706,3 +708,61 @@ The following table shows the type names as well as aliases used in Spark SQL pa
+
+### Floating Point Special Values
+
+Spark SQL supports several special floating point values in a case-insensitive manner:
+
+ * Inf/+Inf/Infinity/+Infinity: positive infinity
+   * ```FloatType```: 1.0f / 0.0f, which is equal to the value returned by java.lang.Float.intBitsToFloat(0x7f800000).
+   * ```DoubleType```: 1.0 / 0.0, which is equal to the value returned by java.lang.Double.longBitsToDouble(0x7ff0000000000000L).
+ * -Inf/-Infinity: negative infinity
+   * ```FloatType```: -1.0f / 0.0f, which is equal to the value returned by java.lang.Float.intBitsToFloat(0xff800000).
+   * ```DoubleType```: -1.0 / 0.0, which is equal to the value returned by java.lang.Double.longBitsToDouble(0xfff0000000000000L).
+ * NaN: not a number
+   * ```FloatType```: 0.0f / 0.0f, which is equivalent to the value returned by java.lang.Float.intBitsToFloat(0x7fc00000).
+   * ```DoubleType```: 0.0d / 0.0, which is equivalent to the value returned by java.lang.Double.longBitsToDouble(0x7ff8000000000000L).
+
+#### Examples
+
+{% highlight sql %}
+SELECT double('infinity');
++------------------------+
+|CAST(infinity AS DOUBLE)|
++------------------------+
+|                Infinity|
++------------------------+
+
+SELECT float('-inf');
++-------------------+
+|CAST(-inf AS FLOAT)|
++-------------------+
+|          -Infinity|
++-------------------+
+
+SELECT float('NaN');
++------------------+
+|CAST(NaN AS FLOAT)|
++------------------+
+|               NaN|
++------------------+
+{% endhighlight %}
+
+### -Infinity/Infinity Semantics
+
```

Review comment: Could you leave a short description of what this section is about?
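[Editor's note] The bit patterns quoted in the docs diff can be checked outside the JVM. This Python sketch mirrors `java.lang.Float.intBitsToFloat` and `java.lang.Double.longBitsToDouble` using the standard `struct` module (the helper names are ours, not Spark's):

```python
import math
import struct

def int_bits_to_float(bits: int) -> float:
    """Python analogue of java.lang.Float.intBitsToFloat (32-bit IEEE 754)."""
    return struct.unpack('>f', bits.to_bytes(4, 'big'))[0]

def long_bits_to_double(bits: int) -> float:
    """Python analogue of java.lang.Double.longBitsToDouble (64-bit IEEE 754)."""
    return struct.unpack('>d', bits.to_bytes(8, 'big'))[0]

print(int_bits_to_float(0x7f800000))               # inf
print(int_bits_to_float(0xff800000))               # -inf
print(math.isnan(int_bits_to_float(0x7fc00000)))   # True (quiet NaN)
print(long_bits_to_double(0x7ff0000000000000))     # inf
```

These are the standard IEEE 754 encodings, so the same constants apply to Spark's `FloatType` and `DoubleType`.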
[GitHub] [spark] turboFei commented on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task
turboFei commented on issue #26339: URL: https://github.com/apache/spark/pull/26339#issuecomment-618145797 gentle ping @cloud-fan
[GitHub] [spark] maropu commented on a change in pull request #28304: [SPARK-31523][SQL] LogicalPlan doCanonicalize should throw exception if not resolved
maropu commented on a change in pull request #28304: URL: https://github.com/apache/spark/pull/28304#discussion_r413469121

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala

```diff
@@ -40,6 +40,13 @@ abstract class LogicalPlan
     super.verboseString(maxFields) + statsCache.map(", " + _.toString).getOrElse("")
   }

+  override protected def doCanonicalize(): LogicalPlan = {
+    if (!resolved) {
```

Review comment: Should this be an assert rather than throwing an exception?
[GitHub] [spark] HeartSaVioR commented on a change in pull request #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
HeartSaVioR commented on a change in pull request #27557: URL: https://github.com/apache/spark/pull/27557#discussion_r413469060

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala

```diff
@@ -268,6 +288,7 @@ abstract class CompactibleFileStreamLog[T <: AnyRef : ClassTag](
 object CompactibleFileStreamLog {
   val COMPACT_FILE_SUFFIX = ".compact"
+  val COMPACT_LATENCY_WARN_THRESHOLD_MS = 2000
```

Review comment: Yeah, it's a heuristic - I think a batch spending more than 2 seconds just on compacting metadata should be surfaced to end users, because the latency here is opaque to them if we don't log it, and they will be left questioning it.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27861: [SPARK-30707][SQL]Window function set partitionSpec as order spec when orderSpec is empty
HyukjinKwon commented on a change in pull request #27861: URL: https://github.com/apache/spark/pull/27861#discussion_r413468819

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

```diff
@@ -1691,7 +1691,19 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   override def visitWindowDef(ctx: WindowDefContext): WindowSpecDefinition = withOrigin(ctx) {
     // CLUSTER BY ... | PARTITION BY ... ORDER BY ...
     val partition = ctx.partition.asScala.map(expression)
-    val order = ctx.sortItem.asScala.map(visitSortItem)
+    val order = if (ctx.sortItem.asScala.nonEmpty) {
+      ctx.sortItem.asScala.map(visitSortItem)
+    } else if (ctx.windowFrame != null &&
+        ctx.windowFrame().frameType.getType == SqlBaseParser.RANGE) {
+      // for RANGE window frame, we won't add default order spec
+      ctx.sortItem.asScala.map(visitSortItem)
+    } else {
+      // Same default behaviors like hive, when order spec is null
+      // set partition spec expression as order spec
+      ctx.partition.asScala.map { expr =>
+        SortOrder(expression(expr), Ascending, Ascending.defaultNullOrdering, Set.empty)
```

Review comment: I guess that's because PostgreSQL can keep the natural order; Spark can't. Is the PostgreSQL result deterministic?
[GitHub] [spark] AmplabJenkins commented on issue #28304: [SPARK-31523][SQL] LogicalPlan doCanonicalize should throw exception if not resolved
AmplabJenkins commented on issue #28304: URL: https://github.com/apache/spark/pull/28304#issuecomment-618143947
[GitHub] [spark] AmplabJenkins removed a comment on issue #28304: [SPARK-31523][SQL] LogicalPlan doCanonicalize should throw exception if not resolved
AmplabJenkins removed a comment on issue #28304: URL: https://github.com/apache/spark/pull/28304#issuecomment-618143947
[GitHub] [spark] SparkQA commented on issue #28304: [SPARK-31523][SQL] LogicalPlan doCanonicalize should throw exception if not resolved
SparkQA commented on issue #28304: URL: https://github.com/apache/spark/pull/28304#issuecomment-618143562 **[Test build #121646 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121646/testReport)** for PR 28304 at commit [`49b27e1`](https://github.com/apache/spark/commit/49b27e1bc150dfb8a356bc544b7cd247aa1513cc).
[GitHub] [spark] HeartSaVioR commented on a change in pull request #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
HeartSaVioR commented on a change in pull request #27557: URL: https://github.com/apache/spark/pull/27557#discussion_r413466284

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala

```diff
@@ -177,16 +178,35 @@ abstract class CompactibleFileStreamLog[T <: AnyRef : ClassTag](
    * corresponding `batchId` file. It will delete expired files as well if enabled.
    */
   private def compact(batchId: Long, logs: Array[T]): Boolean = {
-    val validBatches = getValidBatchesBeforeCompactionBatch(batchId, compactInterval)
-    val allLogs = validBatches.flatMap { id =>
-      super.get(id).getOrElse {
-        throw new IllegalStateException(
-          s"${batchIdToPath(id)} doesn't exist when compacting batch $batchId " +
-            s"(compactInterval: $compactInterval)")
-      }
-    } ++ logs
+    val (allLogs, loadElapsedMs) = Utils.timeTakenMs {
+      val validBatches = getValidBatchesBeforeCompactionBatch(batchId, compactInterval)
+      validBatches.flatMap { id =>
+        super.get(id).getOrElse {
+          throw new IllegalStateException(
+            s"${batchIdToPath(id)} doesn't exist when compacting batch $batchId " +
+              s"(compactInterval: $compactInterval)")
+        }
+      } ++ logs
+    }
+    val compactedLogs = compactLogs(allLogs)
+
     // Return false as there is another writer.
-    super.add(batchId, compactLogs(allLogs).toArray)
+    val (writeSucceed, writeElapsedMs) = Utils.timeTakenMs {
+      super.add(batchId, compactedLogs.toArray)
+    }
+
+    val elapsedMs = loadElapsedMs + writeElapsedMs
+    if (elapsedMs >= COMPACT_LATENCY_WARN_THRESHOLD_MS) {
+      logWarning(s"Compacting took $elapsedMs ms (load: $loadElapsedMs ms," +
+        s" write: $writeElapsedMs ms) for compact batch $batchId")
+      logWarning(s"Loaded ${allLogs.size} entries (${SizeEstimator.estimate(allLogs)} bytes in " +
```

Review comment: Yes, that sounds better. I'll add "(estimated)" after "bytes". Thanks!
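[Editor's note] The pattern discussed in this thread — time each phase separately and warn only past a threshold — can be sketched in a few lines of Python. The helper names (`time_taken_ms`, `compact`) mirror the Scala code above but are hypothetical stand-ins, not Spark APIs:

```python
import logging
import time

COMPACT_LATENCY_WARN_THRESHOLD_MS = 2000  # the heuristic debated in the thread

def time_taken_ms(f):
    """Run f and return (result, elapsed milliseconds) - analogue of Utils.timeTakenMs."""
    start = time.monotonic()
    result = f()
    return result, (time.monotonic() - start) * 1000.0

def compact(load, write):
    """load/write are stand-ins for the metadata-log load and write phases."""
    logs, load_ms = time_taken_ms(load)
    ok, write_ms = time_taken_ms(lambda: write(logs))
    elapsed = load_ms + write_ms
    if elapsed >= COMPACT_LATENCY_WARN_THRESHOLD_MS:
        # Only surface the latency to users when it is actually notable.
        logging.warning("Compacting took %.0f ms (load: %.0f ms, write: %.0f ms)",
                        elapsed, load_ms, write_ms)
    return ok
```

Timing the load and write phases separately, as the PR does, tells the user *which* half of the compaction is slow, not just that it is slow.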
[GitHub] [spark] ulysses-you opened a new pull request #28304: [SPARK-31523][SQL] LogicalPlan doCanonicalize should throw exception if not resolved
ulysses-you opened a new pull request #28304: URL: https://github.com/apache/spark/pull/28304

### What changes were proposed in this pull request?
Throw an AnalysisException if the LogicalPlan is not resolved.

### Why are the changes needed?
There is no point in canonicalizing an unresolved plan; this makes it fail fast.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Jenkins test pass.
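[Editor's note] The fail-fast design being proposed (and the assert-vs-exception question raised in review) can be modeled in a few lines. This is a toy Python sketch of the idea, not Spark's actual `LogicalPlan` class:

```python
class LogicalPlan:
    """Toy model: canonicalization is only meaningful once the plan is resolved."""

    def __init__(self, resolved: bool):
        self.resolved = resolved

    def canonicalized(self):
        # Fail fast, as the PR proposes: an unresolved plan may still be
        # reachable from user code (e.g. via queryExecution.logical), so a
        # catchable exception is preferred over an assert.
        if not self.resolved:
            raise ValueError("Canonicalizing an unresolved plan is not allowed")
        return self
```

The review thread's distinction matters here: an assert signals "this is unreachable", while an exception signals "this is reachable but invalid" — and the `queryExecution.logical.canonicalized` example shows it is reachable.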
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #27861: [SPARK-30707][SQL]Window function set partitionSpec as order spec when orderSpec is empty
AngersZh commented on a change in pull request #27861: URL: https://github.com/apache/spark/pull/27861#discussion_r413465498

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

```diff
@@ -1691,7 +1691,19 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   override def visitWindowDef(ctx: WindowDefContext): WindowSpecDefinition = withOrigin(ctx) {
     // CLUSTER BY ... | PARTITION BY ... ORDER BY ...
     val partition = ctx.partition.asScala.map(expression)
-    val order = ctx.sortItem.asScala.map(visitSortItem)
+    val order = if (ctx.sortItem.asScala.nonEmpty) {
+      ctx.sortItem.asScala.map(visitSortItem)
+    } else if (ctx.windowFrame != null &&
+        ctx.windowFrame().frameType.getType == SqlBaseParser.RANGE) {
+      // for RANGE window frame, we won't add default order spec
+      ctx.sortItem.asScala.map(visitSortItem)
+    } else {
+      // Same default behaviors like hive, when order spec is null
+      // set partition spec expression as order spec
+      ctx.partition.asScala.map { expr =>
+        SortOrder(expression(expr), Ascending, Ascending.defaultNullOrdering, Set.empty)
```

Review comment:
> But the results will be useless. When can it be useful if the order is indeterministic for the functions dependent on the order .. ?

In PostgreSQL, if we don't specify an order column, the result follows the partition column's default sort order:

```
angerszhu=# explain analyze verbose select id, num, lead(id) over (partition by num) from s4;
                                           QUERY PLAN
---------------------------------------------------------------------------------------------------
 WindowAgg  (cost=158.51..198.06 rows=2260 width=12) (actual time=0.107..0.122 rows=6 loops=1)
   Output: id, num, lead(id) OVER (?)
   ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (actual time=0.079..0.081 rows=6 loops=1)
         Output: num, id
         Sort Key: s4.num
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on public.s4  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.057..0.061 rows=6 loops=1)
               Output: num, id
 Planning Time: 0.114 ms
 Execution Time: 0.214 ms

angerszhu=# explain analyze verbose select id, num, lead(id) over (partition by num order by id) from s4;
                                           QUERY PLAN
---------------------------------------------------------------------------------------------------
 WindowAgg  (cost=158.51..203.71 rows=2260 width=12) (actual time=0.976..1.017 rows=6 loops=1)
   Output: id, num, lead(id) OVER (?)
   ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (actual time=0.067..0.070 rows=6 loops=1)
         Output: id, num
         Sort Key: s4.num, s4.id
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on public.s4  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.042..0.045 rows=6 loops=1)
               Output: id, num
 Planning Time: 0.155 ms
 Execution Time: 1.208 ms
(10 rows)
```
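[Editor's note] The semantics under debate — defaulting the ORDER BY spec to the PARTITION BY spec when none is given — can be illustrated with a tiny Python model of `LEAD() OVER (PARTITION BY ... [ORDER BY ...])`. The helper is hypothetical, not Spark's or PostgreSQL's implementation:

```python
from itertools import groupby

def lead_over_partition(rows, partition_key, order_key=None):
    """Toy LEAD(): pair each row with the next row in its partition.

    If no order key is given, fall back to the partition key (the Hive-like
    default proposed in the PR). Note that within a partition all fallback
    keys are then equal, so the intra-partition order is whatever the stable
    sort preserves from the input - exactly the nondeterminism the reviewers
    are worried about.
    """
    order_key = order_key or partition_key
    rows = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    out = []
    for _, grp in groupby(rows, key=partition_key):
        grp = list(grp)
        for i, r in enumerate(grp):
            out.append((r, grp[i + 1] if i + 1 < len(grp) else None))
    return out
```

With an explicit order key the pairing is deterministic; with the fallback it depends on input order, which is why the default is contested for Spark.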
[GitHub] [spark] mridulm commented on a change in pull request #28257: [SPARK-31485][CORE] Avoid application hang if only partial barrier tasks launched
mridulm commented on a change in pull request #28257: URL: https://github.com/apache/spark/pull/28257#discussion_r413463015

## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala

```diff
@@ -468,8 +466,9 @@ private[spark] class TaskSchedulerImpl(
       resourceProfileIds: Array[Int],
       availableCpus: Array[Int],
       availableResources: Array[Map[String, Buffer[String]]],
-      rpId: Int): Int = {
-    val resourceProfile = sc.resourceProfileManager.resourceProfileFromId(rpId)
+      taskSet: TaskSetManager): Int = {
+    val resourceProfile = sc.resourceProfileManager.resourceProfileFromId(
+      taskSet.taskSet.resourceProfileId)
     val offersForResourceProfile = resourceProfileIds.zipWithIndex.filter { case (id, _) =>
```

Review comment: Ah! Yes, I knew I missed something :-) Thanks
[GitHub] [spark] mridulm commented on a change in pull request #28257: [SPARK-31485][CORE] Avoid application hang if only partial barrier tasks launched
mridulm commented on a change in pull request #28257: URL: https://github.com/apache/spark/pull/28257#discussion_r413462366

## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala

```diff
@@ -741,8 +750,12 @@ private[spark] class TaskSchedulerImpl(
       if (state == TaskState.LOST) {
         // TaskState.LOST is only used by the deprecated Mesos fine-grained scheduling mode,
         // where each executor corresponds to a single task, so mark the executor as failed.
-        val execId = taskIdToExecutorId.getOrElse(tid, throw new IllegalStateException(
-          "taskIdToTaskSetManager.contains(tid) <=> taskIdToExecutorId.contains(tid)"))
+        val execId = taskIdToExecutorId.getOrElse(tid, {
+          val errorMsg =
+            "taskIdToTaskSetManager.contains(tid) <=> taskIdToExecutorId.contains(tid)"
+          taskSet.abort(errorMsg)
+          throw new SparkException(errorMsg)
```

Review comment: Is the exception change fine here? +CC @tgravescs
[GitHub] [spark] AmplabJenkins commented on issue #28257: [SPARK-31485][CORE] Avoid application hang if only partial barrier tasks launched
AmplabJenkins commented on issue #28257: URL: https://github.com/apache/spark/pull/28257#issuecomment-618139740
[GitHub] [spark] AmplabJenkins removed a comment on issue #28257: [SPARK-31485][CORE] Avoid application hang if only partial barrier tasks launched
AmplabJenkins removed a comment on issue #28257: URL: https://github.com/apache/spark/pull/28257#issuecomment-618139740
[GitHub] [spark] Ngone51 commented on a change in pull request #28257: [SPARK-31485][CORE] Avoid application hang if only partial barrier tasks launched
Ngone51 commented on a change in pull request #28257: URL: https://github.com/apache/spark/pull/28257#discussion_r413462299

## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala

```diff
@@ -468,8 +466,9 @@ private[spark] class TaskSchedulerImpl(
       resourceProfileIds: Array[Int],
       availableCpus: Array[Int],
       availableResources: Array[Map[String, Buffer[String]]],
-      rpId: Int): Int = {
-    val resourceProfile = sc.resourceProfileManager.resourceProfileFromId(rpId)
+      taskSet: TaskSetManager): Int = {
+    val resourceProfile = sc.resourceProfileManager.resourceProfileFromId(
+      taskSet.taskSet.resourceProfileId)
     val offersForResourceProfile = resourceProfileIds.zipWithIndex.filter { case (id, _) =>
```

Review comment: We need `taskSet: TaskSetManager` now because we'll use it to abort the task set below.
[GitHub] [spark] SparkQA commented on issue #28257: [SPARK-31485][CORE] Avoid application hang if only partial barrier tasks launched
SparkQA commented on issue #28257: URL: https://github.com/apache/spark/pull/28257#issuecomment-618139303 **[Test build #121645 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121645/testReport)** for PR 28257 at commit [`6495a9a`](https://github.com/apache/spark/commit/6495a9a31c3076540e791e7f2652452407df28c2).
[GitHub] [spark] mridulm commented on a change in pull request #28257: [SPARK-31485][CORE] Avoid application hang if only partial barrier tasks launched
mridulm commented on a change in pull request #28257: URL: https://github.com/apache/spark/pull/28257#discussion_r413461520

## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala

```diff
@@ -468,8 +466,9 @@ private[spark] class TaskSchedulerImpl(
       resourceProfileIds: Array[Int],
       availableCpus: Array[Int],
       availableResources: Array[Map[String, Buffer[String]]],
-      rpId: Int): Int = {
-    val resourceProfile = sc.resourceProfileManager.resourceProfileFromId(rpId)
+      taskSet: TaskSetManager): Int = {
+    val resourceProfile = sc.resourceProfileManager.resourceProfileFromId(
+      taskSet.taskSet.resourceProfileId)
     val offersForResourceProfile = resourceProfileIds.zipWithIndex.filter { case (id, _) =>
```

Review comment: True, but I was trying to make sense of whether it was relevant to the fix or not. Looks like an unrelated cleanup.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28302: [SPARK-31522][SQL] Hive metastore client initialization related configurations should be static
AmplabJenkins removed a comment on issue #28302: URL: https://github.com/apache/spark/pull/28302#issuecomment-618137618
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27861: [SPARK-30707][SQL]Window function set partitionSpec as order spec when orderSpec is empty
HyukjinKwon commented on a change in pull request #27861: URL: https://github.com/apache/spark/pull/27861#discussion_r413460172

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

```diff
@@ -1691,7 +1691,19 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   override def visitWindowDef(ctx: WindowDefContext): WindowSpecDefinition = withOrigin(ctx) {
     // CLUSTER BY ... | PARTITION BY ... ORDER BY ...
     val partition = ctx.partition.asScala.map(expression)
-    val order = ctx.sortItem.asScala.map(visitSortItem)
+    val order = if (ctx.sortItem.asScala.nonEmpty) {
+      ctx.sortItem.asScala.map(visitSortItem)
+    } else if (ctx.windowFrame != null &&
+        ctx.windowFrame().frameType.getType == SqlBaseParser.RANGE) {
+      // for RANGE window frame, we won't add default order spec
+      ctx.sortItem.asScala.map(visitSortItem)
+    } else {
+      // Same default behaviors like hive, when order spec is null
+      // set partition spec expression as order spec
+      ctx.partition.asScala.map { expr =>
+        SortOrder(expression(expr), Ascending, Ascending.defaultNullOrdering, Set.empty)
```

Review comment: But the results will be useless. When can it be useful if the order is nondeterministic for the functions that depend on the order?
[GitHub] [spark] AmplabJenkins commented on issue #28302: [SPARK-31522][SQL] Hive metastore client initialization related configurations should be static
AmplabJenkins commented on issue #28302: URL: https://github.com/apache/spark/pull/28302#issuecomment-618137618
[GitHub] [spark] SparkQA commented on issue #28302: [SPARK-31522][SQL] Hive metastore client initialization related configurations should be static
SparkQA commented on issue #28302: URL: https://github.com/apache/spark/pull/28302#issuecomment-618137351 **[Test build #121644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121644/testReport)** for PR 28302 at commit [`5c15a98`](https://github.com/apache/spark/commit/5c15a98270c428b0d9e7bacf553162d650a887b3).
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27861: [SPARK-30707][SQL]Window function set partitionSpec as order spec when orderSpec is empty
HyukjinKwon commented on a change in pull request #27861: URL: https://github.com/apache/spark/pull/27861#discussion_r413459621

File path: `sql/core/src/test/resources/sql-tests/results/postgreSQL/window_part1.sql.out`

```
@@ -422,7 +421,7 @@ struct
-- !query
SELECT count(*) OVER (PARTITION BY four) FROM (SELECT * FROM tenk1 WHERE FALSE)s
```

Review comment: Okay, now I completely get what you're trying to do. You want _window functions_ to work without specifying the ordering; non-window functions already work without specifying the ordering (because the results will be deterministic anyway). Yes, -1 for the same reason as @hvanhovell's comment.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28292: [SPARK-31516][DOC] Fix non-existed metric hiveClientCalls.count of CodeGenerator in DOC
AmplabJenkins removed a comment on issue #28292: URL: https://github.com/apache/spark/pull/28292#issuecomment-617641723 Can one of the admins verify this patch?
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #27861: [SPARK-30707][SQL]Window function set partitionSpec as order spec when orderSpec is empty
AngersZh commented on a change in pull request #27861: URL: https://github.com/apache/spark/pull/27861#discussion_r413458776

File path: `sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala`

```diff
@@ -1691,7 +1691,19 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   override def visitWindowDef(ctx: WindowDefContext): WindowSpecDefinition = withOrigin(ctx) {
     // CLUSTER BY ... | PARTITION BY ... ORDER BY ...
     val partition = ctx.partition.asScala.map(expression)
-    val order = ctx.sortItem.asScala.map(visitSortItem)
+    val order = if (ctx.sortItem.asScala.nonEmpty) {
+      ctx.sortItem.asScala.map(visitSortItem)
+    } else if (ctx.windowFrame != null &&
+        ctx.windowFrame().frameType.getType == SqlBaseParser.RANGE) {
+      // for RANGE window frame, we won't add default order spec
+      ctx.sortItem.asScala.map(visitSortItem)
+    } else {
+      // Same default behaviors like hive, when order spec is null
+      // set partition spec expression as order spec
+      ctx.partition.asScala.map { expr =>
+        SortOrder(expression(expr), Ascending, Ascending.defaultNullOrdering, Set.empty)
```

Review comment:

> Wait .. why do we set the ordering column as partition column? We should just leave it unspecified so only (non-window) aggregation functions work together with unbounded windows so it doesn't get affected by the order. This is what Scala API does.

Yes, Hive does it like this. In my view, when the user doesn't set an ORDER BY clause, it means they don't care about the result order. For RANGE frames in the DataFrame API we can't support this.
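The fallback being discussed can be modeled in a few lines. This is a simplified, hypothetical sketch of the diff's decision logic, not Spark's actual `AstBuilder` (names and return shapes are illustrative): explicit sort items win; a RANGE frame gets no implicit default; otherwise the partition expressions are reused as an ascending order spec, matching the Hive behavior described above.

```python
# Hypothetical model of the visitWindowDef fallback in the diff above.
def resolve_order_spec(sort_items, partition_exprs, frame_type):
    if sort_items:
        # Explicit ORDER BY: use it as-is.
        return list(sort_items)
    if frame_type == "RANGE":
        # RANGE frames get no implicit default order spec.
        return []
    # Hive-compatible default: order by the partition expressions, ascending.
    return [(expr, "ASC") for expr in partition_exprs]

print(resolve_order_spec([], ["dept"], "ROWS"))    # [('dept', 'ASC')]
print(resolve_order_spec([], ["dept"], "RANGE"))   # []
print(resolve_order_spec([("id", "DESC")], ["dept"], "ROWS"))  # [('id', 'DESC')]
```

The point of contention is the third branch: ordering by the partition columns makes every row in a partition compare equal, which is what the reviewers object to.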
[GitHub] [spark] wezhang commented on issue #28292: [SPARK-31516][DOC] Fix non-existed metric hiveClientCalls.count of CodeGenerator in DOC
wezhang commented on issue #28292: URL: https://github.com/apache/spark/pull/28292#issuecomment-618136677 Retest this please.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27861: [SPARK-30707][SQL]Window function set partitionSpec as order spec when orderSpec is empty
HyukjinKwon commented on a change in pull request #27861: URL: https://github.com/apache/spark/pull/27861#discussion_r413456806

File path: `sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala`

```diff
@@ -1691,7 +1691,19 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   override def visitWindowDef(ctx: WindowDefContext): WindowSpecDefinition = withOrigin(ctx) {
     // CLUSTER BY ... | PARTITION BY ... ORDER BY ...
     val partition = ctx.partition.asScala.map(expression)
-    val order = ctx.sortItem.asScala.map(visitSortItem)
+    val order = if (ctx.sortItem.asScala.nonEmpty) {
+      ctx.sortItem.asScala.map(visitSortItem)
+    } else if (ctx.windowFrame != null &&
+        ctx.windowFrame().frameType.getType == SqlBaseParser.RANGE) {
+      // for RANGE window frame, we won't add default order spec
+      ctx.sortItem.asScala.map(visitSortItem)
+    } else {
+      // Same default behaviors like hive, when order spec is null
+      // set partition spec expression as order spec
+      ctx.partition.asScala.map { expr =>
+        SortOrder(expression(expr), Ascending, Ascending.defaultNullOrdering, Set.empty)
```

Review comment: Wait .. why do we set the partition column as the ordering column? We should just leave it unspecified, so that only (non-window) aggregation functions work together with unbounded windows and the result isn't affected by the order. This is what the Scala API does.
[GitHub] [spark] HyukjinKwon removed a comment on issue #27861: [SPARK-30707][SQL]Window function set partitionSpec as order spec when orderSpec is empty
HyukjinKwon removed a comment on issue #27861: URL: https://github.com/apache/spark/pull/27861#issuecomment-618128508 I think he wants to use the partition column as the ordering column implicitly, instead of not specifying it - https://github.com/apache/spark/pull/27861/files#diff-9847f5cef7cf7fbc5830fbc6b779ee10R1702. It wouldn't work if the ordering is not specified per https://github.com/apache/spark/blob/a28ed86a387b286745b30cd4d90b3d558205a5a7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L2773-L2776
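The analyzer behavior referenced by the second link can be sketched as follows. This is a hypothetical simplification, not the actual `Analyzer.scala` code, and the error message is illustrative: a window whose frame depends on ordering is rejected at analysis time when no order spec is present, which is why the PR fills one in rather than leaving it empty.

```python
# Hypothetical sketch of the analysis-time check: an order-dependent window
# frame with an empty order spec fails instead of silently producing results.
def check_window_order(order_spec, frame_requires_order):
    if frame_requires_order and not order_spec:
        raise ValueError("window function requires window to be ordered")

check_window_order([("id", "ASC")], frame_requires_order=True)  # passes
try:
    check_window_order([], frame_requires_order=True)
except ValueError as e:
    print(e)  # analysis fails: no ordering was specified
```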
[GitHub] [spark] wezhang commented on issue #28292: [SPARK-31516][DOC] Fix non-existed metric hiveClientCalls.count of CodeGenerator in DOC
wezhang commented on issue #28292: URL: https://github.com/apache/spark/pull/28292#issuecomment-618134605 There seems to be a networking issue in the build, so I have to restart it.
```
ERRORS
SERVER ERROR: Gateway Time-out url=https://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/classpath/0.13.18/ivys/ivy.xml
SERVER ERROR: Gateway Time-out url=https://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/logging/0.13.18/ivys/ivy.xml
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
unresolved dependency: org.scala-sbt#classpath;0.13.18: not found
unresolved dependency: org.scala-sbt#logging;0.13.18: not found
Error during sbt execution: Error retrieving required libraries (see /home/runner/.sbt/boot/update.log for complete log)
Error: Could not retrieve sbt 0.13.18
Jekyll 4.0.0 Please append `--trace` to the `build` command for any additional information or backtrace.
/home/runner/work/spark/spark/docs/_plugins/copy_api_dirs.rb:30:in `': Unidoc generation failed (RuntimeError)
    from /opt/hostedtoolcache/Ruby/2.7.1/x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:92:in `require'
    from /opt/hostedtoolcache/Ruby/2.7.1/x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:92:in `require'
    from /opt/hostedtoolcache/Ruby/2.7.1/x64/lib/ruby/gems/2.7.0/gems/jekyll-4.0.0/lib/jekyll/external.rb:60:in `block in require_with_graceful_fail'
...
```
[GitHub] [spark] wezhang commented on issue #28292: [SPARK-31516][DOC] Fix non-existed metric hiveClientCalls.count of CodeGenerator in DOC
wezhang commented on issue #28292: URL: https://github.com/apache/spark/pull/28292#issuecomment-618133795

> Hi @wezhang, well spotted, indeed this was a mistake, thanks for fixing it.
> LGTM

Thank you a lot!