[GitHub] [spark] cloud-fan commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
cloud-fan commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#discussion_r353591940

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
## @@ -104,19 +103,24 @@ abstract class StringRegexExpression extends BinaryExpression
 spark.sql.parser.escapedStringLiterals false
 > SELECT '%SystemDrive%\\Users\\John' _FUNC_ '\%SystemDrive\%Users%';
 true
+ > SELECT '%SystemDrive%/Users/John' _FUNC_ '/%SystemDrive/%//Users%' ESCAPE '/';
+ true
 """,
 note = """
 Use RLIKE to match with standard regular expressions.
 """,
 since = "1.0.0")
 // scalastyle:on line.contains.tab
-case class Like(left: Expression, right: Expression) extends StringRegexExpression {
+case class Like(left: Expression, right: Expression, escapeCharOpt: Option[Char] = None)

Review comment: None indicates that `ESCAPE` is not specified, so that we can ignore it in `toString`. The existing code seems better.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
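To make the `ESCAPE` example in the diff above concrete, here is a minimal Python sketch of how a LIKE pattern with a user-chosen escape character can be translated into a regular expression. It is an illustration only and not Spark's implementation: the names `like_to_regex` and `sql_like` are hypothetical, and rejecting an escape character that is not followed by `%`, `_`, or itself is an assumption made for this sketch.

```python
import re


def like_to_regex(pattern: str, escape: str = "\\") -> re.Pattern:
    """Translate a SQL LIKE pattern into a compiled regular expression.

    Hypothetical helper for illustration; not taken from Spark.
    """
    if len(escape) != 1:
        raise ValueError("escape must be a single character")
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape:
            # Assumption: the escape character may only precede '%', '_' or itself.
            if i + 1 >= len(pattern):
                raise ValueError("pattern ends with a dangling escape character")
            nxt = pattern[i + 1]
            if nxt not in ("%", "_", escape):
                raise ValueError(f"invalid escape sequence {ch + nxt!r}")
            parts.append(re.escape(nxt))  # escaped character matches literally
            i += 2
        elif ch == "%":
            parts.append(".*")  # '%' matches any sequence of characters
            i += 1
        elif ch == "_":
            parts.append(".")  # '_' matches exactly one character
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts), re.DOTALL)


def sql_like(value: str, pattern: str, escape: str = "\\") -> bool:
    """Return True when the whole value matches the LIKE pattern."""
    return like_to_regex(pattern, escape).fullmatch(value) is not None


# The example added in the diff: '/' escapes '%' and itself.
matched = sql_like("%SystemDrive%/Users/John", "/%SystemDrive/%//Users%", escape="/")
```

With `escape="/"`, the pattern reads as a literal `%`, `SystemDrive`, a literal `%`, a literal `/`, `Users`, and a trailing wildcard, which is why the example string matches.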
[GitHub] [spark] cloud-fan commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
cloud-fan commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#discussion_r353591631

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
## @@ -104,19 +103,24 @@ abstract class StringRegexExpression extends BinaryExpression
 spark.sql.parser.escapedStringLiterals false
 > SELECT '%SystemDrive%\\Users\\John' _FUNC_ '\%SystemDrive\%Users%';
 true
+ > SELECT '%SystemDrive%/Users/John' _FUNC_ '/%SystemDrive/%//Users%' ESCAPE '/';
+ true
 """,
 note = """
 Use RLIKE to match with standard regular expressions.
 """,
 since = "1.0.0")
 // scalastyle:on line.contains.tab
-case class Like(left: Expression, right: Expression) extends StringRegexExpression {
+case class Like(left: Expression, right: Expression, escapeCharOpt: Option[Char] = None)

Review comment: yea this sounds better, to make the code simpler
[GitHub] [spark] cloud-fan commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
cloud-fan commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#discussion_r353590982

## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
## @@ -1202,6 +1203,7 @@ nonReserved
 | DROP
 | ELSE
 | END
+| ESCAPE

Review comment: ah sorry, I misread the document. So we expect `ESCAPE` to be reserved under ANSI mode. This makes sense, let's change it back.
[GitHub] [spark] Fokko commented on a change in pull request #24405: [SPARK-27506][SQL] Allow deserialization of Avro data using compatible schemas
Fokko commented on a change in pull request #24405: [SPARK-27506][SQL] Allow deserialization of Avro data using compatible schemas
URL: https://github.com/apache/spark/pull/24405#discussion_r353590959

## File path: docs/sql-data-sources-avro.md
## @@ -240,6 +240,14 @@ Data source options of Avro can be set via:
 function from_avro
+
+writerSchema

Review comment: I would stick to `writerSchema`, mostly because this is also the term used in Avro itself: https://avro.apache.org/docs/1.9.1/api/java/org/apache/avro/hadoop/io/AvroValueDeserializer.html
[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561520322 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19662/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561520318 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561520322 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19662/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561520318 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting
AmplabJenkins commented on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting URL: https://github.com/apache/spark/pull/26739#issuecomment-561519890 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114831/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting
AmplabJenkins removed a comment on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting URL: https://github.com/apache/spark/pull/26739#issuecomment-561519890 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114831/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting
AmplabJenkins commented on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting URL: https://github.com/apache/spark/pull/26739#issuecomment-561519884 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
cloud-fan commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#issuecomment-561520060 LGTM. Can we check the behavior in other databases like pgsql? It's better to know if Spark follows SQL standard or not.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting
AmplabJenkins removed a comment on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting URL: https://github.com/apache/spark/pull/26739#issuecomment-561519884 Merged build finished. Test PASSed.
[GitHub] [spark] dlindelof commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF
dlindelof commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF
URL: https://github.com/apache/spark/pull/26747#issuecomment-561519797

@srowen This illustrates the current behaviour, where an empty Spark Dataframe with a column of type `LongType` becomes a Pandas Dataframe with a column of type `object`, i.e. string:

```
In [62]: foo = spark.sql("SELECT CAST(1 AS LONG) AS bar WHERE 1 = 0")
In [63]: foo
Out[63]: DataFrame[bar: bigint]
In [64]: foo.toPandas().dtypes
Out[64]:
bar    object
dtype: object
```

When the dataframe is not empty, this is what you see:

```
In [65]: foo = spark.sql("SELECT CAST(1 AS LONG) AS bar WHERE 1 = 1")
In [66]: foo.toPandas().dtypes
Out[66]:
bar    int64
dtype: object
```
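The dtype difference described in the comment above can be reproduced with pandas alone, without a Spark session. The sketch below assumes that the non-Arrow conversion builds the frame from a list of row records and lets pandas infer the types, which leaves nothing to infer from when there are no rows. The helper `from_rows_with_dtypes` and its `schema` mapping are hypothetical names for illustration; this is not the fix proposed in the pull request.

```python
import pandas as pd


def from_rows_with_dtypes(rows, schema):
    """Build a DataFrame from row tuples, then cast to the declared dtypes.

    `schema` maps column name -> pandas dtype string (hypothetical helper).
    """
    pdf = pd.DataFrame.from_records(rows, columns=list(schema))
    # Casting explicitly means the result does not depend on type inference,
    # so an empty input still gets the declared dtype.
    return pdf.astype(schema)


# With no rows, pandas has nothing to infer from and falls back to object.
empty_inferred = pd.DataFrame.from_records([], columns=["bar"])

# Declaring the dtype up front keeps int64 even for an empty frame.
empty_declared = from_rows_with_dtypes([], {"bar": "int64"})
non_empty = from_rows_with_dtypes([(1,)], {"bar": "int64"})

print(empty_inferred.dtypes)
print(empty_declared.dtypes)
```

The design point is that the column types are already known from the Spark schema, so applying them after construction avoids depending on the presence of data.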
[GitHub] [spark] SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561519865 **[Test build #114839 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114839/testReport)** for PR 26412 at commit [`571225b`](https://github.com/apache/spark/commit/571225b68957fff781c68171b3c6c52cdfdc56cf).
[GitHub] [spark] SparkQA commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF
SparkQA commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF URL: https://github.com/apache/spark/pull/26747#issuecomment-561519854 **[Test build #114838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114838/testReport)** for PR 26747 at commit [`f25827c`](https://github.com/apache/spark/commit/f25827ced6728ef033434df8ff39687de5690745).
[GitHub] [spark] SparkQA removed a comment on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting
SparkQA removed a comment on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting URL: https://github.com/apache/spark/pull/26739#issuecomment-561499537 **[Test build #114831 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114831/testReport)** for PR 26739 at commit [`f55917d`](https://github.com/apache/spark/commit/f55917d4211f76d68619dc1ff0b1b82dc5a6aa20).
[GitHub] [spark] SparkQA commented on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting
SparkQA commented on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting URL: https://github.com/apache/spark/pull/26739#issuecomment-561519522 **[Test build #114831 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114831/testReport)** for PR 26739 at commit [`f55917d`](https://github.com/apache/spark/commit/f55917d4211f76d68619dc1ff0b1b82dc5a6aa20). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
AmplabJenkins removed a comment on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#issuecomment-561518455 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114835/ Test FAILed.
[GitHub] [spark] yaooqinn commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
yaooqinn commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#discussion_r353588331

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
## @@ -185,16 +186,17 @@ case class DateAdd(startDate: Expression, days: Expression)
 """,
 since = "1.5.0")
 case class DateSub(startDate: Expression, days: Expression)
- extends BinaryExpression with ImplicitCastInputTypes {
+ extends BinaryExpression with ExpectsInputTypes {
 override def left: Expression = startDate
 override def right: Expression = days
- override def inputTypes: Seq[AbstractDataType] = Seq(DateType, IntegerType)
+ override def inputTypes: Seq[AbstractDataType] =
+Seq(DateType, TypeCollection(IntegerType, ShortType, ByteType))
 override def dataType: DataType = DateType
 override def nullSafeEval(start: Any, d: Any): Any = {
-start.asInstanceOf[Int] - d.asInstanceOf[Int]
+start.asInstanceOf[Int] - d.asInstanceOf[Number].intValue()

Review comment: done
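As a rough illustration of the evaluation in the diff above, the following Python sketch models a date as an integer count of days since 1970-01-01 and subtracts a day count from it, returning `None` when either input is `None`, in line with the pull request title. The function names are hypothetical and the code is a simplified model rather than the Catalyst implementation; the `int(days)` conversion stands in for the `asInstanceOf[Number].intValue()` call that lets byte, short, and int inputs share one code path.

```python
from datetime import date, timedelta
from typing import Optional

EPOCH = date(1970, 1, 1)


def to_epoch_days(d: date) -> int:
    """Convert a calendar date to days since 1970-01-01."""
    return (d - EPOCH).days


def from_epoch_days(n: int) -> date:
    """Convert days since 1970-01-01 back to a calendar date."""
    return EPOCH + timedelta(days=n)


def date_sub(start_days: Optional[int], days: Optional[int]) -> Optional[int]:
    """Subtract `days` from a date stored as epoch days.

    Hypothetical model: a null (None) on either side yields null.
    """
    if start_days is None or days is None:
        return None
    # int() plays the role of Number.intValue() for any integral input.
    return start_days - int(days)


result = date_sub(to_epoch_days(date(2019, 12, 4)), 3)
```

Keeping the null check outside the arithmetic mirrors the split between a generic null-propagating wrapper and a `nullSafeEval` body that only ever sees non-null values.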
[GitHub] [spark] deshanxiao commented on issue #26744: [SPARK-30106][SQL][TEST] Fix the test of DynamicPartitionPruningSuite
deshanxiao commented on issue #26744: [SPARK-30106][SQL][TEST] Fix the test of DynamicPartitionPruningSuite URL: https://github.com/apache/spark/pull/26744#issuecomment-561518774 Thank you @dongjoon-hyun @cloud-fan .
[GitHub] [spark] AmplabJenkins removed a comment on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
AmplabJenkins removed a comment on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#issuecomment-561518445 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
SparkQA removed a comment on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#issuecomment-561508568 **[Test build #114835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114835/testReport)** for PR 26485 at commit [`a89b3b4`](https://github.com/apache/spark/commit/a89b3b4a3bee322b5ffc24dc7f37c8c6daf96283).
[GitHub] [spark] AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#issuecomment-561518445 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#issuecomment-561518455 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114835/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
SparkQA commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#issuecomment-561518402 **[Test build #114835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114835/testReport)** for PR 26485 at commit [`a89b3b4`](https://github.com/apache/spark/commit/a89b3b4a3bee322b5ffc24dc7f37c8c6daf96283). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
AmplabJenkins removed a comment on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/26750#issuecomment-561518074 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19659/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax
AmplabJenkins removed a comment on issue #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax URL: https://github.com/apache/spark/pull/26736#issuecomment-561518093 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19661/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF
AmplabJenkins removed a comment on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF URL: https://github.com/apache/spark/pull/26747#issuecomment-561518042 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
AmplabJenkins removed a comment on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/26750#issuecomment-561518059 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax
AmplabJenkins removed a comment on issue #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax URL: https://github.com/apache/spark/pull/26736#issuecomment-561518084 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF
AmplabJenkins removed a comment on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF URL: https://github.com/apache/spark/pull/26747#issuecomment-561518049 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19660/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
AmplabJenkins commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/26750#issuecomment-561518074 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19659/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax
AmplabJenkins commented on issue #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax URL: https://github.com/apache/spark/pull/26736#issuecomment-561518093 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19661/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF
AmplabJenkins commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF URL: https://github.com/apache/spark/pull/26747#issuecomment-561518049 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19660/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF
AmplabJenkins commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF URL: https://github.com/apache/spark/pull/26747#issuecomment-561518042 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax
AmplabJenkins commented on issue #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax URL: https://github.com/apache/spark/pull/26736#issuecomment-561518084 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
AmplabJenkins commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/26750#issuecomment-561518059 Merged build finished. Test PASSed.
[GitHub] [spark] dlindelof commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF
dlindelof commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF URL: https://github.com/apache/spark/pull/26747#issuecomment-561518046 @HyukjinKwon I've reverted to an if-else chain instead of a dict. Is there anything else you think I should change?
[GitHub] [spark] SparkQA commented on issue #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax
SparkQA commented on issue #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax URL: https://github.com/apache/spark/pull/26736#issuecomment-561517640 **[Test build #114836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114836/testReport)** for PR 26736 at commit [`248d2e7`](https://github.com/apache/spark/commit/248d2e74a14fb6170883fdbfbaf67b925205f792).
[GitHub] [spark] SparkQA commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
SparkQA commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/26750#issuecomment-561517649 **[Test build #114837 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114837/testReport)** for PR 26750 at commit [`d50facf`](https://github.com/apache/spark/commit/d50facf000401b282d101c350ad571d762d6d729).
[GitHub] [spark] AmplabJenkins removed a comment on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table
AmplabJenkins removed a comment on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table URL: https://github.com/apache/spark/pull/26754#issuecomment-561516682 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
cloud-fan commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/26750#issuecomment-561516747
This is preferred over https://github.com/apache/spark/pull/26297, because
1. This follows the existing API style, so the diff is much smaller.
2. It's hard to decouple schema and partition inference. For example, a file source needs to infer partitioning before reporting its schema, as partition columns are part of the table schema.
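To illustrate point 2, here is a minimal sketch in plain Scala (no Spark dependency) of why the table schema cannot be reported until the partition columns are known. The names `Field`, `partitionColumns` and `tableSchema` are hypothetical and are not part of the Spark API; the sketch also assumes Hive-style `name=value` directories and treats every partition column as a string.

```scala
object PartitionSchemaSketch {
  // Hypothetical, simplified stand-in for a schema field.
  final case class Field(name: String, dataType: String)

  // Collect the column names encoded as `name=value` directory segments.
  def partitionColumns(path: String): Seq[String] =
    path.split("/").toSeq.filter(_.contains("=")).map(_.takeWhile(_ != '='))

  // The table schema is the data schema plus the partition columns, so
  // partitioning has to be inferred from the paths first.
  def tableSchema(dataSchema: Seq[Field], paths: Seq[String]): Seq[Field] = {
    val partCols = paths.flatMap(partitionColumns).distinct
    dataSchema ++ partCols.map(name => Field(name, "string"))
  }
}
```

With a data schema of `id: long` and files under `/data/year=2019/month=12/`, `tableSchema` returns `id`, `year`, `month`, which is the sense in which partition columns are part of the table schema.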
[GitHub] [spark] AmplabJenkins removed a comment on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table
AmplabJenkins removed a comment on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table URL: https://github.com/apache/spark/pull/26754#issuecomment-561516686 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114822/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table
AmplabJenkins commented on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table URL: https://github.com/apache/spark/pull/26754#issuecomment-561516686 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114822/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table
AmplabJenkins commented on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table URL: https://github.com/apache/spark/pull/26754#issuecomment-561516682 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table
SparkQA removed a comment on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table URL: https://github.com/apache/spark/pull/26754#issuecomment-561459485 **[Test build #114822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114822/testReport)** for PR 26754 at commit [`8d74a2c`](https://github.com/apache/spark/commit/8d74a2c62515eee67408a6a79dd779591df2e036).
[GitHub] [spark] SparkQA commented on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table
SparkQA commented on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table URL: https://github.com/apache/spark/pull/26754#issuecomment-561516125 **[Test build #114822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114822/testReport)** for PR 26754 at commit [`8d74a2c`](https://github.com/apache/spark/commit/8d74a2c62515eee67408a6a79dd779591df2e036).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561513987 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114825/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561513979 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561513979 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561513987 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114825/ Test FAILed.
[GitHub] [spark] beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#discussion_r353583379
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
## @@ -135,19 +135,25 @@ object AggUtils {
     }
     val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
     val groupingAttributes = groupingExpressions.map(_.toAttribute)
+    val filterWithDistinctAttributes = functionsWithDistinct.flatMap(_.filterAttributes.toSeq)
     // 1. Create an Aggregate Operator for partial aggregations.
     val partialAggregate: SparkPlan = {
       val aggregateExpressions = functionsWithoutDistinct.map(_.copy(mode = Partial))
       val aggregateAttributes = aggregateExpressions.map(_.resultAttribute)
       // We will group by the original grouping expression, plus an additional expression for the
-      // DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, the grouping
-      // expressions will be [key, value].
+      // DISTINCT column and the referred attributes in the FILTER clause associated with each
+      // aggregate function. For example:
+      // 1.for the AVG (DISTINCT value) GROUP BY key, the grouping expression will be [key, value];
+      // 2.for AVG (DISTINCT value) Filter (WHERE value2> 20) GROUP BY key, the grouping expression
+      // will be [key, value, value2].
Review comment: Oh, I will try this. Thanks, Wenchen.
[GitHub] [spark] SparkQA removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
SparkQA removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561465729 **[Test build #114825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114825/testReport)** for PR 26412 at commit [`4af7edb`](https://github.com/apache/spark/commit/4af7edb6476aea554c31ce9c54f8d2a23a9adf13).
[GitHub] [spark] SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561513617 **[Test build #114825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114825/testReport)** for PR 26412 at commit [`4af7edb`](https://github.com/apache/spark/commit/4af7edb6476aea554c31ce9c54f8d2a23a9adf13).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
cloud-fan commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#discussion_r353581578
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
## @@ -185,16 +186,17 @@ case class DateAdd(startDate: Expression, days: Expression)
   """,
   since = "1.5.0")
 case class DateSub(startDate: Expression, days: Expression)
-  extends BinaryExpression with ImplicitCastInputTypes {
+  extends BinaryExpression with ExpectsInputTypes {
   override def left: Expression = startDate
   override def right: Expression = days
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType, IntegerType)
+  override def inputTypes: Seq[AbstractDataType] =
+    Seq(DateType, TypeCollection(IntegerType, ShortType, ByteType))
   override def dataType: DataType = DateType
   override def nullSafeEval(start: Any, d: Any): Any = {
-    start.asInstanceOf[Int] - d.asInstanceOf[Int]
+    start.asInstanceOf[Int] - d.asInstanceOf[Number].intValue()
Review comment: Can we add some unit tests in `DateExpressionsSuite` to make sure byte/short work?
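As a rough illustration of what such a test would need to cover, the sketch below mirrors only the arithmetic of the changed `nullSafeEval` line in plain Scala. `DateSubSketch` and `dateSub` are hypothetical names, not Spark code; dates are assumed to be days since the epoch stored as `Int`.

```scala
object DateSubSketch {
  // Mirrors the new line in the diff: the day count is read through
  // java.lang.Number, so boxed Byte, Short and Int values are all accepted.
  def dateSub(start: Any, d: Any): Int =
    start.asInstanceOf[Int] - d.asInstanceOf[Number].intValue()

  // Mirrors the old line: unboxing straight to Int only works for boxed Ints.
  def oldCastWorks(d: Any): Boolean =
    scala.util.Try(d.asInstanceOf[Int]).isSuccess
}
```

Here `dateSub(18234, 3.toByte)` and `dateSub(18234, 3.toShort)` both give `18231`, while `oldCastWorks(3.toByte)` is `false` because a boxed `Byte` cannot be unboxed as an `Int`.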
[GitHub] [spark] AmplabJenkins removed a comment on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
AmplabJenkins removed a comment on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#issuecomment-561508982 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
cloud-fan commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#discussion_r353578469
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
## @@ -135,19 +135,25 @@ object AggUtils {
     }
     val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
     val groupingAttributes = groupingExpressions.map(_.toAttribute)
+    val filterWithDistinctAttributes = functionsWithDistinct.flatMap(_.filterAttributes.toSeq)
     // 1. Create an Aggregate Operator for partial aggregations.
     val partialAggregate: SparkPlan = {
       val aggregateExpressions = functionsWithoutDistinct.map(_.copy(mode = Partial))
       val aggregateAttributes = aggregateExpressions.map(_.resultAttribute)
       // We will group by the original grouping expression, plus an additional expression for the
-      // DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, the grouping
-      // expressions will be [key, value].
+      // DISTINCT column and the referred attributes in the FILTER clause associated with each
+      // aggregate function. For example:
+      // 1.for the AVG (DISTINCT value) GROUP BY key, the grouping expression will be [key, value];
+      // 2.for AVG (DISTINCT value) Filter (WHERE value2> 20) GROUP BY key, the grouping expression
+      // will be [key, value, value2].
Review comment: Outputting value2 doesn't mean we have to group by value2. We can update the `resultExpressions` to include value2.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
AmplabJenkins removed a comment on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#issuecomment-561508990 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19658/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streamin…
AmplabJenkins removed a comment on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streamin… URL: https://github.com/apache/spark/pull/26756#issuecomment-561508446 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#issuecomment-561508982 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#issuecomment-561508990 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19658/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streamin…
AmplabJenkins commented on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streamin… URL: https://github.com/apache/spark/pull/26756#issuecomment-561508873 Can one of the admins verify this patch?
[GitHub] [spark] turboFei commented on a change in pull request #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
turboFei commented on a change in pull request #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#discussion_r353577964
## File path: sql/core/src/test/resources/sql-tests/results/subquery/negative-cases/subq-input-typecheck.sql.out
## @@ -135,12 +135,12 @@ WHERE
 struct<>
 -- !query 9 output
 org.apache.spark.sql.AnalysisException
-cannot resolve '(named_struct('t4a', t4.`t4a`, 't4b', t4.`t4b`, 't4c', t4.`t4c`) IN (listquery()))' due to data type mismatch:
+cannot resolve '(named_struct('t4a', t4.`t4a`, 't4b', t4.`t4b`, 't4c', t4.`t4c`) IN (listquery()))' due to data type mismatch:
Review comment: In fact, I do not know why a space was introduced. I tried to remove it, but failed. It should not matter.
[GitHub] [spark] AmplabJenkins commented on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streamin…
AmplabJenkins commented on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streamin… URL: https://github.com/apache/spark/pull/26756#issuecomment-561508446 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
SparkQA commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#issuecomment-561508568 **[Test build #114835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114835/testReport)** for PR 26485 at commit [`a89b3b4`](https://github.com/apache/spark/commit/a89b3b4a3bee322b5ffc24dc7f37c8c6daf96283).
[GitHub] [spark] turboFei commented on a change in pull request #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
turboFei commented on a change in pull request #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#discussion_r353577512 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ## @@ -472,9 +472,12 @@ object TypeCoercion { // RHS is the subquery output. val rhs = sub.output -val commonTypes = lhs.zip(rhs).flatMap { case (l, r) => - findCommonTypeForBinaryComparison(l.dataType, r.dataType, conf) -.orElse(findTightestCommonType(l.dataType, r.dataType)) +val commonTypes = lhs.zip(rhs).flatMap { + case (l, r) if !l.dataType.isInstanceOf[DecimalType] && Review comment: Thanks for your suggestion.
[GitHub] [spark] turboFei commented on a change in pull request #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
turboFei commented on a change in pull request #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#discussion_r353577420 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ## @@ -472,9 +472,12 @@ object TypeCoercion { // RHS is the subquery output. val rhs = sub.output -val commonTypes = lhs.zip(rhs).flatMap { case (l, r) => - findCommonTypeForBinaryComparison(l.dataType, r.dataType, conf) -.orElse(findTightestCommonType(l.dataType, r.dataType)) +val commonTypes = lhs.zip(rhs).flatMap { + case (l, r) if !l.dataType.isInstanceOf[DecimalType] && +!r.dataType.isInstanceOf[DecimalType] => +findCommonTypeForBinaryComparison(l.dataType, r.dataType, conf) + .orElse(findTightestCommonType(l.dataType, r.dataType)) + case (l, r) => findWiderTypeForDecimal(l.dataType, r.dataType) Review comment: Thanks for your suggestion. To unify the logic of `in` and `inSubquery`, as cloud-fan mentioned, I simply call `findWiderTypeForTwo` here.
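As a rough illustration of the pairwise widening discussed in this thread, here is a minimal sketch on a toy type model. It does not use Spark's classes: `ToyType`, `widerType`, and `commonTypes` are hypothetical names, and the decimal rule (keep the larger scale and the larger integer range) is a simplifying assumption rather than the exact logic behind `findWiderTypeForTwo`.

```scala
object InSubqueryCoercionSketch {
  // Toy stand-ins for data types; these are NOT Spark's DataType classes.
  sealed trait ToyType
  case object ToyInt extends ToyType
  case object ToyLong extends ToyType
  case object ToyString extends ToyType
  final case class ToyDecimal(precision: Int, scale: Int) extends ToyType

  // Assumed widening rule: keep enough integer digits and enough fraction digits for both sides.
  def widerDecimal(a: ToyDecimal, b: ToyDecimal): ToyDecimal = {
    val scale = math.max(a.scale, b.scale)
    val range = math.max(a.precision - a.scale, b.precision - b.scale)
    ToyDecimal(range + scale, scale)
  }

  // Returns a common type for one LHS/RHS pair, or None when the pair is incompatible.
  def widerType(l: ToyType, r: ToyType): Option[ToyType] = (l, r) match {
    case (a, b) if a == b => Some(a)
    case (a: ToyDecimal, b: ToyDecimal) => Some(widerDecimal(a, b))
    case (ToyInt, ToyLong) | (ToyLong, ToyInt) => Some(ToyLong)
    case _ => None
  }

  // Coercion applies only if every LHS/RHS pair has a common type.
  def commonTypes(lhs: Seq[ToyType], rhs: Seq[ToyType]): Option[Seq[ToyType]] = {
    val found = lhs.zip(rhs).flatMap { case (l, r) => widerType(l, r) }
    if (lhs.length == rhs.length && found.length == lhs.length) Some(found) else None
  }
}
```

In this sketch a single incompatible pair makes the whole comparison fail, which corresponds to the "data type mismatch" error shown in the test output earlier in the thread.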
[GitHub] [spark] turboFei commented on a change in pull request #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
turboFei commented on a change in pull request #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#discussion_r353577120 ## File path: sql/core/src/test/resources/sql-tests/results/subquery/negative-cases/subq-input-typecheck.sql.out ## @@ -132,15 +132,6 @@ WHERE t5c FROM t5) -- !query 9 schema -struct<> +struct -- !query 9 output -org.apache.spark.sql.AnalysisException -cannot resolve '(named_struct('t4a', t4.`t4a`, 't4b', t4.`t4b`, 't4c', t4.`t4c`) IN (listquery()))' due to data type mismatch: -The data type of one or more elements in the left hand side of an IN subquery -is not compatible with the data type of the output of the subquery Review comment: thanks, I have modified them.
[GitHub] [spark] iRakson opened a new pull request #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streamin…
iRakson opened a new pull request #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streamin… URL: https://github.com/apache/spark/pull/26756 …g Tab ### What changes were proposed in this pull request? Add pagination support for the completed batch table in the streaming tab. ### Why are the changes needed? If a streaming job runs for a long time and the number of batches is huge, an out-of-memory error may occur while loading the streaming page. Pagination addresses this problem and also improves the page's loading time. In addition, the jobs, stages, SQL, and thrift-server pages already have pagination, so this change also brings consistency. ### Does this PR introduce any user-facing change? Yes ### How was this patch tested? Manually. Will attach screenshots later.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-561506710 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19657/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-561506710 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19657/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-561506708 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-561506708 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-561506326 **[Test build #114834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114834/testReport)** for PR 25001 at commit [`a891139`](https://github.com/apache/spark/commit/a8911392aa52b883edded53c1765d52648d5adfa).
[GitHub] [spark] beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r353575369 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -104,19 +103,24 @@ abstract class StringRegexExpression extends BinaryExpression spark.sql.parser.escapedStringLiterals false > SELECT '%SystemDrive%\\Users\\John' _FUNC_ '\%SystemDrive\%Users%'; true + > SELECT '%SystemDrive%/Users/John' _FUNC_ '/%SystemDrive/%//Users%' ESCAPE '/'; + true """, note = """ Use RLIKE to match with standard regular expressions. """, since = "1.0.0") // scalastyle:on line.contains.tab -case class Like(left: Expression, right: Expression) extends StringRegexExpression { +case class Like(left: Expression, right: Expression, escapeCharOpt: Option[Char] = None) Review comment: It's OK too.
[GitHub] [spark] beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r353575345 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,16 +83,15 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string added since Spark 3.0. The default escape character is the '\'. Review comment: Thanks for the reminder.
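The escaping rules quoted in the diff above (an escape character may precede `_`, `%`, or another escape character, and nothing else) can be sketched as a small LIKE-to-regex translation. This is only an illustrative sketch, not the code in this pull request: `likeToRegex` and `like` are hypothetical helper names, and rejecting a trailing escape character is an assumption made here.

```scala
import java.util.regex.Pattern

object LikeEscapeSketch {
  // Translates a LIKE pattern into a Java regex, honoring a configurable escape character.
  def likeToRegex(pattern: String, escapeChar: Char = '\\'): String = {
    val out = new StringBuilder
    var i = 0
    while (i < pattern.length) {
      val c = pattern.charAt(i)
      if (c == escapeChar) {
        // Assumption: a pattern ending in a lone escape character is treated as invalid.
        require(i + 1 < pattern.length, "pattern must not end with the escape character")
        val next = pattern.charAt(i + 1)
        require(next == '_' || next == '%' || next == escapeChar, s"cannot escape '$next'")
        out.append(Pattern.quote(next.toString)) // the escaped character matches literally
        i += 2
      } else {
        c match {
          case '_' => out.append(".")    // exactly one character
          case '%' => out.append(".*")   // zero or more characters
          case other => out.append(Pattern.quote(other.toString))
        }
        i += 1
      }
    }
    "(?s)" + out.toString // DOTALL so that '.' also matches line breaks
  }

  def like(input: String, pattern: String, escapeChar: Char = '\\'): Boolean =
    Pattern.matches(likeToRegex(pattern, escapeChar), input)
}
```

With `/` as the escape character, `like("%SystemDrive%/Users/John", "/%SystemDrive/%//Users%", '/')` evaluates to true under this sketch, mirroring the example added to the expression documentation in this pull request.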
[GitHub] [spark] beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r353575285 ## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ## @@ -1202,6 +1203,7 @@ nonReserved | DROP | ELSE | END +| ESCAPE Review comment: OK.
[GitHub] [spark] AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561504421 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19655/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected
AmplabJenkins removed a comment on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected URL: https://github.com/apache/spark/pull/26080#issuecomment-561504471 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561504421 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19655/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected
AmplabJenkins commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected URL: https://github.com/apache/spark/pull/26080#issuecomment-561504471 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected
AmplabJenkins commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected URL: https://github.com/apache/spark/pull/26080#issuecomment-561504480 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19656/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561504415 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected
AmplabJenkins removed a comment on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected URL: https://github.com/apache/spark/pull/26080#issuecomment-561504480 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19656/ Test PASSed.
[GitHub] [spark] zhengruifeng commented on a change in pull request #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting
zhengruifeng commented on a change in pull request #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting URL: https://github.com/apache/spark/pull/26739#discussion_r353573872 ## File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ## @@ -278,30 +287,32 @@ class KMeans private ( val bcCenters = sc.broadcast(centers) // Find the new centers - val collected = data.mapPartitions { points => + val collected = data.mapPartitions { pointsAndWeights => val thisCenters = bcCenters.value val dims = thisCenters.head.vector.size val sums = Array.fill(thisCenters.length)(Vectors.zeros(dims)) -val counts = Array.fill(thisCenters.length)(0L) -points.foreach { point => - val (bestCenter, cost) = distanceMeasureInstance.findClosest(thisCenters, point) +// clusterWeightSum is needed to calculate cluster center +// cluster center = +// sample1 * weight1/clusterWeightSum + sample2 * weight2/clusterWeightSum + ... +val clusterWeightSum = Array.fill(thisCenters.length)(0.0) + +pointsAndWeights.foreach { case (point, weight) => + var (bestCenter, cost) = distanceMeasureInstance.findClosest(thisCenters, point) + cost *= weight costAccum.add(cost) - distanceMeasureInstance.updateClusterSum(point, sums(bestCenter)) - counts(bestCenter) += 1 + distanceMeasureInstance.updateClusterSum(point, sums(bestCenter), weight) + clusterWeightSum(bestCenter) += weight } -counts.indices.filter(counts(_) > 0).map(j => (j, (sums(j), counts(j)))).iterator - }.reduceByKey { case ((sum1, count1), (sum2, count2)) => +clusterWeightSum.indices.filter(clusterWeightSum(_) > 0) + .map(j => (j, (sums(j), clusterWeightSum(j)))).iterator + }.reduceByKey { case ((sum1, clusterWeightSum1), (sum2, clusterWeightSum2)) => axpy(1.0, sum2, sum1) -(sum1, count1 + count2) +(sum1, clusterWeightSum1 + clusterWeightSum2) }.collectAsMap() - if (iteration == 0) { -instr.foreach(_.logNumExamples(collected.values.map(_._2).sum)) - } - Review comment: I am OK with adding a new `instr.log` in another PR.
Here I prefer to have `instr.logNumExamples` log the unweighted count, to keep it in sync with other algorithms.
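The weighted center formula in the diff comment above (center = sum of sample * weight, divided by clusterWeightSum) can be illustrated with plain Scala collections. This is a minimal single-machine sketch under assumptions: `updateCenters` and `squaredDistance` are hypothetical helpers, squared Euclidean distance stands in for the configurable distance measure, and empty clusters simply keep their old center.

```scala
object WeightedCentersSketch {
  type Vec = Array[Double]

  def squaredDistance(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // One update step: assign each weighted point to its closest center,
  // then recompute each center as the weighted mean of its points.
  // Returns the new centers and the total weighted cost.
  def updateCenters(centers: Array[Vec], data: Seq[(Vec, Double)]): (Array[Vec], Double) = {
    val dims = centers.head.length
    val sums = Array.fill(centers.length)(Array.fill(dims)(0.0))
    val clusterWeightSum = Array.fill(centers.length)(0.0)
    var totalCost = 0.0
    data.foreach { case (point, weight) =>
      val best = centers.indices.minBy(j => squaredDistance(centers(j), point))
      totalCost += squaredDistance(centers(best), point) * weight // cost is scaled by the weight
      for (d <- 0 until dims) sums(best)(d) += weight * point(d)
      clusterWeightSum(best) += weight
    }
    val newCenters = centers.indices.map { j =>
      // Assumption: a cluster that received no weight keeps its previous center.
      if (clusterWeightSum(j) > 0) sums(j).map(_ / clusterWeightSum(j)) else centers(j)
    }.toArray
    (newCenters, totalCost)
  }
}
```

Note that `clusterWeightSum` is a sum of weights, not a number of rows, which is why the review comment distinguishes it from the unweighted count reported by `instr.logNumExamples`.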
[GitHub] [spark] AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561504415 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561504005 **[Test build #114832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114832/testReport)** for PR 26412 at commit [`ae70022`](https://github.com/apache/spark/commit/ae7002232a87bd5c20ff060c95443e1804e5c869).
[GitHub] [spark] SparkQA commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected
SparkQA commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected URL: https://github.com/apache/spark/pull/26080#issuecomment-561503993 **[Test build #114833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114833/testReport)** for PR 26080 at commit [`f11e7c8`](https://github.com/apache/spark/commit/f11e7c8e72b319327fea3a8db511b3fec3152384).
[GitHub] [spark] beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#discussion_r353571892 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ## @@ -135,19 +135,25 @@ object AggUtils { } val distinctAttributes = namedDistinctExpressions.map(_.toAttribute) val groupingAttributes = groupingExpressions.map(_.toAttribute) +val filterWithDistinctAttributes = functionsWithDistinct.flatMap(_.filterAttributes.toSeq) // 1. Create an Aggregate Operator for partial aggregations. val partialAggregate: SparkPlan = { val aggregateExpressions = functionsWithoutDistinct.map(_.copy(mode = Partial)) val aggregateAttributes = aggregateExpressions.map(_.resultAttribute) // We will group by the original grouping expression, plus an additional expression for the - // DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, the grouping - // expressions will be [key, value]. + // DISTINCT column and the referred attributes in the FILTER clause associated with each + // aggregate function. For example: + // 1.for the AVG (DISTINCT value) GROUP BY key, the grouping expression will be [key, value]; + // 2.for AVG (DISTINCT value) Filter (WHERE value2> 20) GROUP BY key, the grouping expression + // will be [key, value, value2]. Review comment: For a query like `SELECT COUNT(DISTINCT a) FILTER (WHERE c > 0), SUM(b) FILTER (WHERE d = 0) FROM table`, the plan will be:
```
Final-AGG-4 (count distinct)
Shuffle to a single reducer
PartialMerge-AGG-3 (count distinct, no grouping, apply function COUNT on a with c > 0)
PartialMerge-AGG-2 (grouping on a and c)
Shuffle by a and c
Partial-AGG-1 (grouping on a and c, apply function SUM on b with d = 0)
```
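To make the reasoning behind grouping on both `a` and `c` more concrete, the sketch below evaluates the same query on an in-memory collection in two ways: directly, and in two phases that loosely mirror the plan above. It is only a semantic illustration under assumptions; `Row`, `direct`, and `twoPhase` are hypothetical names and none of this is Spark's physical planning code.

```scala
object FilteredAggSketch {
  final case class Row(a: Int, b: Int, c: Int, d: Int)

  // Direct evaluation of:
  //   SELECT COUNT(DISTINCT a) FILTER (WHERE c > 0), SUM(b) FILTER (WHERE d = 0)
  def direct(rows: Seq[Row]): (Long, Long) = {
    val countDistinct = rows.filter(_.c > 0).map(_.a).distinct.size.toLong
    val sum = rows.filter(_.d == 0).map(_.b.toLong).sum
    (countDistinct, sum)
  }

  // Two-phase evaluation, loosely following the plan in the review comment.
  def twoPhase(rows: Seq[Row]): (Long, Long) = {
    // Phase 1: group on (a, c) and compute a partial SUM(b) with d = 0 per group.
    // Keeping c in the grouping key preserves what the later filter c > 0 needs.
    val partial: Map[(Int, Int), Long] =
      rows.groupBy(r => (r.a, r.c)).map { case (key, rs) =>
        key -> rs.filter(_.d == 0).map(_.b.toLong).sum
      }
    // Phase 2: no grouping; count distinct a among groups with c > 0, merge partial sums.
    val countDistinct = partial.keys.filter(_._2 > 0).map(_._1).toSet.size.toLong
    val sum = partial.values.sum
    (countDistinct, sum)
  }
}
```

If the first phase grouped on `a` alone, the value of `c` would no longer be available when the distinct count applies its filter, which is the motivation the code comment in the diff gives for extending the grouping expressions.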
[GitHub] [spark] zhengruifeng commented on a change in pull request #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting
zhengruifeng commented on a change in pull request #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting URL: https://github.com/apache/spark/pull/26739#discussion_r353572931 ## File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ## @@ -278,30 +287,32 @@ class KMeans private ( val bcCenters = sc.broadcast(centers) // Find the new centers - val collected = data.mapPartitions { points => + val collected = data.mapPartitions { pointsAndWeights => val thisCenters = bcCenters.value val dims = thisCenters.head.vector.size val sums = Array.fill(thisCenters.length)(Vectors.zeros(dims)) -val counts = Array.fill(thisCenters.length)(0L) -points.foreach { point => - val (bestCenter, cost) = distanceMeasureInstance.findClosest(thisCenters, point) +// clusterWeightSum is needed to calculate cluster center +// cluster center = +// sample1 * weight1/clusterWeightSum + sample2 * weight2/clusterWeightSum + ... +val clusterWeightSum = Array.fill(thisCenters.length)(0.0) + +pointsAndWeights.foreach { case (point, weight) => + var (bestCenter, cost) = distanceMeasureInstance.findClosest(thisCenters, point) + cost *= weight costAccum.add(cost) - distanceMeasureInstance.updateClusterSum(point, sums(bestCenter)) - counts(bestCenter) += 1 + distanceMeasureInstance.updateClusterSum(point, sums(bestCenter), weight) + clusterWeightSum(bestCenter) += weight } -counts.indices.filter(counts(_) > 0).map(j => (j, (sums(j), counts(j)))).iterator - }.reduceByKey { case ((sum1, count1), (sum2, count2)) => +clusterWeightSum.indices.filter(clusterWeightSum(_) > 0) + .map(j => (j, (sums(j), clusterWeightSum(j)))).iterator + }.reduceByKey { case ((sum1, clusterWeightSum1), (sum2, clusterWeightSum2)) => axpy(1.0, sum2, sum1) -(sum1, count1 + count2) +(sum1, clusterWeightSum1 + clusterWeightSum2) }.collectAsMap() - if (iteration == 0) { -instr.foreach(_.logNumExamples(collected.values.map(_._2).sum)) - } - Review comment: I am OK with adding a new `instr.log` in another PR.
Here I prefer to have `instr.logNumExamples` log the unweighted count, to keep it in sync with other algorithms.
[GitHub] [spark] beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#discussion_r353571892 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ## @@ -135,19 +135,25 @@ object AggUtils { } val distinctAttributes = namedDistinctExpressions.map(_.toAttribute) val groupingAttributes = groupingExpressions.map(_.toAttribute) +val filterWithDistinctAttributes = functionsWithDistinct.flatMap(_.filterAttributes.toSeq) // 1. Create an Aggregate Operator for partial aggregations. val partialAggregate: SparkPlan = { val aggregateExpressions = functionsWithoutDistinct.map(_.copy(mode = Partial)) val aggregateAttributes = aggregateExpressions.map(_.resultAttribute) // We will group by the original grouping expression, plus an additional expression for the - // DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, the grouping - // expressions will be [key, value]. + // DISTINCT column and the referred attributes in the FILTER clause associated with each + // aggregate function. For example: + // 1.for the AVG (DISTINCT value) GROUP BY key, the grouping expression will be [key, value]; + // 2.for AVG (DISTINCT value) Filter (WHERE value2> 20) GROUP BY key, the grouping expression + // will be [key, value, value2]. Review comment: For a query like `SELECT COUNT(DISTINCT a) FILTER (WHERE c > 0), SUM(b) FILTER (WHERE d = 0) FROM table` will be ``` Final-AGG-4 (count distinct) Shuffle to a single reducer PartialMerge-AGG-3 (count distinct, no grouping, apply function COUNT on a with c > 0) PartialMerge-AGG-2 (grouping on a and c) Shuffle by a Partial-AGG-1 (grouping on a and c, apply function SUM on b with d = 0) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
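The staged plan in the review comment above can be mimicked on plain Scala collections to see why `c` joins the grouping key. This is only a sketch under stated assumptions, not Spark's `AggUtils` code: `Row`, `FilterDistinctSketch`, `partialAgg1` and `finalAgg` are hypothetical names, the shuffles are omitted, and the final step deduplicates `a` with a set because one `a` can appear under several values of `c`.

```scala
// Hypothetical in-memory model of the query
//   SELECT COUNT(DISTINCT a) FILTER (WHERE c > 0), SUM(b) FILTER (WHERE d = 0) FROM table
// None of these names come from Spark.
final case class Row(a: Int, b: Long, c: Int, d: Int)

object FilterDistinctSketch {
  // Stage 1: group on (a, c) and compute the partial SUM(b) FILTER (WHERE d = 0).
  // Keeping c in the key preserves the value that the DISTINCT filter needs later.
  def partialAgg1(rows: Seq[Row]): Map[(Int, Int), Long] =
    rows.groupBy(r => (r.a, r.c)).map { case (key, group) =>
      key -> group.filter(_.d == 0).map(_.b).sum
    }

  // Final stages: no grouping. Count the distinct a values whose key has c > 0,
  // and merge the partial sums.
  def finalAgg(partial: Map[(Int, Int), Long]): (Long, Long) = {
    val countDistinctA = partial.keys.collect { case (a, c) if c > 0 => a }.toSet.size.toLong
    val sumB = partial.values.sum
    (countDistinctA, sumB)
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Row(1, 10L, 1, 0),
      Row(1, 20L, 1, 1), // excluded from SUM because d != 0
      Row(1, 3L, 4, 0),  // same a under a second positive c
      Row(2, 7L, 0, 0),  // excluded from COUNT because c <= 0
      Row(3, 1L, 2, 0)
    )
    println(finalAgg(partialAgg1(rows))) // prints (2,21)
  }
}
```

If `c` were dropped from the grouping key in the first stage, the later stage would have no column left to evaluate `c > 0` against, which is the reason the comment in the diff extends the grouping expressions with the attributes referenced by the FILTER clause.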
[GitHub] [spark] yaooqinn commented on a change in pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected
yaooqinn commented on a change in pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected
URL: https://github.com/apache/spark/pull/26080#discussion_r353572328

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
## @@ -809,6 +846,22 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
     }
   }

+  override def getDatabaseOwnerName(db: Database): String = {
+    Option(getDatabaseOwnerNameMethod.invoke(db)).map(_.asInstanceOf[String]).getOrElse("")
+  }
+
+  override def setDatabaseOwnerName(db: Database, owner: String): Unit = {
+    setDatabaseOwnerNameMethod.invoke(db, owner)
+  }
+
+  override def getDatabaseOwnerType(db: Database): String = {
+    Option(getDatabaseOwnerTypeMethod.invoke(db))
+      .map(_.asInstanceOf[PrincipalType].name()).getOrElse(PrincipalType.USER.name())

Review comment:
agree
[GitHub] [spark] beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#discussion_r353571892

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
## @@ -135,19 +135,25 @@ object AggUtils {
     }
     val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
     val groupingAttributes = groupingExpressions.map(_.toAttribute)
+    val filterWithDistinctAttributes = functionsWithDistinct.flatMap(_.filterAttributes.toSeq)

     // 1. Create an Aggregate Operator for partial aggregations.
     val partialAggregate: SparkPlan = {
       val aggregateExpressions = functionsWithoutDistinct.map(_.copy(mode = Partial))
       val aggregateAttributes = aggregateExpressions.map(_.resultAttribute)
       // We will group by the original grouping expression, plus an additional expression for the
-      // DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, the grouping
-      // expressions will be [key, value].
+      // DISTINCT column and the referred attributes in the FILTER clause associated with each
+      // aggregate function. For example:
+      // 1.for the AVG (DISTINCT value) GROUP BY key, the grouping expression will be [key, value];
+      // 2.for AVG (DISTINCT value) Filter (WHERE value2> 20) GROUP BY key, the grouping expression
+      // will be [key, value, value2].

Review comment:
For a query like `SELECT COUNT(DISTINCT a) FILTER (WHERE c > 0), SUM(b) FILTER (WHERE d = 0) FROM table`, the plan will be
```
AGG-4 (count distinct)
  Shuffle to a single reducer
    Partial-AGG-3 (count distinct, no grouping, apply function COUNT on a with c > 0)
      Partial-AGG-2 (grouping on a and c)
        Shuffle by a
          Partial-AGG-1 (grouping on a and c, apply function SUM on b with d = 0)
```
[GitHub] [spark] beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#discussion_r353571892

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
## @@ -135,19 +135,25 @@ object AggUtils {
     }
     val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
     val groupingAttributes = groupingExpressions.map(_.toAttribute)
+    val filterWithDistinctAttributes = functionsWithDistinct.flatMap(_.filterAttributes.toSeq)

     // 1. Create an Aggregate Operator for partial aggregations.
     val partialAggregate: SparkPlan = {
       val aggregateExpressions = functionsWithoutDistinct.map(_.copy(mode = Partial))
       val aggregateAttributes = aggregateExpressions.map(_.resultAttribute)
       // We will group by the original grouping expression, plus an additional expression for the
-      // DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, the grouping
-      // expressions will be [key, value].
+      // DISTINCT column and the referred attributes in the FILTER clause associated with each
+      // aggregate function. For example:
+      // 1.for the AVG (DISTINCT value) GROUP BY key, the grouping expression will be [key, value];
+      // 2.for AVG (DISTINCT value) Filter (WHERE value2> 20) GROUP BY key, the grouping expression
+      // will be [key, value, value2].

Review comment:
For a query like SELECT COUNT(DISTINCT a) FILTER (WHERE c > 0), SUM(b) FILTER (WHERE d = 0) FROM table, the plan will be
```
AGG-4 (count distinct)
  Shuffle to a single reducer
    Partial-AGG-3 (count distinct, no grouping, apply function COUNT on a with c > 0)
      Partial-AGG-2 (grouping on a and c)
        Shuffle by a
          Partial-AGG-1 (grouping on a and c, apply function SUM on b with d = 0)
```
[GitHub] [spark] LantaoJin commented on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table
LantaoJin commented on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table URL: https://github.com/apache/spark/pull/26754#issuecomment-561502217 cc @cloud-fan @dongjoon-hyun @HyukjinKwon
[GitHub] [spark] AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#issuecomment-561499842 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114820/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting
AmplabJenkins commented on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting URL: https://github.com/apache/spark/pull/26739#issuecomment-56155 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19654/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.
AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery. URL: https://github.com/apache/spark/pull/26485#issuecomment-561499833 Merged build finished. Test PASSed.