date:20191203

[GitHub] [spark] cloud-fan commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-12-03 Thread GitBox

cloud-fan commented on a change in pull request #25001: [SPARK-28083][SQL] 
Support LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#discussion_r353591940
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ##
 @@ -104,19 +103,24 @@ abstract class StringRegexExpression extends BinaryExpression
   spark.sql.parser.escapedStringLiterals   false
   > SELECT '%SystemDrive%\\Users\\John' _FUNC_ '\%SystemDrive\%Users%';
   true
+  > SELECT '%SystemDrive%/Users/John' _FUNC_ '/%SystemDrive/%//Users%' ESCAPE '/';
+  true
   """,
   note = """
 Use RLIKE to match with standard regular expressions.
   """,
   since = "1.0.0")
 // scalastyle:on line.contains.tab
-case class Like(left: Expression, right: Expression) extends StringRegexExpression {
+case class Like(left: Expression, right: Expression, escapeCharOpt: Option[Char] = None)
 
 Review comment:
   `None` indicates that `ESCAPE` is not specified, so we can omit it in 
`toString`. The existing code seems better.
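
To make the semantics under discussion concrete, here is a minimal Python sketch of a `LIKE ... ESCAPE` matcher with an optional escape character. It is not Spark's implementation. The helper names (`like_to_regex`, `like`, `like_to_string`) are hypothetical, and two behaviours are assumptions of this sketch: an unspecified escape falls back to backslash, and an escape character makes whatever character follows it literal. The string form omits the `ESCAPE` clause when none was given, mirroring the `None` case in the comment above.

```python
import re
from typing import Optional

DEFAULT_ESCAPE = "\\"  # assumption: backslash is used when ESCAPE is not specified


def like_to_regex(pattern: str, escape: Optional[str] = None) -> str:
    """Translate a SQL LIKE pattern into a Python regex string."""
    esc = DEFAULT_ESCAPE if escape is None else escape
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == esc and i + 1 < len(pattern):
            # Assumption: the character after the escape is matched literally.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            out.append(".*")  # % matches any sequence of characters
        elif ch == "_":
            out.append(".")   # _ matches exactly one character
        else:
            # Ordinary character (or a trailing escape with nothing after it).
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def like(value: str, pattern: str, escape: Optional[str] = None) -> bool:
    """Return True when the whole value matches the LIKE pattern."""
    regex = like_to_regex(pattern, escape)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def like_to_string(left: str, right: str, escape: Optional[str] = None) -> str:
    """Render the predicate, leaving out ESCAPE when it was not specified."""
    suffix = "" if escape is None else f" ESCAPE '{escape}'"
    return f"{left} LIKE {right}{suffix}"
```

With these helpers, the example added to the docstring in the diff, `like('%SystemDrive%/Users/John', '/%SystemDrive/%//Users%', '/')`, matches, and `like_to_string("a", "'b%'")` renders without an `ESCAPE` clause.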


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-12-03 Thread GitBox

cloud-fan commented on a change in pull request #25001: [SPARK-28083][SQL] 
Support LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#discussion_r353591631
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ##
 @@ -104,19 +103,24 @@ abstract class StringRegexExpression extends BinaryExpression
   spark.sql.parser.escapedStringLiterals   false
   > SELECT '%SystemDrive%\\Users\\John' _FUNC_ '\%SystemDrive\%Users%';
   true
+  > SELECT '%SystemDrive%/Users/John' _FUNC_ '/%SystemDrive/%//Users%' ESCAPE '/';
+  true
   """,
   note = """
 Use RLIKE to match with standard regular expressions.
   """,
   since = "1.0.0")
 // scalastyle:on line.contains.tab
-case class Like(left: Expression, right: Expression) extends StringRegexExpression {
+case class Like(left: Expression, right: Expression, escapeCharOpt: Option[Char] = None)
 
 Review comment:
   Yes, this sounds better; it makes the code simpler.



[GitHub] [spark] cloud-fan commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-12-03 Thread GitBox

cloud-fan commented on a change in pull request #25001: [SPARK-28083][SQL] 
Support LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#discussion_r353590982
 
 

 ##
 File path: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
 ##
 @@ -1202,6 +1203,7 @@ nonReserved
 | DROP
 | ELSE
 | END
+| ESCAPE
 
 Review comment:
   Ah, sorry, I misread the document. So we expect `ESCAPE` to be 
reserved under ANSI mode. This makes sense; let's change it back.



[GitHub] [spark] Fokko commented on a change in pull request #24405: [SPARK-27506][SQL] Allow deserialization of Avro data using compatible schemas

2019-12-03 Thread GitBox

Fokko commented on a change in pull request #24405: [SPARK-27506][SQL] Allow 
deserialization of Avro data using compatible schemas
URL: https://github.com/apache/spark/pull/24405#discussion_r353590959
 
 

 ##
 File path: docs/sql-data-sources-avro.md
 ##
 @@ -240,6 +240,14 @@ Data source options of Avro can be set via:
 
 function from_avro
   
+  
+writerSchema
 
 Review comment:
   I would stick to `writerSchema`, mostly because this is also the term used 
in Avro itself: 
https://avro.apache.org/docs/1.9.1/api/java/org/apache/avro/hadoop/io/AvroValueDeserializer.html



[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and 
Timestamp type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561520322
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19662/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and 
Timestamp type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561520318
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp 
type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561520322
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19662/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp 
type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561520318
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26739: [SPARK-29967][ML][PYTHON] KMeans 
support instance weighting
URL: https://github.com/apache/spark/pull/26739#issuecomment-561519890
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114831/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26739: [SPARK-29967][ML][PYTHON] 
KMeans support instance weighting
URL: https://github.com/apache/spark/pull/26739#issuecomment-561519890
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114831/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26739: [SPARK-29967][ML][PYTHON] KMeans 
support instance weighting
URL: https://github.com/apache/spark/pull/26739#issuecomment-561519884
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] cloud-fan commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

cloud-fan commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch 
issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#issuecomment-561520060
 
 
   LGTM. Can we check the behavior in other databases such as PostgreSQL? It 
would be good to know whether Spark follows the SQL standard here.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26739: [SPARK-29967][ML][PYTHON] 
KMeans support instance weighting
URL: https://github.com/apache/spark/pull/26739#issuecomment-561519884
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] dlindelof commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF

2019-12-03 Thread GitBox

dlindelof commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without 
Arrow) gets wrong dtypes when applied on empty DF
URL: https://github.com/apache/spark/pull/26747#issuecomment-561519797
 
 
   @srowen This illustrates the current behaviour, where an empty Spark 
DataFrame with a column of type `LongType` becomes a pandas DataFrame with a 
column of dtype `object` (the dtype pandas also uses for strings):
   
   ```
   In [62]: foo = spark.sql("SELECT CAST(1 AS LONG) AS bar WHERE 1 = 0")
   
   In [63]: foo
   Out[63]: DataFrame[bar: bigint]
   
   In [64]: foo.toPandas().dtypes
   Out[64]:
   bar    object
   dtype: object
   ```
   
   When the dataframe is not empty, this is what you see:
   
   ```
   In [65]: foo = spark.sql("SELECT CAST(1 AS LONG) AS bar WHERE 1 = 1")
   
   In [66]: foo.toPandas().dtypes
   Out[66]:
   bar    int64
   dtype: object
   ```



[GitHub] [spark] SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type 
+/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561519865
 
 
   **[Test build #114839 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114839/testReport)**
 for PR 26412 at commit 
[`571225b`](https://github.com/apache/spark/commit/571225b68957fff781c68171b3c6c52cdfdc56cf).



[GitHub] [spark] SparkQA commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF

2019-12-03 Thread GitBox

SparkQA commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without 
Arrow) gets wrong dtypes when applied on empty DF
URL: https://github.com/apache/spark/pull/26747#issuecomment-561519854
 
 
   **[Test build #114838 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114838/testReport)**
 for PR 26747 at commit 
[`f25827c`](https://github.com/apache/spark/commit/f25827ced6728ef033434df8ff39687de5690745).



[GitHub] [spark] SparkQA removed a comment on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting

2019-12-03 Thread GitBox

SparkQA removed a comment on issue #26739: [SPARK-29967][ML][PYTHON] KMeans 
support instance weighting
URL: https://github.com/apache/spark/pull/26739#issuecomment-561499537
 
 
   **[Test build #114831 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114831/testReport)**
 for PR 26739 at commit 
[`f55917d`](https://github.com/apache/spark/commit/f55917d4211f76d68619dc1ff0b1b82dc5a6aa20).



[GitHub] [spark] SparkQA commented on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting

2019-12-03 Thread GitBox

SparkQA commented on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support 
instance weighting
URL: https://github.com/apache/spark/pull/26739#issuecomment-561519522
 
 
   **[Test build #114831 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114831/testReport)**
 for PR 26739 at commit 
[`f55917d`](https://github.com/apache/spark/commit/f55917d4211f76d68619dc1ff0b1b82dc5a6aa20).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26485: [SPARK-29860][SQL] Fix 
dataType mismatch issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#issuecomment-561518455
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114835/
   Test FAILed.



[GitHub] [spark] yaooqinn commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

yaooqinn commented on a change in pull request #26412: [SPARK-29774][SQL] Date 
and Timestamp type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#discussion_r353588331
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ##
 @@ -185,16 +186,17 @@ case class DateAdd(startDate: Expression, days: Expression)
   """,
   since = "1.5.0")
 case class DateSub(startDate: Expression, days: Expression)
-  extends BinaryExpression with ImplicitCastInputTypes {
+  extends BinaryExpression with ExpectsInputTypes {
   override def left: Expression = startDate
   override def right: Expression = days
 
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType, IntegerType)
+  override def inputTypes: Seq[AbstractDataType] =
+    Seq(DateType, TypeCollection(IntegerType, ShortType, ByteType))
 
   override def dataType: DataType = DateType
 
   override def nullSafeEval(start: Any, d: Any): Any = {
-    start.asInstanceOf[Int] - d.asInstanceOf[Int]
+    start.asInstanceOf[Int] - d.asInstanceOf[Number].intValue()
 
 Review comment:
   done
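
For readers unfamiliar with the expression in the diff above, the following Python sketch illustrates the arithmetic it performs. It assumes, as Spark's `DateType` does, that a date is held as an integer count of days since 1970-01-01. The helper names `date_sub` and `to_date` are hypothetical, and the explicit `None` check stands in for the null handling that Spark does before `nullSafeEval` is called.

```python
from datetime import date, timedelta
from typing import Optional

EPOCH = date(1970, 1, 1)


def date_sub(start_days: Optional[int], days: Optional[int]) -> Optional[int]:
    """Subtract a day count from a date stored as days since the epoch.

    A null (None) on either side yields null, which is the behaviour the
    pull request title asks for.
    """
    if start_days is None or days is None:
        return None
    # int() plays the role of Number.intValue() in the diff: any integral
    # input width (byte, short, int) is reduced to a plain integer.
    return start_days - int(days)


def to_date(days_since_epoch: int) -> date:
    """Convert a days-since-epoch integer back into a calendar date."""
    return EPOCH + timedelta(days=days_since_epoch)
```

For example, `to_date(date_sub((date(2019, 12, 3) - EPOCH).days, 2))` gives `date(2019, 12, 1)`, while `date_sub(None, 2)` and `date_sub(10, None)` both return `None`.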



[GitHub] [spark] deshanxiao commented on issue #26744: [SPARK-30106][SQL][TEST] Fix the test of DynamicPartitionPruningSuite

2019-12-03 Thread GitBox

deshanxiao commented on issue #26744: [SPARK-30106][SQL][TEST] Fix the test of 
DynamicPartitionPruningSuite
URL: https://github.com/apache/spark/pull/26744#issuecomment-561518774
 
 
   Thank you @dongjoon-hyun @cloud-fan .



[GitHub] [spark] AmplabJenkins removed a comment on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26485: [SPARK-29860][SQL] Fix 
dataType mismatch issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#issuecomment-561518445
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA removed a comment on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

SparkQA removed a comment on issue #26485: [SPARK-29860][SQL] Fix dataType 
mismatch issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#issuecomment-561508568
 
 
   **[Test build #114835 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114835/testReport)**
 for PR 26485 at commit 
[`a89b3b4`](https://github.com/apache/spark/commit/a89b3b4a3bee322b5ffc24dc7f37c8c6daf96283).



[GitHub] [spark] AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType 
mismatch issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#issuecomment-561518445
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType 
mismatch issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#issuecomment-561518455
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114835/
   Test FAILed.



[GitHub] [spark] SparkQA commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

SparkQA commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch 
issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#issuecomment-561518402
 
 
   **[Test build #114835 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114835/testReport)**
 for PR 26485 at commit 
[`a89b3b4`](https://github.com/apache/spark/commit/a89b3b4a3bee322b5ffc24dc7f37c8c6daf96283).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26750: [SPARK-28948][SQL] Support 
passing all Table metadata in TableProvider
URL: https://github.com/apache/spark/pull/26750#issuecomment-561518074
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19659/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26736: [SPARK-30098][SQL] Use default 
datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#issuecomment-561518093
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19661/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26747: [SPARK-29188][PYTHON] toPandas 
(without Arrow) gets wrong dtypes when applied on empty DF
URL: https://github.com/apache/spark/pull/26747#issuecomment-561518042
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26750: [SPARK-28948][SQL] Support 
passing all Table metadata in TableProvider
URL: https://github.com/apache/spark/pull/26750#issuecomment-561518059
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26736: [SPARK-30098][SQL] Use default 
datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#issuecomment-561518084
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26747: [SPARK-29188][PYTHON] toPandas 
(without Arrow) gets wrong dtypes when applied on empty DF
URL: https://github.com/apache/spark/pull/26747#issuecomment-561518049
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19660/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26750: [SPARK-28948][SQL] Support passing all 
Table metadata in TableProvider
URL: https://github.com/apache/spark/pull/26750#issuecomment-561518074
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19659/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26736: [SPARK-30098][SQL] Use default 
datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#issuecomment-561518093
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19661/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26747: [SPARK-29188][PYTHON] toPandas 
(without Arrow) gets wrong dtypes when applied on empty DF
URL: https://github.com/apache/spark/pull/26747#issuecomment-561518049
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19660/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26747: [SPARK-29188][PYTHON] toPandas 
(without Arrow) gets wrong dtypes when applied on empty DF
URL: https://github.com/apache/spark/pull/26747#issuecomment-561518042
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26736: [SPARK-30098][SQL] Use default 
datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#issuecomment-561518084
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26750: [SPARK-28948][SQL] Support passing all 
Table metadata in TableProvider
URL: https://github.com/apache/spark/pull/26750#issuecomment-561518059
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] dlindelof commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF

2019-12-03 Thread GitBox

dlindelof commented on issue #26747: [SPARK-29188][PYTHON] toPandas (without 
Arrow) gets wrong dtypes when applied on empty DF
URL: https://github.com/apache/spark/pull/26747#issuecomment-561518046
 
 
   @HyukjinKwon I've reverted to an if-else chain instead of a dict. Is 
there anything else you think I should change?



[GitHub] [spark] SparkQA commented on issue #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-03 Thread GitBox

SparkQA commented on issue #26736: [SPARK-30098][SQL] Use default datasource as 
provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#issuecomment-561517640
 
 
   **[Test build #114836 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114836/testReport)**
 for PR 26736 at commit 
[`248d2e7`](https://github.com/apache/spark/commit/248d2e74a14fb6170883fdbfbaf67b925205f792).



[GitHub] [spark] SparkQA commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider

2019-12-03 Thread GitBox

SparkQA commented on issue #26750: [SPARK-28948][SQL] Support passing all Table 
metadata in TableProvider
URL: https://github.com/apache/spark/pull/26750#issuecomment-561517649
 
 
   **[Test build #114837 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114837/testReport)**
 for PR 26750 at commit 
[`d50facf`](https://github.com/apache/spark/commit/d50facf000401b282d101c350ad571d762d6d729).



[GitHub] [spark] AmplabJenkins removed a comment on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26754: [SPARK-30115][SQL] Improve 
limit only query on datasource table
URL: https://github.com/apache/spark/pull/26754#issuecomment-561516682
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] cloud-fan commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider

2019-12-03 Thread GitBox

cloud-fan commented on issue #26750: [SPARK-28948][SQL] Support passing all 
Table metadata in TableProvider
URL: https://github.com/apache/spark/pull/26750#issuecomment-561516747
 
 
   This is preferred over https://github.com/apache/spark/pull/26297, because
   1. This follows the existing API style, so the diff is much smaller.
   2. It's hard to decouple schema and partition inference. For example, the file 
source needs to infer partitioning before reporting its schema, as partition 
columns are part of the table schema.
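
   To illustrate the second point, here is a minimal sketch in plain Scala. It is not Spark's file source code: `PartitionSchemaSketch`, its helpers, and the example paths are hypothetical, and it assumes Hive-style `name=value` partition directories. The table schema cannot be reported until the partition columns have been discovered from the file paths:

```scala
object PartitionSchemaSketch {
  // Hypothetical helper: collect the column names of `name=value` path segments.
  def partitionColumns(path: String): Seq[String] =
    path.split('/').toSeq.filter(_.contains("=")).map(_.takeWhile(_ != '='))

  // The reported table schema is the data columns followed by any partition
  // columns found in the paths, so partition inference has to run first.
  def tableSchema(dataColumns: Seq[String], paths: Seq[String]): Seq[String] = {
    val partCols = paths.flatMap(partitionColumns).distinct
    dataColumns ++ partCols.filterNot(dataColumns.contains)
  }
}
```

   For example, `PartitionSchemaSketch.tableSchema(Seq("id", "name"), Seq("/t/year=2019/month=12/f.parquet"))` yields the data columns followed by `year` and `month`.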



[GitHub] [spark] AmplabJenkins removed a comment on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26754: [SPARK-30115][SQL] Improve 
limit only query on datasource table
URL: https://github.com/apache/spark/pull/26754#issuecomment-561516686
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114822/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26754: [SPARK-30115][SQL] Improve limit only 
query on datasource table
URL: https://github.com/apache/spark/pull/26754#issuecomment-561516686
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114822/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26754: [SPARK-30115][SQL] Improve limit only 
query on datasource table
URL: https://github.com/apache/spark/pull/26754#issuecomment-561516682
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table

2019-12-03 Thread GitBox

SparkQA removed a comment on issue #26754: [SPARK-30115][SQL] Improve limit 
only query on datasource table
URL: https://github.com/apache/spark/pull/26754#issuecomment-561459485
 
 
   **[Test build #114822 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114822/testReport)**
 for PR 26754 at commit 
[`8d74a2c`](https://github.com/apache/spark/commit/8d74a2c62515eee67408a6a79dd779591df2e036).



[GitHub] [spark] SparkQA commented on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table

2019-12-03 Thread GitBox

SparkQA commented on issue #26754: [SPARK-30115][SQL] Improve limit only query 
on datasource table
URL: https://github.com/apache/spark/pull/26754#issuecomment-561516125
 
 
   **[Test build #114822 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114822/testReport)**
 for PR 26754 at commit 
[`8d74a2c`](https://github.com/apache/spark/commit/8d74a2c62515eee67408a6a79dd779591df2e036).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and 
Timestamp type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561513987
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114825/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and 
Timestamp type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561513979
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp 
type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561513979
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp 
type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561513987
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114825/
   Test FAILed.



[GitHub] [spark] beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression

2019-12-03 Thread GitBox

beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] 
Support ANSI SQL filter clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#discussion_r353583379
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
 ##
 @@ -135,19 +135,25 @@ object AggUtils {
 }
 val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
 val groupingAttributes = groupingExpressions.map(_.toAttribute)
+val filterWithDistinctAttributes = 
functionsWithDistinct.flatMap(_.filterAttributes.toSeq)
 
 // 1. Create an Aggregate Operator for partial aggregations.
 val partialAggregate: SparkPlan = {
   val aggregateExpressions = functionsWithoutDistinct.map(_.copy(mode = 
Partial))
   val aggregateAttributes = aggregateExpressions.map(_.resultAttribute)
   // We will group by the original grouping expression, plus an additional 
expression for the
-  // DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, 
the grouping
-  // expressions will be [key, value].
+  // DISTINCT column and the referred attributes in the FILTER clause 
associated with each
+  // aggregate function. For example:
+  // 1.for the AVG (DISTINCT value) GROUP BY key, the grouping expression 
will be [key, value];
+  // 2.for AVG (DISTINCT value) Filter (WHERE value2> 20) GROUP BY key, 
the grouping expression
+  // will be [key, value, value2].
 
 Review comment:
   Oh, I will try this. Thanks, Wenchen.



[GitHub] [spark] SparkQA removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

SparkQA removed a comment on issue #26412: [SPARK-29774][SQL] Date and 
Timestamp type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561465729
 
 
   **[Test build #114825 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114825/testReport)**
 for PR 26412 at commit 
[`4af7edb`](https://github.com/apache/spark/commit/4af7edb6476aea554c31ce9c54f8d2a23a9adf13).



[GitHub] [spark] SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type 
+/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561513617
 
 
   **[Test build #114825 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114825/testReport)**
 for PR 26412 at commit 
[`4af7edb`](https://github.com/apache/spark/commit/4af7edb6476aea554c31ce9c54f8d2a23a9adf13).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] cloud-fan commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

cloud-fan commented on a change in pull request #26412: [SPARK-29774][SQL] Date 
and Timestamp type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#discussion_r353581578
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ##
 @@ -185,16 +186,17 @@ case class DateAdd(startDate: Expression, days: 
Expression)
   """,
   since = "1.5.0")
 case class DateSub(startDate: Expression, days: Expression)
-  extends BinaryExpression with ImplicitCastInputTypes {
+  extends BinaryExpression with ExpectsInputTypes {
   override def left: Expression = startDate
   override def right: Expression = days
 
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType, IntegerType)
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(DateType, TypeCollection(IntegerType, ShortType, ByteType))
 
   override def dataType: DataType = DateType
 
   override def nullSafeEval(start: Any, d: Any): Any = {
-start.asInstanceOf[Int] - d.asInstanceOf[Int]
+start.asInstanceOf[Int] - d.asInstanceOf[Number].intValue()
 
 Review comment:
   Can we add some unit tests in `DateExpressionsSuite` to make sure byte/short works?
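
   As a rough standalone illustration of what such a test would guard against (this is not the `DateExpressionsSuite` code; `DateSubSketch`, `subOld`, and `subNew` are hypothetical names), the sketch below contrasts the two casts from the diff. A boxed `Byte` or `Short` is a `java.lang.Number`, so `intValue()` works, whereas `asInstanceOf[Int]` on a boxed `Byte` throws a `ClassCastException`:

```scala
import scala.util.Try

object DateSubSketch {
  // Old form from the diff: only works when `d` is a boxed Int.
  def subOld(start: Any, d: Any): Int =
    start.asInstanceOf[Int] - d.asInstanceOf[Int]

  // New form from the diff: any boxed Byte, Short, or Int is a java.lang.Number.
  def subNew(start: Any, d: Any): Int =
    start.asInstanceOf[Int] - d.asInstanceOf[Number].intValue()

  def main(args: Array[String]): Unit = {
    val start = 18000 // a date expressed as days since the epoch
    println(subNew(start, 3.toByte))
    println(subNew(start, 3.toShort))
    println(Try(subOld(start, 3.toByte)).isFailure)
  }
}
```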



[GitHub] [spark] AmplabJenkins removed a comment on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26485: [SPARK-29860][SQL] Fix 
dataType mismatch issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#issuecomment-561508982
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] cloud-fan commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression

2019-12-03 Thread GitBox

cloud-fan commented on a change in pull request #26656: [SPARK-27986][SQL] 
Support ANSI SQL filter clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#discussion_r353578469
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
 ##
 @@ -135,19 +135,25 @@ object AggUtils {
 }
 val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
 val groupingAttributes = groupingExpressions.map(_.toAttribute)
+val filterWithDistinctAttributes = 
functionsWithDistinct.flatMap(_.filterAttributes.toSeq)
 
 // 1. Create an Aggregate Operator for partial aggregations.
 val partialAggregate: SparkPlan = {
   val aggregateExpressions = functionsWithoutDistinct.map(_.copy(mode = 
Partial))
   val aggregateAttributes = aggregateExpressions.map(_.resultAttribute)
   // We will group by the original grouping expression, plus an additional 
expression for the
-  // DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, 
the grouping
-  // expressions will be [key, value].
+  // DISTINCT column and the referred attributes in the FILTER clause 
associated with each
+  // aggregate function. For example:
+  // 1.for the AVG (DISTINCT value) GROUP BY key, the grouping expression 
will be [key, value];
+  // 2.for AVG (DISTINCT value) Filter (WHERE value2> 20) GROUP BY key, 
the grouping expression
+  // will be [key, value, value2].
 
 Review comment:
   Outputting value2 doesn't mean we have to group by value2. We can update the 
`resultExpressions` to include value2.
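
   For reference, the intended result of the second example in the code comment can be sketched with plain Scala collections. This is not how `AggUtils` plans the query; `SampleRow` and `avgDistinctFiltered` are hypothetical names. In this sketch `value2` is read only by the filter predicate and is never a key of the result:

```scala
final case class SampleRow(key: String, value: Int, value2: Int)

object FilteredDistinctAvgSketch {
  // Semantics of AVG(DISTINCT value) FILTER (WHERE value2 > 20) GROUP BY key:
  // keep rows passing the filter, group by key, de-duplicate value, then average.
  // Note: keys whose rows are all filtered out are simply absent here, whereas
  // SQL would still return the group with a NULL average.
  def avgDistinctFiltered(rows: Seq[SampleRow]): Map[String, Double] =
    rows
      .filter(_.value2 > 20)
      .groupBy(_.key)
      .map { case (k, rs) =>
        val distinctValues = rs.map(_.value).distinct
        k -> distinctValues.sum.toDouble / distinctValues.size
      }
}
```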



[GitHub] [spark] AmplabJenkins removed a comment on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26485: [SPARK-29860][SQL] Fix 
dataType mismatch issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#issuecomment-561508990
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19658/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streamin…

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26756: [SPARK-30119][WebUI]Support 
Pagination for Completed Batch Table in Streamin…
URL: https://github.com/apache/spark/pull/26756#issuecomment-561508446
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType 
mismatch issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#issuecomment-561508982
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType 
mismatch issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#issuecomment-561508990
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19658/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streamin…

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26756: [SPARK-30119][WebUI]Support Pagination 
for Completed Batch Table in Streamin…
URL: https://github.com/apache/spark/pull/26756#issuecomment-561508873
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] turboFei commented on a change in pull request #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

turboFei commented on a change in pull request #26485: [SPARK-29860][SQL] Fix 
dataType mismatch issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#discussion_r353577964
 
 

 ##
 File path: 
sql/core/src/test/resources/sql-tests/results/subquery/negative-cases/subq-input-typecheck.sql.out
 ##
 @@ -135,12 +135,12 @@ WHERE
 struct<>
 -- !query 9 output
 org.apache.spark.sql.AnalysisException
-cannot resolve '(named_struct('t4a', t4.`t4a`, 't4b', t4.`t4b`, 't4c', 
t4.`t4c`) IN (listquery()))' due to data type mismatch: 
+cannot resolve '(named_struct('t4a', t4.`t4a`, 't4b', t4.`t4b`, 't4c', 
t4.`t4c`) IN (listquery()))' due to data type mismatch:
 
 Review comment:
   In fact, I do not know why the space is there; I tried to remove it 
but failed. It should not matter.



[GitHub] [spark] AmplabJenkins commented on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streamin…

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26756: [SPARK-30119][WebUI]Support Pagination 
for Completed Batch Table in Streamin…
URL: https://github.com/apache/spark/pull/26756#issuecomment-561508446
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] SparkQA commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

SparkQA commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch 
issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#issuecomment-561508568
 
 
   **[Test build #114835 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114835/testReport)**
 for PR 26485 at commit 
[`a89b3b4`](https://github.com/apache/spark/commit/a89b3b4a3bee322b5ffc24dc7f37c8c6daf96283).



[GitHub] [spark] turboFei commented on a change in pull request #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

turboFei commented on a change in pull request #26485: [SPARK-29860][SQL] Fix 
dataType mismatch issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#discussion_r353577512
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ##
 @@ -472,9 +472,12 @@ object TypeCoercion {
 // RHS is the subquery output.
 val rhs = sub.output
 
-val commonTypes = lhs.zip(rhs).flatMap { case (l, r) =>
-  findCommonTypeForBinaryComparison(l.dataType, r.dataType, conf)
-.orElse(findTightestCommonType(l.dataType, r.dataType))
+val commonTypes = lhs.zip(rhs).flatMap {
+  case (l, r) if !l.dataType.isInstanceOf[DecimalType] &&
 
 Review comment:
   Thanks for your suggestion.



[GitHub] [spark] turboFei commented on a change in pull request #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

turboFei commented on a change in pull request #26485: [SPARK-29860][SQL] Fix 
dataType mismatch issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#discussion_r353577420
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ##
 @@ -472,9 +472,12 @@ object TypeCoercion {
 // RHS is the subquery output.
 val rhs = sub.output
 
-val commonTypes = lhs.zip(rhs).flatMap { case (l, r) =>
-  findCommonTypeForBinaryComparison(l.dataType, r.dataType, conf)
-.orElse(findTightestCommonType(l.dataType, r.dataType))
+val commonTypes = lhs.zip(rhs).flatMap {
+  case (l, r) if !l.dataType.isInstanceOf[DecimalType] &&
+!r.dataType.isInstanceOf[DecimalType] =>
+findCommonTypeForBinaryComparison(l.dataType, r.dataType, conf)
+  .orElse(findTightestCommonType(l.dataType, r.dataType))
+  case (l, r) => findWiderTypeForDecimal(l.dataType, r.dataType)
 
 Review comment:
   Thanks for your suggestion.
   To unify the logic of `in` and `inSubquery` mentioned by cloud-fan, I simply 
call `findWiderTypeForTwo` here.
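
The quoted diff routes decimal operands to a dedicated widening step. The following is a minimal sketch of how two decimal types might be widened so that neither side loses integer or fractional digits. `Dec`, `wider`, and `MaxPrecision` are hypothetical names for illustration, and the rule shown is an assumption about the usual approach, not a copy of Spark's code.

```scala
object WiderDecimalSketch {
  // Hypothetical stand-in for a decimal type: total digits and fractional digits.
  final case class Dec(precision: Int, scale: Int)

  // Assumed upper bound on precision for this sketch.
  val MaxPrecision = 38

  // Keep the larger scale and the larger number of integer digits,
  // then cap the result at the maximum precision.
  def wider(a: Dec, b: Dec): Dec = {
    val scale = math.max(a.scale, b.scale)
    val intDigits = math.max(a.precision - a.scale, b.precision - b.scale)
    Dec(math.min(intDigits + scale, MaxPrecision), math.min(scale, MaxPrecision))
  }
}
```

For example, widening `Dec(10, 2)` and `Dec(8, 5)` keeps 8 integer digits and 5 fractional digits, giving `Dec(13, 5)`.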



[GitHub] [spark] turboFei commented on a change in pull request #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

turboFei commented on a change in pull request #26485: [SPARK-29860][SQL] Fix 
dataType mismatch issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#discussion_r353577120
 
 

 ##
 File path: 
sql/core/src/test/resources/sql-tests/results/subquery/negative-cases/subq-input-typecheck.sql.out
 ##
 @@ -132,15 +132,6 @@ WHERE
t5c
 FROM t5)
 -- !query 9 schema
-struct<>
+struct
 -- !query 9 output
-org.apache.spark.sql.AnalysisException
-cannot resolve '(named_struct('t4a', t4.`t4a`, 't4b', t4.`t4b`, 't4c', 
t4.`t4c`) IN (listquery()))' due to data type mismatch: 
-The data type of one or more elements in the left hand side of an IN subquery
-is not compatible with the data type of the output of the subquery
 
 Review comment:
   Thanks, I have modified them.



[GitHub] [spark] iRakson opened a new pull request #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streamin…

2019-12-03 Thread GitBox

iRakson opened a new pull request #26756: [SPARK-30119][WebUI]Support 
Pagination for Completed Batch Table in Streamin…
URL: https://github.com/apache/spark/pull/26756
 
 
   …g Tab
   
   
   
   ### What changes were proposed in this pull request?
   Add pagination support for the completed batch table in the streaming tab.
   
   
   
   ### Why are the changes needed?
   If a streaming job runs for a long time and the number of batches is huge, 
an out-of-memory error may occur while loading the streaming page. Pagination 
avoids this problem and also improves the loading time of the page. In 
addition, the jobs, stages, SQL, and thrift-server pages already use 
pagination, so this change also brings consistency.
   
   
   
   ### Does this PR introduce any user-facing change?
   Yes
   
   
   
   ### How was this patch tested?
   Manually. Will attach screenshots later.
   
   



[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support 
LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-561506710
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19657/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... 
ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-561506710
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19657/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... 
ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-561506708
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support 
LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-561506708
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-12-03 Thread GitBox

SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE 
syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-561506326
 
 
   **[Test build #114834 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114834/testReport)**
 for PR 25001 at commit 
[`a891139`](https://github.com/apache/spark/commit/a8911392aa52b883edded53c1765d52648d5adfa).



[GitHub] [spark] beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-12-03 Thread GitBox

beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] 
Support LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#discussion_r353575369
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ##
 @@ -104,19 +103,24 @@ abstract class StringRegexExpression extends 
BinaryExpression
   spark.sql.parser.escapedStringLiterals   false
   > SELECT '%SystemDrive%\\Users\\John' _FUNC_ '\%SystemDrive\%Users%';
   true
+  > SELECT '%SystemDrive%/Users/John' _FUNC_ '/%SystemDrive/%//Users%' 
ESCAPE '/';
+  true
   """,
   note = """
 Use RLIKE to match with standard regular expressions.
   """,
   since = "1.0.0")
 // scalastyle:on line.contains.tab
-case class Like(left: Expression, right: Expression) extends 
StringRegexExpression {
+case class Like(left: Expression, right: Expression, escapeCharOpt: 
Option[Char] = None)
 
 Review comment:
   It's OK too.
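
The example in the quoted diff can be read as follows: with `ESCAPE '/'`, the sequences `/%` and `//` match a literal `%` and `/`, while an unescaped `%` still matches any run of characters. Below is a minimal sketch of that translation into a Java regex. `LikeEscapeSketch`, `likeToRegex`, and `like` are hypothetical names, and this is an illustration under those assumptions, not the implementation in the pull request.

```scala
import java.util.regex.Pattern

object LikeEscapeSketch {
  // Hypothetical helper: translate a LIKE pattern into a Java regex,
  // treating `escapeChar` as the escape character.
  def likeToRegex(pattern: String, escapeChar: Char): String = {
    val out = new StringBuilder
    var i = 0
    while (i < pattern.length) {
      val c = pattern.charAt(i)
      if (c == escapeChar) {
        // Only '_', '%', or the escape character itself may be escaped.
        require(i + 1 < pattern.length, "pattern must not end with the escape character")
        val next = pattern.charAt(i + 1)
        require(next == '_' || next == '%' || next == escapeChar, s"cannot escape '$next'")
        out.append(Pattern.quote(next.toString))
        i += 2
      } else {
        c match {
          case '_' => out.append(".")    // exactly one character
          case '%' => out.append(".*")   // zero or more characters
          case other => out.append(Pattern.quote(other.toString))
        }
        i += 1
      }
    }
    // (?s) lets '.' match line terminators as well.
    "(?s)" + out.toString
  }

  def like(input: String, pattern: String, escapeChar: Char = '\\'): Boolean =
    Pattern.matches(likeToRegex(pattern, escapeChar), input)
}
```

Calling `LikeEscapeSketch.like("%SystemDrive%/Users/John", "/%SystemDrive/%//Users%", '/')` evaluates to `true`, which agrees with the result shown in the diff.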



[GitHub] [spark] beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-12-03 Thread GitBox

beliefer commented on a change in pull request #25001: [SPARK-28083][SQL]
Support LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#discussion_r353575345

##
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##
@@ -83,16 +83,15 @@ abstract class StringRegexExpression extends
BinaryExpression
% matches zero or more characters in the input (similar to .* in
posix regular
expressions)

- The escape character is '\'. If an escape character precedes a
special symbol or another
- escape character, the following character is matched literally. It
is invalid to escape
- any other character.
-
Since Spark 2.0, string literals are unescaped in our SQL parser.
For example, in order
to match "\abc", the pattern should be "\\abc".

When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled,
it fallbacks
to Spark 1.6 behavior regarding string literal parsing. For example,
if the config is
enabled, the pattern to match "\abc" should be "\abc".
+ * escape - a optional string added since Spark 3.0. The default escape
character is the '\'.

Review comment:
Thanks for the reminder.


[GitHub] [spark] beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-12-03 Thread GitBox

beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] 
Support LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#discussion_r353575285
 
 

 ##
 File path: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
 ##
 @@ -1202,6 +1203,7 @@ nonReserved
 | DROP
 | ELSE
 | END
+| ESCAPE
 
 Review comment:
   OK.



[GitHub] [spark] AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp 
type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561504421
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19655/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26080: [SPARK-29425][SQL] The 
ownership of a database should be respected
URL: https://github.com/apache/spark/pull/26080#issuecomment-561504471
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and 
Timestamp type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561504421
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19655/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26080: [SPARK-29425][SQL] The ownership of a 
database should be respected
URL: https://github.com/apache/spark/pull/26080#issuecomment-561504471
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26080: [SPARK-29425][SQL] The ownership of a 
database should be respected
URL: https://github.com/apache/spark/pull/26080#issuecomment-561504480
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19656/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and 
Timestamp type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561504415
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected

2019-12-03 Thread GitBox

AmplabJenkins removed a comment on issue #26080: [SPARK-29425][SQL] The 
ownership of a database should be respected
URL: https://github.com/apache/spark/pull/26080#issuecomment-561504480
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19656/
   Test PASSed.



[GitHub] [spark] zhengruifeng commented on a change in pull request #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting

2019-12-03 Thread GitBox

zhengruifeng commented on a change in pull request #26739: 
[SPARK-29967][ML][PYTHON] KMeans support instance weighting
URL: https://github.com/apache/spark/pull/26739#discussion_r353573872
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
 ##
 @@ -278,30 +287,32 @@ class KMeans private (
   val bcCenters = sc.broadcast(centers)
 
   // Find the new centers
-  val collected = data.mapPartitions { points =>
+  val collected = data.mapPartitions { pointsAndWeights =>
 val thisCenters = bcCenters.value
 val dims = thisCenters.head.vector.size
 
 val sums = Array.fill(thisCenters.length)(Vectors.zeros(dims))
-val counts = Array.fill(thisCenters.length)(0L)
 
-points.foreach { point =>
-  val (bestCenter, cost) = 
distanceMeasureInstance.findClosest(thisCenters, point)
+// clusterWeightSum is needed to calculate cluster center
+// cluster center =
+// sample1 * weight1/clusterWeightSum + sample2 * 
weight2/clusterWeightSum + ...
+val clusterWeightSum = Array.fill(thisCenters.length)(0.0)
+
+pointsAndWeights.foreach { case (point, weight) =>
+  var (bestCenter, cost) = 
distanceMeasureInstance.findClosest(thisCenters, point)
+  cost *= weight
   costAccum.add(cost)
-  distanceMeasureInstance.updateClusterSum(point, sums(bestCenter))
-  counts(bestCenter) += 1
+  distanceMeasureInstance.updateClusterSum(point, sums(bestCenter), 
weight)
+  clusterWeightSum(bestCenter) += weight
 }
 
-counts.indices.filter(counts(_) > 0).map(j => (j, (sums(j), 
counts(j)))).iterator
-  }.reduceByKey { case ((sum1, count1), (sum2, count2)) =>
+clusterWeightSum.indices.filter(clusterWeightSum(_) > 0)
+  .map(j => (j, (sums(j), clusterWeightSum(j)))).iterator
+  }.reduceByKey { case ((sum1, clusterWeightSum1), (sum2, 
clusterWeightSum2)) =>
 axpy(1.0, sum2, sum1)
-(sum1, count1 + count2)
+(sum1, clusterWeightSum1 + clusterWeightSum2)
   }.collectAsMap()
 
-  if (iteration == 0) {
-instr.foreach(_.logNumExamples(collected.values.map(_._2).sum))
-  }
-
 
 Review comment:
   I am OK with adding a new `instr.log` in another PR.
   Here I prefer to keep `instr.logNumExamples` logging the unweighted count, 
to keep it in sync with other algorithms.
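
The comment in the quoted diff defines the cluster center as `sample1 * weight1/clusterWeightSum + sample2 * weight2/clusterWeightSum + ...`. A minimal sketch of that formula on plain arrays follows. `WeightedCenterSketch` and `weightedCenter` are hypothetical names, and this leaves out the distance measure and the distributed `reduceByKey` step from the diff.

```scala
object WeightedCenterSketch {
  // Weighted mean of the points: sum(point_i * weight_i) / sum(weight_i).
  def weightedCenter(pointsAndWeights: Seq[(Array[Double], Double)]): Array[Double] = {
    require(pointsAndWeights.nonEmpty, "need at least one point")
    val dims = pointsAndWeights.head._1.length
    val sums = Array.fill(dims)(0.0)
    var clusterWeightSum = 0.0
    pointsAndWeights.foreach { case (point, weight) =>
      var d = 0
      while (d < dims) {
        sums(d) += point(d) * weight
        d += 1
      }
      clusterWeightSum += weight
    }
    require(clusterWeightSum > 0.0, "total weight must be positive")
    sums.map(_ / clusterWeightSum)
  }
}
```

With the points `(0, 0)` at weight 1 and `(4, 8)` at weight 3, the weighted sums are `(12, 24)` and the total weight is 4, so the center is `(3, 6)`. With all weights equal to 1 this reduces to the unweighted mean, where the weight sum equals the point count.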



[GitHub] [spark] AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp 
type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561504415
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-03 Thread GitBox

SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type 
+/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561504005
 
 
   **[Test build #114832 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114832/testReport)**
 for PR 26412 at commit 
[`ae70022`](https://github.com/apache/spark/commit/ae7002232a87bd5c20ff060c95443e1804e5c869).



[GitHub] [spark] SparkQA commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected

2019-12-03 Thread GitBox

SparkQA commented on issue #26080: [SPARK-29425][SQL] The ownership of a 
database should be respected
URL: https://github.com/apache/spark/pull/26080#issuecomment-561503993
 
 
   **[Test build #114833 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114833/testReport)**
 for PR 26080 at commit 
[`f11e7c8`](https://github.com/apache/spark/commit/f11e7c8e72b319327fea3a8db511b3fec3152384).



[GitHub] [spark] beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression

2019-12-03 Thread GitBox

beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] 
Support ANSI SQL filter clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#discussion_r353571892
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
 ##
 @@ -135,19 +135,25 @@ object AggUtils {
 }
 val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
 val groupingAttributes = groupingExpressions.map(_.toAttribute)
+val filterWithDistinctAttributes = 
functionsWithDistinct.flatMap(_.filterAttributes.toSeq)
 
 // 1. Create an Aggregate Operator for partial aggregations.
 val partialAggregate: SparkPlan = {
   val aggregateExpressions = functionsWithoutDistinct.map(_.copy(mode = 
Partial))
   val aggregateAttributes = aggregateExpressions.map(_.resultAttribute)
   // We will group by the original grouping expression, plus an additional 
expression for the
-  // DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, 
the grouping
-  // expressions will be [key, value].
+  // DISTINCT column and the referred attributes in the FILTER clause 
associated with each
+  // aggregate function. For example:
+  // 1.for the AVG (DISTINCT value) GROUP BY key, the grouping expression 
will be [key, value];
+  // 2.for AVG (DISTINCT value) Filter (WHERE value2> 20) GROUP BY key, 
the grouping expression
+  // will be [key, value, value2].
 
 Review comment:
   For a query like `SELECT COUNT(DISTINCT a) FILTER (WHERE c > 0), SUM(b) 
FILTER (WHERE d = 0) FROM table`, the plan will be:
   ```
   Final-AGG-4 (count distinct)
     Shuffle to a single reducer
       PartialMerge-AGG-3 (count distinct, no grouping, apply function COUNT on a with c > 0)
         PartialMerge-AGG-2 (grouping on a and c)
           Shuffle by a and c
             Partial-AGG-1 (grouping on a and c, apply function SUM on b with d = 0)
   ```
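
   As a reference point for what any such plan has to compute, the two filtered aggregates can be written over an in-memory collection. This is only a sketch of the query's semantics, not of the staged plan above; `Row`, `countDistinctA` and `sumB` are hypothetical names introduced for illustration.

   ```scala
   object FilterAggSketch {
     // Hypothetical row type mirroring columns a, b, c, d from the query above.
     final case class Row(a: Int, b: Long, c: Int, d: Int)

     // COUNT(DISTINCT a) FILTER (WHERE c > 0):
     // keep rows with c > 0, then count the distinct values of a.
     def countDistinctA(rows: Seq[Row]): Long =
       rows.filter(_.c > 0).map(_.a).distinct.size.toLong

     // SUM(b) FILTER (WHERE d = 0):
     // None stands in for SQL NULL when no row passes the filter.
     def sumB(rows: Seq[Row]): Option[Long] = {
       val kept = rows.filter(_.d == 0).map(_.b)
       if (kept.isEmpty) None else Some(kept.sum)
     }
   }
   ```

   Note that a value of `a` that occurs with several different `c > 0` values must still be counted once, which is the case any plan that groups on both `a` and `c` has to handle.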



[GitHub] [spark] zhengruifeng commented on a change in pull request #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting

2019-12-03 Thread GitBox

zhengruifeng commented on a change in pull request #26739: 
[SPARK-29967][ML][PYTHON] KMeans support instance weighting
URL: https://github.com/apache/spark/pull/26739#discussion_r353572931
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
 ##
 @@ -278,30 +287,32 @@ class KMeans private (
   val bcCenters = sc.broadcast(centers)
 
   // Find the new centers
-  val collected = data.mapPartitions { points =>
+  val collected = data.mapPartitions { pointsAndWeights =>
 val thisCenters = bcCenters.value
 val dims = thisCenters.head.vector.size
 
 val sums = Array.fill(thisCenters.length)(Vectors.zeros(dims))
-val counts = Array.fill(thisCenters.length)(0L)
 
-points.foreach { point =>
-  val (bestCenter, cost) = 
distanceMeasureInstance.findClosest(thisCenters, point)
+// clusterWeightSum is needed to calculate cluster center
+// cluster center =
+// sample1 * weight1/clusterWeightSum + sample2 * 
weight2/clusterWeightSum + ...
+val clusterWeightSum = Array.fill(thisCenters.length)(0.0)
+
+pointsAndWeights.foreach { case (point, weight) =>
+  var (bestCenter, cost) = 
distanceMeasureInstance.findClosest(thisCenters, point)
+  cost *= weight
   costAccum.add(cost)
-  distanceMeasureInstance.updateClusterSum(point, sums(bestCenter))
-  counts(bestCenter) += 1
+  distanceMeasureInstance.updateClusterSum(point, sums(bestCenter), 
weight)
+  clusterWeightSum(bestCenter) += weight
 }
 
-counts.indices.filter(counts(_) > 0).map(j => (j, (sums(j), 
counts(j)))).iterator
-  }.reduceByKey { case ((sum1, count1), (sum2, count2)) =>
+clusterWeightSum.indices.filter(clusterWeightSum(_) > 0)
+  .map(j => (j, (sums(j), clusterWeightSum(j)))).iterator
+  }.reduceByKey { case ((sum1, clusterWeightSum1), (sum2, 
clusterWeightSum2)) =>
 axpy(1.0, sum2, sum1)
-(sum1, count1 + count2)
+(sum1, clusterWeightSum1 + clusterWeightSum2)
   }.collectAsMap()
 
-  if (iteration == 0) {
-instr.foreach(_.logNumExamples(collected.values.map(_._2).sum))
-  }
-
 
 Review comment:
   I am OK with adding a new `instr.log` call in another PR.
   Here I prefer to keep `instr.logNumExamples` logging the unweighted count, 
to keep it in sync with the other algorithms.
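
   To make the formula in the diff comment concrete (cluster center = sum of sample * weight / clusterWeightSum), here is a minimal sketch in plain Scala without Spark. `weightedCenter` is a hypothetical helper, not the `KMeans` code itself, and it assumes all points have the same dimension.

   ```scala
   object WeightedCenterSketch {
     // Hypothetical helper: weighted centroid of the points assigned to one cluster.
     // Returns None when there are no points or the total weight is not positive.
     def weightedCenter(pointsAndWeights: Seq[(Array[Double], Double)]): Option[Array[Double]] = {
       val clusterWeightSum = pointsAndWeights.map(_._2).sum
       if (pointsAndWeights.isEmpty || clusterWeightSum <= 0.0) {
         None
       } else {
         val dims = pointsAndWeights.head._1.length
         val sums = Array.fill(dims)(0.0)
         // Accumulate weight * point, the same role updateClusterSum plays in the diff.
         pointsAndWeights.foreach { case (point, weight) =>
           var i = 0
           while (i < dims) {
             sums(i) += point(i) * weight
             i += 1
           }
         }
         Some(sums.map(_ / clusterWeightSum))
       }
     }
   }
   ```

   For two points `(0, 0)` with weight 1 and `(4, 8)` with weight 3, the weight sum is 4.0 while the unweighted example count is 2, which is the distinction between the two quantities discussed above.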



[GitHub] [spark] beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression

2019-12-03 Thread GitBox

beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] 
Support ANSI SQL filter clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#discussion_r353571892
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
 ##
 @@ -135,19 +135,25 @@ object AggUtils {
 }
 val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
 val groupingAttributes = groupingExpressions.map(_.toAttribute)
+val filterWithDistinctAttributes = 
functionsWithDistinct.flatMap(_.filterAttributes.toSeq)
 
 // 1. Create an Aggregate Operator for partial aggregations.
 val partialAggregate: SparkPlan = {
   val aggregateExpressions = functionsWithoutDistinct.map(_.copy(mode = 
Partial))
   val aggregateAttributes = aggregateExpressions.map(_.resultAttribute)
   // We will group by the original grouping expression, plus an additional 
expression for the
-  // DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, 
the grouping
-  // expressions will be [key, value].
+  // DISTINCT column and the referred attributes in the FILTER clause 
associated with each
+  // aggregate function. For example:
+  // 1.for the AVG (DISTINCT value) GROUP BY key, the grouping expression 
will be [key, value];
+  // 2.for AVG (DISTINCT value) Filter (WHERE value2> 20) GROUP BY key, 
the grouping expression
+  // will be [key, value, value2].
 
 Review comment:
   For a query like `SELECT COUNT(DISTINCT a) FILTER (WHERE c > 0), SUM(b) 
FILTER (WHERE d = 0) FROM table`, the plan will be:
   ```
   Final-AGG-4 (count distinct)
     Shuffle to a single reducer
       PartialMerge-AGG-3 (count distinct, no grouping, apply function COUNT on a with c > 0)
         PartialMerge-AGG-2 (grouping on a and c)
           Shuffle by a
             Partial-AGG-1 (grouping on a and c, apply function SUM on b with d = 0)
   ```



[GitHub] [spark] yaooqinn commented on a change in pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected

2019-12-03 Thread GitBox

yaooqinn commented on a change in pull request #26080: [SPARK-29425][SQL] The 
ownership of a database should be respected
URL: https://github.com/apache/spark/pull/26080#discussion_r353572328
 
 

 ##
 File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
 ##
 @@ -809,6 +846,22 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 }
   }
 
+  override def getDatabaseOwnerName(db: Database): String = {
+
Option(getDatabaseOwnerNameMethod.invoke(db)).map(_.asInstanceOf[String]).getOrElse("")
+  }
+
+  override def setDatabaseOwnerName(db: Database, owner: String): Unit = {
+setDatabaseOwnerNameMethod.invoke(db, owner)
+  }
+
+  override def getDatabaseOwnerType(db: Database): String = {
+Option(getDatabaseOwnerTypeMethod.invoke(db))
+  
.map(_.asInstanceOf[PrincipalType].name()).getOrElse(PrincipalType.USER.name())
 
 Review comment:
   agree
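
   For readers unfamiliar with the pattern in the diff, wrapping a reflective call in `Option` turns a `null` result into a default value. The sketch below shows the same idea against a hypothetical `FakeDatabase` class; it does not use the Hive `Database` type or the shim's real method handles.

   ```scala
   import java.lang.reflect.Method

   object NullSafeReflectionSketch {
     // Hypothetical stand-in for a metastore object whose getter may return null.
     class FakeDatabase(owner: String) {
       def getOwnerName: String = owner
     }

     // Look the method up once, as the shim does for its method handles.
     private val getOwnerNameMethod: Method =
       classOf[FakeDatabase].getMethod("getOwnerName")

     // Option(null) is None, so a null owner falls back to the empty string.
     def ownerName(db: FakeDatabase): String =
       Option(getOwnerNameMethod.invoke(db)).map(_.asInstanceOf[String]).getOrElse("")
   }
   ```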



[GitHub] [spark] beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression

2019-12-03 Thread GitBox

beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] 
Support ANSI SQL filter clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#discussion_r353571892
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
 ##
 @@ -135,19 +135,25 @@ object AggUtils {
 }
 val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
 val groupingAttributes = groupingExpressions.map(_.toAttribute)
+val filterWithDistinctAttributes = 
functionsWithDistinct.flatMap(_.filterAttributes.toSeq)
 
 // 1. Create an Aggregate Operator for partial aggregations.
 val partialAggregate: SparkPlan = {
   val aggregateExpressions = functionsWithoutDistinct.map(_.copy(mode = 
Partial))
   val aggregateAttributes = aggregateExpressions.map(_.resultAttribute)
   // We will group by the original grouping expression, plus an additional 
expression for the
-  // DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, 
the grouping
-  // expressions will be [key, value].
+  // DISTINCT column and the referred attributes in the FILTER clause 
associated with each
+  // aggregate function. For example:
+  // 1.for the AVG (DISTINCT value) GROUP BY key, the grouping expression 
will be [key, value];
+  // 2.for AVG (DISTINCT value) Filter (WHERE value2> 20) GROUP BY key, 
the grouping expression
+  // will be [key, value, value2].
 
 Review comment:
   For a query like `SELECT COUNT(DISTINCT a) FILTER (WHERE c > 0), SUM(b) 
FILTER (WHERE d = 0) FROM table`, the plan will be:
   ```
   AGG-4 (count distinct)
     Shuffle to a single reducer
       Partial-AGG-3 (count distinct, no grouping, apply function COUNT on a with c > 0)
         Partial-AGG-2 (grouping on a and c)
           Shuffle by a
             Partial-AGG-1 (grouping on a and c, apply function SUM on b with d = 0)
   ```



[GitHub] [spark] beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression

2019-12-03 Thread GitBox

beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] 
Support ANSI SQL filter clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#discussion_r353571892
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
 ##
 @@ -135,19 +135,25 @@ object AggUtils {
 }
 val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
 val groupingAttributes = groupingExpressions.map(_.toAttribute)
+val filterWithDistinctAttributes = 
functionsWithDistinct.flatMap(_.filterAttributes.toSeq)
 
 // 1. Create an Aggregate Operator for partial aggregations.
 val partialAggregate: SparkPlan = {
   val aggregateExpressions = functionsWithoutDistinct.map(_.copy(mode = 
Partial))
   val aggregateAttributes = aggregateExpressions.map(_.resultAttribute)
   // We will group by the original grouping expression, plus an additional 
expression for the
-  // DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, 
the grouping
-  // expressions will be [key, value].
+  // DISTINCT column and the referred attributes in the FILTER clause 
associated with each
+  // aggregate function. For example:
+  // 1.for the AVG (DISTINCT value) GROUP BY key, the grouping expression 
will be [key, value];
+  // 2.for AVG (DISTINCT value) Filter (WHERE value2> 20) GROUP BY key, 
the grouping expression
+  // will be [key, value, value2].
 
 Review comment:
   For a query like `SELECT COUNT(DISTINCT a) FILTER (WHERE c > 0), SUM(b) 
FILTER (WHERE d = 0) FROM table`, the plan will be:
   ```
   AGG-4 (count distinct)
     Shuffle to a single reducer
       Partial-AGG-3 (count distinct, no grouping, apply function COUNT on a with c > 0)
         Partial-AGG-2 (grouping on a and c)
           Shuffle by a
             Partial-AGG-1 (grouping on a and c, apply function SUM on b with d = 0)
   ```



[GitHub] [spark] LantaoJin commented on issue #26754: [SPARK-30115][SQL] Improve limit only query on datasource table

2019-12-03 Thread GitBox

LantaoJin commented on issue #26754: [SPARK-30115][SQL] Improve limit only 
query on datasource table
URL: https://github.com/apache/spark/pull/26754#issuecomment-561502217
 
 
   cc @cloud-fan @dongjoon-hyun @HyukjinKwon 



[GitHub] [spark] AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType 
mismatch issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#issuecomment-561499842
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114820/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26739: [SPARK-29967][ML][PYTHON] KMeans support instance weighting

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26739: [SPARK-29967][ML][PYTHON] KMeans 
support instance weighting
URL: https://github.com/apache/spark/pull/26739#issuecomment-56155
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19654/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery.

2019-12-03 Thread GitBox

AmplabJenkins commented on issue #26485: [SPARK-29860][SQL] Fix dataType 
mismatch issue for InSubquery.
URL: https://github.com/apache/spark/pull/26485#issuecomment-561499833
 
 
   Merged build finished. Test PASSed.


