[GitHub] [spark] AmplabJenkins removed a comment on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-08-02 Thread GitBox


AmplabJenkins removed a comment on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-667819031







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-02 Thread GitBox


AmplabJenkins removed a comment on pull request #29331:
URL: https://github.com/apache/spark/pull/29331#issuecomment-667819083










[GitHub] [spark] AmplabJenkins commented on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-08-02 Thread GitBox


AmplabJenkins commented on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-667819031










[GitHub] [spark] AmplabJenkins commented on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-02 Thread GitBox


AmplabJenkins commented on pull request #29331:
URL: https://github.com/apache/spark/pull/29331#issuecomment-667819083










[GitHub] [spark] SparkQA commented on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-02 Thread GitBox


SparkQA commented on pull request #29331:
URL: https://github.com/apache/spark/pull/29331#issuecomment-667818351


   **[Test build #126955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126955/testReport)** for PR 29331 at commit [`0cf67c4`](https://github.com/apache/spark/commit/0cf67c43d225c198607d6957fc26b64a26aeefaa).






[GitHub] [spark] SparkQA commented on pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-08-02 Thread GitBox


SparkQA commented on pull request #29291:
URL: https://github.com/apache/spark/pull/29291#issuecomment-667818385


   **[Test build #126956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126956/testReport)** for PR 29291 at commit [`883973b`](https://github.com/apache/spark/commit/883973b9bc8a9c530a002cf4b48217546929fb5e).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-08-02 Thread GitBox


AmplabJenkins removed a comment on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-667816168


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126952/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-08-02 Thread GitBox


AmplabJenkins removed a comment on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-667816161


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-08-02 Thread GitBox


SparkQA removed a comment on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-667807499


   **[Test build #126952 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126952/testReport)** for PR 28953 at commit [`70d8719`](https://github.com/apache/spark/commit/70d8719e8877ac7b4f4d0b0b8bb309ee1611df07).






[GitHub] [spark] dongjoon-hyun opened a new pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-02 Thread GitBox


dongjoon-hyun opened a new pull request #29331:
URL: https://github.com/apache/spark/pull/29331


   ### What changes were proposed in this pull request?
   
   This PR aims to add `StorageLevel.DISK_ONLY_3` as a built-in `StorageLevel`.
   
   ### Why are the changes needed?
   
   Disaggregated clusters, or clusters without storage services such as HDFS, are increasingly common. Previously, users could rely on the similar `MEMORY_AND_DISK_2` or a user-created StorageLevel. This PR aims to support it officially.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. This provides a new built-in option.
   
   ### How was this patch tested?
   
   Pass the GitHub Action or Jenkins with the revised test cases.
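   For context, the relationship between the proposed built-in constant and the user-created level mentioned above can be sketched in plain Python. This only models the StorageLevel flags (disk, memory, off-heap, deserialized, replication); it is an illustration, not the Spark API:

   ```python
   from collections import namedtuple

   # Plain-Python model of Spark's StorageLevel flags. The field order mirrors
   # the real constructor (useDisk, useMemory, useOffHeap, deserialized,
   # replication), but this is only an illustration, not the Spark API.
   StorageLevel = namedtuple(
       "StorageLevel",
       ["use_disk", "use_memory", "use_off_heap", "deserialized", "replication"])

   # What users previously had to create by hand...
   user_created = StorageLevel(True, False, False, False, 3)

   # ...and what this PR exposes as the built-in DISK_ONLY_3.
   DISK_ONLY_3 = StorageLevel(use_disk=True, use_memory=False,
                              use_off_heap=False, deserialized=False,
                              replication=3)

   assert user_created == DISK_ONLY_3
   ```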






[GitHub] [spark] AmplabJenkins commented on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-08-02 Thread GitBox


AmplabJenkins commented on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-667816161










[GitHub] [spark] SparkQA commented on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-08-02 Thread GitBox


SparkQA commented on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-667816089


   **[Test build #126952 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126952/testReport)** for PR 28953 at commit [`70d8719`](https://github.com/apache/spark/commit/70d8719e8877ac7b4f4d0b0b8bb309ee1611df07).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] leanken edited a comment on pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column

2020-08-02 Thread GitBox


leanken edited a comment on pull request #29304:
URL: https://github.com/apache/spark/pull/29304#issuecomment-667811949


   @agrawaldevesh I finally understand the complexity of multi-column support; thanks for your repeated reminders, and sorry for my naivety. Do you think it is still worth continuing with multi-column support? I sincerely ask for your suggestion.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-08-02 Thread GitBox


AmplabJenkins removed a comment on pull request #29291:
URL: https://github.com/apache/spark/pull/29291#issuecomment-667815478










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29330: [SPARK-32432] Added support for reading ORC/Parquet files with SymlinkTextInputFormat

2020-08-02 Thread GitBox


AmplabJenkins removed a comment on pull request #29330:
URL: https://github.com/apache/spark/pull/29330#issuecomment-667815514










[GitHub] [spark] AmplabJenkins commented on pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-08-02 Thread GitBox


AmplabJenkins commented on pull request #29291:
URL: https://github.com/apache/spark/pull/29291#issuecomment-667815478










[GitHub] [spark] AmplabJenkins commented on pull request #29330: [SPARK-32432] Added support for reading ORC/Parquet files with SymlinkTextInputFormat

2020-08-02 Thread GitBox


AmplabJenkins commented on pull request #29330:
URL: https://github.com/apache/spark/pull/29330#issuecomment-667815514










[GitHub] [spark] SparkQA commented on pull request #29330: [SPARK-32432] Added support for reading ORC/Parquet files with SymlinkTextInputFormat

2020-08-02 Thread GitBox


SparkQA commented on pull request #29330:
URL: https://github.com/apache/spark/pull/29330#issuecomment-667814945


   **[Test build #126954 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126954/testReport)** for PR 29330 at commit [`c97f003`](https://github.com/apache/spark/commit/c97f0031eb7c18d53ef6c302213e8766cb5d2e99).






[GitHub] [spark] manuzhang commented on pull request #29321: [SPARK-32083][SQL][3.0] AQE coalesce should at least return one partition

2020-08-02 Thread GitBox


manuzhang commented on pull request #29321:
URL: https://github.com/apache/spark/pull/29321#issuecomment-667814511


   @cloud-fan 
   The title does not seem to match the partial backport.






[GitHub] [spark] imback82 commented on a change in pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs

2020-08-02 Thread GitBox


imback82 commented on a change in pull request #29328:
URL: https://github.com/apache/spark/pull/29328#discussion_r464203736



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##
@@ -245,15 +245,22 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
 "read files of Hive data source directly.")
 }
 
+val updatedPaths = if (paths.length == 1) {

Review comment:
   +1 for your suggestion








[GitHub] [spark] leanken commented on pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column

2020-08-02 Thread GitBox


leanken commented on pull request #29304:
URL: https://github.com/apache/spark/pull/29304#issuecomment-667814305


   > @agrawaldevesh I finally understand the complexity of multi-column support; thanks for your repeated reminders, and sorry for my naivety. Do you think it is still worth continuing with multi-column support? I sincerely ask for your suggestion.
   
   As for how to support it, I think it could work as follows:
   
   1. Scan the build side to gather information about which columns contain nulls.
   2. Build a HashedRelation from the original input, including any-null keys.
   3. Build an extra HashedRelation containing all null-padded combinations.
   
   Then, when probing on the streamed side:
   1. If the streamed-side key is entirely non-null, use the null information gathered from the build side to look for matches in the original HashedRelation. For example, for key (1, 2, 3) where build-side columns c2 and c3 contain nulls, try matching with the following keys:
   (1, 2, 3), (1, null, 3), (1, 2, null), (1, null, null)
   2. If the streamed-side key contains any null column, for example (null, 2, 3), look the key up in the extra HashedRelation, because it contains all possible combinations.
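   The probe-side key expansion in step 1 above can be sketched in plain Python. This is an illustrative model of the idea, not the proposed Spark implementation; the function name and argument shapes are made up for the example:

   ```python
   from itertools import combinations

   def null_masked_variants(key, nullable_cols):
       # For a streamed-side key with no nulls, generate every variant where
       # some subset of the build-side columns known to contain nulls is
       # replaced by None; each variant would be probed against the original
       # HashedRelation. Column indices are 0-based here.
       variants = []
       for r in range(len(nullable_cols) + 1):
           for subset in combinations(sorted(nullable_cols), r):
               masked = set(subset)
               variants.append(tuple(None if i in masked else v
                                     for i, v in enumerate(key)))
       return variants

   # Build-side columns c2 and c3 (indices 1 and 2) contain nulls, so key
   # (1, 2, 3) probes with four candidate keys.
   print(null_masked_variants((1, 2, 3), {1, 2}))
   # → [(1, 2, 3), (1, None, 3), (1, 2, None), (1, None, None)]
   ```

   Note the candidate count grows as 2^k in the number of null-containing build-side columns, which hints at the complexity concern raised above.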






[GitHub] [spark] cloud-fan commented on a change in pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs

2020-08-02 Thread GitBox


cloud-fan commented on a change in pull request #29328:
URL: https://github.com/apache/spark/pull/29328#discussion_r464203078



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##
@@ -245,15 +245,22 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
 "read files of Hive data source directly.")
 }
 
+val updatedPaths = if (paths.length == 1) {

Review comment:
   If we are worried about silently changing results, we can fail if there is a `path` option and `load` is called with path parameters. The error message should ask users to either remove the `path` option or put it into the `load` parameters.
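   The fail-fast rule discussed here can be sketched in plain Python. Function and parameter names are illustrative, not Spark's actual API; this only models the precedence decision:

   ```python
   def resolve_load_paths(options, paths):
       # Illustrative sketch of the fail-fast rule: reject the call when both
       # a 'path' option and explicit load(...) path parameters are given,
       # instead of silently preferring one over the other.
       if "path" in options and paths:
           raise ValueError(
               "Both the 'path' option and load() path parameters were given; "
               "either remove the 'path' option or pass the path to load().")
       if paths:
           return list(paths)
       return [options["path"]] if "path" in options else []

   print(resolve_load_paths({"path": "/data/a"}, []))     # → ['/data/a']
   print(resolve_load_paths({}, ["/data/a", "/data/b"]))  # → ['/data/a', '/data/b']
   ```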








[GitHub] [spark] moomindani opened a new pull request #29330: [SPARK-32432] Added support for reading ORC/Parquet files with SymlinkTextInputFormat

2020-08-02 Thread GitBox


moomindani opened a new pull request #29330:
URL: https://github.com/apache/spark/pull/29330


   ### What changes were proposed in this pull request?
   
   This pull request adds support for reading ORC/Parquet files with `SymlinkTextInputFormat` in Apache Spark.
   
   ### Why are the changes needed?
   
   Hive-style symlink tables (`SymlinkTextInputFormat`) are commonly used by analytic engines such as prestodb and prestosql.
   Currently, `SymlinkTextInputFormat` works with JSON/CSV files but not with ORC/Parquet files in Apache Spark (and Apache Hive), whereas prestodb and prestosql support it for ORC/Parquet files.
   
   See the JIRA (SPARK-32432) for details.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes.
   Currently, Spark throws an exception if users try to use `SymlinkTextInputFormat` with ORC/Parquet files.
   With this patch, Spark can handle symlinks that point to the locations of ORC/Parquet files.
   
   ### How was this patch tested?
   
   I added a new test suite `SymlinkSuite` and confirmed it passed.
   
   ```
   $ ./build/sbt "project hive" "test-only 
org.apache.spark.sql.hive.SymlinkSuite"
   ```
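   To illustrate the mechanism: a symlink table's directory contains small text files whose lines are the locations of the real data files, and the reader must resolve those lines before reading. The helper below is a plain-Python sketch of that resolution step (illustrative, not Spark's implementation), with hypothetical file paths:

   ```python
   import tempfile

   def read_symlink_manifest(manifest_path):
       # A SymlinkTextInputFormat table directory stores small text files
       # ("symlinks") whose lines are the locations of the real data files.
       # This illustrative helper resolves one manifest file to the list of
       # data-file paths to read, skipping blank lines.
       with open(manifest_path) as f:
           return [line.strip() for line in f if line.strip()]

   # Demo: a manifest pointing at two (hypothetical) Parquet files.
   with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
       f.write("s3://bucket/part-0.parquet\ns3://bucket/part-1.parquet\n")
       manifest = f.name

   print(read_symlink_manifest(manifest))
   # → ['s3://bucket/part-0.parquet', 's3://bucket/part-1.parquet']
   ```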






[GitHub] [spark] cloud-fan commented on a change in pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs

2020-08-02 Thread GitBox


cloud-fan commented on a change in pull request #29328:
URL: https://github.com/apache/spark/pull/29328#discussion_r464202756



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##
@@ -245,15 +245,22 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
 "read files of Hive data source directly.")
 }
 
+val updatedPaths = if (paths.length == 1) {

Review comment:
   I think the most intuitive behavior is to drop the `path` option if `load` is called with path parameters, whether there is one path or more.








[GitHub] [spark] leanken commented on pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column

2020-08-02 Thread GitBox


leanken commented on pull request #29304:
URL: https://github.com/apache/spark/pull/29304#issuecomment-667811949


   @agrawaldevesh I finally understand the complexity of multi-column support; thanks for your repeated reminders, and sorry for my naivety. Do you think it is still worth continuing with multi-column support? I sincerely ask for your suggestion.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-08-02 Thread GitBox


AmplabJenkins removed a comment on pull request #29291:
URL: https://github.com/apache/spark/pull/29291#issuecomment-667810979










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs

2020-08-02 Thread GitBox


AmplabJenkins removed a comment on pull request #29328:
URL: https://github.com/apache/spark/pull/29328#issuecomment-667810713


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126949/
   Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs

2020-08-02 Thread GitBox


SparkQA removed a comment on pull request #29328:
URL: https://github.com/apache/spark/pull/29328#issuecomment-667792091


   **[Test build #126949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126949/testReport)** for PR 29328 at commit [`296d4bb`](https://github.com/apache/spark/commit/296d4bbab647189fb32f3ffc0051086f244bcfca).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs

2020-08-02 Thread GitBox


AmplabJenkins removed a comment on pull request #29328:
URL: https://github.com/apache/spark/pull/29328#issuecomment-667810710


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-08-02 Thread GitBox


AmplabJenkins commented on pull request #29291:
URL: https://github.com/apache/spark/pull/29291#issuecomment-667810979










[GitHub] [spark] AmplabJenkins commented on pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs

2020-08-02 Thread GitBox


AmplabJenkins commented on pull request #29328:
URL: https://github.com/apache/spark/pull/29328#issuecomment-667810710










[GitHub] [spark] SparkQA commented on pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs

2020-08-02 Thread GitBox


SparkQA commented on pull request #29328:
URL: https://github.com/apache/spark/pull/29328#issuecomment-667810614


   **[Test build #126949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126949/testReport)** for PR 29328 at commit [`296d4bb`](https://github.com/apache/spark/commit/296d4bbab647189fb32f3ffc0051086f244bcfca).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HyukjinKwon closed pull request #29329: Investigate JUnit XML test reporter

2020-08-02 Thread GitBox


HyukjinKwon closed pull request #29329:
URL: https://github.com/apache/spark/pull/29329


   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29320: [WIP][SPARK-32507][DOCS][PYTHON] Add main page for PySpark documentation

2020-08-02 Thread GitBox


AmplabJenkins removed a comment on pull request #29320:
URL: https://github.com/apache/spark/pull/29320#issuecomment-667809951










[GitHub] [spark] HyukjinKwon opened a new pull request #29329: Investigate JUnit XML test reporter

2020-08-02 Thread GitBox


HyukjinKwon opened a new pull request #29329:
URL: https://github.com/apache/spark/pull/29329


   






[GitHub] [spark] SparkQA commented on pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-08-02 Thread GitBox


SparkQA commented on pull request #29291:
URL: https://github.com/apache/spark/pull/29291#issuecomment-667810108


   **[Test build #126953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126953/testReport)** for PR 29291 at commit [`39583dd`](https://github.com/apache/spark/commit/39583dde43da9580245cd34768d3f613fab8b090).






[GitHub] [spark] AmplabJenkins commented on pull request #29320: [WIP][SPARK-32507][DOCS][PYTHON] Add main page for PySpark documentation

2020-08-02 Thread GitBox


AmplabJenkins commented on pull request #29320:
URL: https://github.com/apache/spark/pull/29320#issuecomment-667809951










[GitHub] [spark] SparkQA removed a comment on pull request #29320: [WIP][SPARK-32507][DOCS][PYTHON] Add main page for PySpark documentation

2020-08-02 Thread GitBox


SparkQA removed a comment on pull request #29320:
URL: https://github.com/apache/spark/pull/29320#issuecomment-667802282


   **[Test build #126951 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126951/testReport)** for PR 29320 at commit [`6d5f6ef`](https://github.com/apache/spark/commit/6d5f6ef069cb8e0fbb65616ca98f919cdd367fda).






[GitHub] [spark] SparkQA commented on pull request #29320: [WIP][SPARK-32507][DOCS][PYTHON] Add main page for PySpark documentation

2020-08-02 Thread GitBox


SparkQA commented on pull request #29320:
URL: https://github.com/apache/spark/pull/29320#issuecomment-667809709


   **[Test build #126951 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126951/testReport)** for PR 29320 at commit [`6d5f6ef`](https://github.com/apache/spark/commit/6d5f6ef069cb8e0fbb65616ca98f919cdd367fda).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-08-02 Thread GitBox


AmplabJenkins removed a comment on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-667808043










[GitHub] [spark] AmplabJenkins commented on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-08-02 Thread GitBox


AmplabJenkins commented on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-667808043










[GitHub] [spark] SparkQA commented on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-08-02 Thread GitBox


SparkQA commented on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-667807499


   **[Test build #126952 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126952/testReport)** for PR 28953 at commit [`70d8719`](https://github.com/apache/spark/commit/70d8719e8877ac7b4f4d0b0b8bb309ee1611df07).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29322: [SPARK-32511][SQL] Add dropFields method to Column class

2020-08-02 Thread GitBox


AmplabJenkins removed a comment on pull request #29322:
URL: https://github.com/apache/spark/pull/29322#issuecomment-667804781










[GitHub] [spark] AmplabJenkins commented on pull request #29322: [SPARK-32511][SQL] Add dropFields method to Column class

2020-08-02 Thread GitBox


AmplabJenkins commented on pull request #29322:
URL: https://github.com/apache/spark/pull/29322#issuecomment-667804781










[GitHub] [spark] SparkQA commented on pull request #29322: [SPARK-32511][SQL] Add dropFields method to Column class

2020-08-02 Thread GitBox


SparkQA commented on pull request #29322:
URL: https://github.com/apache/spark/pull/29322#issuecomment-667804240


   **[Test build #126947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126947/testReport)** for PR 29322 at commit [`19587e8`](https://github.com/apache/spark/commit/19587e830a7889616583f48b44da61ca296c5215).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #29322: [SPARK-32511][SQL] Add dropFields method to Column class

2020-08-02 Thread GitBox


SparkQA removed a comment on pull request #29322:
URL: https://github.com/apache/spark/pull/29322#issuecomment-667748670


   **[Test build #126947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126947/testReport)** for PR 29322 at commit [`19587e8`](https://github.com/apache/spark/commit/19587e830a7889616583f48b44da61ca296c5215).






[GitHub] [spark] viirya commented on a change in pull request #29320: [WIP][SPARK-32507][DOCS][PYTHON] Add main page for PySpark documentation

2020-08-02 Thread GitBox


viirya commented on a change in pull request #29320:
URL: https://github.com/apache/spark/pull/29320#discussion_r464195184



##
File path: python/docs/source/index.rst
##
@@ -21,8 +21,42 @@
 PySpark Documentation
 =====================
 
+PySpark is an interface for Apache Spark in Python. It not only allows you to write
+Spark applications using Python APIs, but also provides the PySpark shell for
+interactively analyzing your data in a distributed environment. PySpark supports most
+of Spark's features such as Spark SQL, DataFrmae, Streaming, MLlib
+(Machine Learning) and Spark Core.
+
+.. image:: ../../../docs/img/pyspark-components.png
+  :alt: PySpark Compoenents
+
+**Spark SQL and DataFrame**
+
+Spark SQL is a Spark module for structured data processing. It provides
+a programming abstraction called DataFrame and can also act as distributed
+SQL query engine.
+
+**Streaming**
+
+Running on top of Spark, the streaming feature in Apache Spark enables powerful
+interactive and analytical applications across both streaming and historical data,
+while inheriting Spark’s ease of use and fault tolerance characteristics.
+
+**MLlib**
+
+Built on top of Spark, MLlib is a scalable machine learning library that provides
+a uniform set of high-level APIs that help users create and tune practical machine
+learning pipelines.
+
+**Spark Core**
+
+Spark Core is the underlying general execution engine for the Spark platform that all
+other functionality is built on top of. It provides an RDD (Resilient Disributed Dataset)

Review comment:
   Disributed -> Distributed








[GitHub] [spark] cloud-fan commented on a change in pull request #28527: [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path

2020-08-02 Thread GitBox


cloud-fan commented on a change in pull request #28527:
URL: https://github.com/apache/spark/pull/28527#discussion_r464195071



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
##
@@ -350,6 +358,16 @@ class SessionCatalog(
 }
   }
 
+  private def makeQualifiedTablePath(locationUri: URI, database: String): URI = {
+    if (locationUri.isAbsolute) {
+      locationUri
+    } else {
+      val dbName = formatDatabaseName(database)
+      val dbLocation = makeQualifiedDBPath(getDatabaseMetadata(dbName).locationUri)

Review comment:
   I'm a bit concerned about it as it adds an extra database lookup. Is it better to push this work to the underlying external catalog?
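
   The lookup in the quoted snippet can be sketched as follows (a minimal sketch using plain `java.net.URI` string handling; `qualify` and the example locations are illustrative assumptions, not Spark's actual implementation, which resolves paths through Hadoop utilities):

   ```scala
   import java.net.URI

   // Sketch: qualify a relative table location against its database's
   // location; absolute locations are returned unchanged. Illustrative only.
   def qualify(location: URI, dbLocation: URI): URI =
     if (location.isAbsolute) location
     else new URI(dbLocation.toString.stripSuffix("/") + "/" + location.getPath)

   val dbLocation = new URI("hdfs://nn:8020/warehouse/db1.db")
   // relative path "t1" resolves under the database location
   println(qualify(new URI("t1"), dbLocation))
   // an absolute location is kept as-is
   println(qualify(new URI("hdfs://other-nn/data/t2"), dbLocation))
   ```

   The extra lookup the comment worries about is the `getDatabaseMetadata(dbName)` call needed to obtain `dbLocation` before qualification can happen at all.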








[GitHub] [spark] viirya commented on a change in pull request #29320: [WIP][SPARK-32507][DOCS][PYTHON] Add main page for PySpark documentation

2020-08-02 Thread GitBox


viirya commented on a change in pull request #29320:
URL: https://github.com/apache/spark/pull/29320#discussion_r464194094



##
File path: python/docs/source/index.rst
##
@@ -21,8 +21,42 @@
 PySpark Documentation
 =====================
 
+PySpark is an interface for Apache Spark in Python. It not only allows you to write
+Spark applications using Python APIs, but also provides the PySpark shell for
+interactively analyzing your data in a distributed environment. PySpark supports most
+of Spark's features such as Spark SQL, DataFrmae, Streaming, MLlib

Review comment:
   DataFrmae -> DataFrame








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29320: [WIP][SPARK-32507][DOCS][PYTHON] Add main page for PySpark documentation

2020-08-02 Thread GitBox


AmplabJenkins removed a comment on pull request #29320:
URL: https://github.com/apache/spark/pull/29320#issuecomment-667802688










[GitHub] [spark] AmplabJenkins commented on pull request #29320: [WIP][SPARK-32507][DOCS][PYTHON] Add main page for PySpark documentation

2020-08-02 Thread GitBox


AmplabJenkins commented on pull request #29320:
URL: https://github.com/apache/spark/pull/29320#issuecomment-667802688










[GitHub] [spark] SparkQA commented on pull request #29320: [WIP][SPARK-32507][DOCS][PYTHON] Add main page for PySpark documentation

2020-08-02 Thread GitBox


SparkQA commented on pull request #29320:
URL: https://github.com/apache/spark/pull/29320#issuecomment-667802282


   **[Test build #126951 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126951/testReport)** for PR 29320 at commit [`6d5f6ef`](https://github.com/apache/spark/commit/6d5f6ef069cb8e0fbb65616ca98f919cdd367fda).






[GitHub] [spark] viirya commented on pull request #29169: [SPARK-32357][INFRA] Add a step in GitHub Actions to show failed tests

2020-08-02 Thread GitBox


viirya commented on pull request #29169:
URL: https://github.com/apache/spark/pull/29169#issuecomment-667801606


   @HyukjinKwon Hi, this has been open for a while. Do you have any more thoughts? Thanks.






[GitHub] [spark] viirya commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre and Hadoop to 3.2.1

2020-08-02 Thread GitBox


viirya commented on pull request #29326:
URL: https://github.com/apache/spark/pull/29326#issuecomment-667801138


   The trouble is that hive-exec uses a method that became package-private in Guava 20, so it is incompatible with Guava versions newer than 19.0:
   
   ```
   sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: tried to access method com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator; from class org.apache.hadoop.hive.ql.exec.FetchOperator
   	at org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
   	at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
   	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
   	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
   	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
   	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
   	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
   ```
   
   hive-exec does not shade Guava until https://issues.apache.org/jira/browse/HIVE-22126, which targets Hive 4.0.0.
   
   This seems to be a dead end for upgrading Guava in Spark for now.
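
   The failure is a link-time visibility error: bytecode compiled against Guava <= 19, where `Iterators.emptyIterator()` was public, fails with `IllegalAccessError` once the JVM links it against Guava 20+, where the method is package-private. A minimal sketch of the class of fix (the stable JDK equivalent; whether Hive can adopt it before HIVE-22126 lands is a separate question):

   ```scala
   import java.util.Collections

   // Sketch: com.google.common.collect.Iterators.emptyIterator() is
   // package-private in Guava 20+, so pre-compiled callers hit
   // IllegalAccessError at link time. java.util.Collections.emptyIterator()
   // is a public JDK API with the same behavior and cannot regress this way.
   val it: java.util.Iterator[String] = Collections.emptyIterator[String]()
   println(it.hasNext) // false
   ```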






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-08-02 Thread GitBox


AmplabJenkins removed a comment on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-667799383


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126950/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-08-02 Thread GitBox


SparkQA commented on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-667799372


   **[Test build #126950 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126950/testReport)** for PR 28953 at commit [`7ca42b1`](https://github.com/apache/spark/commit/7ca42b1cbb0917658874a058c999f092a290fcd8).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-08-02 Thread GitBox


AmplabJenkins commented on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-667799382










[GitHub] [spark] SparkQA removed a comment on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-08-02 Thread GitBox


SparkQA removed a comment on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-667798788


   **[Test build #126950 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126950/testReport)** for PR 28953 at commit [`7ca42b1`](https://github.com/apache/spark/commit/7ca42b1cbb0917658874a058c999f092a290fcd8).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-08-02 Thread GitBox


AmplabJenkins removed a comment on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-667799185










[GitHub] [spark] AmplabJenkins commented on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-08-02 Thread GitBox


AmplabJenkins commented on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-667799185










[GitHub] [spark] SparkQA commented on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-08-02 Thread GitBox


SparkQA commented on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-667798788


   **[Test build #126950 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126950/testReport)** for PR 28953 at commit [`7ca42b1`](https://github.com/apache/spark/commit/7ca42b1cbb0917658874a058c999f092a290fcd8).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs

2020-08-02 Thread GitBox


AmplabJenkins removed a comment on pull request #29328:
URL: https://github.com/apache/spark/pull/29328#issuecomment-667792444










[GitHub] [spark] AmplabJenkins commented on pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs

2020-08-02 Thread GitBox


AmplabJenkins commented on pull request #29328:
URL: https://github.com/apache/spark/pull/29328#issuecomment-667792444










[GitHub] [spark] SparkQA commented on pull request #29328: [WIP][SPARK-32516][SQL] 'path' option should be treated consistently when loading dataframes for different APIs

2020-08-02 Thread GitBox


SparkQA commented on pull request #29328:
URL: https://github.com/apache/spark/pull/29328#issuecomment-667792091


   **[Test build #126949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126949/testReport)** for PR 29328 at commit [`296d4bb`](https://github.com/apache/spark/commit/296d4bbab647189fb32f3ffc0051086f244bcfca).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29321: [SPARK-32083][SQL][3.0] AQE coalesce should at least return one partition

2020-08-02 Thread GitBox


AmplabJenkins removed a comment on pull request #29321:
URL: https://github.com/apache/spark/pull/29321#issuecomment-667789608










[GitHub] [spark] AmplabJenkins commented on pull request #29321: [SPARK-32083][SQL][3.0] AQE coalesce should at least return one partition

2020-08-02 Thread GitBox


AmplabJenkins commented on pull request #29321:
URL: https://github.com/apache/spark/pull/29321#issuecomment-667789608










[GitHub] [spark] SparkQA commented on pull request #29321: [SPARK-32083][SQL][3.0] AQE coalesce should at least return one partition

2020-08-02 Thread GitBox


SparkQA commented on pull request #29321:
URL: https://github.com/apache/spark/pull/29321#issuecomment-667789287


   **[Test build #126948 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126948/testReport)** for PR 29321 at commit [`6a06fba`](https://github.com/apache/spark/commit/6a06fba70cce84cf23b6d85951fb99d25c7adcc7).






[GitHub] [spark] leanken edited a comment on pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column

2020-08-02 Thread GitBox


leanken edited a comment on pull request #29304:
URL: https://github.com/apache/spark/pull/29304#issuecomment-667788398


   I just found a negative case for it: it should return (1,2,3) under the expansion solution, but it returns nothing under BNLJ.
   You are right about the correctness; let me rethink and come back to you later.
   
   ```
   spark.sql(
     """
       |CREATE TEMPORARY VIEW m AS SELECT * FROM VALUES
       |  (1, 2, 3)
       |  AS m(a, b, c)
     """.stripMargin).collect()

   spark.sql(
     """
       |CREATE TEMPORARY VIEW s AS SELECT * FROM VALUES
       |  (1, null, 3)
       |  AS s(c, d, e)
     """.stripMargin).collect()

   spark.sql(
     """
       |select * from m where (a,b,c) not in (select * from s)
     """.stripMargin).collect().foreach(println)
   ```
   
   And we should do something on the streamed side too, if we want this hash lookup to apply correctly.
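
   The empty BNLJ result matches SQL three-valued logic: the row comparison (1,2,3) = (1,null,3) evaluates to UNKNOWN, and NOT IN filters out rows whose membership test is UNKNOWN. A minimal sketch of that evaluation (an illustrative model of the semantics, not Spark's implementation):

   ```scala
   // Toy three-valued logic: SQL equality over nullable values.
   sealed trait Tri
   case object True extends Tri
   case object False extends Tri
   case object Unknown extends Tri

   def eq(a: Option[Int], b: Option[Int]): Tri = (a, b) match {
     case (Some(x), Some(y)) => if (x == y) True else False
     case _                  => Unknown // any null operand makes equality UNKNOWN
   }
   def and(l: Tri, r: Tri): Tri = (l, r) match {
     case (False, _) | (_, False) => False
     case (True, True)            => True
     case _                       => Unknown
   }
   def not(t: Tri): Tri = t match {
     case True    => False
     case False   => True
     case Unknown => Unknown
   }

   val m = Seq(Some(1), Some(2), Some(3))   // row (1, 2, 3)
   val s = Seq(Some(1), None, Some(3))      // row (1, null, 3)
   val rowEq = m.zip(s).map { case (a, b) => eq(a, b) }.reduce(and)
   // (1,2,3) NOT IN ((1,null,3)) is UNKNOWN, so the row is filtered out
   println(not(rowEq)) // Unknown
   ```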






[GitHub] [spark] cloud-fan commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-08-02 Thread GitBox


cloud-fan commented on a change in pull request #27066:
URL: https://github.com/apache/spark/pull/27066#discussion_r464181036



##
File path: sql/core/src/main/scala/org/apache/spark/sql/Column.scala
##
@@ -871,6 +871,72 @@ class Column(val expr: Expression) extends Logging {
*/
   def getItem(key: Any): Column = withExpr { UnresolvedExtractValue(expr, 
Literal(key)) }
 
+  // scalastyle:off line.size.limit
+  /**
+   * An expression that adds/replaces field in `StructType` by name.
+   *
+   * {{{
+   *   val df = sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
+   *   df.select($"struct_col".withField("c", lit(3)))

Review comment:
   Yes, please
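
   The add/replace semantics described in the quoted Scaladoc can be modeled without Spark (illustrative only: a plain `Map` stands in for a struct value, and this `withField` helper is a hypothetical stand-in for the proposed `Column.withField`):

   ```scala
   // Toy model of add-or-replace-field semantics on a struct-like value.
   // Adding an absent field extends the struct; an existing field is replaced.
   def withField(struct: Map[String, Int], name: String, value: Int): Map[String, Int] =
     struct + (name -> value)

   val structCol = Map("a" -> 1, "b" -> 2)
   println(withField(structCol, "c", 3)) // adds field c
   println(withField(structCol, "b", 9)) // replaces field b
   ```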








[GitHub] [spark] leanken commented on pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column

2020-08-02 Thread GitBox


leanken commented on pull request #29304:
URL: https://github.com/apache/spark/pull/29304#issuecomment-667788398


   I just found a negative case for it: 
   it should return (1,2,3) in the expansion solution, but it returns nothing in BNLJ. 
   You are right about the correctness; let me rethink and come back to you later.
   
   ```
   spark.sql(
   """
 |CREATE TEMPORARY VIEW m AS SELECT * FROM VALUES
 |  (1, 2, 3)
 |  AS m(a, b, c)
   """.stripMargin).collect()
   
 spark.sql(
   """
 |CREATE TEMPORARY VIEW s AS SELECT * FROM VALUES
 |  (1, null, 3)
 |  AS s(c, d, e)
   """.stripMargin).collect()
   
 spark.sql(
   """
 |select * from m where (a,b,c) not in (select * from s)
   """.stripMargin).collect().foreach(println)
   ```
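
   For reference, the expected behavior can be modeled outside Spark. The following is a minimal Python sketch of standard SQL three-valued logic (not Spark code; names are illustrative), showing why the probe row above is dropped once equality against a build row evaluates to UNKNOWN:

   ```python
   # Sketch (not Spark code): models SQL three-valued logic for a
   # multi-column NOT IN, where None stands in for SQL NULL.
   def col_eq(a, b):
       # Comparing with NULL yields UNKNOWN (modeled as None).
       if a is None or b is None:
           return None
       return a == b

   def row_eq(probe, build):
       # AND over columns: FALSE dominates, otherwise UNKNOWN if any UNKNOWN.
       results = [col_eq(a, b) for a, b in zip(probe, build)]
       if False in results:
           return False
       if None in results:
           return None
       return True

   def not_in(probe, build_rows):
       # NOT IN keeps the row only if equality is FALSE against every build row.
       return all(row_eq(probe, b) is False for b in build_rows)

   # m = (1, 2, 3), s = (1, NULL, 3): equality is UNKNOWN, so the row is dropped.
   print(not_in((1, 2, 3), [(1, None, 3)]))  # False -> query returns nothing
   ```

   Under this model, BNLJ's empty result is the correct one for the query above.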






[GitHub] [spark] leanken removed a comment on pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column

2020-08-02 Thread GitBox


leanken removed a comment on pull request #29304:
URL: https://github.com/apache/spark/pull/29304#issuecomment-667786567


   
![image](https://user-images.githubusercontent.com/17242071/89143652-e8b49b00-d57d-11ea-8fd5-b0f03f812cf3.png)
   
   `build a secondary access structure`
   In my case, I am building all possible secondary access structures beforehand. @agrawaldevesh 






[GitHub] [spark] leanken commented on pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column

2020-08-02 Thread GitBox


leanken commented on pull request #29304:
URL: https://github.com/apache/spark/pull/29304#issuecomment-667786567


   
![image](https://user-images.githubusercontent.com/17242071/89143652-e8b49b00-d57d-11ea-8fd5-b0f03f812cf3.png)
   
   `build a secondary access structure`
   In my case, I am building all possible secondary access structures beforehand. @agrawaldevesh 






[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-08-02 Thread GitBox


fqaiser94 commented on a change in pull request #27066:
URL: https://github.com/apache/spark/pull/27066#discussion_r464177138



##
File path: sql/core/src/main/scala/org/apache/spark/sql/Column.scala
##
@@ -871,6 +871,72 @@ class Column(val expr: Expression) extends Logging {
*/
   def getItem(key: Any): Column = withExpr { UnresolvedExtractValue(expr, 
Literal(key)) }
 
+  // scalastyle:off line.size.limit
+  /**
+   * An expression that adds/replaces field in `StructType` by name.
+   *
+   * {{{
+   *   val df = sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
+   *   df.select($"struct_col".withField("c", lit(3)))

Review comment:
   I failed to write a test case to cover this scenario, my bad. 
   And yea, I just tried this example again, and I can see that it fails. 
   The issue is that I `override foldable` for this `Unevaluable` Expression. 
And so, when `foldable` returns true, Spark tries to evaluate the expression 
and it fails at that point. 
   I kind-of realized this as well recently and in my PR for `dropFields` 
[here](https://github.com/apache/spark/pull/29322/files#diff-c1758d627a06084e577be0d33d47f44eL566),
 I've fixed the issue (basically I just don't `override foldable` anymore, 
which by default returns `false`). 
   I guess I should submit a follow-up PR to fix this immediately with 
associated unit tests? 
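
   The failure mode described above can be modeled in a few lines. The class and function names below are hypothetical stand-ins, not Spark's actual `Expression` hierarchy:

   ```python
   # Minimal model of the bug: a constant-folding pass evaluates any
   # expression whose `foldable` flag is true, which crashes for an
   # "unevaluable" expression that exists only to be rewritten earlier.
   class Unevaluable:
       def __init__(self, foldable):
           self.foldable = foldable

       def eval(self):
           raise RuntimeError("Cannot evaluate expression")

   def constant_fold(expr):
       # Mirrors the optimizer's contract: foldable => safe to eval eagerly.
       return expr.eval() if expr.foldable else expr

   bad = Unevaluable(foldable=True)
   try:
       constant_fold(bad)
   except RuntimeError as e:
       print("folding failed:", e)

   good = Unevaluable(foldable=False)  # the fix: keep the default, false
   assert constant_fold(good) is good  # left alone for the analyzer to rewrite
   ```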








[GitHub] [spark] cloud-fan commented on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-08-02 Thread GitBox


cloud-fan commented on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-667784890


   @skambha the `sum` shouldn't fail without ANSI mode, this PR fixes it.
   
   It's indeed a bug that we can write an overflowed decimal to UnsafeRow but 
can't read it. The `sum` is also buggy but we can't backport the fix due to 
streaming compatibility reasons.






[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-08-02 Thread GitBox


fqaiser94 commented on a change in pull request #27066:
URL: https://github.com/apache/spark/pull/27066#discussion_r464177138



##
File path: sql/core/src/main/scala/org/apache/spark/sql/Column.scala
##
@@ -871,6 +871,72 @@ class Column(val expr: Expression) extends Logging {
*/
   def getItem(key: Any): Column = withExpr { UnresolvedExtractValue(expr, 
Literal(key)) }
 
+  // scalastyle:off line.size.limit
+  /**
+   * An expression that adds/replaces field in `StructType` by name.
+   *
+   * {{{
+   *   val df = sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
+   *   df.select($"struct_col".withField("c", lit(3)))

Review comment:
   I failed to write a test case to cover this scenario, my bad. 
   And yea, I just tried this example again, and I can see that it fails. 
   The issue is that I `override foldable` for this `Unevaluable` Expression. 
And so, when `foldable` returns true, Spark tries to evaluate the expression 
and it fails at that point. 
   I kind-of realized this as well recently and in my PR for `dropFields` 
[here](https://github.com/apache/spark/pull/29322/files#diff-c1758d627a06084e577be0d33d47f44eL566),
 I've fixed the issue (basically I just don't `override foldable` anymore, 
which by default returns `false`). 
   I guess I should submit a follow-up PR to fix this immediately with 
associated unit tests? 








[GitHub] [spark] leanken removed a comment on pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column

2020-08-02 Thread GitBox


leanken removed a comment on pull request #29304:
URL: https://github.com/apache/spark/pull/29304#issuecomment-667783096


   
![image](https://user-images.githubusercontent.com/17242071/89143099-fb2dd500-d57b-11ea-881e-9d248403db9d.png)
   
   this is quite the same as expansion: first it goes through all the data on the 
buildSide to gather information about which columns might contain null values. 
Let's say there are c1, c2, c3 on the buildSide; after the scan, we find that 
only c1 and c2 contain null values.
   
   then the left record (1, 2, 3) will try to find the matches (1, null, 3), (null, 2, 3), 
(null, null, 3)
   
   And what I am trying to do is not scan the buildSide to gather null 
information at all; I just assume that every column might contain a null value, 
and probe with all combinations of null padding.
   






[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-08-02 Thread GitBox


fqaiser94 commented on a change in pull request #27066:
URL: https://github.com/apache/spark/pull/27066#discussion_r464177138



##
File path: sql/core/src/main/scala/org/apache/spark/sql/Column.scala
##
@@ -871,6 +871,72 @@ class Column(val expr: Expression) extends Logging {
*/
   def getItem(key: Any): Column = withExpr { UnresolvedExtractValue(expr, 
Literal(key)) }
 
+  // scalastyle:off line.size.limit
+  /**
+   * An expression that adds/replaces field in `StructType` by name.
+   *
+   * {{{
+   *   val df = sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
+   *   df.select($"struct_col".withField("c", lit(3)))

Review comment:
   I failed to write a test case to cover this scenario, my bad. 
   And yea, I just tried this example again, and I can see that it fails. 
   The issue is that I `override foldable` for this `Unevaluable` Expression. 
And so, when `foldable` returns true, Spark tries to evaluate the expression 
and it fails at that point. 
   I kind-of realized this as well recently and in my PR for `dropFields` 
[here](https://github.com/apache/spark/pull/29322/files#diff-c1758d627a06084e577be0d33d47f44eL566),
 I've fixed the issue (basically I just don't `override foldable` anymore). 
   I guess I should submit a follow-up PR to fix this immediately with 
associated unit tests? 








[GitHub] [spark] leanken edited a comment on pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column

2020-08-02 Thread GitBox


leanken edited a comment on pull request #29304:
URL: https://github.com/apache/spark/pull/29304#issuecomment-667783096


   
![image](https://user-images.githubusercontent.com/17242071/89143099-fb2dd500-d57b-11ea-881e-9d248403db9d.png)
   
   this is quite the same as expansion: first it goes through all the data on the 
buildSide to gather information about which columns might contain null values. 
Let's say there are c1, c2, c3 on the buildSide; after the scan, we find that 
only c1 and c2 contain null values.
   
   then the left record (1, 2, 3) will try to find the matches (1, null, 3), (null, 2, 3), 
(null, null, 3)
   
   And what I am trying to do is not scan the buildSide to gather null 
information at all; I just assume that every column might contain a null value, 
and probe with all combinations of null padding.
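
   The combinations above can be generated mechanically. The helper below is a hypothetical sketch of that lookup-key generation (not the PR's code):

   ```python
   # Sketch: for a streamed row, generate every lookup key where some subset
   # of the null-bearing build-side columns is replaced by NULL (None).
   from itertools import chain, combinations

   def null_padded_keys(row, nullable_cols):
       subsets = chain.from_iterable(
           combinations(nullable_cols, r) for r in range(len(nullable_cols) + 1))
       return [tuple(None if i in s else v for i, v in enumerate(row))
               for s in subsets]

   # Build side has nulls only in c1 and c2 (indices 0 and 1):
   print(null_padded_keys((1, 2, 3), [0, 1]))
   # -> [(1, 2, 3), (None, 2, 3), (1, None, 3), (None, None, 3)]
   ```

   Assuming every column is nullable, as the comment proposes, amounts to passing all column indices as `nullable_cols`.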
   






[GitHub] [spark] cloud-fan closed pull request #29318: [SPARK-32509][SQL] Ignore unused DPP True Filter in Canonicalization

2020-08-02 Thread GitBox


cloud-fan closed pull request #29318:
URL: https://github.com/apache/spark/pull/29318


   






[GitHub] [spark] cloud-fan commented on pull request #29318: [SPARK-32509][SQL] Ignore unused DPP True Filter in Canonicalization

2020-08-02 Thread GitBox


cloud-fan commented on pull request #29318:
URL: https://github.com/apache/spark/pull/29318#issuecomment-667783276


   thanks, merging to master/3.0!






[GitHub] [spark] leanken commented on pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column

2020-08-02 Thread GitBox


leanken commented on pull request #29304:
URL: https://github.com/apache/spark/pull/29304#issuecomment-667783096


   
![image](https://user-images.githubusercontent.com/17242071/89143099-fb2dd500-d57b-11ea-881e-9d248403db9d.png)
   
   
   






[GitHub] [spark] leanken edited a comment on pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column

2020-08-02 Thread GitBox


leanken edited a comment on pull request #29304:
URL: https://github.com/apache/spark/pull/29304#issuecomment-667781762


   > > Step 2: Say there is a right (build) side row (1, null, 3). It should be 
counted as a match against a row on the left side (1, 2, 3). What makes this 
tricky is that say say you have a build row (1, 5, 3), then (1, 5, 3) should 
NOT match the probe row (1, 2, 3). But if you explode (1, 5, 3) into a (1, 
null, 3) then it might incorrectly match (1, 2, 3). How do you handle both of 
these subcases ?
   > > Step 3: Consider a build row (1, 5, null), it should match the left row 
(1, null, 3). In addition, it should not match the build row (1, 5, 7). How do 
you handle these subcases ?
   > 
   > Above, when I mean "match" -- I mean that the left side would match the 
build row and WON'T be returned. Whereas with non match I mean that the left 
side would not match the build side and thus WILL be returned. We have 
different meanings for the words 'match' and 'not-match'. So please read my 
'match' == 'NAAJ should not return the left row', and conversely for non-match.
   > 
   > I would really really really encourage you to:
   > 
   > * Please reread the paper section 6.2 in its entirety many times and 
understand the above cases. I had to read it many times myself. It is very 
tricky as you pointed out.
   > * Add them as test cases comparing them with the original BNLJ 
implementation, both the negative and positive cases.
   > 
   > This is really tricky and I don't think the current implementation you 
have of expanding the hash table with a simple lookup on the stream side would 
suffice. I will also try to play around with your PR locally and run them as 
tests to convince myself. I hope I am wrong ;-).
   
   Yes, I do understand section 6.2 of the paper. Basically, the paper describes the 
algorithm from the perspective of the StreamedSide, while the expansion is stated 
from the perspective of the BuildSide. Let's just do reverse inference on the 
following case.
   
   If the buildSide contains a row (1,2,3), which StreamedSide rows will evaluate 
to TRUE or UNKNOWN and be dropped?
   It should be 
   (null, 2, 3) (1, null, 3) (1, 2, null) (null, null, 3) (null, 2, null) (1, 
null, null) and of course (1,2,3),
   right?
   
   Only in the above combinations will a streamedSide row be dropped, aside from the 
all-null case, right?
   Once you find an exactly matching record in the HashedRelation, null columns 
included, you drop the row.
   
   ```
   if (lookupKey.allNull()) {
     false
   } else {
     // Anti Join: drop the row on the streamed side if it is a match on the build side
     hashed.get(lookupKey) == null
   }
   ```
   
   I suppose this solution works, since it passes all the NOT IN cases 
in SQLQueryTestSuite.  
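
   The gate quoted above can be paraphrased as follows, with a plain set standing in for the `HashedRelation` (a sketch with illustrative names, not the PR's code):

   ```python
   # Sketch: the streamed-side keep/drop decision for the null-aware anti join.
   def keep_streamed_row(lookup_key, hashed_relation):
       if all(v is None for v in lookup_key):
           # An all-null streamed key can never make NOT IN evaluate to TRUE:
           # it is always dropped.
           return False
       # Anti join: drop the streamed row when the (possibly null-padded) key
       # has an exact match on the build side; keep it otherwise.
       return lookup_key not in hashed_relation

   hashed = {(1, None, 3), (4, 5, 6)}
   print(keep_streamed_row((1, None, 3), hashed))        # False: exact match, dropped
   print(keep_streamed_row((7, 8, 9), hashed))           # True: no match, row survives
   print(keep_streamed_row((None, None, None), hashed))  # False: all-null key
   ```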
   
   






[GitHub] [spark] cloud-fan commented on pull request #29317: [SPARK-32510][SQL] Check duplicate nested columns in read from JDBC datasource

2020-08-02 Thread GitBox


cloud-fan commented on pull request #29317:
URL: https://github.com/apache/spark/pull/29317#issuecomment-667781962


   thanks, merging to master!






[GitHub] [spark] cloud-fan closed pull request #29317: [SPARK-32510][SQL] Check duplicate nested columns in read from JDBC datasource

2020-08-02 Thread GitBox


cloud-fan closed pull request #29317:
URL: https://github.com/apache/spark/pull/29317


   






[GitHub] [spark] leanken commented on pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column

2020-08-02 Thread GitBox


leanken commented on pull request #29304:
URL: https://github.com/apache/spark/pull/29304#issuecomment-667781762


   > > Step 2: Say there is a right (build) side row (1, null, 3). It should be 
counted as a match against a row on the left side (1, 2, 3). What makes this 
tricky is that say say you have a build row (1, 5, 3), then (1, 5, 3) should 
NOT match the probe row (1, 2, 3). But if you explode (1, 5, 3) into a (1, 
null, 3) then it might incorrectly match (1, 2, 3). How do you handle both of 
these subcases ?
   > > Step 3: Consider a build row (1, 5, null), it should match the left row 
(1, null, 3). In addition, it should not match the build row (1, 5, 7). How do 
you handle these subcases ?
   > 
   > Above, when I mean "match" -- I mean that the left side would match the 
build row and WON'T be returned. Whereas with non match I mean that the left 
side would not match the build side and thus WILL be returned. We have 
different meanings for the words 'match' and 'not-match'. So please read my 
'match' == 'NAAJ should not return the left row', and conversely for non-match.
   > 
   > I would really really really encourage you to:
   > 
   > * Please reread the paper section 6.2 in its entirety many times and 
understand the above cases. I had to read it many times myself. It is very 
tricky as you pointed out.
   > * Add them as test cases comparing them with the original BNLJ 
implementation, both the negative and positive cases.
   > 
   > This is really tricky and I don't think the current implementation you 
have of expanding the hash table with a simple lookup on the stream side would 
suffice. I will also try to play around with your PR locally and run them as 
tests to convince myself. I hope I am wrong ;-).
   
   Yes, I do understand section 6.2 of the paper. Basically, the paper describes the 
algorithm from the perspective of the StreamedSide, while the expansion is stated 
from the perspective of the BuildSide. Let's just do reverse inference on the 
following case.
   
   If the buildSide contains a row (1,2,3), which StreamedSide rows will evaluate 
to TRUE or UNKNOWN and be dropped?
   It should be 
   (null, 2, 3) (1, null, 3) (1, 2, null) (null, null, 3) (null, 2, null) (1, 
null, null) and of course (1,2,3),
   right?
   
   Only in the above combinations will a streamedSide row be dropped, aside from the 
all-null case, right?
   
   I suppose this solution works, since it passes all the NOT IN cases 
in SQLQueryTestSuite.  
   
   






[GitHub] [spark] cloud-fan closed pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-08-02 Thread GitBox


cloud-fan closed pull request #29067:
URL: https://github.com/apache/spark/pull/29067


   






[GitHub] [spark] cloud-fan commented on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-08-02 Thread GitBox


cloud-fan commented on pull request #29067:
URL: https://github.com/apache/spark/pull/29067#issuecomment-667780468


   GitHub Actions passes, I'm merging to master, thanks!






[GitHub] [spark] cloud-fan commented on a change in pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-08-02 Thread GitBox


cloud-fan commented on a change in pull request #29146:
URL: https://github.com/apache/spark/pull/29146#discussion_r464172835



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
##
@@ -61,6 +64,80 @@ class SparkSqlParserSuite extends AnalysisTest {
   private def intercept(sqlCommand: String, messages: String*): Unit =
 interceptParseException(parser.parsePlan)(sqlCommand, messages: _*)
 
+  test("Checks if SET/RESET can parse all the configurations") {
+// Force to build static SQL configurations
+StaticSQLConf
+(SQLConf.sqlConfEntries.values.asScala ++ 
ConfigEntry.knownConfigs.values.asScala)

Review comment:
   `SQLConf` also uses `ConfigEntry`, I think `ConfigEntry.knownConfigs` 
already covers all the registered configs.








[GitHub] [spark] cloud-fan commented on a change in pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-08-02 Thread GitBox


cloud-fan commented on a change in pull request #29146:
URL: https://github.com/apache/spark/pull/29146#discussion_r464172446



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
##
@@ -66,17 +68,29 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder(conf) {
* character in the raw string.
*/
   override def visitSetConfiguration(ctx: SetConfigurationContext): 
LogicalPlan = withOrigin(ctx) {
-// Construct the command.
-val raw = remainder(ctx.SET.getSymbol)
-val keyValueSeparatorIndex = raw.indexOf('=')
-if (keyValueSeparatorIndex >= 0) {
-  val key = raw.substring(0, keyValueSeparatorIndex).trim
-  val value = raw.substring(keyValueSeparatorIndex + 1).trim
-  SetCommand(Some(key -> Option(value)))
-} else if (raw.nonEmpty) {
-  SetCommand(Some(raw.trim -> None))
+val configKeyValueDef = """([a-zA-Z_\d\\.:]+)\s*=(.*)""".r
+remainder(ctx.SET.getSymbol).trim match {
+  case configKeyValueDef(key, value) =>
+SetCommand(Some(key -> Option(value.trim)))
+  case configKeyDef(key) =>
+SetCommand(Some(key -> None))
+  case s if s == "-v" =>
+SetCommand(Some("-v" -> None))
+  case s if s.isEmpty =>
+SetCommand(None)
+  case _ => throw new ParseException("Expected format is 'SET', 'SET key', 
or " +
+"'SET key=value'. If you want to include special characters in key, " +
+"please use quotes, e.g., SET `ke y`=value.", ctx)
+}
+  }
+
+  override def visitSetQuotedConfiguration(ctx: SetQuotedConfigurationContext)
+: LogicalPlan = withOrigin(ctx) {
+val keyStr = ctx.quotedConfigKey().getText
+if (ctx.value != null) {
+  SetCommand(Some(keyStr -> Option(remainder(ctx.EQ().getSymbol).trim)))

Review comment:
   `(EQ value=.*)` we have an alias, can we use it here?








[GitHub] [spark] cloud-fan commented on a change in pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-08-02 Thread GitBox


cloud-fan commented on a change in pull request #29146:
URL: https://github.com/apache/spark/pull/29146#discussion_r464172133



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
##
@@ -66,17 +68,29 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder(conf) {
* character in the raw string.
*/
   override def visitSetConfiguration(ctx: SetConfigurationContext): 
LogicalPlan = withOrigin(ctx) {
-// Construct the command.
-val raw = remainder(ctx.SET.getSymbol)
-val keyValueSeparatorIndex = raw.indexOf('=')
-if (keyValueSeparatorIndex >= 0) {
-  val key = raw.substring(0, keyValueSeparatorIndex).trim
-  val value = raw.substring(keyValueSeparatorIndex + 1).trim
-  SetCommand(Some(key -> Option(value)))
-} else if (raw.nonEmpty) {
-  SetCommand(Some(raw.trim -> None))
+val configKeyValueDef = """([a-zA-Z_\d\\.:]+)\s*=(.*)""".r
+remainder(ctx.SET.getSymbol).trim match {
+  case configKeyValueDef(key, value) =>
+SetCommand(Some(key -> Option(value.trim)))
+  case configKeyDef(key) =>

Review comment:
   ah nvm, we will also match `configKeyValueDef` first.








[GitHub] [spark] cloud-fan commented on a change in pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-08-02 Thread GitBox


cloud-fan commented on a change in pull request #29146:
URL: https://github.com/apache/spark/pull/29146#discussion_r464172042



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
##
@@ -66,17 +68,29 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder(conf) {
* character in the raw string.
*/
   override def visitSetConfiguration(ctx: SetConfigurationContext): 
LogicalPlan = withOrigin(ctx) {
-// Construct the command.
-val raw = remainder(ctx.SET.getSymbol)
-val keyValueSeparatorIndex = raw.indexOf('=')
-if (keyValueSeparatorIndex >= 0) {
-  val key = raw.substring(0, keyValueSeparatorIndex).trim
-  val value = raw.substring(keyValueSeparatorIndex + 1).trim
-  SetCommand(Some(key -> Option(value)))
-} else if (raw.nonEmpty) {
-  SetCommand(Some(raw.trim -> None))
+val configKeyValueDef = """([a-zA-Z_\d\\.:]+)\s*=(.*)""".r
+remainder(ctx.SET.getSymbol).trim match {
+  case configKeyValueDef(key, value) =>
+SetCommand(Some(key -> Option(value.trim)))
+  case configKeyDef(key) =>

Review comment:
   Will it match something like `a ###`?  Shall we use 
`([a-zA-Z_\d\\.:]+)$`?
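
   For what it's worth, a quick check (in Python's `re`, using `fullmatch` to mirror the whole-string matching that Scala's anchored `case regex(...)` pattern performs) suggests the unanchored form already rejects `a ###`:

   ```python
   # Check whether the config-key character class can match "a ###" when the
   # whole string must match, as in a Scala `case configKeyDef(key)` pattern.
   import re

   config_key = re.compile(r"[a-zA-Z_\d\\.:]+")

   def matches_key(s):
       return config_key.fullmatch(s) is not None

   print(matches_key("spark.sql.ansi.enabled"))  # True
   print(matches_key("a ###"))                   # False: space and '#' not in the class
   ```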








[GitHub] [spark] cloud-fan commented on a change in pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-08-02 Thread GitBox


cloud-fan commented on a change in pull request #29146:
URL: https://github.com/apache/spark/pull/29146#discussion_r464171781



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
##
@@ -66,17 +68,29 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder(conf) {
* character in the raw string.
*/
   override def visitSetConfiguration(ctx: SetConfigurationContext): 
LogicalPlan = withOrigin(ctx) {
-// Construct the command.
-val raw = remainder(ctx.SET.getSymbol)
-val keyValueSeparatorIndex = raw.indexOf('=')
-if (keyValueSeparatorIndex >= 0) {
-  val key = raw.substring(0, keyValueSeparatorIndex).trim
-  val value = raw.substring(keyValueSeparatorIndex + 1).trim
-  SetCommand(Some(key -> Option(value)))
-} else if (raw.nonEmpty) {
-  SetCommand(Some(raw.trim -> None))
+val configKeyValueDef = """([a-zA-Z_\d\\.:]+)\s*=(.*)""".r

Review comment:
   Can we put it in the class body so we don't need to compile the regex 
repeatedly?
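
   The suggestion amounts to hoisting the compiled pattern out of the hot method. A small illustration, with hypothetical names and shown in Python for brevity:

   ```python
   import re

   # Compiled once at module/class scope instead of on every parse call.
   CONFIG_KEY_VALUE = re.compile(r"([a-zA-Z_\d\\.:]+)\s*=(.*)")

   def parse_set(raw):
       # fullmatch mirrors Scala's anchored `case regex(key, value)` pattern.
       m = CONFIG_KEY_VALUE.fullmatch(raw.strip())
       if m is None:
           return None
       key, value = m.groups()
       return key, value.strip()

   print(parse_set("spark.sql.shuffle.partitions = 200"))
   # -> ('spark.sql.shuffle.partitions', '200')
   ```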








[GitHub] [spark] cloud-fan commented on a change in pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-08-02 Thread GitBox


cloud-fan commented on a change in pull request #29146:
URL: https://github.com/apache/spark/pull/29146#discussion_r464171523



##
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##
@@ -246,11 +246,17 @@ statement
     | SET TIME ZONE interval                                           #setTimeZone
     | SET TIME ZONE timezone=(STRING | LOCAL)                          #setTimeZone
     | SET TIME ZONE .*?                                                #setTimeZone
+    | SET quotedConfigKey (EQ value=.*)?                               #setQuotedConfiguration
     | SET .*?                                                          #setConfiguration
+    | RESET quotedConfigKey                                            #resetQuotedConfiguration
     | RESET .*?                                                        #resetConfiguration
     | unsupportedHiveNativeCommands .*?                                #failNativeCommand
     ;

+quotedConfigKey

Review comment:
   hmm, is it necessary to create an alias? How about `SET key=quotedIdentifier (EQ value=.*)?`








[GitHub] [spark] cloud-fan commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-08-02 Thread GitBox


cloud-fan commented on a change in pull request #27066:
URL: https://github.com/apache/spark/pull/27066#discussion_r464170447



##
File path: sql/core/src/main/scala/org/apache/spark/sql/Column.scala
##
@@ -871,6 +871,72 @@ class Column(val expr: Expression) extends Logging {
    */
   def getItem(key: Any): Column = withExpr { UnresolvedExtractValue(expr, Literal(key)) }
 
+  // scalastyle:off line.size.limit
+  /**
+   * An expression that adds/replaces a field in a `StructType` by name.
+   *
+   * {{{
+   *   val df = sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
+   *   df.select($"struct_col".withField("c", lit(3)))

Review comment:
   weird, we have tests to cover these examples. @fqaiser94 can you take a 
look?
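For reference, the documented behavior ("adds/replaces field in `StructType` by name") can be sketched on a plain dict standing in for a struct value. This is an illustration of the semantics only, not Spark's `Column.withField` implementation:

```python
def with_field(struct: dict, name: str, value) -> dict:
    """Sketch of withField semantics: add the field if absent,
    replace it if present, without mutating the original struct."""
    out = dict(struct)
    out[name] = value
    return out
```

As in the Scaladoc example, adding `c` to `{'a': 1, 'b': 2}` yields a three-field struct, while replacing an existing field keeps the field count unchanged.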








[GitHub] [spark] agrawaldevesh commented on pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column

2020-08-02 Thread GitBox


agrawaldevesh commented on pull request #29304:
URL: https://github.com/apache/spark/pull/29304#issuecomment-667773518


   > Step 2: Say there is a right (build) side row (1, null, 3). It should be counted as a match against a row on the left side (1, 2, 3). What makes this tricky is that if you have a build row (1, 5, 3), then (1, 5, 3) should NOT match the probe row (1, 2, 3). But if you explode (1, 5, 3) into a (1, null, 3) then it might incorrectly match (1, 2, 3). How do you handle both of these subcases?
   Step 3: Consider a build row (1, 5, null), it should match the left row (1, 
null, 3). In addition, it should not match the build row (1, 5, 7). How do you 
handle these subcases ?
   
   Above, by "match" I mean that the left side matches the build row and WON'T be returned, and by "non-match" I mean that the left side does not match the build side and thus WILL be returned. We are using the words 'match' and 'not-match' differently, so please read my 'match' as 'NAAJ should not return the left row', and conversely for non-match.
   
   I would really really really encourage you to:
   - Please reread the paper section 6.2 in its entirety many times and 
understand the above cases. I had to read it many times myself. It is very 
tricky as you pointed out.
   - Add them as test cases comparing them with the original BNLJ 
implementation, both the negative and positive cases. 
   
   This is really tricky and I don't think the current implementation you have 
of expanding the hash table with a simple lookup on the stream side would 
suffice. I will also try to play around with your PR locally and run them as 
tests to convince myself. I hope I am wrong ;-).
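The null-handling being debated can be sketched under standard SQL `NOT IN` semantics, where NULL compares as "unknown" and an unknown comparison cannot prove a mismatch, so it counts toward rejection. This is an illustrative Python sketch of those semantics, not the PR's hash-table implementation:

```python
def naaj_rejects(probe, build_rows):
    """Null-aware anti-join rejection test for one probe (left) row.

    A build row 'matches' the probe row when, column by column, the
    values are equal or either side is NULL (None) -- NULL is unknown,
    so it cannot disprove equality. If any build row matches, the probe
    row is rejected, i.e. NOT returned by the anti join.
    """
    def matches(p, b):
        return all(pv is None or bv is None or pv == bv
                   for pv, bv in zip(p, b))
    return any(matches(probe, b) for b in build_rows)
```

Under these semantics, build row (1, null, 3) rejects probe (1, 2, 3), while build row (1, 5, 3) does not, which is exactly why exploding (1, 5, 3) into (1, null, 3) would over-match.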






[GitHub] [spark] yaooqinn commented on pull request #28527: [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path

2020-08-02 Thread GitBox


yaooqinn commented on pull request #28527:
URL: https://github.com/apache/spark/pull/28527#issuecomment-667772325


   gentle ping @cloud-fan 






[GitHub] [spark] maropu commented on pull request #29192: [SPARK-32393][SQL] Support PostgreSQL `bpchar` array

2020-08-02 Thread GitBox


maropu commented on pull request #29192:
URL: https://github.com/apache/spark/pull/29192#issuecomment-667772167


   kindly ping.






[GitHub] [spark] wangshisan commented on pull request #29266: [SPARK-32464][SQL] Support skew handling on join that has one side wi…

2020-08-02 Thread GitBox


wangshisan commented on pull request #29266:
URL: https://github.com/apache/spark/pull/29266#issuecomment-667772074


   @cloud-fan @JkSelf  Could you have a look?






[GitHub] [spark] HyukjinKwon commented on pull request #29320: [WIP][SPARK-32507][DOCS][PYTHON] Add main page for PySpark documentation

2020-08-02 Thread GitBox


HyukjinKwon commented on pull request #29320:
URL: https://github.com/apache/spark/pull/29320#issuecomment-667772011


   > What's docs/img/pyspark-components.pptx for?
   
   It is for the image I used in the main page in case some people want to 
edit. There are other pptx files in `docs/img` as well for that purpose.






[GitHub] [spark] yaooqinn commented on a change in pull request #29303: [SPARK-32492][SQL] Fulfill missing column meta information COLUMN_SIZE /DECIMAL_DIGITS/NUM_PREC_RADIX/ORDINAL_POSITION for thrif

2020-08-02 Thread GitBox


yaooqinn commented on a change in pull request #29303:
URL: https://github.com/apache/spark/pull/29303#discussion_r464165400



##
File path: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala
##
@@ -126,12 +124,52 @@ private[hive] class SparkGetColumnsOperation(
 HiveThriftServer2.eventManager.onStatementFinish(statementId)
   }
 
+  /**
+   * For numeric and datetime types, it returns the default size of its catalyst type.
+   * For struct type, when its elements are fixed-size, the summation of all element sizes will be
+   * returned.
+   * For array, map, string, and binary types, the column size is variable, so null is returned as
+   * unknown.
+   */
+  private def getColumnSize(typ: DataType): Option[Int] = typ match {

Review comment:
   Hive does not return the same result for each type
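The size rules described in the doc comment can be sketched as follows. The type names and byte sizes below are illustrative stand-ins, not the thrift server's actual code or catalyst's `defaultSize` values:

```python
# Hypothetical fixed default sizes for numeric/datetime types.
FIXED_SIZES = {"byte": 1, "short": 2, "int": 4, "long": 8,
               "float": 4, "double": 8, "date": 4, "timestamp": 8}

def column_size(typ):
    """Sketch of the getColumnSize rules: a fixed size for numeric and
    datetime types, the sum of element sizes for a fixed-size struct,
    and None (unknown) for variable-size types such as string, binary,
    array, and map. `typ` is a type-name string or ('struct', [types])."""
    if isinstance(typ, tuple) and typ[0] == "struct":
        sizes = [column_size(t) for t in typ[1]]
        return sum(sizes) if all(s is not None for s in sizes) else None
    return FIXED_SIZES.get(typ)  # unknown/variable types fall through to None
```

A struct containing any variable-size element is itself variable-size, which is why the sum is only taken when every element size is known.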







