[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-06-27 Thread GitBox


moomindani commented on a change in pull request #27690:
URL: https://github.com/apache/spark/pull/27690#discussion_r446516892



##
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
##
@@ -97,12 +99,38 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
   options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  // Mostly copied from Context.java#getMRTmpPath of Hive 2.3.
+  // Visible for testing.
+  private[execution] def getNonBlobTmpPath(
+      hadoopConf: Configuration,
+      sessionScratchDir: String,
+      scratchDir: String): Path = {
+
+    // Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-1',
+    // which is ruled by 'hive.exec.scratchdir' including the file system.
+    // This is the same as Spark's #oldVersionExternalTempPath.
+    // The only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090.
+    // HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'.
+    // Here it uses sessionScratchDir unless it's empty, otherwise scratchDir.
+    val sessionPath = if (!sessionScratchDir.isEmpty) sessionScratchDir else scratchDir
+    val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), hadoopConf, sessionPath)

Review comment:
   Hive has two kinds of scratch dir accordingly: one local, the other on HDFS.
   https://mingyue.me/2018/11/17/hive-scratch-working-directory/
   
   In this pull request, the latter one, `hive.exec.scratchdir`, is used. In my understanding, we can assume the HDFS scheme in most cases.
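   The fallback the quoted diff implements (prefer the per-session scratch dir, else the global one) can be sketched in isolation; `pick_scratch_dir` is an illustrative name, not part of Spark's or Hive's API:

```python
def pick_scratch_dir(session_scratch_dir: str, scratch_dir: str) -> str:
    """Prefer the per-session scratch dir (the HIVE-7090 addition);
    fall back to the global 'hive.exec.scratchdir' when it is empty."""
    return session_scratch_dir if session_scratch_dir else scratch_dir
```

   This mirrors the diff's `if (!sessionScratchDir.isEmpty) sessionScratchDir else scratchDir` expression.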





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28905: [SPARK-32071][SQL][TESTS] Add `make_interval` benchmark

2020-06-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28905:
URL: https://github.com/apache/spark/pull/28905#issuecomment-650701364










[GitHub] [spark] AmplabJenkins commented on pull request #28905: [SPARK-32071][SQL][TESTS] Add `make_interval` benchmark

2020-06-27 Thread GitBox


AmplabJenkins commented on pull request #28905:
URL: https://github.com/apache/spark/pull/28905#issuecomment-650701364










[GitHub] [spark] SparkQA removed a comment on pull request #28905: [SPARK-32071][SQL][TESTS] Add `make_interval` benchmark

2020-06-27 Thread GitBox


SparkQA removed a comment on pull request #28905:
URL: https://github.com/apache/spark/pull/28905#issuecomment-650658676


   **[Test build #124579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124579/testReport)** for PR 28905 at commit [`3c5b604`](https://github.com/apache/spark/commit/3c5b6041477194a855667059629d0fe4b0258b23).






[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-06-27 Thread GitBox


moomindani commented on a change in pull request #27690:
URL: https://github.com/apache/spark/pull/27690#discussion_r446604034



##
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
##
@@ -97,12 +99,38 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
   options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  // Mostly copied from Context.java#getMRTmpPath of Hive 2.3.
+  // Visible for testing.
+  private[execution] def getNonBlobTmpPath(
+      hadoopConf: Configuration,
+      sessionScratchDir: String,
+      scratchDir: String): Path = {
+
+    // Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-1',
+    // which is ruled by 'hive.exec.scratchdir' including the file system.
+    // This is the same as Spark's #oldVersionExternalTempPath.
+    // The only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090.
+    // HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'.
+    // Here it uses sessionScratchDir unless it's empty, otherwise scratchDir.
+    val sessionPath = if (!sessionScratchDir.isEmpty) sessionScratchDir else scratchDir
+    val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), hadoopConf, sessionPath)

Review comment:
   Currently we just rely on `hive.exec.scratchdir` (not directly on `fs.default.name`), and it works in most use cases even if `hive.exec.scratchdir` is not configured explicitly.
   I do not want to restrict this feature to HDFS only, because I have seen some clusters which do not have HDFS. I want to let end users choose whatever scheme they want for storing temporary data.








[GitHub] [spark] SparkQA commented on pull request #28905: [SPARK-32071][SQL][TESTS] Add `make_interval` benchmark

2020-06-27 Thread GitBox


SparkQA commented on pull request #28905:
URL: https://github.com/apache/spark/pull/28905#issuecomment-650701112


   **[Test build #124579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124579/testReport)** for PR 28905 at commit [`3c5b604`](https://github.com/apache/spark/commit/3c5b6041477194a855667059629d0fe4b0258b23).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-06-27 Thread GitBox


moomindani commented on a change in pull request #27690:
URL: https://github.com/apache/spark/pull/27690#discussion_r446604034



##
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
##
@@ -97,12 +99,38 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
   options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  // Mostly copied from Context.java#getMRTmpPath of Hive 2.3.
+  // Visible for testing.
+  private[execution] def getNonBlobTmpPath(
+      hadoopConf: Configuration,
+      sessionScratchDir: String,
+      scratchDir: String): Path = {
+
+    // Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-1',
+    // which is ruled by 'hive.exec.scratchdir' including the file system.
+    // This is the same as Spark's #oldVersionExternalTempPath.
+    // The only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090.
+    // HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'.
+    // Here it uses sessionScratchDir unless it's empty, otherwise scratchDir.
+    val sessionPath = if (!sessionScratchDir.isEmpty) sessionScratchDir else scratchDir
+    val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), hadoopConf, sessionPath)

Review comment:
   Currently we just rely on `hive.exec.scratchdir`, and it works in most use cases even if `hive.exec.scratchdir` is not configured explicitly.
   I do not want to restrict this feature to HDFS only, because I have seen some clusters which do not have HDFS. I want to let end users choose whatever scheme they want for storing temporary data.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28901: [WIP][SPARK-32064][SQL] Supporting create temporary table

2020-06-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-650700128










[GitHub] [spark] AmplabJenkins commented on pull request #28901: [WIP][SPARK-32064][SQL] Supporting create temporary table

2020-06-27 Thread GitBox


AmplabJenkins commented on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-650700128










[GitHub] [spark] SparkQA commented on pull request #28901: [WIP][SPARK-32064][SQL] Supporting create temporary table

2020-06-27 Thread GitBox


SparkQA commented on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-650700037


   **[Test build #124594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124594/testReport)** for PR 28901 at commit [`6d3274f`](https://github.com/apache/spark/commit/6d3274f08c1c81262c8b0c21aa133a04e31c6796).






[GitHub] [spark] LantaoJin commented on pull request #28901: [WIP][SPARK-32064][SQL] Supporting create temporary table

2020-06-27 Thread GitBox


LantaoJin commented on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-650698393


   @gatorsmile Yes. Just like Hive temporary tables or Teradata volatile tables. We are migrating our Spark to v3.0. This is one of the internal features which has been widely used in our production.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand

2020-06-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28647:
URL: https://github.com/apache/spark/pull/28647#issuecomment-650697849


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124581/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand

2020-06-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28647:
URL: https://github.com/apache/spark/pull/28647#issuecomment-650697842


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand

2020-06-27 Thread GitBox


AmplabJenkins commented on pull request #28647:
URL: https://github.com/apache/spark/pull/28647#issuecomment-650697842










[GitHub] [spark] SparkQA removed a comment on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand

2020-06-27 Thread GitBox


SparkQA removed a comment on pull request #28647:
URL: https://github.com/apache/spark/pull/28647#issuecomment-650673555


   **[Test build #124581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124581/testReport)** for PR 28647 at commit [`5c63477`](https://github.com/apache/spark/commit/5c634779f429ae148ff2d7f0453e3935109b7785).






[GitHub] [spark] SparkQA commented on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand

2020-06-27 Thread GitBox


SparkQA commented on pull request #28647:
URL: https://github.com/apache/spark/pull/28647#issuecomment-650697664


   **[Test build #124581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124581/testReport)** for PR 28647 at commit [`5c63477`](https://github.com/apache/spark/commit/5c634779f429ae148ff2d7f0453e3935109b7785).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] TJX2014 edited a comment on pull request #28918: [SPARK-32068][WEBUI] Correct task lauchtime show issue due to timezone in stage tab

2020-06-27 Thread GitBox


TJX2014 edited a comment on pull request #28918:
URL: https://github.com/apache/spark/pull/28918#issuecomment-650686745


   @dongjoon-hyun Thanks, I have done it. :-)






[GitHub] [spark] TJX2014 edited a comment on pull request #28918: [SPARK-32068][WEBUI] Correct task lauchtime show issue due to timezone in stage tab

2020-06-27 Thread GitBox


TJX2014 edited a comment on pull request #28918:
URL: https://github.com/apache/spark/pull/28918#issuecomment-650686745


   @dongjoon-hyun Thanks, I have done it.






[GitHub] [spark] AmplabJenkins commented on pull request #28937: [SPARK-32115][SQL] Incorrect results for SUBSTRING when overflow

2020-06-27 Thread GitBox


AmplabJenkins commented on pull request #28937:
URL: https://github.com/apache/spark/pull/28937#issuecomment-650694350










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28937: [SPARK-32115][SQL] Incorrect results for SUBSTRING when overflow

2020-06-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28937:
URL: https://github.com/apache/spark/pull/28937#issuecomment-650694350










[GitHub] [spark] HeartSaVioR edited a comment on pull request #28841: [SPARK-31962][SQL][SS] Provide option to load files after a specified date when reading from a folder path

2020-06-27 Thread GitBox


HeartSaVioR edited a comment on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-650694094


   Please take a look at how the Kafka data source options apply to both batch and streaming queries. The semantics of the options should be applied differently.
   
   http://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-for-batch-queries
   
   `startingOffsetsByTimestamp`, `startingOffsets`, `endingOffsetsByTimestamp`, `endingOffsets`
   
   If we are not fully sure about how to do it, let's apply the option only to the batch query, and file an issue to address it for the streaming query.
   
   That said, I prefer to have a lower bound + upper bound instead of only a lower bound, as I commented earlier during review.
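   For reference, the `*ByTimestamp` options linked above take a JSON map of topic -> partition -> timestamp in milliseconds; the topic name and timestamps below are illustrative, not taken from this PR:

```python
import json

# Illustrative topic/partition/timestamp values; the option value is
# a JSON map of topic -> partition -> timestamp (ms since epoch).
starting = json.dumps({"events": {"0": 1593216000000, "1": 1593216000000}})
ending = json.dumps({"events": {"0": 1593302400000, "1": 1593302400000}})

# These strings would then be passed to the Kafka source, e.g.:
# spark.read.format("kafka")
#   .option("subscribe", "events")
#   .option("startingOffsetsByTimestamp", starting)
#   .option("endingOffsetsByTimestamp", ending)
```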






[GitHub] [spark] HeartSaVioR commented on pull request #28841: [SPARK-31962][SQL][SS] Provide option to load files after a specified date when reading from a folder path

2020-06-27 Thread GitBox


HeartSaVioR commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-650694094


   Please take a look at how the Kafka data source options apply to both batch and streaming queries. The semantics of the options should be applied differently.
   
   http://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-for-batch-queries
   
   `startingOffsetsByTimestamp`, `startingOffsets`, `endingOffsetsByTimestamp`, `endingOffsets`
   
   If we are not fully sure about it, let's apply the option only to the batch query, and file an issue to address it for the streaming query.
   
   That said, I prefer to have a lower bound + upper bound instead of only a lower bound, as I commented earlier during review.






[GitHub] [spark] SparkQA commented on pull request #28937: [SPARK-32115][SQL] Incorrect results for SUBSTRING when overflow

2020-06-27 Thread GitBox


SparkQA commented on pull request #28937:
URL: https://github.com/apache/spark/pull/28937#issuecomment-650694181


   **[Test build #124593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124593/testReport)** for PR 28937 at commit [`5f109a8`](https://github.com/apache/spark/commit/5f109a87bcdadf693352f995bab0e72faf360824).






[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-06-27 Thread GitBox


beliefer commented on a change in pull request #28685:
URL: https://github.com/apache/spark/pull/28685#discussion_r446590900



##
File path: sql/core/src/test/resources/sql-tests/inputs/postgreSQL/window_part1.sql
##
@@ -301,7 +301,7 @@ FROM tenk1 WHERE unique1 < 10;
 -- unique1, four
 -- FROM tenk1 WHERE unique1 < 10 WINDOW w AS (order by four);
 
--- [SPARK-27951] ANSI SQL: NTH_VALUE function
+-- [SPARK-30708] first_value/last_value window function throws ParseException

Review comment:
   Because #25082 has been reverted, SPARK-30708 is not needed.
   I updated this comment to reference SPARK-28310.








[GitHub] [spark] xuanyuanking opened a new pull request #28937: [SPARK-32115][SQL] Incorrect results for SUBSTRING when overflow

2020-06-27 Thread GitBox


xuanyuanking opened a new pull request #28937:
URL: https://github.com/apache/spark/pull/28937


   ### What changes were proposed in this pull request?
   Bug fix for overflow case in `UTF8String.substringSQL`.
   
   ### Why are the changes needed?
   The SQL query `SELECT SUBSTRING("abc", -1207959552, -1207959552)` incorrectly returns `"abc"` against the expected output of `""`. For the query `SUBSTRING("abc", -100, -100)`, we get the correct output of `""`.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, bug fix for the overflow case.
   
   ### How was this patch tested?
   New UT.
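   The overflow behind these results can be reproduced in isolation. This is a sketch of the wrap-around itself, not of `UTF8String.substringSQL`: with 32-bit arithmetic, the sum of the (negative) start position and length wraps to a large positive end index, so the substring covers the whole string instead of being empty. Python ints do not overflow, so the wrapping is simulated explicitly:

```python
def wrap32(x: int) -> int:
    """Simulate Java's wrapping 32-bit int arithmetic."""
    return (x + 2**31) % 2**32 - 2**31

pos = length = -1207959552
# Both bounds lie far left of the string, so the result should be "".
# But the 32-bit sum pos + length wraps to a large positive value,
# which places the computed end index past the string instead of before it.
wrapped_end = wrap32(pos + length)
```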
   






[GitHub] [spark] xuanyuanking commented on pull request #28937: [SPARK-32115][SQL] Incorrect results for SUBSTRING when overflow

2020-06-27 Thread GitBox


xuanyuanking commented on pull request #28937:
URL: https://github.com/apache/spark/pull/28937#issuecomment-650693563


   cc @cloud-fan 






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-650692254










[GitHub] [spark] AmplabJenkins commented on pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-27 Thread GitBox


AmplabJenkins commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-650692254










[GitHub] [spark] SparkQA commented on pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-27 Thread GitBox


SparkQA commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-650692166


   **[Test build #124592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124592/testReport)** for PR 28898 at commit [`acce8c5`](https://github.com/apache/spark/commit/acce8c5d8d51bae5f981e56a8811f075cb07d214).






[GitHub] [spark] frankyin-factual commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-27 Thread GitBox


frankyin-factual commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r446598259



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -39,6 +39,22 @@ object NestedColumnAliasing {
   NestedColumnAliasing.replaceToAliases(plan, nestedFieldToAlias, attrToAliases)
   }
 
+/**
+ * This is to solve a `LogicalPlan` like `Project`->`Filter`->`Window`.
+ * In this case, `Window` can be a plan that is `canProjectPushThrough`.
+ * By adding this, it allows nested columns to be passed on to the next stages.
+ * Currently, `Filter` is not added into `canProjectPushThrough` due to
+ * an infinite loop in the optimizer during the predicate push-down rule.
+ */

Review comment:
   I don't know exactly why it's broken, but here is a simple query that can reproduce this issue:
   `select name.last from contacts where name.first='Jane'`
   The error message is like:
   ```
   20/06/27 21:17:41 WARN internal.BaseSessionStateBuilder$$anon$2: Max iterations (100) reached for batch Operator Optimization before Inferring Filters, please set 'spark.sql.optimizer.maxIterations' to a larger value.
   20/06/27 21:17:41 WARN internal.BaseSessionStateBuilder$$anon$2: Max iterations (100) reached for batch Operator Optimization after Inferring Filters, please set 'spark.sql.optimizer.maxIterations' to a larger value.
   ```








[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-06-27 Thread GitBox


HeartSaVioR commented on pull request #28904:
URL: https://github.com/apache/spark/pull/28904#issuecomment-650690135


   UPDATE: now SPARK-30946 + SPARK-30462 writes 11879, and RES is still less than 2G (around 1.7G). I'll stop the sustaining test with enough heap and run another sustaining test with a smaller heap (1.5G).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28935: [SPARK-20680][SQL] Adding HiveVoidType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28935:
URL: https://github.com/apache/spark/pull/28935#issuecomment-650688121










[GitHub] [spark] AmplabJenkins commented on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-06-27 Thread GitBox


AmplabJenkins commented on pull request #28685:
URL: https://github.com/apache/spark/pull/28685#issuecomment-650688119










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-06-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28685:
URL: https://github.com/apache/spark/pull/28685#issuecomment-650688119










[GitHub] [spark] AmplabJenkins commented on pull request #28935: [SPARK-20680][SQL] Adding HiveVoidType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


AmplabJenkins commented on pull request #28935:
URL: https://github.com/apache/spark/pull/28935#issuecomment-650688121










[GitHub] [spark] SparkQA commented on pull request #28935: [SPARK-20680][SQL] Adding HiveVoidType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


SparkQA commented on pull request #28935:
URL: https://github.com/apache/spark/pull/28935#issuecomment-650687920


   **[Test build #124590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124590/testReport)** for PR 28935 at commit [`17aace2`](https://github.com/apache/spark/commit/17aace25df565491012a729d24c9035b988904d6).






[GitHub] [spark] SparkQA commented on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-06-27 Thread GitBox


SparkQA commented on pull request #28685:
URL: https://github.com/apache/spark/pull/28685#issuecomment-650687924


   **[Test build #124591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124591/testReport)** for PR 28685 at commit [`f7c2b1e`](https://github.com/apache/spark/commit/f7c2b1e7b7134d32691a7a844c497a5cdf731aad).






[GitHub] [spark] maropu edited a comment on pull request #28737: [SPARK-31913][SQL] Fix StackOverflowError in FileScanRDD

2020-06-27 Thread GitBox


maropu edited a comment on pull request #28737:
URL: https://github.com/apache/spark/pull/28737#issuecomment-650686603


   Yea, we need env-independent tests to reproduce this issue...






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28918: [SPARK-32068][WEBUI] Correct task lauchtime show issue due to timezone in stage tab

2020-06-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28918:
URL: https://github.com/apache/spark/pull/28918#issuecomment-650687180










[GitHub] [spark] AmplabJenkins commented on pull request #28918: [SPARK-32068][WEBUI] Correct task lauchtime show issue due to timezone in stage tab

2020-06-27 Thread GitBox


AmplabJenkins commented on pull request #28918:
URL: https://github.com/apache/spark/pull/28918#issuecomment-650687180










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28936: [SPARK-30798][SS][FOLLOW-UP] Scope Session.active in IncrementalExecution

2020-06-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28936:
URL: https://github.com/apache/spark/pull/28936#issuecomment-650687026










[GitHub] [spark] AmplabJenkins commented on pull request #28936: [SPARK-30798][SS][FOLLOW-UP] Scope Session.active in IncrementalExecution

2020-06-27 Thread GitBox


AmplabJenkins commented on pull request #28936:
URL: https://github.com/apache/spark/pull/28936#issuecomment-650687026










[GitHub] [spark] SparkQA removed a comment on pull request #28918: [SPARK-32068][WEBUI] Correct task lauchtime show issue due to timezone in stage tab

2020-06-27 Thread GitBox


SparkQA removed a comment on pull request #28918:
URL: https://github.com/apache/spark/pull/28918#issuecomment-650656834


   **[Test build #124577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124577/testReport)** for PR 28918 at commit [`698cae6`](https://github.com/apache/spark/commit/698cae60d76988939c0c80bf9abcfc8eb8214bd2).






[GitHub] [spark] SparkQA commented on pull request #28936: [SPARK-30798][SS][FOLLOW-UP] Scope Session.active in IncrementalExecution

2020-06-27 Thread GitBox


SparkQA commented on pull request #28936:
URL: https://github.com/apache/spark/pull/28936#issuecomment-650686881


   **[Test build #124589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124589/testReport)** for PR 28936 at commit [`5ca38fd`](https://github.com/apache/spark/commit/5ca38fd63233752f302225b1ab3d0f54f5847831).






[GitHub] [spark] SparkQA commented on pull request #28918: [SPARK-32068][WEBUI] Correct task lauchtime show issue due to timezone in stage tab

2020-06-27 Thread GitBox


SparkQA commented on pull request #28918:
URL: https://github.com/apache/spark/pull/28918#issuecomment-650686877


   **[Test build #124577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124577/testReport)** for PR 28918 at commit [`698cae6`](https://github.com/apache/spark/commit/698cae60d76988939c0c80bf9abcfc8eb8214bd2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] TJX2014 commented on pull request #28918: [SPARK-32068][WEBUI] Correct task lauchtime show issue due to timezone in stage tab

2020-06-27 Thread GitBox


TJX2014 commented on pull request #28918:
URL: https://github.com/apache/spark/pull/28918#issuecomment-650686745


   @dongjoon-hyun Thanks, I think I have done it.






[GitHub] [spark] maropu commented on pull request #28737: [SPARK-31913][SQL] Fix StackOverflowError in FileScanRDD

2020-06-27 Thread GitBox


maropu commented on pull request #28737:
URL: https://github.com/apache/spark/pull/28737#issuecomment-650686603


   Yea, we need env-independent tests for this issue...






[GitHub] [spark] dongjoon-hyun commented on pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


dongjoon-hyun commented on pull request #28935:
URL: https://github.com/apache/spark/pull/28935#issuecomment-650686555


   Thank you for working on this, @LantaoJin !






[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-06-27 Thread GitBox


beliefer commented on a change in pull request #28685:
URL: https://github.com/apache/spark/pull/28685#discussion_r446590900



##
File path: 
sql/core/src/test/resources/sql-tests/inputs/postgreSQL/window_part1.sql
##
@@ -301,7 +301,7 @@ FROM tenk1 WHERE unique1 < 10;
 -- unique1, four
 -- FROM tenk1 WHERE unique1 < 10 WINDOW w AS (order by four);
 
--- [SPARK-27951] ANSI SQL: NTH_VALUE function
+-- [SPARK-30708] first_value/last_value window function throws ParseException

Review comment:
   Because #25082 has been reverted, SPARK-30708 is not needed.
   I will delete this line.








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


dongjoon-hyun commented on a change in pull request #28935:
URL: https://github.com/apache/spark/pull/28935#discussion_r446595871



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/HiveNullType.scala
##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types
+
+/**
+ * A hive null type for compatibility. These datatypes should only used for 
parsing,
+ * and should NOT be used anywhere else. Any instance of these data types 
should be
+ * replaced by a [[NullType]] before analysis.
+ */
+class HiveNullType private() extends DataType {
+
+  override def defaultSize: Int = 1
+
+  override private[spark] def asNullable: HiveNullType = this
+
+  override def simpleString: String = "void"
+}
+
+case object HiveNullType extends HiveNullType {
+  def replaceNullType(dt: DataType): DataType = dt match {
+    case ArrayType(et, nullable) =>
+      ArrayType(replaceNullType(et), nullable)
+    case MapType(kt, vt, nullable) =>
+      MapType(replaceNullType(kt), replaceNullType(vt), nullable)
+    case StructType(fields) =>
+      StructType(fields.map { field =>
+        field.copy(dataType = replaceNullType(field.dataType))
+      })
+    case _: HiveNullType => NullType
+    case _ => dt
+  }
+
+
+  def containsNullType(dt: DataType): Boolean = dt match {

Review comment:
   Shall we remove this, since this PR doesn't use it at all? You can 
add it back later when needed.
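
   The quoted diff cuts off right at `containsNullType`, so as a hedged 
illustration only, the obvious recursive membership check might look like 
the following. The `DataType` hierarchy here is a simplified stand-in, not 
Spark's actual `org.apache.spark.sql.types` classes, and the body is a 
guess, not the PR's code:

   ```scala
   // Stand-in type hierarchy (hypothetical, for illustration only).
   sealed trait DataType
   case object HiveNullType extends DataType
   case object IntType extends DataType
   case class ArrayType(elementType: DataType) extends DataType
   case class MapType(keyType: DataType, valueType: DataType) extends DataType
   case class StructField(name: String, dataType: DataType)
   case class StructType(fields: Seq[StructField]) extends DataType

   // Recursively check whether a (possibly nested) type contains HiveNullType.
   def containsNullType(dt: DataType): Boolean = dt match {
     case ArrayType(et)   => containsNullType(et)
     case MapType(kt, vt) => containsNullType(kt) || containsNullType(vt)
     case StructType(fs)  => fs.exists(f => containsNullType(f.dataType))
     case HiveNullType    => true
     case _               => false
   }

   println(containsNullType(StructType(Seq(StructField("a", ArrayType(HiveNullType))))))  // true
   println(containsNullType(MapType(IntType, IntType)))  // false
   ```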








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


dongjoon-hyun commented on a change in pull request #28935:
URL: https://github.com/apache/spark/pull/28935#discussion_r446595757



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/HiveNullType.scala
##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types
+
+/**
+ * A hive null type for compatibility. These datatypes should only used for 
parsing,
+ * and should NOT be used anywhere else. Any instance of these data types 
should be
+ * replaced by a [[NullType]] before analysis.
+ */
+class HiveNullType private() extends DataType {
+
+  override def defaultSize: Int = 1
+
+  override private[spark] def asNullable: HiveNullType = this
+
+  override def simpleString: String = "void"
+}
+
+case object HiveNullType extends HiveNullType {
+  def replaceNullType(dt: DataType): DataType = dt match {
+    case ArrayType(et, nullable) =>
+      ArrayType(replaceNullType(et), nullable)
+    case MapType(kt, vt, nullable) =>
+      MapType(replaceNullType(kt), replaceNullType(vt), nullable)
+    case StructType(fields) =>
+      StructType(fields.map { field =>
+        field.copy(dataType = replaceNullType(field.dataType))
+      })

Review comment:
   Maybe the following is shorter, as a one-liner:
   ```scala
   StructType(fields.map(f => f.copy(dataType = replaceNullType(f.dataType))))
   ```
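
   For readers skimming the thread, a minimal self-contained sketch of the 
recursive replacement under discussion. The types below are simplified 
stand-ins, NOT Spark's actual `org.apache.spark.sql.types` classes:

   ```scala
   // Stand-in type hierarchy (hypothetical, for illustration only).
   sealed trait DataType
   case object NullType extends DataType
   case object HiveNullType extends DataType
   case object StringType extends DataType
   case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
   case class StructField(name: String, dataType: DataType)
   case class StructType(fields: Seq[StructField]) extends DataType

   // Recursively swap HiveNullType for NullType at every nesting level.
   def replaceNullType(dt: DataType): DataType = dt match {
     case ArrayType(et, n)   => ArrayType(replaceNullType(et), n)
     case StructType(fields) =>
       // The one-liner form discussed in this review thread:
       StructType(fields.map(f => f.copy(dataType = replaceNullType(f.dataType))))
     case HiveNullType       => NullType
     case other              => other
   }

   val nested = StructType(Seq(
     StructField("a", HiveNullType),
     StructField("b", ArrayType(HiveNullType, containsNull = true)),
     StructField("c", StringType)))

   // HiveNullType occurrences are replaced by NullType at every nesting level.
   println(replaceNullType(nested))
   ```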








[GitHub] [spark] xuanyuanking opened a new pull request #28936: [SPARK-30798][SS][FOLLOW-UP] Scope Session.active in IncrementalExecution

2020-06-27 Thread GitBox


xuanyuanking opened a new pull request #28936:
URL: https://github.com/apache/spark/pull/28936


   ### What changes were proposed in this pull request?
   
   The `optimizedPlan` in IncrementalExecution should also be scoped in 
`withActive`.
   
   ### Why are the changes needed?
   
   Follow-up of SPARK-30798 for the Streaming side.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Existing UT.
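   
   As a rough illustration of the `withActive` scoping pattern this 
description refers to, here is a self-contained sketch with a stand-in 
`Session` class. This is not Spark's actual `SparkSession.withActive` or 
`IncrementalExecution`, whose implementations may differ:
   
   ```scala
   // Stand-in for the "active session" pattern: withActive pins a session
   // as the current thread's active one for the duration of a block, then
   // restores whatever was active before.
   class Session(val name: String)
   
   object Session {
     private val active = new ThreadLocal[Session]
   
     def getActive: Option[Session] = Option(active.get)
   
     def withActive[T](s: Session)(body: => T): T = {
       val previous = active.get
       active.set(s)
       try body
       finally active.set(previous)  // restore on exit, even on exception
     }
   }
   
   val s = new Session("streaming")
   // Inside the block, the session is visible as the active one.
   val seen = Session.withActive(s) { Session.getActive.map(_.name) }
   println(seen)                // Some(streaming)
   println(Session.getActive)   // None (previous value restored)
   ```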






[GitHub] [spark] xuanyuanking commented on pull request #28936: [SPARK-30798][SS][FOLLOW-UP] Scope Session.active in IncrementalExecution

2020-06-27 Thread GitBox


xuanyuanking commented on pull request #28936:
URL: https://github.com/apache/spark/pull/28936#issuecomment-650686260


   cc @cloud-fan @HyukjinKwon 






[GitHub] [spark] LantaoJin commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


LantaoJin commented on a change in pull request #28935:
URL: https://github.com/apache/spark/pull/28935#discussion_r446595548



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/HiveNullType.scala
##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types
+
+/**
+ * A hive null type for compatibility. These datatypes should only used for 
parsing,
+ * and should NOT be used anywhere else. Any instance of these data types 
should be
+ * replaced by a [[NullType]] before analysis.
+ */
+class HiveNullType private() extends DataType {

Review comment:
   > `null` is a `value` and Hive exposes `void` as a type.
   
   You are right.








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


dongjoon-hyun commented on a change in pull request #28935:
URL: https://github.com/apache/spark/pull/28935#discussion_r446595406



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##
@@ -132,7 +132,9 @@ class ResolveSessionCatalog(
   }
   }
   // Add Hive type string to metadata.

Review comment:
   Please update this description together in this PR.








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


dongjoon-hyun commented on a change in pull request #28935:
URL: https://github.com/apache/spark/pull/28935#discussion_r446595364



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/HiveNullType.scala
##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types
+
+/**
+ * A hive null type for compatibility. These datatypes should only used for 
parsing,
+ * and should NOT be used anywhere else. Any instance of these data types 
should be
+ * replaced by a [[NullType]] before analysis.
+ */
+class HiveNullType private() extends DataType {
+
+  override def defaultSize: Int = 1
+
+  override private[spark] def asNullable: HiveNullType = this
+
+  override def simpleString: String = "void"
+}
+
+case object HiveNullType extends HiveNullType {
+  def replaceNullType(dt: DataType): DataType = dt match {

Review comment:
   Naming this `replaceVoidType` would be less ambiguous.








[GitHub] [spark] xuanyuanking commented on pull request #28737: [SPARK-31913][SQL] Fix StackOverflowError in FileScanRDD

2020-06-27 Thread GitBox


xuanyuanking commented on pull request #28737:
URL: https://github.com/apache/spark/pull/28737#issuecomment-650685434


   Same question as Takeshi raised here: 
https://github.com/apache/spark/pull/28737#discussion_r437831211






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


dongjoon-hyun commented on a change in pull request #28935:
URL: https://github.com/apache/spark/pull/28935#discussion_r446595267



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/HiveNullType.scala
##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types
+
+/**
+ * A hive null type for compatibility. These datatypes should only used for 
parsing,
+ * and should NOT be used anywhere else. Any instance of these data types 
should be
+ * replaced by a [[NullType]] before analysis.
+ */
+class HiveNullType private() extends DataType {

Review comment:
   `null` is a `value` and Hive exposes `void` as a type.








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


dongjoon-hyun commented on a change in pull request #28935:
URL: https://github.com/apache/spark/pull/28935#discussion_r446595209



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/HiveNullType.scala
##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types
+
+/**
+ * A hive null type for compatibility. These datatypes should only used for 
parsing,
+ * and should NOT be used anywhere else. Any instance of these data types 
should be
+ * replaced by a [[NullType]] before analysis.
+ */
+class HiveNullType private() extends DataType {

Review comment:
   Currently, the description is interpreted like "`hive null type` should be replaced by a NullType before analysis".








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


dongjoon-hyun commented on a change in pull request #28935:
URL: https://github.com/apache/spark/pull/28935#discussion_r446595141



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/HiveNullType.scala
##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types
+
+/**
+ * A Hive null type for compatibility. These data types should only be used
+ * for parsing, and should NOT be used anywhere else. Any instance of these
+ * data types should be replaced by a [[NullType]] before analysis.
+ */
+class HiveNullType private() extends DataType {

Review comment:
   I know the context, but can we name this `HiveVoidType` literally?








[GitHub] [spark] xuanyuanking commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue

2020-06-27 Thread GitBox


xuanyuanking commented on pull request #28904:
URL: https://github.com/apache/spark/pull/28904#issuecomment-650684827


   Very impressive! I'll review this in 2 days.






[GitHub] [spark] LantaoJin commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


LantaoJin commented on a change in pull request #28935:
URL: https://github.com/apache/spark/pull/28935#discussion_r446595075



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -2184,7 +2184,9 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
* Create a Spark DataType.
*/
   private def visitSparkDataType(ctx: DataTypeContext): DataType = {
-HiveStringType.replaceCharType(typedVisit(ctx))
+HiveNullType.replaceNullType(
+  HiveStringType.replaceCharType(typedVisit(ctx))
+)

Review comment:
   Yes, splitting it into two lines was just for readability. OK, I will coalesce it into one line.








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


dongjoon-hyun commented on a change in pull request #28935:
URL: https://github.com/apache/spark/pull/28935#discussion_r446594898



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -2260,7 +2263,9 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
 
 // Add Hive type string to metadata.
 val rawDataType = typedVisit[DataType](ctx.dataType)
-val cleanedDataType = HiveStringType.replaceCharType(rawDataType)
+val cleanedDataType = HiveNullType.replaceNullType(
+  HiveStringType.replaceCharType(rawDataType)
+)

Review comment:
   ditto.








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


dongjoon-hyun commented on a change in pull request #28935:
URL: https://github.com/apache/spark/pull/28935#discussion_r446594858



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -2184,7 +2184,9 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
* Create a Spark DataType.
*/
   private def visitSparkDataType(ctx: DataTypeContext): DataType = {
-HiveStringType.replaceCharType(typedVisit(ctx))
+HiveNullType.replaceNullType(
+  HiveStringType.replaceCharType(typedVisit(ctx))
+)

Review comment:
   One line is enough, isn't it?
   ```scala
   HiveNullType.replaceNullType(HiveStringType.replaceCharType(typedVisit(ctx)))
   ```








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager

2020-06-27 Thread GitBox


dongjoon-hyun commented on a change in pull request #28895:
URL: https://github.com/apache/spark/pull/28895#discussion_r446476096



##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -335,23 +335,6 @@ private[spark] abstract class MapOutputTracker(conf: 
SparkConf) extends Logging
* tuples describing the shuffle blocks that are stored at that 
block manager.
*/
   def getMapSizesByExecutorId(
-  shuffleId: Int,
-  startPartition: Int,
-  endPartition: Int)
-  : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])]
-
-  /**
-   * Called from executors to get the server URIs and output sizes for each 
shuffle block that
-   * needs to be read from a given range of map output partitions 
(startPartition is included but
-   * endPartition is excluded from the range) and is produced by
-   * a range of mappers (startMapIndex, endMapIndex, startMapIndex is included 
and
-   * the endMapIndex is excluded).

Review comment:
   Hi, @Ngone51. This should be the function description of the unified `getMapSizesByExecutorId`. Did I understand correctly? If so, could you add a comment about `startMapIndex` and `endMapIndex`, and about when we don't care about them because of `actualEndMapIndex`? (You don't need to mention this variable name specifically.)
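
For reference, the range semantics the deleted doc comment describes, with `[startPartition, endPartition)` and `[startMapIndex, endMapIndex)` both half-open, can be illustrated with a small standalone sketch (plain Scala, not the actual `MapOutputTracker` API):

```scala
object MapRangeSketch {
  // Each shuffle block is identified here by its map index and partition.
  case class BlockInfo(mapIndex: Int, partition: Int, size: Long)

  // Half-open ranges, as in the removed doc comment: startPartition and
  // startMapIndex are included, endPartition and endMapIndex are excluded.
  def select(blocks: Seq[BlockInfo],
             startPartition: Int, endPartition: Int,
             startMapIndex: Int, endMapIndex: Int): Seq[BlockInfo] =
    blocks.filter { b =>
      b.partition >= startPartition && b.partition < endPartition &&
        b.mapIndex >= startMapIndex && b.mapIndex < endMapIndex
    }
}
```

With this convention, asking for maps `0` to `3` selects map indices `0`, `1`, and `2` but never `3`, which is what "endMapIndex is excluded" means above.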








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28916: [SPARK-32083][SQL] Coalesce to one partition when all partitions are empty in AQE

2020-06-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28916:
URL: https://github.com/apache/spark/pull/28916#issuecomment-650683327










[GitHub] [spark] AmplabJenkins commented on pull request #28916: [SPARK-32083][SQL] Coalesce to one partition when all partitions are empty in AQE

2020-06-27 Thread GitBox


AmplabJenkins commented on pull request #28916:
URL: https://github.com/apache/spark/pull/28916#issuecomment-650683327










[GitHub] [spark] SparkQA commented on pull request #28916: [SPARK-32083][SQL] Coalesce to one partition when all partitions are empty in AQE

2020-06-27 Thread GitBox


SparkQA commented on pull request #28916:
URL: https://github.com/apache/spark/pull/28916#issuecomment-650683117


   **[Test build #124588 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124588/testReport)**
 for PR 28916 at commit 
[`84031c1`](https://github.com/apache/spark/commit/84031c1642f2085028163a67e365c012c3b3a906).






[GitHub] [spark] dongjoon-hyun commented on pull request #28916: [SPARK-32083][SQL] Coalesce to one partition when all partitions are empty in AQE

2020-06-27 Thread GitBox


dongjoon-hyun commented on pull request #28916:
URL: https://github.com/apache/spark/pull/28916#issuecomment-650682797


   Retest this please.






[GitHub] [spark] viirya commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-06-27 Thread GitBox


viirya commented on a change in pull request #27690:
URL: https://github.com/apache/spark/pull/27690#discussion_r446593989



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
##
@@ -97,12 +99,38 @@ private[hive] trait SaveAsHiveFile extends 
DataWritingCommand {
   options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  // Mostly copied from Context.java#getMRTmpPath of Hive 2.3.
+  // Visible for testing.
+  private[execution] def getNonBlobTmpPath(
+  hadoopConf: Configuration,
+  sessionScratchDir: String,
+  scratchDir: String): Path = {
+
+// Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-1',
+// which is governed by 'hive.exec.scratchdir' including the file system.
+// This is the same as Spark's #oldVersionExternalTempPath.
+// The only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090.
+// HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'.
+// Here it uses sessionScratchDir unless it is empty, otherwise it falls back to scratchDir.
+val sessionPath = if (!sessionScratchDir.isEmpty) sessionScratchDir else scratchDir
+val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), hadoopConf, sessionPath)

Review comment:
   When this new feature is enabled, it is possible that a scheme that doesn't work with it is used, e.g. the local scheme. If that happens and causes an error, end users might not know how to deal with it.
   
   Because we don't know whether every scheme supports this feature, we use a list of schemes as the config value instead of a boolean config. Similarly, I think we should not rely on the assumption that `fs.default.name` always works for this feature. Can we just restrict this feature to HDFS only?
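
A scheme-based restriction like the one suggested here could be sketched as follows. This is a standalone illustration using `java.net.URI` rather than Hadoop's `Path`, and the allow-list is hypothetical (e.g. it might come from a comma-separated config), not the PR's actual configuration:

```scala
import java.net.URI

object ScratchDirSchemeSketch {
  // Hypothetical allow-list; restricting to "hdfs" matches the
  // suggestion to limit the feature to HDFS only.
  val supportedSchemes: Set[String] = Set("hdfs")

  // Returns true only when the scratch dir's scheme is explicitly
  // supported; a path with no scheme (or a local "file" scheme) is
  // rejected rather than assuming fs.default.name will work for it.
  def isUsableScratchDir(path: String): Boolean =
    Option(new URI(path).getScheme)
      .exists(s => supportedSchemes.contains(s.toLowerCase))
}
```

Under this sketch, `isUsableScratchDir("hdfs://nn:8020/tmp/hive")` is accepted, while `"file:/tmp/hive"` and the scheme-less `"/tmp/hive"` are rejected.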








[GitHub] [spark] AmplabJenkins commented on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-06-27 Thread GitBox


AmplabJenkins commented on pull request #28917:
URL: https://github.com/apache/spark/pull/28917#issuecomment-650682148










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-06-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28917:
URL: https://github.com/apache/spark/pull/28917#issuecomment-650682148










[GitHub] [spark] SparkQA commented on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-06-27 Thread GitBox


SparkQA commented on pull request #28917:
URL: https://github.com/apache/spark/pull/28917#issuecomment-650681799


   **[Test build #124587 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124587/testReport)**
 for PR 28917 at commit 
[`350fa8d`](https://github.com/apache/spark/commit/350fa8dc6b8ec0d9b28c5200cd287b48a22cfca0).






[GitHub] [spark] beliefer commented on a change in pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-06-27 Thread GitBox


beliefer commented on a change in pull request #28917:
URL: https://github.com/apache/spark/pull/28917#discussion_r446593625



##
File path: 
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
##
@@ -278,7 +280,26 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with TimeLi
 
   override def beforeEach(): Unit = {
 super.beforeEach()
-init(new SparkConf())
+  }
+
+  override protected def test(testName: String, testTags: Tag*)(testFun: => 
Any)
+  (implicit pos: Position): Unit = {
+testWithSparkConf(testName, testTags: _*)()(testFun)(pos)
+  }
+
+  protected def testWithSparkConf(testName: String, testTags: Tag*)

Review comment:
   OK
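
The shape of the helper under discussion, a `test` override that threads per-test Spark configuration into suite setup, can be sketched outside ScalaTest like this; all names here are illustrative stand-ins, not the suite's final API:

```scala
object TestWithConfSketch {
  type Conf = Map[String, String]

  // Stand-in for DAGSchedulerSuite.init(conf): build the scheduler
  // under test from the given configuration instead of a fixed one.
  def init(conf: Conf): Unit =
    println(s"initializing scheduler with $conf")

  // A test helper that accepts extra (key, value) pairs, mirroring
  // the curried testWithSparkConf(testName)(pairs)(body) shape above.
  def testWithConf(name: String)(pairs: (String, String)*)(body: => Unit): Unit = {
    init(pairs.toMap) // apply per-test configuration before the body runs
    body
  }

  // The plain `test` delegates with no extra configuration, like the
  // overridden test() in the diff.
  def test(name: String)(body: => Unit): Unit =
    testWithConf(name)()(body)
}
```

The point of the curried form is that existing `test("...") { ... }` call sites keep compiling, while new tests can opt into configuration with an extra parameter list.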








[GitHub] [spark] xianyinxin commented on a change in pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO

2020-06-27 Thread GitBox


xianyinxin commented on a change in pull request #28875:
URL: https://github.com/apache/spark/pull/28875#discussion_r446593547



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala
##
@@ -347,23 +347,23 @@ case class MergeIntoTable(
 }
 
 sealed abstract class MergeAction(
-condition: Option[Expression]) extends Expression with Unevaluable {
+val condition: Option[Expression]) extends Expression with Unevaluable {

Review comment:
   done.








[GitHub] [spark] xianyinxin commented on a change in pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO

2020-06-27 Thread GitBox


xianyinxin commented on a change in pull request #28875:
URL: https://github.com/apache/spark/pull/28875#discussion_r446593529



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -468,13 +458,25 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   throw new ParseException("There must be at least one WHEN clause in a 
MERGE statement", ctx)
 }
 // children being empty means that the condition is not set
-if (matchedActions.length == 2 && matchedActions.head.children.isEmpty) {
-  throw new ParseException("When there are 2 MATCHED clauses in a MERGE 
statement, " +
-"the first MATCHED clause must have a condition", ctx)
-}
-if (matchedActions.groupBy(_.getClass).mapValues(_.size).exists(_._2 > 1)) 
{
+val matchedActionSize = matchedActions.length
+if (matchedActionSize >= 2 && 
!matchedActions.init.forall(_.condition.nonEmpty)) {

Review comment:
   Yes, it was a bug.
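
The rule being enforced here, that every MATCHED clause except possibly the last must carry a condition (an unconditioned clause earlier in the list would swallow all later ones), reduces to the `init.forall` check in the diff. A standalone sketch:

```scala
object MergeClauseCheckSketch {
  // Minimal stand-in for MergeAction: only the optional condition matters.
  case class Action(condition: Option[String])

  // Valid iff all actions before the last one have a condition; an
  // unconditioned clause anywhere but last would shadow later clauses.
  def matchedClausesValid(actions: Seq[Action]): Boolean =
    actions.length < 2 || actions.init.forall(_.condition.nonEmpty)
}
```

For instance, `matchedClausesValid(Seq(Action(None), Action(Some("c"))))` is false, while putting the unconditioned clause last makes the same pair valid.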

##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
##
@@ -1134,58 +1134,53 @@ class DDLParserSuite extends AnalysisTest {
 }
   }
 
-  test("merge into table: at most two matched clauses") {
-val exc = intercept[ParseException] {
-  parsePlan(
-"""
-  |MERGE INTO testcat1.ns1.ns2.tbl AS target
-  |USING testcat2.ns1.ns2.tbl AS source
-  |ON target.col1 = source.col1
-  |WHEN MATCHED AND (target.col2='delete') THEN DELETE
-  |WHEN MATCHED AND (target.col2='update1') THEN UPDATE SET 
target.col2 = source.col2
-  |WHEN MATCHED AND (target.col2='update2') THEN UPDATE SET 
target.col2 = source.col2
-  |WHEN NOT MATCHED AND (target.col2='insert')
-  |THEN INSERT (target.col1, target.col2) values (source.col1, 
source.col2)
-""".stripMargin)
-}
-
-assert(exc.getMessage.contains("There should be at most 2 'WHEN MATCHED' 
clauses."))
-  }
-
-  test("merge into table: at most one not matched clause") {
-val exc = intercept[ParseException] {
-  parsePlan(
-"""
-  |MERGE INTO testcat1.ns1.ns2.tbl AS target
-  |USING testcat2.ns1.ns2.tbl AS source
-  |ON target.col1 = source.col1
-  |WHEN MATCHED AND (target.col2='delete') THEN DELETE
-  |WHEN MATCHED AND (target.col2='update1') THEN UPDATE SET 
target.col2 = source.col2
-  |WHEN NOT MATCHED AND (target.col2='insert1')
-  |THEN INSERT (target.col1, target.col2) values (source.col1, 
source.col2)
-  |WHEN NOT MATCHED AND (target.col2='insert2')
-  |THEN INSERT (target.col1, target.col2) values (source.col1, 
source.col2)
-""".stripMargin)
-}
-
-assert(exc.getMessage.contains("There should be at most 1 'WHEN NOT 
MATCHED' clause."))
+  test("merge into table: multi matched and not matched clauses") {
+parseCompare(
+  """
+|MERGE INTO testcat1.ns1.ns2.tbl AS target
+|USING testcat2.ns1.ns2.tbl AS source
+|ON target.col1 = source.col1
+|WHEN MATCHED AND (target.col2='delete') THEN DELETE
+|WHEN MATCHED AND (target.col2='update to 1') THEN UPDATE SET 
target.col2 = 1
+|WHEN MATCHED AND (target.col2='update to 2') THEN UPDATE SET 
target.col2 = 2
+|WHEN NOT MATCHED AND (target.col2='insert 1')
+|THEN INSERT (target.col1, target.col2) values (source.col1, 1)
+|WHEN NOT MATCHED AND (target.col2='insert 2')
+|THEN INSERT (target.col1, target.col2) values (source.col1, 2)
+  """.stripMargin,
+  MergeIntoTable(
+SubqueryAlias("target", UnresolvedRelation(Seq("testcat1", "ns1", 
"ns2", "tbl"))),
+SubqueryAlias("source", UnresolvedRelation(Seq("testcat2", "ns1", 
"ns2", "tbl"))),
+EqualTo(UnresolvedAttribute("target.col1"), 
UnresolvedAttribute("source.col1")),
+Seq(DeleteAction(Some(EqualTo(UnresolvedAttribute("target.col2"), 
Literal("delete",
+  UpdateAction(Some(EqualTo(UnresolvedAttribute("target.col2"), 
Literal("update to 1"))),
+Seq(Assignment(UnresolvedAttribute("target.col2"), Literal(1,
+  UpdateAction(Some(EqualTo(UnresolvedAttribute("target.col2"), 
Literal("update to 2"))),
+Seq(Assignment(UnresolvedAttribute("target.col2"), Literal(2),
+Seq(InsertAction(Some(EqualTo(UnresolvedAttribute("target.col2"), 
Literal("insert 1"))),
+  Seq(Assignment(UnresolvedAttribute("target.col1"), 
UnresolvedAttribute("source.col1")),
+Assignment(UnresolvedAttribute("target.col2"), Literal(1,
+  InsertAction(Some(EqualTo(UnresolvedAttribute("target.col2"), 
Literal("insert 2"))),
+Seq(Assignment(UnresolvedAttribute("target.col1"), 
UnresolvedAttribute("source.col1")),
+  Assignment(UnresolvedAttribute("target.col2"), Literal(2)))
   }
 
-  test("merge into table: 

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO

2020-06-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28875:
URL: https://github.com/apache/spark/pull/28875#issuecomment-650680512










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-06-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28685:
URL: https://github.com/apache/spark/pull/28685#issuecomment-650680524










[GitHub] [spark] AmplabJenkins commented on pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO

2020-06-27 Thread GitBox


AmplabJenkins commented on pull request #28875:
URL: https://github.com/apache/spark/pull/28875#issuecomment-650680512










[GitHub] [spark] AmplabJenkins commented on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-06-27 Thread GitBox


AmplabJenkins commented on pull request #28685:
URL: https://github.com/apache/spark/pull/28685#issuecomment-650680524










[GitHub] [spark] beliefer commented on pull request #28866: [SPARK-31845][CORE][TESTS] Refactor DAGSchedulerSuite by introducing completeAndCheckAnswer and using completeNextStageWithFetchFailure

2020-06-27 Thread GitBox


beliefer commented on pull request #28866:
URL: https://github.com/apache/spark/pull/28866#issuecomment-650680239


   @dongjoon-hyun @Ngone51 Thanks for your help!






[GitHub] [spark] SparkQA commented on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-06-27 Thread GitBox


SparkQA commented on pull request #28685:
URL: https://github.com/apache/spark/pull/28685#issuecomment-650680160


   **[Test build #124586 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124586/testReport)**
 for PR 28685 at commit 
[`47b68e7`](https://github.com/apache/spark/commit/47b68e753c20fb865967a583b1d377b2e7f744cf).






[GitHub] [spark] SparkQA commented on pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO

2020-06-27 Thread GitBox


SparkQA commented on pull request #28875:
URL: https://github.com/apache/spark/pull/28875#issuecomment-650680152


   **[Test build #124585 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124585/testReport)**
 for PR 28875 at commit 
[`d5edef3`](https://github.com/apache/spark/commit/d5edef3c2b950440614fc5c9ee1e770bcd0b9884).






[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-06-27 Thread GitBox


beliefer commented on a change in pull request #28685:
URL: https://github.com/apache/spark/pull/28685#discussion_r446593095



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##
@@ -363,6 +363,11 @@ abstract class OffsetWindowFunction
*/
   val direction: SortDirection
 
+  /**
+   * Whether the offset is based on the entire frame.
+   */
+  val isWholeBased: Boolean = false

Review comment:
   I added this flag to distinguish `OffsetWindowFunctionFrame` from `FixedOffsetWindowFunctionFrame`.
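
The routing this flag enables, per-row offsets (LEAD/LAG-style) versus a single whole-partition offset (NTH_VALUE/FIRST_VALUE-style), might be sketched as follows; the class names are illustrative placeholders, not Spark's final frame hierarchy:

```scala
object FrameRoutingSketch {
  sealed trait Frame
  // e.g. LEAD/LAG: the offset is relative to the current row.
  case object RowRelativeOffsetFrame extends Frame
  // e.g. NTH_VALUE: the offset is relative to the start of the whole frame,
  // so every row in the partition sees the same value.
  case object WholePartitionOffsetFrame extends Frame

  // Hypothetical dispatch on the isWholeBased flag discussed above.
  def chooseFrame(isWholeBased: Boolean): Frame =
    if (isWholeBased) WholePartitionOffsetFrame else RowRelativeOffsetFrame
}
```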








[GitHub] [spark] LantaoJin commented on pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


LantaoJin commented on pull request #28935:
URL: https://github.com/apache/spark/pull/28935#issuecomment-650676333


   Thanks @HyukjinKwon. If this gets merged, can you help on the Python side?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28935:
URL: https://github.com/apache/spark/pull/28935#issuecomment-650676056










[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-06-27 Thread GitBox


beliefer commented on a change in pull request #28685:
URL: https://github.com/apache/spark/pull/28685#discussion_r446591904



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala
##
@@ -151,10 +168,41 @@ final class OffsetWindowFunctionFrame(
 }
 inputIndex += 1
   }
+}
 
-  override def currentLowerBound(): Int = throw new 
UnsupportedOperationException()
+/**
+ * The fixed offset window frame calculates frames containing
+ * NTH_VALUE/FIRST_VALUE/LAST_VALUE statements.
+ * The fixed offset window frame returns the same value for all rows in the
+ * window partition.
+ */
+class FixedOffsetWindowFunctionFrame(
+target: InternalRow,
+ordinal: Int,
+expressions: Array[OffsetWindowFunction],
+inputSchema: Seq[Attribute],
+newMutableProjection: (Seq[Expression], Seq[Attribute]) => 
MutableProjection,
+offset: Int)
+  extends OffsetWindowFunctionFrameBase(
+target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
 
-  override def currentUpperBound(): Int = throw new 
UnsupportedOperationException()
+  var rowOption: Option[UnsafeRow] = None

Review comment:
   OK.








[GitHub] [spark] AmplabJenkins commented on pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


AmplabJenkins commented on pull request #28935:
URL: https://github.com/apache/spark/pull/28935#issuecomment-650676056










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28913: [SPARK-23631][ML][PySpark] Add summary to RandomForestClassificationModel

2020-06-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28913:
URL: https://github.com/apache/spark/pull/28913#issuecomment-650675872


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124580/
   Test PASSed.






[GitHub] [spark] SparkQA commented on pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


SparkQA commented on pull request #28935:
URL: https://github.com/apache/spark/pull/28935#issuecomment-650675879


   **[Test build #124584 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124584/testReport)**
 for PR 28935 at commit 
[`ba2ef06`](https://github.com/apache/spark/commit/ba2ef0648188c1ddbf7488bf9ed03edfb6d6c53f).






[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-06-27 Thread GitBox


beliefer commented on a change in pull request #28685:
URL: https://github.com/apache/spark/pull/28685#discussion_r446591742



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala
##
@@ -151,10 +168,41 @@ final class OffsetWindowFunctionFrame(
 }
 inputIndex += 1
   }
+}
 
-  override def currentLowerBound(): Int = throw new 
UnsupportedOperationException()
+/**
+ * The fixed offset window frame calculates frames containing
+ * NTH_VALUE/FIRST_VALUE/LAST_VALUE statements.

Review comment:
   I will open another PR to support first_value and last_value.
   I will remove first_value and last_value temporarily, until that PR 
supports them.
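
   As a minimal standalone sketch (not Spark's `WindowFunctionFrame` code) of 
the semantics the quoted doc comment describes -- a fixed-offset frame 
evaluates NTH_VALUE once over the whole partition and repeats that value for 
every row, yielding null when the partition holds fewer than `offset` rows:

```scala
// Standalone sketch of fixed-offset frame semantics; object and method
// names are illustrative, not Spark's.
object FixedOffsetFrameSketch {
  // 1-based nth value of the partition, or None when the partition
  // holds fewer than `offset` rows.
  def nthValue[A](partition: Seq[A], offset: Int): Option[A] =
    if (offset >= 1 && partition.length >= offset) Some(partition(offset - 1))
    else None

  // Every row of the partition receives the same frame result.
  def fillPartition[A](partition: Seq[A], offset: Int): Seq[Option[A]] = {
    val result = nthValue(partition, offset)
    partition.map(_ => result)
  }

  def main(args: Array[String]): Unit = {
    println(fillPartition(Seq("a", "b", "c"), 2)) // List(Some(b), Some(b), Some(b))
    println(fillPartition(Seq("a"), 2))           // List(None)
  }
}
```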








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28913: [SPARK-23631][ML][PySpark] Add summary to RandomForestClassificationModel

2020-06-27 Thread GitBox


AmplabJenkins removed a comment on pull request #28913:
URL: https://github.com/apache/spark/pull/28913#issuecomment-650675868


   Merged build finished. Test PASSed.






[GitHub] [spark] AmplabJenkins commented on pull request #28913: [SPARK-23631][ML][PySpark] Add summary to RandomForestClassificationModel

2020-06-27 Thread GitBox


AmplabJenkins commented on pull request #28913:
URL: https://github.com/apache/spark/pull/28913#issuecomment-650675868










[GitHub] [spark] SparkQA commented on pull request #28913: [SPARK-23631][ML][PySpark] Add summary to RandomForestClassificationModel

2020-06-27 Thread GitBox


SparkQA commented on pull request #28913:
URL: https://github.com/apache/spark/pull/28913#issuecomment-650675689


   **[Test build #124580 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124580/testReport)**
 for PR 28913 at commit 
[`55b52bd`](https://github.com/apache/spark/commit/55b52bdb28a001157d0b0b265c687023267bb58c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #28913: [SPARK-23631][ML][PySpark] Add summary to RandomForestClassificationModel

2020-06-27 Thread GitBox


SparkQA removed a comment on pull request #28913:
URL: https://github.com/apache/spark/pull/28913#issuecomment-650665077


   **[Test build #124580 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124580/testReport)**
 for PR 28913 at commit 
[`55b52bd`](https://github.com/apache/spark/commit/55b52bdb28a001157d0b0b265c687023267bb58c).






[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-06-27 Thread GitBox


beliefer commented on a change in pull request #28685:
URL: https://github.com/apache/spark/pull/28685#discussion_r446591742



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala
##
@@ -151,10 +168,41 @@ final class OffsetWindowFunctionFrame(
 }
 inputIndex += 1
   }
+}
 
-  override def currentLowerBound(): Int = throw new 
UnsupportedOperationException()
+/**
+ * The fixed offset window frame calculates frames containing
+ * NTH_VALUE/FIRST_VALUE/LAST_VALUE statements.

Review comment:
   I will open another PR to support first_value and last_value.








[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-06-27 Thread GitBox


beliefer commented on a change in pull request #28685:
URL: https://github.com/apache/spark/pull/28685#discussion_r446591681



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala
##
@@ -151,10 +168,41 @@ final class OffsetWindowFunctionFrame(
 }
 inputIndex += 1
   }
+}
 
-  override def currentLowerBound(): Int = throw new 
UnsupportedOperationException()
+/**
+ * The fixed offset window frame calculates frames containing
+ * NTH_VALUE/FIRST_VALUE/LAST_VALUE statements.
+ * The fixed offset window frame returns the same value for all rows in the window partition.

Review comment:
   Thanks.








[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-06-27 Thread GitBox


beliefer commented on a change in pull request #28685:
URL: https://github.com/apache/spark/pull/28685#discussion_r446591593



##
File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala
##
@@ -993,6 +993,30 @@ object functions {
 Lead(e.expr, Literal(offset), Literal(defaultValue))
   }
 
+  /**
+   * Window function: returns the value that is the `offset`th row of the 
window frame
+   * (counting from 1), and `null` if the size of window frame is less than 
`offset` rows.
+   *
+   * This is equivalent to the nth_value function in SQL.
+   *
+   * @group window_funcs
+   * @since 3.1.0
+   */
+  def nth_value(columnName: String, offset: Int): Column = {
+nth_value(Column(columnName), offset)
+  }
+
+  /**
+   * Window function: returns the value that is the `offset`th row of the 
window frame
+   * (counting from 1), and `null` if the size of window frame is less than 
`offset` rows.
+   *
+   * This is equivalent to the nth_value function in SQL.
+   *
+   * @group window_funcs
+   * @since 3.0.0

Review comment:
   OK.
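
   A hedged sketch of the docstring's contract for a running frame (ROWS 
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): for each row the frame is the 
prefix ending at that row, so nth_value stays null until the frame has grown 
to at least `offset` rows. This is an illustration of the documented 
semantics, not Spark's implementation:

```scala
// Illustrative only; object and method names are not Spark's.
object NthValueRunningFrame {
  // For each row i, evaluate nth_value over the frame rows(0..i).
  def nthValueOverPrefixes[A](rows: Seq[A], offset: Int): Seq[Option[A]] =
    rows.indices.map { i =>
      val frame = rows.take(i + 1) // frame = prefix ending at row i
      if (frame.length >= offset) Some(frame(offset - 1)) else None
    }

  def main(args: Array[String]): Unit = {
    // Frame sizes are 1, 2, 3; nth_value(_, 2) is defined from row 2 on.
    println(nthValueOverPrefixes(Seq(10, 20, 30), 2)) // Vector(None, Some(20), Some(20))
  }
}
```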








[GitHub] [spark] LantaoJin commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

2020-06-27 Thread GitBox


LantaoJin commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-650675131


   Before that, I think we still need to fix the problem described in the 
description. 
https://github.com/apache/spark/pull/28833#pullrequestreview-435416974 is a 
good idea to handle it. I filed #28935 as a new fix. @maropu @cloud-fan 
@wangyum 






[GitHub] [spark] SparkQA commented on pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive

2020-06-27 Thread GitBox


SparkQA commented on pull request #28935:
URL: https://github.com/apache/spark/pull/28935#issuecomment-650674973


   **[Test build #124583 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124583/testReport)**
 for PR 28935 at commit 
[`17b1853`](https://github.com/apache/spark/commit/17b185302433c4bc823148ab5bff57222872af0c).





