[GitHub] [spark] viirya commented on pull request #28934: [SPARK-32113][SQL] Avoid coalescing shuffle partitions if join condition has inequality predicate
viirya commented on pull request #28934: URL: https://github.com/apache/spark/pull/28934#issuecomment-650705906 Can you elaborate more on why we should not coalesce shuffle partitions if the join condition has an inequality predicate? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] kiszk commented on a change in pull request #28937: [SPARK-32115][SQL] Incorrect results for SUBSTRING when overflow
kiszk commented on a change in pull request #28937: URL: https://github.com/apache/spark/pull/28937#discussion_r446608738

## File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
## @@ -341,8 +341,17 @@ public UTF8String substringSQL(int pos, int length) {
 // to the -ith element before the end of the sequence. If a start index i is 0, it
 // refers to the first element.
 int len = numChars();
+// `len + pos` does not overflow as `len >= 0`.

Review comment: What happens if `len = 10` and `pos = Integer.MIN_VALUE`? I guess that `start` would have an incorrect value.
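To make the concern concrete, here is a minimal, self-contained sketch (not the actual Spark source) of the `start` computation in `substringSQL`. As the patch comment says, `len + pos` itself cannot overflow when `len >= 0`, but with `pos = Integer.MIN_VALUE` the result is a huge negative index that downstream code has to handle:

```java
public class SubstringIndexDemo {
    // Mirrors the start computation in UTF8String.substringSQL (simplified sketch).
    static int start(int len, int pos) {
        return (pos > 0) ? pos - 1 : ((pos < 0) ? len + pos : 0);
    }

    public static void main(String[] args) {
        // len + pos stays in int range because len >= 0, so no overflow here...
        int s = start(10, Integer.MIN_VALUE);
        // ...but the result is a huge negative index; downstream code must
        // clamp it (e.g. to 0) or the substring result is incorrect.
        System.out.println(s); // -2147483638
    }
}
```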
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.
AmplabJenkins removed a comment on pull request #28917: URL: https://github.com/apache/spark/pull/28917#issuecomment-650705470
[GitHub] [spark] AmplabJenkins commented on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.
AmplabJenkins commented on pull request #28917: URL: https://github.com/apache/spark/pull/28917#issuecomment-650705470
[GitHub] [spark] SparkQA commented on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.
SparkQA commented on pull request #28917: URL: https://github.com/apache/spark/pull/28917#issuecomment-650705186 **[Test build #124587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124587/testReport)** for PR 28917 at commit [`350fa8d`](https://github.com/apache/spark/commit/350fa8dc6b8ec0d9b28c5200cd287b48a22cfca0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.
SparkQA removed a comment on pull request #28917: URL: https://github.com/apache/spark/pull/28917#issuecomment-650681799 **[Test build #124587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124587/testReport)** for PR 28917 at commit [`350fa8d`](https://github.com/apache/spark/commit/350fa8dc6b8ec0d9b28c5200cd287b48a22cfca0).
[GitHub] [spark] kiszk commented on a change in pull request #28937: [SPARK-32115][SQL] Incorrect results for SUBSTRING when overflow
kiszk commented on a change in pull request #28937: URL: https://github.com/apache/spark/pull/28937#discussion_r446607693

## File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
## @@ -341,8 +341,17 @@ public UTF8String substringSQL(int pos, int length) {
 // to the -ith element before the end of the sequence. If a start index i is 0, it
 // refers to the first element.
 int len = numChars();
+// `len + pos` does not overflow as `len >= 0`.
 int start = (pos > 0) ? pos -1 : ((pos < 0) ? len + pos : 0);
-int end = (length == Integer.MAX_VALUE) ? len : start + length;
+
+int end;
+if((long) start + length > Integer.MAX_VALUE) {

Review comment: nit: `if ((long) start ...`
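The overflow the patch guards against can be reproduced in isolation. The following is a hedged sketch mirroring the old and patched `end` computations (not the actual Spark source): the old code only special-cased `length == Integer.MAX_VALUE`, so any other large `length` wrapped around in `int` arithmetic, while the fix widens to `long` before comparing:

```java
public class SubstringOverflowDemo {
    // Old behavior: int addition wraps when start + length exceeds Integer.MAX_VALUE,
    // because only length == Integer.MAX_VALUE was special-cased.
    static int endOld(int len, int start, int length) {
        return (length == Integer.MAX_VALUE) ? len : start + length;
    }

    // Patched behavior: widen to long before comparing, clamp to len on overflow.
    static int endNew(int len, int start, int length) {
        if ((long) start + length > Integer.MAX_VALUE) {
            return len;
        }
        return start + length;
    }

    public static void main(String[] args) {
        int len = 10, start = 5, length = Integer.MAX_VALUE - 1;
        System.out.println(endOld(len, start, length)); // wraps to -2147483645
        System.out.println(endNew(len, start, length)); // clamped to 10
    }
}
```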
[GitHub] [spark] igreenfield commented on pull request #28629: [SPARK-31769][CORE] Add MDC support for driver threads
igreenfield commented on pull request #28629: URL: https://github.com/apache/spark/pull/28629#issuecomment-650704314 @dongjoon-hyun The failed test does not seem to be connected to the code changes.
[GitHub] [spark] Fokko commented on pull request #28754: [SPARK-10520][SQL] Allow average out of DateType
Fokko commented on pull request #28754: URL: https://github.com/apache/spark/pull/28754#issuecomment-650704018 That looks relevant, @dongjoon-hyun, thanks for pointing it out. I've removed the check since casting from/to date is allowed. The cast is asserted by the newly added tests.
[GitHub] [spark] maropu commented on a change in pull request #28937: [SPARK-32115][SQL] Incorrect results for SUBSTRING when overflow
maropu commented on a change in pull request #28937: URL: https://github.com/apache/spark/pull/28937#discussion_r446606269

## File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
## @@ -341,8 +341,17 @@ public UTF8String substringSQL(int pos, int length) {
 // to the -ith element before the end of the sequence. If a start index i is 0, it
 // refers to the first element.
 int len = numChars();
+// `len + pos` does not overflow as `len >= 0`.
 int start = (pos > 0) ? pos -1 : ((pos < 0) ? len + pos : 0);
-int end = (length == Integer.MAX_VALUE) ? len : start + length;

Review comment: Nice catch! Could you add some tests in `UTF8StringSuite`, too?
[GitHub] [spark] ulysses-you commented on pull request #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static
ulysses-you commented on pull request #26875: URL: https://github.com/apache/spark/pull/26875#issuecomment-650702868 Yeah, but there is also a small performance regression in the normal case, as seen in test 3. I think that's a reason to do this.
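The idea in SPARK-30245 can be sketched in isolation. The class and method below are illustrative, not Spark's actual `Like`/`RLike` implementation: when the pattern is not a static literal, cache the most recently compiled `Pattern` and recompile only when the pattern string changes, so repeated rows with the same non-static pattern avoid recompilation:

```java
import java.util.regex.Pattern;

public class PatternCacheDemo {
    // Cache of the last compiled pattern; names here are hypothetical.
    private String lastRegex;
    private Pattern lastPattern;

    boolean matches(String input, String regex) {
        if (!regex.equals(lastRegex)) {
            // Recompile only when the pattern value actually changes.
            lastRegex = regex;
            lastPattern = Pattern.compile(regex);
        }
        return lastPattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        PatternCacheDemo demo = new PatternCacheDemo();
        System.out.println(demo.matches("abc", "a.c")); // compiles, then matches: true
        System.out.println(demo.matches("abd", "a.c")); // reuses cached pattern: false
    }
}
```

The trade-off discussed in the thread is exactly the one this sketch exposes: the extra `equals` check is a small constant cost in the normal (static pattern) case, while the cache avoids repeated `Pattern.compile` calls when the pattern expression is non-static but rarely changes.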
[GitHub] [spark] viirya commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
viirya commented on a change in pull request #27690: URL: https://github.com/apache/spark/pull/27690#discussion_r446605636

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
## @@ -97,12 +99,38 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
 options = Map.empty)
 }
- protected def getExternalTmpPath(
+ // Mostly copied from Context.java#getMRTmpPath of Hive 2.3.
+ // Visible for testing.
+ private[execution] def getNonBlobTmpPath(
+ hadoopConf: Configuration,
+ sessionScratchDir: String,
+ scratchDir: String): Path = {
+
+// Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-1',
+// which is ruled by 'hive.exec.scratchdir' including file system.
+// This is the same as Spark's #oldVersionExternalTempPath.
+// Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090.
+// HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'
+// Here it uses session_path unless it's emtpy, otherwise uses scratchDir.
+val sessionPath = if (!sessionScratchDir.isEmpty) sessionScratchDir else scratchDir
+val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), hadoopConf, sessionPath)

Review comment: What if one of the schemes doesn't work? Do we have a clear error message to tell users what happened and how to fix it?
[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
moomindani commented on a change in pull request #27690: URL: https://github.com/apache/spark/pull/27690#discussion_r446516892

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
## @@ -97,12 +99,38 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
 options = Map.empty)
 }
- protected def getExternalTmpPath(
+ // Mostly copied from Context.java#getMRTmpPath of Hive 2.3.
+ // Visible for testing.
+ private[execution] def getNonBlobTmpPath(
+ hadoopConf: Configuration,
+ sessionScratchDir: String,
+ scratchDir: String): Path = {
+
+// Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-1',
+// which is ruled by 'hive.exec.scratchdir' including file system.
+// This is the same as Spark's #oldVersionExternalTempPath.
+// Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090.
+// HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'
+// Here it uses session_path unless it's emtpy, otherwise uses scratchDir.
+val sessionPath = if (!sessionScratchDir.isEmpty) sessionScratchDir else scratchDir
+val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), hadoopConf, sessionPath)

Review comment: Hive has two kinds of scratch dirs accordingly: one local, the other on HDFS. https://mingyue.me/2018/11/17/hive-scratch-working-directory/ In this pull request, the latter, `hive.exec.scratchdir`, is used. In my understanding, we can assume the HDFS scheme in most cases.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28905: [SPARK-32071][SQL][TESTS] Add `make_interval` benchmark
AmplabJenkins removed a comment on pull request #28905: URL: https://github.com/apache/spark/pull/28905#issuecomment-650701364
[GitHub] [spark] AmplabJenkins commented on pull request #28905: [SPARK-32071][SQL][TESTS] Add `make_interval` benchmark
AmplabJenkins commented on pull request #28905: URL: https://github.com/apache/spark/pull/28905#issuecomment-650701364
[GitHub] [spark] SparkQA removed a comment on pull request #28905: [SPARK-32071][SQL][TESTS] Add `make_interval` benchmark
SparkQA removed a comment on pull request #28905: URL: https://github.com/apache/spark/pull/28905#issuecomment-650658676 **[Test build #124579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124579/testReport)** for PR 28905 at commit [`3c5b604`](https://github.com/apache/spark/commit/3c5b6041477194a855667059629d0fe4b0258b23).
[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
moomindani commented on a change in pull request #27690: URL: https://github.com/apache/spark/pull/27690#discussion_r446604034

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
## @@ -97,12 +99,38 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
 options = Map.empty)
 }
- protected def getExternalTmpPath(
+ // Mostly copied from Context.java#getMRTmpPath of Hive 2.3.
+ // Visible for testing.
+ private[execution] def getNonBlobTmpPath(
+ hadoopConf: Configuration,
+ sessionScratchDir: String,
+ scratchDir: String): Path = {
+
+// Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-1',
+// which is ruled by 'hive.exec.scratchdir' including file system.
+// This is the same as Spark's #oldVersionExternalTempPath.
+// Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090.
+// HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'
+// Here it uses session_path unless it's emtpy, otherwise uses scratchDir.
+val sessionPath = if (!sessionScratchDir.isEmpty) sessionScratchDir else scratchDir
+val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), hadoopConf, sessionPath)

Review comment: Currently we just rely on `hive.exec.scratchdir` (not directly on `fs.default.name`), and it works in most use cases even if `hive.exec.scratchdir` is not configured explicitly. I do not want to restrict this feature to HDFS only because I have seen some clusters which do not have HDFS. I want to let end-users choose any scheme where they want to store temporary data.
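The fallback described above (prefer the session scratch dir, else fall back to `hive.exec.scratchdir`) can be sketched with plain strings. This is a hypothetical illustration, not the actual Spark/Hive API; the `-mr-1` suffix mirrors what the patch comment says Hive's `getMRTmpPath` appends:

```java
public class ScratchDirDemo {
    // Illustrative sketch of the session-path fallback in getNonBlobTmpPath.
    static String mrTmpPath(String sessionScratchDir, String scratchDir) {
        // Use the per-session scratch dir unless it is empty, otherwise
        // fall back to the configured 'hive.exec.scratchdir'.
        String sessionPath = !sessionScratchDir.isEmpty() ? sessionScratchDir : scratchDir;
        return sessionPath + "/-mr-1";
    }

    public static void main(String[] args) {
        // Session scratch dir set: it wins, whatever scheme it carries.
        System.out.println(mrTmpPath("hdfs://nn/tmp/hive/user/session1", "hdfs://nn/tmp/hive"));
        // Session scratch dir empty: fall back to the global scratch dir.
        System.out.println(mrTmpPath("", "hdfs://nn/tmp/hive"));
    }
}
```

Because the resolved path keeps whatever scheme the configured directory carries, the feature is not restricted to HDFS, which is exactly the design choice defended in this comment.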
[GitHub] [spark] SparkQA commented on pull request #28905: [SPARK-32071][SQL][TESTS] Add `make_interval` benchmark
SparkQA commented on pull request #28905: URL: https://github.com/apache/spark/pull/28905#issuecomment-650701112 **[Test build #124579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124579/testReport)** for PR 28905 at commit [`3c5b604`](https://github.com/apache/spark/commit/3c5b6041477194a855667059629d0fe4b0258b23).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28901: [WIP][SPARK-32064][SQL] Supporting create temporary table
AmplabJenkins removed a comment on pull request #28901: URL: https://github.com/apache/spark/pull/28901#issuecomment-650700128
[GitHub] [spark] AmplabJenkins commented on pull request #28901: [WIP][SPARK-32064][SQL] Supporting create temporary table
AmplabJenkins commented on pull request #28901: URL: https://github.com/apache/spark/pull/28901#issuecomment-650700128
[GitHub] [spark] SparkQA commented on pull request #28901: [WIP][SPARK-32064][SQL] Supporting create temporary table
SparkQA commented on pull request #28901: URL: https://github.com/apache/spark/pull/28901#issuecomment-650700037 **[Test build #124594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124594/testReport)** for PR 28901 at commit [`6d3274f`](https://github.com/apache/spark/commit/6d3274f08c1c81262c8b0c21aa133a04e31c6796).
[GitHub] [spark] LantaoJin commented on pull request #28901: [WIP][SPARK-32064][SQL] Supporting create temporary table
LantaoJin commented on pull request #28901: URL: https://github.com/apache/spark/pull/28901#issuecomment-650698393 @gatorsmile Yes, just like a Hive temporary table or a Teradata volatile table. We are migrating our Spark to v3.0. This is one of the internal features that has been widely used in our production.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand
AmplabJenkins removed a comment on pull request #28647: URL: https://github.com/apache/spark/pull/28647#issuecomment-650697849 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124581/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand
AmplabJenkins removed a comment on pull request #28647: URL: https://github.com/apache/spark/pull/28647#issuecomment-650697842 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand
AmplabJenkins commented on pull request #28647: URL: https://github.com/apache/spark/pull/28647#issuecomment-650697842
[GitHub] [spark] SparkQA removed a comment on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand
SparkQA removed a comment on pull request #28647: URL: https://github.com/apache/spark/pull/28647#issuecomment-650673555 **[Test build #124581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124581/testReport)** for PR 28647 at commit [`5c63477`](https://github.com/apache/spark/commit/5c634779f429ae148ff2d7f0453e3935109b7785).
[GitHub] [spark] SparkQA commented on pull request #28647: [SPARK-31828][SQL] Retain table properties at CreateTableLikeCommand
SparkQA commented on pull request #28647: URL: https://github.com/apache/spark/pull/28647#issuecomment-650697664 **[Test build #124581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124581/testReport)** for PR 28647 at commit [`5c63477`](https://github.com/apache/spark/commit/5c634779f429ae148ff2d7f0453e3935109b7785).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] TJX2014 edited a comment on pull request #28918: [SPARK-32068][WEBUI] Correct task lauchtime show issue due to timezone in stage tab
TJX2014 edited a comment on pull request #28918: URL: https://github.com/apache/spark/pull/28918#issuecomment-650686745 @dongjoon-hyun Thanks, I have done. :-)
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28937: [SPARK-32115][SQL] Incorrect results for SUBSTRING when overflow
AmplabJenkins removed a comment on pull request #28937: URL: https://github.com/apache/spark/pull/28937#issuecomment-650694350
[GitHub] [spark] AmplabJenkins commented on pull request #28937: [SPARK-32115][SQL] Incorrect results for SUBSTRING when overflow
AmplabJenkins commented on pull request #28937: URL: https://github.com/apache/spark/pull/28937#issuecomment-650694350
[GitHub] [spark] HeartSaVioR edited a comment on pull request #28841: [SPARK-31962][SQL][SS] Provide option to load files after a specified date when reading from a folder path
HeartSaVioR edited a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-650694094 Please take a look at how the Kafka data source options apply to both batch and streaming queries; the semantics of the option should be applied differently in each mode. http://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-for-batch-queries `startingOffsetsByTimestamp`, `startingOffsets`, `endingOffsetsByTimestamp`, `endingOffsets` If we are not fully sure about how to do it, let's apply the option only to the batch query, and file an issue to address the streaming query. That said, I prefer to have a lower bound + upper bound instead of only a lower bound, as commented earlier in review.
[GitHub] [spark] HeartSaVioR commented on pull request #28841: [SPARK-31962][SQL][SS] Provide option to load files after a specified date when reading from a folder path
HeartSaVioR commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-650694094 Please take a look at how the Kafka data source options apply to both batch and streaming queries; the semantics of the option should be applied differently in each mode. http://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-for-batch-queries `startingOffsetsByTimestamp`, `startingOffsets`, `endingOffsetsByTimestamp`, `endingOffsets` If we are not fully sure, let's apply the option only to the batch query, and file an issue to address the streaming query. That said, I prefer to have a lower bound + upper bound instead of only a lower bound, as commented earlier in review.
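For reference, the batch-vs-streaming semantic difference called out above can be sketched with the documented Kafka source options. This is a configuration sketch, not a runnable example: `my-topic`, the timestamp values, and `host:9092` are placeholder assumptions, and `spark` is assumed to be an existing SparkSession with the Kafka connector on the classpath.

```python
# Sketch only: topic, timestamps, and bootstrap servers are placeholders.

# Batch query: both a starting and an ending bound are honored, so a closed
# range of offsets is read exactly once.
batch_df = (
    spark.read.format("kafka")
    .option("kafka.bootstrap.servers", "host:9092")
    .option("subscribe", "my-topic")
    .option("startingOffsetsByTimestamp", '{"my-topic": {"0": 1593216000000}}')
    .option("endingOffsetsByTimestamp", '{"my-topic": {"0": 1593302400000}}')
    .load()
)

# Streaming query: only the starting bound applies; the "ending*" options are
# not supported because an unbounded stream has no end.
stream_df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "host:9092")
    .option("subscribe", "my-topic")
    .option("startingOffsetsByTimestamp", '{"my-topic": {"0": 1593216000000}}')
    .load()
)
```

This mirrors the split the comment proposes for the file source: a lower and upper bound make sense for batch, while a streaming source can only honor the lower (starting) bound.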
[GitHub] [spark] SparkQA commented on pull request #28937: [SPARK-32115][SQL] Incorrect results for SUBSTRING when overflow
SparkQA commented on pull request #28937: URL: https://github.com/apache/spark/pull/28937#issuecomment-650694181 **[Test build #124593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124593/testReport)** for PR 28937 at commit [`5f109a8`](https://github.com/apache/spark/commit/5f109a87bcdadf693352f995bab0e72faf360824).
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r446590900 ## File path: sql/core/src/test/resources/sql-tests/inputs/postgreSQL/window_part1.sql ## @@ -301,7 +301,7 @@ FROM tenk1 WHERE unique1 < 10; -- unique1, four -- FROM tenk1 WHERE unique1 < 10 WINDOW w AS (order by four); --- [SPARK-27951] ANSI SQL: NTH_VALUE function +-- [SPARK-30708] first_value/last_value window function throws ParseException Review comment: Because #25082 has been reverted, SPARK-30708 is not needed. I updated this comment to reference SPARK-28310.
[GitHub] [spark] xuanyuanking opened a new pull request #28937: [SPARK-32115][SQL] Incorrect results for SUBSTRING when overflow
xuanyuanking opened a new pull request #28937: URL: https://github.com/apache/spark/pull/28937 ### What changes were proposed in this pull request? Bug fix for the overflow case in `UTF8String.substringSQL`. ### Why are the changes needed? The SQL query `SELECT SUBSTRING("abc", -1207959552, -1207959552)` incorrectly returns `"abc"` instead of the expected output `""`. For the query `SUBSTRING("abc", -100, -100)`, we get the correct output of `""`. ### Does this PR introduce _any_ user-facing change? Yes, bug fix for the overflow case. ### How was this patch tested? New UT.
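The overflow described in the PR can be reproduced outside Spark. The sketch below is an illustration in Python, not the actual `UTF8String` Java code: it emulates Java's 32-bit wrap-around to show why `start + length` overflowing makes the buggy arithmetic return the whole string, while the same index math done without wrap-around yields the expected empty result.

```python
INT_MAX = 2**31 - 1  # Java Integer.MAX_VALUE

def to_int32(x):
    """Emulate Java's 32-bit int wrap-around on overflow."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

def char_substring(s, start, until):
    """Mirror of the clamping in UTF8String.substring: invalid ranges are empty."""
    if until <= start or start >= len(s):
        return ""
    return s[max(start, 0):max(until, 0)]

def substring_sql_buggy(s, pos, length):
    """Pre-fix arithmetic: intermediate values wrap at 32 bits."""
    n = len(s)
    start = pos - 1 if pos > 0 else (to_int32(n + pos) if pos < 0 else 0)
    end = n if length == INT_MAX else to_int32(start + length)
    return char_substring(s, start, end)

def substring_sql_fixed(s, pos, length):
    """Same formula without wrap-around: a hugely negative end stays negative."""
    n = len(s)
    start = pos - 1 if pos > 0 else (n + pos if pos < 0 else 0)
    end = n if length == INT_MAX else start + length
    return char_substring(s, start, end)

# start = 3 + (-1207959552) = -1207959549, and start + length = -2415919101
# wraps to +1879048195 in 32-bit arithmetic, so the buggy version copies "abc".
assert substring_sql_buggy("abc", -1207959552, -1207959552) == "abc"
assert substring_sql_fixed("abc", -1207959552, -1207959552) == ""
assert substring_sql_fixed("abc", -100, -100) == ""  # no overflow: both agree
```

The `-100` case stays correct in both versions because the sum never wraps, which is exactly why the bug only shows up for extreme negative arguments.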
[GitHub] [spark] xuanyuanking commented on pull request #28937: [SPARK-32115][SQL] Incorrect results for SUBSTRING when overflow
xuanyuanking commented on pull request #28937: URL: https://github.com/apache/spark/pull/28937#issuecomment-650693563 cc @cloud-fan
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans
AmplabJenkins removed a comment on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-650692254
[GitHub] [spark] AmplabJenkins commented on pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans
AmplabJenkins commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-650692254
[GitHub] [spark] SparkQA commented on pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans
SparkQA commented on pull request #28898: URL: https://github.com/apache/spark/pull/28898#issuecomment-650692166 **[Test build #124592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124592/testReport)** for PR 28898 at commit [`acce8c5`](https://github.com/apache/spark/commit/acce8c5d8d51bae5f981e56a8811f075cb07d214).
[GitHub] [spark] frankyin-factual commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans
frankyin-factual commented on a change in pull request #28898: URL: https://github.com/apache/spark/pull/28898#discussion_r446598259 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ## @@ -39,6 +39,22 @@ object NestedColumnAliasing { NestedColumnAliasing.replaceToAliases(plan, nestedFieldToAlias, attrToAliases) } +/** + * This is to solve a `LogicalPlan` like `Project`->`Filter`->`Window`. + * In this case, `Window` can be plan that is `canProjectPushThrough`. + * By adding this, it allows nested columns to be passed onto next stages. + * Currently, not adding `Filter` into `canProjectPushThrough` due to + * infinitely loop in optimizers during the predicate push-down rule. + */ Review comment: I don't know exactly why it's broken, but here is a simple query that can reproduce this issue: `select name.last from contacts where name.first='Jane'` The error message is like: ``` 20/06/27 21:17:41 WARN internal.BaseSessionStateBuilder$$anon$2: Max iterations (100) reached for batch Operator Optimization before Inferring Filters, please set 'spark.sql.optimizer.maxIterations' to a larger value. 20/06/27 21:17:41 WARN internal.BaseSessionStateBuilder$$anon$2: Max iterations (100) reached for batch Operator Optimization after Inferring Filters, please set 'spark.sql.optimizer.maxIterations' to a larger value. ```
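The `Max iterations (100) reached` warnings quoted above are the typical symptom of a rule that keeps changing the plan on every pass (for example, a push-down whose effect another rewrite undoes), so the optimizer never reaches a fixed point. A toy fixed-point driver, a simplification and not Spark's actual `RuleExecutor`, shows both a converging rule and a non-converging one:

```python
def run_to_fixed_point(plan, rules, max_iterations=100):
    """Apply every rule once per iteration until the plan stops changing
    or the iteration cap is hit. Returns (final_plan, iterations_used)."""
    for i in range(max_iterations):
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if new_plan == plan:
            return plan, i  # fixed point reached
        plan = new_plan
    return plan, max_iterations  # cap hit -> "Max iterations (100) reached"

# Converging rule: rewrites any plan to one canonical shape, then stabilizes.
def canonicalize(plan):
    return ("Filter", "Scan") if plan != ("Filter", "Scan") else plan

# Non-converging rule: flips the plan on every pass, like a push-down rewrite
# whose effect is undone again on the next iteration.
def flip(plan):
    return ("Project", "Filter") if plan == ("Filter", "Project") else ("Filter", "Project")
```

With `canonicalize`, the driver stops after one productive iteration; with `flip`, the plan oscillates forever and the driver burns through all 100 iterations, which is the behavior behind the warning in the quoted log.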
[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-650690135 UPDATE: now SPARK-30946 + SPARK-30462 writes 11879, and RES is still less than 2G (around 1.7G). I'll stop the sustained test with ample heap and run another sustained test with a smaller heap (1.5G).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28935: [SPARK-20680][SQL] Adding HiveVoidType in Spark to be compatible with Hive
AmplabJenkins removed a comment on pull request #28935: URL: https://github.com/apache/spark/pull/28935#issuecomment-650688121
[GitHub] [spark] AmplabJenkins commented on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
AmplabJenkins commented on pull request #28685: URL: https://github.com/apache/spark/pull/28685#issuecomment-650688119
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
AmplabJenkins removed a comment on pull request #28685: URL: https://github.com/apache/spark/pull/28685#issuecomment-650688119
[GitHub] [spark] AmplabJenkins commented on pull request #28935: [SPARK-20680][SQL] Adding HiveVoidType in Spark to be compatible with Hive
AmplabJenkins commented on pull request #28935: URL: https://github.com/apache/spark/pull/28935#issuecomment-650688121
[GitHub] [spark] SparkQA commented on pull request #28935: [SPARK-20680][SQL] Adding HiveVoidType in Spark to be compatible with Hive
SparkQA commented on pull request #28935: URL: https://github.com/apache/spark/pull/28935#issuecomment-650687920 **[Test build #124590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124590/testReport)** for PR 28935 at commit [`17aace2`](https://github.com/apache/spark/commit/17aace25df565491012a729d24c9035b988904d6).
[GitHub] [spark] SparkQA commented on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
SparkQA commented on pull request #28685: URL: https://github.com/apache/spark/pull/28685#issuecomment-650687924 **[Test build #124591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124591/testReport)** for PR 28685 at commit [`f7c2b1e`](https://github.com/apache/spark/commit/f7c2b1e7b7134d32691a7a844c497a5cdf731aad).
[GitHub] [spark] maropu edited a comment on pull request #28737: [SPARK-31913][SQL] Fix StackOverflowError in FileScanRDD
maropu edited a comment on pull request #28737: URL: https://github.com/apache/spark/pull/28737#issuecomment-650686603 Yea, we need env-independent tests to reproduce this issue...
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28918: [SPARK-32068][WEBUI] Correct task lauchtime show issue due to timezone in stage tab
AmplabJenkins removed a comment on pull request #28918: URL: https://github.com/apache/spark/pull/28918#issuecomment-650687180
[GitHub] [spark] AmplabJenkins commented on pull request #28918: [SPARK-32068][WEBUI] Correct task lauchtime show issue due to timezone in stage tab
AmplabJenkins commented on pull request #28918: URL: https://github.com/apache/spark/pull/28918#issuecomment-650687180
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28936: [SPARK-30798][SS][FOLLOW-UP] Scope Session.active in IncrementalExecution
AmplabJenkins removed a comment on pull request #28936: URL: https://github.com/apache/spark/pull/28936#issuecomment-650687026
[GitHub] [spark] AmplabJenkins commented on pull request #28936: [SPARK-30798][SS][FOLLOW-UP] Scope Session.active in IncrementalExecution
AmplabJenkins commented on pull request #28936: URL: https://github.com/apache/spark/pull/28936#issuecomment-650687026
[GitHub] [spark] SparkQA removed a comment on pull request #28918: [SPARK-32068][WEBUI] Correct task lauchtime show issue due to timezone in stage tab
SparkQA removed a comment on pull request #28918: URL: https://github.com/apache/spark/pull/28918#issuecomment-650656834 **[Test build #124577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124577/testReport)** for PR 28918 at commit [`698cae6`](https://github.com/apache/spark/commit/698cae60d76988939c0c80bf9abcfc8eb8214bd2).
[GitHub] [spark] SparkQA commented on pull request #28936: [SPARK-30798][SS][FOLLOW-UP] Scope Session.active in IncrementalExecution
SparkQA commented on pull request #28936: URL: https://github.com/apache/spark/pull/28936#issuecomment-650686881 **[Test build #124589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124589/testReport)** for PR 28936 at commit [`5ca38fd`](https://github.com/apache/spark/commit/5ca38fd63233752f302225b1ab3d0f54f5847831).
[GitHub] [spark] SparkQA commented on pull request #28918: [SPARK-32068][WEBUI] Correct task lauchtime show issue due to timezone in stage tab
SparkQA commented on pull request #28918: URL: https://github.com/apache/spark/pull/28918#issuecomment-650686877 **[Test build #124577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124577/testReport)** for PR 28918 at commit [`698cae6`](https://github.com/apache/spark/commit/698cae60d76988939c0c80bf9abcfc8eb8214bd2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] TJX2014 commented on pull request #28918: [SPARK-32068][WEBUI] Correct task lauchtime show issue due to timezone in stage tab
TJX2014 commented on pull request #28918: URL: https://github.com/apache/spark/pull/28918#issuecomment-650686745 @dongjoon-hyun Thanks, I may have done.
[GitHub] [spark] dongjoon-hyun commented on pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive
dongjoon-hyun commented on pull request #28935: URL: https://github.com/apache/spark/pull/28935#issuecomment-650686555 Thank you for working on this, @LantaoJin !
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r446590900 ## File path: sql/core/src/test/resources/sql-tests/inputs/postgreSQL/window_part1.sql ## @@ -301,7 +301,7 @@ FROM tenk1 WHERE unique1 < 10; -- unique1, four -- FROM tenk1 WHERE unique1 < 10 WINDOW w AS (order by four); --- [SPARK-27951] ANSI SQL: NTH_VALUE function +-- [SPARK-30708] first_value/last_value window function throws ParseException Review comment: Because #25082 has been reverted, SPARK-30708 is not needed. I will delete this line.
[GitHub] [spark] maropu commented on pull request #28737: [SPARK-31913][SQL] Fix StackOverflowError in FileScanRDD
maropu commented on pull request #28737: URL: https://github.com/apache/spark/pull/28737#issuecomment-650686603 Yea, we need env-independent tests for this issue...
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive
dongjoon-hyun commented on a change in pull request #28935: URL: https://github.com/apache/spark/pull/28935#discussion_r446595871 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/HiveNullType.scala ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.types + +/** + * A hive null type for compatibility. These datatypes should only used for parsing, + * and should NOT be used anywhere else. Any instance of these data types should be + * replaced by a [[NullType]] before analysis.
+ */ +class HiveNullType private() extends DataType { + + override def defaultSize: Int = 1 + + override private[spark] def asNullable: HiveNullType = this + + override def simpleString: String = "void" +} + +case object HiveNullType extends HiveNullType { + def replaceNullType(dt: DataType): DataType = dt match { +case ArrayType(et, nullable) => + ArrayType(replaceNullType(et), nullable) +case MapType(kt, vt, nullable) => + MapType(replaceNullType(kt), replaceNullType(vt), nullable) +case StructType(fields) => + StructType(fields.map { field => +field.copy(dataType = replaceNullType(field.dataType)) + }) +case _: HiveNullType => NullType +case _ => dt + } + + + def containsNullType(dt: DataType): Boolean = dt match { Review comment: Shall we remove this, since this PR doesn't use it at all? You can add it later when needed.
[GitHub] [spark] xuanyuanking opened a new pull request #28936: [SPARK-30798][SS][FOLLOW-UP] Scope Session.active in IncrementalExecution
xuanyuanking opened a new pull request #28936: URL: https://github.com/apache/spark/pull/28936 ### What changes were proposed in this pull request? The `optimizedPlan` in IncrementalExecution should also be scoped in `withActive`. ### Why are the changes needed? Follow-up of SPARK-30798 for the Streaming side. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing UT.
[GitHub] [spark] xuanyuanking commented on pull request #28936: [SPARK-30798][SS][FOLLOW-UP] Scope Session.active in IncrementalExecution
xuanyuanking commented on pull request #28936: URL: https://github.com/apache/spark/pull/28936#issuecomment-650686260 cc @cloud-fan @HyukjinKwon
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive
dongjoon-hyun commented on a change in pull request #28935: URL: https://github.com/apache/spark/pull/28935#discussion_r446595757 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/HiveNullType.scala ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.types + +/** + * A hive null type for compatibility. These datatypes should only used for parsing, + * and should NOT be used anywhere else. Any instance of these data types should be + * replaced by a [[NullType]] before analysis. + */ +class HiveNullType private() extends DataType { + + override def defaultSize: Int = 1 + + override private[spark] def asNullable: HiveNullType = this + + override def simpleString: String = "void" +} + +case object HiveNullType extends HiveNullType { + def replaceNullType(dt: DataType): DataType = dt match { +case ArrayType(et, nullable) => + ArrayType(replaceNullType(et), nullable) +case MapType(kt, vt, nullable) => + MapType(replaceNullType(kt), replaceNullType(vt), nullable) +case StructType(fields) => + StructType(fields.map { field => +field.copy(dataType = replaceNullType(field.dataType)) + }) Review comment: Maybe, the following is shorter as a one-liner:
```scala StructType(fields.map(f => f.copy(dataType = replaceNullType(f.dataType)))) ```
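The `replaceNullType` recursion under review is a standard "replace one leaf type inside a nested type" walk over arrays, maps, and structs. A minimal Python mirror of the same pattern, using a hypothetical tuple-based type encoding rather than Spark's `DataType` classes:

```python
def replace_void(dt):
    """Recursively replace the "void" leaf type with "null" inside a nested type.
    Types are encoded as tuples: ("array", elem), ("map", key, value),
    ("struct", [(field_name, field_type), ...]), or a leaf such as ("void",)."""
    kind = dt[0]
    if kind == "array":
        return ("array", replace_void(dt[1]))
    if kind == "map":
        return ("map", replace_void(dt[1]), replace_void(dt[2]))
    if kind == "struct":
        return ("struct", [(name, replace_void(t)) for name, t in dt[1]])
    if kind == "void":
        return ("null",)
    return dt  # any other leaf type is left untouched

nested = ("struct", [("a", ("array", ("void",))), ("b", ("map", ("string",), ("void",)))])
result = replace_void(nested)  # every "void" leaf becomes "null"
```

Each container case rebuilds the container around recursively rewritten children, and only the targeted leaf is swapped; this is the shape that the one-liner suggestion above compresses for the struct case.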
[GitHub] [spark] LantaoJin commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive
LantaoJin commented on a change in pull request #28935: URL: https://github.com/apache/spark/pull/28935#discussion_r446595548 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/HiveNullType.scala ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.types + +/** + * A hive null type for compatibility. These datatypes should only used for parsing, + * and should NOT be used anywhere else. Any instance of these data types should be + * replaced by a [[NullType]] before analysis. + */ +class HiveNullType private() extends DataType { Review comment: > `null` is a `value` and Hive exposes `void` as a type. You are right.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive
dongjoon-hyun commented on a change in pull request #28935: URL: https://github.com/apache/spark/pull/28935#discussion_r446595406 ## File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala ## @@ -132,7 +132,9 @@ class ResolveSessionCatalog( } } // Add Hive type string to metadata. Review comment: Please update this description together in this PR.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive
dongjoon-hyun commented on a change in pull request #28935: URL: https://github.com/apache/spark/pull/28935#discussion_r446595364 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/HiveNullType.scala ## @@ -0,0 +1,55 @@ [Apache license header elided; same as above] +package org.apache.spark.sql.types + +/** + * A hive null type for compatibility. These datatypes should only used for parsing, + * and should NOT be used anywhere else. Any instance of these data types should be + * replaced by a [[NullType]] before analysis. + */ +class HiveNullType private() extends DataType { + + override def defaultSize: Int = 1 + + override private[spark] def asNullable: HiveNullType = this + + override def simpleString: String = "void" +} + +case object HiveNullType extends HiveNullType { + def replaceNullType(dt: DataType): DataType = dt match { Review comment: This will be `replaceVoidType` and it will be less ambiguous.
[GitHub] [spark] xuanyuanking commented on pull request #28737: [SPARK-31913][SQL] Fix StackOverflowError in FileScanRDD
xuanyuanking commented on pull request #28737: URL: https://github.com/apache/spark/pull/28737#issuecomment-650685434 Same question with Takeshi here https://github.com/apache/spark/pull/28737#discussion_r437831211
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive
dongjoon-hyun commented on a change in pull request #28935: URL: https://github.com/apache/spark/pull/28935#discussion_r446595267 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/HiveNullType.scala ## @@ -0,0 +1,55 @@ [Apache license header elided; same as above] +package org.apache.spark.sql.types + +/** + * A hive null type for compatibility. These datatypes should only used for parsing, + * and should NOT be used anywhere else. Any instance of these data types should be + * replaced by a [[NullType]] before analysis. + */ +class HiveNullType private() extends DataType { Review comment: `null` is a `value` and Hive exposes `void` as a type.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive
dongjoon-hyun commented on a change in pull request #28935: URL: https://github.com/apache/spark/pull/28935#discussion_r446595209 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/HiveNullType.scala ## @@ -0,0 +1,55 @@ [Apache license header elided; same as above] +package org.apache.spark.sql.types + +/** + * A hive null type for compatibility. These datatypes should only used for parsing, + * and should NOT be used anywhere else. Any instance of these data types should be + * replaced by a [[NullType]] before analysis. + */ +class HiveNullType private() extends DataType { Review comment: Currently, the description is interpreted like "`hive null type` should be replaced by a NullType before analysis".
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive
dongjoon-hyun commented on a change in pull request #28935: URL: https://github.com/apache/spark/pull/28935#discussion_r446595141 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/HiveNullType.scala ## @@ -0,0 +1,55 @@ [Apache license header elided; same as above] +package org.apache.spark.sql.types + +/** + * A hive null type for compatibility. These datatypes should only used for parsing, + * and should NOT be used anywhere else. Any instance of these data types should be + * replaced by a [[NullType]] before analysis. + */ +class HiveNullType private() extends DataType { Review comment: I know the context, but can we name this `HiveVoidType` literally?
[GitHub] [spark] xuanyuanking commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink to avoid memory issue
xuanyuanking commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-650684827 Very impressive! I'll review this in 2 days.
[GitHub] [spark] LantaoJin commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive
LantaoJin commented on a change in pull request #28935: URL: https://github.com/apache/spark/pull/28935#discussion_r446595075 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -2184,7 +2184,9 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging * Create a Spark DataType. */ private def visitSparkDataType(ctx: DataTypeContext): DataType = { -HiveStringType.replaceCharType(typedVisit(ctx)) +HiveNullType.replaceNullType( + HiveStringType.replaceCharType(typedVisit(ctx)) +) Review comment: Yes. Splitting it into two lines was just for readability. OK, I will collapse it into one line.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive
dongjoon-hyun commented on a change in pull request #28935: URL: https://github.com/apache/spark/pull/28935#discussion_r446594898 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -2260,7 +2263,9 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging // Add Hive type string to metadata. val rawDataType = typedVisit[DataType](ctx.dataType) -val cleanedDataType = HiveStringType.replaceCharType(rawDataType) +val cleanedDataType = HiveNullType.replaceNullType( + HiveStringType.replaceCharType(rawDataType) +) Review comment: ditto.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive
dongjoon-hyun commented on a change in pull request #28935: URL: https://github.com/apache/spark/pull/28935#discussion_r446594858 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -2184,7 +2184,9 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging * Create a Spark DataType. */ private def visitSparkDataType(ctx: DataTypeContext): DataType = { -HiveStringType.replaceCharType(typedVisit(ctx)) +HiveNullType.replaceNullType( + HiveStringType.replaceCharType(typedVisit(ctx)) +) Review comment: One line is enough, isn't it? ```scala HiveNullType.replaceNullType(HiveStringType.replaceCharType(typedVisit(ctx))) ```
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28895: [SPARK-32055][CORE][SQL] Unify getReader and getReaderForRange in ShuffleManager
dongjoon-hyun commented on a change in pull request #28895: URL: https://github.com/apache/spark/pull/28895#discussion_r446476096 ## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ## @@ -335,23 +335,6 @@ private[spark] abstract class MapOutputTracker(conf: SparkConf) extends Logging * tuples describing the shuffle blocks that are stored at that block manager. */ def getMapSizesByExecutorId( - shuffleId: Int, - startPartition: Int, - endPartition: Int) - : Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])] - - /** - * Called from executors to get the server URIs and output sizes for each shuffle block that - * needs to be read from a given range of map output partitions (startPartition is included but - * endPartition is excluded from the range) and is produced by - * a range of mappers (startMapIndex, endMapIndex, startMapIndex is included and - * the endMapIndex is excluded). Review comment: Hi, @Ngone51 . This should be the function description of the unified `getMapSizesByExecutorId`. Did I understand correctly? Or, could you add a comment about `startMapIndex` and `endMapIndex` and about when we don't care about that because of `actualEndMapIndex` (you don't need to mention this variable name specifically).
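The range semantics described in the removed doc comment (startPartition and startMapIndex included, endPartition and endMapIndex excluded) can be sketched with a plain filter. The record type and function name below are illustrative stand-ins, not Spark's actual `MapOutputTracker` API:

```scala
// Illustrative stand-in for a shuffle block record.
case class BlockInfo(mapIndex: Int, partition: Int, sizeBytes: Long)

// Half-open ranges on both dimensions: start is included, end is excluded.
def blocksForRange(
    blocks: Seq[BlockInfo],
    startPartition: Int, endPartition: Int,
    startMapIndex: Int, endMapIndex: Int): Seq[BlockInfo] =
  blocks.filter { b =>
    b.partition >= startPartition && b.partition < endPartition &&
    b.mapIndex >= startMapIndex && b.mapIndex < endMapIndex
  }
```

With this shape, the "read everything" case the unified method must also serve is just passing the full map-index range, which is presumably why the two methods could be merged.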
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28916: [SPARK-32083][SQL] Coalesce to one partition when all partitions are empty in AQE
AmplabJenkins removed a comment on pull request #28916: URL: https://github.com/apache/spark/pull/28916#issuecomment-650683327
[GitHub] [spark] AmplabJenkins commented on pull request #28916: [SPARK-32083][SQL] Coalesce to one partition when all partitions are empty in AQE
AmplabJenkins commented on pull request #28916: URL: https://github.com/apache/spark/pull/28916#issuecomment-650683327
[GitHub] [spark] SparkQA commented on pull request #28916: [SPARK-32083][SQL] Coalesce to one partition when all partitions are empty in AQE
SparkQA commented on pull request #28916: URL: https://github.com/apache/spark/pull/28916#issuecomment-650683117 **[Test build #124588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124588/testReport)** for PR 28916 at commit [`84031c1`](https://github.com/apache/spark/commit/84031c1642f2085028163a67e365c012c3b3a906).
[GitHub] [spark] dongjoon-hyun commented on pull request #28916: [SPARK-32083][SQL] Coalesce to one partition when all partitions are empty in AQE
dongjoon-hyun commented on pull request #28916: URL: https://github.com/apache/spark/pull/28916#issuecomment-650682797 Retest this please.
[GitHub] [spark] viirya commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
viirya commented on a change in pull request #27690: URL: https://github.com/apache/spark/pull/27690#discussion_r446593989 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala ## @@ -97,12 +99,38 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand { options = Map.empty) } - protected def getExternalTmpPath( + // Mostly copied from Context.java#getMRTmpPath of Hive 2.3. + // Visible for testing. + private[execution] def getNonBlobTmpPath( + hadoopConf: Configuration, + sessionScratchDir: String, + scratchDir: String): Path = { +// Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-1', +// which is ruled by 'hive.exec.scratchdir' including file system. +// This is the same as Spark's #oldVersionExternalTempPath. +// Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090. +// HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir' +// Here it uses session_path unless it's emtpy, otherwise uses scratchDir. +val sessionPath = if (!sessionScratchDir.isEmpty) sessionScratchDir else scratchDir +val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), hadoopConf, sessionPath) Review comment: When this new feature is enabled, it is possible that a scheme which doesn't work for this feature is used, e.g. the local scheme. If that happens and causes an error, end-users might not know how to deal with it. Because we don't know whether every scheme supports this feature, we use a list of schemes as the config value, instead of a boolean config. Similarly, I think we should not rely on the assumption that `fs.default.name` always works for this feature. Can we just restrict this feature to HDFS only?
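The scheme-allow-list idea in the comment above (a list of supported schemes as the config value, rather than a boolean flag) can be sketched as follows. The helper name and the fallback-to-`file` behavior are assumptions for this sketch, not the PR's actual code, which would consult `fs.default.name`:

```scala
import java.net.URI

// Return true only when the path's scheme is in the configured allow-list.
// A scheme-less path is treated as the local "file" scheme here — an
// assumption for this sketch, not Spark's real resolution logic.
def schemeSupported(path: String, supportedSchemes: Set[String]): Boolean = {
  val scheme = Option(new URI(path).getScheme).getOrElse("file")
  supportedSchemes.contains(scheme.toLowerCase)
}
```

Restricting the feature to HDFS, as viirya suggests, would then just mean shipping `Set("hdfs")` as the default.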
[GitHub] [spark] AmplabJenkins commented on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.
AmplabJenkins commented on pull request #28917: URL: https://github.com/apache/spark/pull/28917#issuecomment-650682148
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.
AmplabJenkins removed a comment on pull request #28917: URL: https://github.com/apache/spark/pull/28917#issuecomment-650682148
[GitHub] [spark] SparkQA commented on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.
SparkQA commented on pull request #28917: URL: https://github.com/apache/spark/pull/28917#issuecomment-650681799 **[Test build #124587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124587/testReport)** for PR 28917 at commit [`350fa8d`](https://github.com/apache/spark/commit/350fa8dc6b8ec0d9b28c5200cd287b48a22cfca0).
[GitHub] [spark] beliefer commented on a change in pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.
beliefer commented on a change in pull request #28917: URL: https://github.com/apache/spark/pull/28917#discussion_r446593625 ## File path: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ## @@ -278,7 +280,26 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi override def beforeEach(): Unit = { super.beforeEach() -init(new SparkConf()) + } + + override protected def test(testName: String, testTags: Tag*)(testFun: => Any) + (implicit pos: Position): Unit = { +testWithSparkConf(testName, testTags: _*)()(testFun)(pos) + } + + protected def testWithSparkConf(testName: String, testTags: Tag*) Review comment: OK
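The core of the `testWithSparkConf` idea in the diff above is merging test-specified configuration entries over the suite's defaults before the test body runs. A minimal sketch of that merge, with an illustrative helper name rather than the suite's real API:

```scala
// Merge test-specified entries over the suite's default configuration;
// test-level entries win on key collisions (illustrative helper, not the
// real DAGSchedulerSuite API).
def withSparkConf(overrides: (String, String)*)(
    defaults: Map[String, String]): Map[String, String] =
  defaults ++ overrides
```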
[GitHub] [spark] xianyinxin commented on a change in pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
xianyinxin commented on a change in pull request #28875: URL: https://github.com/apache/spark/pull/28875#discussion_r446593547 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala ## @@ -347,23 +347,23 @@ case class MergeIntoTable( } sealed abstract class MergeAction( -condition: Option[Expression]) extends Expression with Unevaluable { +val condition: Option[Expression]) extends Expression with Unevaluable { Review comment: done.
[GitHub] [spark] xianyinxin commented on a change in pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
xianyinxin commented on a change in pull request #28875: URL: https://github.com/apache/spark/pull/28875#discussion_r446593529 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -468,13 +458,25 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging throw new ParseException("There must be at least one WHEN clause in a MERGE statement", ctx) } // children being empty means that the condition is not set -if (matchedActions.length == 2 && matchedActions.head.children.isEmpty) { - throw new ParseException("When there are 2 MATCHED clauses in a MERGE statement, " + -"the first MATCHED clause must have a condition", ctx) -} -if (matchedActions.groupBy(_.getClass).mapValues(_.size).exists(_._2 > 1)) { +val matchedActionSize = matchedActions.length +if (matchedActionSize >= 2 && !matchedActions.init.forall(_.condition.nonEmpty)) { Review comment: Yes, it was a bug. ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala ## @@ -1134,58 +1134,53 @@ class DDLParserSuite extends AnalysisTest { } } - test("merge into table: at most two matched clauses") { -val exc = intercept[ParseException] { - parsePlan( -""" - |MERGE INTO testcat1.ns1.ns2.tbl AS target - |USING testcat2.ns1.ns2.tbl AS source - |ON target.col1 = source.col1 - |WHEN MATCHED AND (target.col2='delete') THEN DELETE - |WHEN MATCHED AND (target.col2='update1') THEN UPDATE SET target.col2 = source.col2 - |WHEN MATCHED AND (target.col2='update2') THEN UPDATE SET target.col2 = source.col2 - |WHEN NOT MATCHED AND (target.col2='insert') - |THEN INSERT (target.col1, target.col2) values (source.col1, source.col2) -""".stripMargin) -} - -assert(exc.getMessage.contains("There should be at most 2 'WHEN MATCHED' clauses.")) - } - - test("merge into table: at most one not matched clause") { -val exc = intercept[ParseException] { - parsePlan( -""" - |MERGE INTO testcat1.ns1.ns2.tbl AS target - |USING 
testcat2.ns1.ns2.tbl AS source - |ON target.col1 = source.col1 - |WHEN MATCHED AND (target.col2='delete') THEN DELETE - |WHEN MATCHED AND (target.col2='update1') THEN UPDATE SET target.col2 = source.col2 - |WHEN NOT MATCHED AND (target.col2='insert1') - |THEN INSERT (target.col1, target.col2) values (source.col1, source.col2) - |WHEN NOT MATCHED AND (target.col2='insert2') - |THEN INSERT (target.col1, target.col2) values (source.col1, source.col2) -""".stripMargin) -} - -assert(exc.getMessage.contains("There should be at most 1 'WHEN NOT MATCHED' clause.")) + test("merge into table: multi matched and not matched clauses") { +parseCompare( + """ +|MERGE INTO testcat1.ns1.ns2.tbl AS target +|USING testcat2.ns1.ns2.tbl AS source +|ON target.col1 = source.col1 +|WHEN MATCHED AND (target.col2='delete') THEN DELETE +|WHEN MATCHED AND (target.col2='update to 1') THEN UPDATE SET target.col2 = 1 +|WHEN MATCHED AND (target.col2='update to 2') THEN UPDATE SET target.col2 = 2 +|WHEN NOT MATCHED AND (target.col2='insert 1') +|THEN INSERT (target.col1, target.col2) values (source.col1, 1) +|WHEN NOT MATCHED AND (target.col2='insert 2') +|THEN INSERT (target.col1, target.col2) values (source.col1, 2) + """.stripMargin, + MergeIntoTable( +SubqueryAlias("target", UnresolvedRelation(Seq("testcat1", "ns1", "ns2", "tbl"))), +SubqueryAlias("source", UnresolvedRelation(Seq("testcat2", "ns1", "ns2", "tbl"))), +EqualTo(UnresolvedAttribute("target.col1"), UnresolvedAttribute("source.col1")), +Seq(DeleteAction(Some(EqualTo(UnresolvedAttribute("target.col2"), Literal("delete", + UpdateAction(Some(EqualTo(UnresolvedAttribute("target.col2"), Literal("update to 1"))), +Seq(Assignment(UnresolvedAttribute("target.col2"), Literal(1, + UpdateAction(Some(EqualTo(UnresolvedAttribute("target.col2"), Literal("update to 2"))), +Seq(Assignment(UnresolvedAttribute("target.col2"), Literal(2), +Seq(InsertAction(Some(EqualTo(UnresolvedAttribute("target.col2"), Literal("insert 1"))), + 
Seq(Assignment(UnresolvedAttribute("target.col1"), UnresolvedAttribute("source.col1")), +Assignment(UnresolvedAttribute("target.col2"), Literal(1, + InsertAction(Some(EqualTo(UnresolvedAttribute("target.col2"), Literal("insert 2"))), +Seq(Assignment(UnresolvedAttribute("target.col1"), UnresolvedAttribute("source.col1")), + Assignment(UnresolvedAttribute("target.col2"), Literal(2))) } - test("merge into table:
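The validation in the diff above (`matchedActions.init.forall(_.condition.nonEmpty)`) encodes that every MATCHED clause except possibly the last must carry a condition, since an earlier unconditioned clause would make all later ones unreachable. A self-contained sketch of that check, with an illustrative stand-in for the action type:

```scala
// Illustrative stand-in for a merge action with an optional condition.
case class MatchedAction(condition: Option[String])

// All matched clauses but the last must have a condition; an unconditional
// earlier clause would shadow every clause that follows it.
def matchedClausesValid(actions: Seq[MatchedAction]): Boolean =
  actions.length < 2 || actions.init.forall(_.condition.nonEmpty)
```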
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
AmplabJenkins removed a comment on pull request #28875: URL: https://github.com/apache/spark/pull/28875#issuecomment-650680512
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
AmplabJenkins removed a comment on pull request #28685: URL: https://github.com/apache/spark/pull/28685#issuecomment-650680524
[GitHub] [spark] AmplabJenkins commented on pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
AmplabJenkins commented on pull request #28875: URL: https://github.com/apache/spark/pull/28875#issuecomment-650680512
[GitHub] [spark] AmplabJenkins commented on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
AmplabJenkins commented on pull request #28685: URL: https://github.com/apache/spark/pull/28685#issuecomment-650680524
[GitHub] [spark] beliefer commented on pull request #28866: [SPARK-31845][CORE][TESTS] Refactor DAGSchedulerSuite by introducing completeAndCheckAnswer and using completeNextStageWithFetchFailure
beliefer commented on pull request #28866: URL: https://github.com/apache/spark/pull/28866#issuecomment-650680239 @dongjoon-hyun @Ngone51 Thanks for your help!
[GitHub] [spark] SparkQA commented on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
SparkQA commented on pull request #28685: URL: https://github.com/apache/spark/pull/28685#issuecomment-650680160 **[Test build #124586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124586/testReport)** for PR 28685 at commit [`47b68e7`](https://github.com/apache/spark/commit/47b68e753c20fb865967a583b1d377b2e7f744cf).
[GitHub] [spark] SparkQA commented on pull request #28875: [SPARK-32030][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
SparkQA commented on pull request #28875: URL: https://github.com/apache/spark/pull/28875#issuecomment-650680152 **[Test build #124585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124585/testReport)** for PR 28875 at commit [`d5edef3`](https://github.com/apache/spark/commit/d5edef3c2b950440614fc5c9ee1e770bcd0b9884).
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r446593095

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
## @@ -363,6 +363,11 @@ abstract class OffsetWindowFunction
    */
   val direction: SortDirection

+  /**
+   * Whether the offset is based on the entire frame.
+   */
+  val isWholeBased: Boolean = false

Review comment: I added this flag to distinguish `OffsetWindowFunctionFrame` and `FixedOffsetWindowFunctionFrame`.
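To make the distinction behind a flag like `isWholeBased` concrete, here is a standalone Scala sketch (the names are hypothetical, not Spark's internals): a relative offset moves with the current row, as in LEAD/LAG, while a whole-frame offset is anchored to the start of the partition, as in NTH_VALUE, so every row in the partition sees the same value.

```scala
// Sketch of the two offset semantics such a flag distinguishes.
object OffsetSemantics {
  // LEAD/LAG-style: the value `offset` rows away from the current row,
  // or None when the target falls outside the partition.
  def relativeOffset[T](rows: IndexedSeq[T], current: Int, offset: Int): Option[T] = {
    val i = current + offset
    if (i >= 0 && i < rows.length) Some(rows(i)) else None
  }

  // NTH_VALUE-style: the n-th row (1-based) of the whole partition,
  // independent of which row is current.
  def wholeFrameOffset[T](rows: IndexedSeq[T], n: Int): Option[T] =
    if (n >= 1 && n <= rows.length) Some(rows(n - 1)) else None

  def main(args: Array[String]): Unit = {
    val part = IndexedSeq("a", "b", "c", "d")
    // The relative answer changes per row; the whole-frame answer does not.
    assert(relativeOffset(part, current = 0, offset = 1).contains("b"))
    assert(relativeOffset(part, current = 2, offset = 1).contains("d"))
    assert(part.indices.forall(_ => wholeFrameOffset(part, 2).contains("b")))
    println("ok")
  }
}
```

The flag then lets the planner pick the matching frame implementation for each kind of function at execution time.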
[GitHub] [spark] LantaoJin commented on pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive
LantaoJin commented on pull request #28935: URL: https://github.com/apache/spark/pull/28935#issuecomment-650676333 Thanks @HyukjinKwon. If this gets merged, can you help on the Python side?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28935: [SPARK-20680][SQL] Adding HiveNullType in Spark to be compatible with Hive
AmplabJenkins removed a comment on pull request #28935: URL: https://github.com/apache/spark/pull/28935#issuecomment-650676056
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r446591904

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala
## @@ -151,10 +168,41 @@ final class OffsetWindowFunctionFrame(
     }
     inputIndex += 1
   }
+}

-  override def currentLowerBound(): Int = throw new UnsupportedOperationException()
+/**
+ * The fixed offset window frame calculates frames containing
+ * NTH_VALUE/FIRST_VALUE/LAST_VALUE statements.
+ * The fixed offset window frame returns the same value for all rows in the window partition.
+ */
+class FixedOffsetWindowFunctionFrame(
+    target: InternalRow,
+    ordinal: Int,
+    expressions: Array[OffsetWindowFunction],
+    inputSchema: Seq[Attribute],
+    newMutableProjection: (Seq[Expression], Seq[Attribute]) => MutableProjection,
+    offset: Int)
+  extends OffsetWindowFunctionFrameBase(
+    target, ordinal, expressions, inputSchema, newMutableProjection, offset) {

-  override def currentUpperBound(): Int = throw new UnsupportedOperationException()
+  var rowOption: Option[UnsafeRow] = None

Review comment: OK.
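The doc comment in the diff above says the fixed offset frame returns the same value for all rows in the partition. A minimal Scala sketch of that evaluate-once behaviour (hypothetical names, not the actual Spark classes): the frame resolves its target row a single time when the partition is prepared, then emits the cached value for every output row.

```scala
// Sketch: a fixed-offset frame caches its answer per partition.
final class FixedOffsetFrameSketch[T](offset: Int) {
  private var cached: Option[T] = None

  // Called once per partition, before any output row is written.
  def prepare(partition: IndexedSeq[T]): Unit =
    cached =
      if (offset >= 1 && offset <= partition.length) Some(partition(offset - 1))
      else None

  // Called for each output row; the result never depends on the row index.
  def write(rowIndex: Int): Option[T] = cached
}

object FixedOffsetFrameSketchDemo {
  def main(args: Array[String]): Unit = {
    val frame = new FixedOffsetFrameSketch[String](offset = 2)
    frame.prepare(IndexedSeq("a", "b", "c"))
    // Every row gets the same answer: the 2nd value of the partition.
    assert((0 until 3).forall(i => frame.write(i).contains("b")))
    println("ok")
  }
}
```

Caching one value per partition is what makes this frame cheaper than the LEAD/LAG-style frame, which must track a moving input index as it advances through the rows.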