[GitHub] [spark] SparkQA commented on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
SparkQA commented on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-631047525

**[Test build #122850 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122850/testReport)** for PR 28572 at commit [`e7664a1`](https://github.com/apache/spark/commit/e7664a11b7c6f14df0132e25316a1878792963c6).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk opened a new pull request #28582: [WIP][SPARK-31762][SQL] Fix perf regression of date/timestamp formatting in toHiveString
MaxGekk opened a new pull request #28582: URL: https://github.com/apache/spark/pull/28582

### What changes were proposed in this pull request?
Add new methods that accept date-time Java types to the DateFormatter and TimestampFormatter traits. The methods format input date-time instances to strings:
- TimestampFormatter:
  - `def format(ts: Timestamp): String`
  - `def format(instant: Instant): String`
- DateFormatter:
  - `def format(date: Date): String`
  - `def format(localDate: LocalDate): String`

### Why are the changes needed?
To avoid the unnecessary overhead of converting Java date-time types to micros/days before formatting. Also, the formatters currently have to convert the input micros/days back to Java types in order to pass instances to the standard library API.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By existing tests for toHiveString.
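The round trip the PR description mentions can be illustrated with plain `java.time`. This is a standalone sketch under stated assumptions, not Spark's actual `TimestampFormatter`; the class and method names are invented for illustration:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

public class FormatSketch {
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    // Direct path, analogous to the proposed format(instant: Instant) overload:
    // format the java.time value as-is.
    static String formatDirect(Instant instant) {
        return FMT.format(instant);
    }

    // Indirect path the PR avoids: convert the value to epoch microseconds,
    // then convert back to an Instant just to call the standard library.
    static String formatViaMicros(Instant instant) {
        long micros = ChronoUnit.MICROS.between(Instant.EPOCH, instant);
        Instant restored = Instant.EPOCH.plus(micros, ChronoUnit.MICROS);
        return FMT.format(restored);
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2020-05-19T12:34:56Z");
        System.out.println(formatDirect(ts));    // 2020-05-19 12:34:56
        System.out.println(formatViaMicros(ts)); // same string, two extra conversions
    }
}
```

Both paths produce the same string; the point of the change is that the conversion to micros/days and back is pure overhead on the `toHiveString` path.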
[GitHub] [spark] SparkQA removed a comment on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
SparkQA removed a comment on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630825781

**[Test build #122850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122850/testReport)** for PR 28572 at commit [`e7664a1`](https://github.com/apache/spark/commit/e7664a11b7c6f14df0132e25316a1878792963c6).
[GitHub] [spark] tashoyan commented on pull request #28491: [SPARK-30267][SQL] Interoperability tests with Avro records generated by Avro4s
tashoyan commented on pull request #28491: URL: https://github.com/apache/spark/pull/28491#issuecomment-631045701

Technically we could write case classes simulating the output of avro4s. This approach has a disadvantage: it gives no clue about which versions of avro4s we are compatible with. With the current approach we can see that avro4s 2.x is supported, but not 3.x. If one day Spark migrates to Avro 1.9 (from the currently used 1.8), compatibility with avro4s 2.x might break, at which point we can notify users about the broken compatibility and migrate to avro4s 3.x.
[GitHub] [spark] dbtsai closed pull request #28577: [SPARK-31399][CORE][2.4] Support indylambda Scala closure in ClosureCleaner
dbtsai closed pull request #28577: URL: https://github.com/apache/spark/pull/28577
[GitHub] [spark] dbtsai commented on pull request #28577: [SPARK-31399][CORE][2.4] Support indylambda Scala closure in ClosureCleaner
dbtsai commented on pull request #28577: URL: https://github.com/apache/spark/pull/28577#issuecomment-631034670

Merged into branch-2.4. Thank you, @rednaxelafx
[GitHub] [spark] dbtsai commented on pull request #28577: [SPARK-31399][CORE][2.4] Support indylambda Scala closure in ClosureCleaner
dbtsai commented on pull request #28577: URL: https://github.com/apache/spark/pull/28577#issuecomment-631033792

All the tests are passing in Scala 2.12 with JDK8 and JDK11 builds. I also tested the code in the description, and it works as expected.

- Scala 2.12 with JDK8
```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.5.14-apple-SNAPSHOT
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_252)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :paste
// Entering paste mode (ctrl-D to finish)

class NotSerializableClass(val x: Int)
val ns = new NotSerializableClass(42)
val topLevelValue = "someValue"
val func = (j: Int) => {
  (1 to j).flatMap { x =>
    (1 to x).map { y => y + topLevelValue }
  }
}

// Exiting paste mode, now interpreting.

defined class NotSerializableClass
ns: NotSerializableClass = NotSerializableClass@2769d577
topLevelValue: String = someValue
func: Int => scala.collection.immutable.IndexedSeq[String] = $Lambda$1751/481549862@25297d52

scala> sc.parallelize(0 to 2).map(func).collect
res0: Array[scala.collection.immutable.IndexedSeq[String]] = Array(Vector(), Vector(1someValue), Vector(1someValue, 1someValue, 2someValue))
```

- Scala 2.12 with JDK11
```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.5.14-jdk11-apple-SNAPSHOT
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.7)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :paste
// Entering paste mode (ctrl-D to finish)

class NotSerializableClass(val x: Int)
val ns = new NotSerializableClass(42)
val topLevelValue = "someValue"
val func = (j: Int) => {
  (1 to j).flatMap { x =>
    (1 to x).map { y => y + topLevelValue }
  }
}

// Exiting paste mode, now interpreting.

defined class NotSerializableClass
ns: NotSerializableClass = NotSerializableClass@199f2854
topLevelValue: String = someValue
func: Int => scala.collection.immutable.IndexedSeq[String] = $Lambda$1852/0x000800c2a040@5c9cbc69

scala> sc.parallelize(0 to 2).map(func).collect
res0: Array[scala.collection.immutable.IndexedSeq[String]] = Array(Vector(), Vector(1someValue), Vector(1someValue, 1someValue, 2someValue))
```
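The REPL session exercises ClosureCleaner because `func` is an indylambda closure that must be serialized before shipping to executors. A minimal JDK-only sketch of the underlying constraint (not Spark's ClosureCleaner logic; the class and interface names are invented for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class LambdaCapture {
    // Mirrors NotSerializableClass from the REPL session above.
    static class NotSerializableClass {
        final int x;
        NotSerializableClass(int x) { this.x = x; }
    }

    // A lambda is serializable only when its target interface extends
    // Serializable and everything it captures is itself serializable.
    interface SerFn extends Serializable { int apply(int i); }

    // Returns true if Java serialization of the closure succeeds.
    static boolean trySerialize(Object closure) {
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(closure);
            return true;
        } catch (IOException e) { // NotSerializableException is an IOException
            return false;
        }
    }

    public static void main(String[] args) {
        int offset = 10;                  // serializable capture: fine
        SerFn clean = i -> i + offset;
        System.out.println(trySerialize(clean));  // true

        NotSerializableClass ns = new NotSerializableClass(42);
        SerFn dirty = i -> i + ns.x;      // captures a non-serializable object
        System.out.println(trySerialize(dirty)); // false
    }
}
```

In the REPL session, `func` only references `topLevelValue`, and the point of the fix is that for indylambda closures ClosureCleaner must still be able to analyze what is actually captured so that objects like `ns` do not ride along via enclosing REPL objects.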
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
dongjoon-hyun edited a comment on pull request #26804: URL: https://github.com/apache/spark/pull/26804#issuecomment-630997965

@h-vetinari . This is wrong, isn't it? Did someone (except you) say it's low priority here? We want new Parquet, but currently it looks infeasible technically. Do you think that all infeasible things are low priority?

> I'm surprised (without criticism!) that this has a seemingly low priority
[GitHub] [spark] dongjoon-hyun commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
dongjoon-hyun commented on pull request #26804: URL: https://github.com/apache/spark/pull/26804#issuecomment-630997965

@h-vetinari . This is wrong, isn't it? Did someone (except you) say it's low priority here? We want that, but currently it looks infeasible technically. Do you think that all infeasible things are low priority?

> I'm surprised (without criticism!) that this has a seemingly low priority
[GitHub] [spark] dbtsai commented on pull request #28577: [SPARK-31399][CORE][2.4] Support indylambda Scala closure in ClosureCleaner
dbtsai commented on pull request #28577: URL: https://github.com/apache/spark/pull/28577#issuecomment-630986803

@rednaxelafx We have internal Spark 2.4 builds supporting JDK11, JDK8, Scala 2.12, and Scala 2.11. I just cherry-picked this PR, and ran the full tests. I'll update the result here.
[GitHub] [spark] igreenfield commented on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
igreenfield commented on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-630984555

@cloud-fan What is the problem with these tests?
[GitHub] [spark] tdas edited a comment on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode
tdas edited a comment on pull request #28523: URL: https://github.com/apache/spark/pull/28523#issuecomment-630975138

Without going into the nitty-gritty arguments about output modes (that requires a different venue), I am okay with the changes. I think there are two ways to move forward:
1. Merge this to master and branch-3.0.
2. Merge this to branch-3.0 only.

I think the major advantage of 1 over 2 is that the branches will stay in sync for the time being, which makes backporting of fixes etc. much easier. Furthermore, we will restore the reliability of the unit tests because they test the same thing that runs in production.

I think the only disadvantage of 1 over 2 is that, for the time being, even if marked internal, we are adding update mode back to the public DSv2 API in master.

Personally I think the advantage of 1 over 2 outweighs the disadvantage. The disadvantage is a relatively minor one because this can always be changed in master in a principled way after further discussion. Until then it's best to keep 3.0 and master in sync to minimize the impact on the Spark developer community, which is a much larger community than the DSv2 developer community (who anyway should know the risks of depending on unreleased master and internal APIs for developing their DSv2 sources). Hence I propose merging this PR to unblock 3.0 and not have API regressions in it.
[GitHub] [spark] tdas edited a comment on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode
tdas edited a comment on pull request #28523: URL: https://github.com/apache/spark/pull/28523#issuecomment-630975138

Without going into the nitty-gritty arguments about output modes (that requires a different venue), I am okay with the changes. I think there are two ways to move forward:
1. Merge this to master and branch-3.0.
2. Merge this to branch-3.0 only.

I think the major advantage of 1 over 2 is that the branches will stay in sync for the time being, which makes backporting of fixes etc. much easier. Furthermore, we will restore the reliability of the unit tests because they test the same thing that runs in production.

I think the only disadvantage of 1 over 2 is that, for the time being, even if marked internal, we are adding update mode back to the public DSv2 API in master.

Personally I think the advantage of 1 over 2 outweighs the disadvantage. The disadvantage is a relatively minor one because this can always be changed in master in a principled way after further discussion. Until then it's best to keep 3.0 and master in sync to minimize the impact on the Spark developer community, which is a much larger community than the DSv2 developer community (who anyway should know the risks of depending on unreleased master and internal APIs for developing their DSv2 sources).
[GitHub] [spark] tdas commented on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode
tdas commented on pull request #28523: URL: https://github.com/apache/spark/pull/28523#issuecomment-630975138

Without going into the nitty-gritty arguments about output modes (that requires a different venue), I am okay with the changes. I think there are two ways to move forward:
1. Merge this to master and branch-3.0.
2. Merge this to branch-3.0 only.

I think the major advantage of 1 over 2 is that the branches will stay in sync for the time being, which makes backporting of fixes etc. much easier. Furthermore, we will restore the reliability of the unit tests because they test the same thing that runs in production.

I think the only disadvantage of 1 over 2 is that, for the time being, even if marked internal, we are adding update mode back to the public DSv2 API in master.

Personally I think the advantage of 1 over 2 outweighs the disadvantage. The disadvantage is a relatively minor one because this can always be changed in master in a principled way after further discussion. Until then it's best to keep 3.0 and master in sync to minimize the impact on the Spark developer community, which is a much larger community than the DSv2 developer community (who anyway should know the risks of depending on unreleased master for developing their DSv2 sources).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28579: [SPARK-31757][CORE] Improve HistoryServerDiskManager.updateAccessTime()
AmplabJenkins removed a comment on pull request #28579: URL: https://github.com/apache/spark/pull/28579#issuecomment-630962041
[GitHub] [spark] AmplabJenkins commented on pull request #28579: [SPARK-31757][CORE] Improve HistoryServerDiskManager.updateAccessTime()
AmplabJenkins commented on pull request #28579: URL: https://github.com/apache/spark/pull/28579#issuecomment-630962041
[GitHub] [spark] SparkQA commented on pull request #28579: [SPARK-31757][CORE] Improve HistoryServerDiskManager.updateAccessTime()
SparkQA commented on pull request #28579: URL: https://github.com/apache/spark/pull/28579#issuecomment-630961488

**[Test build #122853 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122853/testReport)** for PR 28579 at commit [`f212f33`](https://github.com/apache/spark/commit/f212f33e9fe7f38e490f35f61e1e67fd466e5949).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28576: [SPARK-31755][SQL] allow missing year/hour when parsing date/timestamp string
AmplabJenkins removed a comment on pull request #28576: URL: https://github.com/apache/spark/pull/28576#issuecomment-630958815

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122851/
Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28576: [SPARK-31755][SQL] allow missing year/hour when parsing date/timestamp string
AmplabJenkins removed a comment on pull request #28576: URL: https://github.com/apache/spark/pull/28576#issuecomment-630958801

Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28576: [SPARK-31755][SQL] allow missing year/hour when parsing date/timestamp string
SparkQA commented on pull request #28576: URL: https://github.com/apache/spark/pull/28576#issuecomment-630958614

**[Test build #122851 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122851/testReport)** for PR 28576 at commit [`58443e2`](https://github.com/apache/spark/commit/58443e2d211a614c206ca16b657875209e5302e1).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28576: [SPARK-31755][SQL] allow missing year/hour when parsing date/timestamp string
AmplabJenkins commented on pull request #28576: URL: https://github.com/apache/spark/pull/28576#issuecomment-630958801
[GitHub] [spark] SparkQA removed a comment on pull request #28576: [SPARK-31755][SQL] allow missing year/hour when parsing date/timestamp string
SparkQA removed a comment on pull request #28576: URL: https://github.com/apache/spark/pull/28576#issuecomment-630893465

**[Test build #122851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122851/testReport)** for PR 28576 at commit [`58443e2`](https://github.com/apache/spark/commit/58443e2d211a614c206ca16b657875209e5302e1).
[GitHub] [spark] jiangxb1987 commented on pull request #28579: [SPARK-31757][CORE] Improve HistoryServerDiskManager.updateAccessTime()
jiangxb1987 commented on pull request #28579: URL: https://github.com/apache/spark/pull/28579#issuecomment-630958421

retest this please
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28581: KCL 2 support added to solve few ongoing issue with KCL 1 implementation
AmplabJenkins removed a comment on pull request #28581: URL: https://github.com/apache/spark/pull/28581#issuecomment-630942114

Can one of the admins verify this patch?
[GitHub] [spark] h-vetinari commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
h-vetinari commented on pull request #26804: URL: https://github.com/apache/spark/pull/26804#issuecomment-630945817

> @dongjoon-hyun: Please feel free to open a working PR. Then, the community will welcome.

Sorry if my message came across as demanding. I'm not deeply involved in the community here (yet?), nor in the respective code bases, but if someone as involved as @iemejia is stuck, I have little hope of making an impact in the current situation. The problem he outlines sounds like a very thorny issue that will need collaboration with other projects (HIVE, AVRO, PARQUET, etc.), and even knowing how OSS works, this seems like a problem on a scale that will require active maintainer involvement. So, coming back to what I wrote: I'm surprised (without criticism!) that this has a seemingly low priority, and I hope someone can find a way forward.
[GitHub] [spark] AmplabJenkins commented on pull request #28581: KCL 2 support added to solve few ongoing issue with KCL 1 implementation
AmplabJenkins commented on pull request #28581: URL: https://github.com/apache/spark/pull/28581#issuecomment-630945312

Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #28581: KCL 2 support added to solve few ongoing issue with KCL 1 implementation
AmplabJenkins commented on pull request #28581: URL: https://github.com/apache/spark/pull/28581#issuecomment-630942114

Can one of the admins verify this patch?
[GitHub] [spark] tprabh509 opened a new pull request #28581: KCL 2 support added to solve few ongoing issue with KCL 1 implementation
tprabh509 opened a new pull request #28581: URL: https://github.com/apache/spark/pull/28581

### What changes were proposed in this pull request?
KCL 1 is no longer supported by AWS, which has already moved to KCL 2. Since the Spark kinesis-asl library uses KCL 1, it is exposed to the issues AWS reports for KCL 1: https://github.com/awslabs/amazon-kinesis-client/issues/391
I have already reported the issue in JIRA: https://issues.apache.org/jira/browse/SPARK-31236

### Why are the changes needed?
This adds KCL 2 support; KCL 1 is not removed. With the current KCL 1 implementation the user can run into a few limitations:
1. The application cannot use a Kinesis direct endpoint. With KCL 2 we can use a custom URL.
2. A DynamoDB proxy cannot be used. KCL 2 adds support for a DynamoDB proxy.
3. A CloudWatch direct endpoint cannot be used. With KCL 2 we can use a custom URL.
As a result, the application cannot run without an internet connection or behind firewall restrictions.

### Does this PR introduce _any_ user-facing change?
Added KCL 2 support; did not remove KCL 1. It lifts the KCL 1 limitations listed above (Kinesis direct endpoint, DynamoDB proxy, CloudWatch direct endpoint).

### How was this patch tested?
Tested with our application and a test client updated for KCL 2:

```java
import org.apache.spark.streaming.kinesis2.KinesisInputDStream;
import org.apache.spark.streaming.kinesis2.SparkAWSCredentials;

SparkAWSCredentials credentials = SparkAWSCredentials.builder().basicCredentials(awsKey, awsSecret).build();
URI uri = new URI(endpointURL);
URI cloudWatchURI = new URI(cloudWatchURL);
InitialPositionInStream initPosition = InitialPositionInStream.TRIM_HORIZON;
KinesisInputDStream kinStream = KinesisInputDStream.builder()
        .streamingContext(jssc)
        .checkpointAppName(applicationName)
        .streamName(streamName)
        .regionName(regionName)
        .endpointUrl(uri)
        .cloudWatchUrl(cloudWatchURI)
        .kinesisCreds(credentials)
        .dynamoDBCreds(credentials)
        .maxRecords(maxRecords)
        .protocol(httpProtocol)
        .initialPositionInStream(initPosition)
        .cloudWatchCreds(credentials)
        .dynamoProxyHost(proxyHost)
        .dynamoProxyPort(proxyPort)
        .checkpointInterval(checkpointInterval)
        .storageLevel(StorageLevel.MEMORY_AND_DISK_2())
        .build();
```
[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-630938158

Sounds good, happy to help coordinate with any reviews needed. Would like us to be able to start using this in 3.1 :)
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same
AmplabJenkins removed a comment on pull request #28511: URL: https://github.com/apache/spark/pull/28511#issuecomment-630936518
[GitHub] [spark] AmplabJenkins commented on pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same as the
AmplabJenkins commented on pull request #28511: URL: https://github.com/apache/spark/pull/28511#issuecomment-630936518
[GitHub] [spark] SparkQA commented on pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same as the table
SparkQA commented on pull request #28511: URL: https://github.com/apache/spark/pull/28511#issuecomment-630934499

**[Test build #122847 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122847/testReport)** for PR 28511 at commit [`78e0972`](https://github.com/apache/spark/commit/78e097284096b27839928668a3deaf6a49cab336).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same as th
SparkQA removed a comment on pull request #28511: URL: https://github.com/apache/spark/pull/28511#issuecomment-630749618 **[Test build #122847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122847/testReport)** for PR 28511 at commit [`78e0972`](https://github.com/apache/spark/commit/78e097284096b27839928668a3deaf6a49cab336).
[GitHub] [spark] vinooganesh commented on a change in pull request #28128: [SPARK-31354] SparkContext only register one SparkSession ApplicationEnd listener
vinooganesh commented on a change in pull request #28128: URL: https://github.com/apache/spark/pull/28128#discussion_r427422859

## File path: core/src/main/scala/org/apache/spark/SparkContext.scala
## @@ -91,6 +91,7 @@ class SparkContext(config: SparkConf) extends Logging {
   val startTime = System.currentTimeMillis()
   private[spark] val stopped: AtomicBoolean = new AtomicBoolean(false)
+  private[spark] val sessionListenerRegistered: AtomicBoolean = new AtomicBoolean(false)

Review comment: @cloud-fan - updated, how does this look?
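The `AtomicBoolean` guard in the diff above follows a standard register-once pattern. A minimal standalone sketch of the idea (hypothetical names, not Spark's actual code): `compareAndSet` flips the flag exactly once, so the registration body runs at most once even if several threads race to call it.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical sketch of a register-once guard. compareAndSet(false, true)
// succeeds for exactly one caller; everyone else sees the flag already set.
class RegisterOnce {
  private val registered = new AtomicBoolean(false)

  // Returns true if `register` actually ran, false if it was skipped.
  def apply(register: () => Unit): Boolean =
    if (registered.compareAndSet(false, true)) {
      register()
      true
    } else {
      false
    }
}

object RegisterOnceDemo {
  def main(args: Array[String]): Unit = {
    val once = new RegisterOnce
    var listeners = 0
    once(() => listeners += 1)
    once(() => listeners += 1) // second call is a no-op
    println(listeners) // 1
  }
}
```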
[GitHub] [spark] vganesh-veraset commented on a change in pull request #28128: [SPARK-31354] SparkContext only register one SparkSession ApplicationEnd listener
vganesh-veraset commented on a change in pull request #28128: URL: https://github.com/apache/spark/pull/28128#discussion_r427421050

## File path: core/src/main/scala/org/apache/spark/SparkContext.scala
## @@ -91,6 +91,7 @@ class SparkContext(config: SparkConf) extends Logging {
   val startTime = System.currentTimeMillis()
   private[spark] val stopped: AtomicBoolean = new AtomicBoolean(false)
+  private[spark] val sessionListenerRegistered: AtomicBoolean = new AtomicBoolean(false)

Review comment: @cloud-fan - updated. how does this look?
[GitHub] [spark] iemejia commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
iemejia commented on pull request #26804: URL: https://github.com/apache/spark/pull/26804#issuecomment-630917979 @dongjoon-hyun You are absolutely right about no Hive with Avro 1.9 and that's the REAL problem. I don't think creating a PR that passes all UT (including Hive 1.2/2.3 profile) for Spark with Avro 1.9 is possible because Hive is leaking older versions of Avro that are not API compatible. I don't know how to deal with this. I tried to patch Hive [HIVE-21737](https://issues.apache.org/jira/browse/HIVE-21737) for this but was blocked on testing issues there, but the issue is also that even if merged we need them to backport the fix back to version 2.x (Hive in master is already in version 4.x). Notice that the Avro upgrade addresses also various security issues in its deps that are still leaking and present on Spark (yes jackson among others). I really want this to happen to get Avro 1.9.x downstream but it feels we are somehow locked because of Hive. If you or anyone can suggest how to do this, I will be more than glad to help with what I can. Also if someone knows someone at the Hive project who can care about this, maybe that would be another big help. CC: @kgyrtkirk for eventual comments/suggestions because he tried to help me in the Hive side.
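The "Hive leaking older versions of Avro" problem described above is a transitive-dependency conflict. In builds that hit this class of problem, one common (if blunt) mitigation is to exclude the transitive artifact and pin the version explicitly. A hypothetical `build.sbt` fragment for illustration only (coordinates and versions are made up, and as the comment notes, this only helps when the pinned version is API compatible with what Hive actually calls, which is exactly what fails here):

```scala
// Hypothetical build.sbt fragment: drop the Avro that hive-exec drags in
// transitively, then force one Avro version for the whole build.
libraryDependencies += ("org.apache.hive" % "hive-exec" % "2.3.7")
  .exclude("org.apache.avro", "avro")

// Pin Avro so every remaining dependency resolves to the same version.
dependencyOverrides += "org.apache.avro" % "avro" % "1.9.2"
```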
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
AmplabJenkins removed a comment on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630912501
[GitHub] [spark] AmplabJenkins commented on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
AmplabJenkins commented on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630912501
[GitHub] [spark] SparkQA removed a comment on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
SparkQA removed a comment on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630709892 **[Test build #122844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122844/testReport)** for PR 28572 at commit [`bc0bbec`](https://github.com/apache/spark/commit/bc0bbeca2350d47fe9531fe03c40af55e0f4ae2c).
[GitHub] [spark] SparkQA commented on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
SparkQA commented on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630911354

**[Test build #122844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122844/testReport)** for PR 28572 at commit [`bc0bbec`](https://github.com/apache/spark/commit/bc0bbeca2350d47fe9531fe03c40af55e0f4ae2c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] bart-samwel commented on a change in pull request #28576: [SPARK-31755][SQL] allow missing year/hour when parsing date/timestamp string
bart-samwel commented on a change in pull request #28576: URL: https://github.com/apache/spark/pull/28576#discussion_r427401584

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
## @@ -31,17 +31,50 @@
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.internal.SQLConf.LegacyBehaviorPolicy._

trait DateTimeFormatterHelper {
+  private def getOrDefault(accessor: TemporalAccessor, field: ChronoField, default: Int): Int = {
+    if (accessor.isSupported(field)) {
+      accessor.get(field)
+    } else {
+      default
+    }
+  }
+
+  protected def toLocalDate(accessor: TemporalAccessor, allowMissingYear: Boolean): LocalDate = {
+    val year = if (accessor.isSupported(ChronoField.YEAR)) {
+      accessor.get(ChronoField.YEAR)
+    } else if (allowMissingYear) {
+      // To keep backward compatibility with Spark 2.x, we pick 1970 as the default value of year.
+      1970
+    } else {
+      throw new SparkUpgradeException("3.0",
+        "Year must be given in the date/timestamp string to be parsed. You can set " +

Review comment: Maybe also suggest the alternative workaround, e.g. prepending `'1970 '` to the string-to-be-parsed and prepending `'yyyy '` to the format string. That one works without the legacy setting, so I'd say it's preferred.
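The prepend-a-year workaround bart-samwel suggests can be illustrated outside Spark with plain `java.time` (a sketch of the idea, not Spark's parser; the `uuuu` pattern letter is used here instead of `yyyy` to map directly onto the proleptic year and sidestep era resolution):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object PrependYearDemo {
  def main(args: Array[String]): Unit = {
    // The input has no year: "05-18" would be parsed with pattern "MM-dd".
    // Prepending a literal year to both the value and the pattern makes it
    // parseable without any legacy flag; 1970 mirrors the old 2.x default.
    val raw = "05-18"
    val fmt = DateTimeFormatter.ofPattern("uuuu MM-dd")
    val parsed = LocalDate.parse("1970 " + raw, fmt)
    println(parsed) // 1970-05-18
  }
}
```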
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
## @@ -31,17 +31,50 @@
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.internal.SQLConf.LegacyBehaviorPolicy._

trait DateTimeFormatterHelper {
+  private def getOrDefault(accessor: TemporalAccessor, field: ChronoField, default: Int): Int = {
+    if (accessor.isSupported(field)) {
+      accessor.get(field)
+    } else {
+      default
+    }
+  }
+
+  protected def toLocalDate(accessor: TemporalAccessor, allowMissingYear: Boolean): LocalDate = {
+    val year = if (accessor.isSupported(ChronoField.YEAR)) {
+      accessor.get(ChronoField.YEAR)
+    } else if (allowMissingYear) {
+      // To keep backward compatibility with Spark 2.x, we pick 1970 as the default value of year.
+      1970
+    } else {
+      throw new SparkUpgradeException("3.0",
+        "Year must be given in the date/timestamp string to be parsed. You can set " +
+        SQLConf.LEGACY_ALLOW_MISSING_YEAR_DURING_PARSING.key + " to true, to pick 1970 as " +
+        "the default value of year.", null)
+    }
+    val month = getOrDefault(accessor, ChronoField.MONTH_OF_YEAR, 1)

Review comment: Are we also going to error out if they specify the day but not the month? Really, the only formats that make sense are the ones where a full prefix is given in the y-m-d h-m-s sequence, and all others are likely to be a case where they made a mistake (e.g. asked for "mm" twice where they meant MM).

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
## @@ -2586,6 +2586,15 @@ object SQLConf {
       .checkValue(_ > 0, "The timeout value must be positive")
       .createWithDefault(10L)

+  val LEGACY_ALLOW_MISSING_YEAR_DURING_PARSING =
+    buildConf("spark.sql.legacy.allowMissingYearDuringParsing")
+      .internal()
+      .doc("When true, DateFormatter/TimestampFormatter allows parsing date/timestamp string " +

Review comment: Here too, you could suggest the alternative workaround that doesn't require setting the legacy flag.
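The `getOrDefault` pattern in the diff can be exercised directly against `java.time` (a standalone sketch, not the Spark code under review): fields the pattern did not capture fall back to a default, e.g. day-of-month defaults to 1.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.time.temporal.{ChronoField, TemporalAccessor}

object DefaultingFields {
  // Same shape as the getOrDefault helper in the diff above: read the field
  // if the parsed accessor carries it, otherwise fall back to the default.
  def getOrDefault(acc: TemporalAccessor, field: ChronoField, default: Int): Int =
    if (acc.isSupported(field)) acc.get(field) else default

  def main(args: Array[String]): Unit = {
    // "uuuu-MM" captures year and month but no day, so the parsed
    // TemporalAccessor does not support DAY_OF_MONTH.
    val acc = DateTimeFormatter.ofPattern("uuuu-MM").parse("2020-05")
    val date = LocalDate.of(
      acc.get(ChronoField.YEAR),
      acc.get(ChronoField.MONTH_OF_YEAR),
      getOrDefault(acc, ChronoField.DAY_OF_MONTH, 1))
    println(date) // 2020-05-01
  }
}
```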
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28540: [SPARK-31719][SQL] Refactor JoinSelection
AmplabJenkins removed a comment on pull request #28540: URL: https://github.com/apache/spark/pull/28540#issuecomment-630905053 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122843/ Test FAILed.
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
dongjoon-hyun edited a comment on pull request #26804: URL: https://github.com/apache/spark/pull/26804#issuecomment-630904663 @h-vetinari . Parquet is a de-facto standard in Apache Spark and is related to all the other module. That's the reason why Parquet should not break anything in all the other Spark modules. It's the same for the other libraries. Apache Spark uses Apache Hadoop 2.7.3/2.7.4 for a long time and still it's the default Hadoop. Apache Spark uses unofficial Hive 1.2.1 fork for a long time and still couldn't remove it. Please feel free to open a working PR. Then, the community will welcome. BTW, we are in Apache Spark community. For the other community issues, please ping them.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28540: [SPARK-31719][SQL] Refactor JoinSelection
AmplabJenkins removed a comment on pull request #28540: URL: https://github.com/apache/spark/pull/28540#issuecomment-630905040 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28540: [SPARK-31719][SQL] Refactor JoinSelection
AmplabJenkins commented on pull request #28540: URL: https://github.com/apache/spark/pull/28540#issuecomment-630905040
[GitHub] [spark] SparkQA removed a comment on pull request #28540: [SPARK-31719][SQL] Refactor JoinSelection
SparkQA removed a comment on pull request #28540: URL: https://github.com/apache/spark/pull/28540#issuecomment-630686085 **[Test build #122843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122843/testReport)** for PR 28540 at commit [`10da76a`](https://github.com/apache/spark/commit/10da76adc4710e8035cd302ddcd168707341d168).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
AmplabJenkins removed a comment on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630903846
[GitHub] [spark] dongjoon-hyun commented on pull request #26804: [SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
dongjoon-hyun commented on pull request #26804: URL: https://github.com/apache/spark/pull/26804#issuecomment-630904663 @h-vetinari . Parquet is a de-facto standard in Apache Spark and is related to all the other module. That's the reason why Parquet should not break anything in all the other Spark modules. It's the same for the other libraries. Apache Spark uses Apache Hadoop 2.7.3/2.7.4 for a long time and still it's the default Hadoop. Apache Spark uses unofficial Hive 1.2.1 fork for a long time and still couldn't remove it. Please feel free to open a working PR. Then, the community will welcome.
[GitHub] [spark] SparkQA commented on pull request #28540: [SPARK-31719][SQL] Refactor JoinSelection
SparkQA commented on pull request #28540: URL: https://github.com/apache/spark/pull/28540#issuecomment-630904087

**[Test build #122843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122843/testReport)** for PR 28540 at commit [`10da76a`](https://github.com/apache/spark/commit/10da76adc4710e8035cd302ddcd168707341d168).
* This patch **fails from timeout after a configured wait of `400m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
AmplabJenkins commented on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630903846
[GitHub] [spark] SparkQA commented on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
SparkQA commented on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630903093 **[Test build #122852 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122852/testReport)** for PR 28572 at commit [`e7664a1`](https://github.com/apache/spark/commit/e7664a11b7c6f14df0132e25316a1878792963c6).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
AmplabJenkins removed a comment on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-630901505 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122849/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
AmplabJenkins removed a comment on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-630901494 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
AmplabJenkins commented on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-630901494
[GitHub] [spark] Ngone51 commented on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
Ngone51 commented on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630901730 retest this please
[GitHub] [spark] SparkQA removed a comment on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
SparkQA removed a comment on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-630796651 **[Test build #122849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122849/testReport)** for PR 26624 at commit [`d5c1aa9`](https://github.com/apache/spark/commit/d5c1aa97bd69a25de5de03ec79284f39dace3198).
[GitHub] [spark] SparkQA commented on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
SparkQA commented on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-630900818

**[Test build #122849 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122849/testReport)** for PR 26624 at commit [`d5c1aa9`](https://github.com/apache/spark/commit/d5c1aa97bd69a25de5de03ec79284f39dace3198).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] holdenk commented on pull request #28577: [SPARK-31399][CORE][2.4] Support indylambda Scala closure in ClosureCleaner
holdenk commented on pull request #28577: URL: https://github.com/apache/spark/pull/28577#issuecomment-630896567 Thank you for backporting this. I’m off today and tomorrow for health reasons, but if no one has time to review by Thursday I’ll take a look :)
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28576: [SPARK-31755][SQL] allow missing year/hour when parsing date/timestamp string
AmplabJenkins removed a comment on pull request #28576: URL: https://github.com/apache/spark/pull/28576#issuecomment-630894202
[GitHub] [spark] AmplabJenkins commented on pull request #28576: [SPARK-31755][SQL] allow missing year/hour when parsing date/timestamp string
AmplabJenkins commented on pull request #28576: URL: https://github.com/apache/spark/pull/28576#issuecomment-630894202
[GitHub] [spark] SparkQA commented on pull request #28576: [SPARK-31755][SQL] allow missing year/hour when parsing date/timestamp string
SparkQA commented on pull request #28576: URL: https://github.com/apache/spark/pull/28576#issuecomment-630893465 **[Test build #122851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122851/testReport)** for PR 28576 at commit [`58443e2`](https://github.com/apache/spark/commit/58443e2d211a614c206ca16b657875209e5302e1).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28580: [SPARK-31759][Deploy] Support configurable max number of rotate logs for spark daemons
AmplabJenkins removed a comment on pull request #28580: URL: https://github.com/apache/spark/pull/28580#issuecomment-630877166
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
AmplabJenkins removed a comment on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630876764 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122842/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28580: [SPARK-31759][Deploy] Support configurable max number of rotate logs for spark daemons
AmplabJenkins commented on pull request #28580: URL: https://github.com/apache/spark/pull/28580#issuecomment-630877166
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
AmplabJenkins removed a comment on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630876740 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
AmplabJenkins commented on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630876740
[GitHub] [spark] SparkQA removed a comment on pull request #28580: [SPARK-31759][Deploy] Support configurable max number of rotate logs for spark daemons
SparkQA removed a comment on pull request #28580: URL: https://github.com/apache/spark/pull/28580#issuecomment-630720005 **[Test build #122845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122845/testReport)** for PR 28580 at commit [`311d0f3`](https://github.com/apache/spark/commit/311d0f389452973c6fc8b7cfa332bc0a2210c84d).
[GitHub] [spark] SparkQA commented on pull request #28580: [SPARK-31759][Deploy] Support configurable max number of rotate logs for spark daemons
SparkQA commented on pull request #28580: URL: https://github.com/apache/spark/pull/28580#issuecomment-630875695

**[Test build #122845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122845/testReport)** for PR 28580 at commit [`311d0f3`](https://github.com/apache/spark/commit/311d0f389452973c6fc8b7cfa332bc0a2210c84d).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
SparkQA removed a comment on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630662267 **[Test build #122842 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122842/testReport)** for PR 28572 at commit [`6b70e77`](https://github.com/apache/spark/commit/6b70e778266b90bdd8ab5b3190c07168f4a12caf).
[GitHub] [spark] SparkQA commented on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
SparkQA commented on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630875386

**[Test build #122842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122842/testReport)** for PR 28572 at commit [`6b70e77`](https://github.com/apache/spark/commit/6b70e778266b90bdd8ab5b3190c07168f4a12caf).

* This patch **fails from timeout after a configured wait of `400m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on pull request #28577: [SPARK-31399][CORE][2.4] Support indylambda Scala closure in ClosureCleaner
dongjoon-hyun commented on pull request #28577: URL: https://github.com/apache/spark/pull/28577#issuecomment-630871278 Thank you, @rednaxelafx. cc @holdenk since she is a release manager of Apache Spark 2.4.6.
[GitHub] [spark] srowen closed pull request #27525: [MINOR] update dstream.py with more accurate exceptions
srowen closed pull request #27525: URL: https://github.com/apache/spark/pull/27525
[GitHub] [spark] srowen commented on pull request #27525: [MINOR] update dstream.py with more accurate exceptions
srowen commented on pull request #27525: URL: https://github.com/apache/spark/pull/27525#issuecomment-630868876 Reopen if you update this.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28534: [SPARK-31710][SQL]TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS to timestamp transfer
AmplabJenkins removed a comment on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-630858130 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122848/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28534: [SPARK-31710][SQL]TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS to timestamp transfer
AmplabJenkins removed a comment on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-630858115 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28534: [SPARK-31710][SQL]TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS to timestamp transfer
SparkQA removed a comment on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-630789294 **[Test build #122848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122848/testReport)** for PR 28534 at commit [`0355ce4`](https://github.com/apache/spark/commit/0355ce4f788cde24c82473741af339d47095bdfc).
[GitHub] [spark] AmplabJenkins commented on pull request #28534: [SPARK-31710][SQL]TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS to timestamp transfer
AmplabJenkins commented on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-630858115
[GitHub] [spark] SparkQA commented on pull request #28534: [SPARK-31710][SQL]TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS to timestamp transfer
SparkQA commented on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-630857909

**[Test build #122848 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122848/testReport)** for PR 28534 at commit [`0355ce4`](https://github.com/apache/spark/commit/0355ce4f788cde24c82473741af339d47095bdfc).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class SecondsToTimestamp(child: Expression)`
  * `case class MilliSecondsToTimestamp(child: Expression)`
  * `case class MicroSecondsToTimestamp(child: Expression)`
[GitHub] [spark] cloud-fan commented on a change in pull request #28576: [SPARK-31755][SQL] allow missing year/hour when parsing date/timestamp string
cloud-fan commented on a change in pull request #28576: URL: https://github.com/apache/spark/pull/28576#discussion_r427337528

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/util/TimestampFormatterSuite.scala

@@ -291,4 +291,95 @@ class TimestampFormatterSuite extends SparkFunSuite with SQLHelper with Matchers
     }
   }
 }
+
+  test("parsing hour with various patterns") {
+    def createFormatter(pattern: String): TimestampFormatter = {
+      // Use `SIMPLE_DATE_FORMAT`, so that the legacy parser also fails with invalid value range.
+      TimestampFormatter(pattern, ZoneOffset.UTC, LegacyDateFormats.SIMPLE_DATE_FORMAT, false)
+    }
+
+    withClue("HH") {
+      val formatter = createFormatter("yyyy-MM-dd HH")
+
+      val micros1 = formatter.parse("2009-12-12 00")
+      assert(micros1 === TimeUnit.SECONDS.toMicros(
+        LocalDateTime.of(2009, 12, 12, 0, 0, 0).toEpochSecond(ZoneOffset.UTC)))
+
+      val micros2 = formatter.parse("2009-12-12 15")
+      assert(micros2 === TimeUnit.SECONDS.toMicros(
+        LocalDateTime.of(2009, 12, 12, 15, 0, 0).toEpochSecond(ZoneOffset.UTC)))

Review comment: It's a UT, which is whitebox testing. It's clear from the code that the time parsing is unrelated to the value of the year.
[GitHub] [spark] cloud-fan commented on a change in pull request #28576: [SPARK-31755][SQL] allow missing year/hour when parsing date/timestamp string
cloud-fan commented on a change in pull request #28576: URL: https://github.com/apache/spark/pull/28576#discussion_r427054501

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala

@@ -72,18 +94,14 @@ trait DateTimeFormatterHelper
   // DateTimeParseException will address by the caller side.
   protected def checkDiffResult[T](
       s: String, legacyParseFunc: String => T): PartialFunction[Throwable, T] = {
-    case e: DateTimeParseException if SQLConf.get.legacyTimeParserPolicy == EXCEPTION =>
-      val res = try {
-        Some(legacyParseFunc(s))
-      } catch {
-        case _: Throwable => None
-      }
-      if (res.nonEmpty) {
+    case e: DateTimeException if SQLConf.get.legacyTimeParserPolicy == EXCEPTION =>

Review comment: when the field value exceeds the valid range, the JDK throws `DateTimeException`, we should catch it. `DateTimeParseException` extends `DateTimeException`.
[GitHub] [spark] cloud-fan commented on a change in pull request #28576: [SPARK-31755][SQL] allow missing year/hour when parsing date/timestamp string
cloud-fan commented on a change in pull request #28576: URL: https://github.com/apache/spark/pull/28576#discussion_r427054501

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala

@@ -72,18 +94,14 @@ trait DateTimeFormatterHelper
   // DateTimeParseException will address by the caller side.
   protected def checkDiffResult[T](
       s: String, legacyParseFunc: String => T): PartialFunction[Throwable, T] = {
-    case e: DateTimeParseException if SQLConf.get.legacyTimeParserPolicy == EXCEPTION =>
-      val res = try {
-        Some(legacyParseFunc(s))
-      } catch {
-        case _: Throwable => None
-      }
-      if (res.nonEmpty) {
+    case e: DateTimeException if SQLConf.get.legacyTimeParserPolicy == EXCEPTION =>

Review comment: when the field value exceeds the valid range, the JDK throws `DateTimeException`, and we should catch it too. `DateTimeParseException` extends `DateTimeException`.
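The subclass relationship the comment relies on can be checked with the plain JDK. A minimal Java sketch (JDK-only, not Spark's `DateTimeFormatterHelper`) showing why a single `case e: DateTimeException` branch covers both parse errors and out-of-range field values:

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

public class ExceptionHierarchy {
    public static void main(String[] args) {
        // DateTimeParseException extends DateTimeException, so catching the
        // parent type also catches parse failures.
        System.out.println(DateTimeException.class.isAssignableFrom(DateTimeParseException.class)); // true

        try {
            LocalDate.of(2020, 13, 1); // month out of the valid range 1..12
        } catch (DateTimeException e) {
            System.out.println("caught: " + e.getClass().getSimpleName());
        }
    }
}
```

Out-of-range construction throws the parent `DateTimeException` directly, while `DateTimeFormatter.parse` throws the `DateTimeParseException` subclass, which is exactly the distinction the PR's widened `catch` exploits.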
[GitHub] [spark] cloud-fan commented on a change in pull request #28576: [SPARK-31755][SQL] allow missing year/hour when parsing date/timestamp string
cloud-fan commented on a change in pull request #28576: URL: https://github.com/apache/spark/pull/28576#discussion_r427333877

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala

@@ -31,17 +31,39 @@ import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.internal.SQLConf.LegacyBehaviorPolicy._
 
 trait DateTimeFormatterHelper {
+  private def getOrDefault(accessor: TemporalAccessor, field: ChronoField, default: Int): Int = {
+    if (accessor.isSupported(field)) {
+      accessor.get(field)
+    } else {
+      default
+    }
+  }
+
+  protected def toLocalDate(temporalAccessor: TemporalAccessor): LocalDate = {
+    val year = getOrDefault(temporalAccessor, ChronoField.YEAR, 1970)

Review comment: `TemporalAccessor.get` is not getting the field, it's actually a query. So for `G`, we can still query the `YEAR` field because both `YEAR_OF_ERA` and `ERA` are available.
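The "query, not get" behavior of `TemporalAccessor` can be seen with the plain JDK. A small Java sketch of the same `getOrDefault` pattern (JDK-only; the helper name mirrors the diff above but is defined locally):

```java
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

public class DefaultFields {
    // Fall back to a default when the parsed result cannot answer the query.
    static int getOrDefault(TemporalAccessor accessor, ChronoField field, int dflt) {
        return accessor.isSupported(field) ? accessor.get(field) : dflt;
    }

    public static void main(String[] args) {
        // A pattern with no year letter at all: YEAR is not queryable on the result.
        TemporalAccessor parsed = DateTimeFormatter.ofPattern("MM-dd").parse("12-25");
        System.out.println(getOrDefault(parsed, ChronoField.YEAR, 1970));      // defaulted
        System.out.println(getOrDefault(parsed, ChronoField.DAY_OF_MONTH, 1)); // parsed
    }
}
```

When the pattern does include an era (`G`) and year-of-era, the resolved accessor can answer a `YEAR` query even though `YEAR` itself was never a pattern letter, which is the point the reviewer makes.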
[GitHub] [spark] cloud-fan closed pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same as the table
cloud-fan closed pull request #28511: URL: https://github.com/apache/spark/pull/28511
[GitHub] [spark] cloud-fan commented on pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same as the tabl
cloud-fan commented on pull request #28511: URL: https://github.com/apache/spark/pull/28511#issuecomment-630843338 This fixes a bug for a corner case, when table and partition locations are in different file systems. I'm merging it to master only, to reduce risk. Thanks!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
AmplabJenkins removed a comment on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630838249 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122838/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
AmplabJenkins removed a comment on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630838241 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
AmplabJenkins commented on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630838241
[GitHub] [spark] SparkQA removed a comment on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
SparkQA removed a comment on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630633507 **[Test build #122838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122838/testReport)** for PR 28572 at commit [`8fe0490`](https://github.com/apache/spark/commit/8fe049068e7a52235afb79c97db4da6492a4a22a).
[GitHub] [spark] SparkQA commented on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
SparkQA commented on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630837013

**[Test build #122838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122838/testReport)** for PR 28572 at commit [`8fe0490`](https://github.com/apache/spark/commit/8fe049068e7a52235afb79c97db4da6492a4a22a).

* This patch **fails from timeout after a configured wait of `400m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
AmplabJenkins removed a comment on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630826358
[GitHub] [spark] AmplabJenkins commented on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
AmplabJenkins commented on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630826358
[GitHub] [spark] SparkQA commented on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
SparkQA commented on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-630825781 **[Test build #122850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122850/testReport)** for PR 28572 at commit [`e7664a1`](https://github.com/apache/spark/commit/e7664a11b7c6f14df0132e25316a1878792963c6).
[GitHub] [spark] Ngone51 commented on a change in pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
Ngone51 commented on a change in pull request #28572: URL: https://github.com/apache/spark/pull/28572#discussion_r427308224

## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

@@ -2439,6 +2439,17 @@ class DataFrameSuite extends QueryTest
     val nestedDecArray = Array(decSpark)
     checkAnswer(Seq(nestedDecArray).toDF(), Row(Array(wrapRefArray(decJava
   }
+
+  test("SPARK-31750: eliminate UpCast if child's dataType is DecimalType") {
+    withTempPath { f =>
+      sql("select cast(11 as decimal(38, 0)) as d")

Review comment: Yes, I've changed it to `1` to simplify the test.
[GitHub] [spark] cloud-fan commented on pull request #28576: [SPARK-31755][SQL] allow missing year/hour when parsing date/timestamp string
cloud-fan commented on pull request #28576: URL: https://github.com/apache/spark/pull/28576#issuecomment-630820332

> The reason being that it is not a leap year, which means that it would never parse Feb 29.

This is a good point. Now I agree 1970 is not a good default. I'll fail it by default, with a legacy config to use 1970 as the default.
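The Feb 29 failure mode is easy to reproduce with `java.time` directly. A hedged sketch (plain JDK, using `parseDefaulting` rather than Spark's own default-filling logic) showing why a non-leap default year rejects "02-29" under strict resolution:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.time.temporal.ChronoField;

public class LeapDefault {
    // Month-day formatter that fills in the given year when none is parsed.
    static DateTimeFormatter withDefaultYear(int year) {
        return new DateTimeFormatterBuilder()
            .appendPattern("MM-dd")
            .parseDefaulting(ChronoField.YEAR, year)
            .toFormatter()
            .withResolverStyle(ResolverStyle.STRICT);
    }

    public static void main(String[] args) {
        // 2000 is a leap year: Feb 29 resolves to a real date.
        System.out.println(LocalDate.parse("02-29", withDefaultYear(2000))); // 2000-02-29

        // 1970 is not: strict resolution rejects Feb 29.
        try {
            LocalDate.parse("02-29", withDefaultYear(1970));
        } catch (DateTimeParseException e) {
            System.out.println("Feb 29 is invalid with default year 1970");
        }
    }
}
```

This is the behavior the reviewer objects to: with 1970 as the silent default, an otherwise valid month-day input becomes unparseable, hence the decision to fail by default behind a legacy config.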
[GitHub] [spark] cloud-fan commented on a change in pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
cloud-fan commented on a change in pull request #28572: URL: https://github.com/apache/spark/pull/28572#discussion_r427300529

## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

@@ -2439,6 +2439,17 @@ class DataFrameSuite extends QueryTest
     val nestedDecArray = Array(decSpark)
     checkAnswer(Seq(nestedDecArray).toDF(), Row(Array(wrapRefArray(decJava
   }
+
+  test("SPARK-31750: eliminate UpCast if child's dataType is DecimalType") {
+    withTempPath { f =>
+      sql("select cast(11 as decimal(38, 0)) as d")

Review comment: this test can still reproduce the bug even if we use `1` instead of `...`?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same
AmplabJenkins removed a comment on pull request #28511: URL: https://github.com/apache/spark/pull/28511#issuecomment-630805536
[GitHub] [spark] AmplabJenkins commented on pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same as the
AmplabJenkins commented on pull request #28511: URL: https://github.com/apache/spark/pull/28511#issuecomment-630805536
[GitHub] [spark] cloud-fan commented on a change in pull request #28534: [SPARK-31710][SQL]TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS to timestamp transfer
cloud-fan commented on a change in pull request #28534: URL: https://github.com/apache/spark/pull/28534#discussion_r427286816

## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala

@@ -3495,6 +3495,28 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
     assert(df4.schema.head.name === "randn(1)")
     checkIfSeedExistsInExplain(df2)
   }
+
+  test("SPARK-31710: " +

Review comment: let's move the tests to `datetime.sql`.
[GitHub] [spark] cloud-fan commented on a change in pull request #28534: [SPARK-31710][SQL]TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS to timestamp transfer
cloud-fan commented on a change in pull request #28534: URL: https://github.com/apache/spark/pull/28534#discussion_r427286362

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala

@@ -401,6 +401,92 @@ case class DayOfYear(child: Expression) extends UnaryExpression with ImplicitCas
   }
 }
 
+@ExpressionDescription(
+  usage = "_FUNC_(date) - Returns timestamp from seconds.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(1230219000);
+       "2008-12-25 07:30:00.0"
+  """,
+  group = "datetime_funcs",
+  since = "3.1.0")
+case class SecondsToTimestamp(child: Expression)
+  extends NumberToTimestampBase {
+
+  override def upScaleFactor: SQLTimestamp = MICROS_PER_SECOND
+
+  override def prettyName: String = "timestamp_seconds"
+}
+
+@ExpressionDescription(
+  usage = "_FUNC_(date) - Returns timestamp from milliseconds.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(123021900);
+       "2008-12-25 07:30:00.0"
+  """,
+  group = "datetime_funcs",
+  since = "3.1.0")
+case class MilliSecondsToTimestamp(child: Expression)
+  extends NumberToTimestampBase {
+
+  override def upScaleFactor: SQLTimestamp = MICROS_PER_MILLIS
+
+  override def prettyName: String = "timestamp_milliseconds"
+}
+
+@ExpressionDescription(
+  usage = "_FUNC_(date) - Returns timestamp from microseconds.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(12302190);
+       "2008-12-25 07:30:00.0"
+  """,
+  group = "datetime_funcs",
+  since = "3.1.0")
+case class MicroSecondsToTimestamp(child: Expression)
+  extends NumberToTimestampBase {
+
+  override def upScaleFactor: SQLTimestamp = 1L
+
+  override def prettyName: String = "timestamp_microseconds"
+}
+
+abstract class NumberToTimestampBase extends UnaryExpression
+  with ImplicitCastInputTypes {
+
+  protected def upScaleFactor: Long
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(LongType, IntegerType)
+
+  override def dataType: DataType = TimestampType
+
+  override def eval(input: InternalRow): Any = {

Review comment: we can override `nullSafeEval` to skip the null check.
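The up-scaling in `NumberToTimestampBase` is just multiplication of the input into Spark's internal microsecond representation; each subclass only supplies the factor. A JDK-only sketch of the same conversion (constant names mirror Spark's `DateTimeUtils` but are defined locally here):

```java
import java.time.Instant;

public class ToTimestamp {
    static final long MICROS_PER_SECOND = 1_000_000L;
    static final long MICROS_PER_MILLIS = 1_000L;

    // The three variants differ only by their up-scale factor into microseconds.
    static long secondsToMicros(long s) { return Math.multiplyExact(s, MICROS_PER_SECOND); }
    static long millisToMicros(long ms) { return Math.multiplyExact(ms, MICROS_PER_MILLIS); }
    static long microsToMicros(long us) { return us; } // factor 1L

    public static void main(String[] args) {
        long micros = secondsToMicros(1230219000L);
        // Note: the @ExpressionDescription example above renders 07:30:00 because
        // Spark formats in the session time zone (America/Los_Angeles); in UTC
        // the same instant is 15:30:00.
        Instant ts = Instant.ofEpochSecond(Math.floorDiv(micros, MICROS_PER_SECOND),
                                           Math.floorMod(micros, MICROS_PER_SECOND) * 1_000L);
        System.out.println(ts); // 2008-12-25T15:30:00Z
    }
}
```

`Math.multiplyExact` makes the overflow behavior explicit, which is a question a Catalyst expression like this also has to answer for large inputs.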
[GitHub] [spark] SparkQA commented on pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same as the table
SparkQA commented on pull request #28511: URL: https://github.com/apache/spark/pull/28511#issuecomment-630803839

**[Test build #122836 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122836/testReport)** for PR 28511 at commit [`78e0972`](https://github.com/apache/spark/commit/78e097284096b27839928668a3deaf6a49cab336).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.