[GitHub] [spark] AmplabJenkins removed a comment on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column
AmplabJenkins removed a comment on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-629982682 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122785/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column
AmplabJenkins commented on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-629982670
[GitHub] [spark] SparkQA commented on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column
SparkQA commented on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-629982536 **[Test build #122785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122785/testReport)** for PR 27066 at commit [`c99c086`](https://github.com/apache/spark/commit/c99c086a2796aef8727328d15f13f9ecb0dc2977). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column
SparkQA removed a comment on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-629943931 **[Test build #122785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122785/testReport)** for PR 27066 at commit [`c99c086`](https://github.com/apache/spark/commit/c99c086a2796aef8727328d15f13f9ecb0dc2977).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column
AmplabJenkins removed a comment on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-629982670 Merged build finished. Test FAILed.
[GitHub] [spark] Fokko commented on a change in pull request #28554: [SPARK-31735][CORE] Include date/timestamp in the summary report
Fokko commented on a change in pull request #28554: URL: https://github.com/apache/spark/pull/28554#discussion_r426402521 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala ## @@ -264,7 +264,10 @@ object StatFunctions extends Logging { } val selectedCols = ds.logicalPlan.output - .filter(a => a.dataType.isInstanceOf[NumericType] || a.dataType.isInstanceOf[StringType]) + .filter(a => a.dataType.isInstanceOf[NumericType] +  || a.dataType.isInstanceOf[StringType] +  || a.dataType.isInstanceOf[DateType] Review comment: I'm working on getting the test suite running on my machine, so I need some time. I don't think the `mean` will be the issue, since it is just the element in the middle of the sorted collection; the `stddev`, however, will be tricky. For `StringType` this is just `null`.
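The comment's point about `stddev` can be illustrated outside Spark: a standard deviation is only defined for numeric values, so a describe-style summary has to report `null` for string columns (and for date columns unless they are first converted to numbers). A minimal Java sketch; `SummarySketch` and its method are hypothetical illustrations, not Spark's `StatFunctions`:

```java
import java.util.Arrays;

public class SummarySketch {
    // Sample standard deviation; only meaningful for numeric data.
    static Double stddev(double[] xs) {
        double mean = Arrays.stream(xs).average().orElse(Double.NaN);
        double ss = Arrays.stream(xs).map(x -> (x - mean) * (x - mean)).sum();
        return Math.sqrt(ss / (xs.length - 1));
    }

    public static void main(String[] args) {
        System.out.println(stddev(new double[]{1.0, 2.0, 3.0})); // prints 1.0
        // For a string column there is no meaningful stddev, so the summary
        // reports null, as the comment notes for StringType.
        Double stringStddev = null;
        System.out.println(stringStddev); // prints null
    }
}
```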
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28208: [SPARK-31440][SQL] Improve SQL Rest API
AmplabJenkins removed a comment on pull request #28208: URL: https://github.com/apache/spark/pull/28208#issuecomment-629978159
[GitHub] [spark] AmplabJenkins commented on pull request #28208: [SPARK-31440][SQL] Improve SQL Rest API
AmplabJenkins commented on pull request #28208: URL: https://github.com/apache/spark/pull/28208#issuecomment-629978159
[GitHub] [spark] Ngone51 commented on a change in pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
Ngone51 commented on a change in pull request #28370: URL: https://github.com/apache/spark/pull/28370#discussion_r426398285 ## File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ## @@ -1829,7 +1901,58 @@ private[spark] class BlockManager( data.dispose() } + /** + * Class to handle block manager decommissioning retries + * It creates a Thread to retry offloading all RDD cache blocks + */ + private class BlockManagerDecommissionManager(conf: SparkConf) { +   @volatile private var stopped = false +   private val sleepInterval = conf.get( +     config.STORAGE_DECOMMISSION_REPLICATION_REATTEMPT_INTERVAL) + +   private val blockReplicationThread = new Thread { +     override def run(): Unit = { +       var failures = 0 +       while (blockManagerDecommissioning +         && !stopped +         && !Thread.interrupted() Review comment: I believe in normal cases like `wait` and `sleep` the status will be cleared, according to the JDK doc: ``` * @throws InterruptedException * if any thread has interrupted the current thread. The * interrupted status of the current thread is * cleared when this exception is thrown. ``` And even if the status is not cleared in other cases, the following `Thread.sleep(sleepInterval)` will throw `InterruptedException` first and set `stopped` to true, so `Thread.interrupted()` still does not take effect.
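The JDK behavior quoted in the comment can be observed directly: setting the interrupted status and then calling `Thread.sleep` makes `sleep` throw immediately, and the status has already been cleared by the time the exception is caught. A minimal, self-contained Java sketch (the class name is hypothetical):

```java
public class InterruptStatusDemo {
    public static void main(String[] args) {
        // Set the current thread's interrupted status.
        Thread.currentThread().interrupt();
        try {
            // sleep() sees the pending interrupt and throws immediately.
            Thread.sleep(1_000);
        } catch (InterruptedException e) {
            // Per the JDK doc, the interrupted status was cleared when the
            // exception was thrown, so isInterrupted() now reports false.
            System.out.println("interrupted after catch: "
                + Thread.currentThread().isInterrupted());
            // prints "interrupted after catch: false"
        }
    }
}
```

This is why a loop guarded by `!Thread.interrupted()` may never observe the interrupt if an inner `sleep` consumes it; the catch block has to record the shutdown itself (as the PR does by setting `stopped`).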
[GitHub] [spark] SparkQA removed a comment on pull request #28208: [SPARK-31440][SQL] Improve SQL Rest API
SparkQA removed a comment on pull request #28208: URL: https://github.com/apache/spark/pull/28208#issuecomment-629890268 **[Test build #122771 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122771/testReport)** for PR 28208 at commit [`2f9522f`](https://github.com/apache/spark/commit/2f9522f4da807260b133085dd70011785f7823c5).
[GitHub] [spark] SparkQA commented on pull request #28208: [SPARK-31440][SQL] Improve SQL Rest API
SparkQA commented on pull request #28208: URL: https://github.com/apache/spark/pull/28208#issuecomment-629977464 **[Test build #122771 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122771/testReport)** for PR 28208 at commit [`2f9522f`](https://github.com/apache/spark/commit/2f9522f4da807260b133085dd70011785f7823c5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] yaooqinn commented on a change in pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same
yaooqinn commented on a change in pull request #28511: URL: https://github.com/apache/spark/pull/28511#discussion_r426394163 ## File path: sql/hive/benchmarks/InsertIntoHiveTableBenchmark-results.txt ## @@ -0,0 +1,11 @@

```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.15.4
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
insert hive table benchmark:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
INSERT INTO DYNAMIC                    7346          7470        175        0.0     717423.0      1.0X
INSERT INTO HYBRID                     1179          1188         13        0.0     115184.2      6.2X
INSERT INTO STATIC                      344           367         48        0.0      33585.1     21.4X
INSERT OVERWRITE DYNAMIC               7656          7714         82        0.0     747622.7      1.0X
INSERT OVERWRITE HYBRID                1179          1183          6        0.0     115163.3      6.2X
INSERT OVERWRITE STATIC                 400           408         10        0.0      39014.2     18.4X
```

Review comment:
```
-INSERT INTO DYNAMIC                    7742          7918        248        0.0     756044.0      1.0X
-INSERT INTO HYBRID                     1289          1307         26        0.0     125866.3      6.0X
-INSERT INTO STATIC                      371           393         38        0.0      36219.4     20.9X
-INSERT OVERWRITE DYNAMIC               8456          8554        138        0.0     825790.3      0.9X
-INSERT OVERWRITE HYBRID                1303          1311         12        0.0     127198.4      5.9X
-INSERT OVERWRITE STATIC                 434           447         13        0.0      42373.8     17.8X
+INSERT INTO DYNAMIC                    7382          7456        105        0.0     720904.8      1.0X
+INSERT INTO HYBRID                     1128          1129          1        0.0     110169.4      6.5X
+INSERT INTO STATIC                      349           370         39        0.0      34095.4     21.1X
+INSERT OVERWRITE DYNAMIC               8149          8362        301        0.0     795821.8      0.9X
+INSERT OVERWRITE HYBRID                1317          1318          2        0.0     128616.7      5.6X
+INSERT OVERWRITE STATIC                 387           408         37        0.0      37804.1     19.1X
```
`+` for master, `-` for this PR, both using hive 2.3.7
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28534: [SPARK-31710][SQL]Fix millisecond and microsecond convert to timestamp in to_timestamp
AmplabJenkins removed a comment on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-629976971
[GitHub] [spark] AmplabJenkins commented on pull request #28534: [SPARK-31710][SQL]Fix millisecond and microsecond convert to timestamp in to_timestamp
AmplabJenkins commented on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-629976971
[GitHub] [spark] SparkQA commented on pull request #28534: [SPARK-31710][SQL]Fix millisecond and microsecond convert to timestamp in to_timestamp
SparkQA commented on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-629976178 **[Test build #122765 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122765/testReport)** for PR 28534 at commit [`7d562f4`](https://github.com/apache/spark/commit/7d562f4810608100e7b852b55d3d1c49690512ae). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28534: [SPARK-31710][SQL]Fix millisecond and microsecond convert to timestamp in to_timestamp
SparkQA removed a comment on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-629885392 **[Test build #122765 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122765/testReport)** for PR 28534 at commit [`7d562f4`](https://github.com/apache/spark/commit/7d562f4810608100e7b852b55d3d1c49690512ae).
[GitHub] [spark] viirya commented on a change in pull request #28560: [SPARK-27217][SQL] Nested column aliasing for more operators which can prune nested column
viirya commented on a change in pull request #28560: URL: https://github.com/apache/spark/pull/28560#discussion_r426391779 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ## @@ -68,10 +76,23 @@ object NestedColumnAliasing { */ def replaceChildrenWithAliases( plan: LogicalPlan, + nestedFieldToAlias: Map[ExtractValue, Alias], attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = { plan.withNewChildren(plan.children.map { plan => Project(plan.output.flatMap(a => attrToAliases.getOrElse(a.exprId, Seq(a))), plan) -}) +}).transformExpressions { +  case f: ExtractValue if nestedFieldToAlias.contains(f) => +    nestedFieldToAlias(f).toAttribute +} + } + + /** + * Returns true for those operators that we can prune nested columns on. + */ + private def canPruneOn(plan: LogicalPlan) = plan match { +  case _: Aggregate => true +  case _: Expand => true +  case _ => false Review comment: I think I was wrong. Re-checking `FlatMapGroupsInPandas`'s Python API, it looks like ```python df.groupby("id").apply(udf).show() ``` So basically the Python udf takes not a nested column selection but the full columns of the DataFrame. It doesn't do nested column pruning. `MapInPandas` is also the same.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
AmplabJenkins removed a comment on pull request #28558: URL: https://github.com/apache/spark/pull/28558#issuecomment-629968873
[GitHub] [spark] AmplabJenkins commented on pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
AmplabJenkins commented on pull request #28558: URL: https://github.com/apache/spark/pull/28558#issuecomment-629968873
[GitHub] [spark] SparkQA removed a comment on pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
SparkQA removed a comment on pull request #28558: URL: https://github.com/apache/spark/pull/28558#issuecomment-629964194 **[Test build #122790 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122790/testReport)** for PR 28558 at commit [`8a43b8c`](https://github.com/apache/spark/commit/8a43b8c61cb356d8d94985ad499ed8a233a3e306).
[GitHub] [spark] SparkQA commented on pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
SparkQA commented on pull request #28558: URL: https://github.com/apache/spark/pull/28558#issuecomment-629968746 **[Test build #122790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122790/testReport)** for PR 28558 at commit [`8a43b8c`](https://github.com/apache/spark/commit/8a43b8c61cb356d8d94985ad499ed8a233a3e306). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same
AmplabJenkins removed a comment on pull request #28511: URL: https://github.com/apache/spark/pull/28511#issuecomment-629966971
[GitHub] [spark] SparkQA commented on pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same as the table
SparkQA commented on pull request #28511: URL: https://github.com/apache/spark/pull/28511#issuecomment-629966651 **[Test build #122791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122791/testReport)** for PR 28511 at commit [`f7c6b51`](https://github.com/apache/spark/commit/f7c6b5126b601a87a62ac6eb50217f03f3c1b27a).
[GitHub] [spark] maropu commented on a change in pull request #28544: [SPARK-31387][test-maven] Handle unknown operation/session ID in HiveThriftServer2Listener
maropu commented on a change in pull request #28544: URL: https://github.com/apache/spark/pull/28544#discussion_r426387696

## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala

```
@@ -131,60 +132,81 @@ private[thriftserver] class HiveThriftServer2Listener(
     updateLiveStore(session)
   }

-  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit = {
-    val session = sessionList.get(e.sessionId)
-    session.finishTimestamp = e.finishTime
-    updateStoreWithTriggerEnabled(session)
-    sessionList.remove(e.sessionId)
-  }
+  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit =
+    Option(sessionList.get(e.sessionId)) match {
+      case None => logWarning(s"onSessionClosed called with unknown session id: ${e.sessionId}")
+      case Some(sessionData) =>
+        val session = sessionData
+        session.finishTimestamp = e.finishTime
+        updateStoreWithTriggerEnabled(session)
+        sessionList.remove(e.sessionId)
+    }

-  private def onOperationStart(e: SparkListenerThriftServerOperationStart): Unit = {
-    val info = getOrCreateExecution(
-      e.id,
-      e.statement,
-      e.sessionId,
-      e.startTime,
-      e.userName)
-
-    info.state = ExecutionState.STARTED
-    executionList.put(e.id, info)
-    sessionList.get(e.sessionId).totalExecution += 1
-    executionList.get(e.id).groupId = e.groupId
-    updateLiveStore(executionList.get(e.id))
-    updateLiveStore(sessionList.get(e.sessionId))
-  }
+  private def onOperationStart(e: SparkListenerThriftServerOperationStart): Unit =
+    Option(sessionList.get(e.sessionId)) match {
+      case None => logWarning(s"onOperationStart called with unknown session id: ${e.sessionId}")
```

Review comment: We can keep processing queries even in this case?
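The null-safe lookup pattern under review can be sketched outside Spark. Below is a minimal, self-contained Scala analogue (the `SessionData` class and the map contents are hypothetical stand-ins for the listener's internals): `ConcurrentHashMap.get` returns `null` for an unknown key, so wrapping the result in `Option` lets the handler log and skip instead of throwing a `NullPointerException`.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for the listener's session bookkeeping.
case class SessionData(sessionId: String, var finishTimestamp: Long = 0L)

val sessionList = new ConcurrentHashMap[String, SessionData]()
sessionList.put("s1", SessionData("s1"))

// Returns true if the session was known and closed, false otherwise.
def onSessionClosed(sessionId: String, finishTime: Long): Boolean =
  Option(sessionList.get(sessionId)) match {
    case None =>
      // Unknown id: warn and keep going instead of crashing the listener.
      println(s"onSessionClosed called with unknown session id: $sessionId")
      false
    case Some(session) =>
      session.finishTimestamp = finishTime
      sessionList.remove(sessionId)
      true
  }
```

With this shape, an event for an unknown session degrades to a warning rather than failing the listener, which is the behavior the review question is probing.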
[GitHub] [spark] AmplabJenkins commented on pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same as the
AmplabJenkins commented on pull request #28511: URL: https://github.com/apache/spark/pull/28511#issuecomment-629966971
[GitHub] [spark] maropu commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
maropu commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629965293 Looks nice! Thanks for re-trigger the tests, @HyukjinKwon .
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
AmplabJenkins removed a comment on pull request #28558: URL: https://github.com/apache/spark/pull/28558#issuecomment-629964663
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
AmplabJenkins removed a comment on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629964658
[GitHub] [spark] AmplabJenkins commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
AmplabJenkins commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629964658
[GitHub] [spark] AmplabJenkins commented on pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
AmplabJenkins commented on pull request #28558: URL: https://github.com/apache/spark/pull/28558#issuecomment-629964663
[GitHub] [spark] SparkQA commented on pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
SparkQA commented on pull request #28558: URL: https://github.com/apache/spark/pull/28558#issuecomment-629964194 **[Test build #122790 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122790/testReport)** for PR 28558 at commit [`8a43b8c`](https://github.com/apache/spark/commit/8a43b8c61cb356d8d94985ad499ed8a233a3e306).
[GitHub] [spark] SparkQA commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
SparkQA commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629964220 **[Test build #122789 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122789/testReport)** for PR 28565 at commit [`49fa2a8`](https://github.com/apache/spark/commit/49fa2a87254238eb42e2801321dc50fb94cc50bb).
[GitHub] [spark] maropu commented on pull request #28138: [SPARK-31366][DOCS][SQL] Add doc for the aggregation in SQL reference guide
maropu commented on pull request #28138: URL: https://github.com/apache/spark/pull/28138#issuecomment-629964286 Ah, I forgot to close this Thanks, @srowen
[GitHub] [spark] maropu commented on a change in pull request #28544: [SPARK-31387][test-maven] Handle unknown operation/session ID in HiveThriftServer2Listener
maropu commented on a change in pull request #28544: URL: https://github.com/apache/spark/pull/28544#discussion_r426385130

## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala

```
@@ -131,60 +132,81 @@ private[thriftserver] class HiveThriftServer2Listener(
     updateLiveStore(session)
   }

-  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit = {
-    val session = sessionList.get(e.sessionId)
-    session.finishTimestamp = e.finishTime
-    updateStoreWithTriggerEnabled(session)
-    sessionList.remove(e.sessionId)
-  }
+  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit =
+    Option(sessionList.get(e.sessionId)) match {
+      case None => logWarning(s"onSessionClosed called with unknown session id: ${e.sessionId}")
```

Review comment: Could you move the `None` pattern into the place after the `Some` pattern?
[GitHub] [spark] HyukjinKwon commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
HyukjinKwon commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629963202 retest this please
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
HyukjinKwon commented on a change in pull request #28566: URL: https://github.com/apache/spark/pull/28566#discussion_r426384293

## File path: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/LocalityPlacementStrategySuite.scala

```
@@ -57,7 +62,6 @@ class LocalityPlacementStrategySuite extends SparkFunSuite {
     // goal is to create enough requests for localized containers (so there should be many
     // tasks on several hosts that have no allocated containers).
-    val resource = Resource.newInstance(8 * 1024, 4)
```

Review comment: should be fixed now. thanks for pointing out quickly.
[GitHub] [spark] MaxGekk commented on a change in pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
MaxGekk commented on a change in pull request #28558: URL: https://github.com/apache/spark/pull/28558#discussion_r426383975

## File path: docs/sql-ref-datetime-pattern.md

@@ -76,6 +76,57 @@ The count of pattern letters determines the format.

- Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present.
+- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. The text form depends on the letters used - 'M' denotes the 'standard' form, and 'L' the 'stand-alone' form. The difference between the 'standard' and 'stand-alone' forms is tricky to describe, as there is no difference in English. However, in other languages there is a difference in the word used when the text is used alone, as opposed to in a complete date. For example, the word used for a month alone in a date picker differs from the word used for the month within a full date of day, month and year. Here are examples for all supported pattern letters:
+  - `'M'` or `'L'`: Month number in a year starting from 1. There is no difference between 'M' and 'L'. Months from 1 to 9 are printed without padding.
+    ```sql
+    spark-sql> select date_format(date '1970-01-01', "M");
+    1
+    spark-sql> select date_format(date '1970-12-01', "L");
+    12
+    ```
+  - `'MM'` or `'LL'`: Month number in a year starting from 1. Zero padding is added for months 1-9.
+    ```sql
+    spark-sql> select date_format(date '1970-1-01', "LL");
+    01
+    spark-sql> select date_format(date '1970-09-01', "MM");
+    09
+    ```
+  - `'MMM'`: Short textual representation in the standard form. The month pattern should be a part of a date pattern, not just a stand-alone month, except in locales where there is no difference between the standard and stand-alone forms, like in English.
+    ```sql
+    spark-sql> select date_format(date '1970-01-01', "d MMM");
+    1 Jan
+    spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'dd MMM', 'locale', 'RU'));
+    01 янв.
+    ```
+  - `'LLL'`: Short textual representation in the stand-alone form. It should be used to format/parse only months without any other date fields.
+    ```sql
+    spark-sql> select date_format(date '1970-01-01', "LLL");
+    Jan
+    spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLL', 'locale', 'RU'));
+    янв.
+    ```
+  - `'MMMM'`: Full textual month representation in the standard form. It is used for parsing/formatting months as a part of dates/timestamps.
+    ```sql
+    spark-sql> select date_format(date '1970-01-01', "MMMM yyyy");
+    January 1970
+    spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'd MMMM', 'locale', 'RU'));
+    1 января
+    ```
+  - `'LLLL'`: Full textual month representation in the stand-alone form. The pattern can be used to format/parse only months.
+    ```sql
+    spark-sql> select date_format(date '1970-01-01', "LLLL");
+    January
+    spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLLL', 'locale', 'RU'));
+    январь
+    ```
+  - `'MMMMM'` or `'LLLLL'`: Narrow textual representation of the standard or stand-alone form. Typically it is a single letter.

Review comment: Added
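For background, the standard/stand-alone distinction quoted above comes from `java.time`, on which Spark 3.0's datetime patterns are based. The snippet below is a minimal sketch, independent of Spark, showing the two forms in a locale where they differ. The exact Russian strings depend on the JDK's locale data (CLDR, the default since JDK 9, yields "1 января" for the standard form and "январь" for the stand-alone form).

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

val d = LocalDate.of(1970, 1, 1)
val ru = new Locale("ru")

// 'M' letters: "standard" (format) form, as used inside a complete date.
val standard = DateTimeFormatter.ofPattern("d MMMM", ru).format(d)
// 'L' letters: "stand-alone" form, as used when the month appears by itself.
val standalone = DateTimeFormatter.ofPattern("LLLL", ru).format(d)
// In English the two forms are identical.
val english = DateTimeFormatter.ofPattern("MMM", Locale.ENGLISH).format(d)
```

In Russian the two formatters produce different month words, while swapping `MMM` for `LLL` in English changes nothing, which is why the doc change calls the distinction hard to describe in English.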
[GitHub] [spark] maropu commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
maropu commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629962468 It seems the failure above is not related to this PR. See: https://github.com/apache/spark/pull/28566
[GitHub] [spark] MaxGekk commented on a change in pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
MaxGekk commented on a change in pull request #28558: URL: https://github.com/apache/spark/pull/28558#discussion_r426383919

## File path: docs/sql-ref-datetime-pattern.md

@@ -76,6 +76,57 @@ The count of pattern letters determines the format.

- Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present.
+- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. The text form depends on the letters used - 'M' denotes the 'standard' form, and 'L' the 'stand-alone' form. The difference between the 'standard' and 'stand-alone' forms is tricky to describe, as there is no difference in English. However, in other languages there is a difference in the word used when the text is used alone, as opposed to in a complete date. For example, the word used for a month alone in a date picker differs from the word used for the month within a full date of day, month and year. Here are examples for all supported pattern letters:

Review comment: Added
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
HyukjinKwon commented on a change in pull request #28566: URL: https://github.com/apache/spark/pull/28566#discussion_r426383725

## File path: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/LocalityPlacementStrategySuite.scala

```
@@ -57,7 +62,6 @@ class LocalityPlacementStrategySuite extends SparkFunSuite {
     // goal is to create enough requests for localized containers (so there should be many
     // tasks on several hosts that have no allocated containers).
-    val resource = Resource.newInstance(8 * 1024, 4)
```

Review comment: ah, yeah. Seems it's used in the branch-3.0. I will fix.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28527: [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path
AmplabJenkins removed a comment on pull request #28527: URL: https://github.com/apache/spark/pull/28527#issuecomment-629961956
[GitHub] [spark] AmplabJenkins commented on pull request #28527: [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path
AmplabJenkins commented on pull request #28527: URL: https://github.com/apache/spark/pull/28527#issuecomment-629961956
[GitHub] [spark] maropu commented on a change in pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
maropu commented on a change in pull request #28566: URL: https://github.com/apache/spark/pull/28566#discussion_r426383379

## File path: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/LocalityPlacementStrategySuite.scala

```
@@ -57,7 +62,6 @@ class LocalityPlacementStrategySuite extends SparkFunSuite {
     // goal is to create enough requests for localized containers (so there should be many
     // tasks on several hosts that have no allocated containers).
-    val resource = Resource.newInstance(8 * 1024, 4)
```

Review comment: https://github.com/apache/spark/pull/28566#issuecomment-629961204
[GitHub] [spark] maropu edited a comment on pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
maropu edited a comment on pull request #28566: URL: https://github.com/apache/spark/pull/28566#issuecomment-629961204 @HyukjinKwon It seems branch-3.0 broken?
```
[info] Done packaging.
[error] /home/jenkins/workspace/SparkPullRequestBuilder@4/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/LocalityPlacementStrategySuite.scala:65: not found: value resource
[error]   yarnConf, resource, new MockResolver())
[error]             ^
[info] Packaging /home/jenkins/workspace/SparkPullRequestBuilder@4/external/kafka-0-10-token-provider/target/scala-2.12/spark-token-provider-kafka-0-10_2.12-3.0.1-SNAPSHOT-tests.jar ...
```
[GitHub] [spark] SparkQA removed a comment on pull request #28527: [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path
SparkQA removed a comment on pull request #28527: URL: https://github.com/apache/spark/pull/28527#issuecomment-629879870 **[Test build #122764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122764/testReport)** for PR 28527 at commit [`7d50c17`](https://github.com/apache/spark/commit/7d50c17ceca0051e455ec1faf17f3c9ad05a206f).
[GitHub] [spark] maropu commented on pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
maropu commented on pull request #28566: URL: https://github.com/apache/spark/pull/28566#issuecomment-629961204 @HyukjinKwon It seems branch-3.0 broken?
```
LocalityPlacementStrategySuite
```
[GitHub] [spark] SparkQA commented on pull request #28527: [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path
SparkQA commented on pull request #28527: URL: https://github.com/apache/spark/pull/28527#issuecomment-629961162 **[Test build #122764 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122764/testReport)** for PR 28527 at commit [`7d50c17`](https://github.com/apache/spark/commit/7d50c17ceca0051e455ec1faf17f3c9ad05a206f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
AmplabJenkins removed a comment on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629960072 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122788/
[GitHub] [spark] SparkQA commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
SparkQA commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629960051 **[Test build #122788 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122788/testReport)** for PR 28565 at commit [`49fa2a8`](https://github.com/apache/spark/commit/49fa2a87254238eb42e2801321dc50fb94cc50bb). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
AmplabJenkins commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629960066
[GitHub] [spark] SparkQA removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
SparkQA removed a comment on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629957300 **[Test build #122788 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122788/testReport)** for PR 28565 at commit [`49fa2a8`](https://github.com/apache/spark/commit/49fa2a87254238eb42e2801321dc50fb94cc50bb).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
AmplabJenkins removed a comment on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629960066 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28563: [SPARK-31743][CORE] Add spark_info metric into PrometheusResource
AmplabJenkins commented on pull request #28563: URL: https://github.com/apache/spark/pull/28563#issuecomment-629959652
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28563: [SPARK-31743][CORE] Add spark_info metric into PrometheusResource
AmplabJenkins removed a comment on pull request #28563: URL: https://github.com/apache/spark/pull/28563#issuecomment-629959652
[GitHub] [spark] yaooqinn commented on a change in pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same
yaooqinn commented on a change in pull request #28511: URL: https://github.com/apache/spark/pull/28511#discussion_r426381332
## File path: sql/hive/benchmarks/InsertIntoHiveTableBenchmark-results.txt
@@ -0,0 +1,11 @@
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.15.4
+Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
+insert hive table benchmark:       Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
+----------------------------------------------------------------------------------------------------------------
+INSERT INTO DYNAMIC                         7346           7470         175         0.0      717423.0       1.0X
+INSERT INTO HYBRID                          1179           1188          13         0.0      115184.2       6.2X
+INSERT INTO STATIC                           344            367          48         0.0       33585.1      21.4X
+INSERT OVERWRITE DYNAMIC                    7656           7714          82         0.0      747622.7       1.0X
+INSERT OVERWRITE HYBRID                     1179           1183           6         0.0      115163.3       6.2X
+INSERT OVERWRITE STATIC                      400            408          10         0.0       39014.2      18.4X
Review comment: Let me run this benchmark on the master branch and update the result later in the PR description.
[GitHub] [spark] SparkQA removed a comment on pull request #28563: [SPARK-31743][CORE] Add spark_info metric into PrometheusResource
SparkQA removed a comment on pull request #28563: URL: https://github.com/apache/spark/pull/28563#issuecomment-629892529 **[Test build #122773 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122773/testReport)** for PR 28563 at commit [`a0aff8a`](https://github.com/apache/spark/commit/a0aff8af880f6d64e43a7229f44fa7237dfd718e).
[GitHub] [spark] SparkQA commented on pull request #28563: [SPARK-31743][CORE] Add spark_info metric into PrometheusResource
SparkQA commented on pull request #28563: URL: https://github.com/apache/spark/pull/28563#issuecomment-629958887 **[Test build #122773 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122773/testReport)** for PR 28563 at commit [`a0aff8a`](https://github.com/apache/spark/commit/a0aff8af880f6d64e43a7229f44fa7237dfd718e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28562: [SPARK-31742][TESTS] Increase the eventually time limit for Mino kdc in tests to fix flakiness
AmplabJenkins commented on pull request #28562: URL: https://github.com/apache/spark/pull/28562#issuecomment-629958185
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column
AmplabJenkins removed a comment on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-629957757 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122775/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28562: [SPARK-31742][TESTS] Increase the eventually time limit for Mino kdc in tests to fix flakiness
AmplabJenkins removed a comment on pull request #28562: URL: https://github.com/apache/spark/pull/28562#issuecomment-629958185
[GitHub] [spark] yaooqinn commented on a change in pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same
yaooqinn commented on a change in pull request #28511: URL: https://github.com/apache/spark/pull/28511#discussion_r426380145
## File path: sql/hive/src/test/scala/org/apache/spark/sql/execution/benchmark/InsertIntoHiveTableBenchmark.scala
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.hive.HiveUtils
+import org.apache.spark.sql.hive.test.TestHive
+
+/**
+ * Benchmark to measure hive table write performance.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt: bin/spark-submit --class
+ *      --jars ,,
+ *      --packages org.spark-project.hive:hive-exec:1.2.1.spark2
+ *   2. build/sbt "hive/test:runMain " -Phive-1.2 or
+ *      build/sbt "hive/test:runMain " -Phive-2.3
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "hive/test:runMain "
+ *      Results will be written to "benchmarks/InsertIntoHiveTableBenchmark-results.txt".
+ *   4. -Phive-1.2 does not work for JDK 11
+ * }}}
+ */
+object InsertIntoHiveTableBenchmark extends SqlBasedBenchmark {
+
+  override def getSparkSession: SparkSession = TestHive.sparkSession
+
+  val tempTable = "temp"
+  val numRows = 1024 * 10
+  val sql = spark.sql _
+
+  // scalastyle:off hadoopconfiguration
+  private val hadoopConf = spark.sparkContext.hadoopConfiguration
+  // scalastyle:on hadoopconfiguration
+  hadoopConf.set("hive.exec.dynamic.partition", "true")
+  hadoopConf.set("hive.exec.dynamic.partition.mode", "nonstrict")
+  hadoopConf.set("hive.exec.max.dynamic.partitions", numRows.toString)
+
+  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
+    val ds = spark.range(numRows)
+    tableNames.foreach { name =>
+      ds.createOrReplaceTempView(name)
+    }
+    try f finally tableNames.foreach(spark.catalog.dropTempView)
+  }
+
+  def withTable(tableNames: String*)(f: => Unit): Unit = {
+    tableNames.foreach { name =>
+      sql(s"CREATE TABLE $name(a INT) STORED AS TEXTFILE PARTITIONED BY (b INT, c INT)")
+    }
+    try f finally {
+      tableNames.foreach { name =>
+        spark.sql(s"DROP TABLE IF EXISTS $name")
+      }
+    }
+  }
+
+  def insertOverwriteDynamic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT OVERWRITE DYNAMIC") { _ =>
+      sql(s"INSERT OVERWRITE TABLE $table SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS b, CAST(id % 100 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertOverwriteHybrid(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT OVERWRITE HYBRID") { _ =>
+      sql(s"INSERT OVERWRITE TABLE $table partition(b=1, c) SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertOverwriteStatic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT OVERWRITE STATIC") { _ =>
+      sql(s"INSERT OVERWRITE TABLE $table partition(b=1, c=10) SELECT CAST(id AS INT) AS a" +
+        s" FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertIntoDynamic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT INTO DYNAMIC") { _ =>
+      sql(s"INSERT INTO TABLE $table SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS b, CAST(id % 100 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertIntoHybrid(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT INTO HYBRID") { _ =>
+      sql(s"INSERT INTO TABLE $table partition(b=1, c) SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertIntoStatic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT INTO STATIC") { _ =>
+      sql(s"INSERT INTO TABLE $table partition(b=1, c=10) SELECT CAST(id AS INT) AS a" +
+        s" FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    withTempTable(tempTable) {
+      val t1 = "t1"
[GitHub] [spark] yaooqinn commented on a change in pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same
yaooqinn commented on a change in pull request #28511: URL: https://github.com/apache/spark/pull/28511#discussion_r426379977
## File path: sql/hive/src/test/scala/org/apache/spark/sql/execution/benchmark/InsertIntoHiveTableBenchmark.scala
@@ -0,0 +1,144 @@
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column
AmplabJenkins removed a comment on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-629957755 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column
AmplabJenkins commented on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-629957755
[GitHub] [spark] AmplabJenkins commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
AmplabJenkins commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629957600
[GitHub] [spark] SparkQA removed a comment on pull request #28562: [SPARK-31742][TESTS] Increase the eventually time limit for Mino kdc in tests to fix flakiness
SparkQA removed a comment on pull request #28562: URL: https://github.com/apache/spark/pull/28562#issuecomment-629891408 **[Test build #122772 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122772/testReport)** for PR 28562 at commit [`8bc6df2`](https://github.com/apache/spark/commit/8bc6df2c2e86d917e48c4debd78e87f714a27151).
[GitHub] [spark] SparkQA removed a comment on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column
SparkQA removed a comment on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-629900457 **[Test build #122775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122775/testReport)** for PR 27066 at commit [`7f76539`](https://github.com/apache/spark/commit/7f76539ac50f59264cc443cfec93e4a8f4e495ab).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
AmplabJenkins removed a comment on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629957600
[GitHub] [spark] SparkQA commented on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column
SparkQA commented on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-629957411 **[Test build #122775 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122775/testReport)** for PR 27066 at commit [`7f76539`](https://github.com/apache/spark/commit/7f76539ac50f59264cc443cfec93e4a8f4e495ab). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28562: [SPARK-31742][TESTS] Increase the eventually time limit for Mino kdc in tests to fix flakiness
SparkQA commented on pull request #28562: URL: https://github.com/apache/spark/pull/28562#issuecomment-629957428 **[Test build #122772 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122772/testReport)** for PR 28562 at commit [`8bc6df2`](https://github.com/apache/spark/commit/8bc6df2c2e86d917e48c4debd78e87f714a27151). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
SparkQA commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629957300 **[Test build #122788 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122788/testReport)** for PR 28565 at commit [`49fa2a8`](https://github.com/apache/spark/commit/49fa2a87254238eb42e2801321dc50fb94cc50bb).
[GitHub] [spark] HyukjinKwon closed pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
HyukjinKwon closed pull request #28566: URL: https://github.com/apache/spark/pull/28566
[GitHub] [spark] HyukjinKwon commented on pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
HyukjinKwon commented on pull request #28566: URL: https://github.com/apache/spark/pull/28566#issuecomment-629956109 Merged to master and branch-3.0.
[GitHub] [spark] HyukjinKwon commented on pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
HyukjinKwon commented on pull request #28566: URL: https://github.com/apache/spark/pull/28566#issuecomment-629955913 I am going to merge this, see https://github.com/apache/spark/pull/28463#issuecomment-629955825.
[GitHub] [spark] HyukjinKwon commented on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner
HyukjinKwon commented on pull request #28463: URL: https://github.com/apache/spark/pull/28463#issuecomment-629955825 `LocalityPlacementStrategySuite` failed again. Potentially related. I am going to merge https://github.com/apache/spark/pull/28566 together.
[GitHub] [spark] maropu commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
maropu commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629955864 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner
AmplabJenkins removed a comment on pull request #28463: URL: https://github.com/apache/spark/pull/28463#issuecomment-629955676 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner
AmplabJenkins commented on pull request #28463: URL: https://github.com/apache/spark/pull/28463#issuecomment-629955676
[GitHub] [spark] cloud-fan closed pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner
cloud-fan closed pull request #28463: URL: https://github.com/apache/spark/pull/28463
[GitHub] [spark] SparkQA commented on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner
SparkQA commented on pull request #28463: URL: https://github.com/apache/spark/pull/28463#issuecomment-629955235 **[Test build #122781 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122781/testReport)** for PR 28463 at commit [`978e60e`](https://github.com/apache/spark/commit/978e60e171e35b01ee166e00c4f63da3db877aad). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `// starting closure (in class T)` * `// we need to track calls from \"inner closure\" to outer classes relative to it (class T, A, B)` * `logDebug(s\"found inner class $ownerExternalName\")`
[GitHub] [spark] cloud-fan commented on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner
cloud-fan commented on pull request #28463: URL: https://github.com/apache/spark/pull/28463#issuecomment-629955260 We don't have many critical changes after the last success build: https://github.com/apache/spark/pull/28463#issuecomment-624694820 The failed flaky tests are unrelated to this PR, and we need to unblock 3.0 ASAP. I'm merging it first, will monitor the jenkins builds later. Thanks!
[GitHub] [spark] SparkQA removed a comment on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner
SparkQA removed a comment on pull request #28463: URL: https://github.com/apache/spark/pull/28463#issuecomment-629923433 **[Test build #122781 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122781/testReport)** for PR 28463 at commit [`978e60e`](https://github.com/apache/spark/commit/978e60e171e35b01ee166e00c4f63da3db877aad).
[GitHub] [spark] AmplabJenkins commented on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
AmplabJenkins commented on pull request #28561: URL: https://github.com/apache/spark/pull/28561#issuecomment-629954597
[GitHub] [spark] SparkQA commented on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
SparkQA commented on pull request #28561: URL: https://github.com/apache/spark/pull/28561#issuecomment-629954432 **[Test build #122786 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122786/testReport)** for PR 28561 at commit [`77c2e14`](https://github.com/apache/spark/commit/77c2e14669f64c19d6068dbec695287b08f54205). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
SparkQA removed a comment on pull request #28561: URL: https://github.com/apache/spark/pull/28561#issuecomment-629949398 **[Test build #122786 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122786/testReport)** for PR 28561 at commit [`77c2e14`](https://github.com/apache/spark/commit/77c2e14669f64c19d6068dbec695287b08f54205).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
AmplabJenkins removed a comment on pull request #28561: URL: https://github.com/apache/spark/pull/28561#issuecomment-629954597
[GitHub] [spark] cloud-fan commented on a change in pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
cloud-fan commented on a change in pull request #28558: URL: https://github.com/apache/spark/pull/28558#discussion_r426375094

## File path: docs/sql-ref-datetime-pattern.md

@@ -76,6 +76,57 @@ The count of pattern letters determines the format.
- Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present.
+- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. The text form depends on the letters: 'M' denotes the 'standard' form, and 'L' the 'stand-alone' form. The difference between the 'standard' and 'stand-alone' forms is tricky to describe, as there is no difference in English. However, in other languages there is a difference in the word used when the text is used alone, as opposed to in a complete date. For example, the word used for a month when used alone in a date picker is different from the word used for a month in association with a day and year in a date. Here are examples for all supported pattern letters:
+  - `'M'` or `'L'`: Month number in a year starting from 1. There is no difference between 'M' and 'L'. Months 1 to 9 are printed without padding.
+```sql
+spark-sql> select date_format(date '1970-01-01', "M");
+1
+spark-sql> select date_format(date '1970-12-01', "L");
+12
+```
+  - `'MM'` or `'LL'`: Month number in a year starting from 1. Zero padding is added for months 1-9.
+```sql
+spark-sql> select date_format(date '1970-1-01', "LL");
+01
+spark-sql> select date_format(date '1970-09-01', "MM");
+09
+```
+  - `'MMM'`: Short textual representation in the standard form. The month pattern should be a part of a date pattern, not just a stand-alone month, except for locales where there is no difference between standard and stand-alone forms, like in English.
+```sql
+spark-sql> select date_format(date '1970-01-01', "d MMM");
+1 Jan
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'dd MMM', 'locale', 'RU'));
+01 янв.
+```
+  - `'LLL'`: Short textual representation in the stand-alone form. It should be used to format/parse only months without any other date fields.
+```sql
+spark-sql> select date_format(date '1970-01-01', "LLL");
+Jan
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLL', 'locale', 'RU'));
+янв.
+```
+  - `'MMMM'`: full textual month representation in the standard form. It is used for parsing/formatting months as a part of dates/timestamps.
+```sql
+spark-sql> select date_format(date '1970-01-01', "MMMM yyyy");
+January 1970
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'd MMMM', 'locale', 'RU'));
+1 января
+```
+  - `'LLLL'`: full textual month representation in the stand-alone form. The pattern can be used to format/parse only months.
+```sql
+spark-sql> select date_format(date '1970-01-01', "LLLL");
+January
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLLL', 'locale', 'RU'));
+январь
+```
+  - `'LLLLL'` or `'MMMMM'`: Narrow textual representation of the stand-alone or standard forms. Typically it is a single letter.

Review comment:
how about
```
Here are examples for all supported pattern letters (more than 5 letters is invalid):
```
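Spark 3.0's datetime patterns follow the semantics of `java.time.format.DateTimeFormatter`, so the standard-vs-stand-alone distinction discussed above can be reproduced directly with the JDK. A minimal sketch, assuming the JDK's CLDR locale data for Russian (the class name is illustrative, not part of the PR):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class MonthForms {
    public static void main(String[] args) {
        LocalDate d = LocalDate.of(1970, 1, 1);
        // 'LL' zero-pads the month number, matching the "LL"/"MM" examples in the doc.
        System.out.println(DateTimeFormatter.ofPattern("LL").format(d)); // 01
        Locale ru = new Locale("ru");
        // Standard form ('M'): the month name as used inside a complete date.
        System.out.println(DateTimeFormatter.ofPattern("d MMMM", ru).format(d));
        // Stand-alone form ('L'): the month named on its own; in Russian this
        // differs grammatically from the standard form.
        System.out.println(DateTimeFormatter.ofPattern("LLLL", ru).format(d));
    }
}
```

In English the last two lines would print the same word; in Russian (as in the `to_csv` examples above) they differ.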
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode
AmplabJenkins removed a comment on pull request #28523: URL: https://github.com/apache/spark/pull/28523#issuecomment-629951647
[GitHub] [spark] AmplabJenkins commented on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode
AmplabJenkins commented on pull request #28523: URL: https://github.com/apache/spark/pull/28523#issuecomment-629951647
[GitHub] [spark] holdenk commented on a change in pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
holdenk commented on a change in pull request #28370: URL: https://github.com/apache/spark/pull/28370#discussion_r426374454

## File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala

@@ -1829,7 +1901,58 @@ private[spark] class BlockManager(
 data.dispose()
 }
+  /**
+   * Class to handle block manager decommissioning retries
+   * It creates a Thread to retry offloading all RDD cache blocks
+   */
+  private class BlockManagerDecommissionManager(conf: SparkConf) {

Review comment: So if you look at the parent issue you can see there is another sub-issue that says migrate shuffle blocks. It's ok to ask for a follow-up even if there is one (we all miss things in reading), but an attempt to vote -1 has a higher bar than just asking for something.
[GitHub] [spark] SparkQA commented on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode
SparkQA commented on pull request #28523: URL: https://github.com/apache/spark/pull/28523#issuecomment-629951351 **[Test build #122787 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122787/testReport)** for PR 28523 at commit [`1955f01`](https://github.com/apache/spark/commit/1955f01fa870cd180f66a22070ee1b0ca9a73ca3).
[GitHub] [spark] dongjoon-hyun closed pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
dongjoon-hyun closed pull request #28561: URL: https://github.com/apache/spark/pull/28561
[GitHub] [spark] holdenk commented on a change in pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
holdenk commented on a change in pull request #28370: URL: https://github.com/apache/spark/pull/28370#discussion_r426373154

## File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala

@@ -1829,7 +1901,58 @@ private[spark] class BlockManager(
 data.dispose()
 }
+  /**
+   * Class to handle block manager decommissioning retries
+   * It creates a Thread to retry offloading all RDD cache blocks
+   */
+  private class BlockManagerDecommissionManager(conf: SparkConf) {
+    @volatile private var stopped = false
+    private val sleepInterval = conf.get(
+      config.STORAGE_DECOMMISSION_REPLICATION_REATTEMPT_INTERVAL)
+
+    private val blockReplicationThread = new Thread {
+      override def run(): Unit = {
+        var failures = 0
+        while (blockManagerDecommissioning
+          && !stopped
+          && !Thread.interrupted()

Review comment: If an interrupt exception is caught the thread would still be marked as interrupted
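The interrupt-handling point raised here is a general JVM subtlety: catching `InterruptedException` (e.g. out of a sleep between retry attempts) clears the thread's interrupt flag, so a loop that wants its condition to keep observing the interrupt must restore the flag in the catch block. A minimal Java sketch of that pattern, not Spark's actual implementation (names are illustrative):

```java
public class RetryLoop {
    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(() -> {
            // Loop exits when the interrupt flag is observed.
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(100); // stand-in for one migration attempt + backoff
                } catch (InterruptedException e) {
                    // sleep() cleared the flag on throwing; restore it so the
                    // while-condition sees the interrupt and the loop terminates.
                    Thread.currentThread().interrupt();
                }
            }
        });
        worker.start();
        worker.interrupt();
        worker.join(1000);
        System.out.println(worker.isAlive());
    }
}
```

Without the `interrupt()` call in the catch block, the interrupt delivered during `sleep()` would be swallowed and the loop would keep running.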
[GitHub] [spark] holdenk commented on a change in pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
holdenk commented on a change in pull request #28370: URL: https://github.com/apache/spark/pull/28370#discussion_r426372989

## File path: core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionSuite.scala

@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.storage
+
+import java.util.concurrent.Semaphore
+
+import scala.collection.mutable.ArrayBuffer
+import scala.concurrent.duration._
+
+import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkFunSuite, Success}
+import org.apache.spark.internal.config
+import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd, SparkListenerTaskStart}
+import org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend
+import org.apache.spark.util.{ResetSystemProperties, ThreadUtils}
+
+class BlockManagerDecommissionSuite extends SparkFunSuite with LocalSparkContext
+  with ResetSystemProperties {
+
+  override def beforeEach(): Unit = {
+    val conf = new SparkConf().setAppName("test")
+      .set(config.Worker.WORKER_DECOMMISSION_ENABLED, true)
+      .set(config.STORAGE_DECOMMISSION_ENABLED, true)
+
+    sc = new SparkContext("local-cluster[2, 1, 1024]", "test", conf)
+  }
+
+  test(s"verify that an already running task which is going to cache data succeeds " +
+    s"on a decommissioned executor") {
+    // Create input RDD with 10 partitions
+    val input = sc.parallelize(1 to 10, 10)
+    val accum = sc.longAccumulator("mapperRunAccumulator")
+    // Do a count to wait for the executors to be registered.

Review comment: That's ok for this test. But no harm in changing to the utility function