[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18258 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77866/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18258 Merged build finished. Test PASSed.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18258 **[Test build #77866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77866/testReport)** for PR 18258 at commit [`e4cfe1c`](https://github.com/apache/spark/commit/e4cfe1ca358aee4842395904a1d29da4b9d534e4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16986: [SPARK-18891][SQL] Support for Scala Map collection type...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16986 ping @michalsenkyr
[GitHub] spark issue #18009: [SPARK-18891][SQL] Support for specific Java List subtyp...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18009 ping @michalsenkyr
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 Thanks!
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 If there is no regression, I'd remove the flag.
[GitHub] spark issue #18248: [SPARK-21031] [SQL] Add `alterTableStats` to store spark...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18248 LGTM except one minor comment
[GitHub] spark pull request #18248: [SPARK-21031] [SQL] Add `alterTableStats` to stor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18248#discussion_r121250769
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -614,9 +594,11 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
// Sets the `schema`, `partitionColumnNames` and `bucketSpec` from the old table definition,
--- End diff --
please update the comment to include stats
[GitHub] spark issue #18248: [SPARK-21031] [SQL] Add `alterTableStats` to store spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18248 **[Test build #77868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77868/testReport)** for PR 18248 at commit [`38d03d7`](https://github.com/apache/spark/commit/38d03d7d5c492c4d9bb2e0029ea9595f0eefda0e).
[GitHub] spark pull request #18248: [SPARK-21031] [SQL] Add `alterTableStats` to stor...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18248#discussion_r121250467
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -313,60 +313,70 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto } } - test("alter table SET TBLPROPERTIES after analyze table") {
--- End diff --
Here I found the logic is the same for the two cases except for the command (set and unset, respectively), so I extracted the common logic.
[GitHub] spark issue #18257: [SPARK-21044][SPARK-21041][SQL] Add RemoveInvalidRange o...
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/18257 The end result looks good to me. Thanks for fixing it! Although, I'd prefer fixing the actual integer overflow handling in `RangeExec`'s codegen, too. Even though your fix will handle the problematic case for sure, it still feels a bit awkward to rely on an optimization for correctness.
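For context on the overflow concern above: when a range operator derives its loop bound from `end - start`, that subtraction itself can overflow a 64-bit long. A minimal sketch of computing the element count in wider arithmetic (illustrative only; this is not the actual `RangeExec` codegen, and the class name is made up):

```java
import java.math.BigInteger;

public class RangeCount {
    // Number of elements in the half-open range [start, end) with the given step,
    // computed in BigInteger space so (end - start) cannot overflow a long.
    static long safeCount(long start, long end, long step) {
        if (step == 0) throw new IllegalArgumentException("step must be nonzero");
        BigInteger span = BigInteger.valueOf(end).subtract(BigInteger.valueOf(start));
        BigInteger s = BigInteger.valueOf(step);
        // ceil(span / step) for either sign of step, clamped at zero for empty ranges.
        BigInteger n = span.add(s).subtract(BigInteger.valueOf(step > 0 ? 1 : -1)).divide(s);
        return n.signum() < 0 ? 0 : n.longValueExact();
    }

    public static void main(String[] args) {
        // Naive (end - start) overflows here; the widened version does not.
        System.out.println(safeCount(Long.MIN_VALUE, Long.MAX_VALUE, Long.MAX_VALUE)); // 3
        System.out.println(safeCount(0, 10, 3)); // 4 elements: 0, 3, 6, 9
    }
}
```

The point is only that the count has to be computed in a type wider than the loop variable; where exactly that check lives (optimizer rule vs. codegen) is what the comment above debates.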
[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18248 **[Test build #77867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77867/testReport)** for PR 18248 at commit [`2649135`](https://github.com/apache/spark/commit/2649135dac1472b90534abf0e51ac2760360c134).
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18258 Sure. Three runs for each.

Track = F:

Aggregate w keys:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------
codegen = F, track = F                   12657 / 12700          6.6         150.9       1.0X
codegen = T hashmap = F, track = F         6779 / 7582         12.4          80.8       1.9X
codegen = T hashmap = T, track = F         1505 / 1619         55.7          17.9       8.4X

Aggregate w keys:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------
codegen = F, track = F                   10085 / 10597          8.3         120.2       1.0X
codegen = T hashmap = F, track = F         5915 / 6069         14.2          70.5       1.7X
codegen = T hashmap = T, track = F         1610 / 1796         52.1          19.2       6.3X

Aggregate w keys:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------
codegen = F, track = F                   10275 / 10584          8.2         122.5       1.0X
codegen = T hashmap = F, track = F         6140 / 6557         13.7          73.2       1.7X
codegen = T hashmap = T, track = F         1301 / 1565         64.5          15.5       7.9X

Track = T:

Aggregate w keys:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------
codegen = F, track = T                   10723 / 10865          7.8         127.8       1.0X
codegen = T hashmap = F, track = T         6246 / 6432         13.4          74.5       1.7X
codegen = T hashmap = T, track = T         1465 / 1571         57.3          17.5       7.3X

Aggregate w keys:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------
codegen = F, track = T                    9964 / 10348          8.4         118.8       1.0X
codegen = T hashmap = F, track = T         6225 / 6375         13.5          74.2       1.6X
codegen = T hashmap = T, track = T         1361 / 1485         61.6          16.2       7.3X

Aggregate w keys:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------
codegen = F, track = T                   10125 / 10674          8.3         120.7       1.0X
codegen = T hashmap = F, track = T         6865 / 6980         12.2          81.8       1.5X
codegen = T hashmap = T, track = T         1491 / 1579         56.3          17.8       6.8X
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 Can you run it a few more times to tell? Right now it's a difference of almost 7%.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18258 Is it significant? Seems to me that it's within the variance across different runs?
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 16.8 vs 15.8?
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18258 I just ran the existing `AggregateBenchmark` with the new tracking config:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.27-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz

Aggregate w keys:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------
codegen = F, track = F                   10655 / 11043          7.9         127.0       1.0X
codegen = T hashmap = F, track = F         6923 / 7133         12.1          82.5       1.5X
codegen = T hashmap = T, track = F         1325 / 1511         63.3          15.8       8.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.27-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz

Aggregate w keys:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------
codegen = F, track = T                   10809 / 11007          7.8         128.9       1.0X
codegen = T hashmap = F, track = T         6581 / 6629         12.7          78.4       1.6X
codegen = T hashmap = T, track = T         1411 / 1552         59.4          16.8       7.7X

Looks like no obvious perf degradation.
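The Best/Avg columns above come from timing the same workload several times; deciding whether a delta like 16.8 vs 15.8 ns/row is real means comparing it against that run-to-run spread, which is what the follow-up runs below do. A minimal sketch of the methodology (a toy harness with made-up names, not Spark's `Benchmark` utility):

```java
import java.util.Arrays;

public class MiniBench {
    // Run the workload `iters` times and return {best, avg} wall times in ms,
    // mirroring the Best/Avg columns in the benchmark output above.
    static long[] bestAvg(Runnable workload, int iters) {
        long[] times = new long[iters];
        for (int i = 0; i < iters; i++) {
            long t0 = System.nanoTime();
            workload.run();
            times[i] = (System.nanoTime() - t0) / 1_000_000;
        }
        long best = Arrays.stream(times).min().getAsLong();
        long avg = Math.round(Arrays.stream(times).average().getAsDouble());
        return new long[]{best, avg};
    }

    public static void main(String[] args) {
        long[] r = bestAvg(() -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) sum += i;
        }, 5);
        System.out.println(r[0] + " / " + r[1] + " ms (best / avg)");
    }
}
```

When best and avg are far apart, as in some rows above, a single pair of runs cannot distinguish a real regression from noise, hence the request for more runs.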
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18258 Sure. Will update later.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 Can you test the perf degradation?
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18258 The `enablePerfMetrics` parameter of `UnsafeFixedWidthAggregationMap` has this comment:
* @param enablePerfMetrics if true, performance metrics will be recorded (has minor perf impact)
It's true those metrics are simple counters.
[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/18248 @cloud-fan Oh, right, let me try. Thanks!
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 Why would the tracking have a perf impact? It's just a simple counter increase, isn't it?
[GitHub] spark issue #18257: [SPARK-21044][SPARK-21041][SQL] Add RemoveInvalidRange o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18257 Merged build finished. Test PASSed.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18258 **[Test build #77866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77866/testReport)** for PR 18258 at commit [`e4cfe1c`](https://github.com/apache/spark/commit/e4cfe1ca358aee4842395904a1d29da4b9d534e4).
[GitHub] spark issue #18257: [SPARK-21044][SPARK-21041][SQL] Add RemoveInvalidRange o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18257 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77865/
[GitHub] spark issue #18257: [SPARK-21044][SPARK-21041][SQL] Add RemoveInvalidRange o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18257 **[Test build #77865 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77865/testReport)** for PR 18257 at commit [`6ab9b6f`](https://github.com/apache/spark/commit/6ab9b6fb559ab11ac95078a1672672edf85efbac).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to a...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/18258 [SPARK-20953][SQL][WIP] Add hash map metrics to aggregate

## What changes were proposed in this pull request?

This adds the average hash map probe metric to hash aggregate. A new config is added: `spark.sql.execution.trackhashMapProbe`. Once the config is enabled, we record the metric in `HashAggregateExec`. `BytesToBytesMap` already has an API to get the metrics; this PR adds an API to `UnsafeFixedWidthAggregationMap` to access it. Preparing a test for this metric seems tricky, because we don't know in advance which keys will collide. For now, the test case generates random data large enough to produce the desired number of probes. TODO: add hash map metrics to join.

## How was this patch tested?

Added a test to SQLMetricsSuite.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 SPARK-20953
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18258.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18258

commit e4cfe1ca358aee4842395904a1d29da4b9d534e4
Author: Liang-Chi Hsieh
Date: 2017-06-04T00:15:24Z
Report average hashmap probe when the config is enabled.
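To illustrate the metric this PR reports: in an open-addressing hash map, a lookup may have to probe several slots before finding its key, and the average number of probes per lookup is a cheap proxy for how collision-degraded the map is. A toy sketch of such a counter (illustrative only; class and method names are made up, and `BytesToBytesMap` works differently internally):

```java
// Open-addressing map over long keys that counts probes per lookup,
// illustrating the "average hash map probe" metric discussed above.
public class ProbeCountingMap {
    private final long[] keys;
    private final boolean[] used;
    private long numLookups = 0;
    private long numProbes = 0;

    ProbeCountingMap(int capacity) {
        keys = new long[capacity];
        used = new boolean[capacity];
    }

    void put(long key) {
        int i = Math.floorMod(Long.hashCode(key), keys.length);
        numLookups++;
        while (used[i] && keys[i] != key) {   // linear probing past collisions
            numProbes++;
            i = (i + 1) % keys.length;
        }
        numProbes++;                          // count the final slot inspection
        keys[i] = key;
        used[i] = true;
    }

    // The metric an operator could report: average probes needed per lookup.
    double avgProbesPerLookup() {
        return numLookups == 0 ? 0 : (double) numProbes / numLookups;
    }
}
```

With no collisions the average stays at 1.0; values well above 1 suggest a crowded or poorly hashed map, which is exactly what makes this useful as an aggregate/join metric.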
[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18150 Merged build finished. Test PASSed.
[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18150 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77864/
[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18150 **[Test build #77864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77864/testReport)** for PR 18150 at commit [`78bce79`](https://github.com/apache/spark/commit/78bce792c5b057120ba1e4e02ed00cd0c3487629).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18248 `alterTable` can read stats from the old table and keep them: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L564 basically `alterTable` should ignore the stats from the input table metadata and keep the stats as they were in the old table metadata.
[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/18248 I mean, how can we keep the existing stats? Since we cannot tell whether they came from hive or spark: if we store them as spark's stats, we come back to the original problem; if we don't, we lose stats that were generated by spark.
[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/18248 @cloud-fan How can we tell in `alterTable` whether it's new stats or not?
[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18248 `alterTable` won't set new stats but can still keep the existing stats; can we implement this?
[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/18248 @cloud-fan Actually that was my first version. It also has problems: if we generate spark's stats first (through the analyze command and `alterTableStats`) and then run a regular alter table command, the stats will be lost. Because we don't store stats info (which comes from `CatalogStatistics`) through `alterTable`, and we don't know where the stats info came from (hive or spark), we can't decide whether or not to store the stats in `alterTable`.
[GitHub] spark issue #18209: [SPARK-20992][Scheduler] Add support for Nomad as a sche...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/18209 Given that at Facebook we use our own in-house scheduler, I see why people would want their scheduler impls added right into the Spark codebase as first-class citizens. Like @srowen said, this should not be part of the Spark codebase. Many schedulers will be developed in the future, and if we keep adding them to Spark it will be a lot of maintenance burden. Mesos and YARN have their place in the codebase given that they were there in the early days of the project and are widely used in deployments (not saying that this is good or bad). This PR makes things worse for the future by taking the easier route of adding yet another `if..else` or `switch case` for a new scheduler impl. One could argue this should have been done cleanly right at the start for Mesos/YARN, but to give the benefit of the doubt to the original authors, they didn't expect new schedulers to come up. I would recommend adding interfaces in the Spark code to abstract things out, so that plugging in a new scheduler is as simple as adding a jar to the classpath. As a matter of fact, we are currently able to use Spark with our own scheduler by extending the existing interfaces. The only patching we have to do is 3-4 lines, which is not a big pain during upgrades. It seems like you need more than the current interfaces. Having said this, I do appreciate the effort put into this PR.
[GitHub] spark issue #18244: [SPARK-20211][SQL] Fix the Precision and Scale of Decima...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18244 LGTM
[GitHub] spark issue #18251: [SPARK-21033][SQL] fix the potential OOM in UnsafeExtern...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18251 also cc @JoshRosen
[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18248 I think the real issue is that we mistakenly add statistics in `ALTER TABLE`. This is because `ExternalCatalog.alterTable` is heavily used when we want to change something for a table. I think it would be better to introduce an `ExternalCatalog.alterTableStats` and use it in `ANALYZE TABLE`, so that `ALTER TABLE` won't add statistics.
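The proposed split can be sketched with a toy catalog. This is a minimal Java illustration of the separation of concerns only; the class and method bodies are hypothetical stand-ins, not Spark's `ExternalCatalog`:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch: keep table metadata changes and statistics changes on separate
// methods, so a plain ALTER TABLE never touches (and thus never drops) stats.
final class CatalogSketch {
    private final Map<String, String> tableProps = new HashMap<>();
    // Stands in for CatalogStatistics; a row count is enough for the sketch.
    private final Map<String, Long> tableRowCounts = new HashMap<>();

    // ALTER TABLE path: metadata only, statistics are left alone.
    void alterTable(String table, String newProps) {
        tableProps.put(table, newProps);
    }

    // ANALYZE TABLE path: the only code path that writes statistics.
    void alterTableStats(String table, long rowCount) {
        tableRowCounts.put(table, rowCount);
    }

    Optional<Long> stats(String table) {
        return Optional.ofNullable(tableRowCounts.get(table));
    }
}
```

Under this shape, the scenario wzhfy describes (analyze, then alter) no longer loses the stats, because `alterTable` has no statistics to forget.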
[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18231 Merged build finished. Test PASSed.
[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18231 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77862/
[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18231 **[Test build #77862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77862/testReport)** for PR 18231 at commit [`5dd0e77`](https://github.com/apache/spark/commit/5dd0e5c302e915b601b8c4b17806219ebb3b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18257: [SPARK-21044][SPARK-21041][SQL] Add RemoveInvalidRange o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18257 **[Test build #77865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77865/testReport)** for PR 18257 at commit [`6ab9b6f`](https://github.com/apache/spark/commit/6ab9b6fb559ab11ac95078a1672672edf85efbac).
[GitHub] spark pull request #18257: [SPARK-21044][SPARK-21041][SQL] Add RemoveInvalid...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/18257 [SPARK-21044][SPARK-21041][SQL] Add RemoveInvalidRange optimizer

## What changes were proposed in this pull request?

This PR aims to add an optimizer that removes invalid `Range` operators from the plan. There are two cases of invalidity:

1. The `start` and `end` values are equal.
2. The sign of `step` does not match the direction from `start` to `end`.

With the `wholeStageCodeGen` option, both cases generate unnecessary Java code. Worse, [SPARK-21041](https://issues.apache.org/jira/browse/SPARK-21041) reports that the second case returns a wrong result.

**BEFORE**
```
scala> spark.range(0,0,1).explain
== Physical Plan ==
*Range (0, 0, step=1, splits=8)
```

**AFTER**
```
scala> spark.range(0,0,1).explain
== Physical Plan ==
LocalTableScan , [id#0L]
```

## How was this patch tested?

Pass the Jenkins with the newly added test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-21041

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18257.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18257

commit 6ab9b6fb559ab11ac95078a1672672edf85efbac
Author: Dongjoon Hyun
Date: 2017-06-10T01:45:23Z
[SPARK-21044][SPARK-21041][SQL] Add RemoveInvalidRange optimizer
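The two invalidity cases in the PR description can be captured by a tiny predicate. A minimal Java sketch follows; the class and method names are illustrative only, not from the Spark codebase:

```java
// A Range(start, end, step) produces no rows when the interval is empty
// (start == end) or when step points away from end -- the two invalidity
// cases the PR describes.
final class RangeRules {
    static boolean producesNoRows(long start, long end, long step) {
        if (step == 0) {
            throw new IllegalArgumentException("step cannot be 0");
        }
        if (start == end) {
            return true;                     // case 1: empty interval
        }
        return (end > start) != (step > 0);  // case 2: step sign mismatch
    }
}
```

Under this predicate, `spark.range(0, 0, 1)` falls into case 1, so the optimizer can replace it with an empty local relation instead of generating codegen for it.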
[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18231 **[Test build #77863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77863/testReport)** for PR 18231 at commit [`1e72eab`](https://github.com/apache/spark/commit/1e72eab5a7bc3ee5a48f8380dc4398874f1c36ac). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18231 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77863/
[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18231 Merged build finished. Test FAILed.
[GitHub] spark issue #18209: [SPARK-20992][Scheduler] Add support for Nomad as a sche...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18209 The next one to add is probably Kubernetes. Even Spark on Kubernetes is going through this cycle of maintaining a separate project for it first.
[GitHub] spark issue #18228: [SPARK-21007][SQL]Add SQL function - RIGHT && LEFT
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18228 Are these ANSI SQL functions? If they are just esoteric MySQL functions, I don't think we should add them.
[GitHub] spark issue #18236: [SPARK-21015] Check field name is not null and empty in ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18236 Why do we want this check? If the user passes in a null value, it is OK if it is not found, isn't it?
[GitHub] spark pull request #18252: [SPARK-17914][SQL] Fix parsing of timestamp strin...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18252#discussion_r121246583

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
@@ -32,7 +32,7 @@ import org.apache.spark.unsafe.types.UTF8String
  * Helper functions for converting between internal and external date and time representations.
  * Dates are exposed externally as java.sql.Date and are represented internally as the number of
  * dates since the Unix epoch (1970-01-01). Timestamps are exposed externally as java.sql.Timestamp
- * and are stored internally as longs, which are capable of storing timestamps with 100 nanosecond
+ * and are stored internally as longs, which are capable of storing timestamps with microsecond
--- End diff --

100 ns is different from micro, isn't it?
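rxin's point is a units question: 1 microsecond equals 10 × 100 ns, so a long counting microseconds is a strictly coarser resolution than one counting 100 ns units. A small sketch (the helper name is illustrative, not a Spark API) shows the truncation:

```java
// A long of epoch microseconds cannot represent anything finer than 1 us,
// so converting from nanoseconds truncates (floors) sub-microsecond detail.
final class TimeUnits {
    static long toEpochMicros(long epochNanos) {
        // floorDiv (not /) so negative pre-epoch timestamps round toward
        // negative infinity, keeping ordering consistent across the epoch.
        return Math.floorDiv(epochNanos, 1_000L); // 1,000 ns per microsecond
    }
}
```

For example, 1,234,567 ns and 1,234,900 ns map to the same microsecond value, which is exactly why "100 nanosecond" was the wrong word in the comment being fixed.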
[GitHub] spark pull request #18256: [SPARK-21042][SQL] Document Dataset.union is reso...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18256
[GitHub] spark issue #18256: [SPARK-21042][SQL] Document Dataset.union is resolution ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18256 Merging in master/branch-2.2. Thanks.
[GitHub] spark issue #18256: [SPARK-21042][SQL] Document Dataset.union is resolution ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18256 Merged build finished. Test PASSed.
[GitHub] spark issue #18256: [SPARK-21042][SQL] Document Dataset.union is resolution ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18256 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77861/
[GitHub] spark issue #18256: [SPARK-21042][SQL] Document Dataset.union is resolution ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18256 **[Test build #77861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77861/testReport)** for PR 18256 at commit [`b464adb`](https://github.com/apache/spark/commit/b464adbf16145efc5b6aa1f3db70bf60912ea439). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18150 **[Test build #77864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77864/testReport)** for PR 18150 at commit [`78bce79`](https://github.com/apache/spark/commit/78bce792c5b057120ba1e4e02ed00cd0c3487629).
[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18150 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77859/
[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18150 Merged build finished. Test FAILed.
[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18150 **[Test build #77859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77859/testReport)** for PR 18150 at commit [`ec89ac1`](https://github.com/apache/spark/commit/ec89ac1d53beb5a598f12d3a0bd4c6774cdd1976). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18248 Merged build finished. Test FAILed.
[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18248 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77860/
[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18248 **[Test build #77860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77860/testReport)** for PR 18248 at commit [`835b6f2`](https://github.com/apache/spark/commit/835b6f280eca4c63c3e6e19df7e33237c9cd505d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r121244808

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2651,4 +2652,28 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     val e = intercept[AnalysisException](sql("SELECT nvl(1, 2, 3)"))
     assert(e.message.contains("Invalid number of arguments"))
   }
+
+  test("TRIM function") {
+    val ae1 = intercept[ParseException] {
+      sql("select LTRIM(BOTH 'S' FROM 'SS abc S')").head
+    }
+    assert(ae1.getMessage contains
+      "The specified function LTRIM doesn't support with option BOTH")
+    val ae2 = intercept[ParseException] {
+      sql("select RTRIM(TRAILING 'S' FROM 'SS abc S')").head
+    }
+    assert(ae2.getMessage contains
+      "The specified function RTRIM doesn't support with option TRAILING")
+    val ae3 = intercept[ParseException] {
+      sql("select TRIM(OVER 'S' AND 'SS abc S')").head
+    }
+    assert(ae3.getMessage contains
+      "Literals of type 'OVER' are currently not supported")
+    checkAnswer(
+      sql("SELECT TRIM(BOTH '@$%&( )abc' FROM '@ $ % & ()abc ' )"), Row("") :: Nil)
+    checkAnswer(
+      sql("SELECT TRIM(LEADING 'c []' FROM '[ bcc ')"), Row("bcc ") :: Nil)
+    checkAnswer(
+      sql("SELECT TRIM(TRAILING 'c&^,.' FROM 'bc...,,,&&&ccc' )"), Row("b") :: Nil)
--- End diff --

OK, can you also update the description of this syntax? We are supporting two equivalent syntaxes here.
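The `checkAnswer` expectations in the quoted test rely on trim-character-set semantics: `TRIM(BOTH chars FROM src)` strips any character contained in `chars` from either end, not the string `chars` as a whole. A standalone Java sketch of that semantics (not Spark's implementation, just the behavior the test asserts):

```java
// SQL TRIM(BOTH trimChars FROM src): remove from both ends every character
// that appears anywhere in the trim set.
final class TrimSketch {
    static String trimBoth(String src, String trimChars) {
        int begin = 0;
        int end = src.length();
        // Advance past leading characters that belong to the trim set.
        while (begin < end && trimChars.indexOf(src.charAt(begin)) >= 0) {
            begin++;
        }
        // Retreat past trailing characters that belong to the trim set.
        while (end > begin && trimChars.indexOf(src.charAt(end - 1)) >= 0) {
            end--;
        }
        return src.substring(begin, end);
    }
}
```

This reproduces the test's expectations: every character of `'@ $ % & ()abc '` is in the set `'@$%&( )abc'`, so the result is the empty string.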
[GitHub] spark issue #18238: [SPARK-21016][core]Improve code fault tolerance for conv...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/18238 If a string has a leading or trailing space, it is expected that it cannot be converted to a number. Java's number parsing follows the same convention.
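The convention referred to here is easy to demonstrate: `Long.parseLong` throws `NumberFormatException` on any leading or trailing whitespace. A small wrapper (the helper name is illustrative) makes the behavior explicit:

```java
// Java's Long.parseLong rejects surrounding whitespace rather than trimming
// it, so " 42" and "42 " are both unparsable while "42" succeeds.
final class NumParse {
    static boolean isParsableLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

Callers who want lenient behavior must trim explicitly before parsing; the parser itself deliberately does not.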
[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks to...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 @vanzin Thanks again for the comments :) I refined the code accordingly; please take another look when you have time.
[GitHub] spark pull request #18231: [SPARK-20994] Remove redundant characters in OpenBl...
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r121242495

--- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
@@ -209,4 +190,51 @@ private ShuffleMetrics() {
     }
   }

+  private class ManagedBufferIterator implements Iterator<ManagedBuffer> {
+
+    private int index = 0;
+    private String appId;
+    private String execId;
+    private String shuffleId;
+    // An array containing mapId and reduceId pairs.
+    private int[] mapIdAndReduceIds;
+
+    ManagedBufferIterator(String appId, String execId, String[] blockIds) {
+      this.appId = appId;
+      this.execId = execId;
+      String[] blockId0Parts = blockIds[0].split("_");
+      if (blockId0Parts.length < 4) {
+        throw new IllegalArgumentException("Unexpected block id format: " + blockIds[0]);
+      } else if (!blockId0Parts[0].equals("shuffle")) {
+        throw new IllegalArgumentException("Expected shuffle block id, got: " + blockIds[0]);
+      }
+      this.shuffleId = blockId0Parts[1];
+      mapIdAndReduceIds = new int[2 * blockIds.length];
+      for (int i = 0; i < blockIds.length; i++) {
+        String[] blockIdParts = blockIds[i].split("_");
+        if (!blockIdParts[1].equals(shuffleId)) {
+          throw new IllegalArgumentException("Expected shuffleId=" + shuffleId +
+            ", got:" + blockIds[i]);
+        }
+        mapIdAndReduceIds[2 * i] = Integer.parseInt(blockIdParts[2]);
+        mapIdAndReduceIds[2 * i + 1] = Integer.parseInt(blockIdParts[3]);
+      }
+    }
+
+    @Override
+    public boolean hasNext() {
+      return index < mapIdAndReduceIds.length / 2;
+    }
+
+    @Override
+    public ManagedBuffer next() {
+      String blockId = "shuffle_" + shuffleId + "_" + mapIdAndReduceIds[2 * index] + "_" +
+        mapIdAndReduceIds[2 * index + 1];
+      final ManagedBuffer block = blockManager.getBlockData(appId, execId, blockId);
--- End diff --

I was thinking about this. I added a new `getBlockData` in `ExternalShuffleBlockResolver`. But I'm not sure whether it is appropriate.
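The packing trick in the diff above is the heart of the PR: since every requested block shares one `shuffleId`, block ids of the form `shuffle_<shuffleId>_<mapId>_<reduceId>` collapse to an array of `(mapId, reduceId)` pairs. A distilled, standalone Java sketch of just that parsing step (the class name is illustrative):

```java
// Pack "shuffle_<shuffleId>_<mapId>_<reduceId>" ids into a flat int array of
// (mapId, reduceId) pairs, validating that they all share one shuffleId.
final class BlockIds {
    static int[] packMapAndReduceIds(String[] blockIds, String expectedShuffleId) {
        int[] packed = new int[2 * blockIds.length];
        for (int i = 0; i < blockIds.length; i++) {
            String[] parts = blockIds[i].split("_");
            if (parts.length != 4 || !parts[0].equals("shuffle")
                    || !parts[1].equals(expectedShuffleId)) {
                throw new IllegalArgumentException("Unexpected block id: " + blockIds[i]);
            }
            packed[2 * i] = Integer.parseInt(parts[2]);     // mapId
            packed[2 * i + 1] = Integer.parseInt(parts[3]); // reduceId
        }
        return packed;
    }
}
```

The block id strings can then be rebuilt lazily on iteration, which is what lets the OpenBlocks message drop the redundant characters.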
[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18231 **[Test build #77863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77863/testReport)** for PR 18231 at commit [`1e72eab`](https://github.com/apache/spark/commit/1e72eab5a7bc3ee5a48f8380dc4398874f1c36ac).
[GitHub] spark issue #18231: [WIP][SPARK-20994] Remove redundant characters in OpenBloc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18231 **[Test build #77862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77862/testReport)** for PR 18231 at commit [`5dd0e77`](https://github.com/apache/spark/commit/5dd0e5c302e915b601b8c4b17806219ebb3b).
[GitHub] spark pull request #18231: [WIP][SPARK-20994] Remove redundant characters in O...
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r121241362

--- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
@@ -209,4 +190,51 @@ private ShuffleMetrics() {
     }
   }

+  private class ManagedBufferIterator implements Iterator<ManagedBuffer> {
+
+    private int index = 0;
+    private String appId;
--- End diff --

Yes, I will refine.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/18199 Left one last comment. Otherwise LGTM
[GitHub] spark pull request #18199: [SPARK-20979][SS]Add RateSource to generate value...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/18199#discussion_r121240973
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/RateSourceSuite.scala ---
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming
+
+import java.util.concurrent.TimeUnit
+
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.streaming.{StreamingQueryException, StreamTest}
+import org.apache.spark.util.ManualClock
+
+class RateSourceSuite extends StreamTest {
+
+  import testImplicits._
+
+  case class AdvanceRateManualClock(seconds: Long) extends AddData {
+    override def addData(query: Option[StreamExecution]): (Source, Offset) = {
+      assert(query.nonEmpty)
+      val rateSource = query.get.logicalPlan.collect {
+        case StreamingExecutionRelation(source, _) if source.isInstanceOf[RateStreamSource] =>
+          source.asInstanceOf[RateStreamSource]
+      }.head
+      rateSource.clock.asInstanceOf[ManualClock].advance(TimeUnit.SECONDS.toMillis(seconds))
+      (rateSource, rateSource.getOffset.get)
+    }
+  }
+
+  test("basic") {
+    val input = spark.readStream
+      .format("rate")
+      .option("rowsPerSecond", "10")
+      .option("useManualClock", "true")
+      .load()
+    testStream(input)(
+      AdvanceRateManualClock(seconds = 1),
+      CheckLastBatch((0 until 10).map(v => new java.sql.Timestamp(v * 100L) -> v): _*),
+      StopStream,
+      StartStream(),
+      // Advance 2 seconds because creating a new RateSource will also create a new ManualClock
+      AdvanceRateManualClock(seconds = 2),
+      CheckLastBatch((10 until 20).map(v => new java.sql.Timestamp(v * 100L) -> v): _*)
+    )
+  }
+
+  test("uniform distribution of event timestamps") {
+    val input = spark.readStream
+      .format("rate")
+      .option("rowsPerSecond", "1500")
+      .option("useManualClock", "true")
+      .load()
+      .as[(java.sql.Timestamp, Long)]
+      .map(v => (v._1.getTime, v._2))
+    val expectedAnswer = (0 until 1500).map { v =>
+      (math.round(v * (1000.0 / 1500)), v)
+    }
+    testStream(input)(
+      AdvanceRateManualClock(seconds = 1),
+      CheckLastBatch(expectedAnswer: _*)
+    )
+  }
+
+  test("valueAtSecond") {
+    import RateStreamSource._
+
+    assert(valueAtSecond(seconds = 0, rowsPerSecond = 5, rampUpTimeSeconds = 2) === 0)
--- End diff --
Would be nice to add one test where `rampUpTimeSeconds = 0`.
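A `valueAtSecond`-style helper can be sketched in isolation to show why the `rampUpTimeSeconds = 0` case the reviewer asks for is worth covering. This is a hypothetical reimplementation that assumes a linear ramp-up; the actual `RateStreamSource` formula may differ.

```java
// Hypothetical sketch of a valueAtSecond-style helper, assuming the rate
// ramps up linearly from 0 to rowsPerSecond over rampUpTimeSeconds.
public class RateSketch {
    // Total number of rows emitted by the end of `seconds`.
    static long valueAtSecond(long seconds, long rowsPerSecond, long rampUpTimeSeconds) {
        long total = 0;
        for (long s = 1; s <= seconds; s++) {
            if (s <= rampUpTimeSeconds) {
                // during ramp-up, emit a linearly growing fraction of the full rate
                total += Math.round(s * (double) rowsPerSecond / (rampUpTimeSeconds + 1));
            } else {
                total += rowsPerSecond; // steady state
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // matches the assertion quoted in the diff above
        System.out.println(valueAtSecond(0, 5, 2)); // 0
        // the extra case the reviewer suggests: no ramp-up at all
        System.out.println(valueAtSecond(3, 5, 0)); // 15
    }
}
```

With `rampUpTimeSeconds = 0` the ramp-up branch is never taken, so the helper degenerates to `seconds * rowsPerSecond`, which is exactly the edge the extra test would pin down.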
[GitHub] spark pull request #18252: [SPARK-17914][SQL] Fix parsing of timestamp strin...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18252#discussion_r121240900
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
@@ -399,13 +399,13 @@ object DateTimeUtils {
      digitsMilli += 1
    }
-    if (!justTime && isInvalidDate(segments(0), segments(1), segments(2))) {
-      return None
+    while (digitsMilli > 6) {
--- End diff --
Add a comment indicating we are truncating the microsecond part and that it's lossy?
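The `while (digitsMilli > 6)` loop under review drops fractional-second digits beyond microsecond precision. A standalone sketch of that behavior (illustrative only; the real `DateTimeUtils` works on a segments array rather than a string):

```java
// Illustrative sketch: convert a fractional-seconds digit string into
// microseconds, dropping (lossily) any digits beyond the 6 that fit
// microsecond precision.
public class FractionSketch {
    static long toMicros(String fraction) {
        long micros = Long.parseLong(fraction);
        int digits = fraction.length();
        while (digits > 6) {   // same shape as the loop in the diff: lossy truncation
            micros /= 10;
            digits -= 1;
        }
        while (digits < 6) {   // right-pad so e.g. "12" means 120000 microseconds
            micros *= 10;
            digits += 1;
        }
        return micros;
    }

    public static void main(String[] args) {
        System.out.println(toMicros("123456789")); // 123456 (nanosecond digits dropped)
        System.out.println(toMicros("12"));        // 120000
    }
}
```

The truncation is one-way: once the extra digits are divided away, the original nanosecond precision cannot be recovered, which is exactly why the reviewer asks for a comment flagging it as lossy.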
[GitHub] spark issue #18255: [SPARK-21038][SQL] Reduce redundant generated init code ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18255 **[Test build #3788 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3788/testReport)** for PR 18255 at commit [`de9ce5b`](https://github.com/apache/spark/commit/de9ce5b19719a629ceccc48389e856a4c1959174). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18199 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77858/ Test PASSed.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18199 Merged build finished. Test PASSed.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18199 **[Test build #77858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77858/testReport)** for PR 18199 at commit [`d5e7492`](https://github.com/apache/spark/commit/d5e749202776eb1de5f9f1e15a3e19e8e49cf586). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18150 Merged build finished. Test PASSed.
[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18150 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77856/ Test PASSed.
[GitHub] spark issue #18256: [SPARK-21042][SQL] Document Dataset.union is resolution ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18256 LGTM
[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18150 **[Test build #77856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77856/testReport)** for PR 18150 at commit [`810a101`](https://github.com/apache/spark/commit/810a10131200f46902a7563422558c63967f5c09). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18256: [SPARK-21042][SQL] Document Dataset.union is resolution ...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18256 Thanks! LGTM pending tests
[GitHub] spark issue #18256: [SPARK-21042][SQL] Document Dataset.union is resolution ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18256 **[Test build #77861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77861/testReport)** for PR 18256 at commit [`b464adb`](https://github.com/apache/spark/commit/b464adbf16145efc5b6aa1f3db70bf60912ea439).
[GitHub] spark pull request #18256: [SPARK-21042][SQL] Document Dataset.union is reso...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18256 [SPARK-21042][SQL] Document Dataset.union is resolution by position
## What changes were proposed in this pull request?
Document that Dataset.union resolves columns by position, not by name, since this has been a confusing point for many users.
## How was this patch tested?
N/A - doc only change.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-21042 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18256.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18256
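The confusion this PR documents can be shown with a tiny model (illustrative only, not Spark's implementation): a positional union concatenates rows and keeps the left-hand schema, never consulting the right-hand column names.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal model of a positional union: rows are concatenated and the left
// schema wins; the right-hand column names are never consulted.
public class UnionSketch {
    static List<List<Object>> unionByPosition(List<List<Object>> left, List<List<Object>> right) {
        List<List<Object>> out = new ArrayList<>(left);
        out.addAll(right); // no reordering by column name happens here
        return out;
    }

    public static void main(String[] args) {
        // left columns: (id, name); right rows were written in (name, id) order
        List<List<Object>> left = List.of(List.<Object>of(1, "a"));
        List<List<Object>> right = List.of(List.<Object>of("b", 2));
        // "b" now silently sits in the id column of the result
        System.out.println(unionByPosition(left, right));
    }
}
```

Because nothing fails loudly when column orders differ, documenting the positional semantics (as this PR does) is the cheapest way to spare users the surprise.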
[GitHub] spark pull request #17180: [SPARK-19839][Core]release longArray in BytesToBy...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/17180#discussion_r121238154
--- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
@@ -358,10 +358,20 @@ public long spill(long numBytes) throws IOException {
      return 0L;
    }
+    long released = 0L;
+    // longArray will not be used anymore, but may take a significant amount of
+    // memory that is never released until the task is complete.
+    if (longArray != null) {
+      released += longArray.memoryBlock().size();
+      freeArray(longArray);
+      longArray = null;
+    }
+    if (released >= numBytes) {
+      return released;
--- End diff --
@cloud-fan, I was wondering this, too: I think that this is okay because we're in a destructive iterator over the map, so we can just walk through the data pages without needing to worry about hash-style lookups. If this is the case, though, then we should just destroy the pointer array earlier (as I commented downthread on the PR).
[GitHub] spark pull request #18231: [WIP][SPARK-20994] Remove reduant characters in O...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r121237553
--- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
@@ -209,4 +190,51 @@ private ShuffleMetrics() {
    }
  }
+
+  private class ManagedBufferIterator implements Iterator {
+
+    private int index = 0;
+    private String appId;
--- End diff --
`final`, same for all fields except `index`.
[GitHub] spark pull request #18231: [WIP][SPARK-20994] Remove reduant characters in O...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r121237684
--- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
@@ -209,4 +190,52 @@ private ShuffleMetrics() {
    }
  }
+
+  private class ManagedBufferIterator implements Iterator {
+
+    private int index = 0;
+    private String appId;
+    private String execId;
+    private String shuffleId;
+    // An array containing mapId and reduceId pairs.
+    private int[][] mapIdAndReduceIds;
+
+    ManagedBufferIterator(String appId, String execId, String[] blockIds) {
+      this.appId = appId;
+      this.execId = execId;
+      String[] blockId0Parts = blockIds[0].split("_");
+      if (blockId0Parts.length < 4) {
+        throw new IllegalArgumentException("Unexpected block id format: " + blockIds[0]);
+      } else if (!blockId0Parts[0].equals("shuffle")) {
--- End diff --
I think Sean means that since you're throwing in the previous block, `else` is redundant.
[GitHub] spark pull request #18231: [WIP][SPARK-20994] Remove reduant characters in O...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r121237874
--- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
@@ -209,4 +190,51 @@ private ShuffleMetrics() {
    }
  }
+
+  private class ManagedBufferIterator implements Iterator {
+
+    private int index = 0;
+    private String appId;
+    private String execId;
+    private String shuffleId;
+    // An array containing mapId and reduceId pairs.
+    private int[] mapIdAndReduceIds;
+
+    ManagedBufferIterator(String appId, String execId, String[] blockIds) {
+      this.appId = appId;
+      this.execId = execId;
+      String[] blockId0Parts = blockIds[0].split("_");
+      if (blockId0Parts.length < 4) {
+        throw new IllegalArgumentException("Unexpected block id format: " + blockIds[0]);
+      } else if (!blockId0Parts[0].equals("shuffle")) {
+        throw new IllegalArgumentException("Expected shuffle block id, got: " + blockIds[0]);
+      }
+      this.shuffleId = blockId0Parts[1];
+      mapIdAndReduceIds = new int[2 * blockIds.length];
+      for (int i = 0; i < blockIds.length; i++) {
+        String[] blockIdParts = blockIds[i].split("_");
+        if (!blockIdParts[1].equals(shuffleId)) {
+          throw new IllegalArgumentException("Expected shuffleId=" + shuffleId +
+            ", got:" + blockIds[i]);
+        }
+        mapIdAndReduceIds[2 * i] = Integer.parseInt(blockIdParts[2]);
+        mapIdAndReduceIds[2 * i + 1] = Integer.parseInt(blockIdParts[3]);
+      }
+    }
+
+    @Override
+    public boolean hasNext() {
+      return index < mapIdAndReduceIds.length / 2;
+    }
+
+    @Override
+    public ManagedBuffer next() {
+      String blockId = "shuffle_" + shuffleId + "_" + mapIdAndReduceIds[2 * index] + "_" +
+        mapIdAndReduceIds[2 * index + 1];
+      final ManagedBuffer block = blockManager.getBlockData(appId, execId, blockId);
--- End diff --
Is it too big a change to make `getBlockData` expect the broken-down shuffle/map/reduce id, since all it does is parse the string again into integers?
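The parsing logic under review can be sketched as a standalone iterator (illustrative names; the real class returns a `ManagedBuffer` via `blockManager.getBlockData`). Yielding the already-parsed shuffle/map/reduce ids, as the review suggests, avoids re-splitting the block-id string downstream.

```java
import java.util.Iterator;

// Sketch of the ManagedBufferIterator parsing: block ids of the form
// "shuffle_<shuffleId>_<mapId>_<reduceId>" are split once into a flat int
// array of (mapId, reduceId) pairs. Yields the broken-down ids as an int[]
// triple so the sketch stays self-contained.
public class ShuffleBlockIdIterator implements Iterator<int[]> {
    private int index = 0;
    private final int shuffleId;
    private final int[] mapIdAndReduceIds;

    public ShuffleBlockIdIterator(String[] blockIds) {
        String[] first = blockIds[0].split("_");
        if (first.length < 4) {
            throw new IllegalArgumentException("Unexpected block id format: " + blockIds[0]);
        }
        if (!first[0].equals("shuffle")) { // no `else` needed after a throw
            throw new IllegalArgumentException("Expected shuffle block id, got: " + blockIds[0]);
        }
        this.shuffleId = Integer.parseInt(first[1]);
        this.mapIdAndReduceIds = new int[2 * blockIds.length];
        for (int i = 0; i < blockIds.length; i++) {
            String[] parts = blockIds[i].split("_");
            if (Integer.parseInt(parts[1]) != shuffleId) {
                throw new IllegalArgumentException(
                    "Expected shuffleId=" + shuffleId + ", got: " + blockIds[i]);
            }
            mapIdAndReduceIds[2 * i] = Integer.parseInt(parts[2]);
            mapIdAndReduceIds[2 * i + 1] = Integer.parseInt(parts[3]);
        }
    }

    @Override
    public boolean hasNext() {
        return index < mapIdAndReduceIds.length / 2;
    }

    @Override
    public int[] next() {
        int[] ids = {shuffleId, mapIdAndReduceIds[2 * index], mapIdAndReduceIds[2 * index + 1]};
        index++;
        return ids;
    }
}
```

Note all fields except the mutable `index` are `final`, matching the earlier review comment.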
[GitHub] spark pull request #17180: [SPARK-19839][Core]release longArray in BytesToBy...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17180#discussion_r121237843
--- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
@@ -358,10 +358,20 @@ public long spill(long numBytes) throws IOException {
      return 0L;
    }
+    long released = 0L;
+    // longArray will not be used anymore, but may take a significant amount of
+    // memory that is never released until the task is complete.
+    if (longArray != null) {
+      released += longArray.memoryBlock().size();
+      freeArray(longArray);
+      longArray = null;
+    }
+    if (released >= numBytes) {
+      return released;
--- End diff --
IIUC we can only free this pointer array if we spill its related data pages. Is this short-circuit safe to do?
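The accounting being debated can be modeled in isolation (hypothetical names; the real `BytesToBytesMap` actually writes pages to disk): the pointer array is freed unconditionally, and the short-circuit fires only when that alone covers the request.

```java
// Toy model of the spill path under discussion: free the pointer array
// first, short-circuit if that already satisfies numBytes, otherwise keep
// releasing data pages until enough memory has been freed.
public class SpillSketch {
    static long spill(long numBytes, long pointerArrayBytes, long[] pageSizes) {
        long released = pointerArrayBytes; // longArray is freed unconditionally
        if (released >= numBytes) {
            return released; // short-circuit: no data pages touched
        }
        for (long page : pageSizes) {
            released += page; // "spill" one data page
            if (released >= numBytes) {
                break;
            }
        }
        return released;
    }

    public static void main(String[] args) {
        System.out.println(spill(100, 120, new long[]{50, 50})); // 120: array alone suffices
        System.out.println(spill(100, 30, new long[]{50, 50}));  // 130: both pages spilled
    }
}
```

The model makes the reviewers' concern concrete: the short-circuit path frees the pointer array while leaving every data page in memory, which is only safe if, as JoshRosen notes above, no hash-style lookups can happen afterwards.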
[GitHub] spark issue #18252: [SPARK-17914][SQL] Fix parsing of timestamp strings with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18252 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77857/ Test PASSed.
[GitHub] spark issue #18252: [SPARK-17914][SQL] Fix parsing of timestamp strings with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18252 Merged build finished. Test PASSed.
[GitHub] spark issue #18252: [SPARK-17914][SQL] Fix parsing of timestamp strings with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18252 **[Test build #77857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77857/testReport)** for PR 18252 at commit [`4d057c9`](https://github.com/apache/spark/commit/4d057c99d3d99621811b6d593295533722803a3e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18248 **[Test build #77860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77860/testReport)** for PR 18248 at commit [`835b6f2`](https://github.com/apache/spark/commit/835b6f280eca4c63c3e6e19df7e33237c9cd505d).
[GitHub] spark pull request #17723: [SPARK-20434][YARN][CORE] Move kerberos delegatio...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/17723#discussion_r121235695
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/security/YARNConfigurableCredentialManager.scala ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn.security
+
+import java.util.ServiceLoader
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.FileSystem
+import org.apache.hadoop.security.Credentials
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.security.ConfigurableCredentialManager
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * This class loads credential providers registered under the YARN-specific
+ * [[ServiceCredentialProvider]] interface, as well as the builtin credential providers defined
+ * in [[ConfigurableCredentialManager]].
+ */
+private[yarn] class YARNConfigurableCredentialManager(
+    sparkConf: SparkConf,
+    hadoopConf: Configuration,
+    fileSystems: Set[FileSystem]) extends Logging {
+
+  private val configurableCredentialManager =
+    new ConfigurableCredentialManager(sparkConf, hadoopConf, fileSystems)
+
+  // public for testing
+  val credentialProviders = getCredentialProviders
+
+  def obtainCredentials(
--- End diff --
nit: fits in one line.
[GitHub] spark pull request #17723: [SPARK-20434][YARN][CORE] Move kerberos delegatio...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/17723#discussion_r121235739
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/security/YARNConfigurableCredentialManager.scala ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn.security
+
+import java.util.ServiceLoader
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.FileSystem
+import org.apache.hadoop.security.Credentials
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.security.ConfigurableCredentialManager
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * This class loads credential providers registered under the YARN-specific
+ * [[ServiceCredentialProvider]] interface, as well as the builtin credential providers defined
+ * in [[ConfigurableCredentialManager]].
+ */
+private[yarn] class YARNConfigurableCredentialManager(
+    sparkConf: SparkConf,
+    hadoopConf: Configuration,
+    fileSystems: Set[FileSystem]) extends Logging {
+
+  private val configurableCredentialManager =
+    new ConfigurableCredentialManager(sparkConf, hadoopConf, fileSystems)
+
+  // public for testing
+  val credentialProviders = getCredentialProviders
+
+  def obtainCredentials(
+      hadoopConf: Configuration,
+      creds: Credentials): Long = {
+
+    val superInterval = configurableCredentialManager.obtainCredentials(
--- End diff --
nit: fits in one line.
[GitHub] spark pull request #17723: [SPARK-20434][YARN][CORE] Move kerberos delegatio...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/17723#discussion_r121235242 --- Diff: core/pom.xml --- @@ -357,6 +357,34 @@ org.apache.commons commons-crypto + +
[GitHub] spark pull request #17723: [SPARK-20434][YARN][CORE] Move kerberos delegatio...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/17723#discussion_r121235790

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/security/YARNConfigurableCredentialManager.scala ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn.security
+
+import java.util.ServiceLoader
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.FileSystem
+import org.apache.hadoop.security.Credentials
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.security.ConfigurableCredentialManager
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * This class loads credential providers registered under the YARN-specific
+ * [[ServiceCredentialProvider]] interface, as well as the builtin credential providers defined
+ * in [[ConfigurableCredentialManager]].
+ */
+private[yarn] class YARNConfigurableCredentialManager(
+    sparkConf: SparkConf,
+    hadoopConf: Configuration,
+    fileSystems: Set[FileSystem]) extends Logging {
+
+  private val configurableCredentialManager =
+    new ConfigurableCredentialManager(sparkConf, hadoopConf, fileSystems)
+
+  // public for testing
+  val credentialProviders = getCredentialProviders
+
+  def obtainCredentials(
+      hadoopConf: Configuration,
+      creds: Credentials): Long = {
+
+    val superInterval = configurableCredentialManager.obtainCredentials(
+      hadoopConf,
+      creds)
+
+    credentialProviders.values.flatMap { provider =>
+      if (provider.credentialsRequired(hadoopConf)) {
+        provider.obtainCredentials(hadoopConf, sparkConf, creds)
+      } else {
+        logDebug(s"Service ${provider.serviceName} does not require a token." +
+          s" Check your configuration to see if security is disabled or not.")
+        None
+      }
+    }.foldLeft(superInterval)(math.min)
+  }
+
+  private def getCredentialProviders: Map[String, ServiceCredentialProvider] = {
+    val providers = loadCredentialProviders
+
+    providers
+      .filter { p => configurableCredentialManager.isServiceEnabled(p.serviceName) }
+      .map { p => (p.serviceName, p) }
+      .toMap
+  }
+
+  private def loadCredentialProviders: List[ServiceCredentialProvider] = {
+    ServiceLoader.load(
--- End diff --

nit: the `load(...)` call fits in one line.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
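The interesting part of `obtainCredentials` above is the aggregation: each enabled provider may return the timestamp of its next token renewal, and the manager folds all of them (plus the interval from the builtin providers) with `math.min`, so the renewal thread wakes up in time for the earliest-expiring token. The following is a minimal Python sketch of that fold; the function and provider names are hypothetical, not Spark APIs.

```python
# Hypothetical sketch of YARNConfigurableCredentialManager.obtainCredentials'
# renewal-time aggregation. Each provider is a (name, requires_fn, obtain_fn)
# triple: requires_fn says whether a token is needed at all, and obtain_fn
# returns the next renewal timestamp, or None if the token never needs renewal.

def obtain_credentials(providers, base_interval):
    next_renewal = base_interval  # interval from the builtin providers
    for name, requires, obtain in providers:
        if requires():
            renewal = obtain()  # may be None (no renewal needed)
            if renewal is not None:
                next_renewal = min(next_renewal, renewal)
        else:
            print(f"Service {name} does not require a token.")
    return next_renewal

providers = [
    ("hadoopfs", lambda: True, lambda: 5000),
    ("hive", lambda: True, lambda: None),    # token without a renewal time
    ("hbase", lambda: False, lambda: 1000),  # skipped: no token required
]
print(obtain_credentials(providers, 7000))  # -> 5000
```

Note that a provider whose `credentialsRequired` is false contributes nothing, even if it would have reported an earlier timestamp; only tokens that were actually obtained influence the schedule.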
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/17723#discussion_r121235520

--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HiveCredentialProvider.scala ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.security
+
+import java.lang.reflect.UndeclaredThrowableException
+import java.security.PrivilegedExceptionAction
+
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.metadata.Hive
+import org.apache.hadoop.io.Text
+import org.apache.hadoop.security.{Credentials, UserGroupInformation}
+import org.apache.hadoop.security.token.Token
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+private[security] class HiveCredentialProvider extends HadoopDelegationTokenProvider with Logging {
+
+  override def serviceName: String = "hive"
+
+  private val classNotFoundErrorStr = "You are attempting to use the HiveCredentialProvider," +
+    " but your Spark distribution is not built with Hive libraries."
+
+  private def hiveConf(hadoopConf: Configuration): Configuration = {
+    try {
+      new HiveConf(hadoopConf, classOf[HiveConf])
+    } catch {
+      case NonFatal(e) =>
+        logDebug("Fail to create Hive Configuration", e)
+        hadoopConf
+      case e: NoClassDefFoundError =>
+        logWarning(classNotFoundErrorStr)
+        hadoopConf
+    }
+  }
+
+  override def credentialsRequired(hadoopConf: Configuration): Boolean = {
+    UserGroupInformation.isSecurityEnabled &&
+      hiveConf(hadoopConf).getTrimmed("hive.metastore.uris", "").nonEmpty
+  }
+
+  override def obtainCredentials(
+      hadoopConf: Configuration,
+      creds: Credentials): Option[Long] = {
+    try {
+      val conf = hiveConf(hadoopConf)
+
+      val principalKey = "hive.metastore.kerberos.principal"
+      val principal = conf.getTrimmed(principalKey, "")
+      require(principal.nonEmpty, s"Hive principal $principalKey undefined")
+      val metastoreUri = conf.getTrimmed("hive.metastore.uris", "")
+      require(metastoreUri.nonEmpty, "Hive metastore uri undefined")
+
+      val currentUser = UserGroupInformation.getCurrentUser()
+      logDebug(s"Getting Hive delegation token for ${currentUser.getUserName()} against " +
+        s"$principal at $metastoreUri")
+
+      try {
+        doAsRealUser {
+          val hive = Hive.get(conf, classOf[HiveConf])
+          val tokenStr = hive.getDelegationToken(currentUser.getUserName(), principal)
+
+          val hive2Token = new Token[DelegationTokenIdentifier]()
+          hive2Token.decodeFromUrlString(tokenStr)
+          logInfo(s"Get Token from hive metastore: ${hive2Token.toString}")
+          creds.addToken(new Text("hive.server2.delegation.token"), hive2Token)
+        }
+      } catch {
+        case NonFatal(e) =>
--- End diff --

Move this catch block to the end of the method?
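Before the provider above talks to the metastore, it validates its preconditions: both the Kerberos principal (`hive.metastore.kerberos.principal`) and the metastore URI (`hive.metastore.uris`) must be non-empty, otherwise there is nothing to request a token from. A minimal Python sketch of that precondition check follows; `obtain_hive_token` and `fetch_token` are hypothetical stand-ins, not Hive or Spark APIs.

```python
# Hypothetical sketch of HiveCredentialProvider.obtainCredentials' config
# validation: fail fast if either the Kerberos principal or the metastore
# URI is missing, then delegate the actual token fetch.

def obtain_hive_token(conf, fetch_token):
    principal = conf.get("hive.metastore.kerberos.principal", "").strip()
    if not principal:
        raise ValueError(
            "Hive principal hive.metastore.kerberos.principal undefined")
    uri = conf.get("hive.metastore.uris", "").strip()
    if not uri:
        raise ValueError("Hive metastore uri undefined")
    # fetch_token stands in for the real call made under the real user's
    # Kerberos credentials (doAsRealUser in the Scala code)
    return fetch_token(uri, principal)

conf = {
    "hive.metastore.kerberos.principal": "hive/_HOST@EXAMPLE.COM",
    "hive.metastore.uris": "thrift://metastore:9083",
}
token = obtain_hive_token(conf, lambda uri, principal: f"token-for-{principal}")
print(token)  # -> token-for-hive/_HOST@EXAMPLE.COM
```

The `doAsRealUser` wrapper in the Scala code matters when the current user is a proxy user: the metastore RPC must run under the real user's Kerberos credentials, since only the real user holds a TGT.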
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/17723#discussion_r121235405

--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HadoopFSCredentialProvider.scala ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.security
+
+import scala.collection.JavaConverters._
+import scala.util.Try
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.FileSystem
+import org.apache.hadoop.mapred.Master
+import org.apache.hadoop.security.{Credentials, UserGroupInformation}
+import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier
+
+import org.apache.spark.SparkException
+import org.apache.spark.internal.Logging
+
+private[deploy] class HadoopFSCredentialProvider(fileSystems: Set[FileSystem])
+    extends HadoopDelegationTokenProvider with Logging {
+  // Token renewal interval, this value will be set in the first call,
+  // if None means no token renewer specified or no token can be renewed,
+  // so cannot get token renewal interval.
+  private var tokenRenewalInterval: Option[Long] = null
+
+  override val serviceName: String = "hadoopfs"
+
+  override def obtainCredentials(
+      hadoopConf: Configuration,
+      creds: Credentials): Option[Long] = {
+
+    val newCreds = fetchDelegationTokens(
+      getTokenRenewer(hadoopConf),
+      fileSystems)
+
+    // Get the token renewal interval if it is not set. It will only be called once.
+    if (tokenRenewalInterval == null) {
+      tokenRenewalInterval = getTokenRenewalInterval(hadoopConf, fileSystems)
+    }
+
+    // Get the time of next renewal.
+    val nextRenewalDate = tokenRenewalInterval.flatMap { interval =>
+      val nextRenewalDates = newCreds.getAllTokens.asScala
+        .filter(_.decodeIdentifier().isInstanceOf[AbstractDelegationTokenIdentifier])
+        .map { token =>
+          val identifier = token
+            .decodeIdentifier()
+            .asInstanceOf[AbstractDelegationTokenIdentifier]
+          identifier.getIssueDate + interval
+        }
+      if (nextRenewalDates.isEmpty) None else Some(nextRenewalDates.min)
+    }
+
+    creds.addAll(newCreds)
+    nextRenewalDate
+  }
+
+  def credentialsRequired(hadoopConf: Configuration): Boolean = {
+    UserGroupInformation.isSecurityEnabled
+  }
+
+  private def getTokenRenewer(hadoopConf: Configuration): String = {
+    val tokenRenewer = Master.getMasterPrincipal(hadoopConf)
+    logDebug("Delegation token renewer is: " + tokenRenewer)
+
+    if (tokenRenewer == null || tokenRenewer.length() == 0) {
+      val errorMessage = "Can't get Master Kerberos principal for use as renewer."
+      logError(errorMessage)
+      throw new SparkException(errorMessage)
+    }
+
+    tokenRenewer
+  }
+
+  private def fetchDelegationTokens(
+    renewer: String,
+    filesystems: Set[FileSystem]): Credentials = {
+    val creds = new Credentials()
+
+    filesystems.foreach { fs =>
+      logInfo("getting token for: " + fs)
+      fs.addDelegationTokens(renewer, creds)
+    }
+
+    creds
+  }
+
+  private def getTokenRenewalInterval(
+    hadoopConf: Configuration,
--- End diff --

nit: indent extra level
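The `nextRenewalDate` computation in the diff above is a small but easy-to-miss piece of logic: for every freshly fetched delegation token whose identifier carries an issue date, the next renewal time is `issueDate + renewalInterval`, and the provider reports the minimum across tokens (or nothing, if the interval is unknown or no token is renewable). A Python sketch of that computation, with a hypothetical dict-based token representation:

```python
# Hypothetical sketch of HadoopFSCredentialProvider's next-renewal-date
# computation. Each token dict stands in for a Hadoop delegation token;
# 'renewable' marks tokens whose identifier is an
# AbstractDelegationTokenIdentifier (others are filtered out).

def next_renewal_date(tokens, interval):
    """interval: renewal interval in ms, or None if it could not be
    determined (no renewer configured / token not renewable)."""
    if interval is None:
        return None  # mirrors tokenRenewalInterval == None in the Scala code
    dates = [t["issue_date"] + interval
             for t in tokens if t.get("renewable", True)]
    return min(dates) if dates else None

tokens = [
    {"issue_date": 1000},
    {"issue_date": 400},
    {"issue_date": 700, "renewable": False},  # filtered out
]
print(next_renewal_date(tokens, 600))  # -> 1000 (earliest: 400 + 600)
```

Taking the minimum is the conservative choice: renewing at the earliest deadline keeps every token in the credential set valid, at the cost of renewing the longer-lived ones more often than strictly necessary.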