[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18258
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77866/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18258
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18258
  
**[Test build #77866 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77866/testReport)**
 for PR 18258 at commit 
[`e4cfe1c`](https://github.com/apache/spark/commit/e4cfe1ca358aee4842395904a1d29da4b9d534e4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16986: [SPARK-18891][SQL] Support for Scala Map collection type...

2017-06-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16986
  
ping @michalsenkyr 





[GitHub] spark issue #18009: [SPARK-18891][SQL] Support for specific Java List subtyp...

2017-06-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18009
  
ping @michalsenkyr 





[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18258
  
Thanks!






[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18258
  
If there is no regression, I'd remove the flag.






[GitHub] spark issue #18248: [SPARK-21031] [SQL] Add `alterTableStats` to store spark...

2017-06-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18248
  
LGTM except for one minor comment





[GitHub] spark pull request #18248: [SPARK-21031] [SQL] Add `alterTableStats` to stor...

2017-06-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18248#discussion_r121250769
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -614,9 +594,11 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   // Sets the `schema`, `partitionColumnNames` and `bucketSpec` from 
the old table definition,
--- End diff --

please update the comment to include stats





[GitHub] spark issue #18248: [SPARK-21031] [SQL] Add `alterTableStats` to store spark...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18248
  
**[Test build #77868 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77868/testReport)**
 for PR 18248 at commit 
[`38d03d7`](https://github.com/apache/spark/commit/38d03d7d5c492c4d9bb2e0029ea9595f0eefda0e).





[GitHub] spark pull request #18248: [SPARK-21031] [SQL] Add `alterTableStats` to stor...

2017-06-09 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18248#discussion_r121250467
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -313,60 +313,70 @@ class StatisticsSuite extends 
StatisticsCollectionTestBase with TestHiveSingleto
 }
   }
 
-  test("alter table SET TBLPROPERTIES after analyze table") {
--- End diff --

Here I found the logic is the same for the two cases except for the command 
(set and unset, respectively), so I extracted the common logic.





[GitHub] spark issue #18257: [SPARK-21044][SPARK-21041][SQL] Add RemoveInvalidRange o...

2017-06-09 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/18257
  
The end result looks good to me. Thanks for fixing it!

That said, I'd also prefer fixing the actual integer overflow handling in 
`RangeExec`'s codegen. Even though your fix handles the problematic case for 
sure, it still feels a bit awkward to rely on an optimization for correctness.





[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18248
  
**[Test build #77867 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77867/testReport)**
 for PR 18248 at commit 
[`2649135`](https://github.com/apache/spark/commit/2649135dac1472b90534abf0e51ac2760360c134).





[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18258
  
Sure. Three times for each.

Track = F:

Aggregate w keys:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------
codegen = F, track = F                     12657 / 12700          6.6         150.9       1.0X
codegen = T hashmap = F, track = F           6779 / 7582         12.4          80.8       1.9X
codegen = T hashmap = T, track = F           1505 / 1619         55.7          17.9       8.4X

Aggregate w keys:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------
codegen = F, track = F                     10085 / 10597          8.3         120.2       1.0X
codegen = T hashmap = F, track = F           5915 / 6069         14.2          70.5       1.7X
codegen = T hashmap = T, track = F           1610 / 1796         52.1          19.2       6.3X

Aggregate w keys:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------
codegen = F, track = F                     10275 / 10584          8.2         122.5       1.0X
codegen = T hashmap = F, track = F           6140 / 6557         13.7          73.2       1.7X
codegen = T hashmap = T, track = F           1301 / 1565         64.5          15.5       7.9X

Track = T:

Aggregate w keys:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------
codegen = F, track = T                     10723 / 10865          7.8         127.8       1.0X
codegen = T hashmap = F, track = T           6246 / 6432         13.4          74.5       1.7X
codegen = T hashmap = T, track = T           1465 / 1571         57.3          17.5       7.3X

Aggregate w keys:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------
codegen = F, track = T                      9964 / 10348          8.4         118.8       1.0X
codegen = T hashmap = F, track = T           6225 / 6375         13.5          74.2       1.6X
codegen = T hashmap = T, track = T           1361 / 1485         61.6          16.2       7.3X

Aggregate w keys:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------
codegen = F, track = T                     10125 / 10674          8.3         120.7       1.0X
codegen = T hashmap = F, track = T           6865 / 6980         12.2          81.8       1.5X
codegen = T hashmap = T, track = T           1491 / 1579         56.3          17.8       6.8X







[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18258
  
Can you run it a few more times to tell? Right now it's a difference of 
almost 7%.





[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18258
  
Is it significant? It seems to me that it's within the variance of different runs?





[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18258
  
16.8 vs 15.8?





[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18258
  
I just ran the existing `AggregateBenchmark` with the new tracking config:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.27-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Aggregate w keys:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------
codegen = F, track = F                     10655 / 11043          7.9         127.0       1.0X
codegen = T hashmap = F, track = F           6923 / 7133         12.1          82.5       1.5X
codegen = T hashmap = T, track = F           1325 / 1511         63.3          15.8       8.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.27-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Aggregate w keys:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------
codegen = F, track = T                     10809 / 11007          7.8         128.9       1.0X
codegen = T hashmap = F, track = T           6581 / 6629         12.7          78.4       1.6X
codegen = T hashmap = T, track = T           1411 / 1552         59.4          16.8       7.7X

Looks like no obvious perf degradation.







[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18258
  
Sure. Will update later.





[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18258
  
Can you test the perf degradation?






[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18258
  
The `enablePerfMetrics` parameter of `UnsafeFixedWidthAggregationMap` has 
this comment:

* @param enablePerfMetrics if true, performance metrics will be 
recorded (has minor perf impact)

It's true those metrics are simple counters.
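To illustrate the point that probe tracking amounts to counter increments, here is a minimal sketch of an open-addressing map that records average probes per lookup. This is not Spark's actual `BytesToBytesMap` or `UnsafeFixedWidthAggregationMap` implementation; the class and method names are invented for this example.

```java
// Minimal linear-probing map that tracks average probes with plain counters.
// All names here are hypothetical, for illustration only.
public class ProbeTrackingMap {
    private final long[] keys;
    private final boolean[] occupied;
    private long numLookups = 0;  // incremented once per put/lookup
    private long numProbes = 0;   // incremented once per slot inspected

    public ProbeTrackingMap(int capacity) {
        keys = new long[capacity];
        occupied = new boolean[capacity];
    }

    // The only extra cost of tracking is a couple of counter increments per
    // probe step, which is why the perf impact should be minor.
    public void put(long key) {
        numLookups++;
        int idx = Math.floorMod(Long.hashCode(key), keys.length);
        while (occupied[idx] && keys[idx] != key) {
            numProbes++;               // collided slot inspected
            idx = (idx + 1) % keys.length;
        }
        numProbes++;                   // final slot inspected
        keys[idx] = key;
        occupied[idx] = true;
    }

    public double avgProbesPerLookup() {
        return numLookups == 0 ? 0.0 : (double) numProbes / numLookups;
    }

    public static void main(String[] args) {
        ProbeTrackingMap m = new ProbeTrackingMap(8);
        for (long k = 0; k < 4; k++) m.put(k);
        System.out.println(m.avgProbesPerLookup());
    }
}
```

With no collisions the metric stays at 1.0; it grows as the map fills up and probe chains lengthen.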





[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...

2017-06-09 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/18248
  
@cloud-fan Oh, right, let me try. Thanks!





[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18258
  
Why would the tracking have a perf impact? It's just a simple counter 
increment, isn't it?






[GitHub] spark issue #18257: [SPARK-21044][SPARK-21041][SQL] Add RemoveInvalidRange o...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18257
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18258
  
**[Test build #77866 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77866/testReport)**
 for PR 18258 at commit 
[`e4cfe1c`](https://github.com/apache/spark/commit/e4cfe1ca358aee4842395904a1d29da4b9d534e4).





[GitHub] spark issue #18257: [SPARK-21044][SPARK-21041][SQL] Add RemoveInvalidRange o...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18257
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77865/





[GitHub] spark issue #18257: [SPARK-21044][SPARK-21041][SQL] Add RemoveInvalidRange o...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18257
  
**[Test build #77865 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77865/testReport)**
 for PR 18257 at commit 
[`6ab9b6f`](https://github.com/apache/spark/commit/6ab9b6fb559ab11ac95078a1672672edf85efbac).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to a...

2017-06-09 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/18258

[SPARK-20953][SQL][WIP] Add hash map metrics to aggregate

## What changes were proposed in this pull request?

This adds an average hash map probe metric to hash aggregate.

A new config is added: `spark.sql.execution.trackhashMapProbe`. Once the 
config is enabled, we record the metric in `HashAggregateExec`.

`BytesToBytesMap` already has an API to get the metrics; this PR adds an API 
to `UnsafeFixedWidthAggregationMap` to access it.

Preparing a test for this metric seems tricky, because we don't know which 
keys will collide. For now, the test case generates random data large enough 
to produce the desired number of probes.

TODO: add hash map metrics to join.

## How was this patch tested?

Added a test to `SQLMetricsSuite`.
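The testing strategy described above can be sketched as follows: we cannot predict which random keys will collide, but with enough random insertions into a fixed-capacity open-addressing table, collisions become practically certain, so the average probe count rises above 1. This is an illustrative model, not the PR's actual test; all names are invented.

```java
import java.util.Random;

public class ProbeTestSketch {
    // Inserts n random keys with linear probing and returns the average
    // number of slots inspected per insertion.
    static double avgProbes(int capacity, int n, long seed) {
        boolean[] occupied = new boolean[capacity];
        Random rnd = new Random(seed);
        long totalProbes = 0;
        for (int i = 0; i < n; i++) {
            int idx = Math.floorMod(Long.hashCode(rnd.nextLong()), capacity);
            long probes = 1;
            while (occupied[idx]) {    // probe until a free slot is found
                idx = (idx + 1) % capacity;
                probes++;
            }
            occupied[idx] = true;
            totalProbes += probes;
        }
        return (double) totalProbes / n;
    }

    public static void main(String[] args) {
        // At 50% load factor, some collisions are effectively guaranteed,
        // so the average exceeds 1 even though the keys are random.
        System.out.println(avgProbes(1 << 16, 1 << 15, 42L) > 1.0);  // prints "true"
    }
}
```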

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-20953

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18258.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18258


commit e4cfe1ca358aee4842395904a1d29da4b9d534e4
Author: Liang-Chi Hsieh 
Date:   2017-06-04T00:15:24Z

Report average hashmap probe when the config is enabled.







[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18150
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18150
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77864/





[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18150
  
**[Test build #77864 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77864/testReport)**
 for PR 18150 at commit 
[`78bce79`](https://github.com/apache/spark/commit/78bce792c5b057120ba1e4e02ed00cd0c3487629).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...

2017-06-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18248
  
`alterTable` can read stats from the old table and keep them: 
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L564

Basically, `alterTable` should ignore the stats from the input table metadata 
and keep the stats as they were in the old table metadata.
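The proposed behavior could be sketched like this: when altering a table, discard any stats on the incoming definition and carry over whatever the catalog already stored, so that only a dedicated stats call can change them. This is an illustrative model, not Spark's actual `HiveExternalCatalog` API; all names are hypothetical.

```java
import java.util.Optional;

class TableMetadata {
    final String name;
    final Optional<Long> rowCount;  // stand-in for CatalogStatistics
    TableMetadata(String name, Optional<Long> rowCount) {
        this.name = name;
        this.rowCount = rowCount;
    }
}

public class CatalogSketch {
    // alterTable keeps the previously stored stats, regardless of what the
    // caller passed in; only a separate alterTableStats would change them.
    static TableMetadata alterTable(TableMetadata oldTable, TableMetadata newTable) {
        return new TableMetadata(newTable.name, oldTable.rowCount);
    }

    public static void main(String[] args) {
        TableMetadata old = new TableMetadata("t", Optional.of(42L));
        // The incoming definition carries no stats, but the old ones survive.
        TableMetadata altered = alterTable(old, new TableMetadata("t", Optional.empty()));
        System.out.println(altered.rowCount.orElse(-1L));  // prints 42
    }
}
```

This sidesteps the "where did the stats come from" question entirely: `alterTable` never writes stats, so it can never overwrite Spark-generated ones with stale input.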





[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...

2017-06-09 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/18248
  
I mean, how can we keep existing stats? Since we cannot tell whether they are 
from Hive or Spark: if we store them as Spark's stats, then we come back to 
the original problem; if we don't, then we lose the stats if they were 
generated by Spark.





[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...

2017-06-09 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/18248
  
@cloud-fan How can we tell in `alterTable` whether it's new stats or not?





[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...

2017-06-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18248
  
`alterTable` won't set new stats but can still keep existing stats; can we 
implement this?





[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...

2017-06-09 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/18248
  
@cloud-fan Actually that is my first version.
It also has problems: if we generate spark's stats first (through the analyze 
command and `alterTableStats`), and then run a regular alter table command, the 
stats will be lost. Because we don't store stats info (which comes from 
`CatalogStatistics`) through `alterTable`, and we don't know where the stats 
info came from (hive or spark), we can't decide whether to store the stats 
or not in `alterTable`.





[GitHub] spark issue #18209: [SPARK-20992][Scheduler] Add support for Nomad as a sche...

2017-06-09 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/18209
  
Given that at Facebook we use our own in-house scheduler, I can see why people 
would want their scheduler impls added right into the Spark codebase as 
first-class citizens. Like @srowen said, this should not be part of the Spark 
codebase. Many schedulers will be developed in the future, and if we keep adding 
them to Spark, it will be a lot of maintenance burden. Mesos and YARN have their 
place in the codebase given that they were there in the early days of the 
project and are widely used in deployments (not saying that this is good or 
bad). This PR makes things worse for the future by taking the easier route of 
adding yet another `if..else` or `switch case` for a new scheduler impl. One 
could argue that this should have been done cleanly from the start for Mesos / 
YARN, but to give the benefit of the doubt to the original authors, they didn't 
expect new schedulers to come up.

I would recommend adding interfaces to the Spark code to abstract things out, so 
that plugging in a new scheduler is as simple as adding a jar to the classpath. 
As a matter of fact, we are currently able to use Spark with our own scheduler 
by extending the existing interfaces. The only patching we have to do is 3-4 
lines, which is not a big pain during upgrades. It seems you need more than the 
current interfaces, though.

Having said this, I do appreciate the efforts put in this PR.






[GitHub] spark issue #18244: [SPARK-20211][SQL] Fix the Precision and Scale of Decima...

2017-06-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18244
  
LGTM





[GitHub] spark issue #18251: [SPARK-21033][SQL] fix the potential OOM in UnsafeExtern...

2017-06-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18251
  
also cc @JoshRosen





[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...

2017-06-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18248
  
I think the real issue is that we mistakenly add statistics in `ALTER 
TABLE`. This is because `ExternalCatalog.alterTable` is heavily used whenever we 
want to change something for a table. I think it would be better to introduce an 
`ExternalCatalog.alterTableStats` and use it in `ANALYZE TABLE`, so that `ALTER 
TABLE` won't add statistics.
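A minimal, hypothetical sketch of the separation proposed here (class, method, 
and field names are made up for illustration; an `OptionalLong` row count 
stands in for `CatalogStatistics`): a regular `alterTable` replaces metadata 
but leaves stats untouched, while `alterTableStats` is the only path that 
writes stats.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical in-memory catalog illustrating the proposed split:
// alterTable never writes stats; alterTableStats (the ANALYZE TABLE path) does.
public class CatalogSketch {
    static final class TableMeta {
        Map<String, String> properties = new HashMap<>();
        OptionalLong rowCount = OptionalLong.empty(); // stand-in for CatalogStatistics
    }

    private final Map<String, TableMeta> tables = new HashMap<>();

    public void createTable(String name) {
        tables.put(name, new TableMeta());
    }

    // Regular ALTER TABLE path: replace properties, preserve existing stats.
    public void alterTable(String name, Map<String, String> newProps) {
        tables.get(name).properties = new HashMap<>(newProps); // rowCount untouched
    }

    // ANALYZE TABLE path: the only place stats are written.
    public void alterTableStats(String name, long rowCount) {
        tables.get(name).rowCount = OptionalLong.of(rowCount);
    }

    public OptionalLong stats(String name) {
        return tables.get(name).rowCount;
    }

    public static void main(String[] args) {
        CatalogSketch cat = new CatalogSketch();
        cat.createTable("t");
        cat.alterTableStats("t", 100L);              // ANALYZE TABLE
        cat.alterTable("t", Map.of("owner", "bob")); // regular ALTER TABLE
        System.out.println(cat.stats("t"));          // stats survive the alter
    }
}
```

With this split there is no need to guess inside `alterTable` whether incoming 
stats came from hive or spark, since `alterTable` never carries stats at all.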





[GitHub] spark issue #18231: [SPARK-20994] Remove reduant characters in OpenBlocks to...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18231
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18231: [SPARK-20994] Remove reduant characters in OpenBlocks to...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18231
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77862/
Test PASSed.





[GitHub] spark issue #18231: [SPARK-20994] Remove reduant characters in OpenBlocks to...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18231
  
**[Test build #77862 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77862/testReport)**
 for PR 18231 at commit 
[`5dd0e77`](https://github.com/apache/spark/commit/5dd0e5c302e915b601b8c4b17806219ebb3b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18257: [SPARK-21044][SPARK-21041][SQL] Add RemoveInvalidRange o...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18257
  
**[Test build #77865 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77865/testReport)**
 for PR 18257 at commit 
[`6ab9b6f`](https://github.com/apache/spark/commit/6ab9b6fb559ab11ac95078a1672672edf85efbac).





[GitHub] spark pull request #18257: [SPARK-21044][SPARK-21041][SQL] Add RemoveInvalid...

2017-06-09 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/18257

[SPARK-21044][SPARK-21041][SQL] Add RemoveInvalidRange optimizer

## What changes were proposed in this pull request?

This PR aims to add an optimizer that removes invalid `Range` operators from 
the plan up front. There are two cases of invalidity.

1. The `start` and `end` values are equal.
2. The sign of `step` does not match the direction from `start` to `end`.

With the `wholeStageCodeGen` option, both cases generate unnecessary Java 
code. Worse, 
[SPARK-21041](https://issues.apache.org/jira/browse/SPARK-21041) reports that 
the second case returns a wrong result.

**BEFORE**
```
scala> spark.range(0,0,1).explain
== Physical Plan ==
*Range (0, 0, step=1, splits=8)
```

**AFTER**
```
scala> spark.range(0,0,1).explain
== Physical Plan ==
LocalTableScan <empty>, [id#0L]
```
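The emptiness check behind this rule reduces to simple arithmetic; a sketch 
(not the actual optimizer code, and assuming a nonzero `step`, which `range` 
already enforces):

```java
// Sketch of the emptiness check: Range(start, end, step) produces no rows
// when start == end, or when the sign of step points away from end.
public class RangeCheck {
    public static boolean isEmptyRange(long start, long end, long step) {
        if (start == end) {
            return true;                    // case 1: equal bounds
        }
        return (step > 0) != (end > start); // case 2: step sign opposes direction
    }

    public static void main(String[] args) {
        System.out.println(isEmptyRange(0, 0, 1));   // true  (case 1)
        System.out.println(isEmptyRange(0, 5, -1));  // true  (case 2)
        System.out.println(isEmptyRange(0, 5, 1));   // false (valid range)
    }
}
```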

## How was this patch tested?

Pass the Jenkins with newly added test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-21041

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18257.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18257


commit 6ab9b6fb559ab11ac95078a1672672edf85efbac
Author: Dongjoon Hyun 
Date:   2017-06-10T01:45:23Z

[SPARK-21044][SPARK-21041][SQL] Add RemoveInvalidRange optimizer

**BEFORE**
```
scala> spark.range(0,0,1).explain
== Physical Plan ==
*Range (0, 0, step=1, splits=8)
```

**AFTER**
```
scala> spark.range(0,0,1).explain
== Physical Plan ==
LocalTableScan <empty>, [id#0L]
```







[GitHub] spark issue #18231: [SPARK-20994] Remove reduant characters in OpenBlocks to...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18231
  
**[Test build #77863 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77863/testReport)**
 for PR 18231 at commit 
[`1e72eab`](https://github.com/apache/spark/commit/1e72eab5a7bc3ee5a48f8380dc4398874f1c36ac).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18231: [SPARK-20994] Remove reduant characters in OpenBlocks to...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18231
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77863/
Test FAILed.





[GitHub] spark issue #18231: [SPARK-20994] Remove reduant characters in OpenBlocks to...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18231
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18209: [SPARK-20992][Scheduler] Add support for Nomad as a sche...

2017-06-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18209
  
The next one to add is probably Kubernetes. Even Spark on Kubernetes is 
going through this cycle of maintaining a separate project for it first.






[GitHub] spark issue #18228: [SPARK-21007][SQL]Add SQL function - RIGHT && LEFT

2017-06-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18228
  
Are these ANSI SQL functions? If they are just esoteric MySQL functions, I 
don't think we should add them.






[GitHub] spark issue #18236: [SPARK-21015] Check field name is not null and empty in ...

2017-06-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18236
  
Why do we want this check? If the user passes in a null value, it is ok if it 
is not found, isn't it?






[GitHub] spark pull request #18252: [SPARK-17914][SQL] Fix parsing of timestamp strin...

2017-06-09 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18252#discussion_r121246583
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -32,7 +32,7 @@ import org.apache.spark.unsafe.types.UTF8String
  * Helper functions for converting between internal and external date and 
time representations.
  * Dates are exposed externally as java.sql.Date and are represented 
internally as the number of
  * dates since the Unix epoch (1970-01-01). Timestamps are exposed 
externally as java.sql.Timestamp
- * and are stored internally as longs, which are capable of storing 
timestamps with 100 nanosecond
+ * and are stored internally as longs, which are capable of storing 
timestamps with microsecond
--- End diff --

100 ns is different from micro, isn't it?
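It is; a tiny arithmetic illustration of the distinction (plain Java, not 
Spark code): a value expressed in 100 ns ticks does not, in general, 
round-trip through microseconds.

```java
// 10 ticks of 100 ns make one microsecond, so converting down and back
// truncates the last decimal digit of the tick count.
public class TickPrecision {
    public static long roundTripThroughMicros(long ticks100ns) {
        long micros = ticks100ns / 10; // 100 ns ticks -> microseconds (truncating)
        return micros * 10;            // microseconds -> 100 ns ticks
    }

    public static void main(String[] args) {
        System.out.println(roundTripThroughMicros(1234567891L)); // 1234567890: precision lost
        System.out.println(roundTripThroughMicros(1234567890L)); // 1234567890: exact
    }
}
```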





[GitHub] spark pull request #18256: [SPARK-21042][SQL] Document Dataset.union is reso...

2017-06-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18256





[GitHub] spark issue #18256: [SPARK-21042][SQL] Document Dataset.union is resolution ...

2017-06-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18256
  
Merging in master/branch-2.2. Thanks.






[GitHub] spark issue #18256: [SPARK-21042][SQL] Document Dataset.union is resolution ...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18256
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18256: [SPARK-21042][SQL] Document Dataset.union is resolution ...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18256
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77861/
Test PASSed.





[GitHub] spark issue #18256: [SPARK-21042][SQL] Document Dataset.union is resolution ...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18256
  
**[Test build #77861 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77861/testReport)**
 for PR 18256 at commit 
[`b464adb`](https://github.com/apache/spark/commit/b464adbf16145efc5b6aa1f3db70bf60912ea439).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18150
  
**[Test build #77864 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77864/testReport)**
 for PR 18150 at commit 
[`78bce79`](https://github.com/apache/spark/commit/78bce792c5b057120ba1e4e02ed00cd0c3487629).





[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18150
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77859/
Test FAILed.





[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18150
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18150
  
**[Test build #77859 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77859/testReport)**
 for PR 18150 at commit 
[`ec89ac1`](https://github.com/apache/spark/commit/ec89ac1d53beb5a598f12d3a0bd4c6774cdd1976).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18248
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18248
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77860/
Test FAILed.





[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18248
  
**[Test build #77860 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77860/testReport)**
 for PR 18248 at commit 
[`835b6f2`](https://github.com/apache/spark/commit/835b6f280eca4c63c3e6e19df7e33237c9cd505d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2017-06-09 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/12646#discussion_r121244808
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2651,4 +2652,28 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
 val e = intercept[AnalysisException](sql("SELECT nvl(1, 2, 3)"))
 assert(e.message.contains("Invalid number of arguments"))
   }
+
+  test("TRIM function") {
+val ae1 = intercept[ParseException]{
+  sql("select LTRIM(BOTH 'S' FROM 'SS abc S')").head
+}
+assert(ae1.getMessage contains
+  "The specified function LTRIM doesn't support with option BOTH")
+val ae2 = intercept[ParseException]{
+  sql("select RTRIM(TRAILING 'S' FROM 'SS abc S')").head
+}
+assert(ae2.getMessage contains
+  "The specified function RTRIM doesn't support with option TRAILING")
+val ae3 = intercept[ParseException]{
+  sql("select TRIM(OVER 'S' AND 'SS abc S')").head
+}
+assert(ae3.getMessage contains
+  "Literals of type 'OVER' are currently not supported")
+checkAnswer(
+  sql("SELECT TRIM(BOTH '@$%&( )abc' FROM '@ $ % & ()abc ' )"), 
Row("") :: Nil)
+checkAnswer(
+  sql("SELECT TRIM(LEADING 'c []' FROM '[ bcc ')"), Row("bcc ") :: 
Nil)
+checkAnswer(
+  sql("SELECT TRIM(TRAILING 'c&^,.' FROM 'bc...,,,&&&ccc' )"), 
Row("b") :: Nil)
--- End diff --

OK, can you also update the description about this syntax? We are 
supporting two equivalent syntaxes here.





[GitHub] spark issue #18238: [SPARK-21016][core]Improve code fault tolerance for conv...

2017-06-09 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/18238
  
If a string has a leading or trailing space, it is expected that it cannot be 
converted to a number.
Java's number parsing follows the same convention.
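For instance, `Long.parseLong` rejects surrounding whitespace outright; a 
small demonstration (the `tryParse` helper name is made up for illustration):

```java
// Long.parseLong does not trim its input, so strings with surrounding
// whitespace fail to convert, matching the convention described above.
public class ParseStrict {
    public static Long tryParse(String s) {
        try {
            return Long.parseLong(s);
        } catch (NumberFormatException e) {
            return null; // not a clean number
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("42"));   // 42
        System.out.println(tryParse(" 42"));  // null
        System.out.println(tryParse("42 "));  // null
    }
}
```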





[GitHub] spark issue #18231: [SPARK-20994] Remove reduant characters in OpenBlocks to...

2017-06-09 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18231
  
@vanzin 
Thanks again for the comments :) I refined the PR accordingly; please take 
another look when you have time.





[GitHub] spark pull request #18231: [SPARK-20994] Remove reduant characters in OpenBl...

2017-06-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18231#discussion_r121242495
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java
 ---
@@ -209,4 +190,51 @@ private ShuffleMetrics() {
 }
   }
 
+  private class ManagedBufferIterator implements Iterator<ManagedBuffer> {
+
+private int index = 0;
+private String appId;
+private String execId;
+private String shuffleId;
+// An array containing mapId and reduceId pairs.
+private int[] mapIdAndReduceIds;
+
+ManagedBufferIterator(String appId, String execId, String[] blockIds) {
+  this.appId = appId;
+  this.execId = execId;
+  String[] blockId0Parts = blockIds[0].split("_");
+  if (blockId0Parts.length < 4) {
+throw new IllegalArgumentException("Unexpected block id format: " 
+ blockIds[0]);
+  } else if (!blockId0Parts[0].equals("shuffle")) {
+throw new IllegalArgumentException("Expected shuffle block id, 
got: " + blockIds[0]);
+  }
+  this.shuffleId = blockId0Parts[1];
+  mapIdAndReduceIds = new int[2 * blockIds.length];
+  for (int i = 0; i< blockIds.length; i++) {
+String[] blockIdParts = blockIds[i].split("_");
+if (!blockIdParts[1].equals(shuffleId)) {
+  throw new IllegalArgumentException("Expected shuffleId=" + 
shuffleId +
+", got:" + blockIds[i]);
+}
+mapIdAndReduceIds[2 * i] = Integer.parseInt(blockIdParts[2]);
+mapIdAndReduceIds[2 * i + 1] = Integer.parseInt(blockIdParts[3]);
+  }
+}
+
+@Override
+public boolean hasNext() {
+  return index < mapIdAndReduceIds.length / 2;
+}
+
+@Override
+public ManagedBuffer next() {
+  String blockId = "shuffle_" + shuffleId + "_" + mapIdAndReduceIds[2 
* index] + "_" +
+mapIdAndReduceIds[2 * index + 1];
+  final ManagedBuffer block = blockManager.getBlockData(appId, execId, 
blockId);
--- End diff --

I was thinking about this. I added a new `getBlockData` in 
`ExternalShuffleBlockResolver`, but I'm not sure if it is appropriate.
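For readers following the diff above, the flattened pair-array layout can be sketched outside Spark. The `FlatPairs` class below is hypothetical (not Spark code) and only mirrors the indexing idea: pair i lives at slots 2*i and 2*i + 1 of a single int[], so the iterator has length / 2 logical elements.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class FlatPairs implements Iterator<String> {
    private final String shuffleId;
    // Flattened (mapId, reduceId) pairs: pair i at indices 2*i and 2*i + 1.
    private final int[] mapIdAndReduceIds;
    private int index = 0;

    public FlatPairs(String shuffleId, String[] blockIds) {
        this.shuffleId = shuffleId;
        mapIdAndReduceIds = new int[2 * blockIds.length];
        for (int i = 0; i < blockIds.length; i++) {
            // Expected format: shuffle_<shuffleId>_<mapId>_<reduceId>
            String[] parts = blockIds[i].split("_");
            mapIdAndReduceIds[2 * i] = Integer.parseInt(parts[2]);
            mapIdAndReduceIds[2 * i + 1] = Integer.parseInt(parts[3]);
        }
    }

    @Override
    public boolean hasNext() {
        // Two ints per block id, hence length / 2 logical elements.
        return index < mapIdAndReduceIds.length / 2;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        String blockId = "shuffle_" + shuffleId + "_"
            + mapIdAndReduceIds[2 * index] + "_" + mapIdAndReduceIds[2 * index + 1];
        index++;  // advance to the next pair
        return blockId;
    }

    public static void main(String[] args) {
        FlatPairs it = new FlatPairs("7", new String[]{"shuffle_7_0_1", "shuffle_7_2_3"});
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

The single flat array trades a little index arithmetic for one allocation instead of one small object per block id.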





[GitHub] spark issue #18231: [SPARK-20994] Remove reduant characters in OpenBlocks to...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18231
  
**[Test build #77863 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77863/testReport)**
 for PR 18231 at commit 
[`1e72eab`](https://github.com/apache/spark/commit/1e72eab5a7bc3ee5a48f8380dc4398874f1c36ac).





[GitHub] spark issue #18231: [WIP][SPARK-20994] Remove reduant characters in OpenBloc...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18231
  
**[Test build #77862 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77862/testReport)**
 for PR 18231 at commit 
[`5dd0e77`](https://github.com/apache/spark/commit/5dd0e5c302e915b601b8c4b17806219ebb3b).





[GitHub] spark pull request #18231: [WIP][SPARK-20994] Remove reduant characters in O...

2017-06-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18231#discussion_r121241362
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java
 ---
@@ -209,4 +190,51 @@ private ShuffleMetrics() {
 }
   }
 
+  private class ManagedBufferIterator implements Iterator<ManagedBuffer> {
+
+private int index = 0;
+private String appId;
--- End diff --

Yes, I will refine.





[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...

2017-06-09 Thread brkyvz
Github user brkyvz commented on the issue:

https://github.com/apache/spark/pull/18199
  
Left one last comment. Otherwise LGTM





[GitHub] spark pull request #18199: [SPARK-20979][SS]Add RateSource to generate value...

2017-06-09 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/18199#discussion_r121240973
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/RateSourceSuite.scala
 ---
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming
+
+import java.util.concurrent.TimeUnit
+
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.streaming.{StreamingQueryException, StreamTest}
+import org.apache.spark.util.ManualClock
+
+class RateSourceSuite extends StreamTest {
+
+  import testImplicits._
+
+  case class AdvanceRateManualClock(seconds: Long) extends AddData {
+override def addData(query: Option[StreamExecution]): (Source, Offset) 
= {
+  assert(query.nonEmpty)
+  val rateSource = query.get.logicalPlan.collect {
+case StreamingExecutionRelation(source, _) if 
source.isInstanceOf[RateStreamSource] =>
+  source.asInstanceOf[RateStreamSource]
+  }.head
+  
rateSource.clock.asInstanceOf[ManualClock].advance(TimeUnit.SECONDS.toMillis(seconds))
+  (rateSource, rateSource.getOffset.get)
+}
+  }
+
+  test("basic") {
+val input = spark.readStream
+  .format("rate")
+  .option("rowsPerSecond", "10")
+  .option("useManualClock", "true")
+  .load()
+testStream(input)(
+  AdvanceRateManualClock(seconds = 1),
+  CheckLastBatch((0 until 10).map(v => new java.sql.Timestamp(v * 
100L) -> v): _*),
+  StopStream,
+  StartStream(),
+  // Advance 2 seconds because creating a new RateSource will also 
create a new ManualClock
+  AdvanceRateManualClock(seconds = 2),
+  CheckLastBatch((10 until 20).map(v => new java.sql.Timestamp(v * 
100L) -> v): _*)
+)
+  }
+
+  test("uniform distribution of event timestamps") {
+val input = spark.readStream
+  .format("rate")
+  .option("rowsPerSecond", "1500")
+  .option("useManualClock", "true")
+  .load()
+  .as[(java.sql.Timestamp, Long)]
+  .map(v => (v._1.getTime, v._2))
+val expectedAnswer = (0 until 1500).map { v =>
+  (math.round(v * (1000.0 / 1500)), v)
+}
+testStream(input)(
+  AdvanceRateManualClock(seconds = 1),
+  CheckLastBatch(expectedAnswer: _*)
+)
+  }
+
+  test("valueAtSecond") {
+import RateStreamSource._
+
+assert(valueAtSecond(seconds = 0, rowsPerSecond = 5, rampUpTimeSeconds 
= 2) === 0)
--- End diff --

would be nice to add one test where `rampUpTimeSeconds = 0`
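The "uniform distribution of event timestamps" test above expects row v out of rowsPerSecond rows to land at round(v * 1000.0 / rowsPerSecond) milliseconds. A standalone sketch of that expectation (hypothetical helper name, not the Spark implementation):

```java
public class RateTimestamps {
    // Hypothetical helper mirroring the expected answer in the test above:
    // spread `rowsPerSecond` rows evenly across one second of event time.
    static long eventTimeMillis(int rowIndex, int rowsPerSecond) {
        return Math.round(rowIndex * (1000.0 / rowsPerSecond));
    }

    public static void main(String[] args) {
        // For 1500 rows/second, consecutive rows are ~0.667 ms apart.
        System.out.println(eventTimeMillis(0, 1500));
        System.out.println(eventTimeMillis(3, 1500));
    }
}
```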





[GitHub] spark pull request #18252: [SPARK-17914][SQL] Fix parsing of timestamp strin...

2017-06-09 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18252#discussion_r121240900
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -399,13 +399,13 @@ object DateTimeUtils {
   digitsMilli += 1
 }
 
-if (!justTime && isInvalidDate(segments(0), segments(1), segments(2))) 
{
-  return None
+while (digitsMilli > 6) {
--- End diff --

add a comment indicating we are truncating the microsecond part and it's 
lossy?
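The truncation under discussion can be illustrated standalone. The helper below is hypothetical (not the Spark code): like the `while (digitsMilli > 6)` loop in the diff, it keeps at most six fractional-second digits (microsecond precision) and silently drops anything finer.

```java
public class FractionTruncate {
    // Keep at most 6 fractional-second digits (microseconds); lossy beyond that.
    static String truncateToMicros(String fractionDigits) {
        StringBuilder sb = new StringBuilder(fractionDigits);
        while (sb.length() > 6) {
            sb.deleteCharAt(sb.length() - 1); // drops sub-microsecond digits
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(truncateToMicros("123456789")); // nanosecond input
        System.out.println(truncateToMicros("123"));       // short input unchanged
    }
}
```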





[GitHub] spark issue #18255: [SPARK-21038][SQL] Reduce redundant generated init code ...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18255
  
**[Test build #3788 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3788/testReport)**
 for PR 18255 at commit 
[`de9ce5b`](https://github.com/apache/spark/commit/de9ce5b19719a629ceccc48389e856a4c1959174).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18199
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77858/
Test PASSed.





[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18199
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18199
  
**[Test build #77858 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77858/testReport)**
 for PR 18199 at commit 
[`d5e7492`](https://github.com/apache/spark/commit/d5e749202776eb1de5f9f1e15a3e19e8e49cf586).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18150
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18150
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77856/
Test PASSed.





[GitHub] spark issue #18256: [SPARK-21042][SQL] Document Dataset.union is resolution ...

2017-06-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18256
  
LGTM





[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18150
  
**[Test build #77856 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77856/testReport)**
 for PR 18150 at commit 
[`810a101`](https://github.com/apache/spark/commit/810a10131200f46902a7563422558c63967f5c09).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18256: [SPARK-21042][SQL] Document Dataset.union is resolution ...

2017-06-09 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/18256
  
Thanks!  LGTM pending tests





[GitHub] spark issue #18256: [SPARK-21042][SQL] Document Dataset.union is resolution ...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18256
  
**[Test build #77861 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77861/testReport)**
 for PR 18256 at commit 
[`b464adb`](https://github.com/apache/spark/commit/b464adbf16145efc5b6aa1f3db70bf60912ea439).





[GitHub] spark pull request #18256: [SPARK-21042][SQL] Document Dataset.union is reso...

2017-06-09 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/18256

[SPARK-21042][SQL] Document Dataset.union is resolution by position

## What changes were proposed in this pull request?
Document that `Dataset.union` resolves columns by position, not by name, since 
this has been a confusing point for many users.

## How was this patch tested?
N/A - doc only change.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-21042

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18256.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18256









[GitHub] spark pull request #17180: [SPARK-19839][Core]release longArray in BytesToBy...

2017-06-09 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17180#discussion_r121238154
  
--- Diff: 
core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
@@ -358,10 +358,20 @@ public long spill(long numBytes) throws IOException {
   return 0L;
 }
 
+long released = 0L;
+// longArray will not be used anymore, but may take significant 
amount of memory
+// but never released until the task is complete.
+if (longArray != null) {
+  released += longArray.memoryBlock().size();
+  freeArray(longArray);
+  longArray = null;
+}
+if (released >= numBytes) {
+  return released;
--- End diff --

@cloud-fan, I was wondering this, too: I think that this is okay because 
we're in a destructive iterator over the map, so we can just walk through the 
data pages without needing to worry about hash-style lookups. If this is the 
case, though, then we should just destroy the pointer array earlier (as I 
commented downthread on the PR). 





[GitHub] spark pull request #18231: [WIP][SPARK-20994] Remove reduant characters in O...

2017-06-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18231#discussion_r121237553
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java
 ---
@@ -209,4 +190,51 @@ private ShuffleMetrics() {
 }
   }
 
+  private class ManagedBufferIterator implements Iterator<ManagedBuffer> {
+
+private int index = 0;
+private String appId;
--- End diff --

`final`, same for all fields except `index`.





[GitHub] spark pull request #18231: [WIP][SPARK-20994] Remove reduant characters in O...

2017-06-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18231#discussion_r121237684
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java
 ---
@@ -209,4 +190,52 @@ private ShuffleMetrics() {
 }
   }
 
+  private class ManagedBufferIterator implements Iterator<ManagedBuffer> {
+
+private int index = 0;
+private String appId;
+private String execId;
+private String shuffleId;
+// An array containing mapId and reduceId pairs.
+private int[][] mapIdAndReduceIds;
+
+ManagedBufferIterator(String appId, String execId, String[] blockIds) {
+  this.appId = appId;
+  this.execId = execId;
+  String[] blockId0Parts = blockIds[0].split("_");
+  if (blockId0Parts.length < 4) {
+throw new IllegalArgumentException("Unexpected block id format: " 
+ blockIds[0]);
+  } else if (!blockId0Parts[0].equals("shuffle")) {
--- End diff --

I think Sean means that since you're throwing in the previous block, `else` 
is redundant.





[GitHub] spark pull request #18231: [WIP][SPARK-20994] Remove reduant characters in O...

2017-06-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18231#discussion_r121237874
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java
 ---
@@ -209,4 +190,51 @@ private ShuffleMetrics() {
 }
   }
 
+  private class ManagedBufferIterator implements Iterator<ManagedBuffer> {
+
+private int index = 0;
+private String appId;
+private String execId;
+private String shuffleId;
+// An array containing mapId and reduceId pairs.
+private int[] mapIdAndReduceIds;
+
+ManagedBufferIterator(String appId, String execId, String[] blockIds) {
+  this.appId = appId;
+  this.execId = execId;
+  String[] blockId0Parts = blockIds[0].split("_");
+  if (blockId0Parts.length < 4) {
+throw new IllegalArgumentException("Unexpected block id format: " 
+ blockIds[0]);
+  } else if (!blockId0Parts[0].equals("shuffle")) {
+throw new IllegalArgumentException("Expected shuffle block id, 
got: " + blockIds[0]);
+  }
+  this.shuffleId = blockId0Parts[1];
+  mapIdAndReduceIds = new int[2 * blockIds.length];
+  for (int i = 0; i< blockIds.length; i++) {
+String[] blockIdParts = blockIds[i].split("_");
+if (!blockIdParts[1].equals(shuffleId)) {
+  throw new IllegalArgumentException("Expected shuffleId=" + 
shuffleId +
+", got:" + blockIds[i]);
+}
+mapIdAndReduceIds[2 * i] = Integer.parseInt(blockIdParts[2]);
+mapIdAndReduceIds[2 * i + 1] = Integer.parseInt(blockIdParts[3]);
+  }
+}
+
+@Override
+public boolean hasNext() {
+  return index < mapIdAndReduceIds.length / 2;
+}
+
+@Override
+public ManagedBuffer next() {
+  String blockId = "shuffle_" + shuffleId + "_" + mapIdAndReduceIds[2 
* index] + "_" +
+mapIdAndReduceIds[2 * index + 1];
+  final ManagedBuffer block = blockManager.getBlockData(appId, execId, 
blockId);
--- End diff --

Is it too big a change to make `getBlockData` expect the broken-down 
shuffle/map/reduce ids, since all it does is parse the string back into 
integers?
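The parse-once idea can be sketched in isolation. All names below are hypothetical (the actual `getBlockData` signature is Spark's to decide): split `shuffle_<shuffleId>_<mapId>_<reduceId>` a single time and hand integers downstream instead of re-splitting the string.

```java
public class BlockIdParser {
    // Hypothetical holder for the three numeric components of a shuffle block id.
    static final class ShuffleBlockId {
        final int shuffleId, mapId, reduceId;
        ShuffleBlockId(int shuffleId, int mapId, int reduceId) {
            this.shuffleId = shuffleId;
            this.mapId = mapId;
            this.reduceId = reduceId;
        }
    }

    // Validate and parse once; callers then work with ints only.
    static ShuffleBlockId parse(String blockId) {
        String[] parts = blockId.split("_");
        if (parts.length < 4) {
            throw new IllegalArgumentException("Unexpected block id format: " + blockId);
        }
        if (!parts[0].equals("shuffle")) {
            throw new IllegalArgumentException("Expected shuffle block id, got: " + blockId);
        }
        return new ShuffleBlockId(
            Integer.parseInt(parts[1]),
            Integer.parseInt(parts[2]),
            Integer.parseInt(parts[3]));
    }

    public static void main(String[] args) {
        ShuffleBlockId id = parse("shuffle_5_11_2");
        System.out.println(id.shuffleId + " " + id.mapId + " " + id.reduceId);
    }
}
```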





[GitHub] spark pull request #17180: [SPARK-19839][Core]release longArray in BytesToBy...

2017-06-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17180#discussion_r121237843
  
--- Diff: 
core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
@@ -358,10 +358,20 @@ public long spill(long numBytes) throws IOException {
   return 0L;
 }
 
+long released = 0L;
+// longArray will not be used anymore, but may take significant 
amount of memory
+// but never released until the task is complete.
+if (longArray != null) {
+  released += longArray.memoryBlock().size();
+  freeArray(longArray);
+  longArray = null;
+}
+if (released >= numBytes) {
+  return released;
--- End diff --

IIUC we can only free this pointer array if we spill its related data 
pages. Is this short-circuit safe to do?





[GitHub] spark issue #18252: [SPARK-17914][SQL] Fix parsing of timestamp strings with...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18252
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77857/
Test PASSed.





[GitHub] spark issue #18252: [SPARK-17914][SQL] Fix parsing of timestamp strings with...

2017-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18252
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18252: [SPARK-17914][SQL] Fix parsing of timestamp strings with...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18252
  
**[Test build #77857 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77857/testReport)**
 for PR 18252 at commit 
[`4d057c9`](https://github.com/apache/spark/commit/4d057c99d3d99621811b6d593295533722803a3e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18248: [SPARK-21031] [SQL] Clearly separate spark's stats and h...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18248
  
**[Test build #77860 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77860/testReport)**
 for PR 18248 at commit 
[`835b6f2`](https://github.com/apache/spark/commit/835b6f280eca4c63c3e6e19df7e33237c9cd505d).





[GitHub] spark pull request #17723: [SPARK-20434][YARN][CORE] Move kerberos delegatio...

2017-06-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17723#discussion_r121235695
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/security/YARNConfigurableCredentialManager.scala
 ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn.security
+
+import java.util.ServiceLoader
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.FileSystem
+import org.apache.hadoop.security.Credentials
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.security.ConfigurableCredentialManager
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * This class loads credential providers registered under the YARN-specific
+ * [[ServiceCredentialProvider]] interface, as well as the builtin credential providers defined
+ * in [[ConfigurableCredentialManager]].
+ */
+private[yarn] class YARNConfigurableCredentialManager(
+sparkConf: SparkConf,
+hadoopConf: Configuration,
+fileSystems: Set[FileSystem]) extends Logging {
+
+  private val configurableCredentialManager =
+new ConfigurableCredentialManager(sparkConf, hadoopConf, fileSystems)
+
+  // public for testing
+  val credentialProviders = getCredentialProviders
+
+  def obtainCredentials(
--- End diff --

nit: fits in one line.





[GitHub] spark pull request #17723: [SPARK-20434][YARN][CORE] Move kerberos delegatio...

2017-06-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17723#discussion_r121235739
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/security/YARNConfigurableCredentialManager.scala ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn.security
+
+import java.util.ServiceLoader
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.FileSystem
+import org.apache.hadoop.security.Credentials
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.security.ConfigurableCredentialManager
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * This class loads credential providers registered under the YARN-specific
+ * [[ServiceCredentialProvider]] interface, as well as the builtin credential providers defined
+ * in [[ConfigurableCredentialManager]].
+ */
+private[yarn] class YARNConfigurableCredentialManager(
+sparkConf: SparkConf,
+hadoopConf: Configuration,
+fileSystems: Set[FileSystem]) extends Logging {
+
+  private val configurableCredentialManager =
+new ConfigurableCredentialManager(sparkConf, hadoopConf, fileSystems)
+
+  // public for testing
+  val credentialProviders = getCredentialProviders
+
+  def obtainCredentials(
+  hadoopConf: Configuration,
+  creds: Credentials): Long = {
+
+val superInterval = configurableCredentialManager.obtainCredentials(
--- End diff --

nit: fits in one line.





[GitHub] spark pull request #17723: [SPARK-20434][YARN][CORE] Move kerberos delegatio...

2017-06-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17723#discussion_r121235242
  
--- Diff: core/pom.xml ---
@@ -357,6 +357,34 @@
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-crypto</artifactId>
 
+
+

[GitHub] spark pull request #17723: [SPARK-20434][YARN][CORE] Move kerberos delegatio...

2017-06-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17723#discussion_r121235790
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/security/YARNConfigurableCredentialManager.scala ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn.security
+
+import java.util.ServiceLoader
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.FileSystem
+import org.apache.hadoop.security.Credentials
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.security.ConfigurableCredentialManager
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * This class loads credential providers registered under the YARN-specific
+ * [[ServiceCredentialProvider]] interface, as well as the builtin credential providers defined
+ * in [[ConfigurableCredentialManager]].
+ */
+private[yarn] class YARNConfigurableCredentialManager(
+sparkConf: SparkConf,
+hadoopConf: Configuration,
+fileSystems: Set[FileSystem]) extends Logging {
+
+  private val configurableCredentialManager =
+new ConfigurableCredentialManager(sparkConf, hadoopConf, fileSystems)
+
+  // public for testing
+  val credentialProviders = getCredentialProviders
+
+  def obtainCredentials(
+  hadoopConf: Configuration,
+  creds: Credentials): Long = {
+
+val superInterval = configurableCredentialManager.obtainCredentials(
+  hadoopConf,
+  creds)
+
+credentialProviders.values.flatMap { provider =>
+  if (provider.credentialsRequired(hadoopConf)) {
+provider.obtainCredentials(hadoopConf, sparkConf, creds)
+  } else {
+logDebug(s"Service ${provider.serviceName} does not require a token." +
+  s" Check your configuration to see if security is disabled or not.")
+None
+  }
+}.foldLeft(superInterval)(math.min)
+  }
+
+  private def getCredentialProviders: Map[String, ServiceCredentialProvider] = {
+val providers = loadCredentialProviders
+
+providers.
+  filter { p => configurableCredentialManager.isServiceEnabled(p.serviceName) }
+  .map { p => (p.serviceName, p) }
+  .toMap
+  }
+
+  private def loadCredentialProviders: List[ServiceCredentialProvider] = {
+ServiceLoader.load(
--- End diff --

nit: the `load(...)` call fits in one line.


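The `YARNConfigurableCredentialManager` diff above combines two small patterns: discovering providers with `ServiceLoader` and filtering them into a name-keyed map, and folding each provider's optional renewal time into the earliest one with `math.min`. A minimal, self-contained sketch of both, using a hypothetical `Provider` trait rather than Spark's actual classes:

```scala
import java.util.ServiceLoader
import scala.collection.JavaConverters._

// Hypothetical stand-in for ServiceCredentialProvider.
trait Provider {
  def serviceName: String
  def nextRenewal: Option[Long]
}

object CredentialManagerSketch {
  // Discovery: ServiceLoader finds registered implementations on the classpath;
  // none are registered here, so `extra` supplies hand-built instances.
  def discover(extra: Seq[Provider], enabled: String => Boolean): Map[String, Provider] = {
    val loaded = ServiceLoader.load(classOf[Provider]).asScala.toSeq
    (loaded ++ extra)
      .filter(p => enabled(p.serviceName))
      .map(p => (p.serviceName, p))
      .toMap
  }

  // Renewal: fold each provider's optional renewal time into the earliest one,
  // starting from the interval the built-in providers reported.
  def earliestRenewal(superInterval: Long, providers: Iterable[Provider]): Long =
    providers.flatMap(_.nextRenewal).foldLeft(superInterval)(math.min)

  def main(args: Array[String]): Unit = {
    def p(name: String, t: Option[Long]): Provider = new Provider {
      val serviceName = name
      val nextRenewal = t
    }
    val providers = discover(Seq(p("hive", Some(3000L)), p("hbase", None)), _ => true)
    assert(providers.keySet == Set("hive", "hbase"))
    assert(earliestRenewal(5000L, providers.values) == 3000L)
    println("ok")
  }
}
```

A provider that returns `None` (no renewable token) simply drops out of the fold, which is exactly how the `flatMap`/`foldLeft` in the diff behaves.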



[GitHub] spark pull request #17723: [SPARK-20434][YARN][CORE] Move kerberos delegatio...

2017-06-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17723#discussion_r121235520
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HiveCredentialProvider.scala ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.security
+
+import java.lang.reflect.UndeclaredThrowableException
+import java.security.PrivilegedExceptionAction
+
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.metadata.Hive
+import org.apache.hadoop.io.Text
+import org.apache.hadoop.security.{Credentials, UserGroupInformation}
+import org.apache.hadoop.security.token.Token
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+private[security] class HiveCredentialProvider extends HadoopDelegationTokenProvider with Logging {
+
+  override def serviceName: String = "hive"
+
+  private val classNotFoundErrorStr = "You are attempting to use the HiveCredentialProvider," +
+"but your Spark distribution is not built with Hive libraries."
+
+  private def hiveConf(hadoopConf: Configuration): Configuration = {
+try {
+  new HiveConf(hadoopConf, classOf[HiveConf])
+} catch {
+  case NonFatal(e) =>
+logDebug("Fail to create Hive Configuration", e)
+hadoopConf
+  case e: NoClassDefFoundError =>
+logWarning(classNotFoundErrorStr)
+hadoopConf
+}
+  }
+
+  override def credentialsRequired(hadoopConf: Configuration): Boolean = {
+UserGroupInformation.isSecurityEnabled &&
+  hiveConf(hadoopConf).getTrimmed("hive.metastore.uris", "").nonEmpty
+  }
+
+  override def obtainCredentials(
+  hadoopConf: Configuration,
+  creds: Credentials): Option[Long] = {
+try {
+  val conf = hiveConf(hadoopConf)
+
+  val principalKey = "hive.metastore.kerberos.principal"
+  val principal = conf.getTrimmed(principalKey, "")
+  require(principal.nonEmpty, s"Hive principal $principalKey undefined")
+  val metastoreUri = conf.getTrimmed("hive.metastore.uris", "")
+  require(metastoreUri.nonEmpty, "Hive metastore uri undefined")
+
+  val currentUser = UserGroupInformation.getCurrentUser()
+  logDebug(s"Getting Hive delegation token for ${currentUser.getUserName()} against " +
+s"$principal at $metastoreUri")
+
+  try {
+doAsRealUser {
+  val hive = Hive.get(conf, classOf[HiveConf])
+  val tokenStr = hive.getDelegationToken(currentUser.getUserName(), principal)
+
+  val hive2Token = new Token[DelegationTokenIdentifier]()
+  hive2Token.decodeFromUrlString(tokenStr)
+  logInfo(s"Get Token from hive metastore: ${hive2Token.toString}")
+  creds.addToken(new Text("hive.server2.delegation.token"), hive2Token)
+}
+  } catch {
+case NonFatal(e) =>
--- End diff --

Move this catch block to the end of the method?


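The reviewer's suggestion above — moving the catch block to the end of the method — flattens the nested try/catch inside `obtainCredentials` into a single try expression with all failure handling in one place. A sketch of that shape, with a hypothetical `fetch` function standing in for the Hive metastore call:

```scala
import scala.util.control.NonFatal

// Sketch of the suggested structure: one try expression whose catch clauses
// sit at the end of the method, instead of a nested try/catch in the happy path.
object ConsolidatedCatchSketch {
  def obtainToken(fetch: () => String): Option[String] =
    try {
      Some(fetch())
    } catch {
      case _: NoClassDefFoundError =>
        // Optional dependency (e.g. the Hive classes) missing from the classpath.
        None
      case NonFatal(_) =>
        // Any other recoverable failure while fetching the token.
        None
    }

  def main(args: Array[String]): Unit = {
    assert(obtainToken(() => "t0k3n").contains("t0k3n"))
    assert(obtainToken(() => throw new IllegalStateException("boom")).isEmpty)
    println("ok")
  }
}
```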



[GitHub] spark pull request #17723: [SPARK-20434][YARN][CORE] Move kerberos delegatio...

2017-06-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17723#discussion_r121235405
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HadoopFSCredentialProvider.scala ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.security
+
+import scala.collection.JavaConverters._
+import scala.util.Try
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.FileSystem
+import org.apache.hadoop.mapred.Master
+import org.apache.hadoop.security.{Credentials, UserGroupInformation}
+import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier
+
+import org.apache.spark.SparkException
+import org.apache.spark.internal.Logging
+
+private[deploy] class HadoopFSCredentialProvider(fileSystems: Set[FileSystem])
+extends HadoopDelegationTokenProvider with Logging {
+  // Token renewal interval, this value will be set in the first call,
+  // if None means no token renewer specified or no token can be renewed,
+  // so cannot get token renewal interval.
+  private var tokenRenewalInterval: Option[Long] = null
+
+  override val serviceName: String = "hadoopfs"
+
+  override def obtainCredentials(
+  hadoopConf: Configuration,
+  creds: Credentials): Option[Long] = {
+
+val newCreds = fetchDelegationTokens(
+  getTokenRenewer(hadoopConf),
+  fileSystems)
+
+// Get the token renewal interval if it is not set. It will only be called once.
+if (tokenRenewalInterval == null) {
+  tokenRenewalInterval = getTokenRenewalInterval(hadoopConf, fileSystems)
+}
+
+// Get the time of next renewal.
+val nextRenewalDate = tokenRenewalInterval.flatMap { interval =>
+  val nextRenewalDates = newCreds.getAllTokens.asScala
+.filter(_.decodeIdentifier().isInstanceOf[AbstractDelegationTokenIdentifier])
+.map { token =>
+  val identifier = token
+.decodeIdentifier()
+.asInstanceOf[AbstractDelegationTokenIdentifier]
+  identifier.getIssueDate + interval
+}
+  if (nextRenewalDates.isEmpty) None else Some(nextRenewalDates.min)
+}
+
+creds.addAll(newCreds)
+nextRenewalDate
+  }
+
+  def credentialsRequired(hadoopConf: Configuration): Boolean = {
+UserGroupInformation.isSecurityEnabled
+  }
+
+  private def getTokenRenewer(hadoopConf: Configuration): String = {
+val tokenRenewer = Master.getMasterPrincipal(hadoopConf)
+logDebug("Delegation token renewer is: " + tokenRenewer)
+
+if (tokenRenewer == null || tokenRenewer.length() == 0) {
+  val errorMessage = "Can't get Master Kerberos principal for use as renewer."
+  logError(errorMessage)
+  throw new SparkException(errorMessage)
+}
+
+tokenRenewer
+  }
+
+  private def fetchDelegationTokens(
+renewer: String,
+filesystems: Set[FileSystem]): Credentials = {
+val creds = new Credentials()
+
+filesystems.foreach { fs =>
+  logInfo("getting token for: " + fs)
+  fs.addDelegationTokens(renewer, creds)
+}
+
+creds
+  }
+
+  private def getTokenRenewalInterval(
+hadoopConf: Configuration,
--- End diff --

nit: indent extra level


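The renewal-date logic in the `HadoopFSCredentialProvider` diff above can be isolated into a small function: each delegation token's next renewal is its issue date plus the renewal interval, and the provider reports the earliest one, or `None` when the interval is unknown or no tokens qualify. A self-contained sketch (issue dates passed as plain `Long`s instead of decoded token identifiers):

```scala
// Sketch of the nextRenewalDate computation from the diff above.
object NextRenewalSketch {
  def nextRenewalDate(interval: Option[Long], issueDates: Seq[Long]): Option[Long] =
    interval.flatMap { i =>
      // Candidate renewal times, one per renewable token.
      val candidates = issueDates.map(_ + i)
      if (candidates.isEmpty) None else Some(candidates.min)
    }

  def main(args: Array[String]): Unit = {
    assert(nextRenewalDate(Some(100L), Seq(10L, 50L)).contains(110L))
    assert(nextRenewalDate(Some(100L), Nil).isEmpty) // no renewable tokens
    assert(nextRenewalDate(None, Seq(10L)).isEmpty)  // interval unknown
    println("ok")
  }
}
```

Returning `None` from either branch is what lets the caller fall back to its default schedule when nothing can be renewed.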


