date:20160124

[GitHub] spark pull request: [SPARK-10193] [core] [wip] Eliminate Skipped S...

2016-01-24 Thread mridulm

Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/8427#issuecomment-174271331
  
Just a note about MapOutputTracker: it is fairly trivial to make it use the 
bare minimum amount of memory, even if it does not get cleaned up for 'old' 
stages, by using a disk-backed map (MapDB, for example) with LRU eviction. 
That keeps only the most current and previous map output in memory and 
everything else on disk (until a node failure requires recomputation, which 
brings portions of it back into memory).

This is what we used to do for production jobs in some earlier projects.
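A minimal Python sketch of the LRU idea, under stated assumptions: keys are integers (e.g. shuffle ids), and pickled files in a directory stand in for a disk-backed store such as MapDB. `DiskBackedLRUMap` and its methods are hypothetical and are not part of Spark's MapOutputTracker.

```python
import collections
import os
import pickle
import tempfile


class DiskBackedLRUMap:
    """Hypothetical sketch: keep only the `capacity` most recently used
    entries in memory and pickle evicted entries to files on disk."""

    def __init__(self, directory, capacity=2):
        self._dir = directory
        self._capacity = capacity
        self._memory = collections.OrderedDict()  # ordered oldest -> newest

    def _path(self, key):
        # Assumes integer keys (e.g. shuffle ids), so they are safe file names.
        return os.path.join(self._dir, "%d.pkl" % key)

    def put(self, key, value):
        self._memory[key] = value
        self._memory.move_to_end(key)
        if os.path.exists(self._path(key)):
            os.remove(self._path(key))  # drop any stale on-disk copy
        self._evict()

    def get(self, key):
        if key in self._memory:
            self._memory.move_to_end(key)
            return self._memory[key]
        path = self._path(key)
        if not os.path.exists(path):
            raise KeyError(key)
        # Cold read (e.g. recomputation after a node failure): reload into memory.
        with open(path, "rb") as f:
            value = pickle.load(f)
        os.remove(path)
        self._memory[key] = value
        self._evict()
        return value

    def in_memory_keys(self):
        return list(self._memory)

    def _evict(self):
        # Spill least recently used entries until we are back under capacity.
        while len(self._memory) > self._capacity:
            old_key, old_value = self._memory.popitem(last=False)
            with open(self._path(old_key), "wb") as f:
                pickle.dump(old_value, f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        statuses = DiskBackedLRUMap(tmp, capacity=2)
        for shuffle_id in range(4):
            statuses.put(shuffle_id, ["status-%d-%d" % (shuffle_id, m) for m in range(3)])
        print(statuses.in_memory_keys())  # [2, 3]
        print(statuses.get(0)[0])         # status-0-0
        print(statuses.in_memory_keys())  # [3, 0]
```

With `capacity=2` only the two most recently touched entries stay resident; an old entry costs a disk read only when it is actually requested again.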


I am not sure what the impact of the current proposal is from a memory 
overhead point of view. Map output was (obviously) expensive enough to attempt 
this, and the effect was not pervasive/diffuse across the codebase for shuffle 
output tracking.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10024#issuecomment-174271874
  
Build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10024#issuecomment-174271875
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49948/
Test FAILed.



[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10024#issuecomment-174271872
  
**[Test build #49948 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49948/consoleFull)**
 for PR 10024 at commit 
[`b561641`](https://github.com/apache/spark/commit/b5616414af8fff78f96b320cfbe3bf368d6f756c).
 * This patch **fails Scala style tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10024#issuecomment-174274054
  
**[Test build #49951 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49951/consoleFull)**
 for PR 10024 at commit 
[`16ca87b`](https://github.com/apache/spark/commit/16ca87bfc66e1d8ddfc1067a6bf97b6875343d61).



[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10024#issuecomment-174274237
  
**[Test build #49951 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49951/consoleFull)**
 for PR 10024 at commit 
[`16ca87b`](https://github.com/apache/spark/commit/16ca87bfc66e1d8ddfc1067a6bf97b6875343d61).
 * This patch **fails R style tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10024#issuecomment-174274240
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49951/
Test FAILed.



[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10024#issuecomment-174274239
  
Build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12973][YARN] Support to set priority wh...

2016-01-24 Thread debugger87

GitHub user debugger87 opened a pull request:

https://github.com/apache/spark/pull/10888

[SPARK-12973][YARN] Support to set priority when submit spark application 
to YARN



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/debugger87/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10888.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10888


commit c41e6b20162226762a6c8ee5ce10888c33964d90
Author: debugger87 
Date:   2016-01-24T09:08:35Z

[SPARK-12973][YARN] Support to set priority when submit spark application 
to YARN





[GitHub] spark pull request: [SPARK-12790][CORE] Remove HistoryServer old m...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10860#issuecomment-174271820
  
**[Test build #49949 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49949/consoleFull)**
 for PR 10860 at commit 
[`e2b785a`](https://github.com/apache/spark/commit/e2b785a44924632c7dc53fe323ce9f5e6f3edce4).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * ``
  * ``
  * ``
  * ``
  * ``
  * ``



[GitHub] spark pull request: [SPARK-12790][CORE] Remove HistoryServer old m...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10860#issuecomment-174271823
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49949/
Test FAILed.



[GitHub] spark pull request: [SPARK-12790][CORE] Remove HistoryServer old m...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10860#issuecomment-174271821
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12790][CORE] Remove HistoryServer old m...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10860#issuecomment-174271800
  
**[Test build #49949 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49949/consoleFull)**
 for PR 10860 at commit 
[`e2b785a`](https://github.com/apache/spark/commit/e2b785a44924632c7dc53fe323ce9f5e6f3edce4).



[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10024#issuecomment-174271789
  
**[Test build #49948 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49948/consoleFull)**
 for PR 10024 at commit 
[`b561641`](https://github.com/apache/spark/commit/b5616414af8fff78f96b320cfbe3bf368d6f756c).



[GitHub] spark pull request: [SPARK-12790][CORE] Remove HistoryServer old m...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10860#issuecomment-174273829
  
**[Test build #49950 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49950/consoleFull)**
 for PR 10860 at commit 
[`11fbe11`](https://github.com/apache/spark/commit/11fbe11f77af34d5f29fe39c1663033d869e92d7).



[GitHub] spark pull request: [SPARK-12974] [ML] [PySpark] Add Python API fo...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10889#issuecomment-174277060
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49952/
Test FAILed.



[GitHub] spark pull request: [SPARK-12790][CORE] Remove HistoryServer old m...

2016-01-24 Thread felixcheung

Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/10860#issuecomment-174266111
  
Thanks for checking. I looked into it, and the failure was caused by the 
removal of the legacy log format.
HistoryServerSuite appears to have more tests on the legacy format than on 
the current format, so simply updating the test expectations would turn a bunch 
of tests into no-ops. The best course of action seems to be converting the 
legacy logs into the new format.




[GitHub] spark pull request: [SPARK-12973][YARN] Support to set priority wh...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10888#issuecomment-174271531
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-12974] [ML] [PySpark] Add Python API fo...

2016-01-24 Thread yanboliang

GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/10889

[SPARK-12974] [ML] [PySpark] Add Python API for spark.ml bisecting k-means

Add Python API for spark.ml bisecting k-means.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark spark-12974

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10889.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10889


commit dc81222bde25c9f9b36b8a888e0792a1ed62765e
Author: Yanbo Liang 
Date:   2016-01-24T09:57:18Z

Add Python API for spark.ml bisecting k-means





[GitHub] spark pull request: [SPARK-12974] [ML] [PySpark] Add Python API fo...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10889#issuecomment-174277059
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12974] [ML] [PySpark] Add Python API fo...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10889#issuecomment-174277108
  
**[Test build #49953 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49953/consoleFull)**
 for PR 10889 at commit 
[`21acce0`](https://github.com/apache/spark/commit/21acce0bb7f04fd88411f65ab2a3624e28d27e4c).



[GitHub] spark pull request: [SPARK-12974] [ML] [PySpark] Add Python API fo...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10889#issuecomment-174277700
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12974] [ML] [PySpark] Add Python API fo...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10889#issuecomment-174277703
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49953/
Test PASSed.



[GitHub] spark pull request: [SPARK-12903] [SparkR] Add covar_samp and cova...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10829#issuecomment-174277974
  
**[Test build #49954 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49954/consoleFull)**
 for PR 10829 at commit 
[`a036f95`](https://github.com/apache/spark/commit/a036f953828cd3ea52208d61745d044b64fb9ebf).



[GitHub] spark pull request: [SPARK-12974] [ML] [PySpark] Add Python API fo...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10889#issuecomment-174277650
  
**[Test build #49953 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49953/consoleFull)**
 for PR 10889 at commit 
[`21acce0`](https://github.com/apache/spark/commit/21acce0bb7f04fd88411f65ab2a3624e28d27e4c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12903] [SparkR] Add covar_samp and cova...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10829#issuecomment-174278847
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49954/
Test PASSed.



[GitHub] spark pull request: [SPARK-12903] [SparkR] Add covar_samp and cova...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10829#issuecomment-174278846
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12903] [SparkR] Add covar_samp and cova...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10829#issuecomment-174278829
  
**[Test build #49954 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49954/consoleFull)**
 for PR 10829 at commit 
[`a036f95`](https://github.com/apache/spark/commit/a036f953828cd3ea52208d61745d044b64fb9ebf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12895] Implement TaskMetrics with accum...

2016-01-24 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/10835#issuecomment-174356027
  
I took a fairly broad pass over this and the bulk of the changes here look 
straightforward. Most of my comments above are fairly minor.

Let me take a closer look at the followup patch for SPARK-12896 in order to 
try to better understand the comments and disabled tests which reference it.

One high-level question: as part of this migration of task metrics to 
accumulators, I believe we planned to introduce a mechanism to opt out of the 
accumulator update de-duplication logic for these internal metric accumulators. 
Are you planning to make that change as part of these two patches, or will it be 
deferred to a follow-up?
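To make the question concrete, here is a minimal, hypothetical Python sketch of what an opt-out flag on de-duplication could look like: updates are keyed by (stage id, partition id), and repeats are dropped unless `dedup=False`. `SketchAccumulator`, `apply_update`, and this keying scheme are assumptions for illustration only, not Spark's actual accumulator or DAGScheduler code.

```python
class SketchAccumulator:
    """Hypothetical accumulator that de-duplicates updates per (stage, partition).

    With dedup=False (the opt-out), every reported update is applied.
    """

    def __init__(self, name, dedup=True):
        self.name = name
        self.dedup = dedup
        self.value = 0
        self._applied = set()  # (stage_id, partition_id) pairs already counted

    def apply_update(self, stage_id, partition_id, delta):
        """Apply `delta` from one task attempt; return True if it was counted."""
        key = (stage_id, partition_id)
        if self.dedup and key in self._applied:
            return False  # repeat report for this partition: drop it
        self._applied.add(key)
        self.value += delta
        return True


if __name__ == "__main__":
    user_acc = SketchAccumulator("records")
    spill_metric = SketchAccumulator("bytesSpilled", dedup=False)
    # The same partition reports twice, e.g. from a retried task attempt.
    for _ in range(2):
        user_acc.apply_update(stage_id=1, partition_id=0, delta=100)
        spill_metric.apply_update(stage_id=1, partition_id=0, delta=100)
    print(user_acc.value, spill_metric.value)  # 100 200
```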



[GitHub] spark pull request: [SPARK-12688][SQL] Fix spill size metric in un...

2016-01-24 Thread carsonwang

Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/10634#discussion_r50648300
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java ---
@@ -125,7 +125,8 @@ public UnsafeKVExternalSorter(
 
   // reset the map, so we can re-use it to insert new records. the inMemSorter will not used
   // anymore, so the underline array could be used by map again.
-  map.reset();
+  final long spillSize = map.reset();
+  taskContext.taskMetrics().incMemoryBytesSpilled(spillSize);
--- End diff --

Sorry for the delay, @JoshRosen. I will update this soon.



[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10757#issuecomment-174371181
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49959/
Test FAILed.



[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10757#issuecomment-174371180
  
Build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12624][PYSPARK] Checks row length when ...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10886#issuecomment-174374474
  
**[Test build #49958 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49958/consoleFull)**
 for PR 10886 at commit 
[`ad8efa1`](https://github.com/apache/spark/commit/ad8efa122c21be675111c1bbaeae607058e5c8fa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12624][PYSPARK] Checks row length when ...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10886#issuecomment-174375186
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49958/
Test PASSed.



[GitHub] spark pull request: SPARK-11741 Process doctests using TextTestRun...

2016-01-24 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/9710#issuecomment-174380471
  
Those changes to `accumulators.py` are a huge improvement, so I'd go ahead 
and apply that same pattern across the other files.



[GitHub] spark pull request: SPARK-11741 Process doctests using TextTestRun...

2016-01-24 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9710#discussion_r50650651
  
--- Diff: python/pyspark/doctesthelper.py ---
@@ -0,0 +1,43 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import sys
+if sys.version_info[:2] <= (2, 6):
+    try:
+        import unittest2 as unittest
+    except ImportError:
+        sys.stderr.write('Please install unittest2 to test with Python 2.6 or earlier')
+        sys.exit(1)
+else:
+    import unittest
+import doctest
+try:
+    import xmlrunner
+except ImportError:
+    xmlrunner = None
+
+
+def run_doctests(package):
+    print(package)
--- End diff --

Before merging, we're going to want to remove this print statement I think.
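For comparison, here is a minimal sketch of a doctest helper that routes doctests through `unittest.TextTestRunner` without a debug `print`. It assumes the helper receives an already-imported module object; the PR's actual `run_doctests(package)` body is not shown beyond the `print(package)` line, so the signature below (including the `verbosity` parameter) is hypothetical, and the optional `xmlrunner` branch from the diff is omitted. Only standard-library calls are used (`doctest.DocTestSuite`, `unittest.TextTestRunner`).

```python
import doctest
import sys
import types
import unittest


def run_doctests(module, verbosity=2):
    """Run every doctest found in `module` through unittest's TextTestRunner.

    Hypothetical helper: it returns the unittest result instead of printing
    the module, so callers decide how to report failures.
    """
    suite = doctest.DocTestSuite(module)
    runner = unittest.TextTestRunner(stream=sys.stderr, verbosity=verbosity)
    return runner.run(suite)


# Usage: build a throwaway module whose function carries one doctest.
example = types.ModuleType("demo_module")
exec(
    "def add(a, b):\n"
    '    """\n'
    "    >>> add(2, 3)\n"
    "    5\n"
    '    """\n'
    "    return a + b\n",
    example.__dict__,
)
result = run_doctests(example, verbosity=0)
print(result.wasSuccessful(), result.testsRun)  # True 1
```

Returning the `unittest` result object (rather than exiting or printing) lets a caller such as a CI script choose between a text report and an XML report.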



[GitHub] spark pull request: [SPARK-5174][SPARK-5175] add more APIs to acto...

2016-01-24 Thread CodingCat

Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/10892#issuecomment-174380370
  
@zsxwing mind having a review?



[GitHub] spark pull request: [SPARK-12934][SQL] Count-min sketch serializat...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10893#issuecomment-174385244
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12934][SQL] Count-min sketch serializat...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10893#issuecomment-174385201
  
**[Test build #49967 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49967/consoleFull)**
 for PR 10893 at commit 
[`e97d7f9`](https://github.com/apache/spark/commit/e97d7f92ad4cb075234772c24f246bf51eff6cc7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  throw new CountMinSketchMergeException(\"Cannot merge estimator of class \" + other.getClass().getName());`
  * `public class CountMinSketchMergeException extends Exception `
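The `CountMinSketchMergeException` listed above is thrown when two estimators cannot be merged. As a hedged illustration of why a count-min sketch merge needs such a check, here is a toy sketch in plain Python. It is not Spark's `CountMinSketch` API: the class, the `IncompatibleSketchError` exception and the SHA-256-per-row hashing are all assumptions made for this example. Merging is cell-wise addition, which is only meaningful when both sketches are the same kind of estimator with identical width, depth and hash seed.

```python
import hashlib


class IncompatibleSketchError(Exception):
    """Hypothetical stand-in for a merge exception; not Spark's class."""


class CountMinSketch:
    """Toy count-min sketch: `depth` rows of `width` counters each."""

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.depth = depth
        self.seed = seed
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, item):
        # One seeded hash per row, derived from SHA-256 so results are reproducible.
        digest = hashlib.sha256(f"{self.seed}:{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += count

    def estimate(self, item):
        # With non-negative counts, collisions only inflate counters,
        # so the row minimum never underestimates.
        return min(self.table[row][self._bucket(row, item)] for row in range(self.depth))

    def merge(self, other):
        # Cell-wise addition is only valid if both sketches hash items
        # into the same cells: same class, dimensions and seed.
        if not isinstance(other, CountMinSketch):
            raise IncompatibleSketchError(
                "Cannot merge estimator of class " + type(other).__name__)
        if (self.width, self.depth, self.seed) != (other.width, other.depth, other.seed):
            raise IncompatibleSketchError(
                "Cannot merge sketches with different width, depth or seed")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]
        return self


left, right = CountMinSketch(64, 4), CountMinSketch(64, 4)
left.add("spark", 3)
right.add("spark", 2)
print(left.merge(right).estimate("spark"))  # 5
```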



[GitHub] spark pull request: [SPARK-12934][SQL] Count-min sketch serializat...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10893#issuecomment-174385245
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49967/
Test PASSed.



[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

2016-01-24 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/10655#issuecomment-174386527
  
no problem! Thank you :)



[GitHub] spark pull request: [SPARK-12790][CORE] Remove HistoryServer old m...

2016-01-24 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/10860#issuecomment-174386559
  
I'd also ping @vanzin for this, since I believe that he was involved in a 
lot of the original refactoring of this HistoryServer code.



[GitHub] spark pull request: [SPARK-12682][SQL] Add support for (optionally...

2016-01-24 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10826#discussion_r50652891
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala ---
@@ -847,4 +847,32 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
     sqlContext.sql("""use default""")
     sqlContext.sql("""drop database if exists testdb8156 CASCADE""")
   }
+
+  test("skip hive metadata on table creation") {
+    val schema = StructType((1 to 5).map(i => StructField(s"c_$i", StringType)))
+
+    catalog.createDataSourceTable(
+      tableIdent = TableIdentifier("not_skip_hive_metadata"),
+      userSpecifiedSchema = Some(schema),
+      partitionColumns = Array.empty[String],
+      bucketSpec = None,
+      provider = "parquet",
+      options = Map("path" -> "just a dummy path", "skip_hive_metadata" -> "false"),
+      isExternal = false)
+
+    assert(catalog.client.getTable("default", "not_skip_hive_metadata").schema
+      .forall(column => HiveMetastoreTypes.toDataType(column.hiveType) == StringType))
+
+    catalog.createDataSourceTable(
+      tableIdent = TableIdentifier("skip_hive_metadata"),
+      userSpecifiedSchema = Some(schema),
+      partitionColumns = Array.empty[String],
+      bucketSpec = None,
+      provider = "parquet",
+      options = Map("path" -> "just a dummy path", "skip_hive_metadata" -> "true"),
+      isExternal = false)
+
+    assert(catalog.client.getTable("default", "skip_hive_metadata").schema
+      .forall(column => HiveMetastoreTypes.toDataType(column.hiveType) == ArrayType(StringType)))
--- End diff --

Let's add comments to explain why we need to check this.



[GitHub] spark pull request: [SPARK-12895] Implement TaskMetrics with accum...

2016-01-24 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10835#discussion_r50645758
  
--- Diff: core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala ---
@@ -378,15 +374,20 @@ class JsonProtocolSuite extends SparkFunSuite {
     val oldInfo = JsonProtocol.accumulableInfoFromJson(oldJson)
     assert(false === oldInfo.internal)
   }
+}
 
-  /** -- *
-   | Helper test running methods |
-   * --- */
 
-  private def testEvent(event: SparkListenerEvent, jsonString: String) {
+// This extends SparkFunSuite only because we want its `assert` method.
+private[spark] object JsonProtocolSuite extends SparkFunSuite {
--- End diff --

Also, if you just want `assert` there's a different Scalatest trait that 
you can mix in without having to have an object extend a suite; I think it's 
`Matchers` or `Assertions` or something like that.



[GitHub] spark pull request: [SPARK-12624][PYSPARK] Checks row length when ...

2016-01-24 Thread davies

Github user davies commented on the pull request:

https://github.com/apache/spark/pull/10886#issuecomment-174364780
  
LGTM



[GitHub] spark pull request: [SPARK-12757][WIP] Use reference counting to p...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10705#issuecomment-174383672
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49968/
Test FAILed.



[GitHub] spark pull request: [SPARK-12953][Examples]RDDRelation writer set ...

2016-01-24 Thread shijinkui

Github user shijinkui commented on the pull request:

https://github.com/apache/spark/pull/10864#issuecomment-174383417
  
@rxin This example throws an exception the second time it is run. Unit tests are important to the project, and examples are useful to Spark users. :)
This problem is neither important nor critical, but if it can be fixed along the way, that would be better for users. :)



[GitHub] spark pull request: [SPARK-5174][SPARK-5175] add more APIs to acto...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10892#issuecomment-174383634
  
**[Test build #49963 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49963/consoleFull)**
 for PR 10892 at commit 
[`2799982`](https://github.com/apache/spark/commit/2799982304c93b5feabf5c597248488fc7a9b53f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12903] [SparkR] Add covar_samp and cova...

2016-01-24 Thread sun-rui

Github user sun-rui commented on the pull request:

https://github.com/apache/spark/pull/10829#issuecomment-174383819
  
LGTM



[GitHub] spark pull request: [SPARK-12757][WIP] Use reference counting to p...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10705#issuecomment-174383668
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-5174][SPARK-5175] add more APIs to acto...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10892#issuecomment-174383809
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49963/
Test PASSed.



[GitHub] spark pull request: [SPARK-12624][PYSPARK] Checks row length when ...

2016-01-24 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/10886#issuecomment-174386178
  
Thanks! Merging to master and branch 1.6.



[GitHub] spark pull request: [SPARK-12789]Support order by index

2016-01-24 Thread zhichao-li

Github user zhichao-li commented on a diff in the pull request:

https://github.com/apache/spark/pull/10731#discussion_r50653035
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -446,6 +503,10 @@ class Analyzer(
         val newOrdering = resolveSortOrders(ordering, child, throws = false)
         Sort(newOrdering, global, child)
 
+      case s @ Sort(ordering, global, child) if child.resolved && !s.resolved =>
+        val newOrdering = resolveSortOrders(ordering, child, throws = false)
+        Sort(newOrdering, global, child)
--- End diff --

Oh, my bad. I was aware of this but forgot to push the code.



[GitHub] spark pull request: [SPARK-12895] Implement TaskMetrics with accum...

2016-01-24 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10835#discussion_r50645683
  
--- Diff: core/src/main/scala/org/apache/spark/InternalAccumulator.scala ---
@@ -17,42 +17,193 @@
 
 package org.apache.spark
 
+import org.apache.spark.storage.{BlockId, BlockStatus}
 
-// This is moved to its own file because many more things will be added to it in SPARK-10620.
+
+/**
+ * A collection of fields and methods concerned with internal accumulators that represent
+ * task level metrics.
+ */
 private[spark] object InternalAccumulator {
-  val PEAK_EXECUTION_MEMORY = "peakExecutionMemory"
-  val TEST_ACCUMULATOR = "testAccumulator"
-
-  // For testing only.
-  // This needs to be a def since we don't want to reuse the same accumulator across stages.
-  private def maybeTestAccumulator: Option[Accumulator[Long]] = {
-    if (sys.props.contains("spark.testing")) {
-      Some(new Accumulator(
-        0L, AccumulatorParam.LongAccumulatorParam, Some(TEST_ACCUMULATOR), internal = true))
-    } else {
-      None
+
+  import AccumulatorParam._
+
+  // Prefixes used in names of internal task level metrics
+  val METRICS_PREFIX = "internal.metrics."
+  val SHUFFLE_READ_METRICS_PREFIX = METRICS_PREFIX + "shuffle.read."
+  val SHUFFLE_WRITE_METRICS_PREFIX = METRICS_PREFIX + "shuffle.write."
+  val OUTPUT_METRICS_PREFIX = METRICS_PREFIX + "output."
+  val INPUT_METRICS_PREFIX = METRICS_PREFIX + "input."
+
+  // Names of internal task level metrics
+  val EXECUTOR_DESERIALIZE_TIME = METRICS_PREFIX + "executorDeserializeTime"
+  val EXECUTOR_RUN_TIME = METRICS_PREFIX + "executorRunTime"
+  val RESULT_SIZE = METRICS_PREFIX + "resultSize"
+  val JVM_GC_TIME = METRICS_PREFIX + "jvmGCTime"
+  val RESULT_SERIALIZATION_TIME = METRICS_PREFIX + "resultSerializationTime"
+  val MEMORY_BYTES_SPILLED = METRICS_PREFIX + "memoryBytesSpilled"
+  val DISK_BYTES_SPILLED = METRICS_PREFIX + "diskBytesSpilled"
+  val PEAK_EXECUTION_MEMORY = METRICS_PREFIX + "peakExecutionMemory"
+  val UPDATED_BLOCK_STATUSES = METRICS_PREFIX + "updatedBlockStatuses"
+  val TEST_ACCUM = METRICS_PREFIX + "testAccumulator"
+
+  // scalastyle:off
+
+  // Names of shuffle read metrics
+  object shuffleRead {
+    val REMOTE_BLOCKS_FETCHED = SHUFFLE_READ_METRICS_PREFIX + "remoteBlocksFetched"
+    val LOCAL_BLOCKS_FETCHED = SHUFFLE_READ_METRICS_PREFIX + "localBlocksFetched"
+    val REMOTE_BYTES_READ = SHUFFLE_READ_METRICS_PREFIX + "remoteBytesRead"
+    val LOCAL_BYTES_READ = SHUFFLE_READ_METRICS_PREFIX + "localBytesRead"
+    val FETCH_WAIT_TIME = SHUFFLE_READ_METRICS_PREFIX + "fetchWaitTime"
+    val RECORDS_READ = SHUFFLE_READ_METRICS_PREFIX + "recordsRead"
+  }
+
+  // Names of shuffle write metrics
+  object shuffleWrite {
+    val BYTES_WRITTEN = SHUFFLE_WRITE_METRICS_PREFIX + "bytesWritten"
+    val RECORDS_WRITTEN = SHUFFLE_WRITE_METRICS_PREFIX + "recordsWritten"
+    val WRITE_TIME = SHUFFLE_WRITE_METRICS_PREFIX + "writeTime"
+  }
+
+  // Names of output metrics
+  object output {
+    val WRITE_METHOD = OUTPUT_METRICS_PREFIX + "writeMethod"
+    val BYTES_WRITTEN = OUTPUT_METRICS_PREFIX + "bytesWritten"
+    val RECORDS_WRITTEN = OUTPUT_METRICS_PREFIX + "recordsWritten"
+  }
+
+  // Names of input metrics
+  object input {
+    val READ_METHOD = INPUT_METRICS_PREFIX + "readMethod"
+    val BYTES_READ = INPUT_METRICS_PREFIX + "bytesRead"
+    val RECORDS_READ = INPUT_METRICS_PREFIX + "recordsRead"
+  }
+
+  // scalastyle:on
+
+  /**
+   * Create an internal [[Accumulator]] by name, which must begin with [[METRICS_PREFIX]].
+   */
+  def create(name: String): Accumulator[_] = {
+    assert(name.startsWith(METRICS_PREFIX),
+      s"internal accumulator name must start with '$METRICS_PREFIX': $name")
+    getParam(name) match {
+      case p @ LongAccumulatorParam => newMetric[Long](0L, name, p)
+      case p @ IntAccumulatorParam => newMetric[Int](0, name, p)
+      case p @ StringAccumulatorParam => newMetric[String]("", name, p)
+      case p @ UpdatedBlockStatusesAccumulatorParam =>
+        newMetric[Seq[(BlockId, BlockStatus)]](Seq(), name, p)
+      case p => throw new IllegalArgumentException(
+        s"unsupported accumulator param '${p.getClass.getSimpleName}' for metric '$name'.")
+    }
+  }
+
+  /**
+   * Get the [[AccumulatorParam]] associated with the internal metric name,
+   * which must begin with [[METRICS_PREFIX]].
+   */
+  def getParam(name: String): AccumulatorParam[_] = {
+    assert(name.startsWith(METRICS_PREFIX),
+      s"internal accumulator name must

[GitHub] spark pull request: [SPARK-12896] [WIP] Send only accumulator upda...

2016-01-24 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/10857#issuecomment-174356132
  
While I realize that this has the potential to create merge conflicts, it 
would be super helpful if you could squash all of the commits which came from 
#10835 into a single commit here in order to make it easier to spot the delta 
from that PR to this one.



[GitHub] spark pull request: [SPARK-12975] [SQL] Eliminate Bucketing Column...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10891#issuecomment-174373015
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12975] [SQL] Eliminate Bucketing Column...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10891#issuecomment-174373018
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49961/
Test FAILed.



[GitHub] spark pull request: [SPARK-12624][PYSPARK] Checks row length when ...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10886#issuecomment-174375180
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12757][WIP] Use reference counting to p...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10705#issuecomment-174382651
  
**[Test build #49969 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49969/consoleFull)**
 for PR 10705 at commit 
[`8d45da6`](https://github.com/apache/spark/commit/8d45da608a5b8d72259b2a275271f2740b1ebdec).



[GitHub] spark pull request: [SPARK-5174][SPARK-5175] add more APIs to acto...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10892#issuecomment-174382540
  
**[Test build #49966 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49966/consoleFull)**
 for PR 10892 at commit 
[`ffd1d02`](https://github.com/apache/spark/commit/ffd1d0257eb62692846c590b7bdb46911ff2103a).



[GitHub] spark pull request: [SPARK-5174][SPARK-5175] add more APIs to acto...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10892#issuecomment-174384312
  
**[Test build #49966 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49966/consoleFull)**
 for PR 10892 at commit 
[`ffd1d02`](https://github.com/apache/spark/commit/ffd1d0257eb62692846c590b7bdb46911ff2103a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12757][WIP] Use reference counting to p...

2016-01-24 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/10705#issuecomment-174387270
  
This has been updated and is now ready for further review. I've gone ahead 
and done the reference -> pin renaming, as discussed previously, have fixed the 
flaky / failing tests, and have added a bit more documentation.

I'll keep working on trying to complete the remaining checklist items 
tonight (adding more debug logging, a debugging feature-flag, etc.) but it 
would be great to get feedback + sign-off on the subset of changes + test 
changes here, just to make sure there won't be any major new surprise review 
comments at the last minute.



[GitHub] spark pull request: [SPARK-12975] [SQL] Eliminate Bucketing Column...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10891#issuecomment-174389180
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12682][SQL] Add support for (optionally...

2016-01-24 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10826#discussion_r50652606
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -323,7 +323,15 @@ private[hive] class HiveMetastoreCatalog(val client: ClientInterface, hive: Hive
 
 // TODO: Support persisting partitioned data source relations in Hive compatible format
 val qualifiedTableName = tableIdent.quotedString
+val skipHiveMetadata = options.getOrElse("skip_hive_metadata", "false").toBoolean
 val (hiveCompatibleTable, logMessage) = (maybeSerDe, dataSource.relation) match {
+  case (Some(serde), relation: HadoopFsRelation) if skipHiveMetadata =>
--- End diff --

Maybe it would be better to use `case _ if skipHiveMetadata`?
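To illustrate why a guard-only first case can be simpler, here is a hypothetical, heavily simplified sketch. `Relation`, `FsRelation`, `OtherRelation`, and `PersistFormat.choose` are made-up stand-ins, not the catalog's real types; only the shape of the match is meant to mirror the diff.

```scala
// Hypothetical, simplified stand-ins for the catalog's real types; only the
// shape of the pattern match is meant to mirror the diff under review.
sealed trait Relation
case object FsRelation extends Relation    // plays the role of HadoopFsRelation
case object OtherRelation extends Relation // any other data source relation

object PersistFormat {
  // With the guard-only case first, the skip flag takes precedence no matter
  // which serde or relation type is present, so later cases need no guard.
  def choose(
      maybeSerDe: Option[String],
      relation: Relation,
      skipHiveMetadata: Boolean): String =
    (maybeSerDe, relation) match {
      case _ if skipHiveMetadata     => "not Hive compatible"
      case (Some(serde), FsRelation) => s"Hive compatible ($serde)"
      case _                         => "not Hive compatible"
    }
}
```

The design point is that `case _ if skipHiveMetadata` does not need to repeat (or narrow) the serde and relation patterns, so the flag applies uniformly.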



[GitHub] spark pull request: [SPARK-12975] [SQL] Eliminate Bucketing Column...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10891#issuecomment-174388437
  
**[Test build #49962 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49962/consoleFull)**
 for PR 10891 at commit 
[`8c718b3`](https://github.com/apache/spark/commit/8c718b30c228074c0cc81e5fa6c8243aaf976a54).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12757][WIP] Use reference counting to p...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10705#issuecomment-174390213
  
**[Test build #49971 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49971/consoleFull)**
 for PR 10705 at commit 
[`36253df`](https://github.com/apache/spark/commit/36253dfe6879023562e404d9daf0b2c0c364e718).



[GitHub] spark pull request: [SPARK-12976][SQL] Add LazilyGenerateOrdering ...

2016-01-24 Thread ueshin

GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/10894

[SPARK-12976][SQL] Add LazilyGenerateOrdering and use it for 
RangePartitioner of Exchange.

Add `LazilyGenerateOrdering` to support generated ordering for 
`RangePartitioner` of `Exchange` instead of `InterpretedOrdering`.
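The description only names the new class. As a loose, plain-Scala illustration of the general idea behind a lazily generated ordering (ship something cheap and serializable, and build the expensive comparator only where it is used), here is a sketch. `LazyOrdering` and `makeOrdering` are hypothetical names, and whether the patch builds its ordering exactly this way is an assumption.

```scala
// Hypothetical sketch (not the code in this PR): an Ordering that carries only
// a recipe for building its comparator and builds the comparator lazily, on
// first use, in whichever JVM ends up calling compare().
// `makeOrdering` stands in for a code-generation step; it is not a Spark API.
// Scala function literals are serializable, so the recipe can travel with it.
class LazyOrdering[T](makeOrdering: () => Ordering[T]) extends Ordering[T] {
  // scala.math.Ordering already extends Serializable. Marking the built
  // comparator @transient keeps it out of the serialized form, so after
  // deserialization it is rebuilt on first access instead of being shipped.
  @transient private lazy val underlying: Ordering[T] = makeOrdering()

  override def compare(x: T, y: T): Int = underlying.compare(x, y)
}
```

For example, `List(3, 1, 2).sorted(new LazyOrdering[Int](() => Ordering.Int.reverse))` sorts in descending order, and the reversed ordering is only constructed when `sorted` first calls `compare`.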

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-12976

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10894.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10894


commit 624927a3583fb995efd675d94b43ba5b98944e3b
Author: Takuya UESHIN 
Date:   2016-01-22T10:05:12Z

Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange.





[GitHub] spark pull request: [SPARK-12976][SQL] Add LazilyGenerateOrdering ...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10894#issuecomment-174399064
  
**[Test build #49975 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49975/consoleFull)**
 for PR 10894 at commit 
[`624927a`](https://github.com/apache/spark/commit/624927a3583fb995efd675d94b43ba5b98944e3b).



[GitHub] spark pull request: [SPARK-12895] Implement TaskMetrics with accum...

2016-01-24 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10835#discussion_r50645718
  
--- Diff: core/src/test/scala/org/apache/spark/AccumulatorSuite.scala ---
@@ -159,193 +163,69 @@ class AccumulatorSuite extends SparkFunSuite with Matchers with LocalSparkContex
 assert(!Accumulators.originals.get(accId).isDefined)
   }
 
-  test("internal accumulators in TaskContext") {
-sc = new SparkContext("local", "test")
-val accums = InternalAccumulator.create(sc)
-val taskContext = new TaskContextImpl(0, 0, 0, 0, null, null, accums)
-val internalMetricsToAccums = taskContext.internalMetricsToAccumulators
-val collectedInternalAccums = taskContext.collectInternalAccumulators()
-val collectedAccums = taskContext.collectAccumulators()
-assert(internalMetricsToAccums.size > 0)
-assert(internalMetricsToAccums.values.forall(_.isInternal))
-assert(internalMetricsToAccums.contains(TEST_ACCUMULATOR))
-val testAccum = internalMetricsToAccums(TEST_ACCUMULATOR)
-assert(collectedInternalAccums.size === internalMetricsToAccums.size)
-assert(collectedInternalAccums.size === collectedAccums.size)
-assert(collectedInternalAccums.contains(testAccum.id))
-assert(collectedAccums.contains(testAccum.id))
-  }
-
-  test("internal accumulators in a stage") {
-val listener = new SaveInfoListener
-val numPartitions = 10
-sc = new SparkContext("local", "test")
-sc.addSparkListener(listener)
-// Have each task add 1 to the internal accumulator
-val rdd = sc.parallelize(1 to 100, numPartitions).mapPartitions { iter =>
-  TaskContext.get().internalMetricsToAccumulators(TEST_ACCUMULATOR) += 1
-  iter
-}
-// Register asserts in job completion callback to avoid flakiness
-listener.registerJobCompletionCallback { _ =>
-  val stageInfos = listener.getCompletedStageInfos
-  val taskInfos = listener.getCompletedTaskInfos
-  assert(stageInfos.size === 1)
-  assert(taskInfos.size === numPartitions)
-  // The accumulator values should be merged in the stage
-  val stageAccum = findAccumulableInfo(stageInfos.head.accumulables.values, TEST_ACCUMULATOR)
-  assert(stageAccum.value.toLong === numPartitions)
-  // The accumulator should be updated locally on each task
-  val taskAccumValues = taskInfos.map { taskInfo =>
-val taskAccum = findAccumulableInfo(taskInfo.accumulables, TEST_ACCUMULATOR)
-assert(taskAccum.update.isDefined)
-assert(taskAccum.update.get.toLong === 1)
-taskAccum.value.toLong
-  }
-  // Each task should keep track of the partial value on the way, i.e. 1, 2, ... numPartitions
-  assert(taskAccumValues.sorted === (1L to numPartitions).toSeq)
-}
-rdd.count()
-  }
-
-  test("internal accumulators in multiple stages") {
--- End diff --

Note to self + other reviewers: these tests of internal accumulators are 
now in their own `InternalAccumulatorSuite`. It would be marginally easier to 
review the test changes if that moving of the code had been done separately. 
@andrewor14, are there any significant changes aside from updates to reflect 
the changes to the TaskContext metrics interfaces? I'll look at the diff myself 
but just thought I'd ask first.



[GitHub] spark pull request: SPARK-12948. [SQL]. Consider reducing size of ...

2016-01-24 Thread rajeshbalamohan

Github user rajeshbalamohan commented on the pull request:

https://github.com/apache/spark/pull/10861#issuecomment-174355176
  
**Use case**: The user maps a partitioned dataset (e.g. the TPC-DS dataset at 
200 GB scale) and runs a query in spark-shell. 

For example:
...
val o_store_sales = sqlContext.read.format("orc").load("/tmp/spark_tpcds_bin_partitioned_orc_200/store_sales")
o_store_sales.registerTempTable("o_store_sales")
..
sqlContext.sql("SELECT..").show();
...


When this is executed, OrcRelation creates Config objects for every 
partition (Ref: 
[OrcRelation.execute()](https://github.com/apache/spark/blob/e14817b528ccab4b4685b45a95e2325630b5fc53/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala#L295)). 
In the case of TPC-DS, it generates 1826 partitions. This info is broadcast 
in 
[DAGScheduler#submitMissingTasks()](https://github.com/apache/spark/blob/1b2c2162af4d5d2d950af94571e69273b49bf913/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1010). 
As part of this, the configurations created for all 1826 partitions are also 
streamed through (i.e. embedded in HadoopMapPartitionsWithSplitRDD --> f() --> 
wrappedConf). Each of these configurations takes around 251 KB. 
Please refer to the profiler snapshot attached in the JIRA 
([mem_snap_shot](https://issues.apache.org/jira/secure/attachment/12784080/SPARK-12948.mem.prof.snapshot.png)). 
This adds considerable delay to the overall job runtime.

The patch reuses the already-broadcast conf from SparkContext. The fillObject() 
function is executed later for every partition and internally sets up any 
additional config details. This drastically reduces the payload that is 
broadcast and helps reduce the overall job runtime.
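Taking the ~251 KB per-partition figure above at face value, the configuration payload that the patch avoids shipping can be estimated with simple arithmetic (an approximation that assumes every partition's configuration is about the same size):

$$
1826 \times 251\ \text{KB} = 458{,}326\ \text{KB} \approx 450\ \text{MB}
$$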




[GitHub] spark pull request: [SPARK-12895] Implement TaskMetrics with accum...

2016-01-24 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10835#discussion_r50645733
  
--- Diff: core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala ---
@@ -378,15 +374,20 @@ class JsonProtocolSuite extends SparkFunSuite {
 val oldInfo = JsonProtocol.accumulableInfoFromJson(oldJson)
 assert(false === oldInfo.internal)
   }
+}
 
-  /** -- *
-   | Helper test running methods |
-   * --- */
 
-  private def testEvent(event: SparkListenerEvent, jsonString: String) {
+// This extends SparkFunSuite only because we want its `assert` method.
+private[spark] object JsonProtocolSuite extends SparkFunSuite {
--- End diff --

What's the motivation for moving these into a separate object? Are you 
trying to reuse some of these methods in a different suite?



[GitHub] spark pull request: [SPARK-12895] Implement TaskMetrics with accum...

2016-01-24 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10835#discussion_r50646732
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -370,6 +370,14 @@ object SparkHadoopUtil {
 
   val SPARK_YARN_CREDS_COUNTER_DELIM = "-"
 
+  /**
+   * Number of records to update input metrics when reading from HadoopRDDs.
+   *
+   * Each update is potentially expensive because we need to use reflection to access the
+   * Hadoop FileSystem API of interest (only available in 2.5), so we should do this sparingly.
+   */
+  private[spark] val UPDATE_INPUT_METRICS_INTERVAL_RECORDS = 1000
--- End diff --

I noticed that `HadoopRDD` has a field named 
`RECORDS_BETWEEN_BYTES_READ_METRIC_UPDATES` which isn't used anywhere...
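The doc comment on the new constant describes a sampling pattern: pay for the reflection-based update only once every N records. A minimal plain-Scala sketch of that pattern follows; `SampledMetricsIterator` and the `updateBytesRead` callback are hypothetical names, not Spark's HadoopRDD code.

```scala
// Hypothetical sketch: wrap an iterator so that an expensive metrics callback
// runs only once every `intervalRecords` records instead of once per record.
// `updateBytesRead` stands in for the reflection-based FileSystem statistics
// lookup; it is not a Spark API.
class SampledMetricsIterator[T](
    underlying: Iterator[T],
    intervalRecords: Int,
    updateBytesRead: () => Unit) extends Iterator[T] {

  require(intervalRecords > 0, "intervalRecords must be positive")

  private var recordsRead = 0L

  override def hasNext: Boolean = underlying.hasNext

  override def next(): T = {
    val record = underlying.next()
    recordsRead += 1
    // Only pay the expensive update cost every `intervalRecords` records.
    if (recordsRead % intervalRecords == 0) updateBytesRead()
    record
  }
}
```

A real reader would likely also want one final update when the input is closed, so that a last partial batch of fewer than `intervalRecords` records is still reflected in the metrics.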



[GitHub] spark pull request: [SQL] Eliminate Bucketing Columns that are par...

2016-01-24 Thread gatorsmile

GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/10891

[SQL] Eliminate Bucketing Columns that are part of Partitioning Columns

When users are using partitionBy and bucketBy at the same time, some 
bucketing columns might be part of partitioning columns. For example, 
```
df.write
  .format(source)
  .partitionBy("i")
  .bucketBy(8, "i", "k")
  .sortBy("k")
  .saveAsTable("bucketed_table")
```
However, in the above case, including column `i` in `bucketBy` is useless: all 
rows within a partition share the same value of `i`, so it just wastes extra CPU 
when reading or writing bucketed tables. Thus, we can automatically remove these 
overlapping columns from the bucketing columns. 
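The pruning step itself can be sketched as a small filter. This is a hypothetical helper, not the code in this PR, and it assumes column names are matched exactly (case-sensitively).

```scala
object BucketColumns {
  // Hypothetical helper: drop bucketing columns that are also partitioning
  // columns, since every row inside one partition directory shares the same
  // partition value. Matching is exact and case-sensitive (an assumption).
  def pruneBucketColumns(
      partitionColumns: Seq[String],
      bucketColumns: Seq[String]): Seq[String] = {
    val partitionSet = partitionColumns.toSet
    bucketColumns.filterNot(partitionSet.contains)
  }
}
```

With the example above, `pruneBucketColumns(Seq("i"), Seq("i", "k"))` leaves only `k`. What to do when every bucketing column is also a partitioning column (an empty result) is a separate decision that this sketch leaves to the caller.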

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark 
commonKeysInPartitionByBucketBy

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10891.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10891


commit 14fb29d1cc8b30681026ad29f7fc674695644a62
Author: gatorsmile 
Date:   2016-01-24T23:56:30Z

remove unnecessary columns from blockBy

commit e68351bccd7911f55cade845918ecc2494271d2f
Author: gatorsmile 
Date:   2016-01-25T00:26:48Z

added more test cases.

commit e529b7d15f85557d6ccfa7f08f7bacb71611a286
Author: gatorsmile 
Date:   2016-01-25T00:27:32Z

Merge remote-tracking branch 'upstream/master' into 
commonKeysInPartitionByBucketBy





[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10757#issuecomment-174370855
  
**[Test build #49960 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49960/consoleFull)**
 for PR 10757 at commit 
[`eb78f29`](https://github.com/apache/spark/commit/eb78f29eea1cc84a9dafc63f65cf8bfc4aaa3243).



[GitHub] spark pull request: [SPARK-12975] [SQL] Eliminate Bucketing Column...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10891#issuecomment-174373189
  
**[Test build #49962 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49962/consoleFull)**
 for PR 10891 at commit 
[`8c718b3`](https://github.com/apache/spark/commit/8c718b30c228074c0cc81e5fa6c8243aaf976a54).



[GitHub] spark pull request: [SPARK-12975] [SQL] Eliminate Bucketing Column...

2016-01-24 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10891#issuecomment-174377687
  
Does Hive write them out?




[GitHub] spark pull request: [SPARK-12817] Simplify CacheManager code and r...

2016-01-24 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/10748#issuecomment-174377653
  
Going to close this for now to declutter the queue but will re-open as soon 
as my other patch is merged.



[GitHub] spark pull request: [SPARK-12817] Simplify CacheManager code and r...

2016-01-24 Thread JoshRosen

Github user JoshRosen closed the pull request at:

https://github.com/apache/spark/pull/10748



[GitHub] spark pull request: [SPARK-5174][SPARK-5175] add more APIs to acto...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10892#issuecomment-174381976
  
**[Test build #49963 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49963/consoleFull)**
 for PR 10892 at commit 
[`2799982`](https://github.com/apache/spark/commit/2799982304c93b5feabf5c597248488fc7a9b53f).



[GitHub] spark pull request: SPARK-11741 Process doctests using TextTestRun...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9710#issuecomment-174381975
  
**[Test build #49965 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49965/consoleFull)**
 for PR 9710 at commit 
[`c56132e`](https://github.com/apache/spark/commit/c56132e70ca0da64877cb86c58ae694f016d83c5).



[GitHub] spark pull request: [SPARK-12934][SQL] Count-min sketch serializat...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10893#issuecomment-174382920
  
**[Test build #49967 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49967/consoleFull)**
 for PR 10893 at commit 
[`e97d7f9`](https://github.com/apache/spark/commit/e97d7f92ad4cb075234772c24f246bf51eff6cc7).



[GitHub] spark pull request: [SPARK-5174][SPARK-5175] add more APIs to acto...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10892#issuecomment-174384499
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-5174][SPARK-5175] add more APIs to acto...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10892#issuecomment-174384503
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49966/



[GitHub] spark pull request: [SPARK-12682][SQL] Add support for (optionally...

2016-01-24 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10826#discussion_r50652549
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -323,7 +323,15 @@ private[hive] class HiveMetastoreCatalog(val client: ClientInterface, hive: Hive
 
 // TODO: Support persisting partitioned data source relations in Hive compatible format
 val qualifiedTableName = tableIdent.quotedString
+val skipHiveMetadata = options.getOrElse("skip_hive_metadata", "false").toBoolean
--- End diff --

How about `skipHiveMetadata`?
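If the option key were renamed as suggested, reading it might look like the following hypothetical helper. It assumes the options arrive as a plain `Map[String, String]`; `DataSourceOptions.skipHiveMetadata` is an illustrative name, not an existing Spark method.

```scala
object DataSourceOptions {
  // Hypothetical helper: read the flag under the camelCase key suggested in
  // review, defaulting to false when the option is absent. Note that
  // StringOps.toBoolean accepts only "true"/"false" (case-insensitively) and
  // throws IllegalArgumentException for any other value.
  def skipHiveMetadata(options: Map[String, String]): Boolean =
    options.getOrElse("skipHiveMetadata", "false").toBoolean
}
```

Because `toBoolean` throws on unexpected values, a typo such as `"yes"` fails loudly instead of silently being treated as `false`.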



[GitHub] spark pull request: [SPARK-12975] [SQL] Eliminate Bucketing Column...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10891#issuecomment-174389186
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49962/



[GitHub] spark pull request: [SPARK-12789]Support order by index

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10731#issuecomment-174397390
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49972/



[GitHub] spark pull request: [SPARK-12789]Support order by index

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10731#issuecomment-174397389
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12757][WIP] Use reference counting to p...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10705#issuecomment-174397987
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12688][SQL] Fix spill size metric in un...

2016-01-24 Thread carsonwang

Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/10634#issuecomment-174398084
  
retest this please 



[GitHub] spark pull request: [SPARK-12757][WIP] Use reference counting to p...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10705#issuecomment-174397879
  
**[Test build #49964 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49964/consoleFull)**
 for PR 10705 at commit 
[`43e50ed`](https://github.com/apache/spark/commit/43e50ed39c5dee9c15ae5aeac451dd02647ac8e5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12757][WIP] Use reference counting to p...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10705#issuecomment-174397989
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49964/



[GitHub] spark pull request: Provide same info as in spark-submit --help

2016-01-24 Thread jimlohse

Github user jimlohse commented on a diff in the pull request:

https://github.com/apache/spark/pull/10890#discussion_r50653981
  
--- Diff: docs/submitting-applications.md ---
@@ -177,8 +177,9 @@ debugging information by running `spark-submit` with the `--verbose` option.
 
 # Advanced Dependency Management
 When using `spark-submit`, the application jar along with any jars included with the `--jars` option
-will be automatically transferred to the cluster. Spark uses the following URL scheme to allow
-different strategies for disseminating jars:
+will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. Each entry points to a specific jar file, resulting in a comma-separated list of local jars. That list is included on the driver and executor classpaths. Directory expansion does not work with --jars. 
--- End diff --

Oh, I didn't catch the first point of your question until now. I think they
are URLs, because I believe they need file:// in front of them, which would
make them URLs too?

On 01/24/2016 12:24 PM, Sean Owen wrote:
>
> In docs/submitting-applications.md:
>
> > @@ -177,8 +177,9 @@ debugging information by running `spark-submit` with the `--verbose` option.
> >
> >  # Advanced Dependency Management
> >  When using `spark-submit`, the application jar along with any jars included with the `--jars` option
> > -will be automatically transferred to the cluster. Spark uses the following URL scheme to allow
> > -different strategies for disseminating jars:
> > +will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. Each entry points to a specific jar file, resulting in a comma-separated list of local jars. That list is included on the driver and executor classpaths. Directory expansion does not work with --jars.
>
> This seems OK, but do they have to be local JARs (I think so)? In that
> case, are they really URLs? The second sentence you added then seems to
> say the same thing as the first. You could back-tick `--jars` too for
> consistency.
>
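A minimal Scala sketch of the rules the proposed wording describes, assuming a hypothetical `JarsArg` helper (this is not Spark's own argument parsing): the `--jars` value is split on commas into one entry per jar, an entry with no scheme is treated as a local file path, and a local entry that names a directory is rejected because directory expansion is not supported. Entries with other schemes (such as `hdfs:`) are passed through without any local check, which is an assumption of the sketch.

```scala
import java.io.File
import java.net.URI

// Hypothetical helper, not Spark's own parsing code: it mirrors the documented
// rules that --jars takes a comma-separated list and does not expand directories.
object JarsArg {
  /** Splits a comma-separated --jars value into one URI per entry. */
  def parse(value: String): Seq[URI] =
    value.split(",").toSeq.map(_.trim).filter(_.nonEmpty).map { entry =>
      val uri = new URI(entry)
      // An entry with no scheme is assumed to be a local file path.
      if (uri.getScheme == null) new File(entry).getAbsoluteFile.toURI else uri
    }

  /** Rejects local entries that name a directory (no directory expansion). */
  def requireNoDirectories(uris: Seq[URI]): Unit =
    uris.filter(_.getScheme == "file").foreach { uri =>
      require(!new File(uri).isDirectory, s"directories are not expanded: $uri")
    }
}
```

For example, `JarsArg.parse("file:///opt/libs/a.jar,hdfs://nn/libs/b.jar")` returns two URIs whose schemes are `file` and `hdfs`.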





[GitHub] spark pull request: [SPARK-12895] Implement TaskMetrics with accum...

2016-01-24 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10835#discussion_r50645583
  
--- Diff: core/src/main/scala/org/apache/spark/executor/ShuffleWriteMetrics.scala ---
@@ -17,40 +17,65 @@
 
 package org.apache.spark.executor
 
+import org.apache.spark.{Accumulator, InternalAccumulator}
 import org.apache.spark.annotation.DeveloperApi
 
 
 /**
  * :: DeveloperApi ::
- * Metrics pertaining to shuffle data written in a given task.
+ * A collection of accumulators that represent metrics about writing shuffle data.
--- End diff --

It just occurred to me that maybe we should add a note here and in 
`ShuffleReadMetrics` to explain that instances of this class are not 
thread-safe.
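A minimal Scala sketch of what such a note could look like and why it matters, using hypothetical simplified classes (`WriteMetricsSketch` and `ConcurrentWriteMetricsSketch` are illustrative names, not Spark's `ShuffleWriteMetrics`): an increment on a plain `var` is an unsynchronized read-modify-write, so concurrent callers can lose updates, while an `AtomicLong`-backed variant does not.

```scala
import java.util.concurrent.atomic.AtomicLong

/**
 * Hypothetical, simplified stand-in for a per-task write-metrics holder.
 *
 * Note: instances of this class are not thread-safe. `incBytesWritten`
 * performs an unsynchronized read-modify-write on a plain `var`, so
 * concurrent callers can lose updates. Confine each instance to a single
 * thread, or synchronize externally.
 */
final class WriteMetricsSketch {
  private var _bytesWritten = 0L
  def incBytesWritten(v: Long): Unit = _bytesWritten += v
  def bytesWritten: Long = _bytesWritten
}

/** Thread-safe variant for comparison, backed by an AtomicLong. */
final class ConcurrentWriteMetricsSketch {
  private val _bytesWritten = new AtomicLong(0L)
  def incBytesWritten(v: Long): Unit = { _bytesWritten.addAndGet(v); () }
  def bytesWritten: Long = _bytesWritten.get()
}
```

The atomic variant trades a small per-update cost for safety; if each task's metrics are only ever touched by the task's own thread, the documented single-thread contract is the cheaper choice.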



[GitHub] spark pull request: [SPARK-12895] Implement TaskMetrics with accum...

2016-01-24 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10835#discussion_r50645544
  
--- Diff: core/src/main/scala/org/apache/spark/executor/ShuffleReadMetrics.scala ---
@@ -17,71 +17,102 @@
 
 package org.apache.spark.executor
 
+import org.apache.spark.{Accumulator, InternalAccumulator}
 import org.apache.spark.annotation.DeveloperApi
 
 
 /**
  * :: DeveloperApi ::
- * Metrics pertaining to shuffle data read in a given task.
+ * A collection of accumulators that represent metrics about reading shuffle data.
  */
 @DeveloperApi
-class ShuffleReadMetrics extends Serializable {
+class ShuffleReadMetrics private (
+    _remoteBlocksFetched: Accumulator[Int],
+    _localBlocksFetched: Accumulator[Int],
+    _remoteBytesRead: Accumulator[Long],
+    _localBytesRead: Accumulator[Long],
+    _fetchWaitTime: Accumulator[Long],
+    _recordsRead: Accumulator[Long])
+  extends Serializable {
+
+  private[executor] def this(accumMap: Map[String, Accumulator[_]]) {
+    this(
+      TaskMetrics.getAccum[Int](accumMap, InternalAccumulator.shuffleRead.REMOTE_BLOCKS_FETCHED),
--- End diff --

One note: if we ever add new fields to this class, we're going to have to be
very careful not to mix up adjacent fields that have the same type but
different units.
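One way to turn this mix-up into a compile-time error is to give each unit its own type. The Scala sketch below is hypothetical: `Bytes`, `Millis`, `Records`, and `ReadSummary` are illustrative names, whereas the `ShuffleReadMetrics` in the diff takes raw `Accumulator[Int]`/`Accumulator[Long]` parameters. With distinct wrapper types, passing a wait time where a byte count is expected no longer type-checks.

```scala
// Hypothetical wrapper types: one per unit, so values of different units
// cannot be passed in each other's positions.
final case class Bytes(value: Long)
final case class Millis(value: Long)
final case class Records(value: Long)

// Hypothetical summary class; swapping fetchWaitTime into a Bytes slot
// is now a compile error rather than a silent unit mix-up.
final case class ReadSummary(
    remoteBytesRead: Bytes,
    localBytesRead: Bytes,
    fetchWaitTime: Millis,
    recordsRead: Records) {
  def totalBytesRead: Bytes = Bytes(remoteBytesRead.value + localBytesRead.value)
}
```

Fields that share a unit, such as `remoteBytesRead` and `localBytesRead`, can still be swapped; named arguments at the call site (`ReadSummary(remoteBytesRead = Bytes(100L), ...)`) cover that case. Declaring the wrappers as top-level value classes (`extends AnyVal`) can avoid most of the allocation cost.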



[GitHub] spark pull request: [SPARK-12624][PYSPARK] Checks row length when ...

2016-01-24 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10886#issuecomment-174356844
  
cc @davies



[GitHub] spark pull request: [SPARK-12624][PYSPARK] Checks row length when ...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10886#issuecomment-174362165
  
**[Test build #49958 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49958/consoleFull)**
 for PR 10886 at commit 
[`ad8efa1`](https://github.com/apache/spark/commit/ad8efa122c21be675111c1bbaeae607058e5c8fa).



[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10757#issuecomment-174373074
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49960/
Test FAILed.



[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...

2016-01-24 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10757#issuecomment-174373048
  
**[Test build #49960 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49960/consoleFull)**
 for PR 10757 at commit 
[`eb78f29`](https://github.com/apache/spark/commit/eb78f29eea1cc84a9dafc63f65cf8bfc4aaa3243).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...

2016-01-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10757#issuecomment-174373072
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: SPARK-11741 Process doctests using TextTestRun...

2016-01-24 Thread gliptak

Github user gliptak commented on the pull request:

https://github.com/apache/spark/pull/9710#issuecomment-174380026
  
@JoshRosen Please review accumulators.py


