[GitHub] spark pull request: [SPARK-2546] Clone JobConf for each task (bran...

2014-10-07 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2684#issuecomment-58142790
  
By the way, I checked and this patch cleanly cherry-picks into `branch-1.0`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3412] Replace Epydoc with Sphinx to gen...

2014-10-07 Thread davies
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/2689

[SPARK-3412] Replace Epydoc with Sphinx to generate Python API docs

Retire Epydoc and use Sphinx to generate the API docs.

Refine the Sphinx docs, and also convert some docstrings to Sphinx style.

It looks like:
![api doc](https://cloud.githubusercontent.com/assets/40902/4538272/9e2d4f10-4dec-11e4-8d96-6e45a8fe51f9.png)
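For reference, the docstring conversion described above replaces Epydoc's epytext fields (`@param name: ...`) with Sphinx's reStructuredText field lists (`:param name: ...`). A minimal sketch of a Sphinx-style docstring; `scale` is a hypothetical function used only for illustration and is not part of PySpark:

```python
def scale(values, factor=1.0):
    """Multiply every element of ``values`` by ``factor``.

    Epydoc would have documented the arguments as ``@param values: ...``;
    Sphinx (reStructuredText field lists) uses ``:param values: ...``.

    :param values: a list of numbers
    :param factor: the multiplier applied to each element
    :return: a new list containing the scaled values
    """
    # Build a new list rather than mutating the caller's list.
    return [v * factor for v in values]
```

Sphinx's autodoc extension renders these `:param:` and `:return:` fields as a formatted parameter table in the generated HTML.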



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2689.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2689


commit 240b3933ffa21c97d8bf53b91b6274f715877980
Author: Davies Liu davies@gmail.com
Date:   2014-10-06T22:00:11Z

replace epydoc with sphinx doc

commit 746d0b67ba782faf660c24dc3ab11caefc9a0cc2
Author: Davies Liu davies@gmail.com
Date:   2014-10-06T22:46:24Z

@param - :param

commit 4bc1c3c794c2ecfb213d5fc0379c03c3615b5c89
Author: Davies Liu davies@gmail.com
Date:   2014-10-07T06:29:49Z

refactor

commit d5b874a1dd0f49e1dee84746ef64ec08efeccaf9
Author: Davies Liu davies@gmail.com
Date:   2014-10-07T06:35:49Z

Merge branch 'master' of github.com:apache/spark into docs

Conflicts:
python/pyspark/mllib/classification.py
python/pyspark/mllib/regression.py





[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-07 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-58143439
  
@mateiz Aside from restoring the `getThreadLocal` method in order to 
preserve API compatibility, is this patch otherwise ready to merge?



[GitHub] spark pull request: [SPARK-3412] [PySpark] Replace Epydoc with Sph...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2689#issuecomment-58143632
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21367/consoleFull) for PR 2689 at commit [`d5b874a`](https://github.com/apache/spark/commit/d5b874a1dd0f49e1dee84746ef64ec08efeccaf9).
 * This patch merges cleanly.



[GitHub] spark pull request: fix the Building Spark url

2014-10-07 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2558#issuecomment-58143668
  
Hi @yangl,

Do you mind closing this PR?  Thanks!



[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-07 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-58143704
  
@mateiz @JoshRosen I have put `getThreadLocal()` back and deprecated it.
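Keeping an old entry point for API compatibility while steering callers to a replacement usually means leaving the old method in place, marking it deprecated, and having it delegate to the new one. In Scala this is typically done with the `@deprecated` annotation; the sketch below shows the same keep-and-delegate pattern in Python. The `Env` class and its `get` / `get_thread_local` methods are hypothetical stand-ins, not Spark's `SparkEnv`:

```python
import warnings


class Env:
    """Hypothetical environment holder (not Spark's SparkEnv)."""

    _current = None

    @classmethod
    def set(cls, env):
        cls._current = env

    @classmethod
    def get(cls):
        # New, preferred accessor.
        return cls._current

    @classmethod
    def get_thread_local(cls):
        # Old accessor kept only for API compatibility: warn, then delegate.
        warnings.warn(
            "get_thread_local() is deprecated; use get() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return cls.get()
```

Existing callers keep working unchanged, but they see a warning that points them at the replacement.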



[GitHub] spark pull request: fix the Building Spark url

2014-10-07 Thread yangl
Github user yangl commented on the pull request:

https://github.com/apache/spark/pull/2558#issuecomment-58143905
  
Close please, thanks!



[GitHub] spark pull request: fix the Building Spark url

2014-10-07 Thread yangl
Github user yangl closed the pull request at:

https://github.com/apache/spark/pull/2558



[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-58143977
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21368/consoleFull) for PR 2624 at commit [`a69f30c`](https://github.com/apache/spark/commit/a69f30cdb8e63d526ebee06162d8f1b9f2adb253).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3825] Log more detail when unrolling a ...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2688#issuecomment-58144285
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21366/Test 
PASSed.



[GitHub] spark pull request: [SPARK-3825] Log more detail when unrolling a ...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2688#issuecomment-58144279
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21366/consoleFull) for PR 2688 at commit [`5638c49`](https://github.com/apache/spark/commit/5638c49f1a441b338b8998294aacae27300cd522).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3829] Make Spark logo image on the head...

2014-10-07 Thread sarutak
GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/2690

[SPARK-3829] Make Spark logo image on the header of HistoryPage as a link 
to HistoryPage's page #1

There is a Spark logo in the header of the HistoryPage.
If we run 20+ applications, the HistoryPage can span many pages, so I think it would be useful if the logo linked to page #1 of the HistoryPage.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarutak/spark SPARK-3829

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2690.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2690


commit dd874805678629df0fdb489644a1330ff35c94a4
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Date:   2014-10-07T06:54:27Z

Made the header Spark logo image a link to the History Server's top page.





[GitHub] spark pull request: [SPARK-3829] Make Spark logo image on the head...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2690#issuecomment-58145012
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21369/consoleFull) for PR 2690 at commit [`dd87480`](https://github.com/apache/spark/commit/dd874805678629df0fdb489644a1330ff35c94a4).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3270] Spark API for Application Extensi...

2014-10-07 Thread mmalohlava
GitHub user mmalohlava opened a pull request:

https://github.com/apache/spark/pull/2691

[SPARK-3270] Spark API for Application Extensions

SPARK-3270: Initial proposal of application extensions.

The change set introduces:
* a Spark extension API to implement
* a hook into Executor to handle the extension lifecycle
* a method to specify extensions via SparkConf
* a 'spark.extensions' configuration variable to pass the extension list to the Spark context
* a test verifying that an extension is correctly started inside the executor lifecycle

For more details, please follow SPARK-3270 or the design document:

https://docs.google.com/document/d/1dHF9zi7GzFbYnbV2PwaOQ2eLPoTeiN9IogUe4PAOtrQ/edit?usp=sharing
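The lifecycle described above (extensions named in a `spark.extensions` setting, started and later stopped by the executor) can be sketched roughly as follows. This is a minimal single-process Python sketch under stated assumptions: the `Extension` interface, the `REGISTRY` dict, and the comma-separated value format are all hypothetical and are not the API proposed in the PR.

```python
class Extension:
    """Hypothetical extension interface (not the API proposed in the PR)."""

    def start(self, conf):
        raise NotImplementedError

    def stop(self):
        raise NotImplementedError


class CountingExtension(Extension):
    """Toy extension that records its lifecycle events."""

    def __init__(self):
        self.events = []

    def start(self, conf):
        self.events.append("start")

    def stop(self):
        self.events.append("stop")


# Hypothetical name -> factory registry. Using a fixed registry avoids
# loading arbitrary classes by name from configuration.
REGISTRY = {"counting": CountingExtension}


def start_extensions(conf):
    """Instantiate and start each extension listed in conf['spark.extensions']."""
    raw = conf.get("spark.extensions", "")
    names = [n.strip() for n in raw.split(",") if n.strip()]
    started = []
    for name in names:
        ext = REGISTRY[name]()
        ext.start(conf)
        started.append(ext)
    return started


def stop_extensions(extensions):
    # Tear down in reverse start order.
    for ext in reversed(extensions):
        ext.stop()
```

In a real deployment the start/stop calls would sit in the executor's startup and shutdown path; here they are plain function calls so the sketch runs on its own.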

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/0xdata/perrier core_ext

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2691.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2691


commit 255357e7f1451b592bdd7374b5007aa3ce63690b
Author: mmalohlava michal.malohl...@gmail.com
Date:   2014-10-03T01:53:02Z

SPARK-3270: Initial proposal of application extensions.

The commit introduces:
 - Spark extension API to implement
 - hook into Executor to handle extension lifecycle
 - a method to specify extension via SparkConf
 - a 'spark.extensions' configuration variable to pass the extension list to the Spark context

For more details, please follow SPARK-3270 or the design document:

https://docs.google.com/document/d/1dHF9zi7GzFbYnbV2PwaOQ2eLPoTeiN9IogUe4PAOtrQ/edit?usp=sharing

commit 532d352936b47c9b38635976aa33e9010fd6e81a
Author: mmalohlava michal.malohl...@gmail.com
Date:   2014-10-07T00:02:06Z

SPARK-3270 : test suite for application extension

The basic test suite verifying that a given extension
is started on all executors.





[GitHub] spark pull request: [SPARK-3270] Spark API for Application Extensi...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2691#issuecomment-58146366
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-3270] Spark API for Application Extensi...

2014-10-07 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/2691#issuecomment-58146963
  
This seems quite heavyweight compared to Patrick's suggestion of just using a static object. Why is custom class-loading logic needed? (It even opens up security questions.)
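The "static object" alternative relies on lazy, once-per-process initialization: in Scala, an `object` is initialized the first time it is accessed, once per JVM (more precisely, per class loader), so code running in tasks can simply reference it and the setup runs once per executor. A minimal Python analog, assuming a hypothetical `Service` resource that is not a Spark class:

```python
import threading


class Service:
    """Hypothetical per-process resource (not a Spark class), created on first use."""

    _instance = None
    _lock = threading.Lock()
    init_count = 0  # how many times the expensive setup actually ran

    @classmethod
    def get(cls):
        # Double-checked locking: the fast path skips the lock once initialized,
        # and the second check stops two racing threads from both initializing.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls.init_count += 1
                    cls._instance = object()
        return cls._instance
```

Each task would just call `Service.get()`; no configuration entry or class loading by name is involved.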



[GitHub] spark pull request: [SPARK-3412] [PySpark] Replace Epydoc with Sph...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2689#issuecomment-58148670
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21367/Test 
PASSed.



[GitHub] spark pull request: [SPARK-3412] [PySpark] Replace Epydoc with Sph...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2689#issuecomment-58148662
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21367/consoleFull) for PR 2689 at commit [`d5b874a`](https://github.com/apache/spark/commit/d5b874a1dd0f49e1dee84746ef64ec08efeccaf9).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-58149038
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21368/consoleFull) for PR 2624 at commit [`a69f30c`](https://github.com/apache/spark/commit/a69f30cdb8e63d526ebee06162d8f1b9f2adb253).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2624#issuecomment-58149042
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21368/Test 
PASSed.



[GitHub] spark pull request: [SPARK-3829] Make Spark logo image on the head...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2690#issuecomment-58150433
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21369/Test 
PASSed.



[GitHub] spark pull request: [SPARK-3829] Make Spark logo image on the head...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2690#issuecomment-58150425
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21369/consoleFull) for PR 2690 at commit [`dd87480`](https://github.com/apache/spark/commit/dd874805678629df0fdb489644a1330ff35c94a4).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3831] Filter rule Improvement and bool ...

2014-10-07 Thread sarutak
GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/2692

[SPARK-3831] Filter rule Improvement and bool expression optimization.

If we write a filter that is always FALSE, such as

SELECT * from person WHERE FALSE;

200 tasks will run. I think 1 task is enough.

The current optimizer also cannot simplify a predicate in which NOT is duplicated, such as

SELECT * from person WHERE NOT ( NOT (age  30));

A filter like this should be simplified.
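The two rewrites described above (recognizing an always-FALSE filter and cancelling a double NOT) can be sketched with a toy expression tree. This is not Catalyst's implementation; the `Lit`, `Pred`, and `Not` classes and the `simplify` / `is_always_false` helpers are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Lit:
    """A boolean literal such as TRUE or FALSE."""
    value: bool


@dataclass(frozen=True)
class Pred:
    """An opaque comparison the optimizer cannot evaluate statically."""
    name: str


@dataclass(frozen=True)
class Not:
    child: "Expr"


Expr = Union[Lit, Pred, Not]


def simplify(e: Expr) -> Expr:
    """Bottom-up rewrite: NOT(NOT(x)) -> x and NOT(literal) -> literal."""
    if isinstance(e, Not):
        child = simplify(e.child)
        if isinstance(child, Not):
            return child.child  # double negation cancels out
        if isinstance(child, Lit):
            return Lit(not child.value)  # constant-fold NOT TRUE / NOT FALSE
        return Not(child)
    return e


def is_always_false(predicate: Expr) -> bool:
    """True if the filter can never match, so no rows need to be scanned."""
    return simplify(predicate) == Lit(False)
```

A planner could check `is_always_false` before scheduling a scan and return an empty result directly instead of launching tasks over the input.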


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarutak/spark SPARK-3831

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2692.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2692


commit 8ea872b0131f75ae0797add9bda6dbc79d92736a
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Date:   2014-10-07T12:34:06Z

Fixed the number of tasks when the data of LocalRelation is empty.

Added an optimization rule for boolean expressions.





[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2692#issuecomment-58178035
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21370/consoleFull) for PR 2692 at commit [`8ea872b`](https://github.com/apache/spark/commit/8ea872b0131f75ae0797add9bda6dbc79d92736a).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2692#issuecomment-58179115
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21370/consoleFull) for PR 2692 at commit [`8ea872b`](https://github.com/apache/spark/commit/8ea872b0131f75ae0797add9bda6dbc79d92736a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2692#issuecomment-58179119
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21370/Test 
FAILed.



[GitHub] spark pull request: [SPARK-3119] Re-implementation of TorrentBroad...

2014-10-07 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/2030#issuecomment-58183559
  
We built against Spark master on Oct 2, and when we ran our application on around 600GB of data, we got the following exception. Does this PR fix the issue seen by @JoshRosen?

Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 8312, ams03-002.ff): java.io.IOException: PARSING_ERROR(2)
        org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84)
        org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
        org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:594)
        org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:125)
        org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
        org.xerial.snappy.SnappyInputStream.init(SnappyInputStream.java:58)
        org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)
        org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1004)
        org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:116)
        org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:115)
        org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243)
        org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52)
        scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
        org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:89)
        org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44)
        org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        org.apache.spark.scheduler.Task.run(Task.scala:56)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:744)
Driver stacktrace:




[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...

2014-10-07 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/2692#issuecomment-58186913
  
@sarutak LGTM. Can you take a look at the failing test?
The log is:
```
[info] - NOT (i  88) *** FAILED ***
[info]   2 did not equal 10 Wrong number of read batches 
(PartitionBatchPruningSuite.scala:91)
```
It seems we need to update the test suite, since with your change we can now handle this predicate when doing batch pruning for cached tables. It would also be good to add another case involving `NOT` to the unsupported predicates, if possible.
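Batch pruning for cached tables generally relies on per-batch column statistics such as a minimum and maximum: any batch whose value range cannot satisfy the predicate is skipped, while a predicate the pruner does not understand forces every batch to be read. A minimal sketch under stated assumptions: 100 rows with `i` = 1..100 stored in batches of 10, and a hypothetical predicate NOT (i < 88) that a NOT-pushing rule could rewrite as i >= 88 (the comparison operator in the log above did not survive extraction, so this predicate is illustrative only). The helper names are hypothetical, not Spark's `PartitionBatchPruningSuite` code.

```python
def make_batches(values, batch_size):
    """Split values into batches and record (min, max, rows) for each batch."""
    batches = []
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        batches.append((min(chunk), max(chunk), chunk))
    return batches


def batches_to_read(batches, lower_bound=None):
    """Return the batches that may contain rows with i >= lower_bound.

    lower_bound=None models a predicate the pruner cannot handle,
    in which case every batch has to be read.
    """
    if lower_bound is None:
        return list(batches)
    # A batch can be skipped only if even its maximum is below the bound.
    return [b for b in batches if b[1] >= lower_bound]
```

Under these assumptions, a pruner that understands the rewritten predicate reads only the batches covering 81..90 and 91..100, while treating the predicate as unsupported reads all 10. This shows how making a predicate prunable can change the expected batch count in such a test.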



[GitHub] spark pull request: code style format

2014-10-07 Thread shijinkui
Github user shijinkui commented on the pull request:

https://github.com/apache/spark/pull/2643#issuecomment-58187162
  
In IntelliJ IDEA, there are too many yellow warnings to fix.
After these changes, the code looks better.




[GitHub] spark pull request: [SPARK-3809][SQL] Fixes test suites in hive-th...

2014-10-07 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/2675#issuecomment-58188701
  
Hi @liancheng, I think I've found the root cause. In TestHive.scala we reset the log4j level:
```
  // HACK: Hive is too noisy by default.
  org.apache.log4j.LogManager.getCurrentLoggers.foreach { log =>
    log.asInstanceOf[org.apache.log4j.Logger].setLevel(org.apache.log4j.Level.WARN)
  }
```
So the level here is WARN, and the process never logs the INFO message
`ThriftBinaryCLIService listening on`, which leads to a timeout exception and
a failed test.

Maybe we should reset the log4j level here to test this :)



[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...

2014-10-07 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2692#issuecomment-58190309
  
@yhuai Thanks for picking this PR up and for your comment!
I'll try it soon.



[GitHub] spark pull request: [SPARK-3832][MLlib] Upgrade Breeze dependency ...

2014-10-07 Thread dbtsai
GitHub user dbtsai opened a pull request:

https://github.com/apache/spark/pull/2693

[SPARK-3832][MLlib] Upgrade Breeze dependency to 0.10

In Breeze 0.10, the L1regParam of OWLQN can be configured through an anonymous
function, so each component can be penalized differently. This is required for
GLMNET in MLlib with L1/L2 regularization.

https://github.com/scalanlp/breeze/commit/2570911026aa05aa1908ccf7370bc19cd8808a4c


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dbtsai/spark breeze0.10

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2693.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2693


commit 7a0c45cda7d388152774722a2f6728294cc81b4e
Author: DB Tsai dbt...@dbtsai.com
Date:   2014-10-07T14:20:41Z

In Breeze 0.10, the L1regParam of OWLQN can be configured through an anonymous
function, so each component can be penalized differently. This is required for
GLMNET in MLlib with L1/L2 regularization.

https://github.com/scalanlp/breeze/commit/2570911026aa05aa1908ccf7370bc19cd8808a4c





[GitHub] spark pull request: [SPARK-3832][MLlib] Upgrade Breeze dependency ...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2693#issuecomment-58192163
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21371/consoleFull) for PR 2693 at commit [`7a0c45c`](https://github.com/apache/spark/commit/7a0c45cda7d388152774722a2f6728294cc81b4e).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-07 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-58197006
  
Thanks @andrewor14.  I've merged this into 1.2



[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2577



[GitHub] spark pull request: [SPARK-3119] Re-implementation of TorrentBroad...

2014-10-07 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2030#issuecomment-58201237
  
It could be fixed by https://github.com/apache/spark/pull/2624

It's strange that I cannot see this comment on PR #2030.

On Tue, Oct 7, 2014 at 6:28 AM, DB Tsai notificati...@github.com wrote:

 We had a build against Spark master from Oct 2, and when we ran our
 application on around 600GB of data, we got the following exception. Does
 this PR fix the issue that @JoshRosen has seen?

 Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 8312, ams03-002.ff): java.io.IOException: PARSING_ERROR(2)
     org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84)
     org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
     org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:594)
     org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:125)
     org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
     org.xerial.snappy.SnappyInputStream.<init>(SnappyInputStream.java:58)
     org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)
     org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1004)
     org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:116)
     org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:115)
     org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243)
     org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52)
     scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
     org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
     org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
     org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:89)
     org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44)
     org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
     org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
     org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
     org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
     org.apache.spark.scheduler.Task.run(Task.scala:56)
     org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
     java.lang.Thread.run(Thread.java:744)

 Driver stacktrace:

 --
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/2030#issuecomment-58183559.




-- 
 - Davies



[GitHub] spark pull request: [SPARK-3832][MLlib] Upgrade Breeze dependency ...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2693#issuecomment-58202720
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21371/consoleFull) for PR 2693 at commit [`7a0c45c`](https://github.com/apache/spark/commit/7a0c45cda7d388152774722a2f6728294cc81b4e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3832][MLlib] Upgrade Breeze dependency ...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2693#issuecomment-58202731
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21371/
Test PASSed.



[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2692#issuecomment-58203345
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21372/consoleFull) for PR 2692 at commit [`a11b9f3`](https://github.com/apache/spark/commit/a11b9f31751f23ba306c2549108a3c6ab47191fe).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3786] [PySpark] speedup tests

2014-10-07 Thread jameszhouyi
Github user jameszhouyi commented on the pull request:

https://github.com/apache/spark/pull/2646#issuecomment-58205767
  
Hi @davies @JoshRosen,

I found the errors below after `time` was added to run-tests:
Running PySpark tests. Output is in python/unit-tests.log.
Testing with Python version:
Python 2.6.6
Run core tests ...
Running test: pyspark/rdd.py
./python/run-tests: line 37: time: command not found
./python/run-tests: line 37: time: command not found



[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2661#issuecomment-58209751
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21373/
Test FAILed.



[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2647#issuecomment-58209752
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21374/
Test FAILed.



[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2520#issuecomment-58209766
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21376/
Test FAILed.



[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2612#issuecomment-58209763
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21375/
Test FAILed.



[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-07 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2612#issuecomment-58210133
  
retest this please.



[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...

2014-10-07 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2520#issuecomment-58210179
  
retest this please.



[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...

2014-10-07 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2661#issuecomment-58210228
  
retest this please.



[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-07 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2647#issuecomment-58210261
  
retest this please.



[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...

2014-10-07 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2692#discussion_r18529676
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/columnar/PartitionBatchPruningSuite.scala ---
@@ -67,10 +67,11 @@ class PartitionBatchPruningSuite extends FunSuite with BeforeAndAfterAll with Be
   checkBatchPruning("i > 8 AND i <= 21", 9 to 21, 2, 3)
   checkBatchPruning("i < 2 OR i > 99", Seq(1, 100), 2, 2)
   checkBatchPruning("i < 2 OR (i > 78 AND i < 92)", Seq(1) ++ (79 to 91), 3, 4)
+  checkBatchPruning("NOT (i < 88)", 88 to 100, 1, 2)
 
   // With unsupported predicate
   checkBatchPruning("i < 12 AND i IS NOT NULL", 1 to 11, 1, 2)
-  checkBatchPruning("NOT (i < 88)", 88 to 100, 5, 10)
+  checkBatchPruning("NOT (i in (1))", 2 to 100, 5, 10)
--- End diff --

How about
```
checkBatchPruning(s"NOT (i in (${(1 to 30).mkString(",")}))", 31 to 100, 5, 10)
```
For this case, once we can support it, we will read 4 partitions containing
7 batches.
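A toy model to sanity-check these counts, assuming the suite caches the integers 1 to 100 in 5 partitions, each split into 2 column batches of 10 rows, and skips a batch only when its min/max bounds rule out every match. That layout is an assumption, but it is consistent with the expected counts in the diff (for example 1 partition and 2 batches for `NOT (i < 88)`, and 5 and 10 when nothing can be pruned). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model, not Spark code: integers 1..100 in 5 partitions, each split into
# 2 column batches of 10 rows with per-batch min/max statistics.
ROWS=100
PARTITIONS=5
BATCH=10

# Each predicate answers: could a batch with bounds [$1, $2] contain a row
# that matches the filter? A batch is skipped only when the answer is "no".
not_in_1_to_30_may_match() { [ "$1" -lt 1 ] || [ "$2" -gt 30 ]; }  # NOT (i IN (1..30))
not_lt_88_may_match() { [ "$2" -ge 88 ]; }                          # NOT (i < 88)
always_may_match() { return 0; }                                    # unsupported predicate

# Count the partitions and batches that have to be read for a predicate.
count_reads() {
  local pred=$1 rows_per_part=$(( ROWS / PARTITIONS ))
  local parts=0 batches=0 p=0 lo hi last_row read_in_part
  while [ "$p" -lt "$PARTITIONS" ]; do
    read_in_part=0
    lo=$(( p * rows_per_part + 1 ))
    last_row=$(( (p + 1) * rows_per_part ))
    while [ "$lo" -le "$last_row" ]; do
      hi=$(( lo + BATCH - 1 ))
      if "$pred" "$lo" "$hi"; then
        batches=$(( batches + 1 ))
        read_in_part=1
      fi
      lo=$(( lo + BATCH ))
    done
    parts=$(( parts + read_in_part ))
    p=$(( p + 1 ))
  done
  echo "partitions=$parts batches=$batches"
}

count_reads not_in_1_to_30_may_match  # prints partitions=4 batches=7
count_reads not_lt_88_may_match       # prints partitions=1 batches=2
count_reads always_may_match          # prints partitions=5 batches=10
```

Under this model, `NOT (i in (1))` cannot prune anything (no batch lies entirely inside `{1}`), so it reads all 5 partitions and 10 batches whether or not `NOT` is supported, which is why the 1-to-30 variant makes a more informative test.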



[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2612#issuecomment-58210908
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21379/consoleFull) for PR 2612 at commit [`b0585da`](https://github.com/apache/spark/commit/b0585da796aeb91957956f61d97fa98953d1c5e5).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2647#issuecomment-58210898
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21377/consoleFull) for PR 2647 at commit [`ad1f96e`](https://github.com/apache/spark/commit/ad1f96ea36f7a4750d6fdaf3ab91239a20a7e6a1).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2692#issuecomment-5827
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21372/
Test PASSed.



[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2692#issuecomment-5823
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21372/consoleFull) for PR 2692 at commit [`a11b9f3`](https://github.com/apache/spark/commit/a11b9f31751f23ba306c2549108a3c6ab47191fe).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-07 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/2612#discussion_r18530534
  
--- Diff: dev/lint-windows-cmd ---
@@ -0,0 +1,40 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+SCRIPT_DIR=$( cd $( dirname $0 ) && pwd )
+SPARK_ROOT_DIR=$(dirname $SCRIPT_DIR)
+TARGET_DIR=$SPARK_ROOT_DIR/bin
+HAS_ERROR=0
+
+# check whether all of lines ends with CRLF.
+for file in $TARGET_DIR/*.cmd ; do
+  grep ^.*$'\r'$ $file > /dev/null
+  if [ $? -ne 0 ]; then
+    HAS_ERROR=1
+    echo "$file has line(s) not ends with CRLF."
+  fi
+done
+
+if [ $HAS_ERROR -eq 0 ];then
+  echo -e Windows batch file style checks passed.
+else
+  echo -e Windows batch file style  checks failed.
--- End diff --

Looks like there's an extra space here between `style` and `checks`.



[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-07 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/2612#discussion_r18530666
  
--- Diff: dev/lint-windows-cmd ---
@@ -0,0 +1,40 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+SCRIPT_DIR=$( cd $( dirname $0 ) && pwd )
+SPARK_ROOT_DIR=$(dirname $SCRIPT_DIR)
+TARGET_DIR=$SPARK_ROOT_DIR/bin
+HAS_ERROR=0
+
+# check whether all of lines ends with CRLF.
+for file in $TARGET_DIR/*.cmd ; do
+  grep ^.*$'\r'$ $file > /dev/null
+  if [ $? -ne 0 ]; then
--- End diff --

In Bash it's good practice to always quote tested variables, so
`"$?" -ne "0"`.
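To illustrate why quoting helps (a minimal sketch with a hypothetical variable `name`): an unquoted expansion inside `[ ... ]` is word-split, so a value containing a space becomes several arguments and `[` fails with a syntax error instead of comparing the strings. `$?` itself never contains spaces, so for this line the quotes are mainly about consistency.

```shell
#!/usr/bin/env bash
# Hypothetical value containing a space, e.g. a path under "Program Files".
name="my file.cmd"

# Unquoted: expands to [ my file.cmd = "my file.cmd" ], which is too many
# arguments for `[`; it reports an error and returns a non-zero status.
if [ $name = "my file.cmd" ] 2>/dev/null; then
  echo "unquoted: match"
else
  echo "unquoted: error or no match"
fi

# Quoted: the whole value is a single argument, so the comparison works.
if [ "$name" = "my file.cmd" ]; then
  echo "quoted: match"
else
  echo "quoted: no match"
fi
```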



[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-07 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/2612#discussion_r18530699
  
--- Diff: dev/lint-windows-cmd ---
@@ -0,0 +1,40 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+SCRIPT_DIR=$( cd $( dirname $0 ) && pwd )
+SPARK_ROOT_DIR=$(dirname $SCRIPT_DIR)
+TARGET_DIR=$SPARK_ROOT_DIR/bin
+HAS_ERROR=0
+
+# check whether all of lines ends with CRLF.
+for file in $TARGET_DIR/*.cmd ; do
+  grep ^.*$'\r'$ $file > /dev/null
+  if [ $? -ne 0 ]; then
+    HAS_ERROR=1
+    echo "$file has line(s) not ends with CRLF."
+  fi
+done
+
+if [ $HAS_ERROR -eq 0 ];then
--- End diff --

Same here. I suggest quoting both terms in the comparison.
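A sketch of the same check with both terms quoted, assuming the goal is that *every* line of each `.cmd` file ends in CRLF. As quoted in the diff, the `grep` for lines ending in a carriage return succeeds if *any* line has one, so the loop only flags files in which no line does; inverting the match with `-v` flags any LF-only line instead. `check_crlf` is a hypothetical name, not the PR's script.

```shell
#!/usr/bin/env bash
# Sketch of a CRLF check for Windows batch files; not the PR's actual script.
# Prints one line per offending file and returns 1 if any file has a line
# that does not end in CRLF.
check_crlf() {
  local cr file has_error=0
  cr=$(printf '\r')  # a literal carriage return, for use in the grep pattern
  for file in "$@"; do
    # -v selects lines that do NOT end in CR; -q stops at the first one.
    # So grep succeeds exactly when the file has at least one LF-only line.
    if grep -q -v "${cr}\$" "$file"; then
      echo "$file has line(s) that do not end with CRLF."
      has_error=1
    fi
  done
  if [ "$has_error" -eq "0" ]; then
    echo "Windows batch file style checks passed."
  else
    echo "Windows batch file style checks failed."
  fi
  return "$has_error"
}
```

In the script it would be called as `check_crlf "$TARGET_DIR"/*.cmd`, with the return value used as the exit status.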



[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...

2014-10-07 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2692#issuecomment-58211878
  
@yhuai Thanks, that makes sense.



[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2520#issuecomment-58210929
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21380/consoleFull) for PR 2520 at commit [`de91bbd`](https://github.com/apache/spark/commit/de91bbd37d0986abc8d154efde2418e07b685eb0).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2661#issuecomment-58212070
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21378/
Test FAILed.





[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2692#issuecomment-58212418
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21381/consoleFull)
 for   PR 2692 at commit 
[`23c750c`](https://github.com/apache/spark/commit/23c750cd5eb883171737d1a622fd30954315232a).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-07 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2612#issuecomment-58212737
  
Thanks @nchammas, done.





[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...

2014-10-07 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2661#issuecomment-58212936
  
retest this please.





[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2612#issuecomment-58213209
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21382/consoleFull)
 for   PR 2612 at commit 
[`cfaa176`](https://github.com/apache/spark/commit/cfaa176a299b4c7b3f02e7dc8bf35627997021c5).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2661#issuecomment-58214144
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21383/consoleFull)
 for   PR 2661 at commit 
[`8b64bb7`](https://github.com/apache/spark/commit/8b64bb7feb0ddea9f573cabfd96150bce673aa31).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3119] Re-implementation of TorrentBroad...

2014-10-07 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/2030#issuecomment-58214186
  
I thought it was a closed issue, so I moved my comment to JIRA. I ran into
this issue in spark-shell, not in a standalone application. Does SPARK-3762
apply in this situation? Thanks.

On Oct 7, 2014 5:17 PM, Davies Liu notificati...@github.com wrote:

 It could be fixed by https://github.com/apache/spark/pull/2624

 It's strange that I cannot see this comment on PR #2030.

 On Tue, Oct 7, 2014 at 6:28 AM, DB Tsai notificati...@github.com wrote:

  We had a build against the Spark master on Oct 2, and when we ran our
  application on around 600 GB of data, we got the following exception. Does
  this PR fix the issue that @JoshRosen (https://github.com/JoshRosen) saw?
 
  Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times,
  most recent failure: Lost task 0.3 in stage 6.0 (TID 8312, ams03-002.ff):
  java.io.IOException: PARSING_ERROR(2)
    org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84)
    org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
    org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:594)
    org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:125)
    org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
    org.xerial.snappy.SnappyInputStream.<init>(SnappyInputStream.java:58)
    org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)
    org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1004)
    org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:116)
    org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:115)
    org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243)
    org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52)
    scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
    org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:89)
    org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44)
    org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    org.apache.spark.scheduler.Task.run(Task.scala:56)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:744)
 
  Driver stacktrace:
 
  --
  Reply to this email directly or view it on GitHub
  https://github.com/apache/spark/pull/2030#issuecomment-58183559.
 



 --
 - Davies

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/2030#issuecomment-58201237.






[GitHub] spark pull request: [SPARK-3786] [PySpark] speedup tests

2014-10-07 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2646#issuecomment-58214253
  
What shell are you running it in?





[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...

2014-10-07 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/2673#issuecomment-58216712
  
Thanks for the explanation. The issue with the Scala versions makes sense. 
What threw me off was the Hadoop example: I've always seen people say that the
Spark API is independent of the Hadoop version, and that users should explicitly
declare the Hadoop version they want in their projects (and have a matching Spark
deployment). So explaining this PR in terms of publishing different Hadoop
versions seems a little at odds with that.
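A minimal build.sbt fragment showing the practice described above, where an application declares its own Hadoop dependency next to Spark. The version numbers are illustrative assumptions, and marking both dependencies `provided` assumes the cluster supplies them at runtime.

```scala
// build.sbt (illustrative versions): compile against the Spark API,
// and state the Hadoop client version explicitly instead of inheriting it.
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"    % "1.1.0" % "provided",
  "org.apache.hadoop" %  "hadoop-client" % "2.4.0" % "provided"
)
```

`%%` appends the project's Scala binary version (for example `_2.10`) to the Spark artifact name, while the Hadoop artifact uses plain `%` because it is a Java library.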





[GitHub] spark pull request: [SPARK-3710] Fix Yarn integration tests on Had...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2682#issuecomment-58216947
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21384/consoleFull)
 for   PR 2682 at commit 
[`701d4fb`](https://github.com/apache/spark/commit/701d4fb9fbeb52856ab4611b00f2ecfb35cc9e88).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2750] support https in spark web ui

2014-10-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1980#discussion_r18532610
  
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -205,10 +231,74 @@ private[spark] object JettyUtils extends Logging {
 ServerInfo(server, boundPort, collection)
   }
 
+  // to generate a new url string scheme://server:port+path
--- End diff --

Hi @scwf,

The reason I asked for a comment is that it seems like the method is doing 
a little more than just that. For example, L238 seems to be doing some sort of 
parsing of the `server` string, so it's more than just concatenating the 
different arguments into a URL.

It would be nice if the comment explained exactly what the relationship 
between the input and the output is. A unit test wouldn't hurt either.
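One way to address this is a Scaladoc that states the input/output contract, plus a few assertions acting as the unit test. The sketch below is hypothetical: `UrlBuilder.buildUrl` and its handling of a `server` value that already carries a scheme are assumptions about what such a helper might do, not the code in this PR.

```scala
import java.net.URI

object UrlBuilder {
  /**
   * Hypothetical helper: returns "scheme://host:port/path".
   *
   * `server` may be a bare host ("example.com") or a full URL
   * ("http://example.com:4040"); in the second case only its host is kept,
   * and its original scheme and port are discarded.
   * `path` must be empty or start with "/", otherwise java.net.URI rejects it.
   */
  def buildUrl(scheme: String, server: String, port: Int, path: String): String = {
    val host = if (server.contains("://")) new URI(server).getHost else server
    // userInfo, query and fragment are not used, so they are passed as null.
    new URI(scheme, null, host, port, path, null, null).toString
  }
}
```

Usage: `UrlBuilder.buildUrl("https", "http://example.com:4040", 8443, "/jobs")` yields `https://example.com:8443/jobs`, which is exactly the kind of input/output pair worth pinning down in a test.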





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-58218797
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21385/consoleFull)
 for   PR 1269 at commit 
[`cb951cc`](https://github.com/apache/spark/commit/cb951cc3693bec9e1694efd25db0a599869899b5).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-58218990
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21385/consoleFull)
 for   PR 1269 at commit 
[`cb951cc`](https://github.com/apache/spark/commit/cb951cc3693bec9e1694efd25db0a599869899b5).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class Document(val tokens: SparseVector[Int], val alphabetSize: Int) extends Serializable`
  * `class DocumentParameters(val document: Document,`
  * `class GlobalParameters(val phi : Array[Array[Float]], val alphabetSize : Int)`
  * `class PLSA(@transient protected val sc: SparkContext,`
  * `class RobustDocumentParameters(document: Document,`
  * `class RobustGlobalParameters(phi : Array[Array[Float]],`
  * `class RobustPLSA(@transient protected val sc: SparkContext,`
  * `trait DocumentOverTopicDistributionRegularizer extends Serializable with MatrixInPlaceModification `
  * `class SymmetricDirichletDocumentOverTopicDistributionRegularizer(protected val alpha: Float)`
  * `class SymmetricDirichletTopicRegularizer(protected val alpha: Float) extends TopicsRegularizer`
  * `trait TopicsRegularizer extends MatrixInPlaceModification `
  * `class UniformDocumentOverTopicRegularizer extends DocumentOverTopicDistributionRegularizer `
  * `class UniformTopicRegularizer extends TopicsRegularizer `






[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-58218992
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21385/
Test FAILed.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-10-07 Thread akopich
Github user akopich commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-58220390
  
Unfortunately, our cluster is unavailable due to some technical issues.

The problem you report is probably related to the fact that `background: Array[Float]` in the line
```
val newParameters = parameters.map(parameter =>
  parameter.getNewTheta(topicsBC, background, eps, gamma)).cache()
```
is serialized with the task.

But `background` should be only about 0.5 MB, so I still have no idea why the task grows to several MB.
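The sketch below illustrates one possible explanation, not confirmed in this thread: a closure that reads a field of an enclosing object captures that whole object, so every other field is serialized with the task too. `Model`, `phi`, `Closures` and the array sizes are hypothetical, and sizes are measured with plain Java serialization rather than Spark's own serializer.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-in for a model class: `phi` is a large field the closure never reads.
class Model extends Serializable {
  val phi = new Array[Float](1024 * 1024)       // 4 MiB of float data
  val background = new Array[Float](128 * 1024) // 512 KiB of float data

  // Reads the field `background`, so the closure captures `this`,
  // and `phi` is serialized along with it.
  def wideClosure: Int => Float = i => background(i % background.length)
}

object Closures extends Serializable {
  // Built from the array alone, so the serialized closure carries essentially only that array.
  def leanClosure(bg: Array[Float]): Int => Float = i => bg(i % bg.length)

  // Size of an object under plain Java serialization, in bytes.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size()
  }
}
```

With `val m = new Model`, `Closures.serializedSize(m.wideClosure)` comes out above 4 MiB, while `Closures.serializedSize(Closures.leanClosure(m.background))` stays just above 512 KiB. In Spark itself, a large read-only array like this is usually shipped with `SparkContext.broadcast` rather than inside each task's closure.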





[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2692#issuecomment-58220515
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21381/consoleFull)
 for   PR 2692 at commit 
[`23c750c`](https://github.com/apache/spark/commit/23c750cd5eb883171737d1a622fd30954315232a).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2692#issuecomment-58220527
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21381/
Test PASSed.





[GitHub] spark pull request: [SPARK-3831] [SQL] Filter rule Improvement and...

2014-10-07 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/2692#issuecomment-58221944
  
LGTM

cc @marmbrus.





[GitHub] spark pull request: [SPARK-3832][MLlib] Upgrade Breeze dependency ...

2014-10-07 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/2693#issuecomment-58221886
  
@dbtsai Could you check whether there are any dependency changes in
breeze-0.10, and the number of files in the breeze-0.10 jar? Is it compatible with
both Scala 2.10 and 2.11? Thanks!





[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2520#issuecomment-58222074
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21380/
Test PASSed.





[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2647#issuecomment-58222518
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21377/consoleFull)
 for   PR 2647 at commit 
[`ad1f96e`](https://github.com/apache/spark/commit/ad1f96ea36f7a4750d6fdaf3ab91239a20a7e6a1).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2647#issuecomment-58222529
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21377/
Test PASSed.





[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2612#issuecomment-58222629
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21379/
Test PASSed.





[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2612#issuecomment-58222616
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21379/consoleFull)
 for   PR 2612 at commit 
[`b0585da`](https://github.com/apache/spark/commit/b0585da796aeb91957956f61d97fa98953d1c5e5).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-10-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1222#discussion_r18534397
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -142,48 +151,56 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
* Tries to reuse as much of the data already in memory as possible, by not reading
* applications that haven't been updated since last time the logs were checked.
*/
-  private def checkForLogs() = {
+  private[history] def checkForLogs() = {
 lastLogCheckTimeMs = getMonotonicTimeMs()
 logDebug("Checking for logs. Time is now %d.".format(lastLogCheckTimeMs))
-try {
-  val logStatus = fs.listStatus(new Path(resolvedLogDir))
-  val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]()
 
-  // Load all new logs from the log directory. Only directories that have a modification time
-  // later than the last known log directory will be loaded.
+def getModificationTime(fsEntry: FileStatus) = {
--- End diff --

Are we adding return types for every method now? I thought we were only 
doing this for public ones.
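For reference, this is what an explicit return type looks like on a small helper such as `getModificationTime`. The sketch swaps Hadoop's `FileStatus` for a hypothetical `FsEntry` case class so it runs on its own. It also guards against one hazard visible in the diff: `.max` on an empty collection throws `UnsupportedOperationException`, so an empty log directory would make the quoted helper fail; whether such directories can actually occur there is not established in this thread.

```scala
// Hypothetical stand-in for Hadoop's FileStatus, so the sketch runs without Hadoop.
final case class FsEntry(modificationTime: Long, isDir: Boolean = false, children: Seq[FsEntry] = Seq.empty)

object LogScan {
  // Explicit result type: readers see `Long` without inferring it from the branches.
  def getModificationTime(entry: FsEntry): Long =
    if (!entry.isDir) {
      entry.modificationTime
    } else if (entry.children.isEmpty) {
      // `.max` on an empty collection throws, so fall back to the directory's own timestamp.
      entry.modificationTime
    } else {
      // A directory counts as modified when its newest child was.
      entry.children.map(_.modificationTime).max
    }
}
```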





[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-10-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1222#discussion_r18534494
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -142,48 +151,56 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
* Tries to reuse as much of the data already in memory as possible, by not reading
* applications that haven't been updated since last time the logs were checked.
*/
-  private def checkForLogs() = {
+  private[history] def checkForLogs() = {
 lastLogCheckTimeMs = getMonotonicTimeMs()
 logDebug("Checking for logs. Time is now %d.".format(lastLogCheckTimeMs))
-try {
-  val logStatus = fs.listStatus(new Path(resolvedLogDir))
-  val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]()
 
-  // Load all new logs from the log directory. Only directories that have a modification time
-  // later than the last known log directory will be loaded.
+def getModificationTime(fsEntry: FileStatus) = {
+  if (fsEntry.isDir) {
+fs.listStatus(fsEntry.getPath).map(_.getModificationTime()).max
+  } else {
+fsEntry.getModificationTime()
+  }
+}
+
+try {
   var newLastModifiedTime = lastModifiedTime
-  val logInfos = logDirs
-.filter { dir =>
-  if (fs.isFile(new Path(dir.getPath(), EventLoggingListener.APPLICATION_COMPLETE))) {
-val modTime = getModificationTime(dir)
+  val logInfos = fs.listStatus(new Path(logDir))
+.filter { entry =>
--- End diff --

That makes the alignment of flatMap / sortBy really weird. Do you have an
example of what you have in mind so I can follow it? The other examples I have
found follow this style.
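For comparison, here is the leading-dot chaining layout that the quoted diff already uses (`val logInfos = ...` followed by `.filter { entry =>` on the next line), applied to a hypothetical `LogEntry` type. It only illustrates that layout; the alternative the reviewer proposed is not shown in the thread.

```scala
object ChainStyle {
  // Hypothetical log-directory entry used only for this illustration.
  final case class LogEntry(name: String, modTime: Long, complete: Boolean)

  // One call per line: each continuation starts with `.` and is indented one level
  // under the `val`, so filter / flatMap / sortBy line up with each other.
  def newCompletedLogs(entries: Seq[LogEntry], lastModifiedTime: Long): Seq[(String, Long)] = {
    val logInfos = entries
      .filter { entry => entry.complete && entry.modTime > lastModifiedTime }
      // Stands in for parsing an entry and dropping the ones that fail.
      .flatMap { entry => if (entry.name.nonEmpty) Some(entry.name -> entry.modTime) else None }
      .sortBy { case (_, modTime) => -modTime }
    logInfos
  }
}
```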





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-58222849
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21386/
Test FAILed.





[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2520#issuecomment-58222059
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21380/consoleFull)
 for   PR 2520 at commit 
[`de91bbd`](https://github.com/apache/spark/commit/de91bbd37d0986abc8d154efde2418e07b685eb0).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-10-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1222#discussion_r18534683
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -142,48 +151,56 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
    * Tries to reuse as much of the data already in memory as possible, by not reading
    * applications that haven't been updated since last time the logs were checked.
    */
-  private def checkForLogs() = {
+  private[history] def checkForLogs() = {
     lastLogCheckTimeMs = getMonotonicTimeMs()
     logDebug("Checking for logs. Time is now %d.".format(lastLogCheckTimeMs))
-    try {
-      val logStatus = fs.listStatus(new Path(resolvedLogDir))
-      val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]()
 
-      // Load all new logs from the log directory. Only directories that have a modification time
-      // later than the last known log directory will be loaded.
+    def getModificationTime(fsEntry: FileStatus) = {
+      if (fsEntry.isDir) {
+        fs.listStatus(fsEntry.getPath).map(_.getModificationTime()).max
+      } else {
+        fsEntry.getModificationTime()
+      }
+    }
+
+    try {
       var newLastModifiedTime = lastModifiedTime
-      val logInfos = logDirs
-        .filter { dir =>
-          if (fs.isFile(new Path(dir.getPath(), EventLoggingListener.APPLICATION_COMPLETE))) {
-            val modTime = getModificationTime(dir)
+      val logInfos = fs.listStatus(new Path(logDir))
+        .filter { entry =>
+          val isLogEntry =
+            if (entry.isDir()) {
+              fs.exists(new Path(entry.getPath(), APPLICATION_COMPLETE))
+            } else {
+              !entry.getPath().getName().endsWith(EventLoggingListener.IN_PROGRESS)
+            }
+
+          if (isLogEntry) {
+            val modTime = getModificationTime(entry)
             newLastModifiedTime = math.max(newLastModifiedTime, modTime)
-            modTime > lastModifiedTime
+            modTime >= lastModifiedTime
           } else {
             false
           }
         }
-        .flatMap { dir =>
+        .flatMap { entry =>
           try {
-            val (replayBus, appListener) = createReplayBus(dir)
-            replayBus.replay()
+            val appListener = replay(entry, new ReplayListenerBus())
             Some(new FsApplicationHistoryInfo(
-              dir.getPath().getName(),
-              appListener.appId.getOrElse(dir.getPath().getName()),
+              entry.getPath().getName(),
+              appListener.appId.getOrElse(entry.getPath().getName()),
               appListener.appName.getOrElse(NOT_STARTED),
               appListener.startTime.getOrElse(-1L),
               appListener.endTime.getOrElse(-1L),
-              getModificationTime(dir),
+              getModificationTime(entry),
               appListener.sparkUser.getOrElse(NOT_STARTED)))
           } catch {
             case e: Exception =>
-              logInfo(s"Failed to load application log data from $dir.", e)
+              logInfo(s"Failed to load application log data from $entry.", e)
               None
           }
         }
         .sortBy { info => -info.endTime }
 
-      lastModifiedTime = newLastModifiedTime
--- End diff --

Oops.
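The hunk above filters entries against `lastModifiedTime` while folding their timestamps into `newLastModifiedTime`; the assignment `lastModifiedTime = newLastModifiedTime` is what carries that watermark into the next poll. A minimal sketch of the pattern, assuming a hypothetical `LogEntry` case class in place of Hadoop's `FileStatus` and entries that have already been filtered down to completed logs:

```scala
object IncrementalLogScan {
  // Hypothetical stand-in for Hadoop's FileStatus: just a name and a timestamp.
  final case class LogEntry(name: String, modTime: Long)

  // Returns the entries to (re)load plus the watermark to store for the next scan.
  // Uses >= like the "+" side of the hunk, so entries whose timestamp equals the
  // previous watermark are loaded again instead of being skipped.
  def scan(entries: Seq[LogEntry], lastModifiedTime: Long): (Seq[LogEntry], Long) = {
    val toLoad = entries.filter(_.modTime >= lastModifiedTime)
    val newLastModifiedTime = (lastModifiedTime +: entries.map(_.modTime)).max
    (toLoad, newLastModifiedTime)
  }
}
```

If the caller never stores the returned watermark, every poll compares against the old value and reloads everything it has already seen.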





[GitHub] spark pull request: [SPARK-3133] embed small object in broadcast t...

2014-10-07 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2681#discussion_r18534899
  
--- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -161,6 +178,10 @@ private[spark] class TorrentBroadcast[T: ClassTag](
           _value = x.asInstanceOf[T]
 
         case None =>
+          if (numBlocks == 0) {
--- End diff --

when will this ever happen?





[GitHub] spark pull request: [SPARK-3133] embed small object in broadcast t...

2014-10-07 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2681#discussion_r18534919
  
--- Diff: core/src/test/scala/org/apache/spark/broadcast/BroadcastSuite.scala ---
@@ -257,7 +257,7 @@ class BroadcastSuite extends FunSuite with LocalSparkContext {
       new SparkContext("local", "test", broadcastConf)
     }
     val blockManagerMaster = sc.env.blockManager.master
-    val list = List[Int](1, 2, 3, 4)
+    val list = (1 to 4096).toList
--- End diff --

Can you make sure we have unit tests for both cases, i.e. small broadcasts and large ones?
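One way to cover both cases is to run the same checks over one input on each side of the size cutoff. The diff grows the test list from 4 to 4096 elements, presumably so the broadcast no longer counts as a "small object" under this patch; the actual cutoff is not shown in the hunk, so which side each list falls on is an assumption. `BroadcastSizeCases` is a hypothetical helper, not part of `BroadcastSuite`, and Java serialization is used only as a rough size proxy:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical helper: contrasts the two inputs from the diff so that a test
// can run identical assertions on each of them.
object BroadcastSizeCases {
  // Java-serialized size in bytes, used here as a rough proxy for payload size.
  def serializedSize(value: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    bytes.size()
  }

  // One input assumed to fall below the embedding cutoff and one assumed to exceed it.
  val cases: Seq[(String, List[Int])] = Seq(
    "small" -> List(1, 2, 3, 4),
    "large" -> (1 to 4096).toList
  )
}
```

In a FunSuite-based suite, each `(label, list)` pair could drive one `test(...)` registration, so a regression in either path fails under its own name.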





[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-10-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1222#discussion_r18535127
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -214,29 +231,64 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
       }
     }
 
-  private def createReplayBus(logDir: FileStatus): (ReplayListenerBus, ApplicationEventListener) = {
-    val path = logDir.getPath()
-    val elogInfo = EventLoggingListener.parseLoggingInfo(path, fs)
-    val replayBus = new ReplayListenerBus(elogInfo.logPaths, fs, elogInfo.compressionCodec)
-    val appListener = new ApplicationEventListener
-    replayBus.addListener(appListener)
-    (replayBus, appListener)
+  private def replay(logPath: FileStatus, bus: ReplayListenerBus): ApplicationEventListener = {
+    val (logInput, sparkVersion) =
+      if (logPath.isDir()) {
+        openOldLog(logPath.getPath())
+      } else {
+        EventLoggingListener.openEventLog(logPath.getPath(), fs)
+      }
+    try {
+      val appListener = new ApplicationEventListener
+      bus.addListener(appListener)
+      bus.replay(logInput, sparkVersion)
+      appListener
+    } finally {
+      logInput.close()
+    }
   }
 
-  /** Return when this directory was last modified. */
-  private def getModificationTime(dir: FileStatus): Long = {
-    try {
-      val logFiles = fs.listStatus(dir.getPath)
-      if (logFiles != null && !logFiles.isEmpty) {
-        logFiles.map(_.getModificationTime).max
-      } else {
-        dir.getModificationTime
+  /**
+   * Load the app log information from a Spark 1.0.0 log directory, for backwards compatibility.
+   * This assumes that the log directory contains a single event log file, which is the case for
+   * directories generated by the code in that release.
+   */
+  private[history] def openOldLog(dir: Path): (InputStream, String) = {
--- End diff --

Why? Neither EventLoggingListener nor any of its callers needs to deal with legacy event logs.
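The removed `getModificationTime` in this hunk fell back to the directory's own timestamp when the listing was empty, whereas the nested helper added in the earlier `checkForLogs` hunk calls `.max` on the listing directly, and `.max` throws on an empty collection. In the new code that helper is only reached for directories that already contain `APPLICATION_COMPLETE`, so an empty listing may never occur there; the sketch below just mirrors the old fallback. It uses `java.io.File` instead of Hadoop's `FileSystem`, an assumption made to keep it self-contained, and `LogModTime` is a hypothetical name:

```scala
import java.io.File

object LogModTime {
  // For a legacy log directory, use the newest child's timestamp, falling back to
  // the directory's own timestamp when it is empty or cannot be listed.
  // For a single-file event log, use the file's own timestamp.
  def modificationTime(entry: File): Long =
    if (entry.isDirectory) {
      val children = Option(entry.listFiles()).getOrElse(Array.empty[File])
      if (children.nonEmpty) children.map(_.lastModified()).max
      else entry.lastModified()
    } else {
      entry.lastModified()
    }
}
```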





[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2612#issuecomment-58224711
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21382/consoleFull) for PR 2612 at commit [`cfaa176`](https://github.com/apache/spark/commit/cfaa176a299b4c7b3f02e7dc8bf35627997021c5).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2612#issuecomment-58224721
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21382/
Test PASSed.





[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-07 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2650#issuecomment-58224804
  
Jenkins, test this please





[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-10-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1222#discussion_r18535195
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -688,41 +691,34 @@ private[spark] class Master(
   def rebuildSparkUI(app: ApplicationInfo): Boolean = {
     val appName = app.desc.name
     val notFoundBasePath = HistoryServer.UI_PATH_PREFIX + "/not-found"
-    val eventLogDir = app.desc.eventLogDir.getOrElse {
-      // Event logging is not enabled for this application
-      app.desc.appUiUrl = notFoundBasePath
-      return false
-    }
-
-    val appEventLogDir = EventLoggingListener.getLogDirPath(eventLogDir, app.id)
-    val fileSystem = Utils.getHadoopFileSystem(appEventLogDir,
-      SparkHadoopUtil.get.newConfiguration(conf))
-    val eventLogInfo = EventLoggingListener.parseLoggingInfo(appEventLogDir, fileSystem)
-    val eventLogPaths = eventLogInfo.logPaths
-    val compressionCodec = eventLogInfo.compressionCodec
-
-    if (eventLogPaths.isEmpty) {
-      // Event logging is enabled for this application, but no event logs are found
-      val title = s"Application history not found (${app.id})"
-      var msg = s"No event logs found for application $appName in $appEventLogDir."
-      logWarning(msg)
-      msg += " Did you specify the correct logging directory?"
-      msg = URLEncoder.encode(msg, "UTF-8")
-      app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title"
-      return false
-    }
+    val eventLogFile = app.desc.eventLogFile.getOrElse { return false }
 
     try {
-      val replayBus = new ReplayListenerBus(eventLogPaths, fileSystem, compressionCodec)
-      val ui = new SparkUI(new SparkConf, replayBus, appName + " (completed)",
-        HistoryServer.UI_PATH_PREFIX + s"/${app.id}")
-      replayBus.replay()
+      val fs = Utils.getHadoopFileSystem(eventLogFile, hadoopConf)
+      val (logInput, sparkVersion) = EventLoggingListener.openEventLog(new Path(eventLogFile), fs)
+      val replayBus = new ReplayListenerBus()
--- End diff --

No, because I changed that signature. The event stream is now passed to the `replay()` method.
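With the stream no longer a constructor argument, a bus can be created and have listeners attached before any log is opened, and the code that opens the stream is the code that closes it, as the new `replay` helper in `FsHistoryProvider` does with its `try`/`finally`. A minimal sketch of that shape; `LineReplayBus`, its one-event-per-line format, and `ReplayExample.replayAll` are hypothetical and are not Spark's `ReplayListenerBus`:

```scala
import java.io.{BufferedReader, ByteArrayInputStream, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import scala.collection.mutable.ArrayBuffer

// Hypothetical bus: listeners are registered up front, and the input stream is
// supplied per replay() call instead of at construction time.
class LineReplayBus {
  private val listeners = ArrayBuffer.empty[String => Unit]

  def addListener(listener: String => Unit): Unit = listeners += listener

  // Treats each line as one event. sparkVersion is unused in this sketch.
  // Does not close `input`: the caller that opened it owns it.
  def replay(input: InputStream, sparkVersion: String): Unit = {
    val reader = new BufferedReader(new InputStreamReader(input, StandardCharsets.UTF_8))
    var line = reader.readLine()
    while (line != null) {
      listeners.foreach(_(line))
      line = reader.readLine()
    }
  }
}

object ReplayExample {
  // Opens the stream, replays it, and always closes it, mirroring try/finally in the diff.
  def replayAll(open: () => InputStream, sparkVersion: String): Seq[String] = {
    val seen = ArrayBuffer.empty[String]
    val bus = new LineReplayBus
    bus.addListener(event => seen += event)
    val input = open()
    try bus.replay(input, sparkVersion)
    finally input.close()
    seen.toList
  }
}
```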





[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-07 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2649#issuecomment-58225142
  
Ah, so you really mean when using ViewFS. You can use federation without ViewFS.





[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...

2014-10-07 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/2673#issuecomment-58225042
  
Hi @pwendell, I had a similar issue related to artifacts in Maven Central 
and Hadoop versions.
Could you take a look at 
[SPARK-3764](https://issues.apache.org/jira/browse/SPARK-3764) and #2638 please?





[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-10-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1222#discussion_r18535305
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -688,41 +691,34 @@ private[spark] class Master(
   def rebuildSparkUI(app: ApplicationInfo): Boolean = {
     val appName = app.desc.name
     val notFoundBasePath = HistoryServer.UI_PATH_PREFIX + "/not-found"
-    val eventLogDir = app.desc.eventLogDir.getOrElse {
-      // Event logging is not enabled for this application
-      app.desc.appUiUrl = notFoundBasePath
-      return false
-    }
-
-    val appEventLogDir = EventLoggingListener.getLogDirPath(eventLogDir, app.id)
-    val fileSystem = Utils.getHadoopFileSystem(appEventLogDir,
-      SparkHadoopUtil.get.newConfiguration(conf))
-    val eventLogInfo = EventLoggingListener.parseLoggingInfo(appEventLogDir, fileSystem)
-    val eventLogPaths = eventLogInfo.logPaths
-    val compressionCodec = eventLogInfo.compressionCodec
-
-    if (eventLogPaths.isEmpty) {
-      // Event logging is enabled for this application, but no event logs are found
-      val title = s"Application history not found (${app.id})"
-      var msg = s"No event logs found for application $appName in $appEventLogDir."
-      logWarning(msg)
-      msg += " Did you specify the correct logging directory?"
-      msg = URLEncoder.encode(msg, "UTF-8")
-      app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title"
-      return false
-    }
+    val eventLogFile = app.desc.eventLogFile.getOrElse { return false }
 
     try {
-      val replayBus = new ReplayListenerBus(eventLogPaths, fileSystem, compressionCodec)
-      val ui = new SparkUI(new SparkConf, replayBus, appName + " (completed)",
-        HistoryServer.UI_PATH_PREFIX + s"/${app.id}")
-      replayBus.replay()
+      val fs = Utils.getHadoopFileSystem(eventLogFile, hadoopConf)
+      val (logInput, sparkVersion) = EventLoggingListener.openEventLog(new Path(eventLogFile), fs)
+      val replayBus = new ReplayListenerBus()
+      val ui = new SparkUI(new SparkConf, replayBus, appName + " (completed)", "/history/" + app.id)
+      try {
+        replayBus.replay(logInput, sparkVersion)
+      } finally {
+        logInput.close()
+      }
+
       appIdToUI(app.id) = ui
       webUi.attachSparkUI(ui)
       // Application UI is successfully rebuilt, so link the Master UI to it
-      app.desc.appUiUrl = ui.getBasePath
+      app.desc.appUiUrl = ui.basePath
       true
     } catch {
+      case fnf: FileNotFoundException =>
+        // Event logging is enabled for this application, but no event logs are found
+        val title = s"Application history not found (${app.id})"
+        var msg = s"No event logs found for application $appName in $eventLogFile."
+        logWarning(msg)
+        msg += " Did you specify the correct logging directory?"
+        msg = URLEncoder.encode(msg, "UTF-8")
+        app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title"
+        false
--- End diff --

I disagree. `if (file exists)` checks are racy, and entail more RPCs to the 
NN. And we're really interested in handling that particular exception, so I 
don't see any advantage in the explicit check.
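This is the usual argument against check-then-act: between an existence check and the open, the file can appear or disappear, and the check itself costs an extra metadata round trip (here, an RPC to the NameNode). A minimal sketch of the "just open it and handle `FileNotFoundException`" shape, using `java.io.FileInputStream` in place of the Hadoop `FileSystem` call in the diff (assumed, as the diff's `catch` clause implies, to report a missing file the same way); `openEventLogOrReport` is a hypothetical name, and the message text is copied from the hunk:

```scala
import java.io.{FileInputStream, FileNotFoundException, InputStream}

object OpenWithoutPrecheck {
  // Open directly and turn the one expected failure into a user-facing message,
  // instead of calling an exists() check first (which could race with deletion).
  def openEventLogOrReport(appName: String, eventLogFile: String): Either[String, InputStream] =
    try Right(new FileInputStream(eventLogFile))
    catch {
      case _: FileNotFoundException =>
        Left(s"No event logs found for application $appName in $eventLogFile.")
    }
}
```

Other I/O errors still propagate, which matches the diff only handling the not-found case specially.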





[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-10-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1222#discussion_r18535329
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -58,43 +61,79 @@ private[spark] class EventLoggingListener(
   private val shouldOverwrite = sparkConf.getBoolean("spark.eventLog.overwrite", false)
   private val testing = sparkConf.getBoolean("spark.eventLog.testing", false)
   private val outputBufferSize = sparkConf.getInt("spark.eventLog.buffer.kb", 100) * 1024
-  val logDir = EventLoggingListener.getLogDirPath(logBaseDir, appId)
-  val logDirName: String = logDir.split("/").last
-  protected val logger = new FileLogger(logDir, sparkConf, hadoopConf, outputBufferSize,
-    shouldCompress, shouldOverwrite, Some(LOG_FILE_PERMISSIONS))
+  private val fileSystem = Utils.getHadoopFileSystem(new URI(logBaseDir), hadoopConf)
+
+  // Only defined if the file system scheme is not local
+  private var hadoopDataStream: Option[FSDataOutputStream] = None
+
+  // The Hadoop APIs have changed over time, so we use reflection to figure out
+  // the correct method to use to flush a hadoop data stream. See SPARK-1518
+  // for details.
+  private val hadoopFlushMethod = {
--- End diff --

This is how the code was before. I'm just moving it.
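For context, the reflection in this hunk lets a single build call whichever flush method the Hadoop version on the classpath provides. A minimal sketch of such a lookup, assuming the candidate names are `hflush` (newer Hadoop) and `sync` (older Hadoop); `FlushLookup`, `NewStyleStream`, and `OldStyleStream` are hypothetical stand-ins so the example runs without Hadoop:

```scala
import java.lang.reflect.Method
import scala.util.Try

object FlushLookup {
  // Return the first public, zero-argument method that exists among the candidate
  // names, in order. Try also absorbs other non-fatal lookup errors, e.g. SecurityException.
  def findMethod(clazz: Class[_], candidates: Seq[String]): Option[Method] =
    candidates.view
      .map(name => Try(clazz.getMethod(name)).toOption)
      .collectFirst { case Some(m) => m }
}

// Hypothetical stand-ins for output streams from two different library versions.
class NewStyleStream { var flushed = false; def hflush(): Unit = flushed = true }
class OldStyleStream { var flushed = false; def sync(): Unit = flushed = true }
```

Resolving the `Method` once (as the `private val` in the hunk does) keeps the reflective lookup out of the per-event write path; only `invoke` runs on each flush.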





[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...

2014-10-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2661#issuecomment-58225743
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21383/consoleFull) for PR 2661 at commit [`8b64bb7`](https://github.com/apache/spark/commit/8b64bb7feb0ddea9f573cabfd96150bce673aa31).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...

2014-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2661#issuecomment-58225756
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21383/
Test PASSed.




