[GitHub] spark pull request: SPARK-3711: Optimize where in clause filter qu...
GitHub user saucam opened a pull request: https://github.com/apache/spark/pull/2561 SPARK-3711: Optimize where in clause filter queries The In case class is replaced by an InSet class when all the filter values are literals. InSet uses a HashSet instead of a Sequence, giving a significant performance improvement (previously the Seq performed a worst-case linear match via the exists method, since the filter list was assumed to contain arbitrary expressions). The improvement should be most visible when only a small percentage of a large dataset matches the filter list. You can merge this pull request into a Git repository by running: $ git pull https://github.com/saucam/spark branch-1.1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2561.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2561 commit bee98aadcea7cb8fa6402d72af45aef2a4de8c19 Author: Yash Datta yash.da...@guavus.com Date: 2014-09-28T05:54:49Z SPARK-3711: Optimize where in clause filter queries --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
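The core of the optimization can be sketched outside Spark as follows. This is a minimal illustration with hypothetical names (`InSetSketch`, `inSeq`, `inSet`), not the actual Catalyst `In`/`InSet` expression classes: a membership test against a `Seq` is linear per row, while a `HashSet`-backed `Set` gives an expected constant-time lookup.

```scala
// Minimal sketch (hypothetical names): why a hash-backed Set beats a Seq
// for IN-list membership tests when all the list elements are literals.
object InSetSketch {
  // Linear scan per row, analogous to the old behaviour using exists: O(n).
  def inSeq[A](value: A, list: Seq[A]): Boolean = list.exists(_ == value)

  // Hash lookup per row, analogous to the InSet approach: O(1) expected.
  def inSet[A](value: A, set: Set[A]): Boolean = set.contains(value)
}
```

With a 10,000-element filter list, `inSeq` compares up to 10,000 values for every row, while `inSet` does a single hash lookup; the gap widens with the list size and the number of non-matching rows, which matches the PR's note that the gain is largest when few rows of a large dataset match.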
[GitHub] spark pull request: SPARK-3711: Optimize where in clause filter qu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2561#issuecomment-57076163 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57076212 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20929/consoleFull) for PR 2542 at commit [`e9cd8be`](https://github.com/apache/spark/commit/e9cd8be5b69af54c1de3219cc8f2c0ad1718615a). * This patch merges cleanly.
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-57076438 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20930/consoleFull) for PR 1290 at commit [`804c07a`](https://github.com/apache/spark/commit/804c07a3abd6a0e81d0f04b4a08f88df29cad357). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-57076888 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20928/
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-57076884 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20928/consoleFull) for PR 2514 at commit [`6f3c302`](https://github.com/apache/spark/commit/6f3c30263560853c4cfb5b65b74bce3e39801e05). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class IndexedRecordToJavaConverter extends Converter[IndexedRecord, JMap[String, Any]]` * `class AvroWrapperToJavaConverter extends Converter[Any, Any] `
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user epahomov commented on the pull request: https://github.com/apache/spark/pull/2394#issuecomment-57077001 Sorry for such a messy pull request; I didn't review my student's code closely enough. I'll do better next time. We'll fix everything by the middle of the week.
[GitHub] spark pull request: [SPARK-3705][SQL]add case for VoidObjectInspec...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2552#issuecomment-57077090 I tested the timeout issue; https://github.com/apache/spark/pull/1689 led to it, but I have not found the root cause yet.
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-57077291 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-57077406 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20931/consoleFull) for PR 2514 at commit [`6f3c302`](https://github.com/apache/spark/commit/6f3c30263560853c4cfb5b65b74bce3e39801e05). * This patch merges cleanly.
[GitHub] spark pull request: [WIP] [SPARK-2377] Python API for Streaming
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2538#issuecomment-57077576 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20932/consoleFull) for PR 2538 at commit [`847f9b9`](https://github.com/apache/spark/commit/847f9b9faba9f9e6af20c9f5e72e68bc9eb52f4d). * This patch merges cleanly.
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-57077703 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20930/
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-57077701 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20930/consoleFull) for PR 1290 at commit [`804c07a`](https://github.com/apache/spark/commit/804c07a3abd6a0e81d0f04b4a08f88df29cad357). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class OutputCanvas2D(wd: Int, ht: Int) extends Canvas ` * `class OutputFrame2D( title: String ) extends Frame( title ) ` * `class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas ` * `class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title) ` * `class IndexedRecordToJavaConverter extends Converter[IndexedRecord, JMap[String, Any]]` * `class AvroWrapperToJavaConverter extends Converter[Any, Any] `
[GitHub] spark pull request: [SPARK-3543] TaskContext remaining cleanup wor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2560#issuecomment-57078093 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20927/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3543] TaskContext remaining cleanup wor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2560#issuecomment-57078096 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20927/
[GitHub] spark pull request: SPARK-3699: SQL and Hive console tasks now cle...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2547#issuecomment-57078294 Thanks. I've merged this.
[GitHub] spark pull request: [SPARK-3543] TaskContext remaining cleanup wor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2560#issuecomment-57078276 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/173/consoleFull) for PR 2560 at commit [`9eff95a`](https://github.com/apache/spark/commit/9eff95afe6051b264854b415b5d305dc9e4bf3ef). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3699: SQL and Hive console tasks now cle...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2547
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57078355 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20929/
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57078351 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20929/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-57078720 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20931/
[GitHub] spark pull request: [SPARK-3032][Shuffle] Fix key comparison integ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2514#issuecomment-57078715 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20931/consoleFull) for PR 2514 at commit [`6f3c302`](https://github.com/apache/spark/commit/6f3c30263560853c4cfb5b65b74bce3e39801e05). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class IndexedRecordToJavaConverter extends Converter[IndexedRecord, JMap[String, Any]]` * `class AvroWrapperToJavaConverter extends Converter[Any, Any] `
[GitHub] spark pull request: [WIP][SPARK-3517]mapPartitions is not correct ...
Github user witgo closed the pull request at: https://github.com/apache/spark/pull/2376
[GitHub] spark pull request: [WIP][SPARK-3517]mapPartitions is not correct ...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/2376#issuecomment-57078975 I temporarily cannot reproduce it, so I'm closing this PR.
[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2344#issuecomment-57079061 Hive handles dates in a really strange way; there is still some inconsistency between the two, but I think it is better to address that in follow-up PRs.
[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2344#issuecomment-57079131 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20933/consoleFull) for PR 2344 at commit [`f4058ab`](https://github.com/apache/spark/commit/f4058ab1a185b3dc3fb5fe1522ef5b481601d873). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57079256 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/174/consoleFull) for PR 2542 at commit [`e9cd8be`](https://github.com/apache/spark/commit/e9cd8be5b69af54c1de3219cc8f2c0ad1718615a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3412][SQL]add missing row api
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2529#issuecomment-57079526 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20934/consoleFull) for PR 2529 at commit [`4c18c29`](https://github.com/apache/spark/commit/4c18c29faedd52ca6a3d925ea039841b860862f7). * This patch merges cleanly.
[GitHub] spark pull request: [WIP] [SPARK-2377] Python API for Streaming
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2538#issuecomment-57079953 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20932/
[GitHub] spark pull request: [WIP] [SPARK-2377] Python API for Streaming
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2538#issuecomment-57079950 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20932/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...
Github user tigerquoll commented on a diff in the pull request: https://github.com/apache/spark/pull/2516#discussion_r18128789

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
   }
 }

-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-    require(file.exists(), s"Properties file $file does not exist")
-    require(file.isFile(), s"Properties file $file is not a normal file")
-    val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves configuration sources in order of highest to lowest:
+   * 1. Each map passed in as additionalConfig, from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (e.g. by using -Dspark.var.name)
+   * 4. SPARK_DEFAULT_CONF/spark-defaults.conf or SPARK_HOME/conf/spark-defaults.conf
+   * 5. Hard-coded defaults on the classpath at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read in, and its properties
+   * are considered to be at the priority of the method that specified the file.
+   * A property specified in a property file will not override an existing
+   * config value at that same level.
+   *
+   * @param additionalConfigs Seq of additional Map[ConfigName -> ConfigValue], in order of
+   *                          highest priority to lowest; these have priority over internal sources
+   * @return Map[propName -> propFile] containing values merged from all sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq[Map[String, String]]) = {
+    // Configuration read in from the spark-submit-defaults.prop file found on the classpath
+    var hardCodedDefaultConfig: Option[Map[String, String]] = None
+    var is: InputStream = null
+    var isr: Option[InputStreamReader] = None
     try {
-      val properties = new Properties()
-      properties.load(inputStream)
-      properties.stringPropertyNames().toSeq.map(k => (k, properties(k).trim))
-    } catch {
-      case e: IOException =>
-        val message = s"Failed when loading Spark properties file $file"
-        throw new SparkException(message, e)
+      is = Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+      // only open InputStreamReader if InputStream was successfully opened
+      isr = Option(is).map { is: InputStream =>
+        new InputStreamReader(is, CharEncoding.UTF_8)
+      }
+
+      hardCodedDefaultConfig = isr.map(defaultValueStream =>
+        SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
     } finally {
-      inputStream.close()
+      Option(is).foreach(_.close)
+      isr.foreach(_.close)
     }
+
+    if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size == 0)) {
+      throw new IllegalStateException(s"Default values not found at classpath $ClassPathSparkSubmitDefaults")
+    }
+
+    // Configuration read in from the defaults file, if it exists
+    var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+    if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile)) {
+      SparkSubmitArguments.getPropertyValuesFromFile(
+        sparkDefaultConfig.get(SparkPropertiesFile).get)
+    } else {
+      Map.empty
+    }
+
+    // Configuration from Java system properties
+    val systemPropertyConfig = SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+    // Configuration variables from the environment
+    // (supports legacy variables)
+    val environmentConfig = System.getenv().asScala
+
+    val legacyEnvVars = Seq("MASTER" -> SparkMaster, "DEPLOY_MODE" -> SparkDeployMode,
+      "SPARK_DRIVER_MEMORY" -> SparkDriverMemory, "SPARK_EXECUTOR_MEMORY" -> SparkExecutorMemory)
+
+    // legacy variables act at the priority of a system property
+    val propsWithEnvVars: mutable.Map[String, String] = new mutable.HashMap() ++ systemPropertyConfig ++ legacyEnvVars
+      .map { case (varName, propName) => (environmentConfig.get(varName), propName) }
+      .filter { case (varVariable, _) => varVariable.isDefined && !varVariable.get.isEmpty }
+      .map { case (varVariable, propName) => (propName, varVariable.get) }
+
+    val ConfigSources = additionalConfigs ++ Seq(
+      environmentConfig,
+      propsWithEnvVars,
+      sparkDefaultConfig,
+      hardCodedDefaultConfig.get
+    )
+
+    // Load properties file at
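The precedence scheme described in the diff's comment above (earlier sources win; later sources only supply missing keys) can be sketched independently of Spark. `ConfigMergeSketch.mergeConfigs` below is a hypothetical stand-in, not the PR's actual `mergeSparkProperties`:

```scala
// Hedged sketch: merge config maps so that earlier (higher-priority)
// sources win and later sources only fill in keys not yet present.
object ConfigMergeSketch {
  def mergeConfigs(sources: Seq[Map[String, String]]): Map[String, String] =
    sources.foldLeft(Map.empty[String, String]) { (acc, src) =>
      // Keep existing (higher-priority) entries; add only unseen keys.
      src.foldLeft(acc) { case (m, (k, v)) =>
        if (m.contains(k)) m else m.updated(k, v)
      }
    }
}
```

An equivalent one-liner is `sources.reverse.foldLeft(Map.empty[String, String])(_ ++ _)`, since `++` lets the right-hand operand win; the explicit fold above makes the "first occurrence survives" rule visible.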
[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] make i...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/2524#discussion_r18128796

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -112,6 +112,10 @@ class DAGScheduler(
   // stray messages to detect.
   private val failedEpoch = new HashMap[String, Long]

+  // stageId => (SplitId -> (accumulatorId, accumulatorValue))
+  private[scheduler] val stageIdToAccumulators = new HashMap[Int,
--- End diff --

This may cause a memory leak?
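The concern raised above is the usual per-stage bookkeeping hazard: a map keyed by stageId lives as long as the scheduler and grows without bound unless entries are removed when stages finish. A hypothetical sketch (illustrative class, not the actual DAGScheduler code):

```scala
import scala.collection.mutable

// Hypothetical sketch of per-stage accumulator bookkeeping and the
// cleanup hook needed to avoid unbounded growth.
class StageAccumulatorTracker {
  // stageId -> (splitId -> (accumulatorId, accumulatorValue))
  private val stageIdToAccumulators =
    mutable.HashMap.empty[Int, mutable.HashMap[Int, (Long, Any)]]

  def record(stageId: Int, splitId: Int, accId: Long, value: Any): Unit =
    stageIdToAccumulators
      .getOrElseUpdate(stageId, mutable.HashMap.empty)
      .update(splitId, (accId, value))

  // Without a call like this on stage completion/abort, entries for
  // finished stages are never reclaimed -- the suspected leak.
  def stageCompleted(stageId: Int): Unit = stageIdToAccumulators.remove(stageId)

  def trackedStages: Int = stageIdToAccumulators.size
}
```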
[GitHub] spark pull request: [SPARK-3543] TaskContext remaining cleanup wor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2560#issuecomment-57080788 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/173/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...
Github user tigerquoll commented on a diff in the pull request: https://github.com/apache/spark/pull/2516#discussion_r18128856

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
   }
 }

-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-    require(file.exists(), s"Properties file $file does not exist")
-    require(file.isFile(), s"Properties file $file is not a normal file")
-    val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves configuration sources in order of highest to lowest priority:
+   * 1. Each map passed in as additionalConfig, from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (e.g. by using -Dspark.var.name)
+   * 4. SPARK_DEFAULT_CONF/spark-defaults.conf or SPARK_HOME/conf/spark-defaults.conf
+   * 5. Hard-coded defaults on the classpath at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read in and the properties
+   * are considered to be at the priority of the method that specified the file.
+   * A property specified in a property file will not override an existing
+   * config value at that same level.
+   *
+   * @param additionalConfigs Seq of additional Map[ConfigName -> ConfigValue], in order of
+   *                          highest priority to lowest; these have priority over internal sources
+   * @return Map[propName -> propValue] containing values merged from all sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq[Map[String, String]]) = {
+    // Configuration read in from the spark-submit-defaults.prop file found on the classpath
+    var hardCodedDefaultConfig: Option[Map[String, String]] = None
+    var is: InputStream = null
+    var isr: Option[InputStreamReader] = None
     try {
-      val properties = new Properties()
-      properties.load(inputStream)
-      properties.stringPropertyNames().toSeq.map(k => (k, properties(k).trim))
-    } catch {
-      case e: IOException =>
-        val message = s"Failed when loading Spark properties file $file"
-        throw new SparkException(message, e)
+      is = Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+      // Only open an InputStreamReader if the InputStream was successfully opened
+      isr = Option(is).map { is: InputStream =>
+        new InputStreamReader(is, CharEncoding.UTF_8)
+      }
+
+      hardCodedDefaultConfig = isr.map(defaultValueStream =>
+        SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
     } finally {
-      inputStream.close()
+      Option(is).foreach(_.close)
+      isr.foreach(_.close)
     }
+
+    if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size == 0)) {
+      throw new IllegalStateException(s"Default values not found at classpath $ClassPathSparkSubmitDefaults")
+    }
+
+    // Configuration read in from the defaults file if it exists
+    var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+    if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile)) {
+      SparkSubmitArguments.getPropertyValuesFromFile(
+        sparkDefaultConfig.get(SparkPropertiesFile).get)
+    } else {
+      Map.empty
+    }
+
+    // Configuration from Java system properties
+    val systemPropertyConfig = SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+    // Configuration variables from the environment
+    // (support legacy variables)
+    val environmentConfig = System.getenv().asScala
+
+    val legacyEnvVars = Seq("MASTER" -> SparkMaster, "DEPLOY_MODE" -> SparkDeployMode,
+      "SPARK_DRIVER_MEMORY" -> SparkDriverMemory, "SPARK_EXECUTOR_MEMORY" -> SparkExecutorMemory)
+
+    // Legacy variables act at the priority of a system property
+    val propsWithEnvVars: mutable.Map[String, String] = new mutable.HashMap() ++ systemPropertyConfig ++ legacyEnvVars
+      .map { case (varName, propName) => (environmentConfig.get(varName), propName) }
+      .filter { case (varVariable, _) => varVariable.isDefined && !varVariable.get.isEmpty }
+      .map { case (varVariable, propName) => (propName, varVariable.get) }
+
+    val ConfigSources = additionalConfigs ++ Seq(
+      environmentConfig,
+      propsWithEnvVars,
+      sparkDefaultConfig,
+      hardCodedDefaultConfig.get
+    )
+
+    // Load properties file at
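The priority scheme described in the quoted scaladoc (sources listed highest-priority first, higher levels winning on key collisions) can be sketched independently of the patch. This is illustrative only; the names are not from the patch:

```scala
// Merge config sources given highest-priority first: fold from lowest to
// highest so that later (higher-priority) maps overwrite earlier keys.
def mergeByPriority(sources: Seq[Map[String, String]]): Map[String, String] =
  sources.reverse.foldLeft(Map.empty[String, String])((merged, src) => merged ++ src)

// Hypothetical sources, highest priority first:
val cmdLine      = Map("spark.master" -> "yarn")
val defaultsFile = Map("spark.executor.memory" -> "1g")
val hardCoded    = Map("spark.master" -> "local[*]", "spark.executor.memory" -> "512m")

val merged = mergeByPriority(Seq(cmdLine, defaultsFile, hardCoded))
// cmdLine wins for spark.master; defaultsFile wins for spark.executor.memory
```

Folding in reverse keeps the call site readable (highest priority first, matching the scaladoc) while relying on `Map.++` semantics, where the right-hand operand's entries replace the left's.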
[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...
Github user tigerquoll commented on a diff in the pull request: https://github.com/apache/spark/pull/2516#discussion_r18128941

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
[same hunk as quoted in the previous comment, ending at:]
+    // Legacy variables act at the priority of a system property
--- End diff --

OK, my way was even getting me confused. Let's use your suggested code and treat legacy env variables at the same priority as normal environment variables.
[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2344#issuecomment-57081729 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20933/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2344#issuecomment-57081730 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20933/
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57081837 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/174/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/2562 [SPARK-3712][STREAMING]: add a new UpdateDStream to update a rdd dynamically

We could achieve this with the `foreachRDD` function, but that approach is awkward because it requires passing a closure, like this:

    val baseRDD = ...
    var updatedRDD = ...
    val inputStream = ...
    val func = (rdd: RDD[T], t: Time) => { updatedRDD = baseRDD.op(rdd) }
    inputStream.foreachRDD(func _)

In my PR, we can update an RDD like:

    val updateStream = inputStream.updateRDD(baseRDD, func).asInstanceOf[U, V, T]

and obtain the updated RDD like this:

    val updatedRDD = updateStream.getUpdatedRDD

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark master-clean-14928

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2562.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2562

commit 265c941fe1b7cd164eef11c58f622a0c434a229b
Author: uncleGen husty...@gmail.com
Date: 2014-09-28T07:48:20Z
[STREAMING]: add a new UpdateDStream to update a rdd dynamically

commit b5cdb62410c3461115e76a9549f160460b63b8fb
Author: uncleGen husty...@gmail.com
Date: 2014-09-28T10:37:40Z
fix test

commit 41d9a952d39f8bc64a38312856ab57e304a59382
Author: uncleGen husty...@gmail.com
Date: 2014-09-28T10:40:37Z
clerical error
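The `foreachRDD` alternative the author describes boils down to folding each incoming batch into a mutable reference. Stripped of the Spark Streaming types, the pattern looks like this (plain Scala, illustrative only; `onBatch` stands in for the foreachRDD closure):

```scala
// Simulate a stream of batches being folded into one evolving dataset,
// the way `inputStream.foreachRDD(rdd => updatedRDD = baseRDD.op(rdd))` would.
var updated: Set[Int] = Set(1, 2, 3) // state seeded from the "base RDD"

def onBatch(batch: Seq[Int]): Unit = {
  // The update function: here, union the batch into the accumulated state.
  updated = updated ++ batch
}

// Two micro-batches arriving over time:
Seq(Seq(4), Seq(5, 6)).foreach(onBatch)
```

The PR's argument is essentially ergonomic: a dedicated `updateRDD` operator makes this state-threading explicit in the DStream graph instead of hiding it in a side-effecting closure.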
[GitHub] spark pull request: [SPARK-3707] [SQL] Fix bug of type coercion in...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/2559#issuecomment-57082094 retest this please.
[GitHub] spark pull request: [SPARK-3412][SQL]add missing row api
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2529#issuecomment-57082151 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20934/
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2562#issuecomment-57082146 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20935/consoleFull) for PR 2562 at commit [`41d9a95`](https://github.com/apache/spark/commit/41d9a952d39f8bc64a38312856ab57e304a59382).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3412][SQL]add missing row api
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2529#issuecomment-57082150 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20934/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3707] [SQL] Fix bug of type coercion in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2559#issuecomment-57082251 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20936/consoleFull) for PR 2559 at commit [`199a85d`](https://github.com/apache/spark/commit/199a85d2e7ef482f3c0ac9cacc4dbeb2a21d5901).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3658][SQL]Take thrift server as a daemo...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/2509#issuecomment-57082932 @liancheng I tried `export` and it worked. Thanks for the suggestion. Also modified permission of `stop-thriftserver.sh`.
[GitHub] spark pull request: [SPARK-3658][SQL]Take thrift server as a daemo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2509#issuecomment-57083074 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20937/
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2562#issuecomment-57083228 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20935/
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2562#issuecomment-57083227 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20935/consoleFull) for PR 2562 at commit [`41d9a95`](https://github.com/apache/spark/commit/41d9a952d39f8bc64a38312856ab57e304a59382).

* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class UpdateDStream[U: ClassTag, T: ClassTag, V: ClassTag](`
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-57083376 Thanks for your suggestions, @vanzin @andrewor14. I have changed the code accordingly.
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
Github user uncleGen commented on the pull request: https://github.com/apache/spark/pull/2562#issuecomment-57083416 Test failure appears to be unrelated to my patch.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57083428 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/175/consoleFull) for PR 2542 at commit [`e9cd8be`](https://github.com/apache/spark/commit/e9cd8be5b69af54c1de3219cc8f2c0ad1718615a).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/2563 [SPARK-3713][SQL] Uses JSON to serialize DataType objects

This PR uses JSON instead of `toString` to serialize `DataType`s. The latter is not only hard to parse but also flaky in many cases. Since we already write schema information to Parquet metadata in the old style, we have to retain the old `DataType` parser and ensure backward compatibility. The old parser is now renamed to `CaseClassStringParser` and moved into `object DataType`.

@JoshRosen @davis Please help review the PySpark related changes, thanks!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark datatype-to-json

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2563.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2563

commit dca9153d213a9a9603d7b327d78750af66021ed2
Author: Cheng Lian lian.cs@gmail.com
Date: 2014-09-25T09:28:06Z
De/serializes DataType objects from/to JSON

commit 5f792df158128f6bf41a49e816a915150698a9d2
Author: Cheng Lian lian.cs@gmail.com
Date: 2014-09-28T11:19:34Z
Adds PySpark support

commit 26c6563ab1f7bc9c063da44ecfcb31dff65a3bf1
Author: Cheng Lian lian.cs@gmail.com
Date: 2014-09-28T11:54:26Z
Adds compatibility test case for Parquet type conversion
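The motivation for the PR above is that a recursive type tree serializes to something machine-parseable when rendered as JSON, whereas case-class `toString` output is ambiguous and brittle. A toy sketch of the idea (the type names here are illustrative and are not Spark's actual `DataType` API):

```scala
// A tiny stand-in for a recursive schema type tree.
sealed trait DType
case object IntType extends DType
case object StringType extends DType
case class StructField(name: String, dtype: DType)
case class StructType(fields: Seq[StructField]) extends DType

// Render the tree as JSON by hand: every node becomes a tagged object,
// so a reader can dispatch on the "type" field instead of parsing
// free-form toString output.
def toJson(dt: DType): String = dt match {
  case IntType    => "{\"type\":\"integer\"}"
  case StringType => "{\"type\":\"string\"}"
  case StructType(fields) =>
    fields.map(f => "\"" + f.name + "\":" + toJson(f.dtype))
      .mkString("{\"type\":\"struct\",\"fields\":{", ",", "}}")
}

val schema = StructType(Seq(StructField("id", IntType), StructField("name", StringType)))
```

A real implementation would use a JSON library for quoting and round-tripping, but the structural point is the same: the serialized form mirrors the type tree one-to-one.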
[GitHub] spark pull request: [SPARK-3707] [SQL] Fix bug of type coercion in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2559#issuecomment-57084958 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20936/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3707] [SQL] Fix bug of type coercion in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2559#issuecomment-57084960 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20936/
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57084987 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20938/consoleFull) for PR 2563 at commit [`26c6563`](https://github.com/apache/spark/commit/26c6563ab1f7bc9c063da44ecfcb31dff65a3bf1).

* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57084988 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20938/
[GitHub] spark pull request: [SPARK-3421][SQL] Allows arbitrary character i...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2291#issuecomment-57085068 PR #2563 supersedes this one. Closing.
[GitHub] spark pull request: [SPARK-3421][SQL] Allows arbitrary character i...
Github user liancheng closed the pull request at: https://github.com/apache/spark/pull/2291
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/2562#discussion_r18129623

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
@@ -602,6 +602,18 @@ abstract class DStream[T: ClassTag] (
   }

   /**
+   * Return a new UpdateDStream in which each RDD is used to update the original RDD by
+   * applying a function on each RDD of 'this' DStream.
+   */
+  def updateRDD[U: ClassTag, V: ClassTag](
+      rdd: RDD[V],
+      updateFunc: (Option[RDD[T]], RDD[V]) => RDD[U]
+  ): DStream[T] = {
+    val cleanF = ssc.sparkContext.clean(updateFunc)
+    new UpdateDStream[U, T, V](this, cleanF, rdd).register()
--- End diff --

Hi @uncleGen, I'm not sure why you need to register this DStream. Your `updateRDD` operator looks like a transformation, not an action, so I don't think you need to call register; that is only for output DStreams like `ForEachDStream`.
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2562#issuecomment-57085435 I think this can be done with `foreachRDD` or `transform`, as you said; I'm not sure what the purpose of a new operator is.
[GitHub] spark pull request: [SPARK-3658][SQL]Take thrift server as a daemo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2509#issuecomment-57085586 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/176/consoleFull) for PR 2509 at commit [`5dcaab2`](https://github.com/apache/spark/commit/5dcaab2d4ef6c279872aa65e62c1be5456858c6c).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3658][SQL]Take thrift server as a daemo...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2509#issuecomment-57085803 After updating my local repo, I found that `stop-thriftserver.sh` is still not executable. Make sure to `git add` this file after `chmod +x`. This is the only pending issue from my perspective.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-57086391 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/175/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57087294 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20939/consoleFull) for PR 2563 at commit [`03da3ec`](https://github.com/apache/spark/commit/03da3ec870940bd6ff56e03450993da6125b40a4). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
Github user uncleGen commented on the pull request: https://github.com/apache/spark/pull/2562#issuecomment-57087576 @jerryshao Thanks for your comments! I wanted to abstract an independent DStream to achieve this, since it feels odd to update an RDD by passing a closure. Maybe this patch is not the right approach, so I will close it for now.
[GitHub] spark pull request: [SPARK-3712][STREAMING]: add a new UpdateDStre...
Github user uncleGen closed the pull request at: https://github.com/apache/spark/pull/2562
[GitHub] spark pull request: [SPARK-3658][SQL]Take thrift server as a daemo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2509#issuecomment-57089022 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/176/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-57089524 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20940/consoleFull) for PR 2432 at commit [`6a91b14`](https://github.com/apache/spark/commit/6a91b1448dcbc02516b22edbb2fb4253cf29d5bc). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-2548 [STREAMING] JavaRecoverableWordCoun...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/2564 SPARK-2548 [STREAMING] JavaRecoverableWordCount is missing Here's my attempt to re-port `RecoverableNetworkWordCount` to Java, following the example of its Scala and Java siblings. I fixed a few minor doc/formatting issues along the way I believe. You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-2548 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2564.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2564 commit 179b3c2ca892a8db237ff714147aabf54d7d2b3a Author: Sean Owen so...@cloudera.com Date: 2014-09-28T16:16:03Z Re-port RecoverableNetworkWordCount to Java example, and touch up doc / formatting in related examples
[GitHub] spark pull request: SPARK-2548 [STREAMING] JavaRecoverableWordCoun...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2564#issuecomment-57090626 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20941/consoleFull) for PR 2564 at commit [`179b3c2`](https://github.com/apache/spark/commit/179b3c2ca892a8db237ff714147aabf54d7d2b3a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57090930 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20939/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57090932 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20939/
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-57091751 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20940/consoleFull) for PR 2432 at commit [`6a91b14`](https://github.com/apache/spark/commit/6a91b1448dcbc02516b22edbb2fb4253cf29d5bc). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class KillExecutor(` * ` case class MasterChangeAcknowledged(appId: ApplicationId)` * ` case class RegisteredApplication(appId: ApplicationId, masterUrl: String) extends DeployMessage`
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-57091756 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20940/
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-57092226 retest this please.
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-57092314 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20942/consoleFull) for PR 2432 at commit [`6a91b14`](https://github.com/apache/spark/commit/6a91b1448dcbc02516b22edbb2fb4253cf29d5bc). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-2548 [STREAMING] JavaRecoverableWordCoun...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2564#issuecomment-57092386 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20941/
[GitHub] spark pull request: SPARK-2548 [STREAMING] JavaRecoverableWordCoun...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2564#issuecomment-57092384 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20941/consoleFull) for PR 2564 at commit [`179b3c2`](https://github.com/apache/spark/commit/179b3c2ca892a8db237ff714147aabf54d7d2b3a). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public final class JavaRecoverableNetworkWordCount `
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-57094501 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20943/consoleFull) for PR 2520 at commit [`b9e0bfb`](https://github.com/apache/spark/commit/b9e0bfb693fed6b5befbc40cadb883617670e389). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-57094648 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20942/consoleFull) for PR 2432 at commit [`6a91b14`](https://github.com/apache/spark/commit/6a91b1448dcbc02516b22edbb2fb4253cf29d5bc). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class KillExecutor(` * ` case class MasterChangeAcknowledged(appId: ApplicationId)` * ` case class RegisteredApplication(appId: ApplicationId, masterUrl: String) extends DeployMessage`
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-57094652 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20942/
[GitHub] spark pull request: SPARK-2548 [STREAMING] JavaRecoverableWordCoun...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2564#issuecomment-57095584 Jenkins, test this please.
[GitHub] spark pull request: SPARK-2548 [STREAMING] JavaRecoverableWordCoun...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2564#issuecomment-57095758 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20944/consoleFull) for PR 2564 at commit [`179b3c2`](https://github.com/apache/spark/commit/179b3c2ca892a8db237ff714147aabf54d7d2b3a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-57095860 @manishamde Fixed the typo. I believe I have addressed everything, so please let me know if it looks good. Thank you for the review!
[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-57095923 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20945/consoleFull) for PR 2435 at commit [`c694174`](https://github.com/apache/spark/commit/c6941748b58f5b77a480cfbc85cdece9ce8dec5a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-57096850 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20943/consoleFull) for PR 2520 at commit [`b9e0bfb`](https://github.com/apache/spark/commit/b9e0bfb693fed6b5befbc40cadb883617670e389). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] Scalastyle is neve...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-57096853 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20943/
[GitHub] spark pull request: [WIP][SQL] Diagnose test timeouts
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/2565 [WIP][SQL] Diagnose test timeouts You can merge this pull request into a Git repository by running: $ git pull https://github.com/marmbrus/spark testTimeOut Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2565.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2565 commit a72b75161dd4ea5ee71ce59c09cf79ac717816a9 Author: Michael Armbrust mich...@databricks.com Date: 2014-09-28T19:33:46Z Force test run
[GitHub] spark pull request: [WIP][SQL] Diagnose test timeouts
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2565#issuecomment-57097221 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20946/consoleFull) for PR 2565 at commit [`a72b751`](https://github.com/apache/spark/commit/a72b75161dd4ea5ee71ce59c09cf79ac717816a9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3658][SQL]Take thrift server as a daemo...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/2509#issuecomment-57097696 eh...I cloned the repository on another laptop and found it's executable, as shown in top-left corner of https://github.com/WangTaoTheTonic/spark/blob/thriftserver/sbin/stop-thriftserver.sh. Could you verify this again?
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2432#issuecomment-57097830 It looks like this most recent test failure is due to MiMa:

```
[info] spark-core: found 1 potential binary incompatibilities (filtered 223)
[error] * synthetic method org$apache$spark$SparkContext$$createTaskScheduler(org.apache.spark.SparkContext,java.lang.String)org.apache.spark.scheduler.TaskScheduler in object org.apache.spark.SparkContext does not have a correspondent in new version
[error]   filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.SparkContext.org$apache$spark$SparkContext$$createTaskScheduler")
```

Here's the v1.1.0 definition of `SparkContext.createTaskScheduler` that MiMa is comparing against: https://github.com/apache/spark/blob/v1.1.0/core/src/main/scala/org/apache/spark/SparkContext.scala#L1478 This MiMa error is confusing because this is a `private` method that's only called from SparkContext. It's also called from `SparkContextSchedulerCreationSuite` using reflection. Unless anyone has an objection, I think we should add the

```scala
ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.SparkContext.org$apache$spark$SparkContext$$createTaskScheduler")
```

exclusion, since this method is `private`.
[GitHub] spark pull request: [SPARK-3377] [SPARK-3610] Metrics can be accid...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2432#discussion_r18132263 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -334,6 +352,8 @@ class SparkContext(config: SparkConf) extends Logging { localProperties.set(props) } + def getApplicationId = appId --- End diff -- I guess the return type of this method is `ApplicationId`, but that class is `private[spark]`, so this method is leaking a private type to users.
[GitHub] spark pull request: SPARK-2548 [STREAMING] JavaRecoverableWordCoun...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2564#issuecomment-57098005 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20944/
[GitHub] spark pull request: SPARK-2548 [STREAMING] JavaRecoverableWordCoun...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2564#issuecomment-57098002 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20944/consoleFull) for PR 2564 at commit [`179b3c2`](https://github.com/apache/spark/commit/179b3c2ca892a8db237ff714147aabf54d7d2b3a). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public final class JavaRecoverableNetworkWordCount `
[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-57098176 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20945/consoleFull) for PR 2435 at commit [`c694174`](https://github.com/apache/spark/commit/c6941748b58f5b77a480cfbc85cdece9ce8dec5a). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class RandomForestModel(val trees: Array[DecisionTreeModel], val algo: Algo) extends Serializable `
[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-57098177 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20945/
[GitHub] spark pull request: [WIP] [SPARK-2377] Python API for Streaming
Github user giwa commented on a diff in the pull request: https://github.com/apache/spark/pull/2538#discussion_r18132327

--- Diff: python/pyspark/streaming/dstream.py --- @@ -0,0 +1,632 @@

```python
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from itertools import chain, ifilter, imap
import operator
import time
from datetime import datetime

from pyspark import RDD
from pyspark.storagelevel import StorageLevel
from pyspark.streaming.util import rddToFileName, RDDFunction
from pyspark.rdd import portable_hash
from pyspark.resultiterable import ResultIterable

__all__ = ["DStream"]


class DStream(object):
    def __init__(self, jdstream, ssc, jrdd_deserializer):
        self._jdstream = jdstream
        self._ssc = ssc
        self.ctx = ssc._sc
        self._jrdd_deserializer = jrdd_deserializer
        self.is_cached = False
        self.is_checkpointed = False

    def context(self):
        """
        Return the StreamingContext associated with this DStream
        """
        return self._ssc

    def count(self):
        """
        Return a new DStream which contains the number of elements in this DStream.
        """
        return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()

    def sum(self):
        """
        Add up the elements in this DStream.
        """
        return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)

    def filter(self, f):
        """
        Return a new DStream containing only the elements that satisfy predicate.
        """
        def func(iterator):
            return ifilter(f, iterator)
        return self.mapPartitions(func, True)

    def flatMap(self, f, preservesPartitioning=False):
        """
        Pass each value in the key-value pair DStream through flatMap function
        without changing the keys: this also retains the original RDD's partition.
        """
        def func(s, iterator):
            return chain.from_iterable(imap(f, iterator))
        return self.mapPartitionsWithIndex(func, preservesPartitioning)

    def map(self, f, preservesPartitioning=False):
        """
        Return a new DStream by applying a function to each element of DStream.
        """
        def func(iterator):
            return imap(f, iterator)
        return self.mapPartitions(func, preservesPartitioning)

    def mapPartitions(self, f, preservesPartitioning=False):
        """
        Return a new DStream by applying a function to each partition of this DStream.
        """
        def func(s, iterator):
            return f(iterator)
        return self.mapPartitionsWithIndex(func, preservesPartitioning)

    def mapPartitionsWithIndex(self, f, preservesPartitioning=False):
        """
        Return a new DStream by applying a function to each partition of this DStream,
        while tracking the index of the original partition.
        """
        return self.transform(lambda rdd: rdd.mapPartitionsWithIndex(f, preservesPartitioning))

    def reduce(self, func):
        """
        Return a new DStream by reducing the elements of this RDD using the specified
        commutative and associative binary operator.
        """
        return self.map(lambda x: (None, x)).reduceByKey(func, 1).map(lambda x: x[1])

    def reduceByKey(self, func, numPartitions=None):
        """
        Merge the values for each key using an associative reduce function.

        This will also perform the merging locally on each mapper before
        sending results to a reducer, similarly to a "combiner" in MapReduce.

        Output will be hash-partitioned with C{numPartitions} partitions, or
        the default parallelism level if C{numPartitions} is not specified.
        """
        return
```

(diff truncated in the archived message)
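The diff above builds `count`, `sum`, and `reduce` out of `mapPartitions`, and `reduce` itself is expressed as a map to `(None, x)` pairs followed by `reduceByKey`. The composition can be sketched over plain Python lists standing in for an RDD's partitions; this is an illustrative sketch, not PySpark, and the helper names (`map_partitions`, `dstream_count`, `dstream_reduce`) are invented for the example.

```python
# Illustrative sketch (not PySpark): mimic how the diff composes count/sum/reduce
# out of mapPartitions, using a list of lists to stand in for an RDD's partitions.
from functools import reduce as freduce


def map_partitions(partitions, f):
    # Apply f to each partition's iterator, like DStream.mapPartitions.
    return [list(f(iter(p))) for p in partitions]


def dstream_count(partitions):
    # count = mapPartitions(lambda i: [sum(1 for _ in i)]) followed by a global sum.
    per_part = map_partitions(partitions, lambda i: [sum(1 for _ in i)])
    return sum(x for p in per_part for x in p)


def dstream_reduce(partitions, func):
    # reduce = map each element to (None, x), merge all values under the single
    # key with func (the reduceByKey step), then drop the key.
    keyed = [(None, x) for p in partitions for x in p]
    return freduce(func, (v for _, v in keyed))


parts = [[1, 2, 3], [4, 5]]
print(dstream_count(parts))                       # 5
print(dstream_reduce(parts, lambda a, b: a + b))  # 15
```

Keying everything to `None` looks odd in isolation, but it lets `reduce` reuse the shuffle-and-combine machinery that `reduceByKey` already provides, which is exactly the trick the diff uses.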
[GitHub] spark pull request: [WIP] [SPARK-2377] Python API for Streaming
Github user giwa commented on a diff in the pull request: https://github.com/apache/spark/pull/2538#discussion_r18132355

--- Diff: examples/src/main/python/streaming/wordcount.py --- @@ -0,0 +1,21 @@

```python
import sys

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print >> sys.stderr, "Usage: wordcount <directory>"
        exit(-1)

    sc = SparkContext(appName="PythonStreamingWordCount")
    ssc = StreamingContext(sc, 1)

    lines = ssc.textFileStream(sys.argv[1])
    counts = lines.flatMap(lambda line: line.split(" "))\
                  .map(lambda x: (x, 1))\
                  .reduceByKey(lambda a, b: a + b)
    counts.pyprint()
```

--- End diff --

`counts.pyprint()` should be `counts.pprint()`:

```python
def pprint(self):
    """
    Print the first ten elements of each RDD generated in this DStream.

    This is an output operator, so this DStream will be registered as an
    output stream and there materialized.
    """
```
[GitHub] spark pull request: [WIP] [SPARK-2377] Python API for Streaming
Github user giwa commented on a diff in the pull request: https://github.com/apache/spark/pull/2538#discussion_r18132354

--- Diff: examples/src/main/python/streaming/network_wordcount.py --- @@ -0,0 +1,20 @@

```python
import sys

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print >> sys.stderr, "Usage: wordcount <hostname> <port>"
        exit(-1)

    sc = SparkContext(appName="PythonStreamingNetworkWordCount")
    ssc = StreamingContext(sc, 1)

    lines = ssc.socketTextStream(sys.argv[1], int(sys.argv[2]))
    counts = lines.flatMap(lambda line: line.split(" "))\
                  .map(lambda word: (word, 1))\
                  .reduceByKey(lambda a, b: a + b)
    counts.pyprint()
```

--- End diff --

`counts.pyprint()` should be `counts.pprint()`:

```python
def pprint(self):
    """
    Print the first ten elements of each RDD generated in this DStream.

    This is an output operator, so this DStream will be registered as an
    output stream and there materialized.
    """
```
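The `flatMap`/`map`/`reduceByKey` chain in both example scripts computes a per-batch word count. Its semantics for a single batch can be sketched in plain Python; this is an illustrative sketch, not PySpark, and the `word_count` helper is invented for the example.

```python
# Illustrative sketch (not PySpark): what the flatMap/map/reduceByKey chain in
# the streaming word-count examples computes for one batch of lines.
from collections import defaultdict


def word_count(lines):
    # flatMap: split each line into individual words.
    words = [w for line in lines for w in line.split(" ")]
    # map: pair each word with an initial count of 1.
    pairs = [(w, 1) for w in words]
    # reduceByKey(lambda a, b: a + b): merge the counts for each word.
    counts = defaultdict(int)
    for w, n in pairs:
        counts[w] += n
    return dict(counts)


print(word_count(["to be or", "not to be"]))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In the real streaming examples this chain runs once per batch interval, so `pprint()` shows a fresh count for each micro-batch rather than a running total.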
[GitHub] spark pull request: [SQL]fix spark sql hive tests time out issue
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/2566 [SQL] fix spark sql hive tests time out issue

When `TestHive.cacheTables = true` is set, the `correlationoptimizer14` test hangs, which leads to the timeout; there may be a deeper issue behind this. This PR is a quick fix for the Hive test issue.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/scwf/spark patch-2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2566.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2566

commit f4897581dcd2878f7af71110305d7d0ef8e3d7e3 Author: wangfei wangf...@huawei.com Date: 2014-09-28T20:13:50Z fix spark sql hive test time out issue