[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3381#issuecomment-63772820
  
  [Test build #23665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23665/consoleFull) for PR 3381 at commit [`f7c704a`](https://github.com/apache/spark/commit/f7c704af4d615977c43b8f6af87c5166aee0ac03).
  * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3381#issuecomment-63773096
  
  [Test build #23665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23665/consoleFull) for PR 3381 at commit [`f7c704a`](https://github.com/apache/spark/commit/f7c704af4d615977c43b8f6af87c5166aee0ac03).
  * This patch **fails to build**.
  * This patch merges cleanly.
  * This patch adds the following public classes _(experimental)_:
    * `final class Date extends Ordered[Date] with Serializable`






[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3381#issuecomment-63773100
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23665/
Test FAILed.





[GitHub] spark pull request: add jackson-core-asl 1.8.8 dependency

2014-11-20 Thread devlatte
Github user devlatte commented on the pull request:

https://github.com/apache/spark/pull/3379#issuecomment-63773937
  
It might be related to this issue.
https://issues.apache.org/jira/browse/SPARK-3602





[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3374#issuecomment-63774229
  
  [Test build #23663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23663/consoleFull) for PR 3374 at commit [`7097251`](https://github.com/apache/spark/commit/70972515085245957df9601e425141746f268c4b).
  * This patch **passes all tests**.
  * This patch merges cleanly.
  * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3374#issuecomment-63774234
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23663/
Test PASSed.





[GitHub] spark pull request: [SPARK-4505][Core] Add a ClassTag parameter to...

2014-11-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/3378#issuecomment-63774979
  
We should definitely add a ClassTag since this can be used for primitive 
types. However, there might be places where we create a lot of CompactBuffers. 
I haven't had a chance to look at where CompactBuffers are used yet, but for 
those places, would it be possible to create a single ClassTag reference? 





[GitHub] spark pull request: [SPARK-4505][Core] Add a ClassTag parameter to...

2014-11-20 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/3378#issuecomment-63775682
  
Cogroup uses `CompactBuffer`. However, we cannot add a ClassTag there because of the `CoGroupedRDD` signature:

```scala
class CoGroupedRDD[K](@transient var rdds: Seq[RDD[_ <: Product2[K, _]]],
    part: Partitioner)
  extends RDD[(K, Array[Iterable[_]])](rdds.head.context, Nil)
```
Here `rdds` is `Seq[RDD[_ <: Product2[K, _]]]`, so the real element type of the underlying `RDD`s is not available.
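
To make the trade-off discussed here concrete, below is a minimal sketch (plain Scala, not Spark's `CompactBuffer`) of a ClassTag-parameterized buffer, together with a single reusable `ClassTag[Any]` for call sites like cogroup where the element type has been erased:

```scala
import scala.reflect.ClassTag

// Toy ClassTag-parameterized buffer: the tag lets `new Array[T]` allocate a
// primitive array (e.g. Array[Int]) instead of an array of boxed values.
class TaggedBuffer[T: ClassTag](initialCapacity: Int = 8) {
  private var elements = new Array[T](initialCapacity)
  private var count = 0

  def +=(value: T): this.type = {
    if (count == elements.length) {
      val grown = new Array[T](elements.length * 2)
      Array.copy(elements, 0, grown, 0, count)
      elements = grown
    }
    elements(count) = value
    count += 1
    this
  }

  def toArray: Array[T] = elements.take(count)
}

object TaggedBufferDemo {
  // One shared tag for places that create many buffers without knowing T.
  private val anyTag: ClassTag[Any] = implicitly[ClassTag[Any]]

  def main(args: Array[String]): Unit = {
    val ints = new TaggedBuffer[Int]()           // backed by Array[Int]
    ints += 1
    ints += 2
    println(ints.toArray.sum)                    // 3

    val erased = new TaggedBuffer[Any]()(anyTag) // boxed fallback, tag reused
    erased += "a"
    erased += 42
    println(erased.toArray.length)               // 2
  }
}
```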





[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...

2014-11-20 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3374#issuecomment-63777328
  
@manishamde @jkbradley Thanks! Merged into master and branch-1.2.





[GitHub] spark pull request: [SPARK-4439] [MLlib] add python api for random...

2014-11-20 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3320#issuecomment-63777609
  
@davies We updated the `RandomForest` API in #3374 . Now `RandomForest` 
returns a `RandomForestModel`. Could you rebase and update this PR? Thanks!





[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...

2014-11-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3374





[GitHub] spark pull request: [SPARK-4481][Streaming][Doc] Fix the wrong des...

2014-11-20 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/3376#issuecomment-63779898
  
I have merged this. Thanks for the backport!





[GitHub] spark pull request: [SPARK-4481][Streaming][Doc] Fix the wrong des...

2014-11-20 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/3376#issuecomment-63780029
  
GitHub won't close this automatically, so could you please close this PR?





[GitHub] spark pull request: [SPARK-4510][MLlib]: Add k-medoids Partitionin...

2014-11-20 Thread fjiang6
GitHub user fjiang6 opened a pull request:

https://github.com/apache/spark/pull/3382

[SPARK-4510][MLlib]: Add k-medoids Partitioning Around Medoids (PAM) algorithm

This adds PAM (k-medoids), including a test suite and an example.
It passes the style checks, and it has been tested and compared with K-Means in MLlib, showing more stable performance.
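
As a side note for readers unfamiliar with PAM, here is a minimal, self-contained sketch of the k-medoids idea in plain Scala (not the PR's MLlib code; the brute-force search over medoid pairs stands in for PAM's greedy swap step):

```scala
object PamIdeaSketch {
  // Euclidean distance between two points.
  def dist(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Cost of a medoid set: each point is charged its distance to the nearest medoid.
  def totalCost(points: Seq[Array[Double]], medoids: Seq[Array[Double]]): Double =
    points.map(p => medoids.map(m => dist(p, m)).min).sum

  def main(args: Array[String]): Unit = {
    val points = Seq(Array(0.0, 0.0), Array(0.1, 0.1), Array(9.0, 9.0), Array(9.1, 9.1))
    // Unlike k-means, the centers (medoids) are actual data points, which is what
    // makes PAM more robust to outliers. Here k = 2 and we simply try all pairs.
    val bestMedoids = points.combinations(2).minBy(ms => totalCost(points, ms))
    bestMedoids.foreach(m => println(m.mkString("(", ", ", ")")))
  }
}
```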

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Huawei-Spark/spark PAM

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3382.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3382


commit 95cd43e21a5e4499fd63125dde6973a4271a0de2
Author: Jiang Fan fjia...@gmail.com
Date:   2014-11-20T02:59:18Z

add PAM algorithm with an example

commit 8721fc2d0fead9e72427909d5dab455e7dcd67f9
Author: Jiang Fan fjia...@gmail.com
Date:   2014-11-20T03:05:43Z

add newline at end of file

commit 9b4131a3fee5e9cd5a7ac58c7718b78236412f7e
Author: Jiang Fan fjia...@gmail.com
Date:   2014-11-20T05:05:06Z

add the PAMSuite.scala







[GitHub] spark pull request: [SPARK-4481][Streaming][Doc] Fix the wrong des...

2014-11-20 Thread zsxwing
Github user zsxwing closed the pull request at:

https://github.com/apache/spark/pull/3376





[GitHub] spark pull request: [SPARK-4510][MLlib]: Add k-medoids Partitionin...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3382#issuecomment-63781094
  
  [Test build #23666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23666/consoleFull) for PR 3382 at commit [`9b4131a`](https://github.com/apache/spark/commit/9b4131a3fee5e9cd5a7ac58c7718b78236412f7e).
  * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4510][MLlib]: Add k-medoids Partitionin...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3382#issuecomment-63791400
  
  [Test build #23666 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23666/consoleFull) for PR 3382 at commit [`9b4131a`](https://github.com/apache/spark/commit/9b4131a3fee5e9cd5a7ac58c7718b78236412f7e).
  * This patch **passes all tests**.
  * This patch merges cleanly.
  * This patch adds the following public classes _(experimental)_:
    * `case class Params(`
    * `class PAM (`
    * `class PAMModel (val clusterCenters: Array[Vector]) extends Serializable`






[GitHub] spark pull request: [SPARK-4510][MLlib]: Add k-medoids Partitionin...

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3382#issuecomment-63791407
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23666/
Test PASSed.





[GitHub] spark pull request: [SPARK-3938][SQL] Names in-memory columnar RDD...

2014-11-20 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/3383

[SPARK-3938][SQL] Names in-memory columnar RDD with corresponding table name

This PR enables the Web UI storage tab to show the in-memory table name 
instead of the mysterious query plan string as the name of the in-memory 
columnar RDD.

Note that after #2501, a single columnar RDD can be shared by multiple 
in-memory tables, as long as their query results are the same. In this case, 
only the first cached table's name is shown. For example:

```sql
CACHE TABLE first AS SELECT * FROM src;
CACHE TABLE second AS SELECT * FROM src;
```

The Web UI only shows `In-memory table first`.
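
As an aside, the storage tab label comes from `RDD.setName`; a minimal sketch of that naming mechanism (not this PR's code):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddNamingSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("rdd-naming-sketch").setMaster("local[*]"))
    val data = sc.parallelize(Seq(1, 2, 3))
    // The name set here is what the Web UI storage tab displays for the cached RDD.
    data.setName("In-memory table first").cache()
    data.count() // materialize the cache so the entry shows up
    sc.stop()
  }
}
```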

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark columnar-rdd-name

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3383.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3383


commit 12ddfa6eea5b0b0b7ebbb8fffd802d87e6727493
Author: Cheng Lian l...@databricks.com
Date:   2014-11-20T10:35:57Z

Names in-memory columnar RDD with corresponding table name







[GitHub] spark pull request: [SPARK-3938][SQL] Names in-memory columnar RDD...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3383#issuecomment-63792868
  
  [Test build #23667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23667/consoleFull) for PR 3383 at commit [`12ddfa6`](https://github.com/apache/spark/commit/12ddfa6eea5b0b0b7ebbb8fffd802d87e6727493).
  * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3938][SQL] Names in-memory columnar RDD...

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3383#issuecomment-63793690
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23667/
Test FAILed.





[GitHub] spark pull request: [SPARK-3938][SQL] Names in-memory columnar RDD...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3383#issuecomment-63793685
  
  [Test build #23667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23667/consoleFull) for PR 3383 at commit [`12ddfa6`](https://github.com/apache/spark/commit/12ddfa6eea5b0b0b7ebbb8fffd802d87e6727493).
  * This patch **fails Spark unit tests**.
  * This patch merges cleanly.
  * This patch adds no public classes.





[GitHub] spark pull request: add jackson-core-asl 1.8.8 dependency

2014-11-20 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3379#issuecomment-63796464
  
This is basically the same as https://issues.apache.org/jira/browse/SPARK-3955, 
which already has an open PR: https://github.com/apache/spark/pull/2818. 
Maybe comment on that PR instead.





[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3237#issuecomment-63803519
  
  [Test build #23668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23668/consoleFull) for PR 3237 at commit [`5fd7afd`](https://github.com/apache/spark/commit/5fd7afd6a0c724151340719e2b017357e042300c).
  * This patch **does not merge cleanly**.





[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3237#issuecomment-63808836
  
  [Test build #23669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23669/consoleFull) for PR 3237 at commit [`3abdb1b`](https://github.com/apache/spark/commit/3abdb1b24aa48f21e7eed1232c01d3933873688c).
  * This patch **does not merge cleanly**.





[GitHub] spark pull request: [SPARK-3938][SQL] Names in-memory columnar RDD...

2014-11-20 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/3383#issuecomment-63810342
  
+10 (haven't looked at PR itself)





[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3237#issuecomment-63810730
  
  [Test build #23670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23670/consoleFull) for PR 3237 at commit [`b589a4b`](https://github.com/apache/spark/commit/b589a4b94c470f10aec0cc778060cd49470354d5).
  * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3938][SQL] Names in-memory columnar RDD...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3383#issuecomment-63814934
  
  [Test build #23671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23671/consoleFull) for PR 3383 at commit [`071907f`](https://github.com/apache/spark/commit/071907f10d9e943370ff76f4b00d7abdc1db1017).
  * This patch merges cleanly.





[GitHub] spark pull request: Merge pull request #1 from apache/master

2014-11-20 Thread codeAshu
GitHub user codeAshu opened a pull request:

https://github.com/apache/spark/pull/3384

Merge pull request #1 from apache/master

updating Fork

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/codeAshu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3384.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3384


commit 878e816bd8b9a6348d8e82194e329c1cefaca8bc
Author: Ashutosh Trivedi rusty.iceb...@gmail.com
Date:   2014-10-24T07:35:52Z

Merge pull request #1 from apache/master

updating Fork







[GitHub] spark pull request: Merge pull request #1 from apache/master

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3384#issuecomment-63815959
  
Can one of the admins verify this patch?





[GitHub] spark pull request: Merge pull request #1 from apache/master

2014-11-20 Thread codeAshu
Github user codeAshu closed the pull request at:

https://github.com/apache/spark/pull/3384





[GitHub] spark pull request: Merge pull request #1 from apache/master

2014-11-20 Thread codeAshu
Github user codeAshu commented on the pull request:

https://github.com/apache/spark/pull/3384#issuecomment-63816089
  
sorry my bad





[GitHub] spark pull request: Branch 1.2

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3385#issuecomment-63817444
  
Can one of the admins verify this patch?





[GitHub] spark pull request: Branch 1.2

2014-11-20 Thread codeAshu
GitHub user codeAshu opened a pull request:

https://github.com/apache/spark/pull/3385

Branch 1.2



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-1.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3385.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3385


commit a68321400c1068449698d03cebd0fbf648627133
Author: Xiangrui Meng m...@databricks.com
Date:   2014-11-03T20:24:24Z

[SPARK-4148][PySpark] fix seed distribution and add some tests for 
rdd.sample

The current way of seed distribution makes the random sequences from 
partition i and i+1 offset by 1.

~~~
In [14]: import random

In [15]: r1 = random.Random(10)

In [16]: r1.randint(0, 1)
Out[16]: 1

In [17]: r1.random()
Out[17]: 0.4288890546751146

In [18]: r1.random()
Out[18]: 0.5780913011344704

In [19]: r2 = random.Random(10)

In [20]: r2.randint(0, 1)
Out[20]: 1

In [21]: r2.randint(0, 1)
Out[21]: 0

In [22]: r2.random()
Out[22]: 0.5780913011344704
~~~

Note: The new tests are not for this bug fix.

Author: Xiangrui Meng m...@databricks.com

Closes #3010 from mengxr/SPARK-4148 and squashes the following commits:

869ae4b [Xiangrui Meng] move tests tests.py
c1bacd9 [Xiangrui Meng] fix seed distribution and add some tests for 
rdd.sample

(cherry picked from commit 3cca1962207745814b9d83e791713c91b659c36c)
Signed-off-by: Xiangrui Meng m...@databricks.com

commit fc782896b5d51161feee950107df2acf17e12422
Author: fi code...@gmail.com
Date:   2014-11-03T20:56:56Z

[SPARK-4211][Build] Fixes hive.version in Maven profile hive-0.13.1

instead of `hive.version=0.13.1`.
e.g. mvn -Phive -Phive-0.13.1

Note: `hive.version=0.13.1a` is the default property value. However, when 
explicitly specifying the `hive-0.13.1` maven profile, the wrong one would be 
selected.
References:  PR #2685, which resolved a package incompatibility issue with 
Hive-0.13.1 by introducing a special version Hive-0.13.1a

Author: fi code...@gmail.com

Closes #3072 from coderfi/master and squashes the following commits:

7ca4b1e [fi] Fixes the `hive-0.13.1` maven profile referencing 
`hive.version=0.13.1` instead of the Spark compatible `hive.version=0.13.1a`. 
Note: `hive.version=0.13.1a` is the default version. However, when explicitly 
specifying the `hive-0.13.1` maven profile, the wrong one would be selected, 
e.g. mvn -Phive -Phive-0.13.1. See PR #2685

(cherry picked from commit df607da025488d6c924d3d70eddb67f5523080d3)
Signed-off-by: Michael Armbrust mich...@databricks.com

commit 292da4ef25d6cce23bfde7b9ab663a574dfd2b00
Author: ravipesala ravindra.pes...@huawei.com
Date:   2014-11-03T21:07:41Z

[SPARK-4207][SQL] Query which has syntax like 'not like' is not working in 
Spark SQL

Queries which have 'not like' are not working in Spark SQL.

sql("SELECT * FROM records where value not like 'val%'")
The same query works in Spark HiveQL.

Author: ravipesala ravindra.pes...@huawei.com

Closes #3075 from ravipesala/SPARK-4207 and squashes the following commits:

35c11e7 [ravipesala] Supported 'not like' syntax in sql

(cherry picked from commit 2b6e1ce6ee7b1ba8160bcbee97f5bbff5c46ca09)
Signed-off-by: Michael Armbrust mich...@databricks.com

commit cc5dc4247979dc001302f7af978801b789acdbfa
Author: Davies Liu davies@gmail.com
Date:   2014-11-03T21:17:09Z

[SPARK-3594] [PySpark] [SQL] take more rows to infer schema or sampling

This patch will try to infer the schema for an RDD which has empty values (None, 
[], {}) in the first row. It will try the first 100 rows and merge the types into 
the schema, also merging fields of StructType together. If there is still a NullType 
in the schema, it will show a warning and tell the user to try with sampling.

If sampling is specified, it will infer the schema from all the rows after 
sampling.

Also, add samplingRatio for jsonFile() and jsonRDD()

Author: Davies Liu davies@gmail.com
Author: Davies Liu dav...@databricks.com

Closes #2716 from davies/infer and squashes the following commits:

e678f6d [Davies Liu] Merge branch 'master' of github.com:apache/spark into 
infer
34b5c63 [Davies Liu] Merge branch 'master' of github.com:apache/spark into 
infer
567dc60 [Davies Liu] update docs
9767b27 [Davies Liu] Merge branch 'master' into infer
e48d7fb [Davies Liu] fix tests
29e94d5 [Davies Liu] let NullType inherit from PrimitiveType
ee5d524 [Davies Liu] Merge branch 'master' of github.com:apache/spark into 
infer
540d1d5 [Davies Liu] merge fields for StructType

[GitHub] spark pull request: Branch 1.2

2014-11-20 Thread codeAshu
Github user codeAshu closed the pull request at:

https://github.com/apache/spark/pull/3385





[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3237#issuecomment-63817831
  
  [Test build #23668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23668/consoleFull) for PR 3237 at commit [`5fd7afd`](https://github.com/apache/spark/commit/5fd7afd6a0c724151340719e2b017357e042300c).
  * This patch **passes all tests**.
  * This patch **does not merge cleanly**.
  * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3237#issuecomment-63817840
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23668/
Test PASSed.





[GitHub] spark pull request: [Spark-4512] [SQL] Unresolved Attribute Except...

2014-11-20 Thread chenghao-intel
GitHub user chenghao-intel opened a pull request:

https://github.com/apache/spark/pull/3386

[Spark-4512] [SQL] Unresolved Attribute Exception in Sort By

A query like the following will cause an unresolved attribute exception:
SELECT key + key FROM src SORT BY value;

This fix is inspired by #3363 and should go in after #3363 is merged. I've removed 
`logical.SortPartitions` and added a new attribute `global` to `logical.Sort`, so 
that `ResolveSortReferences` can be shared by both `ORDER BY` and `SORT BY`.
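
For context, here is a hedged sketch of what the reworked logical node could look like (the shape follows the `case class Sort(` reported by the test output later in this thread and common Catalyst conventions, not necessarily the exact code in this PR):

```scala
import org.apache.spark.sql.catalyst.expressions.{Attribute, SortOrder}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}

// A single Sort node with a `global` flag: ORDER BY maps to global = true,
// SORT BY to global = false, so both can share ResolveSortReferences.
case class Sort(
    order: Seq[SortOrder],
    global: Boolean,
    child: LogicalPlan) extends UnaryNode {
  override def output: Seq[Attribute] = child.output
}
```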



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chenghao-intel/spark sort

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3386.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3386


commit 503b263408924df42b26689a6f8b913a5185c10d
Author: Cheng Hao hao.ch...@intel.com
Date:   2014-11-20T14:40:42Z

Remove the logical.SortPartitions and Add global sort flag for logical.Sort







[GitHub] spark pull request: Add example that reads a local file, writes to...

2014-11-20 Thread rnowling
Github user rnowling commented on the pull request:

https://github.com/apache/spark/pull/3347#issuecomment-63820267
  
@andrewor14, thanks for the comments! I believe I fixed everything except 
for changing the name of the example. I wanted some more feedback.

I wrote the example to test reading and writing to a DFS.  It does so by 
comparing the result of word count on a local file to word count on the file 
after copying to the DFS.  I'm using it to make sure that DFSs are configured 
properly and accessible by all nodes. 

Do you still want me to drop the test suffix?  Thanks!
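
For reference, a simplified, self-contained sketch of that comparison (this is not the example in the PR; the object name and DFS path here are made up for illustration):

```scala
import scala.io.Source
import org.apache.spark.{SparkConf, SparkContext}

object DfsReadWriteSketch {
  def main(args: Array[String]): Unit = {
    val localPath = args(0) // a local text file
    val dfsDir    = args(1) // e.g. hdfs://namenode:8020/tmp

    // Word count the local file directly.
    val localWordCount =
      Source.fromFile(localPath).getLines().flatMap(_.split("\\s+")).count(_.nonEmpty).toLong

    val sc = new SparkContext(new SparkConf().setAppName("dfs-read-write-sketch"))
    try {
      // Copy the file's lines to the DFS, read them back, and word count again.
      val dfsPath = dfsDir.stripSuffix("/") + "/dfs_read_write_sketch"
      sc.parallelize(Source.fromFile(localPath).getLines().toSeq).saveAsTextFile(dfsPath)
      val dfsWordCount =
        sc.textFile(dfsPath).flatMap(_.split("\\s+")).filter(_.nonEmpty).count()

      // If the DFS is configured properly and reachable from all nodes, the counts match.
      println(if (localWordCount == dfsWordCount) "Success: counts match" else "Failure: counts differ")
    } finally {
      sc.stop()
    }
  }
}
```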





[GitHub] spark pull request: Add example that reads a local file, writes to...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3347#issuecomment-63820290
  
  [Test build #23672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23672/consoleFull) for PR 3347 at commit [`b0ef9ea`](https://github.com/apache/spark/commit/b0ef9ea387e031deddbe1ffda833d98eb5f42e08).
  * This patch merges cleanly.





[GitHub] spark pull request: [Spark-4512] [SQL] Unresolved Attribute Except...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3386#issuecomment-63820723
  
  [Test build #23673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23673/consoleFull) for PR 3386 at commit [`503b263`](https://github.com/apache/spark/commit/503b263408924df42b26689a6f8b913a5185c10d).
  * This patch merges cleanly.





[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3222#issuecomment-63821052
  
  [Test build #23674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23674/consoleFull) for PR 3222 at commit [`af8fbb3`](https://github.com/apache/spark/commit/af8fbb3309e5e36c0ec3332c590cec5a1bcb30e0).
  * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...

2014-11-20 Thread giwa
Github user giwa commented on the pull request:

https://github.com/apache/spark/pull/3237#issuecomment-63822056
  
@tdas @watermen 

I believe the Python code becomes something like this.

```python
def pprint(self, num):
    """
    Print the first num elements of each RDD generated in this DStream.
    """
    def takeAndPrint(time, rdd):
        taken = rdd.take(num + 1)
        print "-------------------------------------------"
        print "Time: %s" % time
        print "-------------------------------------------"
        for record in taken[:num]:
            print record
        if len(taken) > num:
            print "..."
        print

    self.foreachRDD(takeAndPrint)
```





[GitHub] spark pull request: [Spark-4512] [SQL] Unresolved Attribute Except...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3386#issuecomment-63822505
  
  [Test build #23673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23673/consoleFull) for PR 3386 at commit [`503b263`](https://github.com/apache/spark/commit/503b263408924df42b26689a6f8b913a5185c10d).
  * This patch **fails Spark unit tests**.
  * This patch merges cleanly.
  * This patch adds the following public classes _(experimental)_:
    * `case class Sort(`






[GitHub] spark pull request: [Spark-4512] [SQL] Unresolved Attribute Except...

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3386#issuecomment-63822515
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23673/
Test FAILed.





[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3237#issuecomment-63824115
  
  [Test build #23670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23670/consoleFull) for PR 3237 at commit [`b589a4b`](https://github.com/apache/spark/commit/b589a4b94c470f10aec0cc778060cd49470354d5).
  * This patch **fails MiMa tests**.
  * This patch merges cleanly.
  * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3237#issuecomment-63824131
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23670/
Test FAILed.





[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3237#issuecomment-63825274
  
  [Test build #23669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23669/consoleFull) for PR 3237 at commit [`3abdb1b`](https://github.com/apache/spark/commit/3abdb1b24aa48f21e7eed1232c01d3933873688c).
  * This patch **passes all tests**.
  * This patch **does not merge cleanly**.
  * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3237#issuecomment-63825280
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23669/
Test PASSed.





[GitHub] spark pull request: [SPARK-3938][SQL] Names in-memory columnar RDD...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3383#issuecomment-63827207
  
  [Test build #23671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23671/consoleFull) for PR 3383 at commit [`071907f`](https://github.com/apache/spark/commit/071907f10d9e943370ff76f4b00d7abdc1db1017).
  * This patch **passes all tests**.
  * This patch merges cleanly.
  * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3938][SQL] Names in-memory columnar RDD...

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3383#issuecomment-63827215
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23671/
Test PASSed.





[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-20 Thread uncleGen
Github user uncleGen commented on the pull request:

https://github.com/apache/spark/pull/3366#issuecomment-63831692
  
@davies Could you help review this patch? Thank you!





[GitHub] spark pull request: Add example that reads a local file, writes to...

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3347#issuecomment-63835431
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23672/
Test PASSed.





[GitHub] spark pull request: Add example that reads a local file, writes to...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3347#issuecomment-63835422
  
  [Test build #23672 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23672/consoleFull) for PR 3347 at commit [`b0ef9ea`](https://github.com/apache/spark/commit/b0ef9ea387e031deddbe1ffda833d98eb5f42e08).
  * This patch **passes all tests**.
  * This patch merges cleanly.
  * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...

2014-11-20 Thread watermen
Github user watermen commented on the pull request:

https://github.com/apache/spark/pull/3237#issuecomment-63836267
  
@tdas See test build #23670. I had to add the following exclusion for Spark 1.2:
```scala
ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.streaming.api.java.JavaDStreamLike.print")
```
But it failed the MiMa tests again. Can you tell me the reason? Thanks.
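
For reference, a hedged sketch of how such an exclusion is typically registered (in Spark the real entries live in project/MimaExcludes.scala; the wrapper object here is only illustrative):

```scala
import com.typesafe.tools.mima.core._

object StreamingMimaExcludes {
  // Exclude the changed JavaDStreamLike.print method from MiMa's
  // binary-compatibility check against the previous release.
  val excludes = Seq(
    ProblemFilters.exclude[MissingMethodProblem](
      "org.apache.spark.streaming.api.java.JavaDStreamLike.print")
  )
}
```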





[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3222#issuecomment-63836399
  
  [Test build #23674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23674/consoleFull) for PR 3222 at commit [`af8fbb3`](https://github.com/apache/spark/commit/af8fbb3309e5e36c0ec3332c590cec5a1bcb30e0).
  * This patch **passes all tests**.
  * This patch merges cleanly.
  * This patch adds no public classes.





[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3222#issuecomment-63836409
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23674/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2014-11-20 Thread shaneknapp
Github user shaneknapp commented on the pull request:

https://github.com/apache/spark/pull/3381#issuecomment-63841756
  
looks like a compilation error, and the reason why no unit test results
were stored was that none were run (unless i'm missing something).


On Wed, Nov 19, 2014 at 11:56 PM, Daoyuan Wang <notificati...@github.com>
wrote:

 This is a weird build error. @shaneknapp https://github.com/shaneknapp

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/3381#issuecomment-63772517.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3009#issuecomment-63844350
  
I looked through this and took a spin locally, LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3009#issuecomment-63851287
  
Hey @JoshRosen I'll take a look at this right now. Is it still WIP by the 
way?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2014-11-20 Thread shaneknapp
Github user shaneknapp commented on the pull request:

https://github.com/apache/spark/pull/3381#issuecomment-63851763
  
Jenkins, test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3009#discussion_r20664971
  
--- Diff: 
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
@@ -144,11 +146,30 @@ class JobProgressListener(conf: SparkConf) extends 
SparkListener with Logging {
   }
 
   override def onJobStart(jobStart: SparkListenerJobStart) = synchronized {
-val jobGroup = 
Option(jobStart.properties).map(_.getProperty(SparkContext.SPARK_JOB_GROUP_ID))
+val jobGroup = for (
+  props <- Option(jobStart.properties);
+  group <- Option(props.getProperty(SparkContext.SPARK_JOB_GROUP_ID))
+) yield group
 val jobData: JobUIData =
-  new JobUIData(jobStart.jobId, jobStart.stageIds, jobGroup, 
JobExecutionStatus.RUNNING)
+  new JobUIData(jobStart.jobId, Some(System.currentTimeMillis), None, 
jobStart.stageIds,
+jobGroup, JobExecutionStatus.RUNNING)
+// Compute (a potential underestimate of) the number of tasks that 
will be run by this job:
--- End diff --

I think it would be good to explain briefly why it can potentially be an 
underestimate


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3009#discussion_r20665049
  
--- Diff: 
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
@@ -166,6 +188,21 @@ class JobProgressListener(conf: SparkConf) extends 
SparkListener with Logging {
 trimJobsIfNecessary(failedJobs)
 jobData.status = JobExecutionStatus.FAILED
 }
+for (stageId <- jobData.stageIds) {
+  stageIdToActiveJobIds.get(stageId).foreach { jobsUsingStage =>
+jobsUsingStage.remove(jobEnd.jobId)
+stageIdToInfo.get(stageId).foreach { stageInfo =>
+  // If this is a pending stage and no other job depends on it, 
then it won't be run.
+  // To prevent memory leaks, remove this data since it won't be 
cleaned up as stages
+  // finish / fail:
+  if (stageInfo.submissionTime.isEmpty && 
stageInfo.completionTime.isEmpty
+&& jobsUsingStage.isEmpty) {
--- End diff --

this looks funky, can you do it like this
```
if (stageInfo.submissionTime.isEmpty &&
    stageInfo.completionTime.isEmpty &&
    jobsUsingStage.isEmpty) {
  ...
}
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3009#discussion_r20665136
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/UIData.scala ---
@@ -40,9 +40,22 @@ private[jobs] object UIData {
 
   class JobUIData(
 var jobId: Int = -1,
+var startTime: Option[Long] = None,
+var endTime: Option[Long] = None,
 var stageIds: Seq[Int] = Seq.empty,
 var jobGroup: Option[String] = None,
-var status: JobExecutionStatus = JobExecutionStatus.UNKNOWN
+var status: JobExecutionStatus = JobExecutionStatus.UNKNOWN,
+/* Tasks */
+// `numTasks` is a potential underestimate of the true number of tasks 
that this job will run
+// see https://github.com/apache/spark/pull/3009 for an extensive 
discussion of this
--- End diff --

as in above, is it possible to provide a 1-line quick summary of why that 
is the case, and if the reader wants to dig deeper then they can follow the 
link?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/3009#discussion_r20665208
  
--- Diff: 
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
@@ -214,6 +264,14 @@ class JobProgressListener(conf: SparkConf) extends 
SparkListener with Logging {
 
 val stages = poolToActiveStages.getOrElseUpdate(poolName, new 
HashMap[Int, StageInfo])
 stages(stage.stageId) = stage
+
+for (
+  activeJobsDependentOnStage <- 
stageIdToActiveJobIds.get(stage.stageId);
+  jobId <- activeJobsDependentOnStage;
+  jobData <- jobIdToData.get(jobId)
+) {
+  jobData.numActiveStages += 1
+}
--- End diff --

So what's the behavior now for resubmitted stages? Will they result in # 
completed stages > # total stages in the UI (and similarly for # tasks)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/3009#discussion_r20665341
  
--- Diff: 
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
@@ -214,6 +264,14 @@ class JobProgressListener(conf: SparkConf) extends 
SparkListener with Logging {
 
 val stages = poolToActiveStages.getOrElseUpdate(poolName, new 
HashMap[Int, StageInfo])
 stages(stage.stageId) = stage
+
+for (
+  activeJobsDependentOnStage <- 
stageIdToActiveJobIds.get(stage.stageId);
+  jobId <- activeJobsDependentOnStage;
+  jobData <- jobIdToData.get(jobId)
+) {
+  jobData.numActiveStages += 1
+}
--- End diff --

Ok I realized what this is (the apparent-hang behavior we discussed) -- but 
it would be great to add a comment somewhere describing this (maybe in 
JobUIData, explaining why completedStages is a set?)
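
For context, the reason a set (rather than a counter) behaves well under stage resubmission can be shown in isolation; a small self-contained sketch with invented names:
```scala
import scala.collection.mutable

// Toy model: if a stage is resubmitted and completes a second time, a plain
// counter double counts, while a set of completed stage ids stays at size 1.
object CompletedStagesDemo {
  val completedStageIndices = mutable.HashSet[Int]()
  var completedStageCounter = 0

  def onStageCompleted(stageId: Int): Unit = {
    completedStageIndices += stageId   // idempotent on resubmission
    completedStageCounter += 1         // double counts on resubmission
  }

  def main(args: Array[String]): Unit = {
    onStageCompleted(3)
    onStageCompleted(3)                // same stage, resubmitted and re-run
    println(s"set size = ${completedStageIndices.size}, counter = $completedStageCounter")
    // prints: set size = 1, counter = 2
  }
}
```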


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/3009#discussion_r20665377
  
--- Diff: core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala ---
@@ -17,16 +17,18 @@
 
 package org.apache.spark.ui
 
-import org.apache.spark.api.java.StorageLevels
-import org.apache.spark.{SparkException, SparkConf, SparkContext}
-import org.openqa.selenium.WebDriver
+import org.openqa.selenium.{By, WebDriver}
 import org.openqa.selenium.htmlunit.HtmlUnitDriver
 import org.scalatest._
 import org.scalatest.concurrent.Eventually._
 import org.scalatest.selenium.WebBrowser
 import org.scalatest.time.SpanSugar._
 
+import org.apache.spark.api.java.StorageLevels
--- End diff --

nit: this should be below the next three


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/3009#discussion_r20665632
  
--- Diff: 
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
@@ -144,11 +146,30 @@ class JobProgressListener(conf: SparkConf) extends 
SparkListener with Logging {
   }
 
   override def onJobStart(jobStart: SparkListenerJobStart) = synchronized {
-val jobGroup = 
Option(jobStart.properties).map(_.getProperty(SparkContext.SPARK_JOB_GROUP_ID))
+val jobGroup = for (
+  props <- Option(jobStart.properties);
+  group <- Option(props.getProperty(SparkContext.SPARK_JOB_GROUP_ID))
+) yield group
 val jobData: JobUIData =
-  new JobUIData(jobStart.jobId, jobStart.stageIds, jobGroup, 
JobExecutionStatus.RUNNING)
+  new JobUIData(jobStart.jobId, Some(System.currentTimeMillis), None, 
jobStart.stageIds,
--- End diff --

Can you name each of these parameters? As is, I find it hard to read
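
Concretely, something along these lines (a sketch using the constructor fields shown in the `UIData.scala` diff elsewhere in this thread, not taken from the PR itself):
```scala
// Named arguments make the long constructor call self-documenting.
val jobData: JobUIData = new JobUIData(
  jobId = jobStart.jobId,
  startTime = Some(System.currentTimeMillis),
  endTime = None,
  stageIds = jobStart.stageIds,
  jobGroup = jobGroup,
  status = JobExecutionStatus.RUNNING)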


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/3009#discussion_r20665705
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala ---
@@ -56,7 +56,11 @@ case class SparkListenerTaskEnd(
   extends SparkListenerEvent
 
 @DeveloperApi
-case class SparkListenerJobStart(jobId: Int, stageIds: Seq[Int], 
properties: Properties = null)
+case class SparkListenerJobStart(
+jobId: Int,
+stageInfos: Seq[StageInfo],
+stageIds: Seq[Int],  // Note: this is here for backwards-compatibility
--- End diff --

Why do you need this for backwards compatibility? Can the JsonProtocol do 
something smarter where it just fills in the StageInfos with 0 except for the 
stageIds?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4439] [MLlib] add python api for random...

2014-11-20 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/3320#issuecomment-63853606
  
@mengxr done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3009#discussion_r20665717
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ui.jobs
+
+import scala.xml.{Node, NodeSeq}
+
+import javax.servlet.http.HttpServletRequest
+
+import org.apache.spark.ui.{WebUIPage, UIUtils}
+import org.apache.spark.ui.jobs.UIData.JobUIData
+
+
+/** Page showing list of all ongoing and recently finished jobs */
+private[ui] class AllJobsPage(parent: JobsTab) extends WebUIPage() {
+  private val startTime: Option[Long] = parent.sc.map(_.startTime)
+  private val listener = parent.listener
+
+  private def jobsTable(jobs: Seq[JobUIData]): Seq[Node] = {
+val someJobHasJobGroup = jobs.exists(_.jobGroup.isDefined)
+
+val columns: Seq[Node] = {
+  <th>{if (someJobHasJobGroup) "Job Id (Job Group)" else "Job Id"}</th>
+  <th>Description</th>
+  <th>Submitted</th>
+  <th>Duration</th>
+  <th class="sorttable_nosort">Stages: Succeeded/Total</th>
+  <th class="sorttable_nosort">Tasks (for all stages): Succeeded/Total</th>
+}
+
+def makeRow(job: JobUIData): Seq[Node] = {
+  val lastStageInfo = listener.stageIdToInfo.get(job.stageIds.max)
+  val lastStageData = lastStageInfo.flatMap { s =>
+listener.stageIdToData.get((s.stageId, s.attemptId))
+  }
+  val lastStageName = lastStageInfo.map(_.name).getOrElse("(Unknown Stage Name)")
+  val lastStageDescription = lastStageData.flatMap(_.description).getOrElse("")
+  val duration: Option[Long] = {
+job.startTime.map { start =>
+  val end = job.endTime.getOrElse(System.currentTimeMillis())
+  end - start
+}
+  }
+  val formattedDuration = duration.map(d => UIUtils.formatDuration(d)).getOrElse("Unknown")
+  val formattedSubmissionTime = job.startTime.map(UIUtils.formatDate).getOrElse("Unknown")
--- End diff --

I realize we use `Unknown` in a few places. Can you declare a
```
val UNKNOWN: String = "Unknown"
```
in `UIUtils`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3009#discussion_r20665799
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ui.jobs
+
+import scala.xml.{Node, NodeSeq}
+
+import javax.servlet.http.HttpServletRequest
+
+import org.apache.spark.ui.{WebUIPage, UIUtils}
+import org.apache.spark.ui.jobs.UIData.JobUIData
+
+
+/** Page showing list of all ongoing and recently finished jobs */
+private[ui] class AllJobsPage(parent: JobsTab) extends WebUIPage() {
+  private val startTime: Option[Long] = parent.sc.map(_.startTime)
+  private val listener = parent.listener
+
+  private def jobsTable(jobs: Seq[JobUIData]): Seq[Node] = {
+val someJobHasJobGroup = jobs.exists(_.jobGroup.isDefined)
+
+val columns: Seq[Node] = {
+  <th>{if (someJobHasJobGroup) "Job Id (Job Group)" else "Job Id"}</th>
+  <th>Description</th>
+  <th>Submitted</th>
+  <th>Duration</th>
+  <th class="sorttable_nosort">Stages: Succeeded/Total</th>
+  <th class="sorttable_nosort">Tasks (for all stages): Succeeded/Total</th>
+}
+
+def makeRow(job: JobUIData): Seq[Node] = {
+  val lastStageInfo = listener.stageIdToInfo.get(job.stageIds.max)
+  val lastStageData = lastStageInfo.flatMap { s =>
+listener.stageIdToData.get((s.stageId, s.attemptId))
+  }
+  val lastStageName = lastStageInfo.map(_.name).getOrElse("(Unknown Stage Name)")
+  val lastStageDescription = lastStageData.flatMap(_.description).getOrElse("")
+  val duration: Option[Long] = {
+job.startTime.map { start =>
+  val end = job.endTime.getOrElse(System.currentTimeMillis())
+  end - start
+}
+  }
+  val formattedDuration = duration.map(d => UIUtils.formatDuration(d)).getOrElse("Unknown")
+  val formattedSubmissionTime = job.startTime.map(UIUtils.formatDate).getOrElse("Unknown")
+  val detailUrl =
+"%s/jobs/job?id=%s".format(UIUtils.prependBaseUri(parent.basePath), job.jobId)
+
+  <tr>
+<td sorttable_customkey={job.jobId.toString}>
+  {job.jobId} {job.jobGroup.map(id => s"($id)").getOrElse("")}
+</td>
+<td>
+  <div><em>{lastStageDescription}</em></div>
+  <a href={detailUrl}>{lastStageName}</a>
+</td>
+<td sorttable_customkey={job.startTime.getOrElse(-1).toString}>
+  {formattedSubmissionTime}
+</td>
+<td sorttable_customkey={duration.getOrElse(-1).toString}>{formattedDuration}</td>
+<td class="stage-progress-cell">
+  {job.completedStageIndices.size}/{job.stageIds.size}
+  {if (job.numFailedStages > 0) s"(${job.numFailedStages} failed)" else ""}
+</td>
+<td class="progress-cell">
+  {UIUtils.makeProgressBar(job.numActiveTasks, job.numCompletedTasks,
+  job.numFailedTasks, job.numTasks)}
+</td>
+  </tr>
+}
+
+<table class="table table-bordered table-striped table-condensed sortable">
+  <thead>{columns}</thead>
+  <tbody>
+{jobs.map(makeRow)}
+  </tbody>
+</table>
+  }
+
+  def render(request: HttpServletRequest): Seq[Node] = {
+listener.synchronized {
+  val activeJobs = listener.activeJobs.values.toSeq
+  val completedJobs = listener.completedJobs.reverse.toSeq
+  val failedJobs = listener.failedJobs.reverse.toSeq
+  val now = System.currentTimeMillis
+
+  val activeJobsTable =
+jobsTable(activeJobs.sortBy(_.startTime.getOrElse(-1L)).reverse)
+  val completedJobsTable =
+jobsTable(completedJobs.sortBy(_.endTime.getOrElse(-1L)).reverse)
+  val failedJobsTable =
+jobsTable(failedJobs.sortBy(_.endTime.getOrElse(-1L)).reverse)
+
+  val summary: NodeSeq =
  

[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/3009#issuecomment-63854023
  
All superficial comments on the code, but I tried it out and still got this 
somewhat icky situation where a completed stage has fewer succeeded tasks than 
the total:


![image](https://cloud.githubusercontent.com/assets/1108612/5130344/a80bdca2-709e-11e4-905b-9e5baebec7c9.png)

Is this still the expected behavior? (this happened from running val rdd = 
sc.parallelize(1 to 10, 2).map((_, 1)).reduceByKey(_+_) and then counting the 
elements twice)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3009#discussion_r20666049
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala ---
@@ -195,9 +180,10 @@ private[ui] class StageTableBase(
 
 private[ui] class FailedStageTable(
 stages: Seq[StageInfo],
-parent: JobProgressTab,
-killEnabled: Boolean = false)
-  extends StageTableBase(stages, parent, killEnabled) {
+basePath: String,
+listener: JobProgressListener,
+isFairScheduler: Boolean)
+  extends StageTableBase(stages, basePath, listener, isFairScheduler, 
killEnabled = false) {
--- End diff --

Not your change, but weird how `killEnabled` is an attribute of the 
`StageTableBase`. The right thing to do is to have a `KillableStageTable` or 
something. We can fix this later (no action needed on your part)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/3009#issuecomment-63855330
  
Also, I thought more about having this in 1.2, and I'm -0.5 on putting this 
in 1.2.  Given all of the subtleties you ended up running into with this, Josh, 
I don't think it's a good idea to put it in the release at the last minute 
without giving folks plenty of time to test it out.  Based on my (admittedly 
limited!) understanding of most Spark users, the UI page is one of the first 
things a user will look at, and I don't think pushing a major change to what the 
user first sees when he or she interacts with Spark this late in the release 
cycle is a good idea.  If I were a Spark user and had successfully tried the 
preview release Patrick had posted, and then later found that the final 1.2 
release changed the landing page for the UI in a buggy / unintuitive / <insert 
unexpected bug here> way, I think I'd be annoyed and would question the Spark QA 
process.

I'm certainly willing to be overruled if others are less concerned about 
the points I mentioned.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4439] [MLlib] add python api for random...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3320#issuecomment-63856051
  
  [Test build #23677 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23677/consoleFull)
 for   PR 3320 at commit 
[`e0df852`](https://github.com/apache/spark/commit/e0df852ab4f353b9f800fe5374195fee5a06aa52).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Add example that reads a local file, writes to...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3347#issuecomment-63856412
  
  [Test build #23675 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23675/consoleFull)
 for   PR 3347 at commit 
[`af8ccb7`](https://github.com/apache/spark/commit/af8ccb785098f3c3a99f9a9d9675abbe989401a0).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3381#issuecomment-63856447
  
  [Test build #23676 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23676/consoleFull)
 for   PR 3381 at commit 
[`f7c704a`](https://github.com/apache/spark/commit/f7c704af4d615977c43b8f6af87c5166aee0ac03).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3381#issuecomment-63857314
  
  [Test build #23676 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23676/consoleFull)
 for   PR 3381 at commit 
[`f7c704a`](https://github.com/apache/spark/commit/f7c704af4d615977c43b8f6af87c5166aee0ac03).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `final class Date extends Ordered[Date] with Serializable `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3381#issuecomment-63857315
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23676/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...

2014-11-20 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/3366#issuecomment-63859043
  
What are the cases where we should disable map-side aggregation?
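
For context, the Scala `combineByKey` API already exposes this switch; a sketch of the classic case where disabling it helps, namely keys that are almost all distinct, so per-partition pre-aggregation buffers data without shrinking it (this illustrates the concept with the Scala RDD API, not the Python API this PR proposes):
```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("map-side-combine-demo").setMaster("local[2]"))

// Nearly every key is unique, so map-side combining cannot reduce the data
// volume; it only adds hashing and buffering work before the shuffle.
val pairs = sc.parallelize(1 to 1000000).map(x => (x, 1))

val counts = pairs.combineByKey[Int](
  (v: Int) => v,                 // createCombiner
  (c: Int, v: Int) => c + v,     // mergeValue
  (c1: Int, c2: Int) => c1 + c2, // mergeCombiners
  new HashPartitioner(8),
  mapSideCombine = false)        // skip the per-partition pre-aggregation step

println(counts.count())
```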


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/3009#discussion_r20669493
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala ---
@@ -56,7 +56,11 @@ case class SparkListenerTaskEnd(
   extends SparkListenerEvent
 
 @DeveloperApi
-case class SparkListenerJobStart(jobId: Int, stageIds: Seq[Int], 
properties: Properties = null)
+case class SparkListenerJobStart(
+jobId: Int,
+stageInfos: Seq[StageInfo],
+stageIds: Seq[Int],  // Note: this is here for backwards-compatibility
--- End diff --

The issue is if someone with an older version of Spark reads a log message 
from a newer version. We can't go back and modify how older versions parse the 
fields, so we need to include the fields expected by older versions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/3009#discussion_r20669574
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala ---
@@ -56,7 +56,11 @@ case class SparkListenerTaskEnd(
   extends SparkListenerEvent
 
 @DeveloperApi
-case class SparkListenerJobStart(jobId: Int, stageIds: Seq[Int], 
properties: Properties = null)
+case class SparkListenerJobStart(
+jobId: Int,
+stageInfos: Seq[StageInfo],
+stageIds: Seq[Int],  // Note: this is here for backwards-compatibility
--- End diff --

The problem I described could be fixed by just modifying the serializer and 
not the message here. However, we'd also like to provide binary compatibility 
for people who wrote custom listeners and expect this field to be there.
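
One way to keep the old accessor visible while adding the richer field is a derived `val`; a self-contained sketch with invented names (not necessarily what this PR ends up doing, and note that `apply`/`copy`/`unapply` signatures still change):
```scala
// Plain-Scala sketch of the compatibility trick, with made-up types.
case class StageInfoLite(stageId: Int, name: String)

case class JobStartEventSketch(
    jobId: Int,
    stageInfos: Seq[StageInfoLite]) {
  // The old API exposed `stageIds` directly; keeping it as a derived val
  // preserves the accessor method that existing listeners call.
  val stageIds: Seq[Int] = stageInfos.map(_.stageId)
}

// Old-style callers keep compiling against the accessor.
val event = JobStartEventSketch(jobId = 0, stageInfos = Seq(StageInfoLite(1, "count")))
assert(event.stageIds == Seq(1))
```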


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] elimin...

2014-11-20 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/2524#issuecomment-63862394
  
@CodingCat I think we discussed in 
https://issues.apache.org/jira/browse/SPARK-3628 that it would be best to do 
this only for result stages first. Can you do that? We can't fully guarantee 
these semantics for transformations, for two reasons:
* A shuffle stage may be resubmitted once the old one is garbage-collected 
(if periodic cleanup is on)
* If you use an accumulator in a pipelined transformation like a map(), and 
then you make a new RDD built on top of that (e.g. apply another map() to it), 
it won't count as the same stage so you'll still get the updates twice

I think we can clarify our documentation to say accumulators offer this 
guarantee only in actions, and should be used more as counters in other 
settings. It would also lead to a *much* simpler patch, which is highly 
preferred for a bug fix.
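
The second point is easy to reproduce; a small sketch, assuming a local `SparkContext` named `sc` and the Spark 1.x accumulator API:
```scala
// Each action below runs its own job, and the pipelined map() is recomputed
// for the second job, so the accumulator receives the increments twice.
val acc = sc.accumulator(0, "mapCalls")
val mapped = sc.parallelize(1 to 100, 2).map { x => acc += 1; x }

mapped.count()             // acc.value == 100 after this action
mapped.map(_ * 2).count()  // the map above is re-run: acc.value == 200 now
println(acc.value)
```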


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3009#issuecomment-63862396
  
Hi Kay,

The behavior you noticed is intentional. If a job completes but doesn't run 
all of its stages, it ends up showing as finished with a partially completed 
progress bar.

IMO, this should go in the release because it's one of the most requested 
features for Spark (giving better insight into the runtime performance of a job). 
Because of the way the UI and listeners are structured, it's very low risk and cannot 
interfere with other lower-level functionality. Also, the existing Stage page 
is left unmodified, so this is purely additive.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] elimin...

2014-11-20 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/2524#discussion_r20670036
  
--- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala ---
@@ -282,7 +285,6 @@ private object Accumulators {
 return ret
   }
 
-  // Add values to the original accumulators with some given IDs
--- End diff --

Why was this comment removed?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] elimin...

2014-11-20 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/2524#discussion_r20670065
  
--- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala ---
@@ -226,9 +227,12 @@ GrowableAccumulableParam[R <% Growable[T] with 
TraversableOnce[T] with Serializa
  * @param param helper object defining how to add elements of type `T`
  * @tparam T result type
  */
-class Accumulator[T](@transient initialValue: T, param: 
AccumulatorParam[T], name: Option[String])
+class Accumulator[T](@transient initialValue: T, param: 
AccumulatorParam[T],
+ name: Option[String])
 extends Accumulable[T,T](initialValue, param, name) {
-  def this(initialValue: T, param: AccumulatorParam[T]) = 
this(initialValue, param, None)
+
+  def this(initialValue: T, param: AccumulatorParam[T]) =
+this(initialValue, param, None)
--- End diff --

Why was formatting changed here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4513][SQL] Support relational operator ...

2014-11-20 Thread ravipesala
GitHub user ravipesala opened a pull request:

https://github.com/apache/spark/pull/3387

[SPARK-4513][SQL] Support relational operator '<=>' in Spark SQL

The relational operator '<=>' is not working in Spark SQL, while the same works 
in Spark HiveQL.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/spark <=>

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3387.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3387


commit 7198e90fd6458bc44c0c40762c0d493d240e5e69
Author: ravipesala ravindra.pes...@huawei.com
Date:   2014-11-20T19:04:04Z

Supporting relational operator '<=>' in Spark SQL
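
For readers skimming the digest: `<=>` is the null-safe equality operator, which evaluates to true when both operands are NULL (a plain `=` yields NULL there). A minimal illustration, assuming a `SQLContext` named `sqlContext` with tables `t1` and `t2` already registered:
```scala
// Null-safe join condition: rows whose keys are both NULL still match.
val joined = sqlContext.sql(
  "SELECT t1.key, t2.value FROM t1 JOIN t2 ON t1.key <=> t2.key")
joined.collect().foreach(println)
```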




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] elimin...

2014-11-20 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/2524#issuecomment-63863153
  
BTW when doing this only for result stages, my suggestion is to use the 
data structures within the stage instead of having a second HashMap. I believe 
I mentioned this before too (maybe on the previous PR): all you need to do is 
move the accumulator update code within the `if (!job.finished(rt.outputId)) {` 
for such stages, similar to how it only fetches results once for each task. 
Again the point is to avoid adding a new data structure in DAGScheduler that we 
must then carefully manage and clean up.
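
A toy model of the guard being suggested, with invented names (this is not DAGScheduler code, just the `if (!job.finished(rt.outputId))` pattern in isolation):
```scala
// Merge a result task's accumulator updates only the first time its output
// partition is marked finished, so a re-executed or speculative result task
// cannot double count.
class ResultJobModel(numPartitions: Int) {
  private val finished = Array.fill(numPartitions)(false)
  private var accumulatorValue = 0L

  def handleResultTaskCompletion(outputId: Int, accumUpdate: Long): Unit = {
    if (!finished(outputId)) {
      accumulatorValue += accumUpdate  // applied exactly once per partition
      finished(outputId) = true
    }
  }

  def value: Long = accumulatorValue
}

val job = new ResultJobModel(numPartitions = 2)
job.handleResultTaskCompletion(0, 10)
job.handleResultTaskCompletion(0, 10)  // duplicate completion is ignored
job.handleResultTaskCompletion(1, 10)
assert(job.value == 20)
```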


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4513][SQL] Support relational operator ...

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3387#issuecomment-63863591
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: add Sphinx as a dependency of building docs

2014-11-20 Thread davies
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/3388

add Sphinx as a dependency of building docs



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark doc_readme

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3388.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3388


commit daa14828b0e9d8ad37bdae072963e97166c484a1
Author: Davies Liu dav...@databricks.com
Date:   2014-11-20T19:22:51Z

add Sphinx dependency




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: add Sphinx as a dependency of building docs

2014-11-20 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/3388#issuecomment-63863907
  
cc @pwendell 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: add Sphinx as a dependency of building docs

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3388#issuecomment-63864808
  
  [Test build #23678 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23678/consoleFull)
 for   PR 3388 at commit 
[`daa1482`](https://github.com/apache/spark/commit/daa14828b0e9d8ad37bdae072963e97166c484a1).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] elimin...

2014-11-20 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/2524#issuecomment-63866302
  
@mateiz, I see... I had the impression that we had agreed to still support 
shuffle-stage deduplication eventually...

OK, I can shrink this patch to only support result stages.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4439] [MLlib] add python api for random...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3320#issuecomment-63869584
  
  [Test build #23677 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23677/consoleFull)
 for   PR 3320 at commit 
[`e0df852`](https://github.com/apache/spark/commit/e0df852ab4f353b9f800fe5374195fee5a06aa52).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4439] [MLlib] add python api for random...

2014-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3320#issuecomment-63869599
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23677/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3200#issuecomment-63870038
  
  [Test build #23679 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23679/consoleFull)
 for   PR 3200 at commit 
[`9ae85aa`](https://github.com/apache/spark/commit/9ae85aa1ebabdc099d7f655bc1d9021d34d2910f).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Add example that reads a local file, writes to...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3347#issuecomment-63870055
  
  [Test build #23675 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23675/consoleFull)
 for   PR 3347 at commit 
[`af8ccb7`](https://github.com/apache/spark/commit/af8ccb785098f3c3a99f9a9d9675abbe989401a0).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages

2014-11-20 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3009#discussion_r20673960
  
--- Diff: 
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
@@ -144,11 +146,30 @@ class JobProgressListener(conf: SparkConf) extends 
SparkListener with Logging {
   }
 
   override def onJobStart(jobStart: SparkListenerJobStart) = synchronized {
-val jobGroup = 
Option(jobStart.properties).map(_.getProperty(SparkContext.SPARK_JOB_GROUP_ID))
+val jobGroup = for (
+  props - Option(jobStart.properties);
+  group - Option(props.getProperty(SparkContext.SPARK_JOB_GROUP_ID))
+) yield group
 val jobData: JobUIData =
-  new JobUIData(jobStart.jobId, jobStart.stageIds, jobGroup, 
JobExecutionStatus.RUNNING)
+  new JobUIData(jobStart.jobId, Some(System.currentTimeMillis), None, 
jobStart.stageIds,
+jobGroup, JobExecutionStatus.RUNNING)
+// Compute (a potential underestimate of) the number of tasks that 
will be run by this job:
--- End diff --

How's this for an explanation?

```
// Compute (a potential underestimate of) the number of tasks that will be run
// by this job.
// This may be an underestimate because the job start event references all of
// the result stages' transitive stage dependencies, but some of these stages
// might be skipped if their output is available from earlier runs.
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4477] [PySpark] remove numpy from RDDSa...

2014-11-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3351#issuecomment-63871878
  
  [Test build #23680 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23680/consoleFull)
 for   PR 3351 at commit 
[`ee17d78`](https://github.com/apache/spark/commit/ee17d7846438e270967e38120d0fb6c63523defd).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


