[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3381#issuecomment-63772820 [Test build #23665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23665/consoleFull) for PR 3381 at commit [`f7c704a`](https://github.com/apache/spark/commit/f7c704af4d615977c43b8f6af87c5166aee0ac03).
* This patch merges cleanly.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3381#issuecomment-63773096 [Test build #23665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23665/consoleFull) for PR 3381 at commit [`f7c704a`](https://github.com/apache/spark/commit/f7c704af4d615977c43b8f6af87c5166aee0ac03).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `final class Date extends Ordered[Date] with Serializable`
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3381#issuecomment-63773100 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23665/ Test FAILed.
[GitHub] spark pull request: add jackson-core-asl 1.8.8 dependency
Github user devlatte commented on the pull request: https://github.com/apache/spark/pull/3379#issuecomment-63773937 It might be related to this issue: https://issues.apache.org/jira/browse/SPARK-3602
[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3374#issuecomment-63774229 [Test build #23663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23663/consoleFull) for PR 3374 at commit [`7097251`](https://github.com/apache/spark/commit/70972515085245957df9601e425141746f268c4b).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3374#issuecomment-63774234 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23663/ Test PASSed.
[GitHub] spark pull request: [SPARK-4505][Core] Add a ClassTag parameter to...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3378#issuecomment-63774979 We should definitely add a ClassTag since this can be used for primitive types. However, there might be places where we create a lot of CompactBuffers. I haven't had a chance to look at where CompactBuffers are used yet, but for those places, would it be possible to create a single ClassTag reference?
[GitHub] spark pull request: [SPARK-4505][Core] Add a ClassTag parameter to...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/3378#issuecomment-63775682 Cogroup uses `CompactBuffer`. However, it cannot add a ClassTag due to its signature:
```scala
class CoGroupedRDD[K](@transient var rdds: Seq[RDD[_ <: Product2[K, _]]], part: Partitioner)
  extends RDD[(K, Array[Iterable[_]])](rdds.head.context, Nil)
```
Here `rdds` is `Seq[RDD[_ <: Product2[K, _]]]`, without the concrete element type of the `RDD`s.
[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3374#issuecomment-63777328 @manishamde @jkbradley Thanks! Merged into master and branch-1.2.
[GitHub] spark pull request: [SPARK-4439] [MLlib] add python api for random...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3320#issuecomment-63777609 @davies We updated the `RandomForest` API in #3374. Now `RandomForest` returns a `RandomForestModel`. Could you rebase and update this PR? Thanks!
[GitHub] spark pull request: [SPARK-4486][MLLIB] Improve GradientBoosting A...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3374
[GitHub] spark pull request: [SPARK-4481][Streaming][Doc] Fix the wrong des...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3376#issuecomment-63779898 I have merged this. Thanks for the backport!
[GitHub] spark pull request: [SPARK-4481][Streaming][Doc] Fix the wrong des...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3376#issuecomment-63780029 GitHub won't close this automatically, so could you please close this PR?
[GitHub] spark pull request: [SPARK-4510][MLlib]: Add k-medoids Partitionin...
GitHub user fjiang6 opened a pull request: https://github.com/apache/spark/pull/3382 [SPARK-4510][MLlib]: Add k-medoids Partitioning Around Medoids (PAM) algorithm

PAM (k-medoids), including the test case and an example. Passed the style checks. Tested and compared with K-Means in MLlib, showing more stable performance.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Huawei-Spark/spark PAM
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3382.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3382

commit 95cd43e21a5e4499fd63125dde6973a4271a0de2
Author: Jiang Fan fjia...@gmail.com
Date: 2014-11-20T02:59:18Z
add PAM algorithm with an example

commit 8721fc2d0fead9e72427909d5dab455e7dcd67f9
Author: Jiang Fan fjia...@gmail.com
Date: 2014-11-20T03:05:43Z
add newline at end of file

commit 9b4131a3fee5e9cd5a7ac58c7718b78236412f7e
Author: Jiang Fan fjia...@gmail.com
Date: 2014-11-20T05:05:06Z
add the PAMSuite.scala
[GitHub] spark pull request: [SPARK-4481][Streaming][Doc] Fix the wrong des...
Github user zsxwing closed the pull request at: https://github.com/apache/spark/pull/3376
[GitHub] spark pull request: [SPARK-4510][MLlib]: Add k-medoids Partitionin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3382#issuecomment-63781094 [Test build #23666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23666/consoleFull) for PR 3382 at commit [`9b4131a`](https://github.com/apache/spark/commit/9b4131a3fee5e9cd5a7ac58c7718b78236412f7e).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4510][MLlib]: Add k-medoids Partitionin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3382#issuecomment-63791400 [Test build #23666 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23666/consoleFull) for PR 3382 at commit [`9b4131a`](https://github.com/apache/spark/commit/9b4131a3fee5e9cd5a7ac58c7718b78236412f7e).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Params(`
  * `class PAM (`
  * `class PAMModel (val clusterCenters: Array[Vector]) extends Serializable`
[GitHub] spark pull request: [SPARK-4510][MLlib]: Add k-medoids Partitionin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3382#issuecomment-63791407 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23666/ Test PASSed.
[GitHub] spark pull request: [SPARK-3938][SQL] Names in-memory columnar RDD...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/3383 [SPARK-3938][SQL] Names in-memory columnar RDD with corresponding table name

This PR enables the Web UI storage tab to show the in-memory table name instead of the mysterious query plan string as the name of the in-memory columnar RDD. Note that after #2501, a single columnar RDD can be shared by multiple in-memory tables, as long as their query results are the same. In this case, only the first cached table name is shown. For example:
```sql
CACHE TABLE first AS SELECT * FROM src;
CACHE TABLE second AS SELECT * FROM src;
```
The Web UI only shows "In-memory table first".

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/liancheng/spark columnar-rdd-name
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3383.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3383

commit 12ddfa6eea5b0b0b7ebbb8fffd802d87e6727493
Author: Cheng Lian l...@databricks.com
Date: 2014-11-20T10:35:57Z
Names in-memory columnar RDD with corresponding table name
[GitHub] spark pull request: [SPARK-3938][SQL] Names in-memory columnar RDD...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3383#issuecomment-63792868 [Test build #23667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23667/consoleFull) for PR 3383 at commit [`12ddfa6`](https://github.com/apache/spark/commit/12ddfa6eea5b0b0b7ebbb8fffd802d87e6727493).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3938][SQL] Names in-memory columnar RDD...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3383#issuecomment-63793690 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23667/ Test FAILed.
[GitHub] spark pull request: [SPARK-3938][SQL] Names in-memory columnar RDD...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3383#issuecomment-63793685 [Test build #23667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23667/consoleFull) for PR 3383 at commit [`12ddfa6`](https://github.com/apache/spark/commit/12ddfa6eea5b0b0b7ebbb8fffd802d87e6727493).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: add jackson-core-asl 1.8.8 dependency
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3379#issuecomment-63796464 This is basically the same as https://issues.apache.org/jira/browse/SPARK-3955 actually, which already has an open PR: https://github.com/apache/spark/pull/2818 Maybe comment on that PR instead.
[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3237#issuecomment-63803519 [Test build #23668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23668/consoleFull) for PR 3237 at commit [`5fd7afd`](https://github.com/apache/spark/commit/5fd7afd6a0c724151340719e2b017357e042300c).
* This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3237#issuecomment-63808836 [Test build #23669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23669/consoleFull) for PR 3237 at commit [`3abdb1b`](https://github.com/apache/spark/commit/3abdb1b24aa48f21e7eed1232c01d3933873688c).
* This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-3938][SQL] Names in-memory columnar RDD...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/3383#issuecomment-63810342 +10 (haven't looked at PR itself)
[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3237#issuecomment-63810730 [Test build #23670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23670/consoleFull) for PR 3237 at commit [`b589a4b`](https://github.com/apache/spark/commit/b589a4b94c470f10aec0cc778060cd49470354d5).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3938][SQL] Names in-memory columnar RDD...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3383#issuecomment-63814934 [Test build #23671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23671/consoleFull) for PR 3383 at commit [`071907f`](https://github.com/apache/spark/commit/071907f10d9e943370ff76f4b00d7abdc1db1017).
* This patch merges cleanly.
[GitHub] spark pull request: Merge pull request #1 from apache/master
GitHub user codeAshu opened a pull request: https://github.com/apache/spark/pull/3384 Merge pull request #1 from apache/master

updating Fork

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/codeAshu/spark master
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3384.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3384

commit 878e816bd8b9a6348d8e82194e329c1cefaca8bc
Author: Ashutosh Trivedi rusty.iceb...@gmail.com
Date: 2014-10-24T07:35:52Z
Merge pull request #1 from apache/master updating Fork
[GitHub] spark pull request: Merge pull request #1 from apache/master
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3384#issuecomment-63815959 Can one of the admins verify this patch?
[GitHub] spark pull request: Merge pull request #1 from apache/master
Github user codeAshu closed the pull request at: https://github.com/apache/spark/pull/3384
[GitHub] spark pull request: Merge pull request #1 from apache/master
Github user codeAshu commented on the pull request: https://github.com/apache/spark/pull/3384#issuecomment-63816089 sorry my bad
[GitHub] spark pull request: Branch 1.2
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3385#issuecomment-63817444 Can one of the admins verify this patch?
[GitHub] spark pull request: Branch 1.2
GitHub user codeAshu opened a pull request: https://github.com/apache/spark/pull/3385 Branch 1.2 You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/spark branch-1.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3385.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3385 commit a68321400c1068449698d03cebd0fbf648627133 Author: Xiangrui Meng m...@databricks.com Date: 2014-11-03T20:24:24Z [SPARK-4148][PySpark] fix seed distribution and add some tests for rdd.sample The current way of seed distribution makes the random sequences from partition i and i+1 offset by 1. ~~~ In [14]: import random In [15]: r1 = random.Random(10) In [16]: r1.randint(0, 1) Out[16]: 1 In [17]: r1.random() Out[17]: 0.4288890546751146 In [18]: r1.random() Out[18]: 0.5780913011344704 In [19]: r2 = random.Random(10) In [20]: r2.randint(0, 1) Out[20]: 1 In [21]: r2.randint(0, 1) Out[21]: 0 In [22]: r2.random() Out[22]: 0.5780913011344704 ~~~ Note: The new tests are not for this bug fix. Author: Xiangrui Meng m...@databricks.com Closes #3010 from mengxr/SPARK-4148 and squashes the following commits: 869ae4b [Xiangrui Meng] move tests tests.py c1bacd9 [Xiangrui Meng] fix seed distribution and add some tests for rdd.sample (cherry picked from commit 3cca1962207745814b9d83e791713c91b659c36c) Signed-off-by: Xiangrui Meng m...@databricks.com commit fc782896b5d51161feee950107df2acf17e12422 Author: fi code...@gmail.com Date: 2014-11-03T20:56:56Z [SPARK-4211][Build] Fixes hive.version in Maven profile hive-0.13.1 instead of `hive.version=0.13.1`. e.g. mvn -Phive -Phive=0.13.1 Note: `hive.version=0.13.1a` is the default property value. However, when explicitly specifying the `hive-0.13.1` maven profile, the wrong one would be selected. 
References: PR #2685, which resolved a package incompatibility issue with Hive-0.13.1 by introducing a special version Hive-0.13.1a Author: fi code...@gmail.com Closes #3072 from coderfi/master and squashes the following commits: 7ca4b1e [fi] Fixes the `hive-0.13.1` maven profile referencing `hive.version=0.13.1` instead of the Spark compatible `hive.version=0.13.1a` Note: `hive.version=0.13.1a` is the default version. However, when explicitly specifying the `hive-0.13.1` maven profile, the wrong one would be selected. e.g. mvn -Phive -Phive=0.13.1 See PR #2685 (cherry picked from commit df607da025488d6c924d3d70eddb67f5523080d3) Signed-off-by: Michael Armbrust mich...@databricks.com commit 292da4ef25d6cce23bfde7b9ab663a574dfd2b00 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-11-03T21:07:41Z [SPARK-4207][SQL] Query which has syntax like 'not like' is not working in Spark SQL Queries which has 'not like' is not working spark sql. sql(SELECT * FROM records where value not like 'val%') same query works in Spark HiveQL Author: ravipesala ravindra.pes...@huawei.com Closes #3075 from ravipesala/SPARK-4207 and squashes the following commits: 35c11e7 [ravipesala] Supported 'not like' syntax in sql (cherry picked from commit 2b6e1ce6ee7b1ba8160bcbee97f5bbff5c46ca09) Signed-off-by: Michael Armbrust mich...@databricks.com commit cc5dc4247979dc001302f7af978801b789acdbfa Author: Davies Liu davies@gmail.com Date: 2014-11-03T21:17:09Z [SPARK-3594] [PySpark] [SQL] take more rows to infer schema or sampling This patch will try to infer schema for RDD which has empty value (None, [], {}) in the first row. It will try first 100 rows and merge the types into schema, also merge fields of StructType together. If there is still NullType in schema, then it will show an warning, tell user to try with sampling. If sampling is presented, it will infer schema from all the rows after sampling. 
Also, add samplingRatio for jsonFile() and jsonRDD().

Author: Davies Liu davies@gmail.com
Author: Davies Liu dav...@databricks.com

Closes #2716 from davies/infer and squashes the following commits:

e678f6d [Davies Liu] Merge branch 'master' of github.com:apache/spark into infer
34b5c63 [Davies Liu] Merge branch 'master' of github.com:apache/spark into infer
567dc60 [Davies Liu] update docs
9767b27 [Davies Liu] Merge branch 'master' into infer
e48d7fb [Davies Liu] fix tests
29e94d5 [Davies Liu] let NullType inherit from PrimitiveType
ee5d524 [Davies Liu] Merge branch 'master' of github.com:apache/spark into infer
540d1d5 [Davies Liu] merge fields for StructType
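The type merging that SPARK-3594 describes, scanning the first rows and merging each field's observed types so that a NullType from an empty value is absorbed by any concrete type, can be sketched in plain Python. Note that `merge_type` and `infer_schema` here are illustrative names for the idea, not PySpark's actual API:

```python
NullType = type(None)

def merge_type(a, b):
    # A NullType (from an empty value such as None) is absorbed by any
    # concrete type; identical types merge trivially.
    if a is NullType:
        return b
    if b is NullType:
        return a
    if a is b:
        return a
    raise TypeError("cannot merge %s with %s" % (a.__name__, b.__name__))

def infer_schema(rows, sample=100):
    # Merge field types across the first `sample` rows so that empty
    # values in the first row do not poison the schema.
    schema = {}
    for row in rows[:sample]:
        for field, value in row.items():
            schema[field] = merge_type(schema.get(field, NullType), type(value))
    return schema
```

With this merging, a first row like `{"a": None, "b": 1}` no longer forces field `a` to NullType if a later row supplies a concrete value.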
[GitHub] spark pull request: Branch 1.2
Github user codeAshu closed the pull request at: https://github.com/apache/spark/pull/3385
[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3237#issuecomment-63817831

[Test build #23668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23668/consoleFull) for PR 3237 at commit [`5fd7afd`](https://github.com/apache/spark/commit/5fd7afd6a0c724151340719e2b017357e042300c).

* This patch **passes all tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3237#issuecomment-63817840

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23668/
[GitHub] spark pull request: [Spark-4512] [SQL] Unresolved Attribute Except...
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/3386

[Spark-4512] [SQL] Unresolved Attribute Exception in Sort By

It causes an exception for queries like:

    SELECT key+key FROM src sort by value;

This fix is inspired by #3363; I hope it goes in after #3363 is merged. I've removed `logical.SortPartitions` and added a new attribute `global` to `logical.Sort`, so that `ResolveSortReferences` can be shared by both `ORDER BY` and `SORT BY`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark sort

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3386.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3386

commit 503b263408924df42b26689a6f8b913a5185c10d
Author: Cheng Hao hao.ch...@intel.com
Date: 2014-11-20T14:40:42Z

Remove the logical.SortPartitions and Add global sort flag for logical.Sort
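The distinction the new `global` attribute captures, a total order for `ORDER BY` versus a per-partition order for `SORT BY`, can be illustrated with a small stand-alone sketch (plain Python over lists-as-partitions, not the Catalyst operator itself):

```python
def execute_sort(partitions, is_global):
    """Model a Sort node over a partitioned dataset.

    is_global=True  models ORDER BY: one totally ordered output.
    is_global=False models SORT BY: each partition sorted independently.
    """
    if is_global:
        return [sorted(x for part in partitions for x in part)]
    return [sorted(part) for part in partitions]
```

Sharing one logical `Sort` node with a flag means a single resolution rule can handle attribute references for both query forms.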
[GitHub] spark pull request: Add example that reads a local file, writes to...
Github user rnowling commented on the pull request: https://github.com/apache/spark/pull/3347#issuecomment-63820267

@andrewor14, thanks for the comments! I believe I fixed everything except for changing the name of the example, where I wanted some more feedback. I wrote the example to test reading from and writing to a DFS. It does so by comparing the result of word count on a local file to word count on the file after copying it to the DFS. I'm using it to make sure that DFSs are configured properly and accessible by all nodes. Do you still want me to drop the test suffix? Thanks!
[GitHub] spark pull request: Add example that reads a local file, writes to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3347#issuecomment-63820290

[Test build #23672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23672/consoleFull) for PR 3347 at commit [`b0ef9ea`](https://github.com/apache/spark/commit/b0ef9ea387e031deddbe1ffda833d98eb5f42e08).

* This patch merges cleanly.
[GitHub] spark pull request: [Spark-4512] [SQL] Unresolved Attribute Except...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3386#issuecomment-63820723

[Test build #23673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23673/consoleFull) for PR 3386 at commit [`503b263`](https://github.com/apache/spark/commit/503b263408924df42b26689a6f8b913a5185c10d).

* This patch merges cleanly.
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-63821052

[Test build #23674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23674/consoleFull) for PR 3222 at commit [`af8fbb3`](https://github.com/apache/spark/commit/af8fbb3309e5e36c0ec3332c590cec5a1bcb30e0).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...
Github user giwa commented on the pull request: https://github.com/apache/spark/pull/3237#issuecomment-63822056

@tdas @watermen I believe the Python code becomes something like this:

```
def pprint(self, num):
    """Print the first num elements of each RDD generated in this DStream."""
    def takeAndPrint(time, rdd):
        taken = rdd.take(num + 1)
        print "-------------------------------------------"
        print "Time: %s" % time
        print "-------------------------------------------"
        for record in taken[:num]:
            print record
        if len(taken) > num:
            print "..."
        print
    self.foreachRDD(takeAndPrint)
```
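The core trick in the snippet above, taking `num + 1` elements so that truncation can be detected, works on any sequence. A stand-alone sketch (`format_preview` is a hypothetical name, not PySpark API):

```python
def format_preview(records, num):
    # Take one extra record: if we got more than `num`, the input was
    # truncated, so append an ellipsis marker after the shown records.
    taken = records[:num + 1]
    lines = [str(r) for r in taken[:num]]
    if len(taken) > num:
        lines.append("...")
    return lines
```

Requesting exactly `num` records would make it impossible to tell "exactly num" from "more than num" without a second pass over the data.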
[GitHub] spark pull request: [Spark-4512] [SQL] Unresolved Attribute Except...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3386#issuecomment-63822505

[Test build #23673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23673/consoleFull) for PR 3386 at commit [`503b263`](https://github.com/apache/spark/commit/503b263408924df42b26689a6f8b913a5185c10d).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Sort(`
[GitHub] spark pull request: [Spark-4512] [SQL] Unresolved Attribute Except...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3386#issuecomment-63822515

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23673/
[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3237#issuecomment-63824115

[Test build #23670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23670/consoleFull) for PR 3237 at commit [`b589a4b`](https://github.com/apache/spark/commit/b589a4b94c470f10aec0cc778060cd49470354d5).

* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3237#issuecomment-63824131

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23670/
[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3237#issuecomment-63825274

[Test build #23669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23669/consoleFull) for PR 3237 at commit [`3abdb1b`](https://github.com/apache/spark/commit/3abdb1b24aa48f21e7eed1232c01d3933873688c).

* This patch **passes all tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3237#issuecomment-63825280

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23669/
[GitHub] spark pull request: [SPARK-3938][SQL] Names in-memory columnar RDD...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3383#issuecomment-63827207

[Test build #23671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23671/consoleFull) for PR 3383 at commit [`071907f`](https://github.com/apache/spark/commit/071907f10d9e943370ff76f4b00d7abdc1db1017).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3938][SQL] Names in-memory columnar RDD...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3383#issuecomment-63827215

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23671/
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user uncleGen commented on the pull request: https://github.com/apache/spark/pull/3366#issuecomment-63831692

@davies Could you help review this patch? Thank you!
[GitHub] spark pull request: Add example that reads a local file, writes to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3347#issuecomment-63835431

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23672/
[GitHub] spark pull request: Add example that reads a local file, writes to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3347#issuecomment-63835422

[Test build #23672 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23672/consoleFull) for PR 3347 at commit [`b0ef9ea`](https://github.com/apache/spark/commit/b0ef9ea387e031deddbe1ffda833d98eb5f42e08).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...
Github user watermen commented on the pull request: https://github.com/apache/spark/pull/3237#issuecomment-63836267

@tdas See test build #23670. I had to add the exclusion

```scala
ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.streaming.api.java.JavaDStreamLike.print")
```

for Spark 1.2, but it failed the MiMa tests again. Can you tell me the reason? Thanks.
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-63836399

[Test build #23674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23674/consoleFull) for PR 3222 at commit [`af8fbb3`](https://github.com/apache/spark/commit/af8fbb3309e5e36c0ec3332c590cec5a1bcb30e0).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-63836409

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23674/
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user shaneknapp commented on the pull request: https://github.com/apache/spark/pull/3381#issuecomment-63841756

looks like a compilation error, and the reason why no unit test results were stored was that none were run (unless i'm missing something).

On Wed, Nov 19, 2014 at 11:56 PM, Daoyuan Wang notificati...@github.com wrote:

> This is a weird build error. @shaneknapp https://github.com/shaneknapp
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3009#issuecomment-63844350

I looked through this and took a spin locally, LGTM.
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3009#issuecomment-63851287

Hey @JoshRosen I'll take a look at this right now. Is it still WIP by the way?
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user shaneknapp commented on the pull request: https://github.com/apache/spark/pull/3381#issuecomment-63851763

Jenkins, test this please
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3009#discussion_r20664971

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---

    @@ -144,11 +146,30 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
       }
       override def onJobStart(jobStart: SparkListenerJobStart) = synchronized {
    -    val jobGroup = Option(jobStart.properties).map(_.getProperty(SparkContext.SPARK_JOB_GROUP_ID))
    +    val jobGroup = for (
    +      props <- Option(jobStart.properties);
    +      group <- Option(props.getProperty(SparkContext.SPARK_JOB_GROUP_ID))
    +    ) yield group
         val jobData: JobUIData =
    -      new JobUIData(jobStart.jobId, jobStart.stageIds, jobGroup, JobExecutionStatus.RUNNING)
    +      new JobUIData(jobStart.jobId, Some(System.currentTimeMillis), None, jobStart.stageIds,
    +        jobGroup, JobExecutionStatus.RUNNING)
    +    // Compute (a potential underestimate of) the number of tasks that will be run by this job:

--- End diff --

I think it would be good to explain briefly why it can potentially be an underestimate
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3009#discussion_r20665049

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---

    @@ -166,6 +188,21 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
           trimJobsIfNecessary(failedJobs)
           jobData.status = JobExecutionStatus.FAILED
         }
    +    for (stageId <- jobData.stageIds) {
    +      stageIdToActiveJobIds.get(stageId).foreach { jobsUsingStage =>
    +        jobsUsingStage.remove(jobEnd.jobId)
    +        stageIdToInfo.get(stageId).foreach { stageInfo =>
    +          // If this is a pending stage and no other job depends on it, then it won't be run.
    +          // To prevent memory leaks, remove this data since it won't be cleaned up as stages
    +          // finish / fail:
    +          if (stageInfo.submissionTime.isEmpty && stageInfo.completionTime.isEmpty &&
    +              jobsUsingStage.isEmpty) {

--- End diff --

this looks funky, can you do it like this

```
if (stageInfo.submissionTime.isEmpty && stageInfo.completionTime.isEmpty && jobsUsingStage.isEmpty) {
  ...
}
```
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3009#discussion_r20665136

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/UIData.scala ---

    @@ -40,9 +40,22 @@ private[jobs] object UIData {
       class JobUIData(
         var jobId: Int = -1,
    +    var startTime: Option[Long] = None,
    +    var endTime: Option[Long] = None,
         var stageIds: Seq[Int] = Seq.empty,
         var jobGroup: Option[String] = None,
    -    var status: JobExecutionStatus = JobExecutionStatus.UNKNOWN
    +    var status: JobExecutionStatus = JobExecutionStatus.UNKNOWN,
    +    /* Tasks */
    +    // `numTasks` is a potential underestimate of the true number of tasks that this job will run
    +    // see https://github.com/apache/spark/pull/3009 for an extensive discussion of this

--- End diff --

as in above, is it possible to provide a 1-line quick summary of why that is the case, and if the reader wants to dig deeper then they can follow the link?
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/3009#discussion_r20665208

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---

    @@ -214,6 +264,14 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
         val stages = poolToActiveStages.getOrElseUpdate(poolName, new HashMap[Int, StageInfo])
         stages(stage.stageId) = stage
    +
    +    for (
    +      activeJobsDependentOnStage <- stageIdToActiveJobIds.get(stage.stageId);
    +      jobId <- activeJobsDependentOnStage;
    +      jobData <- jobIdToData.get(jobId)
    +    ) {
    +      jobData.numActiveStages += 1
    +    }

--- End diff --

So what's the behavior now for resubmitted stages? Will they result in # completed stages > # total stages in the UI (and similarly for # tasks)?
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/3009#discussion_r20665341

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---

    @@ -214,6 +264,14 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
         val stages = poolToActiveStages.getOrElseUpdate(poolName, new HashMap[Int, StageInfo])
         stages(stage.stageId) = stage
    +
    +    for (
    +      activeJobsDependentOnStage <- stageIdToActiveJobIds.get(stage.stageId);
    +      jobId <- activeJobsDependentOnStage;
    +      jobData <- jobIdToData.get(jobId)
    +    ) {
    +      jobData.numActiveStages += 1
    +    }

--- End diff --

Ok I realized what this is (the apparent-hang behavior we discussed) -- but it would be great to add a comment somewhere describing this (maybe in JobUIData, explaining why completedStages is a set?)
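The reason tracking completed stages as a set (rather than a counter) behaves better when stages are resubmitted can be shown in miniature. This is an illustrative sketch only, not the listener code:

```python
# A fetch failure can cause a stage to be resubmitted, so the same stage
# id may produce more than one completion event.
completion_events = [1, 2, 2, 3]  # stage 2 completed twice

counter = 0
completed = set()
for stage_id in completion_events:
    counter += 1           # naive count of completion events
    completed.add(stage_id)  # de-duplicates resubmissions

# The counter double-counts stage 2 and could exceed the job's total
# stage count in the UI; the set cannot.
assert counter == 4
assert len(completed) == 3
```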
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/3009#discussion_r20665377 --- Diff: core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala --- @@ -17,16 +17,18 @@ package org.apache.spark.ui -import org.apache.spark.api.java.StorageLevels -import org.apache.spark.{SparkException, SparkConf, SparkContext} -import org.openqa.selenium.WebDriver +import org.openqa.selenium.{By, WebDriver} import org.openqa.selenium.htmlunit.HtmlUnitDriver import org.scalatest._ import org.scalatest.concurrent.Eventually._ import org.scalatest.selenium.WebBrowser import org.scalatest.time.SpanSugar._ +import org.apache.spark.api.java.StorageLevels --- End diff -- nit: this should be below the next three
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/3009#discussion_r20665632 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala --- @@ -144,11 +146,30 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging { } override def onJobStart(jobStart: SparkListenerJobStart) = synchronized { -val jobGroup = Option(jobStart.properties).map(_.getProperty(SparkContext.SPARK_JOB_GROUP_ID)) +val jobGroup = for ( + props <- Option(jobStart.properties); + group <- Option(props.getProperty(SparkContext.SPARK_JOB_GROUP_ID)) +) yield group val jobData: JobUIData = - new JobUIData(jobStart.jobId, jobStart.stageIds, jobGroup, JobExecutionStatus.RUNNING) + new JobUIData(jobStart.jobId, Some(System.currentTimeMillis), None, jobStart.stageIds, --- End diff -- Can you name each of these parameters? As is, I find it hard to read
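Kay's suggestion above is about call-site readability: with several positional `Option` and `Seq` arguments in a row, named arguments make the intent obvious. A hedged sketch of the suggested style (this `JobUIData` is a cut-down stand-in, not the real class):

```scala
// Cut-down stand-in for UIData.JobUIData, just to show the call-site style.
case class JobUIData(
    jobId: Int,
    startTime: Option[Long],
    endTime: Option[Long],
    stageIds: Seq[Int])

val jobData = JobUIData(
  jobId = 42,
  startTime = Some(System.currentTimeMillis),
  endTime = None, // job still running
  stageIds = Seq(0, 1, 2))
```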
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/3009#discussion_r20665705 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala --- @@ -56,7 +56,11 @@ case class SparkListenerTaskEnd( extends SparkListenerEvent @DeveloperApi -case class SparkListenerJobStart(jobId: Int, stageIds: Seq[Int], properties: Properties = null) +case class SparkListenerJobStart( +jobId: Int, +stageInfos: Seq[StageInfo], +stageIds: Seq[Int], // Note: this is here for backwards-compatibility --- End diff -- Why do you need this for backwards compatibility? Can the JsonProtocol do something smarter where it just fills in the StageInfos with 0 except for the stageIds?
[GitHub] spark pull request: [SPARK-4439] [MLlib] add python api for random...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/3320#issuecomment-63853606 @mengxr done.
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3009#discussion_r20665717

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ui.jobs
+
+import scala.xml.{Node, NodeSeq}
+
+import javax.servlet.http.HttpServletRequest
+
+import org.apache.spark.ui.{WebUIPage, UIUtils}
+import org.apache.spark.ui.jobs.UIData.JobUIData
+
+
+/** Page showing list of all ongoing and recently finished jobs */
+private[ui] class AllJobsPage(parent: JobsTab) extends WebUIPage("") {
+  private val startTime: Option[Long] = parent.sc.map(_.startTime)
+  private val listener = parent.listener
+
+  private def jobsTable(jobs: Seq[JobUIData]): Seq[Node] = {
+    val someJobHasJobGroup = jobs.exists(_.jobGroup.isDefined)
+
+    val columns: Seq[Node] = {
+      <th>{if (someJobHasJobGroup) "Job Id (Job Group)" else "Job Id"}</th>
+      <th>Description</th>
+      <th>Submitted</th>
+      <th>Duration</th>
+      <th class="sorttable_nosort">Stages: Succeeded/Total</th>
+      <th class="sorttable_nosort">Tasks (for all stages): Succeeded/Total</th>
+    }
+
+    def makeRow(job: JobUIData): Seq[Node] = {
+      val lastStageInfo = listener.stageIdToInfo.get(job.stageIds.max)
+      val lastStageData = lastStageInfo.flatMap { s =>
+        listener.stageIdToData.get((s.stageId, s.attemptId))
+      }
+      val lastStageName = lastStageInfo.map(_.name).getOrElse("(Unknown Stage Name)")
+      val lastStageDescription = lastStageData.flatMap(_.description).getOrElse("")
+      val duration: Option[Long] = {
+        job.startTime.map { start =>
+          val end = job.endTime.getOrElse(System.currentTimeMillis())
+          end - start
+        }
+      }
+      val formattedDuration = duration.map(d => UIUtils.formatDuration(d)).getOrElse("Unknown")
+      val formattedSubmissionTime = job.startTime.map(UIUtils.formatDate).getOrElse("Unknown")
--- End diff --

I realize we use `Unknown` in a few places. Can you declare a
```
val UNKNOWN: String = "Unknown"
```
in `UIUtils`?
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3009#discussion_r20665799

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ui.jobs
+
+import scala.xml.{Node, NodeSeq}
+
+import javax.servlet.http.HttpServletRequest
+
+import org.apache.spark.ui.{WebUIPage, UIUtils}
+import org.apache.spark.ui.jobs.UIData.JobUIData
+
+
+/** Page showing list of all ongoing and recently finished jobs */
+private[ui] class AllJobsPage(parent: JobsTab) extends WebUIPage("") {
+  private val startTime: Option[Long] = parent.sc.map(_.startTime)
+  private val listener = parent.listener
+
+  private def jobsTable(jobs: Seq[JobUIData]): Seq[Node] = {
+    val someJobHasJobGroup = jobs.exists(_.jobGroup.isDefined)
+
+    val columns: Seq[Node] = {
+      <th>{if (someJobHasJobGroup) "Job Id (Job Group)" else "Job Id"}</th>
+      <th>Description</th>
+      <th>Submitted</th>
+      <th>Duration</th>
+      <th class="sorttable_nosort">Stages: Succeeded/Total</th>
+      <th class="sorttable_nosort">Tasks (for all stages): Succeeded/Total</th>
+    }
+
+    def makeRow(job: JobUIData): Seq[Node] = {
+      val lastStageInfo = listener.stageIdToInfo.get(job.stageIds.max)
+      val lastStageData = lastStageInfo.flatMap { s =>
+        listener.stageIdToData.get((s.stageId, s.attemptId))
+      }
+      val lastStageName = lastStageInfo.map(_.name).getOrElse("(Unknown Stage Name)")
+      val lastStageDescription = lastStageData.flatMap(_.description).getOrElse("")
+      val duration: Option[Long] = {
+        job.startTime.map { start =>
+          val end = job.endTime.getOrElse(System.currentTimeMillis())
+          end - start
+        }
+      }
+      val formattedDuration = duration.map(d => UIUtils.formatDuration(d)).getOrElse("Unknown")
+      val formattedSubmissionTime = job.startTime.map(UIUtils.formatDate).getOrElse("Unknown")
+      val detailUrl =
+        "%s/jobs/job?id=%s".format(UIUtils.prependBaseUri(parent.basePath), job.jobId)
+
+      <tr>
+        <td sorttable_customkey={job.jobId.toString}>
+          {job.jobId} {job.jobGroup.map(id => s"($id)").getOrElse("")}
+        </td>
+        <td>
+          <div><em>{lastStageDescription}</em></div>
+          <a href={detailUrl}>{lastStageName}</a>
+        </td>
+        <td sorttable_customkey={job.startTime.getOrElse(-1).toString}>
+          {formattedSubmissionTime}
+        </td>
+        <td sorttable_customkey={duration.getOrElse(-1).toString}>{formattedDuration}</td>
+        <td class="stage-progress-cell">
+          {job.completedStageIndices.size}/{job.stageIds.size}
+          {if (job.numFailedStages > 0) s"(${job.numFailedStages} failed)" else ""}
+        </td>
+        <td class="progress-cell">
+          {UIUtils.makeProgressBar(job.numActiveTasks, job.numCompletedTasks,
+            job.numFailedTasks, job.numTasks)}
+        </td>
+      </tr>
+    }
+
+    <table class="table table-bordered table-striped table-condensed sortable">
+      <thead>{columns}</thead>
+      <tbody>
+        {jobs.map(makeRow)}
+      </tbody>
+    </table>
+  }
+
+  def render(request: HttpServletRequest): Seq[Node] = {
+    listener.synchronized {
+      val activeJobs = listener.activeJobs.values.toSeq
+      val completedJobs = listener.completedJobs.reverse.toSeq
+      val failedJobs = listener.failedJobs.reverse.toSeq
+      val now = System.currentTimeMillis
+
+      val activeJobsTable =
+        jobsTable(activeJobs.sortBy(_.startTime.getOrElse(-1L)).reverse)
+      val completedJobsTable =
+        jobsTable(completedJobs.sortBy(_.endTime.getOrElse(-1L)).reverse)
+      val failedJobsTable =
+        jobsTable(failedJobs.sortBy(_.endTime.getOrElse(-1L)).reverse)
+
+      val summary: NodeSeq =
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/3009#issuecomment-63854023 All superficial comments on the code, but I tried it out and still got this somewhat icky situation where a completed stage has fewer succeeded tasks than the total: ![image](https://cloud.githubusercontent.com/assets/1108612/5130344/a80bdca2-709e-11e4-905b-9e5baebec7c9.png) Is this still the expected behavior? (This happened from running `val rdd = sc.parallelize(1 to 10, 2).map((_, 1)).reduceByKey(_ + _)` and then counting the elements twice.)
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3009#discussion_r20666049 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala --- @@ -195,9 +180,10 @@ private[ui] class StageTableBase( private[ui] class FailedStageTable( stages: Seq[StageInfo], -parent: JobProgressTab, -killEnabled: Boolean = false) - extends StageTableBase(stages, parent, killEnabled) { +basePath: String, +listener: JobProgressListener, +isFairScheduler: Boolean) + extends StageTableBase(stages, basePath, listener, isFairScheduler, killEnabled = false) { --- End diff -- Not your change, but weird how `killEnabled` is an attribute of the `StageTableBase`. The right thing to do is to have a `KillableStageTable` or something. We can fix this later (no action needed on your part)
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/3009#issuecomment-63855330 Also, I thought more about having this in 1.2, and I'm -0.5 on putting this in 1.2. Given all of the subtleties you ended up running into with this, Josh, I don't think it's a good idea to put it in the release at the last minute without giving folks plenty of time to test it out. Based on my (admittedly limited!) understanding of most Spark users, the UI page is one of the first things a user will look at, and I don't think pushing a major change to what the user first sees when he or she interacts with Spark this late in the release cycle is a good idea. If I were a Spark user and had successfully tried the preview release Patrick had posted, and then later found that the final 1.2 release changed the landing page for the UI in a buggy / unintuitive / <insert unexpected bug here> way, I think I'd be annoyed and question the Spark QA process. I'm certainly willing to be overruled if others are less concerned about the points I mentioned.
[GitHub] spark pull request: [SPARK-4439] [MLlib] add python api for random...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3320#issuecomment-63856051 [Test build #23677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23677/consoleFull) for PR 3320 at commit [`e0df852`](https://github.com/apache/spark/commit/e0df852ab4f353b9f800fe5374195fee5a06aa52). * This patch merges cleanly.
[GitHub] spark pull request: Add example that reads a local file, writes to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3347#issuecomment-63856412 [Test build #23675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23675/consoleFull) for PR 3347 at commit [`af8ccb7`](https://github.com/apache/spark/commit/af8ccb785098f3c3a99f9a9d9675abbe989401a0). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3381#issuecomment-63856447 [Test build #23676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23676/consoleFull) for PR 3381 at commit [`f7c704a`](https://github.com/apache/spark/commit/f7c704af4d615977c43b8f6af87c5166aee0ac03). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3381#issuecomment-63857314 [Test build #23676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23676/consoleFull) for PR 3381 at commit [`f7c704a`](https://github.com/apache/spark/commit/f7c704af4d615977c43b8f6af87c5166aee0ac03). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `final class Date extends Ordered[Date] with Serializable `
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3381#issuecomment-63857315 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23676/ Test FAILed.
[GitHub] spark pull request: [SPARK-4488][PySpark] Add control over map-sid...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/3366#issuecomment-63859043 What are the cases where we should disable map-side aggregation?
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3009#discussion_r20669493 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala --- @@ -56,7 +56,11 @@ case class SparkListenerTaskEnd( extends SparkListenerEvent @DeveloperApi -case class SparkListenerJobStart(jobId: Int, stageIds: Seq[Int], properties: Properties = null) +case class SparkListenerJobStart( +jobId: Int, +stageInfos: Seq[StageInfo], +stageIds: Seq[Int], // Note: this is here for backwards-compatibility --- End diff -- The issue is if someone with an older version of Spark reads a log message from a newer version. We can't go back and modify how older versions parse the fields, so we need to include fields expected by older versions.
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3009#discussion_r20669574 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala --- @@ -56,7 +56,11 @@ case class SparkListenerTaskEnd( extends SparkListenerEvent @DeveloperApi -case class SparkListenerJobStart(jobId: Int, stageIds: Seq[Int], properties: Properties = null) +case class SparkListenerJobStart( +jobId: Int, +stageInfos: Seq[StageInfo], +stageIds: Seq[Int], // Note: this is here for backwards-compatibility --- End diff -- The problem I described could be fixed by just modifying the serializer and not the message here. However, we'd also like to provide binary compatibility for people who wrote custom listeners and expect this field to be there.
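One way to satisfy both concerns -- log compatibility and listeners that still expect `stageIds` -- is to derive the old field from the new one so the two can never disagree. A sketch of that idea (an assumption-level illustration, not the actual Spark change, which keeps `stageIds` as a constructor parameter):

```scala
case class StageInfo(stageId: Int, name: String)

case class SparkListenerJobStart(
    jobId: Int,
    stageInfos: Seq[StageInfo]) {
  // Kept for code (and serialized event logs) that still expects a flat
  // list of stage ids; deriving it means it cannot drift out of sync
  // with stageInfos.
  val stageIds: Seq[Int] = stageInfos.map(_.stageId)
}
```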
[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] elimin...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2524#issuecomment-63862394 @CodingCat I think we discussed in https://issues.apache.org/jira/browse/SPARK-3628 that it would be best to do this only for result stages first. Can you do that? The reason is that we can't fully guarantee these semantics for transformations, for two reasons: * A shuffle stage may be resubmitted once the old one is garbage-collected (if periodic cleanup is on) * If you use an accumulator in a pipelined transformation like a map(), and then you make a new RDD built on top of that (e.g. apply another map() to it), it won't count as the same stage so you'll still get the updates twice I think we can clarify our documentation to say accumulators offer this guarantee only in actions, and should be used more as counters in other settings. It would also lead to a *much* simpler patch, which is highly preferred for a bug fix.
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3009#issuecomment-63862396 Hi Kay, The behavior you noticed is intentional. If a job completes but doesn't run all of its stages, it ends up showing as finished with a partially completed progress bar. IMO - this should go in the release because it's one of the most requested features for Spark (giving better insights into runtime performance of a job). Because of the way the UI/Listeners are structured it's very low risk and cannot interfere with other lower level functionality. Also, the existing Stage page is left unmodified, so this is purely additive.
[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] elimin...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2524#discussion_r20670036 --- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala --- @@ -282,7 +285,6 @@ private object Accumulators { return ret } - // Add values to the original accumulators with some given IDs --- End diff -- Why was this comment removed?
[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] elimin...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2524#discussion_r20670065 --- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala --- @@ -226,9 +227,12 @@ GrowableAccumulableParam[R <% Growable[T] with TraversableOnce[T] with Serializa * @param param helper object defining how to add elements of type `T` * @tparam T result type */ -class Accumulator[T](@transient initialValue: T, param: AccumulatorParam[T], name: Option[String]) +class Accumulator[T](@transient initialValue: T, param: AccumulatorParam[T], + name: Option[String]) extends Accumulable[T,T](initialValue, param, name) { - def this(initialValue: T, param: AccumulatorParam[T]) = this(initialValue, param, None) + + def this(initialValue: T, param: AccumulatorParam[T]) = +this(initialValue, param, None) --- End diff -- Why was formatting changed here?
[GitHub] spark pull request: [SPARK-4513][SQL] Support relational operator ...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/3387 [SPARK-4513][SQL] Support relational operator '<=>' in Spark SQL The relational operator '<=>' is not working in Spark SQL. The same works in Spark HiveQL. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark <=> Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3387.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3387 commit 7198e90fd6458bc44c0c40762c0d493d240e5e69 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-11-20T19:04:04Z Supporting relational operator '<=>' in Spark SQL
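For readers unfamiliar with it, `<=>` is HiveQL's null-safe equality operator: unlike `=`, it treats two NULLs as equal and never evaluates to NULL. A small sketch of the intended semantics using `Option` to encode SQL NULL (illustration only, not the parser change in this PR):

```scala
// Null-safe equality over Option-encoded SQL values:
// NULL <=> NULL is true; NULL <=> x is false; otherwise compare values.
def nullSafeEq[A](a: Option[A], b: Option[A]): Boolean = (a, b) match {
  case (None, None)       => true   // plain `=` would yield NULL here
  case (Some(x), Some(y)) => x == y
  case _                  => false  // exactly one side is NULL
}
```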
[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] elimin...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2524#issuecomment-63863153 BTW when doing this only for result stages, my suggestion is to use the data structures within the stage instead of having a second HashMap. I believe I mentioned this before too (maybe on the previous PR): all you need to do is move the accumulator update code within the `if (!job.finished(rt.outputId)) {` for such stages, similar to how it only fetches results once for each task. Again the point is to avoid adding a new data structure in DAGScheduler that we must then carefully manage and clean up.
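Matei's suggestion -- reuse the job's existing `finished` bookkeeping instead of a new map -- can be sketched as follows (simplified stand-ins for the DAGScheduler's structures; the names and signatures here are hypothetical):

```scala
// Minimal model of a result stage's job: accumulator updates are applied only
// the first time each output partition finishes, so a resubmitted or
// speculative duplicate task cannot double-count.
class ActiveJobSketch(numPartitions: Int) {
  private val finished = Array.fill(numPartitions)(false)
  var accumValue = 0L

  def handleTaskCompletion(outputId: Int, accumUpdate: Long): Unit = {
    if (!finished(outputId)) {
      finished(outputId) = true
      accumValue += accumUpdate
    } // else: duplicate completion; its accumulator update is dropped
  }
}
```

The guard doubles as the cleanup story: once the job object is discarded, so is the bookkeeping, with no extra HashMap in the scheduler to manage.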
[GitHub] spark pull request: [SPARK-4513][SQL] Support relational operator ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3387#issuecomment-63863591 Can one of the admins verify this patch?
[GitHub] spark pull request: add Sphinx as a dependency of building docs
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/3388 add Sphinx as a dependency of building docs You can merge this pull request into a Git repository by running: $ git pull https://github.com/davies/spark doc_readme Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3388.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3388 commit daa14828b0e9d8ad37bdae072963e97166c484a1 Author: Davies Liu dav...@databricks.com Date: 2014-11-20T19:22:51Z add Sphinx dependency
[GitHub] spark pull request: add Sphinx as a dependency of building docs
Github user davies commented on the pull request: https://github.com/apache/spark/pull/3388#issuecomment-63863907 cc @pwendell
[GitHub] spark pull request: add Sphinx as a dependency of building docs
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3388#issuecomment-63864808 [Test build #23678 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23678/consoleFull) for PR 3388 at commit [`daa1482`](https://github.com/apache/spark/commit/daa14828b0e9d8ad37bdae072963e97166c484a1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] elimin...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/2524#issuecomment-63866302 @mateiz , I see... I had the impression that we had agreed to eventually still support shuffle stage deduplication... OK, I can shrink this patch to support only result stages
[GitHub] spark pull request: [SPARK-4439] [MLlib] add python api for random...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3320#issuecomment-63869584 [Test build #23677 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23677/consoleFull) for PR 3320 at commit [`e0df852`](https://github.com/apache/spark/commit/e0df852ab4f353b9f800fe5374195fee5a06aa52). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4439] [MLlib] add python api for random...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3320#issuecomment-63869599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23677/ Test PASSed.
[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3200#issuecomment-63870038 [Test build #23679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23679/consoleFull) for PR 3200 at commit [`9ae85aa`](https://github.com/apache/spark/commit/9ae85aa1ebabdc099d7f655bc1d9021d34d2910f). * This patch merges cleanly.
[GitHub] spark pull request: Add example that reads a local file, writes to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3347#issuecomment-63870055 [Test build #23675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23675/consoleFull) for PR 3347 at commit [`af8ccb7`](https://github.com/apache/spark/commit/af8ccb785098f3c3a99f9a9d9675abbe989401a0). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4145] [WIP] Web UI job pages
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3009#discussion_r20673960

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---

```diff
@@ -144,11 +146,30 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
   }

   override def onJobStart(jobStart: SparkListenerJobStart) = synchronized {
-    val jobGroup = Option(jobStart.properties).map(_.getProperty(SparkContext.SPARK_JOB_GROUP_ID))
+    val jobGroup = for (
+      props <- Option(jobStart.properties);
+      group <- Option(props.getProperty(SparkContext.SPARK_JOB_GROUP_ID))
+    ) yield group
     val jobData: JobUIData =
-      new JobUIData(jobStart.jobId, jobStart.stageIds, jobGroup, JobExecutionStatus.RUNNING)
+      new JobUIData(jobStart.jobId, Some(System.currentTimeMillis), None, jobStart.stageIds,
+        jobGroup, JobExecutionStatus.RUNNING)
+    // Compute (a potential underestimate of) the number of tasks that will be run by this job:
```

--- End diff --

How's this for an explanation?

```
// Compute (a potential underestimate of) the number of tasks that will be run by this job.
// This may be an underestimate because the job start event references all of the result
// stage's transitive stage dependencies, but some of these stages might be skipped if their
// output is available from earlier runs.
```
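The for-comprehension in the diff chains two `Option` lookups so that a missing properties object and a missing group ID both yield `None` without a null check at each step. A rough Python equivalent (illustrative only; `properties` stands in for the Java `Properties` object):

```python
def job_group(properties):
    # Mirrors: for (props <- Option(properties);
    #               group <- Option(props.getProperty(...))) yield group
    # Either a None properties object or an absent key short-circuits to None.
    if properties is None:
        return None
    return properties.get("spark.jobGroup.id")
```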
[GitHub] spark pull request: [SPARK-4477] [PySpark] remove numpy from RDDSa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3351#issuecomment-63871878 [Test build #23680 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23680/consoleFull) for PR 3351 at commit [`ee17d78`](https://github.com/apache/spark/commit/ee17d7846438e270967e38120d0fb6c63523defd). * This patch merges cleanly.
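Removing numpy from the sampler means falling back to the standard library's `random` module. A minimal sketch of per-partition Bernoulli sampling in that style (class and method names are hypothetical, not PySpark's actual `RDDSampler` API):

```python
import random

class BernoulliSampler:
    """Illustrative sampler: keeps each item with probability `fraction`."""

    def __init__(self, fraction, seed=42):
        self.fraction = fraction
        self.seed = seed

    def sample(self, iterator, partition_index=0):
        # Re-seed per partition so output is deterministic for a given
        # (seed, partition) pair but varies across partitions.
        rng = random.Random(self.seed ^ partition_index)
        for item in iterator:
            if rng.random() < self.fraction:
                yield item
```

Because `random.Random` is seeded explicitly, resampling the same partition reproduces the same subset, which is the property the numpy-based sampler provided.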