[GitHub] spark pull request: [SPARK-5198][Mesos] Change executorId more uni...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/3994#issuecomment-70358767 @mateiz I don't think I understand exactly how you intend fine-grained mode to behave. What would help me understand it better? I don't see how multiple executors break Spark's intended behaviour. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SQL][minor] Improved Row documentation.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4085
[GitHub] spark pull request: [SPARK-5282][mllib]: RowMatrix easily gets int...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4069#issuecomment-70359385 [Test build #25700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25700/consoleFull) for PR 4069 at commit [`e54e5c8`](https://github.com/apache/spark/commit/e54e5c8b23c2cc5ae066a68712169d5eb188f4f9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5198][Mesos] Change executorId more uni...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/3994#issuecomment-70359541 @tnachen - Slave page ![screen shot 2015-01-17 at 5 38 20 pm](https://cloud.githubusercontent.com/assets/3612566/5788288/a87230de-9e6f-11e4-8e18-972d6b3b9204.png) - Sandbox page ![screen shot 2015-01-17 at 5 33 15 pm](https://cloud.githubusercontent.com/assets/3612566/5788290/a8d2b486-9e6f-11e4-9a41-32c72824d3cc.png) - stderr ![screen shot 2015-01-17 at 5 33 30 pm](https://cloud.githubusercontent.com/assets/3612566/5788289/a8a600c6-9e6f-11e4-902d-4d890ff67d89.png)
[GitHub] spark pull request: [SPARK-5198][Mesos] Change executorId more uni...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/3994#issuecomment-70359713 @tnachen And here are the slave's logs around tasks 34 and 63. It looks like if any task hits an error while running, the executor running that task is terminated. Please check this.
```
I0117 17:21:43.678827 41388 slave.cpp:625] Got assigned task 34 for framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:21:43.679612 41388 slave.cpp:734] Launching task 34 for framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:21:43.721297 41388 slave.cpp:844] Queuing task '34' for executor 20141110-112437-3374320138-60030-57359-44 of framework '20150117-171023-3391097354-60030-7325-0004
I0117 17:21:43.775977 41388 slave.cpp:358] Successfully attached file '/data03/mesos/slaves/20141110-112437-3374320138-60030-57359-44/frameworks/20150117-171023-3391097354-60030-7325-0004/executors/20141110-112437-3374320138-60030-57359-44/runs/3fdbdd09-98cd-4197-954f-d95d9b3b4aee'
I0117 17:21:43.721451 41386 mesos_containerizer.cpp:407] Starting container '3fdbdd09-98cd-4197-954f-d95d9b3b4aee' for executor '20141110-112437-3374320138-60030-57359-44' of framework '20150117-171023-3391097354-60030-7325-0004'
I0117 17:21:43.777179 41386 mesos_containerizer.cpp:528] Fetching URIs for container '3fdbdd09-98cd-4197-954f-d95d9b3b4aee' using command '/usr/bin/env MESOS_EXECUTOR_URIS=hdfs:///app/spark/spark-1.3.0-SNAPSHOT-bin-2.3.0-cdh5.0.1.tgz+0X MESOS_WORK_DIRECTORY=/data03/mesos/slaves/20141110-112437-3374320138-60030-57359-44/frameworks/20150117-171023-3391097354-60030-7325-0004/executors/20141110-112437-3374320138-60030-57359-44/runs/3fdbdd09-98cd-4197-954f-d95d9b3b4aee HADOOP_HOME=/app/hdfs/ /app/mesos-0.18.1/libexec/mesos/mesos-fetcher'
I0117 17:22:28.863304 41374 slave.cpp:2523] Current usage 44.85%. Max allowed age: 3.160841566048842days
I0117 17:22:38.472086 41384 slave.cpp:625] Got assigned task 63 for framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:22:38.472584 41384 slave.cpp:734] Launching task 63 for framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:22:38.472801 41384 slave.cpp:844] Queuing task '63' for executor 20141110-112437-3374320138-60030-57359-44 of framework '20150117-171023-3391097354-60030-7325-0004
I0117 17:22:43.721726 41370 slave.cpp:2475] Terminating executor 20141110-112437-3374320138-60030-57359-44 of framework 20150117-171023-3391097354-60030-7325-0004 because it did not register within 1mins
I0117 17:22:43.722038 41378 mesos_containerizer.cpp:818] Destroying container '3fdbdd09-98cd-4197-954f-d95d9b3b4aee'
I0117 17:22:43.722295 41378 slave.cpp:2052] Executor '20141110-112437-3374320138-60030-57359-44' of framework 20150117-171023-3391097354-60030-7325-0004 has terminated with unknown status
E0117 17:22:43.722744 41376 slave.cpp:2332] Failed to unmonitor container for executor 20141110-112437-3374320138-60030-57359-44 of framework 20150117-171023-3391097354-60030-7325-0004: Not monitored
I0117 17:22:43.737566 41378 slave.cpp:1669] Handling status update TASK_LOST (UUID: cc571b34-e161-4ec9-bcf1-033bd967209f) for task 34 of framework 20150117-171023-3391097354-60030-7325-0004 from @0.0.0.0:0
I0117 17:22:43.737829 41378 slave.cpp:3142] Terminating task 34
I0117 17:22:43.738701 41372 status_update_manager.cpp:315] Received status update TASK_LOST (UUID: cc571b34-e161-4ec9-bcf1-033bd967209f) for task 34 of framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:22:43.739341 41378 slave.cpp:1669] Handling status update TASK_LOST (UUID: f198e879-a762-4cce-97ff-5261cb4ff820) for task 63 of framework 20150117-171023-3391097354-60030-7325-0004 from @0.0.0.0:0
I0117 17:22:43.739398 41372 status_update_manager.cpp:494] Creating StatusUpdate stream for task 34 of framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:22:43.739542 41378 slave.cpp:3142] Terminating task 63
I0117 17:22:43.739869 41372 status_update_manager.cpp:368] Forwarding status update TASK_LOST (UUID: cc571b34-e161-4ec9-bcf1-033bd967209f) for task 34 of framework 20150117-171023-3391097354-60030-7325-0004 to master@10.10.32.202:60030
I0117 17:22:43.740393 41372 status_update_manager.cpp:315] Received status update TASK_LOST (UUID: f198e879-a762-4cce-97ff-5261cb4ff820) for task 63 of framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:22:43.740411 41384 slave.cpp:1789] Status update manager successfully handled status update TASK_LOST (UUID: cc571b34-e161-4ec9-bcf1-033bd967209f) for task 34 of framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:22:43.740573 41372 status_update_manager.cpp:494] Creating StatusUpdate stream for task 63 of framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:22:43.740892 41372 status_update_manager.cpp:368] Forwarding status update TASK_LOST (UUID: f198e879-a762-4cce-97ff
```
[GitHub] spark pull request: [SPARK-3880] HBase as data source to SparkSQL
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/4084#issuecomment-70359819 and we need to check the coding style.
[GitHub] spark pull request: [SPARK-5282][mllib]: RowMatrix easily gets int...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4069#issuecomment-70360796 [Test build #25700 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25700/consoleFull) for PR 4069 at commit [`e54e5c8`](https://github.com/apache/spark/commit/e54e5c8b23c2cc5ae066a68712169d5eb188f4f9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5282][mllib]: RowMatrix easily gets int...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4069#issuecomment-70360799 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25700/ Test PASSed.
[GitHub] spark pull request: [SPARK-5285][SQL] Removed GroupExpression in c...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4075#issuecomment-70361176 `GroupExpression` is not used as a transformation in the `Analyzer`, but in the `Optimizer`; that's why it can still pass the unit tests. I should document this.
[GitHub] spark pull request: [SPARK-5285][SQL] Removed GroupExpression in c...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/4075#issuecomment-70362210 @chenghao-intel, do we need to add a unit test for this?
[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4074#issuecomment-70362340 Jenkins, retest this please.
[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4074#issuecomment-70362399 [Test build #25701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25701/consoleFull) for PR 4074 at commit [`d76f8e3`](https://github.com/apache/spark/commit/d76f8e3cbe10f2ed5239281d6098d619640368d5). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4074#issuecomment-70364288 [Test build #25701 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25701/consoleFull) for PR 4074 at commit [`d76f8e3`](https://github.com/apache/spark/commit/d76f8e3cbe10f2ed5239281d6098d619640368d5). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4074#issuecomment-70364290 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25701/ Test FAILed.
[GitHub] spark pull request: [SPARK-5219][Core] Add locks to avoid scheduli...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/4019#issuecomment-70365375 These methods are called in threads of `TaskResultGetter.getTaskResultExecutor`, and they access variables such as `isZombie` and `taskInfos` in `TaskSetManager`, which are also used in `TaskSchedulerImpl`.
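The hazard described above is the generic one: state mutated from one thread pool while another component reads it. A minimal Python sketch of the locking fix, hypothetical and not Spark's actual code (`zombie` and `infos` stand in for `isZombie` and `taskInfos`):

```python
import threading

class TaskSet:
    """Illustrative stand-in for state shared between result-getter
    threads and a scheduler; all names here are hypothetical."""
    def __init__(self):
        self.lock = threading.Lock()
        self.infos = {}
        self.zombie = False

    def record(self, task_id, state):
        # Guard both the read of `zombie` and the write to `infos` with
        # one lock, so a concurrent reader never sees a half-update.
        with self.lock:
            if not self.zombie:
                self.infos[task_id] = state

ts = TaskSet()
threads = [threading.Thread(target=ts.record, args=(i, "FINISHED"))
           for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, the check-then-write in `record` could interleave with a thread flipping `zombie`, which mirrors the race between `TaskResultGetter` threads and `TaskSchedulerImpl`.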
[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-70366247 @jkbradley I use JPMML to verify that the exported model produces the same results; here are the details of my tests: https://github.com/selvinsource/spark-pmml-exporter-validator
[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...
Github user ash211 commented on a diff in the pull request: https://github.com/apache/spark/pull/4074#discussion_r23125261 --- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala --- @@ -545,6 +546,12 @@ class RDDSuite extends FunSuite with SharedSparkContext { assert(sortedTopK === nums.sorted(ord).take(5)) } + test("isEmpty") { + assert(sc.emptyRDD.isEmpty()) + assert(sc.parallelize(Seq[Int]()).isEmpty()) + assert(!sc.parallelize(Seq(1)).isEmpty()) --- End diff -- I don't think this tests the case where there are multiple partitions but no data in any of the partitions. Maybe add something like `assert(sc.parallelize(Seq(1,2,3), 3).filter(_ < 0).isEmpty())`
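The reviewer's point is that "has partitions" does not imply "has data". A plain-Python sketch of the case, with partitions modeled as lists (hypothetical, not Spark's implementation):

```python
# Hypothetical model: an "RDD" is a list of partitions, each a list of
# records. An isEmpty-style check must inspect every partition.
def is_empty(partitions):
    return all(len(p) == 0 for p in partitions)

no_partitions = []            # like sc.emptyRDD
three_empty = [[], [], []]    # like parallelize(Seq(1,2,3), 3).filter(_ < 0)
hidden_data = [[], [1], []]   # data in only one of several partitions
```

The `three_empty` case is exactly the one the suggested test covers: several partitions exist, all empty after the filter.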
[GitHub] spark pull request: [SPARK-4937][SQL] Comment for the newly optimi...
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/4086 [SPARK-4937][SQL] Comment for the newly optimization rules in `BooleanSimplification` Follow up of #3778 /cc @rxin You can merge this pull request into a Git repository by running: $ git pull https://github.com/scwf/spark commentforspark-4937 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4086.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4086 commit 2d3406e63dfd8e527fd2f6ed9fc27cc342a51459 Author: scwf wangf...@huawei.com Date: 2015-01-17T14:33:07Z added comment for spark-4937 commit aaf89f64333d2a9692a1068d0165c36128744d42 Author: scwf wangf...@huawei.com Date: 2015-01-17T14:34:57Z code style issue
[GitHub] spark pull request: [SPARK-4937][SQL] Comment for the newly optimi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4086#issuecomment-70369529 [Test build #25702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25702/consoleFull) for PR 4086 at commit [`aaf89f6`](https://github.com/apache/spark/commit/aaf89f64333d2a9692a1068d0165c36128744d42). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70369703 Could you please tell me the preferred way to generate random data in Spark?
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70370192 [Test build #25703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25703/consoleFull) for PR 4014 at commit [`ab22f7b`](https://github.com/apache/spark/commit/ab22f7b55988ba324e14969c89d8edfe4d663504). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70370361 [Test build #25704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25704/consoleFull) for PR 4073 at commit [`a7bfc70`](https://github.com/apache/spark/commit/a7bfc70e4382efeee83e2657844e20b3b9f60448). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/3997#discussion_r23125961 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala --- @@ -449,6 +461,37 @@ class SparseVector( override def toString: String = "(%s,%s,%s)".format(size, indices.mkString("[", ",", "]"), values.mkString("[", ",", "]"))
+  override def equals(other: Any): Boolean = {
+    other match {
+      case v: SparseVector => {
+        if (this.size != v.size) { return false }
+        val thisValues = this.values
+        val thisIndices = this.indices
+        val thisSize = thisValues.size
+        val otherValues = v.values
+        val otherIndices = v.indices
+        val otherSize = otherValues.size
+
+        var k1 = 0
+        var k2 = 0
+        var allEqual = true
+        while (allEqual) {
+          while (k1 < thisSize && thisValues(k1) == 0) k1 += 1
+          while (k2 < otherSize && otherValues(k2) == 0) k2 += 1
+
+          if (k1 >= thisSize || k2 >= otherSize) {
+            return k1 >= thisSize && k2 >= otherSize // check end alignment
+          }
+          allEqual = thisIndices(k1) == otherIndices(k2) && thisValues(k1) == otherValues(k2)
+          k1 += 1
+          k2 += 1
+        }
+        allEqual
+      }
+      case _ => super.equals(other)
+    }
+  }
+
--- End diff -- yes, and I found that sparse vs. dense is actually quite similar to sparse vs. sparse; I'm trying to unify them.
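The two-pointer comparison in the diff above can be sketched in plain Python (a hypothetical helper, not the MLlib code): advance each cursor past explicitly stored zeros, then require matching indices and values, and at the end require both sides to be exhausted together.

```python
def sparse_equals(idx1, val1, idx2, val2):
    """Compare two same-size sparse vectors given as parallel
    (indices, values) arrays, skipping explicitly stored zeros."""
    k1 = k2 = 0
    while True:
        # Skip stored zeros: they carry no information.
        while k1 < len(val1) and val1[k1] == 0:
            k1 += 1
        while k2 < len(val2) and val2[k2] == 0:
            k2 += 1
        # If either side is exhausted, both must be (end alignment).
        if k1 >= len(val1) or k2 >= len(val2):
            return k1 >= len(val1) and k2 >= len(val2)
        if idx1[k1] != idx2[k2] or val1[k1] != val2[k2]:
            return False
        k1 += 1
        k2 += 1
```

Skipping zeros is what makes two different storage layouts of the same vector compare equal, e.g. `([0, 2], [1.0, 2.0])` versus `([0, 1, 2], [1.0, 0.0, 2.0])`.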
[GitHub] spark pull request: [SPARK-4937][SQL] Comment for the newly optimi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4086#issuecomment-70371913 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25702/ Test PASSed.
[GitHub] spark pull request: [SPARK-4937][SQL] Comment for the newly optimi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4086#issuecomment-70371911 [Test build #25702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25702/consoleFull) for PR 4086 at commit [`aaf89f6`](https://github.com/apache/spark/commit/aaf89f64333d2a9692a1068d0165c36128744d42). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70372569 [Test build #25703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25703/consoleFull) for PR 4014 at commit [`ab22f7b`](https://github.com/apache/spark/commit/ab22f7b55988ba324e14969c89d8edfe4d663504). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70372572 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25703/ Test PASSed.
[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70372786 [Test build #25704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25704/consoleFull) for PR 4073 at commit [`a7bfc70`](https://github.com/apache/spark/commit/a7bfc70e4382efeee83e2657844e20b3b9f60448). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70372791 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25704/ Test PASSed.
[GitHub] spark pull request: [SPARK-4894][mllib] Added Bernoulli option to ...
GitHub user leahmcguire opened a pull request: https://github.com/apache/spark/pull/4087 [SPARK-4894][mllib] Added Bernoulli option to NaiveBayes model in mllib Added optional model type parameter for NaiveBayes training. Can be either Multinomial or Bernoulli. When Bernoulli is given the Bernoulli smoothing is used for fitting and for prediction as per: http://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html. Default for model is original Multinomial fit and predict. Added additional testing for Bernoulli and Multinomial models. You can merge this pull request into a Git repository by running: $ git pull https://github.com/leahmcguire/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4087.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4087 commit ce73c63e8bac40b02ae0a8147c3b424783f6094a Author: leahmcguire lmcgu...@salesforce.com Date: 2015-01-16T16:06:06Z added Bernoulli option to niave bayes model in mllib, added optional model type parameter for training. When Bernoulli is given the Bernoulli smoothing is used for fitting and for prediction http://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html
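For reference, Bernoulli naive Bayes prediction as described in the linked IR-book chapter scores every vocabulary term, whether present or absent in the document. A minimal Python sketch, hypothetical and not the MLlib implementation (all names and probability tables here are illustrative):

```python
import math

def predict_bernoulli(doc_terms, priors, cond_prob, vocab):
    """Return argmax over classes c of
    log P(c) + sum over t in vocab of log P(t|c) if t in doc
    else log(1 - P(t|c))."""
    best_class, best_score = None, -math.inf
    for c in priors:
        score = math.log(priors[c])
        for t in vocab:
            p = cond_prob[(t, c)]
            # Unlike the multinomial model, absent terms also
            # contribute evidence, via the (1 - p) factor.
            score += math.log(p) if t in doc_terms else math.log(1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy two-class, two-term model (made-up numbers).
priors = {"A": 0.5, "B": 0.5}
cond_prob = {("x", "A"): 0.9, ("y", "A"): 0.1,
             ("x", "B"): 0.1, ("y", "B"): 0.9}
```

The `(1 - p)` term for absent words is the behavioral difference the PR adds alongside the default multinomial fit and predict.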
[GitHub] spark pull request: [SPARK-4894][mllib] Added Bernoulli option to ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4087#issuecomment-70373574 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70375368 @jkbradley I've added a test according to the other tests in the `RandomForestSuite`. Let me know if there is anything left.
[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70375392 [Test build #25705 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25705/consoleFull) for PR 4073 at commit [`d1df1b2`](https://github.com/apache/spark/commit/d1df1b2df9b76e94abf95182fb47902b2740e6d3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize Scala/Java test...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-70382464 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25706/ Test FAILed.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize Scala/Java test...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-70381703 Jenkins, retest this please.
[GitHub] spark pull request: [MLlib] [SPARK-5301] Missing conversions and o...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4089#issuecomment-70389330 [Test build #25708 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25708/consoleFull) for PR 4089 at commit [`cb10ae5`](https://github.com/apache/spark/commit/cb10ae5a36be7d942e74005ed22610287e3059eb). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [MLlib] [SPARK-5301] Missing conversions and o...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4089#issuecomment-70389332 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25708/ Test PASSed.
[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70378420 [Test build #25705 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25705/consoleFull) for PR 4073 at commit [`d1df1b2`](https://github.com/apache/spark/commit/d1df1b2df9b76e94abf95182fb47902b2740e6d3). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5198][Mesos] Change executorId more uni...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/3994#issuecomment-70388679 From the logs it did indeed hit the executor registration timeout (1 minute), so Mesos terminated the task. I don't think changing the executor ID fixes this problem, and I don't think it's necessary. Can you try setting a longer timeout via the slave flags and try again?
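For reference, the relevant setting on the Mesos side is the slave's `--executor_registration_timeout` flag; a hedged example of raising it follows (the master address and the 5-minute value are illustrative, not taken from this thread):

```shell
# Raise the executor registration timeout on a Mesos slave (default is 1 minute).
# Master address and timeout value are illustrative placeholders.
mesos-slave --master=zk://zk1:2181/mesos \
            --executor_registration_timeout=5mins
```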
[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70378423 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25705/ Test PASSed.
[GitHub] spark pull request: [SPARK-5285][SQL] Removed GroupExpression in c...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4075#issuecomment-70379577 I think there could still be a unit test to make sure that things in GroupExpressions get optimized.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize Scala/Java test...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-70381864 [Test build #25706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25706/consoleFull) for PR 3564 at commit [`f697a55`](https://github.com/apache/spark/commit/f697a5523dd96629e2502ba61c76f9e4717b858e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3200#discussion_r23128478

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the matrix.
+ * @param rowPerBlock Number of rows that make up each block.
+ * @param colPerBlock Number of columns that make up each block.
+ */
+private[mllib] class GridPartitioner(
+    val numRowBlocks: Int,
+    val numColBlocks: Int,
+    val rowPerBlock: Int,
+    val colPerBlock: Int,
+    override val numPartitions: Int) extends Partitioner {
+
+  /**
+   * Returns the index of the partition the SubMatrix belongs to.
+   *
+   * @param key The key for the SubMatrix. Can be its position in the grid (its column major index)
+   *            or a tuple of three integers that are the final row index after the multiplication,
+   *            the index of the block to multiply with, and the final column index after the
+   *            multiplication.
+   * @return The index of the partition, which the SubMatrix belongs to.
+   */
+  override def getPartition(key: Any): Int = {
+    key match {
+      case ind: (Int, Int) =>
+        Utils.nonNegativeMod(ind._1 + ind._2 * numRowBlocks, numPartitions)
+      case indices: (Int, Int, Int) =>
+        Utils.nonNegativeMod(indices._1 + indices._3 * numRowBlocks, numPartitions)
+      case _ =>
+        throw new IllegalArgumentException("Unrecognized key")
+    }
+  }
+
+  /** Checks whether the partitioners have the same characteristics */
+  override def equals(obj: Any): Boolean = {
+    obj match {
+      case r: GridPartitioner =>
+        (this.numPartitions == r.numPartitions) && (this.rowPerBlock == r.rowPerBlock) &&
+          (this.colPerBlock == r.colPerBlock)
+      case _ =>
+        false
+    }
+  }
+}
+
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of this matrix
+ * @param numColBlocks Number of blocks that form the columns of this matrix
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ */
+class BlockMatrix(
+    val numRowBlocks: Int,
+    val numColBlocks: Int,
+    val rdd: RDD[((Int, Int), Matrix)]) extends DistributedMatrix with Logging {
+
+  type SubMatrix = ((Int, Int), Matrix) // ((blockRowIndex, blockColIndex), matrix)
+
+  /**
+   * Alternate constructor for BlockMatrix without the input of a partitioner. Will use a Grid
+   * Partitioner by default.
+   *
+   * @param numRowBlocks Number of blocks that form the rows of this matrix
+   * @param numColBlocks Number of blocks that form the columns of this matrix
+   * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+   * @param rowPerBlock Number of rows that make up each block.
+   * @param colPerBlock Number of columns that make up each block.
+   */
+  def this(
+      numRowBlocks: Int,
+      numColBlocks: Int,
+      rdd: RDD[((Int, Int), Matrix)],
+      rowPerBlock: Int,
+      colPerBlock: Int) = {
+    this(numRowBlocks, numColBlocks, rdd)
+    val part = new GridPartitioner(numRowBlocks, numColBlocks,
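The partition lookup in `getPartition` above reduces to a column-major block index wrapped into the partition count. A standalone sketch of that arithmetic (a local `nonNegativeMod` stands in for Spark's `Utils.nonNegativeMod`):

```scala
// Standalone sketch of the grid-partitioner index math -- not the MLlib class itself.
object GridIndex {
  // Equivalent of Spark's Utils.nonNegativeMod: a modulo that never goes negative.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    raw + (if (raw < 0) mod else 0)
  }

  // Column-major block index (row + col * numRowBlocks), wrapped into the partitions.
  def partition(blockRow: Int, blockCol: Int,
                numRowBlocks: Int, numPartitions: Int): Int =
    nonNegativeMod(blockRow + blockCol * numRowBlocks, numPartitions)
}
```

With this layout, blocks in the same column land in consecutive partition indices, so a grid of blocks spreads evenly when `numPartitions` equals the block count.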
[GitHub] spark pull request: SPARK-5019 - GaussianMixtureModel exposes inst...
GitHub user tgaloppo opened a pull request: https://github.com/apache/spark/pull/4088

SPARK-5019 - GaussianMixtureModel exposes instances of MultivariateGauss...

This PR modifies GaussianMixtureModel to expose instances of MultivariateGaussian rather than separate mean and covariance arrays.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgaloppo/spark spark-5019

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4088.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #4088

commit 091e8da4913eacf28530ab7fb2bd6c39ab2cef4b
Author: Travis Galoppo tjg2...@columbia.edu
Date: 2015-01-16T16:06:57Z

SPARK-5019 - GaussianMixtureModel exposes instances of MultivariateGaussian rather than mean/covariance matrices
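In API terms, the change bundles each component's mean and covariance into one object instead of keeping parallel arrays. A hypothetical sketch of that shape (the names follow the PR description, but the types here are simplified stand-ins, not MLlib's actual `Vector`/`Matrix` signatures):

```scala
// Hypothetical sketch of the API shape the PR describes -- simplified types,
// not the real MLlib signatures.
case class MultivariateGaussian(mu: Vector[Double], sigma: Vector[Vector[Double]])

class GaussianMixtureModel(
    val weights: Array[Double],                 // mixing weight per component
    val gaussians: Array[MultivariateGaussian]) // mean + covariance bundled per component
```

Callers then read `model.gaussians(k).mu` instead of indexing into separate mean and covariance arrays.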
[GitHub] spark pull request: SPARK-5019 - GaussianMixtureModel exposes inst...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4088#issuecomment-70386105 [Test build #25707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25707/consoleFull) for PR 4088 at commit [`091e8da`](https://github.com/apache/spark/commit/091e8da4913eacf28530ab7fb2bd6c39ab2cef4b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5285][SQL] Removed GroupExpression in c...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/4075#issuecomment-70386657 @marmbrus after my investigation, I think it is a very rare case that we will optimize GroupExpressions. Should our optimization cover SQL such as ```SELECT a, b, count(*) FROM T1 GROUP BY a, b, 1+1 GROUPING SETS (1+1, a, (a, b), b, ())```? If not, maybe we do not need to optimize it, and then the change in this PR is safe.
[GitHub] spark pull request: [MLlib] [SPARK-5301] Missing conversions and o...
GitHub user rezazadeh opened a pull request: https://github.com/apache/spark/pull/4089

[MLlib] [SPARK-5301] Missing conversions and operations on IndexedRowMatrix and CoordinateMatrix

* Transpose is missing from CoordinateMatrix (this is cheap to compute, so it should be there)
* IndexedRowMatrix should be convertible to CoordinateMatrix (conversion added)

Tests for both added.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rezazadeh/spark matutils

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4089.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #4089

commit a7ae0488f49117501506f88b10d8dc606d2207c6
Author: Reza Zadeh r...@databricks.com
Date: 2015-01-17T22:06:50Z

Missing linear algebra utilities

commit cb10ae5a36be7d942e74005ed22610287e3059eb
Author: Reza Zadeh r...@databricks.com
Date: 2015-01-17T22:11:27Z

remove unnecessary import
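Transposing a coordinate-format matrix is cheap because it only swaps the row and column index of every entry. A plain-Scala sketch over `(row, col, value)` triples (the real `CoordinateMatrix` operates on an RDD of `MatrixEntry`, not a local collection):

```scala
// Coordinate-format transpose: swap (row, col) on every entry.
// A local Seq stands in for the RDD used by the real CoordinateMatrix.
def transpose(entries: Seq[(Long, Long, Double)]): Seq[(Long, Long, Double)] =
  entries.map { case (i, j, v) => (j, i, v) }
```

No data moves and no arithmetic is done, which is why the PR argues the operation should simply be available.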
[GitHub] spark pull request: [MLlib] [SPARK-5301] Missing conversions and o...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4089#issuecomment-70387085 [Test build #25708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25708/consoleFull) for PR 4089 at commit [`cb10ae5`](https://github.com/apache/spark/commit/cb10ae5a36be7d942e74005ed22610287e3059eb). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize Scala/Java test...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-70382458 [Test build #25706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25706/consoleFull) for PR 3564 at commit [`f697a55`](https://github.com/apache/spark/commit/f697a5523dd96629e2502ba61c76f9e4717b858e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5198][Mesos] Change executorId more uni...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3994#issuecomment-70388163 @jongyoul the goal of fine-grained mode is to run many Spark tasks in the same executor, which is why we're giving them all the same executor ID. Mesos supports this in its concept of executors, and it has the benefit that Mesos can account for the CPUs used by each task separately and give those CPUs to other frameworks when Spark is not active. In contrast, coarse-grained mode reserves the CPUs on the machine for the whole lifetime of the executor.
[GitHub] spark pull request: SPARK-5019 - GaussianMixtureModel exposes inst...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4088#issuecomment-70388528 [Test build #25707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25707/consoleFull) for PR 4088 at commit [`091e8da`](https://github.com/apache/spark/commit/091e8da4913eacf28530ab7fb2bd6c39ab2cef4b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-5019 - GaussianMixtureModel exposes inst...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4088#issuecomment-70388533 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25707/ Test PASSed.
[GitHub] spark pull request: [SPARK-4937][SQL] Comment for the newly optimi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4086
[GitHub] spark pull request: [SPARK-4937][SQL] Comment for the newly optimi...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4086#issuecomment-70389973 Merging in master. I will submit a PR to update the description to make it more clear.
[GitHub] spark pull request: [SPARK-3880] HBase as data source to SparkSQL
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4084#issuecomment-70390837 Hi - thanks for working on this... it looks interesting. I'd like to close this issue (i.e. the PR) and discuss more on the JIRA/dev list rather than having a big pull request like this. For very large features this is the way we do it. If you look at our wiki, it says: "If you are proposing a larger change, attach a design document to your JIRA first (example) and email the dev mailing list to discuss it." https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/4090#discussion_r23129926

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -302,89 +302,100 @@ object OptimizeIn extends Rule[LogicalPlan] {
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsUp {
-      case and @ And(left, right) =>
-        (left, right) match {
-          case (Literal(true, BooleanType), r) => r
-          case (l, Literal(true, BooleanType)) => l
-          case (Literal(false, BooleanType), _) => Literal(false)
-          case (_, Literal(false, BooleanType)) => Literal(false)
-          // a && a => a
-          case (l, r) if l fastEquals r => l
-          case (_, _) =>
-            /* Do optimize for predicates using formula (a || b) && (a || c) => a || (b && c)
-             * 1. Split left and right to get the disjunctive predicates,
-             *    i.e. lhsSet = (a, b), rhsSet = (a, c)
-             * 2. Find the common predicate between lhsSet and rhsSet, i.e. common = (a)
-             * 3. Remove the common predicate from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
-             * 4. Apply the formula, get the optimized predicate: common || (ldiff && rdiff)
-             */
-            val lhsSet = splitDisjunctivePredicates(left).toSet
-            val rhsSet = splitDisjunctivePredicates(right).toSet
-            val common = lhsSet.intersect(rhsSet)
-            val ldiff = lhsSet.diff(common)
-            val rdiff = rhsSet.diff(common)
-            if (ldiff.size == 0 || rdiff.size == 0) {
-              // a && (a || b) => a
-              common.reduce(Or)
-            } else {
-              // (a || b || c || ...) && (a || b || d || ...) && (a || b || e || ...) ... =>
-              // (a || b) || ((c || ...) && (f || ...) && (e || ...) ...)
-              (ldiff.reduceOption(Or) ++ rdiff.reduceOption(Or))
-                .reduceOption(And)
-                .map(_ :: common.toList)
-                .getOrElse(common.toList)
-                .reduce(Or)
-            }
-        }
-
-      case or @ Or(left, right) =>
-        (left, right) match {
-          case (Literal(true, BooleanType), _) => Literal(true)
-          case (_, Literal(true, BooleanType)) => Literal(true)
-          case (Literal(false, BooleanType), r) => r
-          case (l, Literal(false, BooleanType)) => l
-          // a || a => a
-          case (l, r) if l fastEquals r => l
-          case (_, _) =>
-            /* Do optimize for predicates using formula (a && b) || (a && c) => a && (b || c)
-             * 1. Split left and right to get the conjunctive predicates,
-             *    i.e. lhsSet = (a, b), rhsSet = (a, c)
-             * 2. Find the common predicate between lhsSet and rhsSet, i.e. common = (a)
-             * 3. Remove the common predicate from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
-             * 4. Apply the formula, get the optimized predicate: common && (ldiff || rdiff)
-             */
-            val lhsSet = splitConjunctivePredicates(left).toSet
-            val rhsSet = splitConjunctivePredicates(right).toSet
-            val common = lhsSet.intersect(rhsSet)
-            val ldiff = lhsSet.diff(common)
-            val rdiff = rhsSet.diff(common)
-            if (ldiff.size == 0 || rdiff.size == 0) {
-              // a || (b && a) => a
-              common.reduce(And)
-            } else {
-              // (a && b && c && ...) || (a && b && d && ...) || (a && b && e && ...) ... =>
-              // a && b && ((c && ...) || (d && ...) || (e && ...) || ...)
-              (ldiff.reduceOption(And) ++ rdiff.reduceOption(And))
-                .reduceOption(Or)
-                .map(_ :: common.toList)
-                .getOrElse(common.toList)
-                .reduce(And)
-            }
-        }
-
-      case not @ Not(exp) =>
-        exp match {
-          case Literal(true, BooleanType) => Literal(false)
-          case Literal(false, BooleanType) => Literal(true)
-          case GreaterThan(l, r) => LessThanOrEqual(l, r)
-          case GreaterThanOrEqual(l, r) => LessThan(l, r)
-          case LessThan(l, r) => GreaterThanOrEqual(l, r)
-          case LessThanOrEqual(l, r) => GreaterThan(l, r)
-          case Not(e) => e
-          case _ => not
-        }
-
       // Turn "if (true) a else b" into "a", and "if (false) a else b" into "b".
+      case and @ And(left, right) => (left, right) match {
+        // true && r => r
+        case (Literal(true, BooleanType), r) => r
+
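The factoring step described in the comments above can be shown on plain term sets: `(a || b) && (a || c)` splits into a common part and two residues, which recombine as `a || (b && c)`. A standalone sketch of just that intersect/diff step (string atoms stand in for Catalyst expression trees):

```scala
// Sketch of the common-term factoring used by BooleanSimplification,
// over string atoms instead of Catalyst expressions.
// Returns (common, leftResidue, rightResidue) for two disjunction term-sets.
def factorAnd(lhsSet: Set[String],
              rhsSet: Set[String]): (Set[String], Set[String], Set[String]) = {
  val common = lhsSet.intersect(rhsSet) // terms shared by both sides
  (common, lhsSet.diff(common), rhsSet.diff(common))
}
```

If either residue is empty, the whole conjunction collapses to the common part (the `a && (a || b) => a` case in the diff); otherwise the residues are AND-ed and OR-ed back onto the common terms.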
[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4074#discussion_r23129922

--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
@@ -436,6 +436,12 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
   def first(): T = rdd.first()

   /**
+   * @return true if and only if the RDD contains no elements at all. Note that an RDD
+   *         may be empty even when it has at least 1 partition.
+   */
+  def isEmpty(): Boolean = rdd.isEmpty()
--- End diff --

Okay sounds good @srowen want to just add an exclusion then?
[GitHub] spark pull request: [SPARK-4877] Allow user first classes to exten...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3725#issuecomment-70392673 @holdenk @pwendell Can one of you review this, sign off, and commit? I don't really have enough expertise here.
[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-70394597 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25711/consoleFull) for PR 2634 at commit [`35da8e9`](https://github.com/apache/spark/commit/35da8e9e188e66946d5799d061ecc3ca150f). * This patch **fails** unit tests. * This patch **does not** merge cleanly!
[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-70394598 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25711/ Test FAILed.
[GitHub] spark pull request: [SQL][Minor] Refactors deeply nested FP style ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4091#issuecomment-70394771 [Test build #25712 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25712/consoleFull) for PR 4091 at commit [`e833ca4`](https://github.com/apache/spark/commit/e833ca4b7a108c053870ba03a013656556fd3d58). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SQL][Minor] Refactors deeply nested FP style ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4091#issuecomment-70394772 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25712/ Test PASSed.
[GitHub] spark pull request: [SPARK-5289]: Backport publishing of repl, yar...
Github user pwendell closed the pull request at: https://github.com/apache/spark/pull/4079
[GitHub] spark pull request: [SPARK-5289]: Backport publishing of repl, yar...
GitHub user pwendell reopened a pull request: https://github.com/apache/spark/pull/4079 [SPARK-5289]: Backport publishing of repl, yarn into branch-1.2. This change was done in SPARK-4048 as part of a larger refactoring, but we need to backport this publishing of yarn and repl into Spark 1.2, so that we can cut a 1.2.1 release with these artifacts. You can merge this pull request into a Git repository by running: $ git pull https://github.com/pwendell/spark skip-deps Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4079.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4079 commit 807b833680d433ada6f9fd0e262197ffa8de5f89 Author: Patrick Wendell patr...@databricks.com Date: 2015-01-16T22:31:56Z [SPARK-5289]: Backport publishing of repl, yarn into branch-1.2. This change was done in SPARK-4048 as part of a larger refactoring, but we need to backport this publishing of yarn and repl into Spark 1.2, so that we can cut a 1.2.1 release with these artifacts.
[GitHub] spark pull request: [SPARK-5289]: Backport publishing of repl, yar...
Github user pwendell closed the pull request at: https://github.com/apache/spark/pull/4079
[GitHub] spark pull request: [HOTFIX]: Minor clean up regarding skipped art...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4080
[GitHub] spark pull request: [SPARK-5249] Added type specific set functions...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4042#issuecomment-70390499 @JoshRosen or @srowen - what are your feelings on it?
[GitHub] spark pull request: [SPARK-5249] Added type specific set functions...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4042#issuecomment-70390485 I'd actually prefer not to have this in Spark. It's not really clear what we will do with an `Any`, and the user can really easily just call `toString` explicitly. I also looked at two other similar constructs in Java (the Java Properties class and Hadoop's Configuration class) and neither of them offers this type of interface. There are multiple language APIs that have this `setConf`, and they all require string keys and values; it's just a bit inconsistent to do this kind of implicit conversion.
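The API shape argued for above — string-only configuration keys and values, with the caller converting explicitly — can be sketched as follows (a hypothetical illustration in Python, not Spark's `SparkConf`; the class and key names are made up):

```python
class Conf:
    """String-only configuration map: no implicit Any -> toString conversion."""

    def __init__(self):
        self._settings = {}

    def set(self, key, value):
        # Reject non-string keys/values instead of silently stringifying them.
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("conf keys and values must be strings")
        self._settings[key] = value
        return self  # allow chained calls

    def get(self, key):
        return self._settings[key]

# The caller converts explicitly, as the comment above suggests.
conf = Conf().set("spark.executor.cores", str(4))
```

The explicit `str(4)` keeps the conversion visible at the call site, which is the consistency argument made in the comment.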
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/4090#discussion_r23129720

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -302,89 +302,100 @@ object OptimizeIn extends Rule[LogicalPlan] {
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsUp {
-      case and @ And(left, right) =>
-        (left, right) match {
-          case (Literal(true, BooleanType), r) => r
-          case (l, Literal(true, BooleanType)) => l
-          case (Literal(false, BooleanType), _) => Literal(false)
-          case (_, Literal(false, BooleanType)) => Literal(false)
-          // a && a => a
-          case (l, r) if l fastEquals r => l
-          case (_, _) =>
-            /* Do optimize for predicates using formula (a || b) && (a || c) => a || (b && c)
-             * 1. Split left and right to get the disjunctive predicates,
-             *    i.e. lhsSet = (a, b), rhsSet = (a, c)
-             * 2. Find the common predict between lhsSet and rhsSet, i.e. common = (a)
-             * 3. Remove common predict from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
-             * 4. Apply the formula, get the optimized predict: common || (ldiff && rdiff)
-             */
-            val lhsSet = splitDisjunctivePredicates(left).toSet
-            val rhsSet = splitDisjunctivePredicates(right).toSet
-            val common = lhsSet.intersect(rhsSet)
-            val ldiff = lhsSet.diff(common)
-            val rdiff = rhsSet.diff(common)
-            if (ldiff.size == 0 || rdiff.size == 0) {
-              // a && (a || b) => a
-              common.reduce(Or)
-            } else {
-              // (a || b || c || ...) && (a || b || d || ...) && (a || b || e || ...) ... =>
-              // (a || b) || ((c || ...) && (f || ...) && (e || ...) ...)
-              (ldiff.reduceOption(Or) ++ rdiff.reduceOption(Or))
-                .reduceOption(And)
-                .map(_ :: common.toList)
-                .getOrElse(common.toList)
-                .reduce(Or)
-            }
-        }
-
-      case or @ Or(left, right) =>
-        (left, right) match {
-          case (Literal(true, BooleanType), _) => Literal(true)
-          case (_, Literal(true, BooleanType)) => Literal(true)
-          case (Literal(false, BooleanType), r) => r
-          case (l, Literal(false, BooleanType)) => l
-          // a || a => a
-          case (l, r) if l fastEquals r => l
-          case (_, _) =>
-            /* Do optimize for predicates using formula (a && b) || (a && c) => a && (b || c)
-             * 1. Split left and right to get the conjunctive predicates,
-             *    i.e. lhsSet = (a, b), rhsSet = (a, c)
-             * 2. Find the common predict between lhsSet and rhsSet, i.e. common = (a)
-             * 3. Remove common predict from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
-             * 4. Apply the formula, get the optimized predict: common && (ldiff || rdiff)
-             */
-            val lhsSet = splitConjunctivePredicates(left).toSet
-            val rhsSet = splitConjunctivePredicates(right).toSet
-            val common = lhsSet.intersect(rhsSet)
-            val ldiff = lhsSet.diff(common)
-            val rdiff = rhsSet.diff(common)
-            if (ldiff.size == 0 || rdiff.size == 0) {
-              // a || (b && a) => a
-              common.reduce(And)
-            } else {
-              // (a && b && c && ...) || (a && b && d && ...) || (a && b && e && ...) ... =>
-              // a && b && ((c && ...) || (d && ...) || (e && ...) || ...)
-              (ldiff.reduceOption(And) ++ rdiff.reduceOption(And))
-                .reduceOption(Or)
-                .map(_ :: common.toList)
-                .getOrElse(common.toList)
-                .reduce(And)
-            }
-        }
-
-      case not @ Not(exp) =>
-        exp match {
-          case Literal(true, BooleanType) => Literal(false)
-          case Literal(false, BooleanType) => Literal(true)
-          case GreaterThan(l, r) => LessThanOrEqual(l, r)
-          case GreaterThanOrEqual(l, r) => LessThan(l, r)
-          case LessThan(l, r) => GreaterThanOrEqual(l, r)
-          case LessThanOrEqual(l, r) => GreaterThan(l, r)
-          case Not(e) => e
-          case _ => not
-        }
-
-      // Turn if (true) a else b into a, and if (false) a else b into b.
+      case and @ And(left, right) => (left, right) match {
+        // true && r => r
+        case (Literal(true, BooleanType), r) => r
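The common-predicate factoring quoted in the diff above — e.g. (a and b) or (a and c) => a and (b or c) — can be sketched standalone. This is a hypothetical simplified model in Python, not Catalyst's Scala: expressions are predicate names (strings) or nested tuples ("and", l, r) / ("or", l, r).

```python
def split_conjuncts(expr):
    """Flatten a tree of 'and' nodes into its set of conjuncts (step 1)."""
    if isinstance(expr, tuple) and expr[0] == "and":
        return split_conjuncts(expr[1]) | split_conjuncts(expr[2])
    return {expr}

def fold(op, preds):
    """Fold a collection of predicates back into a left-nested op tree."""
    preds = list(preds)
    out = preds[0]
    for p in preds[1:]:
        out = (op, out, p)
    return out

def simplify_or(left, right):
    """Rewrite `left or right` by factoring out shared conjuncts (steps 2-4)."""
    lhs, rhs = split_conjuncts(left), split_conjuncts(right)
    common = lhs & rhs
    ldiff, rdiff = lhs - common, rhs - common
    if not ldiff or not rdiff:
        return fold("and", common)  # a or (a and b) => a
    # common and (ldiff or rdiff)
    rest = ("or", fold("and", ldiff), fold("and", rdiff))
    return fold("and", list(common) + [rest])

def evaluate(expr, env):
    """Evaluate an expression under a truth assignment, to check equivalence."""
    if isinstance(expr, tuple):
        op, l, r = expr
        lv, rv = evaluate(l, env), evaluate(r, env)
        return lv and rv if op == "and" else lv or rv
    return env[expr]
```

Checking the rewrite against all truth assignments confirms it preserves semantics, which is the invariant the Catalyst rule must maintain.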
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/4090#discussion_r23129994

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -302,89 +302,100 @@ object OptimizeIn extends Rule[LogicalPlan] {
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsUp {
-      case and @ And(left, right) =>
-        (left, right) match {
-          case (Literal(true, BooleanType), r) => r
-          case (l, Literal(true, BooleanType)) => l
-          case (Literal(false, BooleanType), _) => Literal(false)
-          case (_, Literal(false, BooleanType)) => Literal(false)
-          // a && a => a
-          case (l, r) if l fastEquals r => l
-          case (_, _) =>
-            /* Do optimize for predicates using formula (a || b) && (a || c) => a || (b && c)
-             * 1. Split left and right to get the disjunctive predicates,
-             *    i.e. lhsSet = (a, b), rhsSet = (a, c)
-             * 2. Find the common predict between lhsSet and rhsSet, i.e. common = (a)
-             * 3. Remove common predict from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
-             * 4. Apply the formula, get the optimized predict: common || (ldiff && rdiff)
-             */
-            val lhsSet = splitDisjunctivePredicates(left).toSet
-            val rhsSet = splitDisjunctivePredicates(right).toSet
-            val common = lhsSet.intersect(rhsSet)
-            val ldiff = lhsSet.diff(common)
-            val rdiff = rhsSet.diff(common)
-            if (ldiff.size == 0 || rdiff.size == 0) {
-              // a && (a || b) => a
-              common.reduce(Or)
-            } else {
-              // (a || b || c || ...) && (a || b || d || ...) && (a || b || e || ...) ... =>
-              // (a || b) || ((c || ...) && (f || ...) && (e || ...) ...)
-              (ldiff.reduceOption(Or) ++ rdiff.reduceOption(Or))
-                .reduceOption(And)
-                .map(_ :: common.toList)
-                .getOrElse(common.toList)
-                .reduce(Or)
-            }
-        }
-
-      case or @ Or(left, right) =>
-        (left, right) match {
-          case (Literal(true, BooleanType), _) => Literal(true)
-          case (_, Literal(true, BooleanType)) => Literal(true)
-          case (Literal(false, BooleanType), r) => r
-          case (l, Literal(false, BooleanType)) => l
-          // a || a => a
-          case (l, r) if l fastEquals r => l
-          case (_, _) =>
-            /* Do optimize for predicates using formula (a && b) || (a && c) => a && (b || c)
-             * 1. Split left and right to get the conjunctive predicates,
-             *    i.e. lhsSet = (a, b), rhsSet = (a, c)
-             * 2. Find the common predict between lhsSet and rhsSet, i.e. common = (a)
-             * 3. Remove common predict from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
-             * 4. Apply the formula, get the optimized predict: common && (ldiff || rdiff)
-             */
-            val lhsSet = splitConjunctivePredicates(left).toSet
-            val rhsSet = splitConjunctivePredicates(right).toSet
-            val common = lhsSet.intersect(rhsSet)
-            val ldiff = lhsSet.diff(common)
-            val rdiff = rhsSet.diff(common)
-            if (ldiff.size == 0 || rdiff.size == 0) {
-              // a || (b && a) => a
-              common.reduce(And)
-            } else {
-              // (a && b && c && ...) || (a && b && d && ...) || (a && b && e && ...) ... =>
-              // a && b && ((c && ...) || (d && ...) || (e && ...) || ...)
-              (ldiff.reduceOption(And) ++ rdiff.reduceOption(And))
-                .reduceOption(Or)
-                .map(_ :: common.toList)
-                .getOrElse(common.toList)
-                .reduce(And)
-            }
-        }
-
-      case not @ Not(exp) =>
-        exp match {
-          case Literal(true, BooleanType) => Literal(false)
-          case Literal(false, BooleanType) => Literal(true)
-          case GreaterThan(l, r) => LessThanOrEqual(l, r)
-          case GreaterThanOrEqual(l, r) => LessThan(l, r)
-          case LessThan(l, r) => GreaterThanOrEqual(l, r)
-          case LessThanOrEqual(l, r) => GreaterThan(l, r)
-          case Not(e) => e
-          case _ => not
-        }
-
-      // Turn if (true) a else b into a, and if (false) a else b into b.
+      case and @ And(left, right) => (left, right) match {
+        // true && r => r
+        case (Literal(true, BooleanType), r) => r
[GitHub] spark pull request: [SPARK-5279][SQL] Use java.math.BigDecimal as ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4092#issuecomment-70394229 [Test build #25713 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25713/consoleFull) for PR 4092 at commit [`10cb496`](https://github.com/apache/spark/commit/10cb496ad55417c8db2b7a6058cae623353f83ca). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4020#issuecomment-70395176 LGTM - @sryza and @ksakellis look okay to you?
[GitHub] spark pull request: Added SparkGCE Script for Version 0.9.1
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/681#issuecomment-70395313 This is being maintained in its own package now, so let's close this issue.
[GitHub] spark pull request: [SPARK-5285][SQL] Removed GroupExpression in c...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4075#issuecomment-70390298 I don't really see the benefit of removing it. Transform should be able to walk all expressions, even if there are no optimizations that apply today.
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/4090 [SQL][Minor] Added comments and examples to explain BooleanSimplification You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark booleanSimplification Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4090.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4090 commit 68c89866962b836f479a3fc41fbd503f8bc7ff47 Author: Reynold Xin r...@databricks.com Date: 2015-01-18T00:10:20Z [SQL][Minor] Added comments and examples to explain BooleanSimplification.
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4090#issuecomment-70390579 [Test build #25709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25709/consoleFull) for PR 4090 at commit [`68c8986`](https://github.com/apache/spark/commit/68c89866962b836f479a3fc41fbd503f8bc7ff47). * This patch merges cleanly.
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4090#issuecomment-70390536 cc @chenglian @scwf
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/4090#issuecomment-70390994 LGTM
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-70393196 [Test build #25710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25710/consoleFull) for PR 3519 at commit [`ce0e30c`](https://github.com/apache/spark/commit/ce0e30c50d7b55c1aa598a0d1b49e2e9beff94a9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class IsotonicRegressionModel (`
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-70393197 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25710/ Test PASSed.
[GitHub] spark pull request: [SPARK-5279][SQL] Use java.math.BigDecimal as ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4092#issuecomment-70394459 [Test build #25713 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25713/consoleFull) for PR 4092 at commit [`10cb496`](https://github.com/apache/spark/commit/10cb496ad55417c8db2b7a6058cae623353f83ca). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5279][SQL] Use java.math.BigDecimal as ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4092#issuecomment-70394460 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25713/ Test FAILed.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70399407 BTW - my apologies for marking this as a starter task, it turned out to be more complicated. We can credit you for having worked on the feature as well.
[GitHub] spark pull request: [SPARK-5285][SQL] Removed GroupExpression in c...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/4075#issuecomment-70390780 Yes, I agree. Then should I add the unit test I gave above?
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4090
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70394250 Hey @ilganeli - I took a slightly deeper look this time. I still don't totally follow how this all hooks together, but I wonder if it's possible to write a single utility function that is much simpler. It would just do the following:

```
/**
 * Given an object reference, recursively traverses all fields of the reference,
 * fields of objects within those fields, and so on. If any of those references
 * are neither Serializable nor Externalizable, prints the path from the root object
 * to the reference.
 */
def findNonSerializableReferences(root: AnyRef): String = {
}
```

And it would do something like:
1. Start with the root reference.
2. Traverse the graph of all referred-to objects, maintaining path information. Path information means both the sequence of parent pointers and the field name.
3. Check whether Serializable.class.isAssignableFrom(c) or Externalizable.class.isAssignableFrom(c) for any object encountered, where c is the class of the object.
4. When the first object that isn't serializable is encountered, print the path to that object.

This wouldn't work for custom serializers; it would only work for the Java serializer. However, that's all we support for closures anyway.
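The traversal sketched in the four steps above can be modeled compactly. The following is a hypothetical Python analog (not the Scala utility being proposed): it walks an object graph depth-first, tracks the attribute path, and uses "can `pickle` serialize it?" as a stand-in for the `Serializable`/`Externalizable` check.

```python
import pickle

def find_non_serializable(root, path="root", seen=None):
    """Return the attribute path to the first unserializable reference, or None."""
    seen = set() if seen is None else seen
    if id(root) in seen:          # avoid cycles in the object graph
        return None
    seen.add(id(root))
    try:
        pickle.dumps(root)
        return None               # the whole subtree serializes fine
    except Exception:
        pass
    # Something here fails: recurse into fields to localize the culprit,
    # carrying path information (parent chain plus field name).
    children = vars(root).items() if hasattr(root, "__dict__") else []
    for name, child in children:
        found = find_non_serializable(child, f"{path}.{name}", seen)
        if found:
            return found
    return path  # no guilty child found: this object itself is the problem
```

As in the proposal, this localizes the first failing reference rather than just reporting that serialization failed at the root.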
[GitHub] spark pull request: [SPARK-5208][DOC] Add more documentation to Ne...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4012#issuecomment-70395228 @sarutak when we added the netty shuffle we actually decided not to expose these in order to keep the overall # of configurations manageable. We couldn't think of a user scenario where these would make a large difference (correct me if that is wrong @aarondav). Did you have a specific use case in mind, or was this mostly for completeness reasons?
[GitHub] spark pull request: SPARK-5217 Spark UI should report pending stag...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4043#issuecomment-70395195 @ScrapCodes mind bringing up to date? The current form LGTM
[GitHub] spark pull request: SPARK-2630 Input data size of CoalescedRDD cou...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2310
[GitHub] spark pull request: [SPARK-3880] HBase as data source to SparkSQL
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4084
[GitHub] spark pull request: Added --package argument to make-distributio...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3682
[GitHub] spark pull request: Merge pull request #1 from apache/master
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4035
[GitHub] spark pull request: Added SparkGCE Script for Version 0.9.1
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/681
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70398840 [Test build #25714 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25714/consoleFull) for PR 3997 at commit [`93f0d46`](https://github.com/apache/spark/commit/93f0d461487f9582a6bc2a34f09179dbe8672d3d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5307] SerializationDebugger to help deb...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4093#issuecomment-70399174 [Test build #25716 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25716/consoleFull) for PR 4093 at commit [`bde6512`](https://github.com/apache/spark/commit/bde6512a55765a48ca74f321068f9ab91516edae). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5307] SerializationDebugger to help deb...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/4093#discussion_r23131447 --- Diff: core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.serializer
+
+import java.io._
+import java.lang.reflect.Field
+import java.security.AccessController
+
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+
+private[serializer]
+object SerializationDebugger {
+
+  /**
+   * Write an object to the [[ObjectOutputStream]]. If a NotSerializableException is encountered,
+   * use our debug stream to capture the serialization stack leading to the problematic object.
+   */
+  def writeObject(out: ObjectOutputStream, obj: Any): Unit = {
+    try {
+      out.writeObject(obj)
+    } catch {
+      case e: NotSerializableException =>
+        if (enableDebugging) throw improveException(obj, e) else throw e
+    }
+  }
+
+  /**
+   * Improve the given NotSerializableException with the serialization stack leading from the given
+   * object to the problematic object.
+   */
+  private def improveException(obj: Any, e: NotSerializableException): NotSerializableException = {
+    if (depthField != null) {
+      val out = new DebugStream(new ByteArrayOutputStream)
+      try {
+        out.writeObject(obj)
+        e
+      } catch {
+        case nse: NotSerializableException =>
+          new NotSerializableException(
+            nse.getMessage + "\n" +
+            s"\tSerialization stack (${out.stack.size}):\n" +
+            out.stack.map(o => s"\t- $o (class ${o.getClass.getName})").mkString("\n") + "\n" +
+            "\tRun the JVM with sun.io.serialization.extendedDebugInfo for more information.")
--- End diff -- It is actually -Dsun.io.serialization.extendedDebugInfo=true. Kinda long ...
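For context on what the patch improves: stock Java serialization reports only the class of the offending object, with no path from the root. A minimal Java demo of that baseline behavior (hypothetical class names, unrelated to the Spark code under review):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class NotSerializableDemo {

    // A serializable wrapper holding a non-serializable member.
    static class Outer implements Serializable {
        Thread worker = new Thread();  // java.lang.Thread is not serializable
    }

    // Attempts Java serialization; returns "ok" or the exception's message.
    static String tryWrite(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return "ok";
        } catch (NotSerializableException e) {
            // The message is just the offending class name, e.g. "java.lang.Thread";
            // nothing says it was reached via Outer.worker.
            return e.getMessage();
        } catch (IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryWrite(new Outer()));  // prints java.lang.Thread
    }
}
```

The DebugStream in the diff fills exactly this gap by recording the serialization stack down to the problematic object.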
[GitHub] spark pull request: [SPARK-5289]: Backport publishing of repl, yar...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4079#issuecomment-70390243 @vanzin I tried to cover it in #4080 - but basically the changes you made were ones others in the community were already requesting anyway (asking us to publish these to Maven, which we did prior to 1.2).
[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4074#discussion_r23129615 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
@@ -436,6 +436,12 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
   def first(): T = rdd.first()

   /**
+   * @return true if and only if the RDD contains no elements at all. Note that an RDD
+   *         may be empty even when it has at least 1 partition.
+   */
+  def isEmpty(): Boolean = rdd.isEmpty()
--- End diff -- So this is actually a legitimate API break _if_ we think users are themselves extending the `JavaRDDLike` trait, because it will add a method to the associated interface. One option is to just do it and ask users not to write code that directly accepts or extends `JavaRDDLike`, and maybe we could document that in the JavaDoc. Another option is just to add this to the concrete implementations in JavaRDD and JavaPairRDD. @JoshRosen, any thoughts one way or the other?
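The concern here is the classic pre-Java-8 interface-evolution problem: adding an abstract method to a published interface breaks every external implementor. A minimal illustration with made-up names (a hypothetical stand-in for JavaRDDLike, not Spark's real types):

```java
public class InterfaceEvolutionDemo {

    // Version 1 of a published mixin interface.
    interface RDDLike<T> {
        T first();
        // A later release adding
        //     boolean isEmpty();
        // here would break every user class that implements RDDLike
        // directly: their code no longer compiles, and previously
        // compiled binaries throw AbstractMethodError when the new
        // method is invoked on them.
    }

    // A user's own implementor, written against version 1.
    static class UserRDD implements RDDLike<Integer> {
        public Integer first() { return 42; }
    }
}
```

Adding the method only to the concrete JavaRDD/JavaPairRDD classes, as the comment suggests, sidesteps this: classes can gain new methods without breaking subclasses or callers.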
[GitHub] spark pull request: [SPARK-3880] HBase as data source to SparkSQL
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4084#issuecomment-70390848 Also one thing that would help is if you could create a standalone project for this on github (see spark-avro).
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/4090#discussion_r23129765 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -302,89 +302,100 @@ object OptimizeIn extends Rule[LogicalPlan] {
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsUp {
-      case and @ And(left, right) =>
-        (left, right) match {
-          case (Literal(true, BooleanType), r) => r
-          case (l, Literal(true, BooleanType)) => l
-          case (Literal(false, BooleanType), _) => Literal(false)
-          case (_, Literal(false, BooleanType)) => Literal(false)
-          // a && a => a
-          case (l, r) if l fastEquals r => l
-          case (_, _) =>
-            /* Do optimize for predicates using formula (a || b) && (a || c) => a || (b && c)
-             * 1. Split left and right to get the disjunctive predicates,
-             *    i.e. lhsSet = (a, b), rhsSet = (a, c)
-             * 2. Find the common predict between lhsSet and rhsSet, i.e. common = (a)
-             * 3. Remove common predict from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
-             * 4. Apply the formula, get the optimized predict: common || (ldiff && rdiff)
-             */
-            val lhsSet = splitDisjunctivePredicates(left).toSet
-            val rhsSet = splitDisjunctivePredicates(right).toSet
-            val common = lhsSet.intersect(rhsSet)
-            val ldiff = lhsSet.diff(common)
-            val rdiff = rhsSet.diff(common)
-            if (ldiff.size == 0 || rdiff.size == 0) {
-              // a && (a || b) => a
-              common.reduce(Or)
-            } else {
-              // (a || b || c || ...) && (a || b || d || ...) && (a || b || e || ...) ... =>
-              // (a || b) || ((c || ...) && (f || ...) && (e || ...) ...)
-              (ldiff.reduceOption(Or) ++ rdiff.reduceOption(Or))
-                .reduceOption(And)
-                .map(_ :: common.toList)
-                .getOrElse(common.toList)
-                .reduce(Or)
-            }
-        }
-
-      case or @ Or(left, right) =>
-        (left, right) match {
-          case (Literal(true, BooleanType), _) => Literal(true)
-          case (_, Literal(true, BooleanType)) => Literal(true)
-          case (Literal(false, BooleanType), r) => r
-          case (l, Literal(false, BooleanType)) => l
-          // a || a => a
-          case (l, r) if l fastEquals r => l
-          case (_, _) =>
-            /* Do optimize for predicates using formula (a && b) || (a && c) => a && (b || c)
-             * 1. Split left and right to get the conjunctive predicates,
-             *    i.e. lhsSet = (a, b), rhsSet = (a, c)
-             * 2. Find the common predict between lhsSet and rhsSet, i.e. common = (a)
-             * 3. Remove common predict from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
-             * 4. Apply the formula, get the optimized predict: common && (ldiff || rdiff)
-             */
-            val lhsSet = splitConjunctivePredicates(left).toSet
-            val rhsSet = splitConjunctivePredicates(right).toSet
-            val common = lhsSet.intersect(rhsSet)
-            val ldiff = lhsSet.diff(common)
-            val rdiff = rhsSet.diff(common)
-            if (ldiff.size == 0 || rdiff.size == 0) {
-              // a || (b && a) => a
-              common.reduce(And)
-            } else {
-              // (a && b && c && ...) || (a && b && d && ...) || (a && b && e && ...) ... =>
-              // a && b && ((c && ...) || (d && ...) || (e && ...) || ...)
-              (ldiff.reduceOption(And) ++ rdiff.reduceOption(And))
-                .reduceOption(Or)
-                .map(_ :: common.toList)
-                .getOrElse(common.toList)
-                .reduce(And)
-            }
-        }
-
-      case not @ Not(exp) =>
-        exp match {
-          case Literal(true, BooleanType) => Literal(false)
-          case Literal(false, BooleanType) => Literal(true)
-          case GreaterThan(l, r) => LessThanOrEqual(l, r)
-          case GreaterThanOrEqual(l, r) => LessThan(l, r)
-          case LessThan(l, r) => GreaterThanOrEqual(l, r)
-          case LessThanOrEqual(l, r) => GreaterThan(l, r)
-          case Not(e) => e
-          case _ => not
-        }
-
-      // Turn if (true) a else b into a, and if (false) a else b into b.
+      case and @ And(left, right) => (left, right) match {
+        // true && r => r
+        case (Literal(true, BooleanType), r) => r
+
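The core identity the rule applies, (a || b) && (a || c) => a || (b && c), comes down to the set arithmetic described in steps 1-4 of the code comment. An illustrative Java sketch with strings standing in for Catalyst expressions (a hypothetical helper, not the Optimizer code itself):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Demonstrates the rewrite (a || b) && (a || c) => a || (b && c)
// via intersect/diff on the two sides' disjunct sets.
public class BooleanSimplifyDemo {

    public static String simplifyAnd(Set<String> lhsDisjuncts, Set<String> rhsDisjuncts) {
        Set<String> common = new TreeSet<>(lhsDisjuncts);
        common.retainAll(rhsDisjuncts);                  // step 2: shared disjuncts
        Set<String> ldiff = new TreeSet<>(lhsDisjuncts);
        ldiff.removeAll(common);                         // step 3: remainder on each side
        Set<String> rdiff = new TreeSet<>(rhsDisjuncts);
        rdiff.removeAll(common);
        if (ldiff.isEmpty() || rdiff.isEmpty()) {
            // a && (a || b) => a
            return String.join(" || ", common);
        }
        // step 4: common || (ldiff && rdiff)
        List<String> parts = new ArrayList<>(common);
        parts.add("(" + String.join(" || ", ldiff) + " && " + String.join(" || ", rdiff) + ")");
        return String.join(" || ", parts);
    }
}
```

For lhs = {a, b} and rhs = {a, c} this yields "a || (b && c)"; when one side's remainder is empty, the common part alone suffices, matching the a && (a || b) => a branch.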