[GitHub] spark pull request: [SPARK-5198][Mesos] Change executorId more uni...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/3994#issuecomment-70358767 @mateiz I don't think I understand exactly how you intend fine-grained mode to behave. What would help me understand it better? I don't see how multiple executors break Spark's intended behaviour. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SQL][minor] Improved Row documentation.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4085
[GitHub] spark pull request: [SPARK-5282][mllib]: RowMatrix easily gets int...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4069#issuecomment-70359385 [Test build #25700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25700/consoleFull) for PR 4069 at commit [`e54e5c8`](https://github.com/apache/spark/commit/e54e5c8b23c2cc5ae066a68712169d5eb188f4f9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5198][Mesos] Change executorId more uni...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/3994#issuecomment-70359541 @tnachen - Slave page ![screen shot 2015-01-17 at 5 38 20 pm](https://cloud.githubusercontent.com/assets/3612566/5788288/a87230de-9e6f-11e4-8e18-972d6b3b9204.png) - Sandbox page ![screen shot 2015-01-17 at 5 33 15 pm](https://cloud.githubusercontent.com/assets/3612566/5788290/a8d2b486-9e6f-11e4-9a41-32c72824d3cc.png) - stderr ![screen shot 2015-01-17 at 5 33 30 pm](https://cloud.githubusercontent.com/assets/3612566/5788289/a8a600c6-9e6f-11e4-902d-4d890ff67d89.png)
[GitHub] spark pull request: [SPARK-5198][Mesos] Change executorId more uni...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/3994#issuecomment-70359713 @tnachen And here are the slave's logs around tasks 34 and 63. It looks like if any task hits an error while running, the executor running that task is terminated. Please check this.
```
I0117 17:21:43.678827 41388 slave.cpp:625] Got assigned task 34 for framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:21:43.679612 41388 slave.cpp:734] Launching task 34 for framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:21:43.721297 41388 slave.cpp:844] Queuing task '34' for executor 20141110-112437-3374320138-60030-57359-44 of framework '20150117-171023-3391097354-60030-7325-0004
I0117 17:21:43.775977 41388 slave.cpp:358] Successfully attached file '/data03/mesos/slaves/20141110-112437-3374320138-60030-57359-44/frameworks/20150117-171023-3391097354-60030-7325-0004/executors/20141110-112437-3374320138-60030-57359-44/runs/3fdbdd09-98cd-4197-954f-d95d9b3b4aee'
I0117 17:21:43.721451 41386 mesos_containerizer.cpp:407] Starting container '3fdbdd09-98cd-4197-954f-d95d9b3b4aee' for executor '20141110-112437-3374320138-60030-57359-44' of framework '20150117-171023-3391097354-60030-7325-0004'
I0117 17:21:43.777179 41386 mesos_containerizer.cpp:528] Fetching URIs for container '3fdbdd09-98cd-4197-954f-d95d9b3b4aee' using command '/usr/bin/env MESOS_EXECUTOR_URIS=hdfs:///app/spark/spark-1.3.0-SNAPSHOT-bin-2.3.0-cdh5.0.1.tgz+0X MESOS_WORK_DIRECTORY=/data03/mesos/slaves/20141110-112437-3374320138-60030-57359-44/frameworks/20150117-171023-3391097354-60030-7325-0004/executors/20141110-112437-3374320138-60030-57359-44/runs/3fdbdd09-98cd-4197-954f-d95d9b3b4aee HADOOP_HOME=/app/hdfs/ /app/mesos-0.18.1/libexec/mesos/mesos-fetcher'
I0117 17:22:28.863304 41374 slave.cpp:2523] Current usage 44.85%. Max allowed age: 3.160841566048842days
I0117 17:22:38.472086 41384 slave.cpp:625] Got assigned task 63 for framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:22:38.472584 41384 slave.cpp:734] Launching task 63 for framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:22:38.472801 41384 slave.cpp:844] Queuing task '63' for executor 20141110-112437-3374320138-60030-57359-44 of framework '20150117-171023-3391097354-60030-7325-0004
I0117 17:22:43.721726 41370 slave.cpp:2475] Terminating executor 20141110-112437-3374320138-60030-57359-44 of framework 20150117-171023-3391097354-60030-7325-0004 because it did not register within 1mins
I0117 17:22:43.722038 41378 mesos_containerizer.cpp:818] Destroying container '3fdbdd09-98cd-4197-954f-d95d9b3b4aee'
I0117 17:22:43.722295 41378 slave.cpp:2052] Executor '20141110-112437-3374320138-60030-57359-44' of framework 20150117-171023-3391097354-60030-7325-0004 has terminated with unknown status
E0117 17:22:43.722744 41376 slave.cpp:2332] Failed to unmonitor container for executor 20141110-112437-3374320138-60030-57359-44 of framework 20150117-171023-3391097354-60030-7325-0004: Not monitored
I0117 17:22:43.737566 41378 slave.cpp:1669] Handling status update TASK_LOST (UUID: cc571b34-e161-4ec9-bcf1-033bd967209f) for task 34 of framework 20150117-171023-3391097354-60030-7325-0004 from @0.0.0.0:0
I0117 17:22:43.737829 41378 slave.cpp:3142] Terminating task 34
I0117 17:22:43.738701 41372 status_update_manager.cpp:315] Received status update TASK_LOST (UUID: cc571b34-e161-4ec9-bcf1-033bd967209f) for task 34 of framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:22:43.739341 41378 slave.cpp:1669] Handling status update TASK_LOST (UUID: f198e879-a762-4cce-97ff-5261cb4ff820) for task 63 of framework 20150117-171023-3391097354-60030-7325-0004 from @0.0.0.0:0
I0117 17:22:43.739398 41372 status_update_manager.cpp:494] Creating StatusUpdate stream for task 34 of framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:22:43.739542 41378 slave.cpp:3142] Terminating task 63
I0117 17:22:43.739869 41372 status_update_manager.cpp:368] Forwarding status update TASK_LOST (UUID: cc571b34-e161-4ec9-bcf1-033bd967209f) for task 34 of framework 20150117-171023-3391097354-60030-7325-0004 to master@10.10.32.202:60030
I0117 17:22:43.740393 41372 status_update_manager.cpp:315] Received status update TASK_LOST (UUID: f198e879-a762-4cce-97ff-5261cb4ff820) for task 63 of framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:22:43.740411 41384 slave.cpp:1789] Status update manager successfully handled status update TASK_LOST (UUID: cc571b34-e161-4ec9-bcf1-033bd967209f) for task 34 of framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:22:43.740573 41372 status_update_manager.cpp:494] Creating StatusUpdate stream for task 63 of framework 20150117-171023-3391097354-60030-7325-0004
I0117 17:22:43.740892 41372 status_update_manager.cpp:368] Forwarding status update TASK_LOST (UUID: f198e879-a762-4cce-97ff
```
[GitHub] spark pull request: [SPARK-3880] HBase as data source to SparkSQL
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/4084#issuecomment-70359819 and we need to check the coding style.
[GitHub] spark pull request: [SPARK-5282][mllib]: RowMatrix easily gets int...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4069#issuecomment-70360796 [Test build #25700 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25700/consoleFull) for PR 4069 at commit [`e54e5c8`](https://github.com/apache/spark/commit/e54e5c8b23c2cc5ae066a68712169d5eb188f4f9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5282][mllib]: RowMatrix easily gets int...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4069#issuecomment-70360799 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25700/ Test PASSed.
[GitHub] spark pull request: [SPARK-5285][SQL] Removed GroupExpression in c...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4075#issuecomment-70361176 `GroupExpression` is not used as a transformation in the `Analyzer`, but in the `Optimizer`; that's why it can still pass the unit tests. I should document this.
[GitHub] spark pull request: [SPARK-5285][SQL] Removed GroupExpression in c...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/4075#issuecomment-70362210 @chenghao-intel, do we need to add a unit test for this?
[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4074#issuecomment-70362340 Jenkins, retest this please.
[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4074#issuecomment-70362399 [Test build #25701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25701/consoleFull) for PR 4074 at commit [`d76f8e3`](https://github.com/apache/spark/commit/d76f8e3cbe10f2ed5239281d6098d619640368d5). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4074#issuecomment-70364288 [Test build #25701 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25701/consoleFull) for PR 4074 at commit [`d76f8e3`](https://github.com/apache/spark/commit/d76f8e3cbe10f2ed5239281d6098d619640368d5). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4074#issuecomment-70364290 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25701/ Test FAILed.
[GitHub] spark pull request: [SPARK-5219][Core] Add locks to avoid scheduli...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/4019#issuecomment-70365375 These methods are called in threads of `TaskResultGetter.getTaskResultExecutor`, and they access variables such as `isZombie` and `taskInfos` in `TaskSetManager`, which are also used in `TaskSchedulerImpl`.
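The hazard described above is the generic one: state mutated from one thread pool while another component reads it. A minimal Python sketch of the locking fix, hypothetical and not Spark's actual code (`zombie` and `infos` stand in for `isZombie` and `taskInfos`):

```python
import threading

class TaskSet:
    """Illustrative stand-in for state shared between result-getter
    threads and a scheduler; all names here are hypothetical."""
    def __init__(self):
        self.lock = threading.Lock()
        self.infos = {}
        self.zombie = False

    def record(self, task_id, state):
        # Guard both the read of `zombie` and the write to `infos` with
        # one lock, so a concurrent reader never sees a half-update.
        with self.lock:
            if not self.zombie:
                self.infos[task_id] = state

ts = TaskSet()
threads = [threading.Thread(target=ts.record, args=(i, "FINISHED"))
           for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, the check-then-write in `record` could interleave with a thread flipping `zombie`, which mirrors the race between `TaskResultGetter` threads and `TaskSchedulerImpl`.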
[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-70366247 @jkbradley I use JPMML to verify that the exported model produces the same results; here are the details of my tests: https://github.com/selvinsource/spark-pmml-exporter-validator
[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...
Github user ash211 commented on a diff in the pull request: https://github.com/apache/spark/pull/4074#discussion_r23125261 --- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala --- @@ -545,6 +546,12 @@ class RDDSuite extends FunSuite with SharedSparkContext { assert(sortedTopK === nums.sorted(ord).take(5)) } + test("isEmpty") { + assert(sc.emptyRDD.isEmpty()) + assert(sc.parallelize(Seq[Int]()).isEmpty()) + assert(!sc.parallelize(Seq(1)).isEmpty()) --- End diff -- I don't think this tests the case where there are multiple partitions but no data in any of the partitions. Maybe add something like `assert(sc.parallelize(Seq(1,2,3), 3).filter(_ < 0).isEmpty())`
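The reviewer's point is that "has partitions" does not imply "has data". A plain-Python sketch of the case, with partitions modeled as lists (hypothetical, not Spark's implementation):

```python
# Hypothetical model: an "RDD" is a list of partitions, each a list of
# records. An isEmpty-style check must inspect every partition.
def is_empty(partitions):
    return all(len(p) == 0 for p in partitions)

no_partitions = []            # like sc.emptyRDD
three_empty = [[], [], []]    # like parallelize(Seq(1,2,3), 3).filter(_ < 0)
hidden_data = [[], [1], []]   # data in only one of several partitions
```

The `three_empty` case is exactly the one the suggested test covers: several partitions exist, all empty after the filter.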
[GitHub] spark pull request: [SPARK-4937][SQL] Comment for the newly optimi...
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/4086 [SPARK-4937][SQL] Comment for the newly optimization rules in `BooleanSimplification` Follow up of #3778 /cc @rxin You can merge this pull request into a Git repository by running: $ git pull https://github.com/scwf/spark commentforspark-4937 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4086.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4086 commit 2d3406e63dfd8e527fd2f6ed9fc27cc342a51459 Author: scwf wangf...@huawei.com Date: 2015-01-17T14:33:07Z added comment for spark-4937 commit aaf89f64333d2a9692a1068d0165c36128744d42 Author: scwf wangf...@huawei.com Date: 2015-01-17T14:34:57Z code style issue
[GitHub] spark pull request: [SPARK-4937][SQL] Comment for the newly optimi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4086#issuecomment-70369529 [Test build #25702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25702/consoleFull) for PR 4086 at commit [`aaf89f6`](https://github.com/apache/spark/commit/aaf89f64333d2a9692a1068d0165c36128744d42). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70369703 Could you please tell me the preferred way to generate random data in Spark?
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70370192 [Test build #25703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25703/consoleFull) for PR 4014 at commit [`ab22f7b`](https://github.com/apache/spark/commit/ab22f7b55988ba324e14969c89d8edfe4d663504). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70370361 [Test build #25704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25704/consoleFull) for PR 4073 at commit [`a7bfc70`](https://github.com/apache/spark/commit/a7bfc70e4382efeee83e2657844e20b3b9f60448). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/3997#discussion_r23125961 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala --- @@ -449,6 +461,37 @@ class SparseVector( override def toString: String = "(%s,%s,%s)".format(size, indices.mkString("[", ",", "]"), values.mkString("[", ",", "]"))
+  override def equals(other: Any): Boolean = {
+    other match {
+      case v: SparseVector => {
+        if (this.size != v.size) { return false }
+        val thisValues = this.values
+        val thisIndices = this.indices
+        val thisSize = thisValues.size
+        val otherValues = v.values
+        val otherIndices = v.indices
+        val otherSize = otherValues.size
+
+        var k1 = 0
+        var k2 = 0
+        var allEqual = true
+        while (allEqual) {
+          while (k1 < thisSize && thisValues(k1) == 0) k1 += 1
+          while (k2 < otherSize && otherValues(k2) == 0) k2 += 1
+
+          if (k1 >= thisSize || k2 >= otherSize) {
+            return k1 >= thisSize && k2 >= otherSize // check end alignment
+          }
+          allEqual = thisIndices(k1) == otherIndices(k2) && thisValues(k1) == otherValues(k2)
+          k1 += 1
+          k2 += 1
+        }
+        allEqual
+      }
+      case _ => super.equals(other)
+    }
+  }
+
--- End diff -- yes, and I found that sparse vs. dense is actually quite similar to sparse vs. sparse; I'm trying to unify them.
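The two-pointer comparison in the diff above can be sketched in plain Python (a hypothetical helper, not the MLlib code): advance each cursor past explicitly stored zeros, then require matching indices and values, and at the end require both sides to be exhausted together.

```python
def sparse_equals(idx1, val1, idx2, val2):
    """Compare two same-size sparse vectors given as parallel
    (indices, values) arrays, skipping explicitly stored zeros."""
    k1 = k2 = 0
    while True:
        # Skip stored zeros: they carry no information.
        while k1 < len(val1) and val1[k1] == 0:
            k1 += 1
        while k2 < len(val2) and val2[k2] == 0:
            k2 += 1
        # If either side is exhausted, both must be (end alignment).
        if k1 >= len(val1) or k2 >= len(val2):
            return k1 >= len(val1) and k2 >= len(val2)
        if idx1[k1] != idx2[k2] or val1[k1] != val2[k2]:
            return False
        k1 += 1
        k2 += 1
```

Skipping zeros is what makes two different storage layouts of the same vector compare equal, e.g. `([0, 2], [1.0, 2.0])` versus `([0, 1, 2], [1.0, 0.0, 2.0])`.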
[GitHub] spark pull request: [SPARK-4937][SQL] Comment for the newly optimi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4086#issuecomment-70371913 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25702/ Test PASSed.
[GitHub] spark pull request: [SPARK-4937][SQL] Comment for the newly optimi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4086#issuecomment-70371911 [Test build #25702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25702/consoleFull) for PR 4086 at commit [`aaf89f6`](https://github.com/apache/spark/commit/aaf89f64333d2a9692a1068d0165c36128744d42). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70372569 [Test build #25703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25703/consoleFull) for PR 4014 at commit [`ab22f7b`](https://github.com/apache/spark/commit/ab22f7b55988ba324e14969c89d8edfe4d663504). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70372572 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25703/ Test PASSed.
[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70372786 [Test build #25704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25704/consoleFull) for PR 4073 at commit [`a7bfc70`](https://github.com/apache/spark/commit/a7bfc70e4382efeee83e2657844e20b3b9f60448). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70372791 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25704/ Test PASSed.
[GitHub] spark pull request: [SPARK-4894][mllib] Added Bernoulli option to ...
GitHub user leahmcguire opened a pull request: https://github.com/apache/spark/pull/4087 [SPARK-4894][mllib] Added Bernoulli option to NaiveBayes model in mllib Added optional model type parameter for NaiveBayes training. Can be either Multinomial or Bernoulli. When Bernoulli is given the Bernoulli smoothing is used for fitting and for prediction as per: http://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html. Default for model is original Multinomial fit and predict. Added additional testing for Bernoulli and Multinomial models. You can merge this pull request into a Git repository by running: $ git pull https://github.com/leahmcguire/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4087.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4087 commit ce73c63e8bac40b02ae0a8147c3b424783f6094a Author: leahmcguire lmcgu...@salesforce.com Date: 2015-01-16T16:06:06Z added Bernoulli option to niave bayes model in mllib, added optional model type parameter for training. When Bernoulli is given the Bernoulli smoothing is used for fitting and for prediction http://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html
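For reference, Bernoulli naive Bayes prediction as described in the linked IR-book chapter scores every vocabulary term, whether present or absent in the document. A minimal Python sketch, hypothetical and not the MLlib implementation (all names and probability tables here are illustrative):

```python
import math

def predict_bernoulli(doc_terms, priors, cond_prob, vocab):
    """Return argmax over classes c of
    log P(c) + sum over t in vocab of log P(t|c) if t in doc
    else log(1 - P(t|c))."""
    best_class, best_score = None, -math.inf
    for c in priors:
        score = math.log(priors[c])
        for t in vocab:
            p = cond_prob[(t, c)]
            # Unlike the multinomial model, absent terms also
            # contribute evidence, via the (1 - p) factor.
            score += math.log(p) if t in doc_terms else math.log(1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy two-class, two-term model (made-up numbers).
priors = {"A": 0.5, "B": 0.5}
cond_prob = {("x", "A"): 0.9, ("y", "A"): 0.1,
             ("x", "B"): 0.1, ("y", "B"): 0.9}
```

The `(1 - p)` term for absent words is the behavioral difference the PR adds alongside the default multinomial fit and predict.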
[GitHub] spark pull request: [SPARK-4894][mllib] Added Bernoulli option to ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4087#issuecomment-70373574 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70375368 @jkbradley I've added a test according to the other tests in the `RandomForestSuite`. Let me know if there is anything left.
[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70375392 [Test build #25705 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25705/consoleFull) for PR 4073 at commit [`d1df1b2`](https://github.com/apache/spark/commit/d1df1b2df9b76e94abf95182fb47902b2740e6d3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize Scala/Java test...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-70382464 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25706/ Test FAILed.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize Scala/Java test...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-70381703 Jenkins, retest this please.
[GitHub] spark pull request: [MLlib] [SPARK-5301] Missing conversions and o...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4089#issuecomment-70389330 [Test build #25708 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25708/consoleFull) for PR 4089 at commit [`cb10ae5`](https://github.com/apache/spark/commit/cb10ae5a36be7d942e74005ed22610287e3059eb). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [MLlib] [SPARK-5301] Missing conversions and o...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4089#issuecomment-70389332 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25708/ Test PASSed.
[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70378420 [Test build #25705 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25705/consoleFull) for PR 4073 at commit [`d1df1b2`](https://github.com/apache/spark/commit/d1df1b2df9b76e94abf95182fb47902b2740e6d3). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5198][Mesos] Change executorId more uni...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/3994#issuecomment-70388679 From the logs it did indeed hit the executor registration timeout (1 minute), so Mesos terminated the task. I don't think changing the executor ID fixes this problem, and I don't think it's necessary. Can you try setting a longer timeout via the slave flags and try again?
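For reference, the relevant setting on the Mesos side is the slave's `--executor_registration_timeout` flag; a hedged example of raising it follows (the master address and the 5-minute value are illustrative, not taken from this thread):

```shell
# Raise the executor registration timeout on a Mesos slave (default is 1 minute).
# Master address and timeout value are illustrative placeholders.
mesos-slave --master=zk://zk1:2181/mesos \
            --executor_registration_timeout=5mins
```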
[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-70378423 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25705/ Test PASSed.
[GitHub] spark pull request: [SPARK-5285][SQL] Removed GroupExpression in c...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4075#issuecomment-70379577 I think there could still be a unit test to make sure that things in GroupExpressions get optimized.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize Scala/Java test...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-70381864 [Test build #25706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25706/consoleFull) for PR 3564 at commit [`f697a55`](https://github.com/apache/spark/commit/f697a5523dd96629e2502ba61c76f9e4717b858e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3974][MLlib] Distributed Block Matrix A...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3200#discussion_r23128478

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.distributed
+
+import breeze.linalg.{DenseMatrix => BDM}
+
+import org.apache.spark._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.rdd.RDDFunctions._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * A grid partitioner, which stores every block in a separate partition.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of the matrix.
+ * @param numColBlocks Number of blocks that form the columns of the matrix.
+ * @param rowPerBlock Number of rows that make up each block.
+ * @param colPerBlock Number of columns that make up each block.
+ */
+private[mllib] class GridPartitioner(
+    val numRowBlocks: Int,
+    val numColBlocks: Int,
+    val rowPerBlock: Int,
+    val colPerBlock: Int,
+    override val numPartitions: Int) extends Partitioner {
+
+  /**
+   * Returns the index of the partition the SubMatrix belongs to.
+   *
+   * @param key The key for the SubMatrix. Can be its position in the grid (its column major index)
+   *            or a tuple of three integers that are the final row index after the multiplication,
+   *            the index of the block to multiply with, and the final column index after the
+   *            multiplication.
+   * @return The index of the partition, which the SubMatrix belongs to.
+   */
+  override def getPartition(key: Any): Int = {
+    key match {
+      case ind: (Int, Int) =>
+        Utils.nonNegativeMod(ind._1 + ind._2 * numRowBlocks, numPartitions)
+      case indices: (Int, Int, Int) =>
+        Utils.nonNegativeMod(indices._1 + indices._3 * numRowBlocks, numPartitions)
+      case _ =>
+        throw new IllegalArgumentException("Unrecognized key")
+    }
+  }
+
+  /** Checks whether the partitioners have the same characteristics */
+  override def equals(obj: Any): Boolean = {
+    obj match {
+      case r: GridPartitioner =>
+        (this.numPartitions == r.numPartitions) && (this.rowPerBlock == r.rowPerBlock) &&
+          (this.colPerBlock == r.colPerBlock)
+      case _ =>
+        false
+    }
+  }
+}
+
+/**
+ * Represents a distributed matrix in blocks of local matrices.
+ *
+ * @param numRowBlocks Number of blocks that form the rows of this matrix
+ * @param numColBlocks Number of blocks that form the columns of this matrix
+ * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+ */
+class BlockMatrix(
+    val numRowBlocks: Int,
+    val numColBlocks: Int,
+    val rdd: RDD[((Int, Int), Matrix)]) extends DistributedMatrix with Logging {
+
+  type SubMatrix = ((Int, Int), Matrix) // ((blockRowIndex, blockColIndex), matrix)
+
+  /**
+   * Alternate constructor for BlockMatrix without the input of a partitioner. Will use a Grid
+   * Partitioner by default.
+   *
+   * @param numRowBlocks Number of blocks that form the rows of this matrix
+   * @param numColBlocks Number of blocks that form the columns of this matrix
+   * @param rdd The RDD of SubMatrices (local matrices) that form this matrix
+   * @param rowPerBlock Number of rows that make up each block.
+   * @param colPerBlock Number of columns that make up each block.
+   */
+  def this(
+      numRowBlocks: Int,
+      numColBlocks: Int,
+      rdd: RDD[((Int, Int), Matrix)],
+      rowPerBlock: Int,
+      colPerBlock: Int) = {
+    this(numRowBlocks, numColBlocks, rdd)
+    val part = new GridPartitioner(numRowBlocks, numColBlocks,
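The partition lookup in `getPartition` above reduces to a column-major block index wrapped into the partition count. A standalone sketch of that arithmetic (a local `nonNegativeMod` stands in for Spark's `Utils.nonNegativeMod`):

```scala
// Standalone sketch of the grid-partitioner index math -- not the MLlib class itself.
object GridIndex {
  // Equivalent of Spark's Utils.nonNegativeMod: a modulo that never goes negative.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    raw + (if (raw < 0) mod else 0)
  }

  // Column-major block index (row + col * numRowBlocks), wrapped into the partitions.
  def partition(blockRow: Int, blockCol: Int,
                numRowBlocks: Int, numPartitions: Int): Int =
    nonNegativeMod(blockRow + blockCol * numRowBlocks, numPartitions)
}
```

With this layout, blocks in the same column land in consecutive partition indices, so a grid of blocks spreads evenly when `numPartitions` equals the block count.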
[GitHub] spark pull request: SPARK-5019 - GaussianMixtureModel exposes inst...
GitHub user tgaloppo opened a pull request: https://github.com/apache/spark/pull/4088

SPARK-5019 - GaussianMixtureModel exposes instances of MultivariateGauss...

This PR modifies GaussianMixtureModel to expose instances of MultivariateGaussian rather than separate mean and covariance arrays.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgaloppo/spark spark-5019

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4088.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #4088

commit 091e8da4913eacf28530ab7fb2bd6c39ab2cef4b
Author: Travis Galoppo tjg2...@columbia.edu
Date: 2015-01-16T16:06:57Z

SPARK-5019 - GaussianMixtureModel exposes instances of MultivariateGaussian rather than mean/covariance matrices
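In API terms, the change bundles each component's mean and covariance into one object instead of keeping parallel arrays. A hypothetical sketch of that shape (the names follow the PR description, but the types here are simplified stand-ins, not MLlib's actual `Vector`/`Matrix` signatures):

```scala
// Hypothetical sketch of the API shape the PR describes -- simplified types,
// not the real MLlib signatures.
case class MultivariateGaussian(mu: Vector[Double], sigma: Vector[Vector[Double]])

class GaussianMixtureModel(
    val weights: Array[Double],                 // mixing weight per component
    val gaussians: Array[MultivariateGaussian]) // mean + covariance bundled per component
```

Callers then read `model.gaussians(k).mu` instead of indexing into separate mean and covariance arrays.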
[GitHub] spark pull request: SPARK-5019 - GaussianMixtureModel exposes inst...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4088#issuecomment-70386105 [Test build #25707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25707/consoleFull) for PR 4088 at commit [`091e8da`](https://github.com/apache/spark/commit/091e8da4913eacf28530ab7fb2bd6c39ab2cef4b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5285][SQL] Removed GroupExpression in c...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/4075#issuecomment-70386657 @marmbrus after my investigation, I think it is a very rare case that we will optimize GroupExpressions. Should our optimization cover SQL such as ```SELECT a, b, count(*) FROM T1 GROUP BY a, b, 1+1 GROUPING SETS (1+1, a, (a, b), b, ())```? If not, maybe we do not need to optimize it, and then the change in this PR is safe.
[GitHub] spark pull request: [MLlib] [SPARK-5301] Missing conversions and o...
GitHub user rezazadeh opened a pull request: https://github.com/apache/spark/pull/4089

[MLlib] [SPARK-5301] Missing conversions and operations on IndexedRowMatrix and CoordinateMatrix

* Transpose is missing from CoordinateMatrix (this is cheap to compute, so it should be there)
* IndexedRowMatrix should be convertible to CoordinateMatrix (conversion added)

Tests for both added.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rezazadeh/spark matutils

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4089.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #4089

commit a7ae0488f49117501506f88b10d8dc606d2207c6
Author: Reza Zadeh r...@databricks.com
Date: 2015-01-17T22:06:50Z

Missing linear algebra utilities

commit cb10ae5a36be7d942e74005ed22610287e3059eb
Author: Reza Zadeh r...@databricks.com
Date: 2015-01-17T22:11:27Z

remove unnecessary import
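Transposing a coordinate-format matrix is cheap because it only swaps the row and column index of every entry. A plain-Scala sketch over `(row, col, value)` triples (the real `CoordinateMatrix` operates on an RDD of `MatrixEntry`, not a local collection):

```scala
// Coordinate-format transpose: swap (row, col) on every entry.
// A local Seq stands in for the RDD used by the real CoordinateMatrix.
def transpose(entries: Seq[(Long, Long, Double)]): Seq[(Long, Long, Double)] =
  entries.map { case (i, j, v) => (j, i, v) }
```

No data moves and no arithmetic is done, which is why the PR argues the operation should simply be available.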
[GitHub] spark pull request: [MLlib] [SPARK-5301] Missing conversions and o...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4089#issuecomment-70387085 [Test build #25708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25708/consoleFull) for PR 4089 at commit [`cb10ae5`](https://github.com/apache/spark/commit/cb10ae5a36be7d942e74005ed22610287e3059eb). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize Scala/Java test...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-70382458 [Test build #25706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25706/consoleFull) for PR 3564 at commit [`f697a55`](https://github.com/apache/spark/commit/f697a5523dd96629e2502ba61c76f9e4717b858e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5198][Mesos] Change executorId more uni...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3994#issuecomment-70388163 @jongyoul the goal of fine-grained mode is to run many Spark tasks in the same executor, which is why we're giving them all the same executor ID. Mesos supports this in its concept of executors, and it has the benefit that Mesos can account for the CPUs used by each task separately and give those CPUs to other frameworks when Spark is not active. In contrast, coarse-grained mode reserves the CPUs on the machine for the whole lifetime of the executor.
[GitHub] spark pull request: SPARK-5019 - GaussianMixtureModel exposes inst...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4088#issuecomment-70388528 [Test build #25707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25707/consoleFull) for PR 4088 at commit [`091e8da`](https://github.com/apache/spark/commit/091e8da4913eacf28530ab7fb2bd6c39ab2cef4b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-5019 - GaussianMixtureModel exposes inst...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4088#issuecomment-70388533 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25707/ Test PASSed.
[GitHub] spark pull request: [SPARK-4937][SQL] Comment for the newly optimi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4086
[GitHub] spark pull request: [SPARK-4937][SQL] Comment for the newly optimi...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4086#issuecomment-70389973 Merging in master. I will submit a PR to update the description to make it more clear.
[GitHub] spark pull request: [SPARK-3880] HBase as data source to SparkSQL
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4084#issuecomment-70390837 Hi - thanks for working on this... it looks interesting. I'd like to close this issue (i.e. the PR) and discuss more on the JIRA/dev list rather than having a big pull request like this. For very large features this is the way we do it. If you look at our wiki, it says: "If you are proposing a larger change, attach a design document to your JIRA first (example) and email the dev mailing list to discuss it." https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/4090#discussion_r23129926

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -302,89 +302,100 @@ object OptimizeIn extends Rule[LogicalPlan] {
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsUp {
-      case and @ And(left, right) =>
-        (left, right) match {
-          case (Literal(true, BooleanType), r) => r
-          case (l, Literal(true, BooleanType)) => l
-          case (Literal(false, BooleanType), _) => Literal(false)
-          case (_, Literal(false, BooleanType)) => Literal(false)
-          // a && a => a
-          case (l, r) if l fastEquals r => l
-          case (_, _) =>
-            /* Do optimize for predicates using formula (a || b) && (a || c) => a || (b && c)
-             * 1. Split left and right to get the disjunctive predicates,
-             *    i.e. lhsSet = (a, b), rhsSet = (a, c)
-             * 2. Find the common predicate between lhsSet and rhsSet, i.e. common = (a)
-             * 3. Remove the common predicate from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
-             * 4. Apply the formula, get the optimized predicate: common || (ldiff && rdiff)
-             */
-            val lhsSet = splitDisjunctivePredicates(left).toSet
-            val rhsSet = splitDisjunctivePredicates(right).toSet
-            val common = lhsSet.intersect(rhsSet)
-            val ldiff = lhsSet.diff(common)
-            val rdiff = rhsSet.diff(common)
-            if (ldiff.size == 0 || rdiff.size == 0) {
-              // a && (a || b) => a
-              common.reduce(Or)
-            } else {
-              // (a || b || c || ...) && (a || b || d || ...) && (a || b || e || ...) ... =>
-              // (a || b) || ((c || ...) && (f || ...) && (e || ...) ...)
-              (ldiff.reduceOption(Or) ++ rdiff.reduceOption(Or))
-                .reduceOption(And)
-                .map(_ :: common.toList)
-                .getOrElse(common.toList)
-                .reduce(Or)
-            }
-        }
-
-      case or @ Or(left, right) =>
-        (left, right) match {
-          case (Literal(true, BooleanType), _) => Literal(true)
-          case (_, Literal(true, BooleanType)) => Literal(true)
-          case (Literal(false, BooleanType), r) => r
-          case (l, Literal(false, BooleanType)) => l
-          // a || a => a
-          case (l, r) if l fastEquals r => l
-          case (_, _) =>
-            /* Do optimize for predicates using formula (a && b) || (a && c) => a && (b || c)
-             * 1. Split left and right to get the conjunctive predicates,
-             *    i.e. lhsSet = (a, b), rhsSet = (a, c)
-             * 2. Find the common predicate between lhsSet and rhsSet, i.e. common = (a)
-             * 3. Remove the common predicate from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
-             * 4. Apply the formula, get the optimized predicate: common && (ldiff || rdiff)
-             */
-            val lhsSet = splitConjunctivePredicates(left).toSet
-            val rhsSet = splitConjunctivePredicates(right).toSet
-            val common = lhsSet.intersect(rhsSet)
-            val ldiff = lhsSet.diff(common)
-            val rdiff = rhsSet.diff(common)
-            if (ldiff.size == 0 || rdiff.size == 0) {
-              // a || (b && a) => a
-              common.reduce(And)
-            } else {
-              // (a && b && c && ...) || (a && b && d && ...) || (a && b && e && ...) ... =>
-              // a && b && ((c && ...) || (d && ...) || (e && ...) || ...)
-              (ldiff.reduceOption(And) ++ rdiff.reduceOption(And))
-                .reduceOption(Or)
-                .map(_ :: common.toList)
-                .getOrElse(common.toList)
-                .reduce(And)
-            }
-        }
-
-      case not @ Not(exp) =>
-        exp match {
-          case Literal(true, BooleanType) => Literal(false)
-          case Literal(false, BooleanType) => Literal(true)
-          case GreaterThan(l, r) => LessThanOrEqual(l, r)
-          case GreaterThanOrEqual(l, r) => LessThan(l, r)
-          case LessThan(l, r) => GreaterThanOrEqual(l, r)
-          case LessThanOrEqual(l, r) => GreaterThan(l, r)
-          case Not(e) => e
-          case _ => not
-        }
-
       // Turn "if (true) a else b" into "a", and "if (false) a else b" into "b".
+      case and @ And(left, right) => (left, right) match {
+        // true && r => r
+        case (Literal(true, BooleanType), r) => r
+
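The factoring step described in the comments above can be shown on plain term sets: `(a || b) && (a || c)` splits into a common part and two residues, which recombine as `a || (b && c)`. A standalone sketch of just that intersect/diff step (string atoms stand in for Catalyst expression trees):

```scala
// Sketch of the common-term factoring used by BooleanSimplification,
// over string atoms instead of Catalyst expressions.
// Returns (common, leftResidue, rightResidue) for two disjunction term-sets.
def factorAnd(lhsSet: Set[String],
              rhsSet: Set[String]): (Set[String], Set[String], Set[String]) = {
  val common = lhsSet.intersect(rhsSet) // terms shared by both sides
  (common, lhsSet.diff(common), rhsSet.diff(common))
}
```

If either residue is empty, the whole conjunction collapses to the common part (the `a && (a || b) => a` case in the diff); otherwise the residues are AND-ed and OR-ed back onto the common terms.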
[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4074#discussion_r23129922

--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
@@ -436,6 +436,12 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
   def first(): T = rdd.first()

   /**
+   * @return true if and only if the RDD contains no elements at all. Note that an RDD
+   *         may be empty even when it has at least 1 partition.
+   */
+  def isEmpty(): Boolean = rdd.isEmpty()
--- End diff --

Okay sounds good @srowen want to just add an exclusion then?
[GitHub] spark pull request: [SPARK-4877] Allow user first classes to exten...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3725#issuecomment-70392673 @holdenk @pwendell Can one of you review this, sign off, and commit? I don't really have enough expertise here.
[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-70394597 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25711/consoleFull) for PR 2634 at commit [`35da8e9`](https://github.com/apache/spark/commit/35da8e9e188e66946d5799d061ecc3ca150f). * This patch **fails** unit tests. * This patch **does not** merge cleanly!
[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-70394598 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25711/ Test FAILed.
[GitHub] spark pull request: [SQL][Minor] Refactors deeply nested FP style ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4091#issuecomment-70394771 [Test build #25712 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25712/consoleFull) for PR 4091 at commit [`e833ca4`](https://github.com/apache/spark/commit/e833ca4b7a108c053870ba03a013656556fd3d58). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SQL][Minor] Refactors deeply nested FP style ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4091#issuecomment-70394772 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25712/ Test PASSed.
[GitHub] spark pull request: [SPARK-5289]: Backport publishing of repl, yar...
Github user pwendell closed the pull request at: https://github.com/apache/spark/pull/4079
[GitHub] spark pull request: [SPARK-5289]: Backport publishing of repl, yar...
GitHub user pwendell reopened a pull request: https://github.com/apache/spark/pull/4079 [SPARK-5289]: Backport publishing of repl, yarn into branch-1.2. This change was done in SPARK-4048 as part of a larger refactoring, but we need to backport this publishing of yarn and repl into Spark 1.2, so that we can cut a 1.2.1 release with these artifacts. You can merge this pull request into a Git repository by running: $ git pull https://github.com/pwendell/spark skip-deps Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4079.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4079 commit 807b833680d433ada6f9fd0e262197ffa8de5f89 Author: Patrick Wendell patr...@databricks.com Date: 2015-01-16T22:31:56Z [SPARK-5289]: Backport publishing of repl, yarn into branch-1.2. This change was done in SPARK-4048 as part of a larger refactoring, but we need to backport this publishing of yarn and repl into Spark 1.2, so that we can cut a 1.2.1 release with these artifacts.
[GitHub] spark pull request: [SPARK-5289]: Backport publishing of repl, yar...
Github user pwendell closed the pull request at: https://github.com/apache/spark/pull/4079
[GitHub] spark pull request: [HOTFIX]: Minor clean up regarding skipped art...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4080
[GitHub] spark pull request: [SPARK-5249] Added type specific set functions...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4042#issuecomment-70390499 @JoshRosen or @srowen - what are your feelings on it?
[GitHub] spark pull request: [SPARK-5249] Added type specific set functions...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4042#issuecomment-70390485 I'd actually prefer not to have this in Spark. It's not really clear what we will do with an `Any`, and the user can really easily just call `toString` explicitly. I also looked at two other similar constructs in Java (the Java Properties class and Hadoop's Configuration class) and neither of them offers this type of interface. There are multiple language APIs that have this `setConf`, and they all require string keys and values; it's just a bit inconsistent to do this kind of implicit conversion.
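The API shape argued for above — string-only configuration keys and values, with the caller converting explicitly — can be sketched as follows (a hypothetical illustration in Python, not Spark's `SparkConf`; the class and key names are made up):

```python
class Conf:
    """String-only configuration map: no implicit Any -> toString conversion."""

    def __init__(self):
        self._settings = {}

    def set(self, key, value):
        # Reject non-string keys/values instead of silently stringifying them.
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("conf keys and values must be strings")
        self._settings[key] = value
        return self  # allow chained calls

    def get(self, key):
        return self._settings[key]

# The caller converts explicitly, as the comment above suggests.
conf = Conf().set("spark.executor.cores", str(4))
```

The explicit `str(4)` keeps the conversion visible at the call site, which is the consistency argument made in the comment.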
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/4090#discussion_r23129720

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -302,89 +302,100 @@ object OptimizeIn extends Rule[LogicalPlan] {
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsUp {
-      case and @ And(left, right) =>
-        (left, right) match {
-          case (Literal(true, BooleanType), r) => r
-          case (l, Literal(true, BooleanType)) => l
-          case (Literal(false, BooleanType), _) => Literal(false)
-          case (_, Literal(false, BooleanType)) => Literal(false)
-          // a && a => a
-          case (l, r) if l fastEquals r => l
-          case (_, _) =>
-            /* Do optimize for predicates using formula (a || b) && (a || c) => a || (b && c)
-             * 1. Split left and right to get the disjunctive predicates,
-             *    i.e. lhsSet = (a, b), rhsSet = (a, c)
-             * 2. Find the common predict between lhsSet and rhsSet, i.e. common = (a)
-             * 3. Remove common predict from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
-             * 4. Apply the formula, get the optimized predict: common || (ldiff && rdiff)
-             */
-            val lhsSet = splitDisjunctivePredicates(left).toSet
-            val rhsSet = splitDisjunctivePredicates(right).toSet
-            val common = lhsSet.intersect(rhsSet)
-            val ldiff = lhsSet.diff(common)
-            val rdiff = rhsSet.diff(common)
-            if (ldiff.size == 0 || rdiff.size == 0) {
-              // a && (a || b) => a
-              common.reduce(Or)
-            } else {
-              // (a || b || c || ...) && (a || b || d || ...) && (a || b || e || ...) ... =>
-              // (a || b) || ((c || ...) && (f || ...) && (e || ...) ...)
-              (ldiff.reduceOption(Or) ++ rdiff.reduceOption(Or))
-                .reduceOption(And)
-                .map(_ :: common.toList)
-                .getOrElse(common.toList)
-                .reduce(Or)
-            }
-        }
-
-      case or @ Or(left, right) =>
-        (left, right) match {
-          case (Literal(true, BooleanType), _) => Literal(true)
-          case (_, Literal(true, BooleanType)) => Literal(true)
-          case (Literal(false, BooleanType), r) => r
-          case (l, Literal(false, BooleanType)) => l
-          // a || a => a
-          case (l, r) if l fastEquals r => l
-          case (_, _) =>
-            /* Do optimize for predicates using formula (a && b) || (a && c) => a && (b || c)
-             * 1. Split left and right to get the conjunctive predicates,
-             *    i.e. lhsSet = (a, b), rhsSet = (a, c)
-             * 2. Find the common predict between lhsSet and rhsSet, i.e. common = (a)
-             * 3. Remove common predict from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
-             * 4. Apply the formula, get the optimized predict: common && (ldiff || rdiff)
-             */
-            val lhsSet = splitConjunctivePredicates(left).toSet
-            val rhsSet = splitConjunctivePredicates(right).toSet
-            val common = lhsSet.intersect(rhsSet)
-            val ldiff = lhsSet.diff(common)
-            val rdiff = rhsSet.diff(common)
-            if (ldiff.size == 0 || rdiff.size == 0) {
-              // a || (b && a) => a
-              common.reduce(And)
-            } else {
-              // (a && b && c && ...) || (a && b && d && ...) || (a && b && e && ...) ... =>
-              // a && b && ((c && ...) || (d && ...) || (e && ...) || ...)
-              (ldiff.reduceOption(And) ++ rdiff.reduceOption(And))
-                .reduceOption(Or)
-                .map(_ :: common.toList)
-                .getOrElse(common.toList)
-                .reduce(And)
-            }
-        }
-
-      case not @ Not(exp) =>
-        exp match {
-          case Literal(true, BooleanType) => Literal(false)
-          case Literal(false, BooleanType) => Literal(true)
-          case GreaterThan(l, r) => LessThanOrEqual(l, r)
-          case GreaterThanOrEqual(l, r) => LessThan(l, r)
-          case LessThan(l, r) => GreaterThanOrEqual(l, r)
-          case LessThanOrEqual(l, r) => GreaterThan(l, r)
-          case Not(e) => e
-          case _ => not
-        }
-
-      // Turn if (true) a else b into a, and if (false) a else b into b.
+      case and @ And(left, right) => (left, right) match {
+        // true && r => r
+        case (Literal(true, BooleanType), r) => r
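The common-predicate factoring quoted in the diff above — e.g. (a and b) or (a and c) => a and (b or c) — can be sketched standalone. This is a hypothetical simplified model in Python, not Catalyst's Scala: expressions are predicate names (strings) or nested tuples ("and", l, r) / ("or", l, r).

```python
def split_conjuncts(expr):
    """Flatten a tree of 'and' nodes into its set of conjuncts (step 1)."""
    if isinstance(expr, tuple) and expr[0] == "and":
        return split_conjuncts(expr[1]) | split_conjuncts(expr[2])
    return {expr}

def fold(op, preds):
    """Fold a collection of predicates back into a left-nested op tree."""
    preds = list(preds)
    out = preds[0]
    for p in preds[1:]:
        out = (op, out, p)
    return out

def simplify_or(left, right):
    """Rewrite `left or right` by factoring out shared conjuncts (steps 2-4)."""
    lhs, rhs = split_conjuncts(left), split_conjuncts(right)
    common = lhs & rhs
    ldiff, rdiff = lhs - common, rhs - common
    if not ldiff or not rdiff:
        return fold("and", common)  # a or (a and b) => a
    # common and (ldiff or rdiff)
    rest = ("or", fold("and", ldiff), fold("and", rdiff))
    return fold("and", list(common) + [rest])

def evaluate(expr, env):
    """Evaluate an expression under a truth assignment, to check equivalence."""
    if isinstance(expr, tuple):
        op, l, r = expr
        lv, rv = evaluate(l, env), evaluate(r, env)
        return lv and rv if op == "and" else lv or rv
    return env[expr]
```

Checking the rewrite against all truth assignments confirms it preserves semantics, which is the invariant the Catalyst rule must maintain.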
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/4090#discussion_r23129994

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -302,89 +302,100 @@ object OptimizeIn extends Rule[LogicalPlan] {
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsUp {
-      case and @ And(left, right) =>
-        (left, right) match {
-          case (Literal(true, BooleanType), r) => r
-          case (l, Literal(true, BooleanType)) => l
-          case (Literal(false, BooleanType), _) => Literal(false)
-          case (_, Literal(false, BooleanType)) => Literal(false)
-          // a && a => a
-          case (l, r) if l fastEquals r => l
-          case (_, _) =>
-            /* Do optimize for predicates using formula (a || b) && (a || c) => a || (b && c)
-             * 1. Split left and right to get the disjunctive predicates,
-             *    i.e. lhsSet = (a, b), rhsSet = (a, c)
-             * 2. Find the common predict between lhsSet and rhsSet, i.e. common = (a)
-             * 3. Remove common predict from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
-             * 4. Apply the formula, get the optimized predict: common || (ldiff && rdiff)
-             */
-            val lhsSet = splitDisjunctivePredicates(left).toSet
-            val rhsSet = splitDisjunctivePredicates(right).toSet
-            val common = lhsSet.intersect(rhsSet)
-            val ldiff = lhsSet.diff(common)
-            val rdiff = rhsSet.diff(common)
-            if (ldiff.size == 0 || rdiff.size == 0) {
-              // a && (a || b) => a
-              common.reduce(Or)
-            } else {
-              // (a || b || c || ...) && (a || b || d || ...) && (a || b || e || ...) ... =>
-              // (a || b) || ((c || ...) && (f || ...) && (e || ...) ...)
-              (ldiff.reduceOption(Or) ++ rdiff.reduceOption(Or))
-                .reduceOption(And)
-                .map(_ :: common.toList)
-                .getOrElse(common.toList)
-                .reduce(Or)
-            }
-        }
-
-      case or @ Or(left, right) =>
-        (left, right) match {
-          case (Literal(true, BooleanType), _) => Literal(true)
-          case (_, Literal(true, BooleanType)) => Literal(true)
-          case (Literal(false, BooleanType), r) => r
-          case (l, Literal(false, BooleanType)) => l
-          // a || a => a
-          case (l, r) if l fastEquals r => l
-          case (_, _) =>
-            /* Do optimize for predicates using formula (a && b) || (a && c) => a && (b || c)
-             * 1. Split left and right to get the conjunctive predicates,
-             *    i.e. lhsSet = (a, b), rhsSet = (a, c)
-             * 2. Find the common predict between lhsSet and rhsSet, i.e. common = (a)
-             * 3. Remove common predict from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
-             * 4. Apply the formula, get the optimized predict: common && (ldiff || rdiff)
-             */
-            val lhsSet = splitConjunctivePredicates(left).toSet
-            val rhsSet = splitConjunctivePredicates(right).toSet
-            val common = lhsSet.intersect(rhsSet)
-            val ldiff = lhsSet.diff(common)
-            val rdiff = rhsSet.diff(common)
-            if (ldiff.size == 0 || rdiff.size == 0) {
-              // a || (b && a) => a
-              common.reduce(And)
-            } else {
-              // (a && b && c && ...) || (a && b && d && ...) || (a && b && e && ...) ... =>
-              // a && b && ((c && ...) || (d && ...) || (e && ...) || ...)
-              (ldiff.reduceOption(And) ++ rdiff.reduceOption(And))
-                .reduceOption(Or)
-                .map(_ :: common.toList)
-                .getOrElse(common.toList)
-                .reduce(And)
-            }
-        }
-
-      case not @ Not(exp) =>
-        exp match {
-          case Literal(true, BooleanType) => Literal(false)
-          case Literal(false, BooleanType) => Literal(true)
-          case GreaterThan(l, r) => LessThanOrEqual(l, r)
-          case GreaterThanOrEqual(l, r) => LessThan(l, r)
-          case LessThan(l, r) => GreaterThanOrEqual(l, r)
-          case LessThanOrEqual(l, r) => GreaterThan(l, r)
-          case Not(e) => e
-          case _ => not
-        }
-
-      // Turn if (true) a else b into a, and if (false) a else b into b.
+      case and @ And(left, right) => (left, right) match {
+        // true && r => r
+        case (Literal(true, BooleanType), r) => r
[GitHub] spark pull request: [SPARK-5279][SQL] Use java.math.BigDecimal as ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4092#issuecomment-70394229 [Test build #25713 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25713/consoleFull) for PR 4092 at commit [`10cb496`](https://github.com/apache/spark/commit/10cb496ad55417c8db2b7a6058cae623353f83ca). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4020#issuecomment-70395176 LGTM - @sryza and @ksakellis look okay to you?
[GitHub] spark pull request: Added SparkGCE Script for Version 0.9.1
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/681#issuecomment-70395313 This is being maintained in its own package now, so let's close this issue.
[GitHub] spark pull request: [SPARK-5285][SQL] Removed GroupExpression in c...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4075#issuecomment-70390298 I don't really see the benefit of removing it. Transform should be able to walk all expressions, even if there are no optimizations that apply today.
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/4090 [SQL][Minor] Added comments and examples to explain BooleanSimplification You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark booleanSimplification Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4090.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4090 commit 68c89866962b836f479a3fc41fbd503f8bc7ff47 Author: Reynold Xin r...@databricks.com Date: 2015-01-18T00:10:20Z [SQL][Minor] Added comments and examples to explain BooleanSimplification.
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4090#issuecomment-70390579 [Test build #25709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25709/consoleFull) for PR 4090 at commit [`68c8986`](https://github.com/apache/spark/commit/68c89866962b836f479a3fc41fbd503f8bc7ff47). * This patch merges cleanly.
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4090#issuecomment-70390536 cc @chenglian @scwf
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/4090#issuecomment-70390994 LGTM
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-70393196 [Test build #25710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25710/consoleFull) for PR 3519 at commit [`ce0e30c`](https://github.com/apache/spark/commit/ce0e30c50d7b55c1aa598a0d1b49e2e9beff94a9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class IsotonicRegressionModel (`
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-70393197 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25710/ Test PASSed.
[GitHub] spark pull request: [SPARK-5279][SQL] Use java.math.BigDecimal as ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4092#issuecomment-70394459 [Test build #25713 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25713/consoleFull) for PR 4092 at commit [`10cb496`](https://github.com/apache/spark/commit/10cb496ad55417c8db2b7a6058cae623353f83ca). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5279][SQL] Use java.math.BigDecimal as ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4092#issuecomment-70394460 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25713/ Test FAILed.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70399407 BTW - my apologies for marking this as a starter task, it turned out to be more complicated. We can credit you for having worked on the feature as well.
[GitHub] spark pull request: [SPARK-5285][SQL] Removed GroupExpression in c...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/4075#issuecomment-70390780 Yes, I agree. Then should I add the unit test I gave above?
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4090
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70394250 Hey @ilganeli - I took a slightly deeper look this time. I still don't totally follow how this all hooks together, but I wonder if it's possible to write a single utility function that is much simpler. It would just do the following:

```
/**
 * Given an object reference, recursively traverses all fields of the reference,
 * fields of objects within those fields, and so on. If any of those references
 * are neither Serializable nor Externalizable, prints the path from the root object
 * to the reference.
 */
def findNonSerializableReferences(root: AnyRef): String = {
}
```

And it would do something like:
1. Start with the root reference.
2. Traverse the graph of all referred-to objects, maintaining path information. Path information means both the sequence of parent pointers and the field name.
3. Check whether Serializable.class.isAssignableFrom(c) or Externalizable.class.isAssignableFrom(c) for any object encountered, where c is the class of the object.
4. When the first object that isn't serializable is encountered, print the path to that object.

This wouldn't work for custom serializers; it would only work for the Java serializer. However, that's all we support for closures anyway.
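The traversal sketched in the four steps above can be modeled compactly. The following is a hypothetical Python analog (not the Scala utility being proposed): it walks an object graph depth-first, tracks the attribute path, and uses "can `pickle` serialize it?" as a stand-in for the `Serializable`/`Externalizable` check.

```python
import pickle

def find_non_serializable(root, path="root", seen=None):
    """Return the attribute path to the first unserializable reference, or None."""
    seen = set() if seen is None else seen
    if id(root) in seen:          # avoid cycles in the object graph
        return None
    seen.add(id(root))
    try:
        pickle.dumps(root)
        return None               # the whole subtree serializes fine
    except Exception:
        pass
    # Something here fails: recurse into fields to localize the culprit,
    # carrying path information (parent chain plus field name).
    children = vars(root).items() if hasattr(root, "__dict__") else []
    for name, child in children:
        found = find_non_serializable(child, f"{path}.{name}", seen)
        if found:
            return found
    return path  # no guilty child found: this object itself is the problem
```

As in the proposal, this localizes the first failing reference rather than just reporting that serialization failed at the root.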
[GitHub] spark pull request: [SPARK-5208][DOC] Add more documentation to Ne...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4012#issuecomment-70395228 @sarutak when we added the netty shuffle we actually decided not to expose these in order to keep the overall # of configurations manageable. We couldn't think of a user scenario where these would make a large difference (correct me if that is wrong @aarondav). Did you have a specific use case in mind, or was this mostly for completeness reasons?
[GitHub] spark pull request: SPARK-5217 Spark UI should report pending stag...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4043#issuecomment-70395195 @ScrapCodes mind bringing up to date? The current form LGTM
[GitHub] spark pull request: SPARK-2630 Input data size of CoalescedRDD cou...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2310
[GitHub] spark pull request: [SPARK-3880] HBase as data source to SparkSQL
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4084
[GitHub] spark pull request: Added --package argument to make-distributio...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3682
[GitHub] spark pull request: Merge pull request #1 from apache/master
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4035
[GitHub] spark pull request: Added SparkGCE Script for Version 0.9.1
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/681
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70398840 [Test build #25714 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25714/consoleFull) for PR 3997 at commit [`93f0d46`](https://github.com/apache/spark/commit/93f0d461487f9582a6bc2a34f09179dbe8672d3d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5307] SerializationDebugger to help deb...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4093#issuecomment-70399174 [Test build #25716 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25716/consoleFull) for PR 4093 at commit [`bde6512`](https://github.com/apache/spark/commit/bde6512a55765a48ca74f321068f9ab91516edae). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5307] SerializationDebugger to help deb...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/4093#discussion_r23131447 --- Diff: core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.serializer
+
+import java.io._
+import java.lang.reflect.Field
+import java.security.AccessController
+
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+
+private[serializer]
+object SerializationDebugger {
+
+  /**
+   * Write an object to the [[ObjectOutputStream]]. If a NotSerializableException is encountered,
+   * use our debug stream to capture the serialization stack leading to the problematic object.
+   */
+  def writeObject(out: ObjectOutputStream, obj: Any): Unit = {
+    try {
+      out.writeObject(obj)
+    } catch {
+      case e: NotSerializableException =>
+        if (enableDebugging) throw improveException(obj, e) else throw e
+    }
+  }
+
+  /**
+   * Improve the given NotSerializableException with the serialization stack leading from the given
+   * object to the problematic object.
+   */
+  private def improveException(obj: Any, e: NotSerializableException): NotSerializableException = {
+    if (depthField != null) {
+      val out = new DebugStream(new ByteArrayOutputStream)
+      try {
+        out.writeObject(obj)
+        e
+      } catch {
+        case nse: NotSerializableException =>
+          new NotSerializableException(
+            nse.getMessage + "\n" +
+            s"\tSerialization stack (${out.stack.size}):\n" +
+            out.stack.map(o => s"\t- $o (class ${o.getClass.getName})").mkString("\n") + "\n" +
+            "\tRun the JVM with sun.io.serialization.extendedDebugInfo for more information.")
--- End diff -- It is actually -Dsun.io.serialization.extendedDebugInfo=true. Kinda long ...
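For context on what the patch improves: stock Java serialization reports only the class of the offending object, with no path from the root. A minimal Java demo of that baseline behavior (hypothetical class names, unrelated to the Spark code under review):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class NotSerializableDemo {

    // A serializable wrapper holding a non-serializable member.
    static class Outer implements Serializable {
        Thread worker = new Thread();  // java.lang.Thread is not serializable
    }

    // Attempts Java serialization; returns "ok" or the exception's message.
    static String tryWrite(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return "ok";
        } catch (NotSerializableException e) {
            // The message is just the offending class name, e.g. "java.lang.Thread";
            // nothing says it was reached via Outer.worker.
            return e.getMessage();
        } catch (IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryWrite(new Outer()));  // prints java.lang.Thread
    }
}
```

The DebugStream in the diff fills exactly this gap by recording the serialization stack down to the problematic object.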
[GitHub] spark pull request: [SPARK-5289]: Backport publishing of repl, yar...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4079#issuecomment-70390243 @vanzin I tried to cover it in #4080 - but basically the changes you made were ones others in the community were already requesting anyway (asking us to publish these to Maven, which we did prior to 1.2).
[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4074#discussion_r23129615 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
@@ -436,6 +436,12 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
   def first(): T = rdd.first()

   /**
+   * @return true if and only if the RDD contains no elements at all. Note that an RDD
+   *         may be empty even when it has at least 1 partition.
+   */
+  def isEmpty(): Boolean = rdd.isEmpty()
--- End diff -- So this is actually a legitimate API break _if_ we think users are themselves extending the `JavaRDDLike` trait, because it will add a method to the associated interface. One option is to just do it and ask users not to write code that directly accepts or extends `JavaRDDLike`, and maybe we could document that in the JavaDoc. Another option is just to add this to the concrete implementations in JavaRDD and JavaPairRDD. @JoshRosen, any thoughts one way or the other?
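The concern here is the classic pre-Java-8 interface-evolution problem: adding an abstract method to a published interface breaks every external implementor. A minimal illustration with made-up names (a hypothetical stand-in for JavaRDDLike, not Spark's real types):

```java
public class InterfaceEvolutionDemo {

    // Version 1 of a published mixin interface.
    interface RDDLike<T> {
        T first();
        // A later release adding
        //     boolean isEmpty();
        // here would break every user class that implements RDDLike
        // directly: their code no longer compiles, and previously
        // compiled binaries throw AbstractMethodError when the new
        // method is invoked on them.
    }

    // A user's own implementor, written against version 1.
    static class UserRDD implements RDDLike<Integer> {
        public Integer first() { return 42; }
    }
}
```

Adding the method only to the concrete JavaRDD/JavaPairRDD classes, as the comment suggests, sidesteps this: classes can gain new methods without breaking subclasses or callers.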
[GitHub] spark pull request: [SPARK-3880] HBase as data source to SparkSQL
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4084#issuecomment-70390848 Also one thing that would help is if you could create a standalone project for this on github (see spark-avro).
[GitHub] spark pull request: [SQL][Minor] Added comments and examples to ex...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/4090#discussion_r23129765 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -302,89 +302,100 @@ object OptimizeIn extends Rule[LogicalPlan] {
 object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsUp {
-      case and @ And(left, right) =>
-        (left, right) match {
-          case (Literal(true, BooleanType), r) => r
-          case (l, Literal(true, BooleanType)) => l
-          case (Literal(false, BooleanType), _) => Literal(false)
-          case (_, Literal(false, BooleanType)) => Literal(false)
-          // a && a => a
-          case (l, r) if l fastEquals r => l
-          case (_, _) =>
-            /* Do optimize for predicates using formula (a || b) && (a || c) => a || (b && c)
-             * 1. Split left and right to get the disjunctive predicates,
-             *    i.e. lhsSet = (a, b), rhsSet = (a, c)
-             * 2. Find the common predict between lhsSet and rhsSet, i.e. common = (a)
-             * 3. Remove common predict from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
-             * 4. Apply the formula, get the optimized predict: common || (ldiff && rdiff)
-             */
-            val lhsSet = splitDisjunctivePredicates(left).toSet
-            val rhsSet = splitDisjunctivePredicates(right).toSet
-            val common = lhsSet.intersect(rhsSet)
-            val ldiff = lhsSet.diff(common)
-            val rdiff = rhsSet.diff(common)
-            if (ldiff.size == 0 || rdiff.size == 0) {
-              // a && (a || b) => a
-              common.reduce(Or)
-            } else {
-              // (a || b || c || ...) && (a || b || d || ...) && (a || b || e || ...) ... =>
-              // (a || b) || ((c || ...) && (f || ...) && (e || ...) ...)
-              (ldiff.reduceOption(Or) ++ rdiff.reduceOption(Or))
-                .reduceOption(And)
-                .map(_ :: common.toList)
-                .getOrElse(common.toList)
-                .reduce(Or)
-            }
-        }
-
-      case or @ Or(left, right) =>
-        (left, right) match {
-          case (Literal(true, BooleanType), _) => Literal(true)
-          case (_, Literal(true, BooleanType)) => Literal(true)
-          case (Literal(false, BooleanType), r) => r
-          case (l, Literal(false, BooleanType)) => l
-          // a || a => a
-          case (l, r) if l fastEquals r => l
-          case (_, _) =>
-            /* Do optimize for predicates using formula (a && b) || (a && c) => a && (b || c)
-             * 1. Split left and right to get the conjunctive predicates,
-             *    i.e. lhsSet = (a, b), rhsSet = (a, c)
-             * 2. Find the common predict between lhsSet and rhsSet, i.e. common = (a)
-             * 3. Remove common predict from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
-             * 4. Apply the formula, get the optimized predict: common && (ldiff || rdiff)
-             */
-            val lhsSet = splitConjunctivePredicates(left).toSet
-            val rhsSet = splitConjunctivePredicates(right).toSet
-            val common = lhsSet.intersect(rhsSet)
-            val ldiff = lhsSet.diff(common)
-            val rdiff = rhsSet.diff(common)
-            if (ldiff.size == 0 || rdiff.size == 0) {
-              // a || (b && a) => a
-              common.reduce(And)
-            } else {
-              // (a && b && c && ...) || (a && b && d && ...) || (a && b && e && ...) ... =>
-              // a && b && ((c && ...) || (d && ...) || (e && ...) || ...)
-              (ldiff.reduceOption(And) ++ rdiff.reduceOption(And))
-                .reduceOption(Or)
-                .map(_ :: common.toList)
-                .getOrElse(common.toList)
-                .reduce(And)
-            }
-        }
-
-      case not @ Not(exp) =>
-        exp match {
-          case Literal(true, BooleanType) => Literal(false)
-          case Literal(false, BooleanType) => Literal(true)
-          case GreaterThan(l, r) => LessThanOrEqual(l, r)
-          case GreaterThanOrEqual(l, r) => LessThan(l, r)
-          case LessThan(l, r) => GreaterThanOrEqual(l, r)
-          case LessThanOrEqual(l, r) => GreaterThan(l, r)
-          case Not(e) => e
-          case _ => not
-        }
-
-      // Turn if (true) a else b into a, and if (false) a else b into b.
+      case and @ And(left, right) => (left, right) match {
+        // true && r => r
+        case (Literal(true, BooleanType), r) => r
+
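The core identity the rule applies, (a || b) && (a || c) => a || (b && c), comes down to the set arithmetic described in steps 1-4 of the code comment. An illustrative Java sketch with strings standing in for Catalyst expressions (a hypothetical helper, not the Optimizer code itself):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Demonstrates the rewrite (a || b) && (a || c) => a || (b && c)
// via intersect/diff on the two sides' disjunct sets.
public class BooleanSimplifyDemo {

    public static String simplifyAnd(Set<String> lhsDisjuncts, Set<String> rhsDisjuncts) {
        Set<String> common = new TreeSet<>(lhsDisjuncts);
        common.retainAll(rhsDisjuncts);                  // step 2: shared disjuncts
        Set<String> ldiff = new TreeSet<>(lhsDisjuncts);
        ldiff.removeAll(common);                         // step 3: remainder on each side
        Set<String> rdiff = new TreeSet<>(rhsDisjuncts);
        rdiff.removeAll(common);
        if (ldiff.isEmpty() || rdiff.isEmpty()) {
            // a && (a || b) => a
            return String.join(" || ", common);
        }
        // step 4: common || (ldiff && rdiff)
        List<String> parts = new ArrayList<>(common);
        parts.add("(" + String.join(" || ", ldiff) + " && " + String.join(" || ", rdiff) + ")");
        return String.join(" || ", parts);
    }
}
```

For lhs = {a, b} and rhs = {a, c} this yields "a || (b && c)"; when one side's remainder is empty, the common part alone suffices, matching the a && (a || b) => a branch.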