[GitHub] spark issue #15607: [SPARK-16137][SPARKR] randomForest for R

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15607
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67438/
Test FAILed.





[GitHub] spark issue #15607: [SPARK-16137][SPARKR] randomForest for R

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15607
  
**[Test build #67438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67438/consoleFull)** for PR 15607 at commit [`7d6e605`](https://github.com/apache/spark/commit/7d6e6055681e858b46d71708f3c86b67f1793cfd).
 * This patch **fails some tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class RandomForestClassifierWrapperWriter(instance: RandomForestClassifierWrapper)`
  * `class RandomForestClassifierWrapperReader extends MLReader[RandomForestClassifierWrapper]`
  * `class RandomForestRegressorWrapperWriter(instance: RandomForestRegressorWrapper)`
  * `class RandomForestRegressorWrapperReader extends MLReader[RandomForestRegressorWrapper]`
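
These reader/writer pairs follow the usual `spark.ml` persistence pattern. A minimal sketch of that pattern, with hypothetical `Example*` names standing in for the wrappers listed above (only `MLWriter`, `MLReader`, and `PipelineModel` are real APIs):

```
import org.apache.hadoop.fs.Path
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.util.{MLReader, MLWriter}

// Hypothetical wrapper holding a fitted pipeline; the real wrappers also carry R metadata.
class ExampleWrapper(val pipeline: PipelineModel) {
  def write: MLWriter = new ExampleWrapperWriter(this)
}

class ExampleWrapperWriter(instance: ExampleWrapper) extends MLWriter {
  // Persist the wrapped pipeline under a subdirectory of `path`.
  override protected def saveImpl(path: String): Unit =
    instance.pipeline.save(new Path(path, "pipeline").toString)
}

class ExampleWrapperReader extends MLReader[ExampleWrapper] {
  // Rebuild the wrapper from the persisted pipeline.
  override def load(path: String): ExampleWrapper =
    new ExampleWrapper(PipelineModel.load(new Path(path, "pipeline").toString))
}
```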





[GitHub] spark issue #15607: [SPARK-16137][SPARKR] randomForest for R

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15607
  
**[Test build #67438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67438/consoleFull)** for PR 15607 at commit [`7d6e605`](https://github.com/apache/spark/commit/7d6e6055681e858b46d71708f3c86b67f1793cfd).





[GitHub] spark issue #15607: [SPARK-16137][SPARKR] randomForest for R

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15607
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15605: [WIP] [SPARK-18067] [SQL] SortMergeJoin adds shuffle if ...

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15605
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15605: [WIP] [SPARK-18067] [SQL] SortMergeJoin adds shuffle if ...

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15605
  
**[Test build #67431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67431/consoleFull)** for PR 15605 at commit [`dac2f49`](https://github.com/apache/spark/commit/dac2f49b5dc66b67d632632907cc654f4d319434).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15605: [WIP] [SPARK-18067] [SQL] SortMergeJoin adds shuffle if ...

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15605
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67431/
Test FAILed.





[GitHub] spark pull request #15607: [SPARK-16137][SPARKR] randomForest for R

2016-10-23 Thread felixcheung
GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/15607

[SPARK-16137][SPARKR] randomForest for R

## What changes were proposed in this pull request?

Random Forest Regression and Classification for R
Clean-up/reordering generics.R

## How was this patch tested?

manual tests, unit tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark rrandomforest

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15607.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15607


commit 7d6e6055681e858b46d71708f3c86b67f1793cfd
Author: Felix Cheung 
Date:   2016-10-24T05:50:00Z

randomForest for R







[GitHub] spark issue #15581: [SPARK-18044][STREAMING] FileStreamSource should not inf...

2016-10-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15581
  
@zsxwing shall we backport [this](https://github.com/apache/spark/pull/14803) first? It seems that in 2.0 we don't support partitioned file sources.





[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14426
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67425/
Test PASSed.





[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14426
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15606: [SPARK-18070][SQL] binary operator should not consider n...

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15606
  
**[Test build #67437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67437/consoleFull)** for PR 15606 at commit [`ba880d8`](https://github.com/apache/spark/commit/ba880d8f954d29eb8bc62b4af3ced1b46aa78dc3).





[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14426
  
**[Test build #67425 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67425/consoleFull)** for PR 14426 at commit [`dfe6a3e`](https://github.com/apache/spark/commit/dfe6a3e20e39f33edb95cc278586bf84d2cf62ba).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Hint(name: String, parameters: Seq[String], child: LogicalPlan) extends UnaryNode`





[GitHub] spark issue #15606: [SPARK-18070][SQL] binary operator should not consider n...

2016-10-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15606
  
cc @gatorsmile @yhuai





[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15553
  
**[Test build #67436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67436/consoleFull)** for PR 15553 at commit [`ec49e8b`](https://github.com/apache/spark/commit/ec49e8bfa3d4a43106b418afb14aa7a5d2c3aff4).





[GitHub] spark pull request #15606: [SPARK-18070][SQL] binary operator should not con...

2016-10-23 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/15606

[SPARK-18070][SQL] binary operator should not consider nullability when comparing input types

## What changes were proposed in this pull request?

Binary operators require their inputs to be of the same type, but that comparison should not consider nullability; e.g. `EqualTo` should be able to compare an element-nullable array with an element-non-nullable array.

## How was this patch tested?

a regression test in `DataFrameSuite`
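
A sketch of the case the fix targets, assuming an active `SparkSession` named `spark` (shape inspired by, but not necessarily identical to, the regression test):

```
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{ArrayType, IntegerType, StructType}

// Two array columns that differ only in element nullability.
val schema = new StructType()
  .add("a", ArrayType(IntegerType, containsNull = true))
  .add("b", ArrayType(IntegerType, containsNull = false))
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row(Seq(1), Seq(1)))), schema)

// Before the fix, resolving `a = b` failed the type check because the input
// types differed only in nullability; with the fix the comparison resolves.
assert(df.filter(df("a") === df("b")).count() == 1)
```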

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark type-bug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15606.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15606


commit ba880d8f954d29eb8bc62b4af3ced1b46aa78dc3
Author: Wenchen Fan 
Date:   2016-10-24T05:31:23Z

binary operator should not consider nullability when comparing input types







[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-23 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/15553
  
Jenkins, test this please





[GitHub] spark issue #15596: [SQL] Remove shuffle codes in CollectLimitExec

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15596
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67427/
Test FAILed.





[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-10-23 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/14136
  
@rxin @hvanhovell Thanks for your advice; I'll address your comments ASAP!





[GitHub] spark issue #15596: [SQL] Remove shuffle codes in CollectLimitExec

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15596
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15596: [SQL] Remove shuffle codes in CollectLimitExec

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15596
  
**[Test build #67427 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67427/consoleFull)** for PR 15596 at commit [`6d7095c`](https://github.com/apache/spark/commit/6d7095ce50d845377621b6f969732c3fff4a90e3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-10-23 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14136
  
@jiangxb1987 can you add some test cases based on Herman's comment?






[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13891
  
**[Test build #67435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67435/consoleFull)** for PR 13891 at commit [`a6b5a16`](https://github.com/apache/spark/commit/a6b5a16cd78e4efe99fda40f92592c9712b04146).





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13891
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13891
  
**[Test build #67434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67434/consoleFull)** for PR 13891 at commit [`1081e64`](https://github.com/apache/spark/commit/1081e64c3fbd31c3d35b987b3200eae8c8c688e2).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13891
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67434/
Test FAILed.





[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15541#discussion_r84621271
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -250,24 +251,24 @@ private[spark] class TaskSchedulerImpl(
   private def resourceOfferSingleTaskSet(
       taskSet: TaskSetManager,
       maxLocality: TaskLocality,
-      shuffledOffers: Seq[WorkerOffer],
-      availableCpus: Array[Int],
-      tasks: IndexedSeq[ArrayBuffer[TaskDescription]]) : Boolean = {
+      taskAssigner: TaskAssigner) : Boolean = {
     var launchedTask = false
-    for (i <- 0 until shuffledOffers.size) {
-      val execId = shuffledOffers(i).executorId
-      val host = shuffledOffers(i).host
-      if (availableCpus(i) >= CPUS_PER_TASK) {
+    taskAssigner.init()
+    while (taskAssigner.hasNext) {
+      var isAccepted = false
+      val currentOffer = taskAssigner.next()
+      val execId = currentOffer.workOffer.executorId
+      val host = currentOffer.workOffer.host
+      if (currentOffer.coresAvailable >= CPUS_PER_TASK) {
         try {
           for (task <- taskSet.resourceOffer(execId, host, maxLocality)) {
-            tasks(i) += task
+            currentOffer.assignTask(task, CPUS_PER_TASK)
             val tid = task.taskId
             taskIdToTaskSetManager(tid) = taskSet
             taskIdToExecutorId(tid) = execId
             executorIdToTaskCount(execId) += 1
-            availableCpus(i) -= CPUS_PER_TASK
-            assert(availableCpus(i) >= 0)
--- End diff --

@zhzhan ah, thanks for the explanation! I didn't notice it returns an Option. :)





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13891
  
Build finished. Test FAILed.





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13891
  
**[Test build #67432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67432/consoleFull)** for PR 13891 at commit [`1f3ff96`](https://github.com/apache/spark/commit/1f3ff9621abc97345db70642fd24a9803b83c7f4).
 * This patch **fails MiMa tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13891
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67432/
Test FAILed.





[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14638
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67426/
Test PASSed.





[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14638
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14638
  
**[Test build #67426 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67426/consoleFull)** for PR 14638 at commit [`9010388`](https://github.com/apache/spark/commit/9010388485cc7430a13a971f851aa6dfdb7ac5f6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-23 Thread zhzhan
Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15541#discussion_r84621076
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -250,24 +251,24 @@ private[spark] class TaskSchedulerImpl(
   private def resourceOfferSingleTaskSet(
       taskSet: TaskSetManager,
       maxLocality: TaskLocality,
-      shuffledOffers: Seq[WorkerOffer],
-      availableCpus: Array[Int],
-      tasks: IndexedSeq[ArrayBuffer[TaskDescription]]) : Boolean = {
+      taskAssigner: TaskAssigner) : Boolean = {
     var launchedTask = false
-    for (i <- 0 until shuffledOffers.size) {
-      val execId = shuffledOffers(i).executorId
-      val host = shuffledOffers(i).host
-      if (availableCpus(i) >= CPUS_PER_TASK) {
+    taskAssigner.init()
+    while (taskAssigner.hasNext) {
+      var isAccepted = false
+      val currentOffer = taskAssigner.next()
+      val execId = currentOffer.workOffer.executorId
+      val host = currentOffer.workOffer.host
+      if (currentOffer.coresAvailable >= CPUS_PER_TASK) {
         try {
           for (task <- taskSet.resourceOffer(execId, host, maxLocality)) {
-            tasks(i) += task
+            currentOffer.assignTask(task, CPUS_PER_TASK)
             val tid = task.taskId
             taskIdToTaskSetManager(tid) = taskSet
             taskIdToExecutorId(tid) = execId
             executorIdToTaskCount(execId) += 1
-            availableCpus(i) -= CPUS_PER_TASK
-            assert(availableCpus(i) >= 0)
--- End diff --

@viirya The assert will not fail even in the legacy code, because `taskSet.resourceOffer(execId, host, maxLocality)` returns an `Option`. The for loop runs at most once.
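
In plain Scala, a `for` over an `Option` runs its body at most once:

```
val some: Option[Int] = Some(1)
val none: Option[Int] = None

for (v <- some) println(v) // body runs exactly once
for (v <- none) println(v) // body never runs
```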





[GitHub] spark pull request #15596: [SQL] Remove shuffle codes in CollectLimitExec

2016-10-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15596#discussion_r84621075
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java ---
@@ -39,7 +39,7 @@
   protected int partitionIndex = -1;

   public boolean hasNext() throws IOException {
-    if (currentRows.isEmpty()) {
+    if (!shouldStop()) {
--- End diff --

I reverted this change as it could cause problems with releasing memory.





[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-23 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15541
  
One concern I have is a cluster of machines with different numbers of cores. This is possible if you build your cluster with a solution like EC2 and buy different kinds of nodes. In that case, the balanced assigner would first consume the machines with more cores, and the packed assigner would first consume the machines with fewer cores. I don't know if this is an issue for most of you. Even if it is, it might not be solvable in this change, and we can consider it later.
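
A toy sketch of that ordering difference, with hypothetical names (the PR's actual assigners differ in detail):

```
case class Offer(host: String, freeCores: Int)

val offers = Seq(Offer("big", 32), Offer("small", 4))

// A packed-style assigner prefers the worker with the fewest free cores,
// while a balanced-style assigner prefers the one with the most.
val packedOrder = offers.sortBy(_.freeCores)    // small (4) first
val balancedOrder = offers.sortBy(-_.freeCores) // big (32) first
```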





[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-23 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15541
  
LGTM. I have a question about the legacy code, but it is not related to this change.





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13891
  
**[Test build #67434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67434/consoleFull)** for PR 13891 at commit [`1081e64`](https://github.com/apache/spark/commit/1081e64c3fbd31c3d35b987b3200eae8c8c688e2).





[GitHub] spark issue #15597: [SPARK-18063][SQL] Failed to infer constraints over mult...

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15597
  
**[Test build #67433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67433/consoleFull)** for PR 15597 at commit [`32e0b36`](https://github.com/apache/spark/commit/32e0b36eb3c2441ecf6418fed1c0944d1841c13c).





[GitHub] spark issue #15605: [WIP] [SPARK-18067] [SQL] SortMergeJoin adds shuffle if ...

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15605
  
**[Test build #67431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67431/consoleFull)** for PR 15605 at commit [`dac2f49`](https://github.com/apache/spark/commit/dac2f49b5dc66b67d632632907cc654f4d319434).





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13891
  
**[Test build #67432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67432/consoleFull)** for PR 13891 at commit [`1f3ff96`](https://github.com/apache/spark/commit/1f3ff9621abc97345db70642fd24a9803b83c7f4).





[GitHub] spark issue #15605: [WIP] [SPARK-18067] [SQL] SortMergeJoin adds shuffle if ...

2016-10-23 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/15605
  
cc @hvanhovell 





[GitHub] spark pull request #15605: [WIP] [SPARK-18067] [SQL] SortMergeJoin adds shuf...

2016-10-23 Thread tejasapatil
GitHub user tejasapatil opened a pull request:

https://github.com/apache/spark/pull/15605

[WIP] [SPARK-18067] [SQL] SortMergeJoin adds shuffle if join predicates have non partitioned columns

## What changes were proposed in this pull request?

See https://issues.apache.org/jira/browse/SPARK-18067 for discussion. 
Putting out a PR to get some feedback about the approach.

Assume that there are two tables with columns `key` and `value` both hash 
partitioned over `key`. Assume these are the partitions for the children:

| partitions  | child 1      | child 2      |
| ----------- | ------------ | ------------ |
| partition 0 | [0, 0, 0, 3] | [0, 0, 3, 3] |
| partition 1 | [1, 4, 4]    | [4]          |
| partition 2 | [2, 2]       | [2, 5, 5, 5] |

Since all rows with a given value of `key` land in the same partition, we can evaluate other join predicates like `tableA.value = tableB.value` right there, without needing any shuffle.

What was previously done, i.e. requiring `HashPartitioning(key, value)`, expects rows with the same value of `pmod(hash(key, value))` to be in the same partition, and does not take advantage of the fact that rows with the same `key` are already packed together.

This PR uses `PartitioningCollection` instead of `HashPartitioning` for the expected partitioning.
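
A toy illustration of the two layouts in plain Scala (not Spark APIs):

```
def pmod(a: Int, n: Int): Int = { val r = a % n; if (r < 0) r + n else r }

val rows = Seq((0, 0), (3, 6), (4, 8)) // (i, j) pairs

// Existing layout: partition by the hash of the bucketing key `i` only.
val byKey = rows.groupBy { case (i, _) => pmod(i.hashCode, 8) }
// Layout the planner used to demand: partition by the hash of (i, j).
val byKeyAndJ = rows.groupBy { case (i, j) => pmod((i, j).hashCode, 8) }

// Rows with equal `i` always share a partition in `byKey`, so every candidate
// match for a join on (i, j) is already co-located; demanding the `byKeyAndJ`
// layout forces a shuffle for no gain.
```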

Query:

```
val df = (0 until 16).map(i => (i, i * 2)).toDF("i", "j").coalesce(1)
df.write.format("org.apache.spark.sql.hive.orc.OrcFileFormat").bucketBy(8, "i").sortBy("i").saveAsTable("tableA")
df.write.format("org.apache.spark.sql.hive.orc.OrcFileFormat").bucketBy(8, "i").sortBy("i").saveAsTable("tableB")

hc.sql("SELECT * FROM tableA a JOIN tableB b ON a.i=b.i AND a.j=b.j").explain(true)
```

Before:

```
*SortMergeJoin [i#38, j#39], [i#40, j#41], Inner
:- *Sort [i#38 ASC NULLS FIRST, j#39 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(i#38, j#39, 200)
:     +- *Project [i#38, j#39]
:        +- *Filter (isnotnull(i#38) && isnotnull(j#39))
:           +- *FileScan orc default.tablea[i#38,j#39] Batched: false, Format: ORC, Location: ListingFileCatalog[file:/Users/tejasp/Desktop/dev/tp-spark-2/spark/spark-warehouse/tablea], PartitionFilters: [], PushedFilters: [IsNotNull(i), IsNotNull(j)], ReadSchema: struct
+- *Sort [i#40 ASC NULLS FIRST, j#41 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(i#40, j#41, 200)
      +- *Project [i#40, j#41]
         +- *Filter (isnotnull(i#40) && isnotnull(j#41))
            +- *FileScan orc default.tableb[i#40,j#41] Batched: false, Format: ORC, Location: ListingFileCatalog[file:/Users/tejasp/Desktop/dev/tp-spark-2/spark/spark-warehouse/tableb], PartitionFilters: [], PushedFilters: [IsNotNull(i), IsNotNull(j)], ReadSchema: struct
```

After:

```
== Physical Plan ==
*SortMergeJoin [i#38, j#39], [i#40, j#41], Inner
:- *Sort [i#38 ASC NULLS FIRST, j#39 ASC NULLS FIRST], false, 0
:  +- *Project [i#38, j#39]
:     +- *Filter (isnotnull(j#39) && isnotnull(i#38))
:        +- *FileScan orc default.tablea[i#38,j#39] Batched: false, Format: ORC, Location: ListingFileCatalog[file:/Users/tejasp/Desktop/dev/tp-spark/spark-warehouse/tablea], PartitionFilters: [], PushedFilters: [IsNotNull(j), IsNotNull(i)], ReadSchema: struct
+- *Sort [i#40 ASC NULLS FIRST, j#41 ASC NULLS FIRST], false, 0
   +- *Project [i#40, j#41]
      +- *Filter (isnotnull(j#41) && isnotnull(i#40))
         +- *FileScan orc default.tableb[i#40,j#41] Batched: false, Format: ORC, Location: ListingFileCatalog[file:/Users/tejasp/Desktop/dev/tp-spark/spark-warehouse/tableb], PartitionFilters: [], PushedFilters: [IsNotNull(j), IsNotNull(i)], ReadSchema: struct
```

## How was this patch tested?

WIP. I need to add tests for:
- [ ] Check that the planner does not introduce an extra `Shuffle` for such queries
- [ ] Check that the compatibility between `PartitioningCollection` and `HashPartitioning` makes sense

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tejasapatil/spark SPARK-18067_smb_join_pred_avoid_shuffle

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15605.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15605


commit dac2f49b5dc66b67d632632907cc654f4d319434
Author: Tejas Patil 
Date:   2016-10-24T04:23:13Z

[SPARK-18067] [SQL] SortMergeJoin adds shuffle if join predicates have non partitioned columns





[GitHub] spark issue #15428: [SPARK-17219][ML] enhanced NaN value handling in Bucketi...

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15428
  
**[Test build #67430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67430/consoleFull)** for PR 15428 at commit [`b14fbab`](https://github.com/apache/spark/commit/b14fbab7487a8464ba2a53bb9804e00fd14d3785).





[GitHub] spark issue #15428: [SPARK-17219][ML] enhanced NaN value handling in Bucketi...

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15428
  
**[Test build #67429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67429/consoleFull)** for PR 15428 at commit [`70cee57`](https://github.com/apache/spark/commit/70cee5791d09f596e55f39da1c4d0386a1b7e7a3).





[GitHub] spark issue #15212: [SPARK-17645][MLLIB][ML]add feature selector method base...

2016-10-23 Thread mpjlu
Github user mpjlu commented on the issue:

https://github.com/apache/spark/pull/15212
  
Hi @yanboliang and @srowen, could you please review whether this PR addresses all of your comments? Thanks.





[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-23 Thread zhzhan
Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15541#discussion_r84619879
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -250,24 +251,24 @@ private[spark] class TaskSchedulerImpl(
   private def resourceOfferSingleTaskSet(
       taskSet: TaskSetManager,
       maxLocality: TaskLocality,
-      shuffledOffers: Seq[WorkerOffer],
-      availableCpus: Array[Int],
-      tasks: IndexedSeq[ArrayBuffer[TaskDescription]]) : Boolean = {
+      taskAssigner: TaskAssigner) : Boolean = {
     var launchedTask = false
-    for (i <- 0 until shuffledOffers.size) {
-      val execId = shuffledOffers(i).executorId
-      val host = shuffledOffers(i).host
-      if (availableCpus(i) >= CPUS_PER_TASK) {
+    taskAssigner.init()
+    while (taskAssigner.hasNext) {
+      var isAccepted = false
+      val currentOffer = taskAssigner.next()
+      val execId = currentOffer.workOffer.executorId
+      val host = currentOffer.workOffer.host
+      if (currentOffer.coresAvailable >= CPUS_PER_TASK) {
         try {
           for (task <- taskSet.resourceOffer(execId, host, maxLocality)) {
-            tasks(i) += task
+            currentOffer.assignTask(task, CPUS_PER_TASK)
             val tid = task.taskId
             taskIdToTaskSetManager(tid) = taskSet
             taskIdToExecutorId(tid) = execId
             executorIdToTaskCount(execId) += 1
-            availableCpus(i) -= CPUS_PER_TASK
-            assert(availableCpus(i) >= 0)
--- End diff --

Thanks @viirya for the comments. Actually I was thinking of removing the check, although it is part of the legacy code. Now the check has moved into OfferState, which makes more sense. IMHO the assertion should typically never fail, but from OfferState's perspective it should guarantee that restriction.





[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15541
  
**[Test build #67428 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67428/consoleFull)** for PR 15541 at commit [`a820e96`](https://github.com/apache/spark/commit/a820e96284f1d9108ef62cd3ef55171ebd47e08f).





[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-23 Thread zhzhan
Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15541#discussion_r84619023
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala ---
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.PriorityQueue
+import scala.util.Random
+
+import org.apache.spark.internal.{config, Logging}
+import org.apache.spark.SparkConf
+import org.apache.spark.util.Utils
+
+/** Tracks the current state of the workers with available cores and assigned task list. */
+private[scheduler] class OfferState(val workOffer: WorkerOffer) {
+  /** The current remaining cores that can be allocated to tasks. */
+  var coresAvailable: Int = workOffer.cores
+  /** The list of tasks that are assigned to this WorkerOffer. */
+  val tasks = new ArrayBuffer[TaskDescription](coresAvailable)
+
+  def assignTask(task: TaskDescription, cpu: Int): Unit = {
+    tasks += task
+    coresAvailable -= cpu
+    assert(coresAvailable >= 0)
+  }
+}
+
+/**
+ * TaskAssigner is the base class for all task assigner implementations, and can be
+ * extended to implement different task scheduling algorithms.
+ * Together with [[org.apache.spark.scheduler.TaskScheduler TaskScheduler]], TaskAssigner
+ * is used to assign tasks to workers with available cores. Internally, when TaskScheduler
+ * performs task assignment given available workers, it first sorts the candidate tasksets,
+ * and then for each taskset, it takes multiple rounds to request TaskAssigner for task
+ * assignment with different locality restrictions until there are either no qualified
+ * workers or no valid tasks to be assigned.
+ *
+ * TaskAssigner is responsible for maintaining the worker availability state and task
+ * assignment information. The contract between
+ * [[org.apache.spark.scheduler.TaskScheduler TaskScheduler]] and TaskAssigner is as follows.
+ *
+ * First, TaskScheduler invokes construct() of TaskAssigner to initialize its internal
+ * worker states at the beginning of resource offering.
+ *
+ * Second, before each round of task assignment for a taskset, TaskScheduler invokes the
+ * init() of TaskAssigner to initialize the data structure for the round.
+ *
+ * Third, when performing real task assignment, hasNext()/next() is used by TaskScheduler
+ * to check the worker availability and retrieve the current offering from TaskAssigner.
+ *
+ * Fourth, TaskScheduler calls offerAccepted() to notify the TaskAssigner so that
+ * TaskAssigner can decide whether the current offer is valid or not for the next request.
+ *
+ * Fifth, after task assignment is done, TaskScheduler invokes the function tasks to
+ * retrieve all the task assignment information.
+ */
+
+private[scheduler] sealed abstract class TaskAssigner {
+  protected var offer: Seq[OfferState] = _
+  protected var cpuPerTask = 1
+
+  protected def withCpuPerTask(cpuPerTask: Int): TaskAssigner = {
+    this.cpuPerTask = cpuPerTask
--- End diff --

You mean cpuPerTask >= 1? I don't think we need this check.





[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15513
  
Could I please hear your thoughts, @rxin, @srowen and @jodersky, about changing the format to https://github.com/apache/spark/pull/15513#issuecomment-255594355 or https://github.com/apache/spark/pull/15513#issuecomment-255594464?

I would like to be very sure before proceeding further, to avoid wasted effort.





[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15541#discussion_r84618189
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala 
---
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.PriorityQueue
+import scala.util.Random
+
+import org.apache.spark.internal.{config, Logging}
+import org.apache.spark.SparkConf
+import org.apache.spark.util.Utils
+
+/** Tracks the current state of the workers with available cores and assigned task list. */
+private[scheduler] class OfferState(val workOffer: WorkerOffer) {
+  /** The current remaining cores that can be allocated to tasks. */
+  var coresAvailable: Int = workOffer.cores
+  /** The list of tasks that are assigned to this WorkerOffer. */
+  val tasks = new ArrayBuffer[TaskDescription](coresAvailable)
+
+  def assignTask(task: TaskDescription, cpu: Int): Unit = {
+    tasks += task
+    coresAvailable -= cpu
+    assert(coresAvailable >= 0)
+  }
+}
+
+/**
+ * TaskAssigner is the base class for all task assigner implementations, and can be
+ * extended to implement different task scheduling algorithms.
+ * Together with [[org.apache.spark.scheduler.TaskScheduler TaskScheduler]], TaskAssigner
+ * is used to assign tasks to workers with available cores. Internally, when TaskScheduler
+ * performs task assignment given available workers, it first sorts the candidate tasksets,
+ * and then for each taskset, it takes multiple rounds to request TaskAssigner for task
+ * assignment with different locality restrictions until there are either no qualified
+ * workers or no valid tasks to be assigned.
+ *
+ * TaskAssigner is responsible for maintaining the worker availability state and task
+ * assignment information. The contract between
+ * [[org.apache.spark.scheduler.TaskScheduler TaskScheduler]] and TaskAssigner is as follows.
+ *
+ * First, TaskScheduler invokes construct() of TaskAssigner to initialize its internal
+ * worker states at the beginning of resource offering.
+ *
+ * Second, before each round of task assignment for a taskset, TaskScheduler invokes the
+ * init() of TaskAssigner to initialize the data structure for the round.
+ *
+ * Third, when performing real task assignment, hasNext/next() is used by TaskScheduler
+ * to check the worker availability and retrieve the current offering from TaskAssigner.
+ *
+ * Fourth, TaskScheduler calls offerAccepted() to notify the TaskAssigner so that
+ * TaskAssigner can decide whether the current offer is valid or not for the next request.
+ *
+ * Fifth, after task assignment is done, TaskScheduler invokes the function tasks to
+ * retrieve all the task assignment information.
+ */
+
+private[scheduler] sealed abstract class TaskAssigner {
+  protected var offer: Seq[OfferState] = _
+  protected var cpuPerTask = 1
+
+  protected def withCpuPerTask(cpuPerTask: Int): TaskAssigner = {
+    this.cpuPerTask = cpuPerTask
--- End diff --

Do we need to add a sanity check for `cpuPerTask`?





[GitHub] spark issue #15596: [SQL] Remove shuffle codes in CollectLimitExec

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15596
  
**[Test build #67427 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67427/consoleFull)**
 for PR 15596 at commit 
[`6d7095c`](https://github.com/apache/spark/commit/6d7095ce50d845377621b6f969732c3fff4a90e3).





[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15513
  
https://github.com/apache/spark/pull/15513#issuecomment-255632253 I 
initially wanted to propose that idea, but I didn't because I was worried about 
ignoring the existing naming rules in each expression. So I only replaced 
obviously arbitrary names with `expr` here and followed the majority of names 
in each file.

I guess it is a safe choice to mention them in the arguments. Also, I would 
like to avoid a lot of potential arguments about naming; for a simple example, 
https://github.com/apache/spark/pull/15513#discussion_r84406275 

BTW, I guess I didn't use the name `expr` for literals in this PR, though. 
I will try to double-check this one as well.





[GitHub] spark issue #15588: [SPARK-18039][Scheduler] fix bug maxRegisteredWaitingTim...

2016-10-23 Thread Astralidea
Github user Astralidea commented on the issue:

https://github.com/apache/spark/pull/15588
  
@lw-lin 
spark.scheduler.minRegisteredResourcesRatio does not work.
The reason may be that I use Mesos, which runs executors not through the driver,
but I still need to make sure that sufficient resources are registered.
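
For reference, both of the settings involved here are existing Spark configuration keys; an illustrative submit line (the values are arbitrary):

    spark-submit \
      --conf spark.scheduler.minRegisteredResourcesRatio=0.8 \
      --conf spark.scheduler.maxRegisteredResourcesWaitingTime=30s \
      ...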





[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14638
  
**[Test build #67426 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67426/consoleFull)**
 for PR 14638 at commit 
[`9010388`](https://github.com/apache/spark/commit/9010388485cc7430a13a971f851aa6dfdb7ac5f6).





[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15541
  
LGTM except a few minor comments. Could @rxin do the final check? 

Thank you for your work! @zhzhan 





[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15541#discussion_r84616602
  
--- Diff: docs/configuration.md ---
@@ -1350,6 +1350,20 @@ Apart from these, the following properties are also available, and may be useful
 Should be greater than or equal to 1. Number of allowed retries = this value - 1.
   
 
+
+  spark.scheduler.taskAssigner
+  roundrobin
+  
+The strategy of how to allocate tasks among workers with free cores. Three task
+assigners (roundrobin, packed, and balanced) are supported currently. By default, roundrobin
+with randomness is used to allocate task to workers with available cores in a
--- End diff --

`allocate task` -> `allocate tasks`
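
For context, the property described in the diff above would be set like any other scheduler conf; an illustrative line only (the key is introduced by this PR):

    spark-submit --conf spark.scheduler.taskAssigner=packed ...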





[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14426
  
**[Test build #67425 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67425/consoleFull)**
 for PR 14426 at commit 
[`dfe6a3e`](https://github.com/apache/spark/commit/dfe6a3e20e39f33edb95cc278586bf84d2cf62ba).





[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15541#discussion_r84616347
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala 
---
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.PriorityQueue
+import scala.util.Random
+
+import org.apache.spark.internal.{config, Logging}
+import org.apache.spark.SparkConf
+import org.apache.spark.util.Utils
+
+/** Tracks the current state of the workers with available cores and assigned task list. */
+private[scheduler] class OfferState(val workOffer: WorkerOffer) {
+  /** The current remaining cores that can be allocated to tasks. */
+  var coresAvailable: Int = workOffer.cores
+  /** The list of tasks that are assigned to this WorkerOffer. */
+  val tasks = new ArrayBuffer[TaskDescription](coresAvailable)
+
+  def assignTask(task: TaskDescription, cpu: Int): Unit = {
+    tasks += task
+    coresAvailable -= cpu
+    assert(coresAvailable >= 0)
+  }
+}
+
+/**
+ * TaskAssigner is the base class for all task assigner implementations, and can be
+ * extended to implement different task scheduling algorithms.
+ * Together with [[org.apache.spark.scheduler.TaskScheduler TaskScheduler]], TaskAssigner
+ * is used to assign tasks to workers with available cores. Internally, when TaskScheduler
+ * performs task assignment given available workers, it first sorts the candidate tasksets,
+ * and then for each taskset, it takes multiple rounds to request TaskAssigner for task
+ * assignment with different locality restrictions until there are either no qualified
+ * workers or no valid tasks to be assigned.
+ *
+ * TaskAssigner is responsible for maintaining the worker availability state and task
+ * assignment information. The contract between
+ * [[org.apache.spark.scheduler.TaskScheduler TaskScheduler]] and TaskAssigner is as follows.
+ *
+ * First, TaskScheduler invokes construct() of TaskAssigner to initialize its internal
+ * worker states at the beginning of resource offering.
+ *
+ * Second, before each round of task assignment for a taskset, TaskScheduler invokes the
+ * init() of TaskAssigner to initialize the data structure for the round.
+ *
+ * Third, when performing real task assignment, hasNext/next() is used by TaskScheduler
+ * to check the worker availability and retrieve the current offering from TaskAssigner.
+ *
+ * Fourth, TaskScheduler calls offerAccepted() to notify the TaskAssigner so that
+ * TaskAssigner can decide whether the current offer is valid or not for the next request.
+ *
+ * Fifth, after task assignment is done, TaskScheduler invokes the function tasks to
+ * retrieve all the task assignment information.
+ */
+
+private[scheduler] sealed abstract class TaskAssigner {
+  protected var offer: Seq[OfferState] = _
+  protected var cpuPerTask = 1
+
+  protected def withCpuPerTask(cpuPerTask: Int): TaskAssigner = {
+    this.cpuPerTask = cpuPerTask
+    this
+  }
+
+  /** The currently assigned offers. */
+  final def tasks: Seq[ArrayBuffer[TaskDescription]] = offer.map(_.tasks)
+
+  /**
+   * Invoked at the beginning of resource offering to construct the offer with the
+   * work offers. By default, the offers are randomly shuffled to avoid always placing
+   * tasks on the same set of workers.
+   */
+  def construct(workOffer: Seq[WorkerOffer]): Unit = {
+    offer = Random.shuffle(workOffer.map(o => new OfferState(o)))
+  }
+
+  /** Invoked at each round of Taskset assignment to initialize the internal structure. */
+  def init(): Unit
+
+  /**
+   * Tests whether there is an offer available to be used inside one round of Taskset assignment.
+   * @return  `true` if a 
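
As a rough sketch of the contract described in the class comment above (method names are from the diff; sortedTaskSets and tryLaunchTask are hypothetical stand-ins, and the exact call shapes are approximate):

    assigner.construct(workOffers)          // 1. once per resource-offer round
    for (taskSet <- sortedTaskSets) {       //    TaskScheduler sorts the tasksets first
      assigner.init()                       // 2. once per taskset round
      while (assigner.hasNext) {            // 3. any qualified worker still available?
        val offer = assigner.next()         //    the current offer under consideration
        val launched = tryLaunchTask(taskSet, offer)   // hypothetical helper
        assigner.offerAccepted(launched)    // 4. tell the assigner whether the offer was used
      }
    }
    val assignments = assigner.tasks        // 5. collect the final assignment
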

[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13891
  
**[Test build #67424 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67424/consoleFull)**
 for PR 13891 at commit 
[`513e791`](https://github.com/apache/spark/commit/513e7915ecb807bc04ed8a17fdaa121e9ac578b5).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13891
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13891
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67424/
Test FAILed.





[GitHub] spark pull request #15513: [SPARK-17963][SQL][Documentation] Add examples (e...

2016-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15513#discussion_r84615985
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/bitwiseExpressions.scala
 ---
@@ -27,8 +27,16 @@ import org.apache.spark.sql.types._
  * Code generation inherited from BinaryArithmetic.
  */
 @ExpressionDescription(
-  usage = "a _FUNC_ b - Bitwise AND.",
-  extended = "> SELECT 3 _FUNC_ 5; 1")
+  usage = "expr1 _FUNC_ expr2 - Bitwise AND.",
+  extended = """
+Arguments:
+  expr1 - an integral numeric expression.
--- End diff --

Oh, you mean what that is. I was referring to the `IntegralType` class. Maybe I should 
fix that to just `integral expression`.





[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15513
  
@gatorsmile Oh, they should be `literal` because in most of those cases users will 
give the input as a `literal`. I thought I used the word `literal` for those foldable 
expressions.





[GitHub] spark pull request #15513: [SPARK-17963][SQL][Documentation] Add examples (e...

2016-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15513#discussion_r84615684
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala
 ---
@@ -43,11 +43,20 @@ import org.apache.spark.util.Utils
  * and the second element should be a literal string for 
the method name,
  * and the remaining are input arguments to the Java 
method.
  */
-// scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(class,method[,arg1[,arg2..]]) calls method with 
reflection",
-  extended = "> SELECT _FUNC_('java.util.UUID', 'randomUUID');\n 
c33fb387-8500-4bfa-81d2-6e0e3e930df2")
-// scalastyle:on line.size.limit
+  usage = "_FUNC_(class, method[, arg1[, arg2 ..]]) - Calls method with 
reflection.",
+  extended = """
+Arguments:
+  class - a string literal that represents a fully-qualified class 
name.
+  method - a string literal that represents a method name.
+  arg - a boolean, numeric or string expression that represents 
arguments for the method.
--- End diff --

Ah, no, numeric types except decimal. I will note that.
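
As a usage illustration in the diff's own format (`reflect` is one SQL name bound to this expression; the UUID example and its output come from the old extended doc above, and the Math.max call is an extra, hypothetical example):

    > SELECT reflect('java.util.UUID', 'randomUUID');
     c33fb387-8500-4bfa-81d2-6e0e3e930df2
    > SELECT reflect('java.lang.Math', 'max', 2, 3);
     3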





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13891
  
**[Test build #67424 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67424/consoleFull)**
 for PR 13891 at commit 
[`513e791`](https://github.com/apache/spark/commit/513e7915ecb807bc04ed8a17fdaa121e9ac578b5).





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13891
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13891
  
**[Test build #67423 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67423/consoleFull)**
 for PR 13891 at commit 
[`294164d`](https://github.com/apache/spark/commit/294164d839b0ce191fee341b0eb82b81d506d8c8).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13891
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67423/
Test FAILed.





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13891
  
**[Test build #67423 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67423/consoleFull)**
 for PR 13891 at commit 
[`294164d`](https://github.com/apache/spark/commit/294164d839b0ce191fee341b0eb82b81d506d8c8).





[GitHub] spark pull request #15513: [SPARK-17963][SQL][Documentation] Add examples (e...

2016-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15513#discussion_r84615167
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala
 ---
@@ -76,8 +76,14 @@ abstract class Covariance(x: Expression, y: Expression) extends DeclarativeAggre
   }
 }
 
+
--- End diff --

This wouldn't be a nit, because most of the case class definitions across 
expressions seem to have double-spaced separation. Also, it seems fine per 
https://github.com/databricks/scala-style-guide#blank-lines-vertical-whitespace

> Use one or two blank line(s) to separate class definitions.






[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13891
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67422/
Test FAILed.





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13891
  
**[Test build #67422 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67422/consoleFull)**
 for PR 13891 at commit 
[`d29fd67`](https://github.com/apache/spark/commit/d29fd67a2a24675b7be2f7f51ba170fda11a85d7).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13891
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13891
  
**[Test build #67422 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67422/consoleFull)**
 for PR 13891 at commit 
[`d29fd67`](https://github.com/apache/spark/commit/d29fd67a2a24675b7be2f7f51ba170fda11a85d7).





[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15513
  
I have another question. I found this PR also uses `expressions` in the 
description when the function only accepts foldable expressions. This is also 
unclear to users.





[GitHub] spark pull request #15513: [SPARK-17963][SQL][Documentation] Add examples (e...

2016-10-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15513#discussion_r84613013
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala
 ---
@@ -43,11 +43,20 @@ import org.apache.spark.util.Utils
  * and the second element should be a literal string for the method name,
  * and the remaining are input arguments to the Java method.
  */
-// scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(class,method[,arg1[,arg2..]]) calls method with 
reflection",
-  extended = "> SELECT _FUNC_('java.util.UUID', 'randomUUID');\n 
c33fb387-8500-4bfa-81d2-6e0e3e930df2")
-// scalastyle:on line.size.limit
+  usage = "_FUNC_(class, method[, arg1[, arg2 ..]]) - Calls method with 
reflection.",
+  extended = """
+Arguments:
+  class - a string literal that represents a fully-qualified class 
name.
+  method - a string literal that represents a method name.
+  arg - a boolean, numeric or string expression that represents 
arguments for the method.
--- End diff --

Do we support decimal?





[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15513
  
To differentiate literals from expressions, the simplest way is to add 
`expr` or `expression` to the argument name. Arguments whose names do not 
contain `expr` are treated as literals. I know there are multiple exceptions; 
for example, the input might be neither an expression nor a literal. Then we 
can explain it in the description. 

Below is the online doc of Oracle functions: 
https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions115.htm#SQLRF00680





[GitHub] spark issue #15579: Added support for extra command in front of spark.

2016-10-23 Thread sheepduke
Github user sheepduke commented on the issue:

https://github.com/apache/spark/pull/15579
  
@srowen Hi, at first we wanted to add support for numactl only, but later 
we thought it would be better to make it possible to add a generic prefix 
command. It is not only for NUMA, but also for other optimization tools.

This feature makes life a lot easier when Spark is called from a script or 
the like.
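
For illustration only - the property name below is purely hypothetical (the PR defines the actual key), but it shows the intended kind of usage:

    # wrap every executor launch with a NUMA binding policy
    spark.executor.commandPrefix    numactl --cpunodebind=0 --membind=0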





[GitHub] spark issue #13252: [SPARK-15473][SQL] CSV data source writes header for emp...

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13252
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13252: [SPARK-15473][SQL] CSV data source writes header for emp...

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13252
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67421/
Test PASSed.





[GitHub] spark issue #13252: [SPARK-15473][SQL] CSV data source writes header for emp...

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13252
  
**[Test build #67421 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67421/consoleFull)**
 for PR 13252 at commit 
[`031c9da`](https://github.com/apache/spark/commit/031c9dacba77c6197626d02ceb0e1081b18e187b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15603: [WEBUI][MINOR] Return types in methods + cleanup

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15603
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67420/
Test PASSed.





[GitHub] spark issue #15603: [WEBUI][MINOR] Return types in methods + cleanup

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15603
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15603: [WEBUI][MINOR] Return types in methods + cleanup

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15603
  
**[Test build #67420 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67420/consoleFull)**
 for PR 15603 at commit 
[`6fcd247`](https://github.com/apache/spark/commit/6fcd247f546c2cfe15f6909bf8a9d75dc822ef15).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15513
  
IMHO, I would like to leave it, and I prefer to differentiate literals from 
expressions: when a column is given as an argument where a literal is expected, 
an exception is thrown, and the exception message usually says something like 
"the argument should be a string literal".





[GitHub] spark issue #15604: [SPARK-18066] [CORE] [TESTS] Add Pool usage policies tes...

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15604
  
Can one of the admins verify this patch?





[GitHub] spark issue #15604: [SPARK-18066] [CORE] [TESTS] Add Pool usage policies tes...

2016-10-23 Thread erenavsarogullari
Github user erenavsarogullari commented on the issue:

https://github.com/apache/spark/pull/15604
  
cc @kayousterhout @markhamstra 





[GitHub] spark pull request #15604: Add Pool usage policies test coverage for FIFO & ...

2016-10-23 Thread erenavsarogullari
GitHub user erenavsarogullari opened a pull request:

https://github.com/apache/spark/pull/15604

Add Pool usage policies test coverage for FIFO & FAIR Schedulers

## What changes were proposed in this pull request?
The following FIFO & FAIR Scheduler Pool usage cases need unit test coverage:

- The FIFO Scheduler just uses the **root pool**, so even if the `spark.scheduler.pool` 
property is set, the related pool is not created and `TaskSetManagers` are added to 
the **root pool**.
- The FAIR Scheduler uses the `default pool` when the `spark.scheduler.pool` property 
is not set. This can happen when
 - the `Properties` object is **null** 
 - the `Properties` object is **empty** (`new Properties()`) 
 - it points to the **default pool** (`spark.scheduler.pool=default`).
- The FAIR Scheduler creates a **new pool** with **default values** when the 
`spark.scheduler.pool` property points to a **non-existent** pool. This can happen 
when the **scheduler allocation file** is not set or does not contain the related pool.

## How was this patch tested?
New unit tests are added.
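
For reference, the behavior in the cases above hinges on the `spark.scheduler.pool` local property; a minimal sketch of how it is set (illustrative only, not taken from the patch):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("pool-usage")
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)
    // TaskSetManagers submitted from this thread go to "somePool"; if that pool
    // is not in the allocation file, FAIR creates it with default values.
    sc.setLocalProperty("spark.scheduler.pool", "somePool")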

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/erenavsarogullari/spark SPARK-18066

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15604.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15604


commit 9e588cf33ea3a99b384399a7b444ee7fb9df82d5
Author: erenavsarogullari 
Date:   2016-10-23T20:37:26Z

Add Pool usage policies test coverage for FIFO & FAIR Schedulers







[GitHub] spark issue #15513: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15513
  
Thanks @gatorsmile. I will double-check whether there are any more instances 
that miss a default value.

As for the format, I asked @rxin initially, to avoid changing things a lot 
repeatedly. Could I please ask what you think of the suggested format? I would 
like to confirm it before proceeding to sweep it - @srowen @rxin @jodersky.





[GitHub] spark issue #15441: [SPARK-4411] [Web UI] Add "kill" link for jobs in the UI

2016-10-23 Thread ajbozarth
Github user ajbozarth commented on the issue:

https://github.com/apache/spark/pull/15441
  
@srowen Addressed the try-finally; does this look good to go now?





[GitHub] spark pull request #15513: [SPARK-17963][SQL][Documentation] Add examples (e...

2016-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15513#discussion_r84604637
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
 ---
@@ -49,21 +49,29 @@ import org.apache.spark.sql.types._
  *   DEFAULT_PERCENTILE_ACCURACY.
  */
 @ExpressionDescription(
-  usage =
-"""
-  _FUNC_(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
   column `col` at the given percentage. The value of percentage must be between 0.0
   and 1.0. The `accuracy` parameter (default: 1) is a positive integer literal which
   controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
   better accuracy, `1.0/accuracy` is the relative error of the approximation.
-
-  _FUNC_(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate
-  percentile array of column `col` at the given percentage array. Each value of the
-  percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 1) is
-   a positive integer literal which controls approximation accuracy at the cost of memory.
-   Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of
-   the approximation.
-""")
+  usage = """
+_FUNC_(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
+  When percentage is an array, each value of the percentage array must be between 0.0 and 1.0.
+  In this case, returns the approximate percentile array of column `col` at the given
+  percentage array.
+  """,
+  extended = """
+Arguments:
+  col - a numeric expression.
+  percentage - a numeric literal or an array literal of numeric type that defines the
+percentile between 0.0 and 1.0. For example, 0.5 means 50-percentile.
+  accuracy - a numeric literal that defines approximation accuracy.
--- End diff --

In general, an expression takes both a value and a column, whereas a literal takes a 
value only. (It throws an exception when a column is given instead.)
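
For example, in the diff's own format (`percentile_approx` is the SQL name for this expression; the table and columns are hypothetical):

    > SELECT percentile_approx(age, 0.5) FROM people;   -- OK: percentage is a literal
    > SELECT percentile_approx(age, pct) FROM people;   -- error: percentage must be a literal, not a column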





[GitHub] spark pull request #15513: [SPARK-17963][SQL][Documentation] Add examples (e...

2016-10-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15513#discussion_r84604598
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/xpath.scala
 ---
@@ -150,8 +206,16 @@ case class XPathString(xml: Expression, path: Expression) extends XPathExtract {
 
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(xml, xpath) - Returns a string array of values within 
xml nodes that match the xpath expression",
-  extended = "> SELECT 
_FUNC_('b1b2b3c1c2','a/b/text()');\n['b1','b2','b3']")
+  usage = "_FUNC_(xml, xpath) - Returns a string array of values within 
the nodes of xml that match the XPath expression.",
+  extended = """
+Arguments:
+  xml - a string expression that represents XML document.
+  path - a string literal that represents XPath expression.
--- End diff --

In general, an expression takes both a value and a column, whereas a literal takes a 
value only. 





[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-23 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/15553
  
@srowen This explicitly supports maven.test.skip=true - the rationale 
for using it can be project specific (generating clean builds for various 
profile combinations, internal release builds from an Apache-tagged released 
version, etc.).

We are seeing close to a halving of the build time for clean builds (use of 
zinc, batched builds, etc. is already enabled).

Regarding the complexity of the build - if there are any suggestions on how to 
reduce it, please do let me know. The PR primarily reorganizes all test-scope 
dependencies into a profile of their own, which is activated by default unless 
maven.test.skip=true; this seems to be the only recommended way to do it per the 
Maven docs/discussions.
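
For illustration, the kind of invocation this targets (maven.test.skip is a standard Maven flag; the profiles are arbitrary examples):

    # clean build that skips compiling and running tests entirely
    ./build/mvn -Dmaven.test.skip=true -Pyarn -Phive clean package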





[GitHub] spark issue #13252: [SPARK-15473][SQL] CSV data source writes header for emp...

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13252
  
**[Test build #67421 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67421/consoleFull)**
 for PR 13252 at commit 
[`031c9da`](https://github.com/apache/spark/commit/031c9dacba77c6197626d02ceb0e1081b18e187b).





[GitHub] spark pull request #15602: [SPARK-18058][SQL] [BRANCH-2.0]Comparing column t...

2016-10-23 Thread CodingCat
Github user CodingCat closed the pull request at:

https://github.com/apache/spark/pull/15602





[GitHub] spark issue #15602: [SPARK-18058][SQL] [BRANCH-2.0]Comparing column types ig...

2016-10-23 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/15602
  
Merging to branch-2.0. Thanks! Can you close this PR?





[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15541
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67418/
Test PASSed.





[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15541
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15541
  
**[Test build #67418 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67418/consoleFull)**
 for PR 15541 at commit 
[`6b29002`](https://github.com/apache/spark/commit/6b29002c29fecdbe32159dd0d31f53716630de46).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.




