[GitHub] spark pull request: [SPARK-9152][SQL] Implement code generation fo...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7561#issuecomment-123352359
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9019][YARN] Add RM delegation token to ...

2015-07-21 Thread bolkedebruin
Github user bolkedebruin commented on the pull request:

https://github.com/apache/spark/pull/7489#issuecomment-123366143
  
@tgravescs I have added a full debug log to the jira issue (SPARK-9019). 
On this cluster the job enters the ACCEPTED state but never reaches RUNNING. I 
don't think I have seen another issue, but you might spot one.

@harishreedharan I have commented out the body of getDriverLogUrls in 
YarnClusterSchedulerBackend and am building now. I hope I have enough time to 
test it tonight.





[GitHub] spark pull request: [SPARK-9081][SPARK-9168][SQL] nanvl dropna/f...

2015-07-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7523





[GitHub] spark pull request: [SPARK-8695][CORE][MLlib] TreeAggregation shou...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7397#issuecomment-123373169
  
  [Test build #37956 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37956/consoleFull)
 for   PR 7397 at commit 
[`041620c`](https://github.com/apache/spark/commit/041620c93dc72010bb0907c0c5363808878d2496).





[GitHub] spark pull request: [SPARK-8915] [Documentation, MLlib] Added @sin...

2015-07-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7371





[GitHub] spark pull request: [SPARK-9152][SQL] Implement code generation fo...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7561#issuecomment-123388415
  
  [Test build #43 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/43/console)
 for   PR 7561 at commit 
[`aea58e0`](https://github.com/apache/spark/commit/aea58e0737a60a9f3dcdab49c2c8dfd66d1f8e49).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9152][SQL] Implement code generation fo...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7561#issuecomment-123388600
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9152][SQL] Implement code generation fo...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7561#issuecomment-123392783
  
  [Test build #37955 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37955/console)
 for   PR 7561 at commit 
[`aea58e0`](https://github.com/apache/spark/commit/aea58e0737a60a9f3dcdab49c2c8dfd66d1f8e49).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [WIP][SPARK-4751] Dynamic allocation in standa...

2015-07-21 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/7532#discussion_r35123818
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/master/ApplicationInfo.scala ---
@@ -96,6 +109,47 @@ private[spark] class ApplicationInfo(
 
   private[master] def coresLeft: Int = requestedCores - coresGranted
 
+  /**
+   * Return the number of executors waiting to be scheduled once space frees up.
+   *
+   * This is only defined if the application explicitly set the executor limit. For instance,
+   * if an application asks for 8 executors but there is only space for 5, then there will be
+   * 3 waiting executors.
+   */
+  private[master] def numWaitingExecutors: Int = {
+    if (executorLimit != Integer.MAX_VALUE) {
+      math.max(0, executorLimit - executors.size)
+    } else {
+      0
+    }
+  }
+
+  /**
+   * Add a worker to the blacklist, called when the executor running on the worker is killed.
+   * This is used only if cores per executor is not set.
+   */
+  private[master] def blacklistWorker(workerId: String): Unit = {
+    blacklistedWorkers += workerId
+  }
+
+  /**
+   * Remove workers from the blacklist, called when the application requests new executors.
+   * This is used only if cores per executor is not set.
+   */
+  private[master] def removeFromBlacklist(numWorkers: Int): Unit = {
+    blacklistedWorkers.take(numWorkers).foreach { workerId =>
--- End diff --

drop returns a copy
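
A minimal sketch of the point being made here (collection and values are illustrative, not from the PR): on a mutable collection, `drop` and `take` return new collections and leave the original untouched, so actually shrinking the blacklist needs an explicit removal.

```scala
import scala.collection.mutable

// Illustrative only: drop/take do not mutate the underlying set.
val blacklisted = mutable.LinkedHashSet("worker-1", "worker-2", "worker-3")

val remaining = blacklisted.drop(2)   // LinkedHashSet(worker-3), a copy
println(blacklisted.size)             // still 3 -- the original is unchanged

// To actually shrink the blacklist, remove the taken elements explicitly.
blacklisted --= blacklisted.take(2)
println(blacklisted.size)             // now 1
```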





[GitHub] spark pull request: [SPARK-9024] [WIP] Unsafe HashJoin

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7480#issuecomment-123370771
  
  [Test build #1149 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1149/consoleFull)
 for   PR 7480 at commit 
[`1a40f02`](https://github.com/apache/spark/commit/1a40f02df481263d7dc25aa5b96157e2f6a5380f).





[GitHub] spark pull request: [SPARK-9019][YARN] Add RM delegation token to ...

2015-07-21 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/7489#issuecomment-123371110
  
Hmm, perhaps it's an interaction with RM HA:

15/07/21 16:02:35 INFO client.ConfiguredRMFailoverProxyProvider: Failing 
over to rm1
15/07/21 16:02:35 INFO retry.RetryInvocationHandler: Exception while 
invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over 
rm1 after 6 fail over attempts. Trying to fail over immediately.


We don't use RM HA. Either way, I think we should just remove the 
getNodeReport call in https://issues.apache.org/jira/browse/SPARK-8988. Then 
this wouldn't be an issue.





[GitHub] spark pull request: [SPARK-5259][CORE]Make sure shuffle metadata a...

2015-07-21 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/4055#discussion_r35121095
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -739,6 +742,88 @@ class DAGSchedulerSuite
 assertDataStructuresEmpty()
   }
 
+  test("verify not submit next stage while not have registered mapStatus") {
+    val firstRDD = new MyRDD(sc, 3, Nil)
+    val firstShuffleDep = new ShuffleDependency(firstRDD, null)
+    val firstShuffleId = firstShuffleDep.shuffleId
+    val shuffleMapRdd = new MyRDD(sc, 3, List(firstShuffleDep))
+    val shuffleDep = new ShuffleDependency(shuffleMapRdd, null)
+    val reduceRdd = new MyRDD(sc, 1, List(shuffleDep))
+    submit(reduceRdd, Array(0))
+
+    // things start out smoothly, stage 0 completes with no issues
+    complete(taskSets(0), Seq(
+      (Success, makeMapStatus("hostB", shuffleMapRdd.partitions.size)),
+      (Success, makeMapStatus("hostB", shuffleMapRdd.partitions.size)),
+      (Success, makeMapStatus("hostA", shuffleMapRdd.partitions.size))
+    ))
+
+    // then one executor dies, and a task fails in stage 1
+    runEvent(ExecutorLost("exec-hostA"))
+    runEvent(CompletionEvent(taskSets(1).tasks(0),
+      FetchFailed(null, firstShuffleId, 2, 0, "Fetch failed"),
+      null, null, createFakeTaskInfo(), null))
+
+    // so we resubmit stage 0, which completes happily
+    Thread.sleep(1000)
+    val stage0Resubmit = taskSets(2)
+    assert(stage0Resubmit.stageId == 0)
+    assert(stage0Resubmit.stageAttemptId === 1)
+    val task = stage0Resubmit.tasks(0)
+    assert(task.partitionId === 2)
+    runEvent(CompletionEvent(task, Success,
+      makeMapStatus("hostC", shuffleMapRdd.partitions.size), null, createFakeTaskInfo(), null))
+
+    // now here is where things get tricky : we will now have a task set representing
+    // the second attempt for stage 1, but we *also* have some tasks for the first attempt for
+    // stage 1 still going
+    val stage1Resubmit = taskSets(3)
+    assert(stage1Resubmit.stageId == 1)
+    assert(stage1Resubmit.stageAttemptId === 1)
+    assert(stage1Resubmit.tasks.length === 3)
+
+    // we'll have some tasks finish from the first attempt, and some finish from the second attempt,
+    // so that we actually have all stage outputs, though no attempt has completed all its
+    // tasks
+    runEvent(CompletionEvent(taskSets(3).tasks(0), Success,
+      makeMapStatus("hostC", reduceRdd.partitions.size), null, createFakeTaskInfo(), null))
+    runEvent(CompletionEvent(taskSets(3).tasks(1), Success,
+      makeMapStatus("hostC", reduceRdd.partitions.size), null, createFakeTaskInfo(), null))
+    // late task finish from the first attempt
+    runEvent(CompletionEvent(taskSets(1).tasks(2), Success,
+      makeMapStatus("hostB", reduceRdd.partitions.size), null, createFakeTaskInfo(), null))
+
+    // What should happen now is that we submit stage 2.  However, we might not see an error
+    // b/c of DAGScheduler's error handling (it tends to swallow errors and just log them).  But
+    // we can check some conditions.
+    // Note that the really important thing here is not so much that we submit stage 2 *immediately*
+    // but that we don't end up with some error from these interleaved completions.  It would also
+    // be OK (though sub-optimal) if stage 2 simply waited until the resubmission of stage 1 had
+    // all its tasks complete
+
+    // check that we have all the map output for stage 0 (it should have been there even before
+    // the last round of completions from stage 1, but just to double check it hasn't been messed
+    // up)
+    (0 until 3).foreach { reduceIdx =>
+      val arr = mapOutputTracker.getServerStatuses(0, reduceIdx)
+      assert(arr != null)
+      assert(arr.nonEmpty)
--- End diff --

`getServerStatuses` has been removed in master -- I guess both of these 
should be

```scala
val statuses = mapOutputTracker.getMapSizesByExecutorId(0, reduceIdx)
assert(statuses != null)
assert(statuses.nonEmpty)
```

The new code will now throw an exception if we're missing the map output 
data, but I feel like it's probably still good to leave those asserts in.



[GitHub] spark pull request: [SPARK-5259][CORE]Make sure shuffle metadata a...

2015-07-21 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/4055#discussion_r35121106
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -739,6 +742,88 @@ class DAGSchedulerSuite
 assertDataStructuresEmpty()
   }
 
+  test("verify not submit next stage while not have registered mapStatus") {
+    val firstRDD = new MyRDD(sc, 3, Nil)
+    val firstShuffleDep = new ShuffleDependency(firstRDD, null)
+    val firstShuffleId = firstShuffleDep.shuffleId
+    val shuffleMapRdd = new MyRDD(sc, 3, List(firstShuffleDep))
+    val shuffleDep = new ShuffleDependency(shuffleMapRdd, null)
+    val reduceRdd = new MyRDD(sc, 1, List(shuffleDep))
+    submit(reduceRdd, Array(0))
+
+    // things start out smoothly, stage 0 completes with no issues
+    complete(taskSets(0), Seq(
+      (Success, makeMapStatus("hostB", shuffleMapRdd.partitions.size)),
+      (Success, makeMapStatus("hostB", shuffleMapRdd.partitions.size)),
+      (Success, makeMapStatus("hostA", shuffleMapRdd.partitions.size))
+    ))
+
+    // then one executor dies, and a task fails in stage 1
+    runEvent(ExecutorLost("exec-hostA"))
+    runEvent(CompletionEvent(taskSets(1).tasks(0),
+      FetchFailed(null, firstShuffleId, 2, 0, "Fetch failed"),
+      null, null, createFakeTaskInfo(), null))
+
+    // so we resubmit stage 0, which completes happily
+    Thread.sleep(1000)
+    val stage0Resubmit = taskSets(2)
+    assert(stage0Resubmit.stageId == 0)
+    assert(stage0Resubmit.stageAttemptId === 1)
+    val task = stage0Resubmit.tasks(0)
+    assert(task.partitionId === 2)
+    runEvent(CompletionEvent(task, Success,
+      makeMapStatus("hostC", shuffleMapRdd.partitions.size), null, createFakeTaskInfo(), null))
+
+    // now here is where things get tricky : we will now have a task set representing
+    // the second attempt for stage 1, but we *also* have some tasks for the first attempt for
+    // stage 1 still going
+    val stage1Resubmit = taskSets(3)
+    assert(stage1Resubmit.stageId == 1)
+    assert(stage1Resubmit.stageAttemptId === 1)
+    assert(stage1Resubmit.tasks.length === 3)
+
+    // we'll have some tasks finish from the first attempt, and some finish from the second attempt,
+    // so that we actually have all stage outputs, though no attempt has completed all its
+    // tasks
+    runEvent(CompletionEvent(taskSets(3).tasks(0), Success,
+      makeMapStatus("hostC", reduceRdd.partitions.size), null, createFakeTaskInfo(), null))
+    runEvent(CompletionEvent(taskSets(3).tasks(1), Success,
+      makeMapStatus("hostC", reduceRdd.partitions.size), null, createFakeTaskInfo(), null))
+    // late task finish from the first attempt
+    runEvent(CompletionEvent(taskSets(1).tasks(2), Success,
+      makeMapStatus("hostB", reduceRdd.partitions.size), null, createFakeTaskInfo(), null))
+
+    // What should happen now is that we submit stage 2.  However, we might not see an error
+    // b/c of DAGScheduler's error handling (it tends to swallow errors and just log them).  But
+    // we can check some conditions.
+    // Note that the really important thing here is not so much that we submit stage 2 *immediately*
+    // but that we don't end up with some error from these interleaved completions.  It would also
+    // be OK (though sub-optimal) if stage 2 simply waited until the resubmission of stage 1 had
+    // all its tasks complete
+
+    // check that we have all the map output for stage 0 (it should have been there even before
+    // the last round of completions from stage 1, but just to double check it hasn't been messed
+    // up)
+    (0 until 3).foreach { reduceIdx =>
+      val arr = mapOutputTracker.getServerStatuses(0, reduceIdx)
+      assert(arr != null)
+      assert(arr.nonEmpty)
+    }
+
+    // and check we have all the map output for stage 1
+    (0 until 1).foreach { reduceIdx =>
+      val arr = mapOutputTracker.getServerStatuses(1, reduceIdx)
+      assert(arr != null)
+      assert(arr.nonEmpty)
--- End diff --

same here





[GitHub] spark pull request: [SPARK-8906][SQL] Move all internal data sourc...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7565#issuecomment-123394190
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8922] [Documentation, MLlib] Add @since...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7429#issuecomment-123394101
  
  [Test build #37958 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37958/console)
 for   PR 7429 at commit 
[`59dc104`](https://github.com/apache/spark/commit/59dc104c0336bc09501b172faffd04e8b5c567d0).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9189][CORE] Takes locality and the sum ...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7536#issuecomment-123370835
  
  [Test build #37951 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37951/console)
 for   PR 7536 at commit 
[`cb72d0f`](https://github.com/apache/spark/commit/cb72d0f2ce1432ad58246fbeae60c4565fbb4ce7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-8357] Fix unsafe memory leak on empty i...

2015-07-21 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/7560#issuecomment-123378564
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-9193] Avoid assigning tasks to lost e...

2015-07-21 Thread GraceH
Github user GraceH commented on the pull request:

https://github.com/apache/spark/pull/7528#issuecomment-123378688
  
Thanks @squito.  





[GitHub] spark pull request: [SPARK-8364][SPARKR] Add crosstab to SparkR Da...

2015-07-21 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/7318#issuecomment-123386484
  
Hmm, okay. Let's leave it as `crosstab` in this PR -- before the release I'll 
try to do one more pass over the API, and we can revisit this if required. Other 
than the minor unit test comment, this looks good to me.





[GitHub] spark pull request: [SPARK-8906][SQL] Move all internal data sourc...

2015-07-21 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/7565#issuecomment-123393312
  
Jenkins, retest this please. 





[GitHub] spark pull request: [SPARK-9193] Avoid assigning tasks to lost e...

2015-07-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7528





[GitHub] spark pull request: [SPARK-9189][CORE] Takes locality and the sum ...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7536#issuecomment-123366165
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9193] Avoid assigning tasks to lost e...

2015-07-21 Thread squito
Github user squito commented on the pull request:

https://github.com/apache/spark/pull/7528#issuecomment-123369818
  
Yeah, I don't love the idea of adding things without tests, but in this case 
I suppose it's best left for the future. LGTM pending the tests passing.





[GitHub] spark pull request: [SPARK-9019][YARN] Add RM delegation token to ...

2015-07-21 Thread bolkedebruin
Github user bolkedebruin commented on the pull request:

https://github.com/apache/spark/pull/7489#issuecomment-123377041
  
True, but SPARK-8988 mentions that the Node Report API is not available in a 
secure cluster, which is not true (my patch basically enables it). So I am 
personally fine with removing it, but its supposed unavailability in a secure 
cluster should not be the reason, I would say.





[GitHub] spark pull request: [SPARK-8357] Fix unsafe memory leak on empty i...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7560#issuecomment-123380341
  
Merged build started.





[GitHub] spark pull request: [SPARK-8671] [ML]. Added isotonic regression t...

2015-07-21 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/7517#issuecomment-123380348
  
@jkbradley Isotonic regression expects a single feature instead of a 
feature vector. Do we want to make it a `Regressor` and use `featuresCol` as a 
param? One common use case of isotonic regression is to calibrate probabilities 
output by logistic regression. However, logistic regression only outputs 
probabilities as vectors (of size 2). It would be hard to connect logistic 
regression with isotonic regression. Any suggestions?
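
For illustration only (not code from this PR; column names are made up): one way to bridge the two would be to pull a single scalar out of the probability vector before handing it to isotonic regression, e.g. with a UDF.

```scala
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical glue step: extract the positive-class probability (index 1)
// from the vector column so it can serve as the single feature that
// isotonic regression expects.
def withScalarProbability(predictions: DataFrame): DataFrame = {
  val positiveProb = udf((v: Vector) => v(1))
  predictions.withColumn("positiveProbability", positiveProb(col("probability")))
}
```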





[GitHub] spark pull request: [SPARK-8357] Fix unsafe memory leak on empty i...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7560#issuecomment-123380296
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8922] [Documentation, MLlib] Add @since...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7429#issuecomment-123383741
  
Merged build started.





[GitHub] spark pull request: [SPARK-8922] [Documentation, MLlib] Add @since...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7429#issuecomment-123383681
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8922] [Documentation, MLlib] Add @since...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7429#issuecomment-123384128
  
  [Test build #37958 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37958/consoleFull)
 for   PR 7429 at commit 
[`59dc104`](https://github.com/apache/spark/commit/59dc104c0336bc09501b172faffd04e8b5c567d0).





[GitHub] spark pull request: [SPARK-9152][SQL] Implement code generation fo...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7561#issuecomment-123392901
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-8906][SQL] Move all internal data sourc...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7565#issuecomment-123394206
  
Merged build started.





[GitHub] spark pull request: [SPARK-9209] Using executor allocation, a exec...

2015-07-21 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/7559#issuecomment-123397022
  
add to whitelist





[GitHub] spark pull request: [SPARK-9152][SQL] Implement code generation fo...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7561#issuecomment-123355527
  
  [Test build #37955 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37955/consoleFull)
 for   PR 7561 at commit 
[`aea58e0`](https://github.com/apache/spark/commit/aea58e0737a60a9f3dcdab49c2c8dfd66d1f8e49).





[GitHub] spark pull request: [SPARK-9121][SparkR] Get rid of the warnings a...

2015-07-21 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/7567#discussion_r35114903
  
--- Diff: R/pkg/inst/tests/test_sparkSQL.R ---
@@ -21,10 +21,10 @@ context("SparkSQL functions")
 
 # Utility function for easily checking the values of a StructField
 checkStructField <- function(actual, expectedName, expectedType, expectedNullable) {
-  expect_equal(class(actual), "structField")
-  expect_equal(actual$name(), expectedName)
-  expect_equal(actual$dataType.toString(), expectedType)
-  expect_equal(actual$nullable(), expectedNullable)
+  testthat::expect_equal(class(actual), "structField")
--- End diff --

We don't need to do this -- instead we can include `library(testthat)` in 
our `lint-R` script, since Jenkins and developers who run the unit tests should 
have this package installed.





[GitHub] spark pull request: [SPARK-8695][CORE][MLlib] TreeAggregation shou...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7397#issuecomment-123369725
  
  [Test build #1148 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1148/console)
 for   PR 7397 at commit 
[`041620c`](https://github.com/apache/spark/commit/041620c93dc72010bb0907c0c5363808878d2496).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-8695][CORE][MLlib] TreeAggregation shou...

2015-07-21 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/7397#issuecomment-123371365
  
retest this please.





[GitHub] spark pull request: [SPARK-9193] Avoid assigning tasks to lost e...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7528#issuecomment-123381479
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9193] Avoid assigning tasks to lost e...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7528#issuecomment-123381194
  
  [Test build #37954 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37954/console)
 for   PR 7528 at commit 
[`ecc1da6`](https://github.com/apache/spark/commit/ecc1da60869554211fa053908778a2abc1656160).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-8922] [Documentation, MLlib] Add @since...

2015-07-21 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/7429#issuecomment-123382351
  
@coderxiang Could you help review this PR? Thanks!





[GitHub] spark pull request: [SPARK-8922] [Documentation, MLlib] Add @since...

2015-07-21 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/7429#issuecomment-123382296
  
ok to test





[GitHub] spark pull request: [SPARK-8915] [Documentation, MLlib] Added @sin...

2015-07-21 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/7371#issuecomment-123382153
  
Merged into master. Thanks!





[GitHub] spark pull request: [SPARK-9019][YARN] Add RM delegation token to ...

2015-07-21 Thread harishreedharan
Github user harishreedharan commented on the pull request:

https://github.com/apache/spark/pull/7489#issuecomment-123382315
  
That exception message is misleading. The catch block assumes the failure 
happened because the API is not available, but it is actually due to security 
issues.







[GitHub] spark pull request: [SPARK-8906][SQL] Move all internal data sourc...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7565#issuecomment-123394368
  
  [Test build #37959 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37959/consoleFull)
 for   PR 7565 at commit 
[`7661aff`](https://github.com/apache/spark/commit/7661aff472de1bcddc91d9bd325d8572abf69474).





[GitHub] spark pull request: [SPARK-5989] [MLlib] Model save/load for LDA

2015-07-21 Thread MechCoder
Github user MechCoder commented on the pull request:

https://github.com/apache/spark/pull/6948#issuecomment-123395868
  
Sounds good.





[GitHub] spark pull request: [SPARK-9152][SQL] Implement code generation fo...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7561#issuecomment-123352192
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9152][SQL] Implement code generation fo...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7561#issuecomment-123351206
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9152][SQL] Implement code generation fo...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7561#issuecomment-123351305
  
Merged build started.





[GitHub] spark pull request: [SPARK-9152][SQL] Implement code generation fo...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7561#issuecomment-123351651
  
  [Test build #43 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/43/consoleFull)
 for   PR 7561 at commit 
[`aea58e0`](https://github.com/apache/spark/commit/aea58e0737a60a9f3dcdab49c2c8dfd66d1f8e49).





[GitHub] spark pull request: [SPARK-9189][CORE] Takes locality and the sum ...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7536#issuecomment-123366100
  
  [Test build #42 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/42/console)
 for   PR 7536 at commit 
[`cb72d0f`](https://github.com/apache/spark/commit/cb72d0f2ce1432ad58246fbeae60c4565fbb4ce7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9081][SPARK-9168][SQL] nanvl dropna/f...

2015-07-21 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/7523#issuecomment-123369477
  
Merging this into master, thanks!





[GitHub] spark pull request: [SPARK-9189][CORE] Takes locality and the sum ...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7536#issuecomment-123370978
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-8695][CORE][MLlib] TreeAggregation shou...

2015-07-21 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/7397#issuecomment-123376608
  
@piganesh You don't need to open a new PR for updates. You can push new 
commits to the remote branch you used to create the PR. Please address 
@srowen's comment and remove the `(...)` around `numPartitions`. Thanks!





[GitHub] spark pull request: [SPARK-5259][CORE]Make sure shuffle metadata a...

2015-07-21 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/4055#discussion_r35121122
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -749,7 +834,7 @@ class DAGSchedulerSuite
*|  \   |
*|   \  |
*|\ |
-   *   reduceRdd1    reduceRdd2
+   *   reduceRdd1    reduceRddi2
--- End diff --

looks like an accidental change





[GitHub] spark pull request: [SPARK-5259][CORE]Make sure shuffle metadata a...

2015-07-21 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/4055#discussion_r35121200
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -17,6 +17,8 @@
 
 package org.apache.spark.scheduler
 
+import org.apache.spark.shuffle.MetadataFetchFailedException
+
--- End diff --

this is not used, delete





[GitHub] spark pull request: [SPARK-5259][CORE]Make sure shuffle metadata a...

2015-07-21 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/4055#discussion_r35121783
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -487,8 +487,8 @@ private[spark] class TaskSetManager(
   // a good proxy to task serialization time.
   // val timeTaken = clock.getTime() - startTime
   val taskName = s"task ${info.id} in stage ${taskSet.id}"
-  logInfo("Starting %s (TID %d, %s, %s, %d bytes)".format(
-    taskName, taskId, host, taskLocality, serializedTask.limit))
+  logInfo(s"Starting $taskName (TID $taskId, $host, ${task.partitionId}, " +
+    s"$taskLocality, ${serializedTask.limit} bytes)")
--- End diff --

I like the inclusion of the partitionId in the msg, but can you add a 
"partition" label in there, e.g.

```scala
logInfo(s"Starting $taskName (TID $taskId, $host, partition ${task.partitionId}, " +
  s"$taskLocality, ${serializedTask.limit} bytes)")
```





[GitHub] spark pull request: [SPARK-8935][SQL] Implement code generation fo...

2015-07-21 Thread yjshen
Github user yjshen commented on the pull request:

https://github.com/apache/spark/pull/7365#issuecomment-123396367
  
More comments on this?





[GitHub] spark pull request: [SPARK-9019][YARN] Add RM delegation token to ...

2015-07-21 Thread bolkedebruin
Github user bolkedebruin commented on the pull request:

https://github.com/apache/spark/pull/7489#issuecomment-123396052
  
@harishreedharan your SPARK-8988 trace looks very much like (or is exactly the same as) 
mine when testing on a CDH 5.4 cluster. With my patch the token is added and 
getDriverLogUrls works in a secure setup (you might say SPARK-9019 duplicates SPARK-8988).

So if you mean the error happens because a missing token causes a security failure, then I 
would say my patch fixes that issue. If there are other issues for some reason, then maybe 
it is indeed smart to remove the call in general, as @tgravescs suggests, and change it to 
get the information from the environment.





[GitHub] spark pull request: [WIP][SPARK-4751] Dynamic allocation in standa...

2015-07-21 Thread CodingCat
Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/7532#discussion_r35173556
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala ---
@@ -256,4 +272,33 @@ private[spark] class AppClient(
   endpoint = null
 }
   }
+
+  /**
+   * Request executors from the Master by specifying the total number 
desired,
+   * including existing pending and running executors.
+   *
+   * @return whether the request is acknowledged.
+   */
+  def requestTotalExecutors(requestedTotal: Int): Boolean = {
--- End diff --

is it necessary to validate the value of `requestedTotal`, like `>= 0`? 
though negative numbers do not affect the correctness of the 
program (if I understand the code correctly)
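
For illustration only, a minimal sketch of the kind of guard being asked about (the
validation itself is a suggestion, not something the patch contains):

```scala
object RequestValidationSketch {
  /** Sketch: validate the requested executor total before forwarding it to the Master. */
  def requestTotalExecutors(requestedTotal: Int): Boolean = {
    // Fail fast on a negative target instead of silently forwarding it.
    require(requestedTotal >= 0,
      s"requestedTotal must be >= 0, got $requestedTotal")
    true // the real AppClient would ask the Master here and return its acknowledgement
  }
}
```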





[GitHub] spark pull request: [SPARK-8484] [ML]. Added TrainValidationSplit ...

2015-07-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7337#discussion_r35173552
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.tuning
+
+import scala.reflect.ClassTag
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.ml.evaluation.Evaluator
+import org.apache.spark.ml.{Estimator, Model}
+import org.apache.spark.ml.param.{DoubleParam, ParamMap, ParamValidators}
+import org.apache.spark.ml.util.Identifiable
+import org.apache.spark.rdd.{RDD, PartitionwiseSampledRDD}
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.util.Utils
+import org.apache.spark.util.random.BernoulliCellSampler
+
+/**
+ * Params for [[TrainValidatorSplit]] and [[TrainValidatorSplitModel]].
+ */
+private[ml] trait TrainValidatorSplitParams extends ValidatorParams {
+  /**
+   * Param for ratio between train and validation data. Must be between 0 
and 1.
+   * Default: 0.75
+   * @group param
+   */
+  val trainRatio: DoubleParam = new DoubleParam(this, "trainRatio",
+    "ratio between training set and validation set (>= 0 && <= 1)",
+    ParamValidators.inRange(0, 1))
+
+  /** @group getParam */
+  def getTrainRatio: Double = $(trainRatio)
+
+  setDefault(trainRatio -> 0.75)
+}
+
+/**
+ * :: Experimental ::
+ * Validation for hyper-parameter tuning.
+ * Randomly splits the input dataset into train and validation sets,
+ * and uses evaluation metric on the validation set to select the best 
model.
+ * Similar to [[CrossValidator]], but only splits the set once.
+ */
+@Experimental
+class TrainValidatorSplit(override val uid: String) extends 
Estimator[TrainValidatorSplitModel]
+  with TrainValidatorSplitParams with Logging {
+
+  def this() = this(Identifiable.randomUID("tvs"))
+
+  /** @group setParam */
+  def setEstimator(value: Estimator[_]): this.type = set(estimator, value)
+
+  /** @group setParam */
+  def setEstimatorParamMaps(value: Array[ParamMap]): this.type = 
set(estimatorParamMaps, value)
+
+  /** @group setParam */
+  def setEvaluator(value: Evaluator): this.type = set(evaluator, value)
+
+  /** @group setParam */
+  def setTrainRatio(value: Double): this.type = set(trainRatio, value)
+
+  private[this] def sample[T: ClassTag](
+  rdd: RDD[T],
+  lb: Double,
+  ub: Double,
+  seed: Int = Utils.random.nextInt()): (RDD[T], RDD[T]) = {
--- End diff --

Should the method be a one-liner: `val (train, validation) = 
df.randomSplit([trainRatio, 1 - trainRatio], seed)`?
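
For reference, a minimal sketch of that one-liner (assuming a `DataFrame` and a 0.75 train
ratio; `randomSplit` normalizes the weights and takes an optional seed):

```scala
import org.apache.spark.sql.DataFrame

// Sketch only: split a DataFrame into train and validation sets by ratio.
def splitTrainValidation(df: DataFrame, trainRatio: Double = 0.75, seed: Long = 42L)
  : (DataFrame, DataFrame) = {
  val Array(train, validation) = df.randomSplit(Array(trainRatio, 1 - trainRatio), seed)
  (train, validation)
}
```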





[GitHub] spark pull request: [SPARK-8484] [ML]. Added TrainValidationSplit ...

2015-07-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7337#discussion_r35173547
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.tuning
+
+import scala.reflect.ClassTag
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.ml.evaluation.Evaluator
+import org.apache.spark.ml.{Estimator, Model}
+import org.apache.spark.ml.param.{DoubleParam, ParamMap, ParamValidators}
+import org.apache.spark.ml.util.Identifiable
+import org.apache.spark.rdd.{RDD, PartitionwiseSampledRDD}
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.util.Utils
+import org.apache.spark.util.random.BernoulliCellSampler
+
+/**
+ * Params for [[TrainValidatorSplit]] and [[TrainValidatorSplitModel]].
+ */
+private[ml] trait TrainValidatorSplitParams extends ValidatorParams {
--- End diff --

`TrainValidatorSplit` -> `TrainValidationSplit`





[GitHub] spark pull request: [SPARK-9180] fix spark-shell to accept --name ...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7512#issuecomment-123525589
  
  [Test build #38007 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38007/consoleFull)
 for   PR 7512 at commit 
[`e24991a`](https://github.com/apache/spark/commit/e24991a6195bf21ff765ccbc02cb8f64b14437f0).





[GitHub] spark pull request: [SPARK-7254][MLlib] Run PowerIterationClusteri...

2015-07-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/6054#discussion_r35174305
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala
 ---
@@ -152,7 +152,28 @@ class PowerIterationClustering private[clustering] (
 }
 this
   }
-
+ 
+  /**
+   * Run the PIC algorithm on Graph.
+   *
+   * @param graph an affinity matrix represented as graph, which is the 
matrix A in the PIC paper.
+   * The similarity s,,ij,, represented as the edge 
between vertices (i, j) must
--- End diff --

fix indentation
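
For context, a rough usage sketch of the run-on-Graph overload this diff documents: the edge
attributes are the nonnegative similarities s(i, j); the edge values and k below are made up
for illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.mllib.clustering.PowerIterationClustering

def clusterAffinityGraph(sc: SparkContext): Unit = {
  // Hypothetical similarities; in practice these come from the application's affinity matrix A.
  val similarities = sc.parallelize(Seq(
    Edge(0L, 1L, 1.0), Edge(1L, 2L, 0.9), Edge(2L, 3L, 0.1)))
  val affinityGraph: Graph[Double, Double] = Graph.fromEdges(similarities, defaultValue = 0.0)
  val model = new PowerIterationClustering().setK(2).run(affinityGraph)
  model.assignments.collect().foreach(a => println(s"${a.id} -> ${a.cluster}"))
}
```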





[GitHub] spark pull request: [SPARK-7254][MLlib] Run PowerIterationClusteri...

2015-07-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/6054#discussion_r35174307
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala
 ---
@@ -213,6 +234,29 @@ object PowerIterationClustering extends Logging {
   case class Assignment(id: Long, cluster: Int)
 
   /**
+   * Normalizes the affinity graph (A) and returns the normalized affinity 
matrix (W).
+   */
+  private[clustering]
+  def normalize(graph: Graph[Double, Double]): Graph[Double, Double] = {
+val vD = graph.aggregateMessages[Double](
+      sendMsg = ctx => {
+        val i = ctx.srcId
+        val j = ctx.dstId
+        val s = ctx.attr
+        if (s < 0.0) {
+          throw new SparkException(s"Similarity must be nonnegative but found s($i, $j) = $s.")
+        }
+        ctx.sendToSrc(s)
--- End diff --

Add `if s > 0.0`? 
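
In other words, a sketch of the guarded send (assuming the surrounding `aggregateMessages`
call from the diff; the helper name is only for illustration):

```scala
import org.apache.spark.SparkException
import org.apache.spark.graphx.EdgeContext

// Validate the similarity, then only emit a message for strictly positive values.
def sendSimilarity(ctx: EdgeContext[Double, Double, Double]): Unit = {
  val s = ctx.attr
  if (s < 0.0) {
    throw new SparkException(
      s"Similarity must be nonnegative but found s(${ctx.srcId}, ${ctx.dstId}) = $s.")
  }
  if (s > 0.0) { // the suggested guard: zero-weight edges contribute nothing to the degree
    ctx.sendToSrc(s)
  }
}
```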





[GitHub] spark pull request: [SPARK-8850] [SQL] [WIP] Enable Unsafe mode by...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7564#issuecomment-123534982
  
**[Test build #37990 timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37990/console)**
 for PR 7564 at commit 
[`5464206`](https://github.com/apache/spark/commit/54642067fd49794cb29882a7cdc0fb0bb16180b1)
 after a configured wait of `175m`.





[GitHub] spark pull request: [SPARK-8850] [SQL] [WIP] Enable Unsafe mode by...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7564#issuecomment-123535090
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-8484] [ML]. Added TrainValidationSplit ...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7337#issuecomment-123539941
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8484] [ML]. Added TrainValidationSplit ...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7337#issuecomment-123539981
  
Merged build started.





[GitHub] spark pull request: [SPARK-8881] Fix algorithm for scheduling exec...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7274#issuecomment-123548432
  
  [Test build #38019 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38019/consoleFull)
 for   PR 7274 at commit 
[`da0f491`](https://github.com/apache/spark/commit/da0f491a930e8b7f7a761ff666afaac5cbb13aaa).





[GitHub] spark pull request: [SPARK-8232][SQL] Add sort_array support

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7581#issuecomment-123548392
  
  [Test build #38018 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38018/consoleFull)
 for   PR 7581 at commit 
[`f7974ce`](https://github.com/apache/spark/commit/f7974ceb971e0e9d6f37d526ad7b6efe1e172ea1).





[GitHub] spark pull request: [SPARK-4366] [SQL] Aggregation Improvement

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7458#issuecomment-123558966
  
  [Test build #38023 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38023/consoleFull)
 for   PR 7458 at commit 
[`35b0520`](https://github.com/apache/spark/commit/35b05207dc329173b6c59778830fbbf59752128a).





[GitHub] spark pull request: [SPARK-4366] [SQL] Aggregation Improvement

2015-07-21 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/7458#issuecomment-123559602
  
I will merge this one once it passes jenkins to unblock other work. If you 
have any comments on this, feel free to leave them here. I will address them 
in a follow-up PR.





[GitHub] spark pull request: [SPARK-9024] Unsafe HashJoin/HashOuterJoin/Has...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7480#issuecomment-123561145
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9121][SparkR] Get rid of the warnings a...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7567#issuecomment-123561274
  
**[Test build #38009 timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38009/console)**
 for PR 7567 at commit 
[`c8cfd63`](https://github.com/apache/spark/commit/c8cfd63cdca66a9429565e9546a1d4f05a913c60)
 after a configured wait of `175m`.





[GitHub] spark pull request: [SPARK-9024] Unsafe HashJoin/HashOuterJoin/Has...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7480#issuecomment-123561237
  
  [Test build #38028 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38028/consoleFull)
 for   PR 7480 at commit 
[`6294b1e`](https://github.com/apache/spark/commit/6294b1e3de357c94646c323eba2d4bde80971c45).





[GitHub] spark pull request: [SPARK-9024] Unsafe HashJoin/HashOuterJoin/Has...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7480#issuecomment-123561157
  
Merged build started.





[GitHub] spark pull request: [SPARK-9238][SQL]two extra useless entries for...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7582#issuecomment-123561073
  
  [Test build #38026 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38026/consoleFull)
 for   PR 7582 at commit 
[`8bddd01`](https://github.com/apache/spark/commit/8bddd0143cff2f24e17a5b3ed53103f6fd59e4fb).





[GitHub] spark pull request: [SPARK-8264][SQL]add substring_index function

2015-07-21 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/7533#discussion_r35181361
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -356,6 +358,92 @@ case class StringInstr(str: Expression, substr: 
Expression)
 }
 
 /**
+ * Returns the substring from string str before count occurrences of the delimiter delim.
+ * If count is positive, everything to the left of the final delimiter (counting from the left)
+ * is returned. If count is negative, everything to the right of the final delimiter (counting
+ * from the right) is returned. substring_index performs a case-sensitive match when searching
+ * for delim.
+ */
+case class Substring_index(strExpr: Expression, delimExpr: Expression, 
countExpr: Expression)
+ extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def dataType: DataType = StringType
+  override def inputTypes: Seq[DataType] = Seq(StringType, StringType, 
IntegerType)
+  override def nullable: Boolean = strExpr.nullable || delimExpr.nullable 
|| countExpr.nullable
+  override def children: Seq[Expression] = Seq(strExpr, delimExpr, 
countExpr)
+  override def prettyName: String = "substring_index"
+  override def toString: String = s"substring_index($strExpr, $delimExpr, $countExpr)"
+
+  override def eval(input: InternalRow): Any = {
+val str = strExpr.eval(input)
+val delim = delimExpr.eval(input)
+val count = countExpr.eval(input)
+if (str == null || delim == null || count == null) {
+  null
+} else {
+  subStrIndex(
+str.asInstanceOf[UTF8String],
+delim.asInstanceOf[UTF8String],
+count.asInstanceOf[Int])
+}
+  }
+
+  private def lastOrdinalIndexOf(
+str: UTF8String, searchStr: UTF8String, ordinal: Int, lastIndex: 
Boolean = false): Int = {
+ordinalIndexOf(str, searchStr, ordinal, true)
+  }
+
+  private def ordinalIndexOf(
+  str: UTF8String, searchStr: UTF8String, ordinal: Int, lastIndex: 
Boolean = false): Int = {
+    if (str == null || searchStr == null || ordinal <= 0) {
+  return -1
+}
+val strNumChars = str.numChars()
+if (searchStr.numBytes() == 0) {
+  return if (lastIndex) {strNumChars} else {0}
+}
+var found = 0
+var index = if (lastIndex) {strNumChars} else {0}
+do {
+  if (lastIndex) {
+index = str.lastIndexOf(searchStr, index - 1)
+  } else {
+index = str.indexOf(searchStr, index + 1)
+  }
+      if (index < 0) {
+return index
+  }
+  found += 1
+    } while (found < ordinal)
+index
+  }
+
+  private def subStrIndex(strUtf8: UTF8String, delimUtf8: UTF8String, 
count: Int): UTF8String = {
+if (strUtf8 == null || delimUtf8 == null || count == null) {
+  return null
+}
+if (strUtf8.numBytes() == 0 || delimUtf8.numBytes() == 0 || count == 
0) {
+      return UTF8String.fromString("")
--- End diff --

`UTF8String.EMPTY_UTF8`
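
For readers following along, the intended semantics (per the doc comment in the diff) can be
summarized with a small standalone stand-in; `substringIndex` below is only an illustrative
helper, not the expression's actual entry point:

```scala
import java.util.regex.Pattern

// MySQL-style substring_index semantics, written out for illustration.
def substringIndex(str: String, delim: String, count: Int): String = {
  if (count == 0 || str.isEmpty || delim.isEmpty) ""
  else {
    val parts = str.split(Pattern.quote(delim), -1)
    if (count > 0) {
      // everything to the left of the count-th delimiter, counting from the left
      if (count >= parts.length) str else parts.take(count).mkString(delim)
    } else {
      // everything to the right of the |count|-th delimiter, counting from the right
      if (-count >= parts.length) str else parts.takeRight(-count).mkString(delim)
    }
  }
}

assert(substringIndex("www.apache.org", ".", 2) == "www.apache")
assert(substringIndex("www.apache.org", ".", -2) == "apache.org")
assert(substringIndex("www.apache.org", ".", 0) == "")
```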





[GitHub] spark pull request: [SPARK-4233] [SPARK-4367] [SPARK-3947] [SPARK-...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7458#issuecomment-123562244
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [WIP] [SPARK-8176] [SPARK-8197] [SQL] Udf to_d...

2015-07-21 Thread adrian-wang
GitHub user adrian-wang reopened a pull request:

https://github.com/apache/spark/pull/6988

[WIP] [SPARK-8176] [SPARK-8197] [SQL] Udf to_date/ trunc

I'll add unit tests / function registry / codegen after #6782 gets in.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adrian-wang/spark udftodatetruc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6988.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6988


commit 662e2bfdad48640016d6afc0e67eb628a71549b1
Author: Daoyuan Wang daoyuan.w...@intel.com
Date:   2015-06-24T11:57:48Z

to_date

commit 450159cd9e6d645350192cfeee6f950c32406960
Author: Daoyuan Wang daoyuan.w...@intel.com
Date:   2015-06-24T13:27:33Z

udf trunc







[GitHub] spark pull request: [SPARK-9053][SparkR] Fix spaces around parens,...

2015-07-21 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/7584#discussion_r35183469
  
--- Diff: R/pkg/inst/tests/test_sparkSQL.R ---
@@ -664,10 +664,10 @@ test_that(column binary mathfunctions, {
   expect_equal(collect(select(df, atan2(df$a, df$b)))[2, "ATAN2(a, b)"], atan2(2, 6))
   expect_equal(collect(select(df, atan2(df$a, df$b)))[3, "ATAN2(a, b)"], atan2(3, 7))
   expect_equal(collect(select(df, atan2(df$a, df$b)))[4, "ATAN2(a, b)"], atan2(4, 8))
-  expect_equal(collect(select(df, hypot(df$a, df$b)))[1, "HYPOT(a, b)"], sqrt(1^2 + 5^2))
-  expect_equal(collect(select(df, hypot(df$a, df$b)))[2, "HYPOT(a, b)"], sqrt(2^2 + 6^2))
-  expect_equal(collect(select(df, hypot(df$a, df$b)))[3, "HYPOT(a, b)"], sqrt(3^2 + 7^2))
-  expect_equal(collect(select(df, hypot(df$a, df$b)))[4, "HYPOT(a, b)"], sqrt(4^2 + 8^2))
+  expect_equal(collect(select(df, hypot(df$a, df$b)))[1, "HYPOT(a, b)"], sqrt(1 ^ 2 + 5 ^ 2))
--- End diff --

I'm not sure we should change these. It's more readable to have `1^2` rather 
than `1 ^ 2`. Could we add a style ignore around these 4 lines alone?





[GitHub] spark pull request: [SPARK-8231][SQL] Add array_contains

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7580#issuecomment-123526078
  
  [Test build #38008 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38008/consoleFull)
 for   PR 7580 at commit 
[`6e01e53`](https://github.com/apache/spark/commit/6e01e53eccc41b4216f7ab2f0f8e7f879aaf689c).





[GitHub] spark pull request: [SPARK-6485] [MLlib] [Python] Add CoordinateMa...

2015-07-21 Thread dusenberrymw
Github user dusenberrymw commented on the pull request:

https://github.com/apache/spark/pull/7554#issuecomment-123526084
  
@mengxr Thanks for the thoughts!  I'll trim this PR down to just the Python 
wrappers, and then open another JIRA up for further discussion on adding a 
DistributedMatrices class to Scala.





[GitHub] spark pull request: [SPARK-8231][SQL] Add array_contains

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7580#issuecomment-123526251
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9154][SQL] Rename formatString to forma...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7579#issuecomment-123526235
  
  [Test build #37998 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37998/console)
 for   PR 7579 at commit 
[`53ee54f`](https://github.com/apache/spark/commit/53ee54f570660caa5cfbbad1e4cd42e6f0e2adf7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [WIP][SPARK-4751] Dynamic allocation in standa...

2015-07-21 Thread CodingCat
Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/7532#discussion_r35173745
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1387,8 +1374,6 @@ class SparkContext(config: SparkConf) extends Logging 
with ExecutorAllocationCli
* This is currently only supported in YARN mode. Return whether the 
request is received.
--- End diff --

outdated comments?





[GitHub] spark pull request: [SPARK-8269][SQL]string function: initcap

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7208#issuecomment-123527415
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9232] [SQL] Duplicate code in JSONRelat...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7576#issuecomment-123527437
  
  [Test build #1151 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1151/consoleFull)
 for   PR 7576 at commit 
[`ea80803`](https://github.com/apache/spark/commit/ea808034de4a5e358535cc82e58501e85f4d9d9d).





[GitHub] spark pull request: [SPARK-8269][SQL]string function: initcap

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7208#issuecomment-123527425
  
Merged build started.





[GitHub] spark pull request: [SPARK-9121][SparkR] Get rid of the warnings a...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7567#issuecomment-123532059
  
  [Test build #37997 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37997/console)
 for   PR 7567 at commit 
[`1a03987`](https://github.com/apache/spark/commit/1a0398735a113869d75bbce2d864d109ff7f0920).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-8935][SQL] Implement code generation fo...

2015-07-21 Thread yjshen
Github user yjshen commented on a diff in the pull request:

https://github.com/apache/spark/pull/7365#discussion_r35175062
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -418,51 +418,518 @@ case class Cast(child: Expression, dataType: 
DataType)
   protected override def nullSafeEval(input: Any): Any = cast(input)
 
   override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): 
String = {
-// TODO: Add support for more data types.
-(child.dataType, dataType) match {
+val nullSafeCast = nullSafeCastFunction(child.dataType, dataType, ctx)
+if (nullSafeCast != null) {
+  val eval = child.gen(ctx)
+  eval.code +
+castCode(ctx, eval.primitive, eval.isNull, ev.primitive, 
ev.isNull, dataType, nullSafeCast)
+} else {
+  super.genCode(ctx, ev)
+}
+  }
+
+  // three function arguments are: child.primitive, result.primitive and result.isNull
+  // it returns the code snippets to be put in null safe evaluation region
+  private[this] type CastFunction = (String, String, String) => String
+
+  private[this] def nullSafeCastFunction(
+      from: DataType,
+      to: DataType,
+      ctx: CodeGenContext): CastFunction = to match {
+
+    case _ if from == NullType => (c, evPrim, evNull) => s"$evNull = true;"
+    case _ if to == from => (c, evPrim, evNull) => s"$evPrim = $c;"
+    case StringType => castToStringCode(from, ctx)
+    case BinaryType => castToBinaryCode(from)
+    case DateType => castToDateCode(from, ctx)
+    case decimal: DecimalType => castToDecimalCode(from, decimal)
+    case TimestampType => castToTimestampCode(from, ctx)
+    case IntervalType => castToIntervalCode(from)
+    case BooleanType => castToBooleanCode(from)
+    case ByteType => castToByteCode(from)
+    case ShortType => castToShortCode(from)
+    case IntegerType => castToIntCode(from)
+    case FloatType => castToFloatCode(from)
+    case LongType => castToLongCode(from)
+    case DoubleType => castToDoubleCode(from)
+
+    case array: ArrayType => castArrayCode(from.asInstanceOf[ArrayType], array, ctx)
+    case map: MapType => castMapCode(from.asInstanceOf[MapType], map, ctx)
+    case struct: StructType => castStructCode(from.asInstanceOf[StructType], struct, ctx)
+    case other => null
+  }
+
+  private[this] def castCode(ctx: CodeGenContext, childPrim: String, childNull: String,
+      resultPrim: String, resultNull: String, resultType: DataType, cast: CastFunction): String = {
+    s"""
+      boolean $resultNull = $childNull;
+      ${ctx.javaType(resultType)} $resultPrim = ${ctx.defaultValue(resultType)};
+      if (!${childNull}) {
+        ${cast(childPrim, resultPrim, resultNull)}
+      }
+    """
+  }
+
+  private[this] def castToStringCode(from: DataType, ctx: CodeGenContext): CastFunction = {
+    from match {
+      case BinaryType =>
+        (c, evPrim, evNull) => s"$evPrim = UTF8String.fromBytes($c);"
+      case DateType =>
+        (c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString(
+          org.apache.spark.sql.catalyst.util.DateTimeUtils.dateToString($c));"""
+      case TimestampType =>
+        (c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString(
+          org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c));"""
+      case _ =>
+        (c, evPrim, evNull) => s"$evPrim = UTF8String.fromString(String.valueOf($c));"
+    }
+  }
+
+  private[this] def castToBinaryCode(from: DataType): CastFunction = from match {
+    case StringType =>
+      (c, evPrim, evNull) => s"$evPrim = $c.getBytes();"
+  }
+
+  private[this] def castToDateCode(
+      from: DataType,
+      ctx: CodeGenContext): CastFunction = from match {
+    case StringType =>
+      val intOpt = ctx.freshName("intOpt")
+      (c, evPrim, evNull) => s"""
+        scala.Option<Integer> $intOpt =
+          org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToDate($c);
+        if ($intOpt.isDefined()) {
+          $evPrim = ((Integer) $intOpt.get()).intValue();
+        } else {
+          $evNull = true;
+        }
+       """
+    case TimestampType =>
+      (c, evPrim, evNull) =>
+        s"$evPrim = org.apache.spark.sql.catalyst.util.DateTimeUtils.millisToDays($c / 1000L);"
+    case _ =>
+      (c, evPrim, evNull) => s"$evNull = true;"
+  }
+
+  private[this] def changePrecision(d: String, decimalType: DecimalType,
+      evPrim: String, evNull: String): String = {
+    decimalType match {
+      case DecimalType.Unlimited =>
+        s"$evPrim = $d;"
+      case DecimalType.Fixed(precision, scale) =>
+        s"""
+          if 
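
A toy illustration of the CastFunction pattern this diff introduces (not the PR's code): each
cast is a closure from (childVariable, resultVariable, resultIsNull) to a Java snippet, and the
caller splices that snippet into a null-safe template. Names and types below are assumed for
the example.

```scala
// Sketch only: how a code-generating cast closure composes with a null-safe template.
type CastFunction = (String, String, String) => String

val castToStringSketch: CastFunction =
  (c, evPrim, evNull) => s"$evPrim = UTF8String.fromString(String.valueOf($c));"

def nullSafeTemplate(childPrim: String, childNull: String,
    resultPrim: String, resultNull: String, cast: CastFunction): String =
  s"""
     |boolean $resultNull = $childNull;
     |UTF8String $resultPrim = null;
     |if (!$childNull) {
     |  ${cast(childPrim, resultPrim, resultNull)}
     |}
   """.stripMargin

println(nullSafeTemplate("value1", "isNull1", "value2", "isNull2", castToStringSketch))
```
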

[GitHub] spark pull request: [SPARK-8231][SQL] Add array_contains

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7580#issuecomment-123531919
  
  [Test build #38011 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38011/console)
 for   PR 7580 at commit 
[`fe88d63`](https://github.com/apache/spark/commit/fe88d631cd9c27b7a2dd0871978536a5ba4f3d03).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ArrayContains(left: Expression, right: Expression) extends 
BinaryExpression `






[GitHub] spark pull request: [SPARK-8231][SQL] Add array_contains

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7580#issuecomment-123531926
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9121][SparkR] Get rid of the warnings a...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7567#issuecomment-123532304
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-8881] Fix algorithm for scheduling exec...

2015-07-21 Thread nishkamravi2
Github user nishkamravi2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/7274#discussion_r35176367
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala 
---
@@ -543,59 +544,108 @@ private[master] class Master(
* multiple executors from the same application may be launched on the 
same worker if the worker
* has enough cores and memory. Otherwise, each executor grabs all the 
cores available on the
* worker by default, in which case only one executor may be launched on 
each worker.
+   *
+   * It is important to allocate coresPerExecutor on each worker at a time 
(instead of 1 core
+   * at a time). Consider the following example: cluster has 4 workers 
with 16 cores each.
+   * User requests 3 executors (spark.cores.max = 48, spark.executor.cores 
= 16). If 1 core is
+   * allocated at a time, 12 cores from each worker would be assigned to 
each executor.
+   * Since 12 < 16, no executors would launch [SPARK-8881].
*/
-  private def startExecutorsOnWorkers(): Unit = {
-// Right now this is a very simple FIFO scheduler. We keep trying to 
fit in the first app
-// in the queue, then the second app, etc.
+  private[master] def scheduleExecutorsOnWorkers(
+  app: ApplicationInfo,
+  usableWorkers: Array[WorkerInfo],
+  spreadOutApps: Boolean): Array[Int] = {
+// If the number of cores per executor is not specified, then we can 
just schedule
+// 1 core at a time since we expect a single executor to be launched 
on each worker
+val coresPerExecutor = app.desc.coresPerExecutor.getOrElse(1)
+val memoryPerExecutor = app.desc.memoryPerExecutorMB
+val numUsable = usableWorkers.length
+val assignedCores = new Array[Int](numUsable) // Number of cores to 
give to each worker
+val assignedMemory = new Array[Int](numUsable) // Amount of memory to 
give to each worker
+var coresToAssign = math.min(app.coresLeft, 
usableWorkers.map(_.coresFree).sum)
+var pos = 0
+var lastCoresToAssign = coresToAssign
 if (spreadOutApps) {
-      // Try to spread out each app among all the workers, until it has all its cores
-      for (app <- waitingApps if app.coresLeft > 0) {
-        val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
-          .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
-            worker.coresFree >= app.desc.coresPerExecutor.getOrElse(1))
-          .sortBy(_.coresFree).reverse
-        val numUsable = usableWorkers.length
-        val assigned = new Array[Int](numUsable) // Number of cores to give on each node
-        var toAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)
-        var pos = 0
-        while (toAssign > 0) {
-          if (usableWorkers(pos).coresFree - assigned(pos) > 0) {
-            toAssign -= 1
-            assigned(pos) += 1
-          }
-          pos = (pos + 1) % numUsable
+      // Try to spread out executors among workers (sparse scheduling)
+      while (coresToAssign > 0) {
+        if (usableWorkers(pos).coresFree - assignedCores(pos) >= coresPerExecutor &&
+            usableWorkers(pos).memoryFree - assignedMemory(pos) >= memoryPerExecutor) {
+          coresToAssign -= coresPerExecutor
+          assignedCores(pos) += coresPerExecutor
+          assignedMemory(pos) += memoryPerExecutor
         }
-        // Now that we've decided how many cores to give on each node, let's actually give them
-        for (pos <- 0 until numUsable if assigned(pos) > 0) {
-          allocateWorkerResourceToExecutors(app, assigned(pos), usableWorkers(pos))
+        pos = (pos + 1) % numUsable
+        if (pos == 0) {
+          if (lastCoresToAssign == coresToAssign) {
+            return assignedCores
+          }
+          lastCoresToAssign = coresToAssign
         }
       }
     } else {
-      // Pack each app into as few workers as possible until we've assigned all its cores
-      for (worker <- workers if worker.coresFree > 0 && worker.state == WorkerState.ALIVE) {
-        for (app <- waitingApps if app.coresLeft > 0) {
-          allocateWorkerResourceToExecutors(app, app.coresLeft, worker)
+      // Pack executors into as few workers as possible (dense scheduling)
+      while (coresToAssign > 0) {
+        while (usableWorkers(pos).coresFree - assignedCores(pos) >= coresPerExecutor &&
+               usableWorkers(pos).memoryFree - assignedMemory(pos) >= memoryPerExecutor &&
+               coresToAssign > 0) {
+          coresToAssign -= coresPerExecutor
+          assignedCores(pos) += coresPerExecutor
+          assignedMemory(pos) += memoryPerExecutor
+        }
+        pos = (pos 
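
A small standalone simulation of the example from the comment in this diff (4 workers with 16
cores each, spark.cores.max = 48, spark.executor.cores = 16): with 1-core granularity every
worker tops out at 12 assigned cores, so no 16-core executor can ever be formed, while
executor-sized granularity fills three workers completely. The helper below is illustrative
only, not the Master's code.

```scala
// Round-robin assignment over workers in steps of `step` cores (sketch).
def spreadOut(workerCores: Seq[Int], coresMax: Int, step: Int): Seq[Int] = {
  val assigned = Array.fill(workerCores.length)(0)
  var toAssign = math.min(coresMax, workerCores.sum)
  var progress = true
  while (toAssign > 0 && progress) {
    progress = false
    for (i <- workerCores.indices if toAssign > 0) {
      if (workerCores(i) - assigned(i) >= step) {
        assigned(i) += step
        toAssign -= step
        progress = true
      }
    }
  }
  assigned.toSeq
}

println(spreadOut(Seq(16, 16, 16, 16), coresMax = 48, step = 1))   // List(12, 12, 12, 12)
println(spreadOut(Seq(16, 16, 16, 16), coresMax = 48, step = 16))  // List(16, 16, 16, 0)
```
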

[GitHub] spark pull request: [SPARK-9216][Streaming] Define KinesisBackedBl...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7578#issuecomment-123544537
  
  [Test build #38016 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38016/consoleFull)
 for   PR 7578 at commit 
[`575bdbc`](https://github.com/apache/spark/commit/575bdbcc5ccf766ecaf324623e5d1204f7634224).





[GitHub] spark pull request: [SPARK-9216][Streaming] Define KinesisBackedBl...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7578#issuecomment-123552016
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9216][Streaming] Define KinesisBackedBl...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7578#issuecomment-123552029
  
Merged build started.





[GitHub] spark pull request: [SPARK-9222] [MLlib] Make class instantiation ...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7573#issuecomment-123560226
  
Merged build started.





[GitHub] spark pull request: [SQL][minor] remove literal in agg group expre...

2015-07-21 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/7583

[SQL][minor] remove literal in agg group expressions during analysis

a follow-up of https://github.com/apache/spark/pull/4169
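
For illustration, a hedged sketch of the idea (not the actual patch): a constant in the GROUP BY
list never changes how rows are grouped, so foldable literals can simply be dropped while the
plan is analyzed. For example, `SELECT a, sum(b) FROM t GROUP BY a, 1` groups exactly like
`... GROUP BY a`.

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}

// Sketch: filter literals out of a list of grouping expressions.
def pruneGroupingLiterals(groupingExprs: Seq[Expression]): Seq[Expression] =
  groupingExprs.filter {
    case _: Literal => false // constants carry no grouping information
    case _          => true
  }
```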

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7583.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7583


commit a93293edc6b89b1be48686063bd12a25e72841d1
Author: Wenchen Fan cloud0...@outlook.com
Date:   2015-07-22T04:29:32Z

remove literal in agg group expressions during analysis






