[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...

2014-05-12 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/715#issuecomment-42799372
  
Looks good to me too!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...

2014-05-12 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/715#discussion_r12515148
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -259,19 +238,30 @@ private[spark] class Executor(
 }
 
 case t: Throwable => {
-  val serviceTime = System.currentTimeMillis() - taskStart
-  val metrics = attemptedTask.flatMap(t => t.metrics)
-  for (m <- metrics) {
-    m.executorRunTime = serviceTime
-    m.jvmGCTime = gcTime - startGCTime
-  }
-  val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
-  execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
+  // Attempt to exit cleanly by informing the driver of our failure.
+  // If anything goes wrong (or this was a fatal exception), we will delegate to
+  // the default uncaught exception handler, which will terminate the Executor.
+  try {
+    logError("Exception in task ID " + taskId, t)
+
+    val serviceTime = System.currentTimeMillis() - taskStart
+    val metrics = attemptedTask.flatMap(t => t.metrics)
+    for (m <- metrics) {
+      m.executorRunTime = serviceTime
+      m.jvmGCTime = gcTime - startGCTime
+    }
+    val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
+    execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))

-  // TODO: Should we exit the whole executor here? On the one hand, the failed task may
-  // have left some weird state around depending on when the exception was thrown, but on
-  // the other hand, maybe we could detect that when future tasks fail and exit then.
-  logError("Exception in task ID " + taskId, t)
+    // Don't forcibly exit unless the exception was inherently fatal, to avoid
+    // stopping other tasks unnecessarily.
+    if (Utils.isFatalError(t)) {
+      ExecutorUncaughtExceptionHandler.uncaughtException(t)
+    }
+  } catch {
+    case t2: Throwable =>
+      ExecutorUncaughtExceptionHandler.uncaughtException(t2)
--- End diff --

Hmm, good point. I kind of like being explicit over relying on the globally 
set uncaught exception handler. I could be happy with getting rid of this and 
replacing it with a comment, though.
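For context, the fatal/non-fatal distinction the patch relies on can be sketched with Scala's standard `NonFatal` extractor. The `isFatalError` helper below is a hypothetical stand-in for Spark's `Utils.isFatalError`, not its actual source:

```scala
import scala.util.control.NonFatal

object FatalErrorDemo {
  // A stand-in for Utils.isFatalError: treat anything NonFatal refuses to
  // match (VirtualMachineError, ThreadDeath, InterruptedException,
  // LinkageError, ControlThrowable) as fatal, so it reaches the uncaught
  // exception handler instead of being swallowed by the task-failure path.
  def isFatalError(t: Throwable): Boolean = t match {
    case NonFatal(_) => false
    case _           => true
  }
}
```

With this predicate, an ordinary task failure such as a `RuntimeException` is reported to the driver, while something like an `OutOfMemoryError` still terminates the executor.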




[GitHub] spark pull request: Fixed streaming examples docs to use run-examp...

2014-05-12 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/722#discussion_r12514182
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala ---
@@ -35,8 +35,8 @@ import org.apache.spark.SparkConf
  *   numThreads is the number of threads the kafka consumer should use
  *
  * Example:
- *    `./bin/spark-submit examples.jar \
- *    --class org.apache.spark.examples.streaming.KafkaWordCount local[2] zoo01,zoo02,zoo03 \
+ *    `bin/run-example \
+ *    org.apache.spark.examples.streaming.KafkaWordCount local[2] zoo01,zoo02,zoo03 \
--- End diff --

This is outdated; KafkaWordCount no longer takes in `master`.




[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...

2014-05-12 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/715#discussion_r12515152
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -259,19 +238,30 @@ private[spark] class Executor(
 }
 
 case t: Throwable => {
-  val serviceTime = System.currentTimeMillis() - taskStart
-  val metrics = attemptedTask.flatMap(t => t.metrics)
-  for (m <- metrics) {
-    m.executorRunTime = serviceTime
-    m.jvmGCTime = gcTime - startGCTime
-  }
-  val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
-  execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
+  // Attempt to exit cleanly by informing the driver of our failure.
+  // If anything goes wrong (or this was a fatal exception), we will delegate to
+  // the default uncaught exception handler, which will terminate the Executor.
+  try {
+    logError("Exception in task ID " + taskId, t)
+
+    val serviceTime = System.currentTimeMillis() - taskStart
+    val metrics = attemptedTask.flatMap(t => t.metrics)
+    for (m <- metrics) {
+      m.executorRunTime = serviceTime
+      m.jvmGCTime = gcTime - startGCTime
+    }
+    val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
+    execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))

-  // TODO: Should we exit the whole executor here? On the one hand, the failed task may
-  // have left some weird state around depending on when the exception was thrown, but on
-  // the other hand, maybe we could detect that when future tasks fail and exit then.
-  logError("Exception in task ID " + taskId, t)
+    // Don't forcibly exit unless the exception was inherently fatal, to avoid
+    // stopping other tasks unnecessarily.
+    if (Utils.isFatalError(t)) {
+      ExecutorUncaughtExceptionHandler.uncaughtException(t)
+    }
+  } catch {
+    case t2: Throwable =>
+      ExecutorUncaughtExceptionHandler.uncaughtException(t2)
--- End diff --

Actually just realized we basically already have that comment, just 
interpreted in a different way :)




[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-12 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/736

use Iterator#size in RDD#count

In RDD#count we used a while loop to get the size of the Iterator, because Iterator#size was implemented with a for loop, which was slightly slower in that version of Scala. The current version of Scala translates the for loop in Iterator#size into `foreach`, which uses a while loop to iterate the Iterator, so we can now use Iterator#size directly.
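The equivalence can be illustrated outside Spark; `manualCount` below is a sketch of the old hand-rolled loop, and `Iterator#size` gives the same answer:

```scala
object IteratorCountDemo {
  // The old hand-rolled count, as RDD#count used to do it: a while loop
  // over the iterator, avoiding the for-loop-based Iterator#size of
  // older Scala versions.
  def manualCount[T](iter: Iterator[T]): Long = {
    var result = 0L
    while (iter.hasNext) {
      result += 1L
      iter.next()
    }
    result
  }

  def main(args: Array[String]): Unit = {
    val data = Seq("a", "b", "c", "d")
    // Iterator#size now compiles down to a while-based foreach, so it
    // matches the manual loop in result and (roughly) in speed.
    assert(manualCount(data.iterator) == data.iterator.size.toLong)
  }
}
```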

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/736.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #736


commit 1ebef72d8ea60a65645ad4c73ed03c9c41aa2c85
Author: Wenchen Fan(Cloud) cloud0...@gmail.com
Date:   2014-05-12T07:32:59Z

use Iterator#size in RDD#count






[GitHub] spark pull request: fix broken in link in python docs

2014-05-12 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/650#issuecomment-42752309
  
Thanks Andy - I've merged this.




[GitHub] spark pull request: [Spark-1461] Deferred Expression Evaluation (s...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/446#issuecomment-42805496
  
 Merged build triggered. 




[GitHub] spark pull request: [Spark-1461] Deferred Expression Evaluation (s...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/446#issuecomment-42805513
  
Merged build started. 




[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/736#issuecomment-42804113
  
Can one of the admins verify this patch?




[GitHub] spark pull request: Improve build configuration

2014-05-12 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/590#issuecomment-42809856
  
@srowen 
It has been removed.




[GitHub] spark pull request: Improve build configuration

2014-05-12 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/590#issuecomment-42809563
  
@srowen 
In some cases `commons-lang` is pulled in at multiple versions by transitive dependencies.
`fairscheduler.xml` and `hive-site.xml` should be ignored.




[GitHub] spark pull request: Improve build configuration

2014-05-12 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/590#issuecomment-42808115
  
This still has some changes that I don't know are intended. commons-lang 
2.5 should not be a dependency now. I don't know that conf XML files should be 
ignored by git?




[GitHub] spark pull request: [Spark-1461] Deferred Expression Evaluation (s...

2014-05-12 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/446#issuecomment-42805599
  
Rebased to the latest master; can you test it?




[GitHub] spark pull request: Improve build configuration

2014-05-12 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/590#issuecomment-42809838
  
Where does Spark use commons-lang though? It uses commons-lang3. You would 
declare it as a dependency if it were used, or to resolve a version conflict, 
but is there evidence of the latter?




[GitHub] spark pull request: Improve build configuration

2014-05-12 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/590#issuecomment-42810565
  
Yea, are they colliding in the assembly jar? Or does Maven resolve to 2.5? 
The latter should be fine. If they're colliding, then I agree that we may have 
to manually manage it for tidiness, and state why in a comment.




[GitHub] spark pull request: [SPARK-1688] Propagate PySpark worker stderr t...

2014-05-12 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/603#discussion_r12390525
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -161,46 +131,38 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
 workerEnv.put("PYTHONPATH", pythonPath)
 daemon = pb.start()

-// Redirect the stderr to ours
-new Thread("stderr reader for " + pythonExec) {
-  setDaemon(true)
-  override def run() {
-    scala.util.control.Exception.ignoring(classOf[IOException]) {
-      // FIXME: We copy the stream on the level of bytes to avoid encoding problems.
-      val in = daemon.getErrorStream
-      val buf = new Array[Byte](1024)
-      var len = in.read(buf)
-      while (len != -1) {
-        System.err.write(buf, 0, len)
-        len = in.read(buf)
-      }
-    }
-  }
-}.start()
-
 val in = new DataInputStream(daemon.getInputStream)
 daemonPort = in.readInt()

-// Redirect further stdout output to our stderr
-new Thread("stdout reader for " + pythonExec) {
-  setDaemon(true)
-  override def run() {
-    scala.util.control.Exception.ignoring(classOf[IOException]) {
-      // FIXME: We copy the stream on the level of bytes to avoid encoding problems.
-      val buf = new Array[Byte](1024)
-      var len = in.read(buf)
-      while (len != -1) {
-        System.err.write(buf, 0, len)
-        len = in.read(buf)
-      }
-    }
-  }
-}.start()
+// Redirect worker stdout and stderr
+redirectWorkerStreams(in, daemon.getErrorStream)
+
   } catch {
-case e: Throwable => {
+case e: Throwable =>
+
+  // If the daemon exists, wait for it to finish and get its stderr
+  val stderr = Option(daemon)
+    .flatMap { d => Utils.getStderr(d, PROCESS_WAIT_TIMEOUT_MS) }
+    .getOrElse("")
+
   stopDaemon()
-  throw e
-}
+
+  if (stderr != "") {
+    val formattedStderr = stderr.replace("\n", "\n  ")
+    val errorMessage = s"""
+      |Error from python worker:
+      |  $formattedStderr
+      |PYTHONPATH was:
+      |  $pythonPath
+      |$e"""
+
+    // Append error message from python daemon, but keep original stack trace
--- End diff --

We're not hiding the exception; all we're doing is tacking a message on top 
of it. Not exactly sure what you mean?




[GitHub] spark pull request: [SPARK-1745] Move interrupted flag from TaskCo...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/675#issuecomment-42462673
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14776/




[GitHub] spark pull request: [Docs] Update YARN docs

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/701#issuecomment-42613881
  
Merged build started. 




[GitHub] spark pull request: [Spark-1461] Deferred Expression Evaluation (s...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/446#issuecomment-42811549
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1544 Add support for deep decision trees...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/475#issuecomment-42459534
  
 Build triggered. 




[GitHub] spark pull request: [Spark-1461] Deferred Expression Evaluation (s...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/446#issuecomment-42811551
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14896/




[GitHub] spark pull request: Use numpy directly for matrix multiply.

2014-05-12 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/687#issuecomment-42512204
  
LGTM. Thanks!




[GitHub] spark pull request: SPARK-1668: Add implicit preference as an opti...

2014-05-12 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/597#discussion_r12388603
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MovieLensALS.scala ---
@@ -88,7 +92,27 @@ object MovieLensALS {
 
     val ratings = sc.textFile(params.input).map { line =>
       val fields = line.split("::")
-      Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble)
+      if (params.implicitPrefs) {
+        /**
--- End diff --

This is not JavaDoc, so please remove the last `*`.




[GitHub] spark pull request: [SPARK-1754] [SQL] Add missing arithmetic DSL ...

2014-05-12 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/689#discussion_r12416177
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala ---
@@ -381,6 +381,30 @@ class ExpressionEvaluationSuite extends FunSuite {
     checkEvaluation(Add(c1, Literal(null, IntegerType)), null, row)
     checkEvaluation(Add(Literal(null, IntegerType), c2), null, row)
     checkEvaluation(Add(Literal(null, IntegerType), Literal(null, IntegerType)), null, row)
+
+    checkEvaluation(-c1, -1, row)
+    checkEvaluation(c1 + c2, 3, row)
+    checkEvaluation(c1 - c2, -1, row)
+    checkEvaluation(c1 * c2, 2, row)
+    checkEvaluation(c1 / c2, 0, row)
+    checkEvaluation(c1 % c2, 1, row)
+  }
+
+  test("BinaryPredicate") {
--- End diff --

Ah, I see, you are right.
I'll remove the test.




[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-05-12 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/686#discussion_r12408196
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1055,10 +1055,16 @@ class DAGScheduler(
       // This is the only job that uses this stage, so fail the stage if it is running.
       val stage = stageIdToStage(stageId)
       if (runningStages.contains(stage)) {
-        taskScheduler.cancelTasks(stageId, shouldInterruptThread)
-        val stageInfo = stageToInfos(stage)
-        stageInfo.stageFailed(failureReason)
-        listenerBus.post(SparkListenerStageCompleted(stageToInfos(stage)))
+        try { // cancelTasks will fail if a SchedulerBackend does not implement killTask
+          taskScheduler.cancelTasks(stageId, shouldInterruptThread)
+        } catch {
+          case e: UnsupportedOperationException =>
+            logInfo(s"Could not cancel tasks for stage $stageId", e)
+        } finally {
+          val stageInfo = stageToInfos(stage)
+          stageInfo.stageFailed(failureReason)

Why do this part even when the SchedulerBackend doesn't support 
cancellation?




[GitHub] spark pull request: Improve build configuration

2014-05-12 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/590#issuecomment-42811589
  
@srowen 
I will submit a new Pull Request to solve this problem.




[GitHub] spark pull request: Improve build configuration

2014-05-12 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/590#issuecomment-42809975
  
```
[INFO] |  +- org.apache.hadoop:hadoop-client:jar:1.0.4:compile
[INFO] |  |  \- org.apache.hadoop:hadoop-core:jar:1.0.4:compile
[INFO] |  | +- xmlenc:xmlenc:jar:0.52:compile
[INFO] |  | +- org.apache.commons:commons-math:jar:2.1:compile
[INFO] |  | +- 
commons-configuration:commons-configuration:jar:1.6:compile
[INFO] |  | |  +- 
commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |  | |  +- commons-lang:commons-lang:jar:2.4:compile
[INFO] |  | |  +- commons-digester:commons-digester:jar:1.8:compile
[INFO] |  | |  |  \- 
commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] |  | |  \- 
commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] |  | +- commons-el:commons-el:jar:1.0:compile
[INFO] |  | +- hsqldb:hsqldb:jar:1.8.0.10:compile
[INFO] |  | \- oro:oro:jar:2.0.8:compile
```




[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...

2014-05-12 Thread larvaboy
GitHub user larvaboy opened a pull request:

https://github.com/apache/spark/pull/737

Implement ApproximateCountDistinct for SparkSql

Add the implementation for ApproximateCountDistinct to SparkSql. We use the 
HyperLogLog algorithm implemented in stream-lib, and do the count in two 
phases: 1) count the number of distinct elements in each partition, and 2) 
merge the HyperLogLog results from the different partitions.

A simple serializer and test cases are added as well.
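The two-phase structure described above can be sketched without Spark or stream-lib; exact `HashSet`s stand in here for HyperLogLog sketches (a real HyperLogLog supports the same per-partition update and merge operations, trading exactness for constant memory):

```scala
import scala.collection.mutable

object TwoPhaseDistinct {
  // Phase 1: build one sketch per partition. stream-lib's HyperLogLog
  // would be updated element by element here; an exact set keeps this
  // example self-contained.
  def partitionSketch(partition: Seq[Int]): mutable.HashSet[Int] =
    mutable.HashSet(partition: _*)

  // Phase 2: merge the per-partition sketches into one result.
  // HyperLogLog merge is a register-wise max; set union plays the
  // same role for exact sets.
  def countDistinct(partitions: Seq[Seq[Int]]): Int =
    partitions.map(partitionSketch).reduce(_ ++= _).size
}
```

For example, `TwoPhaseDistinct.countDistinct(Seq(Seq(1, 2, 2), Seq(2, 3)))` counts element 2 only once across partitions.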

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/larvaboy/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/737.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #737


commit 871abec814fa15e9507a98ca1b4718429781efd7
Author: larvaboy larva...@gmail.com
Date:   2014-05-10T23:20:10Z

Fix a couple of minor typos.

commit f73651c8dc23fdd83b1bfb35bda135449f84c5c5
Author: larvaboy larva...@gmail.com
Date:   2014-05-11T23:15:35Z

Fix a minor typo in the toString method of the Count case class.

commit 25b46046c5e7a772dd25f2bd7ae711c9dabd3959
Author: larvaboy larva...@gmail.com
Date:   2014-05-12T09:25:59Z

Add SparkSql serializer for HyperLogLog.

commit 80f1da4a48d3929272a4436aee26531f03eab4aa
Author: larvaboy larva...@gmail.com
Date:   2014-05-12T09:38:16Z

Add ApproximateCountDistinct aggregates and functions.

We use stream-lib's HyperLogLog to approximately count the number of
distinct elements in each partition, and merge the HyperLogLogs to
compute the final result.

If the expressions cannot be successfully broken apart, we fall back to
the exact CountDistinct.

commit 234a270a5e6766ad41b4fb49a54d42ddb4643264
Author: larvaboy larva...@gmail.com
Date:   2014-05-12T04:58:54Z

Add the parser for the approximate count.

commit cf73b921cfa901ffb40c848ca1961378475fea1a
Author: larvaboy larva...@gmail.com
Date:   2014-05-12T05:05:15Z

Add a test case for count distinct and approximate count distinct.






[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/737#issuecomment-42817556
  
Can one of the admins verify this patch?




[GitHub] spark pull request: L-BFGS Documentation

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/702#issuecomment-42735249
  
Merged build started. 




[GitHub] spark pull request: SPARK-1786: Edge Partition Serialization

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/724#issuecomment-42756922
  
Build started. 




[GitHub] spark pull request: SPARK-1786: Edge Partition Serialization

2014-05-12 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/724#issuecomment-42790068
  
Alright, sounds good. @ankurdave or @rxin can you take a quick look?




[GitHub] spark pull request: [SPARK-1690] Tolerating empty elements when sa...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/644#issuecomment-42753678
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14871/




[GitHub] spark pull request: [SPARK-1774] Respect SparkSubmit --jars on YAR...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/710#issuecomment-42700238
  
Merged build started. 




[GitHub] spark pull request: Typo fix: fetchting -> fetching

2014-05-12 Thread ash211
GitHub user ash211 opened a pull request:

https://github.com/apache/spark/pull/680

Typo fix: fetchting -> fetching



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ash211/spark patch-3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/680.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #680


commit 9ce3746c31f4ad66b5aa0f82d9fd59bb8e92e759
Author: Andrew Ash and...@andrewash.com
Date:   2014-05-07T08:46:16Z

Typo fix: fetchting -> fetching






[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/143#issuecomment-42703885
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1565 (Addendum): Replace `run-example` w...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/704#issuecomment-42635008
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-1460] Returning SchemaRDD instead of no...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/448#issuecomment-42393186
  
 Merged build triggered. 




[GitHub] spark pull request: Added SparkGCE Script for Version 0.9.1

2014-05-12 Thread sigmoidanalytics
Github user sigmoidanalytics commented on the pull request:

https://github.com/apache/spark/pull/681#issuecomment-42833253
  
Did any of the admins have a chance to check it out? Let me know if you want 
me to modify anything in it.




[GitHub] spark pull request: Merge addWithoutResize and rehashIfNeeded into...

2014-05-12 Thread ArcherShao
GitHub user ArcherShao opened a pull request:

https://github.com/apache/spark/pull/738

Merge addWithoutResize and rehashIfNeeded into one function.

It will be safer to add an element this way; users no longer need to call 
rehashIfNeeded() after addWithoutResize().
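The proposed merge can be sketched as follows. This is a minimal stand-in, not Spark's actual OpenHashSet; the class and method names are illustrative, but it shows the point: a single `add()` that grows the table internally cannot be misused the way a separate `addWithoutResize()`/`rehashIfNeeded()` pair can.

```scala
// Minimal sketch (not Spark's OpenHashSet): add() rehashes automatically,
// so callers cannot forget the resize step.
class SimpleOpenHashSet[T](initialCapacity: Int = 8) {
  private var capacity = initialCapacity
  private var data = new Array[Any](capacity)
  private var used = new Array[Boolean](capacity)
  private var count = 0

  def size: Int = count

  def add(elem: T): Unit = {
    if (count + 1 > capacity * 0.7) rehash() // grow before the table fills
    insert(elem)
  }

  def contains(elem: T): Boolean = {
    var i = index(elem)
    while (used(i)) {
      if (data(i) == elem) return true
      i = (i + 1) % capacity // linear probing
    }
    false
  }

  private def index(elem: T): Int = elem.hashCode.abs % capacity

  private def insert(elem: T): Unit = {
    var i = index(elem)
    while (used(i)) {
      if (data(i) == elem) return // already present
      i = (i + 1) % capacity
    }
    data(i) = elem; used(i) = true; count += 1
  }

  private def rehash(): Unit = {
    val oldData = data; val oldUsed = used
    capacity *= 2
    data = new Array[Any](capacity); used = new Array[Boolean](capacity)
    count = 0
    for (i <- oldData.indices if oldUsed(i)) insert(oldData(i).asInstanceOf[T])
  }
}
```

Because the load factor is checked inside `add()`, the table always has free slots and probing terminates, regardless of how the caller uses it.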

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ArcherShao/sparksc branch_graphx

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/738.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #738


commit a8ee11664da05691ce72c79115a409e27a250a8e
Author: ArcherShao hunany...@gmail.com
Date:   2014-05-12T14:05:03Z

Merge addWithoutResize and rehashIfNeeded into one function.






[GitHub] spark pull request: Merge addWithoutResize and rehashIfNeeded into...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/738#issuecomment-42837187
  
Can one of the admins verify this patch?




[GitHub] spark pull request: L-BFGS Documentation

2014-05-12 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/702#issuecomment-42844008
  
LGTM. Thanks!




[GitHub] spark pull request: SPARK-1803 Replaced colon in filenames with a ...

2014-05-12 Thread sslavic
GitHub user sslavic opened a pull request:

https://github.com/apache/spark/pull/739

SPARK-1803 Replaced colon in filenames with a dash

This patch replaces the colon in several filenames with a dash to make these 
filenames Windows-compatible.
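The patch itself just renames files in the repository; as a purely illustrative aside (not part of the patch, all names below are hypothetical), the same rename can be expressed programmatically, since ':' is a reserved character in Windows filenames:

```scala
import java.io.File

// Illustrative only: rename every file in a directory, replacing ':'
// with '-', because ':' is reserved in Windows filenames.
def dashify(dir: File): Unit =
  for (f <- Option(dir.listFiles).getOrElse(Array.empty[File])
       if f.getName.contains(":"))
    f.renameTo(new File(dir, f.getName.replace(':', '-')))
```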

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sslavic/spark SPARK-1803

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/739.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #739


commit 9e1467dd04dda4bf7886c33870232fdbb0bf70bd
Author: Stevo Slavić ssla...@gmail.com
Date:   2014-05-12T14:25:50Z

Replaced colon in file name with dash

This patch replaces colon in file name with dash to make file name Windows 
compatible.

commit 2fc785454fb8e45095bcae47aecda7905969573b
Author: Stevo Slavić ssla...@gmail.com
Date:   2014-05-12T14:27:14Z

Replaced colon in file name with dash

This patch replaces colon in file name with dash to make file name Windows 
compatible.

commit 84f5d2fd168829474a28e0b3a4edb75067df1c25
Author: Stevo Slavić ssla...@gmail.com
Date:   2014-05-12T14:28:17Z

Replaced colon in file name with dash

This patch replaces colon in file name with dash to make file name Windows 
compatible.

commit ece0507fa498a232f24ad1ce903536a976cf271b
Author: Stevo Slavić ssla...@gmail.com
Date:   2014-05-12T14:29:27Z

Replaced colon in file name with dash

This patch replaces colon in file name with dash to make file name Windows 
compatible.

commit 028e48af7ff105150a3f375c0076677efe78a7ac
Author: Stevo Slavić ssla...@gmail.com
Date:   2014-05-12T14:30:20Z

Replaced colon in file name with dash

This patch replaces colon in file name with dash to make file name Windows 
compatible.

commit b58512617957a9a06aa1fb288815aa82e1ce40d2
Author: Stevo Slavić ssla...@gmail.com
Date:   2014-05-12T14:32:23Z

Replaced colon in file name with dash

This patch replaces colon in file name with dash to make file name Windows 
compatible.

commit d6a3e2cf957582be7615f9db8ebd661fadcc9c78
Author: Stevo Slavić ssla...@gmail.com
Date:   2014-05-12T14:34:37Z

Replaced colon in file name with dash

This patch replaces colon in file name with dash to make file name Windows 
compatible.

commit 004f8bb0496eb65c57098ade96e685995e8cd660
Author: Stevo Slavić ssla...@gmail.com
Date:   2014-05-12T14:36:14Z

Replaced colon in file name with dash

This patch replaces colon in file name with dash to make file name Windows 
compatible.

commit 4774580ab7ac8d609b6505fb27afab2c3d20e1d1
Author: Stevo Slavić ssla...@gmail.com
Date:   2014-05-12T14:40:25Z

Replaced colon in file name with dash

This patch replaces colon in file name with dash to make file name Windows 
compatible.

commit 40a962103a946b9135b5645e25c486fe43287bb2
Author: Stevo Slavić ssla...@gmail.com
Date:   2014-05-12T14:41:38Z

Replaced colon in file name with dash

This patch replaces colon in file name with dash to make file name Windows 
compatible.

commit 401d99eb733d889e3adb5a9874b52b163d7a17ce
Author: Stevo Slavić ssla...@gmail.com
Date:   2014-05-12T14:42:27Z

Replaced colon in file name with dash

This patch replaces colon in file name with dash to make file name Windows 
compatible.

commit a49801fc820b6e58b2ce49cdff211c1fd16648a5
Author: Stevo Slavić ssla...@gmail.com
Date:   2014-05-12T14:45:46Z

Replaced colon in file name with dash

This patch replaces colon in file name with dash to make file name Windows 
compatible.

commit c5b5083e1f6b8fc9c9cb586e0ee44c997e4c03e8
Author: Stevo Slavić ssla...@gmail.com
Date:   2014-05-12T14:51:56Z

Replaced colon in file name with dash

This patch replaces colon in file name with dash to make file name Windows 
compatible.

commit 8f5bf7fdb3743ea02093c64bb966f7d3c2d4a8fe
Author: Stevo Slavić ssla...@gmail.com
Date:   2014-05-12T14:54:19Z

Replaced colon in file name with dash

This patch replaces colon in file name with dash to make file name Windows 
compatible.

commit 1c5dfff57129aaf0566d15016712798328d9e069
Author: Stevo Slavić ssla...@gmail.com
Date:   2014-05-12T14:58:37Z

Replaced colon in file name with dash

This patch replaces colon in file name with dash to make file name Windows 
compatible.

commit 2b12776b5b2381f2da695c3e3fa272c2a8a89a2a
Author: Stevo Slavić ssla...@gmail.com
Date:   2014-05-12T15:01:25Z

Fixed a typo in file name

This patch fixes a typo in file name - 'Partiton' is replaced with 
'Partition'.





[GitHub] spark pull request: Add a function that can build an EdgePartition...

2014-05-12 Thread ArcherShao
GitHub user ArcherShao opened a pull request:

https://github.com/apache/spark/pull/740

Add a function that can build an EdgePartition faster.

If the user can guarantee that every edge is added in sorted order, using this 
function to build an EdgePartition will be faster. 
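The idea can be sketched as follows; the names are illustrative, not GraphX's actual builder API. When the caller guarantees the edges arrive sorted by source vertex id, the builder only needs a cheap linear verification pass and can skip its own sort before constructing the partition index:

```scala
// Hypothetical sketch: trust (but verify) the caller's sort order and
// skip the builder's sort pass.
case class Edge(srcId: Long, dstId: Long)

def buildFromSortedEdges(edges: Array[Edge]): Array[Edge] = {
  require(
    edges.indices.drop(1).forall(i => edges(i - 1).srcId <= edges(i).srcId),
    "edges must be pre-sorted by srcId")
  edges // no sort needed; index construction can proceed over this order
}
```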

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ArcherShao/sparksc branch_graphx_Edge

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/740.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #740


commit f73072127e6f4d99e9d2c03a850053cecbb1e2a7
Author: ArcherShao hunany...@gmail.com
Date:   2014-05-12T15:03:19Z

Add a function that can build an EdgePartion faster.






[GitHub] spark pull request: Add a function that can build an EdgePartition...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/740#issuecomment-42845140
  
Can one of the admins verify this patch?




[GitHub] spark pull request: 【SPARK-1779】add warning when memoryFractio...

2014-05-12 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/714#issuecomment-42729062
  
Thanks, this looks good.




[GitHub] spark pull request: SPARK-1772 Stop catching Throwable, let Execut...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/715#issuecomment-42858235
  
 Merged build triggered. 




[GitHub] spark pull request: Fixed streaming examples docs to use run-examp...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/722#issuecomment-42739222
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1

2014-05-12 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/741#issuecomment-42861465
  
Jenkins, test this please. Thanks! This is just in time.




[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1

2014-05-12 Thread berngp
GitHub user berngp opened a pull request:

https://github.com/apache/spark/pull/741

SPARK-1806: Upgrade Mesos dependency to 0.18.1

Enabled Mesos (0.18.1) dependency with shaded protobuf

Why is this needed?
Avoids any protobuf version collision between Mesos and any other
dependency in Spark e.g. Hadoop HDFS 2.2+ or 1.0.4.

Ticket: https://issues.apache.org/jira/browse/SPARK-1806

* Should close https://issues.apache.org/jira/browse/SPARK-1433

Author berngp
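The shading idea can be illustrated with a build-configuration fragment. This is an assumption-laden sketch, not the actual patch: it shows how an sbt build could depend on a Mesos artifact published with a shaded-protobuf classifier, so Mesos's bundled protobuf cannot collide with the protobuf version pulled in by Hadoop.

```scala
// Hypothetical sbt fragment (coordinates and classifier are assumptions):
// pick the Mesos artifact whose protobuf classes are shaded into a
// private namespace, avoiding a clash with Hadoop's protobuf.
libraryDependencies +=
  "org.apache.mesos" % "mesos" % "0.18.1" classifier "shaded-protobuf"
```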

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/berngp/spark feature/SPARK-1806

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/741.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #741


commit 5d706469a1accada1c43471003c773d4e0e9
Author: Bernardo Gomez Palacio bernardo.gomezpala...@gmail.com
Date:   2014-05-09T20:37:09Z

SPARK-1806: Upgrade Mesos dependency to 0.18.1

Enabled Mesos (0.18.1) dependency with shaded protobuf

Why is this needed?
Avoids any protobuf version collision between Mesos and any other
dependency in Spark e.g. Hadoop HDFS 2.2+ or 1.0.4.

Ticket: https://issues.apache.org/jira/browse/SPARK-1806

* Should close https://issues.apache.org/jira/browse/SPARK-1433

Author berngp






[GitHub] spark pull request: SPARK-1772 Stop catching Throwable, let Execut...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/715#issuecomment-42863004
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14897/




[GitHub] spark pull request: [SPARK-1519] Support minPartitions param of wh...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/697#issuecomment-42584240
  
Can one of the admins verify this patch?




[GitHub] spark pull request: SPARK-1772 Stop catching Throwable, let Execut...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/715#issuecomment-42863001
  
Merged build finished. 




[GitHub] spark pull request: [SPARK-1755] Respect SparkSubmit --name on YAR...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/699#issuecomment-42594094
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1565, update examples to be used with sp...

2014-05-12 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/552#issuecomment-42527532
  
@pwendell Done !




[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/717#issuecomment-42719784
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14859/




[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/741#issuecomment-42861305
  
Can one of the admins verify this patch?




[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...

2014-05-12 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/737#issuecomment-42862270
  
This patch duplicates some logic that already exists elsewhere in Spark - 
would you mind updating it to use this class?: 

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/SerializableHyperLogLog.scala




[GitHub] spark pull request: SPARK-1786: Edge Partition Serialization

2014-05-12 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/724#issuecomment-42865046
  
To fix this we can just add the 
`org.apache.spark.graphx.util.collection.PrimitiveKeyOpenHashMap` class here:
https://github.com/apache/spark/blob/master/project/MimaBuild.scala#L77

Joey - mind re-opening this?




[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/741#issuecomment-42866824
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1772 Stop catching Throwable, let Execut...

2014-05-12 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/715#issuecomment-42866864
  
This passed all tests except for the (bogus) MIMA issue, so I'll merge it.




[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/741#issuecomment-42866826
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14898/




[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1

2014-05-12 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/741#issuecomment-42867063
  
This is an incorrect failure due to a bad merge in master. I'm going to 
merge this.




[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1

2014-05-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/741




[GitHub] spark pull request: [SPARK-1620] Handle uncaught exceptions in fun...

2014-05-12 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/622#issuecomment-42861387
  
Yes, I'll do a little refactoring after 
https://github.com/apache/spark/pull/715 is merged.




[GitHub] spark pull request: SPARK-1786: Reopening PR 724

2014-05-12 Thread jegonzal
GitHub user jegonzal opened a pull request:

https://github.com/apache/spark/pull/742

SPARK-1786: Reopening PR 724 

Addressing issue in MimaBuild.scala.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jegonzal/spark edge_partition_serialization

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/742.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #742


commit 67dac22884b098b72c277dbe6e344da796a5321c
Author: Joseph E. Gonzalez joseph.e.gonza...@gmail.com
Date:   2014-05-10T01:54:56Z

Making EdgePartition serializable.

commit bb7f548542d58ee6ac2dbdf868fea165fdf4f415
Author: Ankur Dave ankurd...@gmail.com
Date:   2014-05-10T03:09:48Z

Add failing test for EdgePartition Kryo serialization

commit b0a525a7f48a6b13cf8687e5e6d8ba3d3bf852f5
Author: Ankur Dave ankurd...@gmail.com
Date:   2014-05-10T03:12:38Z

Disable reference tracking to fix serialization test

commit d8b70fbca17534eb8f60e8feb4a9fdd5996fdcd8
Author: Joseph E. Gonzalez joseph.e.gonza...@gmail.com
Date:   2014-05-12T18:20:49Z

addressing missing exclusion in MimaBuild.scala






[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/686#issuecomment-42868032
  
Merged build started. 




[GitHub] spark pull request: SPARK-1786: Reopening PR 724

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/742#issuecomment-42869225
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/741#issuecomment-42861915
  
Merged build started. 




[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...

2014-05-12 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/737#issuecomment-42869356
  
@pwendell, I don't think that will work, as Spark SQL does its own 
serialization for shuffles (sometimes using Kryo), and I don't think that 
SerializableHyperLogLog works with Kryo.




[GitHub] spark pull request: SPARK-1786: Reopening PR 724

2014-05-12 Thread jegonzal
Github user jegonzal commented on the pull request:

https://github.com/apache/spark/pull/742#issuecomment-42868913
  
@ankurdave and @pwendell I am reopening PR 724 to address the issue with 
MimaBuild. I believe I made the required changes, but how can I verify?




[GitHub] spark pull request: SPARK-1786: Reopening PR 724

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/742#issuecomment-42869243
  
Merged build started. 




[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...

2014-05-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/737#discussion_r12545901
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala ---
@@ -269,6 +308,34 @@ case class CountFunction(expr: Expression, base: AggregateExpression) extends Ag
   override def eval(input: Row): Any = count
 }
 
+case class ApproxCountDistinctPartitionFunction(expr: Expression, base: AggregateExpression)
+    extends AggregateFunction {
+  def this() = this(null, null) // Required for serialization.
+
+  private val hyperLogLog = new HyperLogLog(ApproxCountDistinct.RelativeSD)
+
+  override def update(input: Row): Unit = {
+    val evaluatedExpr = expr.eval(input)
+    Option(evaluatedExpr).foreach(hyperLogLog.offer(_))
--- End diff --

I'm normally all for the Option pattern, but in this case you are probably 
incurring more object allocations than we want in the critical path of query 
execution. I'd just use an `if` here.
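For illustration, a minimal self-contained sketch of the allocation trade-off described above (toy names; this is not the PR's code): `Option(x)` wraps every non-null value in a `Some`, while a plain null check allocates nothing per element.

```scala
// Toy sketch of the Option-vs-if trade-off; names are illustrative.
object NullCheckSketch {
  // Option-based form: allocates a Some wrapper for each non-null element.
  def countNonNullWithOption(values: Seq[Any]): Int = {
    var n = 0
    values.foreach(v => Option(v).foreach(_ => n += 1))
    n
  }

  // if-based form: same semantics, but no per-element wrapper allocation,
  // which matters on the per-row critical path of query execution.
  def countNonNullWithIf(values: Seq[Any]): Int = {
    var n = 0
    values.foreach(v => if (v != null) n += 1)
    n
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq("a", null, "b", "c", null)
    println(countNonNullWithOption(rows)) // 3
    println(countNonNullWithIf(rows))     // 3
  }
}
```

Both forms return the same result; the difference is only the transient `Some` instances, which the JIT may or may not eliminate via escape analysis.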




[GitHub] spark pull request: Add a function that can build an EdgePartition...

2014-05-12 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/740#issuecomment-42869409
  
Jenkins, add to whitelist.




[GitHub] spark pull request: Add a function that can build an EdgePartition...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/740#issuecomment-42869892
  
 Merged build triggered. 




[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...

2014-05-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/737#discussion_r12546014
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala ---
@@ -269,6 +308,34 @@ case class CountFunction(expr: Expression, base: AggregateExpression) extends Ag
   override def eval(input: Row): Any = count
 }
 
+case class ApproxCountDistinctPartitionFunction(expr: Expression, base: AggregateExpression)
+    extends AggregateFunction {
+  def this() = this(null, null) // Required for serialization.
+
+  private val hyperLogLog = new HyperLogLog(ApproxCountDistinct.RelativeSD)
+
+  override def update(input: Row): Unit = {
+    val evaluatedExpr = expr.eval(input)
+    Option(evaluatedExpr).foreach(hyperLogLog.offer(_))
+  }
+
+  override def eval(input: Row): Any = hyperLogLog
+}
+
+case class ApproxCountDistinctMergeFunction(expr: Expression, base: AggregateExpression)
+    extends AggregateFunction {
+  def this() = this(null, null) // Required for serialization.
+
+  private val hyperLogLog = new HyperLogLog(ApproxCountDistinct.RelativeSD)
+
+  override def update(input: Row): Unit = {
+    val evaluatedExpr = expr.eval(input)
+    Option(evaluatedExpr.asInstanceOf[HyperLogLog]).foreach(hyperLogLog.addAll(_))
--- End diff --

Will this ever be null?




[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...

2014-05-12 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/737#discussion_r12546829
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala ---
@@ -166,10 +167,48 @@ case class CountDistinct(expressions: Seq[Expression]) extends AggregateExpressi
   override def references = expressions.flatMap(_.references).toSet
   override def nullable = false
   override def dataType = IntegerType
-  override def toString = s"COUNT(DISTINCT ${expressions.mkString(",")}})"
+  override def toString = s"COUNT(DISTINCT ${expressions.mkString(",")})"
   override def newInstance() = new CountDistinctFunction(expressions, this)
 }
 
+case class ApproxCountDistinctPartition(child: Expression)
+    extends AggregateExpression with trees.UnaryNode[Expression] {
--- End diff --

Style feedback: use 2-space indenting for `extends` (we only use 4-space 
indenting for arguments).




[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...

2014-05-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/737#discussion_r12546769
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala ---
@@ -166,10 +167,48 @@ case class CountDistinct(expressions: Seq[Expression]) extends AggregateExpressi
   override def references = expressions.flatMap(_.references).toSet
   override def nullable = false
   override def dataType = IntegerType
-  override def toString = s"COUNT(DISTINCT ${expressions.mkString(",")}})"
+  override def toString = s"COUNT(DISTINCT ${expressions.mkString(",")})"
   override def newInstance() = new CountDistinctFunction(expressions, this)
 }
 
+case class ApproxCountDistinctPartition(child: Expression)
+    extends AggregateExpression with trees.UnaryNode[Expression] {
+  override def references = child.references
+  override def nullable = false
+  override def dataType = child.dataType
+  override def toString = s"APPROXIMATE COUNT(DISTINCT $child)"
+  override def newInstance() = new ApproxCountDistinctPartitionFunction(child, this)
+}
+
+case class ApproxCountDistinctMerge(child: Expression)
+    extends AggregateExpression with trees.UnaryNode[Expression] {
+  override def references = child.references
+  override def nullable = false
+  override def dataType = IntegerType
+  override def toString = s"APPROXIMATE COUNT(DISTINCT $child)"
+  override def newInstance() = new ApproxCountDistinctMergeFunction(child, this)
+}
+
+object ApproxCountDistinct {
+  val RelativeSD = 0.05
--- End diff --

Having a default here is reasonable, but we should probably expose this to 
the user as well.  Maybe two versions in the parser?
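One way the suggestion could look, sketched with hypothetical names (a `String` stands in for the child `Expression`, and this is not the actual parser change): keep 0.05 as the default while letting the user pass an explicit relative standard deviation.

```scala
// Hypothetical sketch: the relative SD stays as a default, but a second
// form accepts a user-supplied error bound. Names are illustrative.
object ApproxCountDistinctSketch {
  val DefaultRelativeSD = 0.05
}

case class ApproxCountDistinctSketch(
    child: String, // stand-in for the child Expression
    relativeSD: Double = ApproxCountDistinctSketch.DefaultRelativeSD) {
  override def toString = s"APPROXIMATE ($relativeSD) COUNT(DISTINCT $child)"
}

object ParserSketch {
  def main(args: Array[String]): Unit = {
    // Default form, as a parser could build for APPROXIMATE COUNT(DISTINCT x):
    println(ApproxCountDistinctSketch("userId"))
    // Explicit form, for a hypothetical syntax that takes an error bound:
    println(ApproxCountDistinctSketch("userId", 0.01))
  }
}
```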




[GitHub] spark pull request: Add a function that can build an EdgePartition...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/740#issuecomment-42872474
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1786: Reopening PR 724

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/742#issuecomment-42873807
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14900/




[GitHub] spark pull request: SPARK-1786: Reopening PR 724

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/742#issuecomment-42873806
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1786: Reopening PR 724

2014-05-12 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/742#discussion_r12547490
  
--- Diff: project/MimaBuild.scala ---
@@ -75,6 +75,8 @@ object MimaBuild {
   excludeSparkClass("rdd.ClassTags") ++
   excludeSparkClass("util.XORShiftRandom") ++
   excludeSparkClass("graphx.EdgeRDD") ++
+  excludeSparkClass("graphx.util.collection.PrimitiveKeyOpenHashMap")
--- End diff --

You need `++` operators here... as-is, I think this has removed a 
bunch of the other MiMa checks :)
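A self-contained toy illustration of why the missing `++` is so dangerous (`Seq` values stand in for the real exclude lists): in a Scala block, a line not joined to the previous one by a trailing operator starts a new statement, and only the last statement becomes the block's value, silently discarding everything before it.

```scala
// Toy illustration, not the real MimaBuild code.
object MissingPlusPlusSketch {
  def excludes(joined: Boolean): Seq[String] =
    if (joined)
      Seq("rdd.ClassTags") ++
      Seq("util.XORShiftRandom") ++
      Seq("graphx.util.collection.PrimitiveKeyOpenHashMap")
    else {
      // Without the trailing ++, the first two excludes form one discarded
      // statement and only the last Seq is returned as the block's value.
      Seq("rdd.ClassTags") ++
      Seq("util.XORShiftRandom")
      Seq("graphx.util.collection.PrimitiveKeyOpenHashMap")
    }

  def main(args: Array[String]): Unit = {
    println(excludes(joined = true).size)  // 3
    println(excludes(joined = false).size) // 1
  }
}
```

scalac only emits a "pure expression does nothing in statement position" warning for the discarded line, so the build compiles and the lost checks go unnoticed.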




[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/741#issuecomment-42861900
  
 Merged build triggered. 




[GitHub] spark pull request: Add a function that can build an EdgePartition...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/740#issuecomment-42872475
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14901/




[GitHub] spark pull request: Typo: resond -> respond

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/743#issuecomment-42875483
  
Merged build started. 




[GitHub] spark pull request: [Spark-1461] Deferred Expression Evaluation (s...

2014-05-12 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/446#issuecomment-42874850
  
@pwendell any idea what is wrong with MIMA?




[GitHub] spark pull request: SPARK-1786: Reopening PR 724

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/742#issuecomment-42874869
  
 Merged build triggered. 




[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...

2014-05-12 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/737#issuecomment-42870675
  
Bypassing SerializableHyperLogLog has a few benefits:
1. Less memory usage, because we don't need the wrapper.
2. It works with Spark SQL's internal serializer.
3. stream-lib will actually make HyperLogLog serializable in the next release, 
so SerializableHyperLogLog will be gone anyway.




[GitHub] spark pull request: Typo: resond -> respond

2014-05-12 Thread ash211
GitHub user ash211 opened a pull request:

https://github.com/apache/spark/pull/743

Typo: resond -> respond



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ash211/spark patch-4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/743.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #743


commit c959f3be4ee85f41392875760612671f452bc843
Author: Andrew Ash and...@andrewash.com
Date:   2014-05-12T19:16:16Z

Typo: resond -> respond






[GitHub] spark pull request: Typo: resond -> respond

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/743#issuecomment-42875468
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1772 Stop catching Throwable, let Execut...

2014-05-12 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/715#issuecomment-42858109
  
Addressed all comments and took RFC out of the PR title.




[GitHub] spark pull request: SPARK-1786: Reopening PR 724

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/742#issuecomment-42879473
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...

2014-05-12 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/737#discussion_r12546858
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala ---
@@ -166,10 +167,48 @@ case class CountDistinct(expressions: Seq[Expression]) extends AggregateExpressi
   override def references = expressions.flatMap(_.references).toSet
   override def nullable = false
   override def dataType = IntegerType
-  override def toString = s"COUNT(DISTINCT ${expressions.mkString(",")}})"
+  override def toString = s"COUNT(DISTINCT ${expressions.mkString(",")})"
   override def newInstance() = new CountDistinctFunction(expressions, this)
 }
 
+case class ApproxCountDistinctPartition(child: Expression)
+    extends AggregateExpression with trees.UnaryNode[Expression] {
+  override def references = child.references
+  override def nullable = false
+  override def dataType = child.dataType
+  override def toString = s"APPROXIMATE COUNT(DISTINCT $child)"
+  override def newInstance() = new ApproxCountDistinctPartitionFunction(child, this)
+}
+
+case class ApproxCountDistinctMerge(child: Expression)
+    extends AggregateExpression with trees.UnaryNode[Expression] {
+  override def references = child.references
+  override def nullable = false
+  override def dataType = IntegerType
+  override def toString = s"APPROXIMATE COUNT(DISTINCT $child)"
+  override def newInstance() = new ApproxCountDistinctMergeFunction(child, this)
+}
+
+object ApproxCountDistinct {
+  val RelativeSD = 0.05
+}
+
+case class ApproxCountDistinct(child: Expression)
+    extends PartialAggregate with trees.UnaryNode[Expression] {
+  override def references = child.references
+  override def nullable = false
+  override def dataType = IntegerType
+  override def toString = s"APPROXIMATE COUNT(DISTINCT $child)"
+
+  override def asPartial: SplitEvaluation = {
+    val partialCount = Alias(ApproxCountDistinctPartition(child),
+                             "PartialApproxCountDistinct")()
--- End diff --

Style feedback: just indent this line by 2 spaces instead of aligning it 
with the opening parenthesis.




[GitHub] spark pull request: SPARK-1802. Audit dependency graph when Spark ...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/744#issuecomment-42879578
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1802. Audit dependency graph when Spark ...

2014-05-12 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/744

SPARK-1802. Audit dependency graph when Spark is built with -Phive

This initial commit resolves the conflicts in the Hive profiles as noted in 
https://issues.apache.org/jira/browse/SPARK-1802 . 

Most of the fix was noting that Hive drags in Avro, so if the hive 
module depends on Spark's version of the `avro-*` dependencies, it pulls in 
our exclusions as needed too. But I found we need to copy some exclusions 
between the two Avro dependencies to get this right, and then I had to squash 
some commons-logging intrusions.

This turned up another annoying finding: `hive-exec` is basically an 
assembly artifact that _also_ packages all of its transitive dependencies. 
This means the final assembly shows lots of collisions between itself, its 
dependencies, and even other projects' dependencies. I have a TODO to examine 
whether that is going to be a deal-breaker or not.

In the meantime I'm going to tack on a second commit to this PR that will 
also fix some similar, last collisions in the YARN profile.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-1802

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/744.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #744


commit a856604cfc67cb58146ada01fda6dbbb2515fa00
Author: Sean Owen so...@cloudera.com
Date:   2014-05-12T10:08:21Z

Resolve JAR version conflicts specific to Hive profile






[GitHub] spark pull request: SPARK-1772 Stop catching Throwable, let Execut...

2014-05-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/715




[GitHub] spark pull request: SPARK-1786: Reopening PR 724

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/742#issuecomment-42879475
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14902/




[GitHub] spark pull request: SPARK-1802. Audit dependency graph when Spark ...

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/744#issuecomment-42879595
  
Merged build started. 




[GitHub] spark pull request: SPARK-1786: Reopening PR 724

2014-05-12 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/742#issuecomment-42881037
  
Thanks - I pulled this in.




[GitHub] spark pull request: SPARK-1798. Tests should clean up temp files

2014-05-12 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/732#discussion_r12550787
  
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/MLUtilsSuite.scala ---
@@ -90,7 +92,7 @@ class MLUtilsSuite extends FunSuite with LocalSparkContext {
     assert(multiclassPoints(1).label === -1.0)
     assert(multiclassPoints(2).label === -1.0)
 
-    deleteQuietly(tempDir)
+    Utils.deleteRecursively(tempDir)
--- End diff --

This changes the behavior to not swallow exceptions. This was added 
originally by @mengxr... is there a reason this squashes exceptions? 


https://github.com/jegonzal/spark/commit/98750a74#diff-006677f6b8222b96d21bc3e46ac9fe77R161
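For context, a hedged sketch of the behavioral difference being asked about (hypothetical helpers, not Spark's actual `Utils`): a "quiet" delete swallows I/O failures, whereas a strict recursive delete lets them propagate so a test fails loudly when cleanup goes wrong.

```scala
import java.io.{File, IOException}

// Hypothetical helpers illustrating swallowed vs. propagated failures.
object DeleteSketch {
  // Strict form: throws if anything cannot be deleted.
  def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) {
      Option(f.listFiles).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    if (!f.delete() && f.exists()) {
      throw new IOException(s"Failed to delete $f")
    }
  }

  // Quiet form: same work, but any I/O failure is silently discarded.
  def deleteQuietly(f: File): Unit =
    try deleteRecursively(f) catch { case _: IOException => () }

  def main(args: Array[String]): Unit = {
    val dir = java.nio.file.Files.createTempDirectory("delete-sketch").toFile
    new File(dir, "data.txt").createNewFile()
    deleteRecursively(dir) // strict cleanup: failures would surface here
    println(dir.exists()) // false
  }
}
```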




  1   2   3   4   >