[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/715#issuecomment-42799372 Looks good to me too! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/715#discussion_r12515148

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -259,19 +238,30 @@ private[spark] class Executor(
         }

         case t: Throwable => {
-          val serviceTime = System.currentTimeMillis() - taskStart
-          val metrics = attemptedTask.flatMap(t => t.metrics)
-          for (m <- metrics) {
-            m.executorRunTime = serviceTime
-            m.jvmGCTime = gcTime - startGCTime
-          }
-          val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
-          execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
+          // Attempt to exit cleanly by informing the driver of our failure.
+          // If anything goes wrong (or this was a fatal exception), we will delegate to
+          // the default uncaught exception handler, which will terminate the Executor.
+          try {
+            logError("Exception in task ID " + taskId, t)
+
+            val serviceTime = System.currentTimeMillis() - taskStart
+            val metrics = attemptedTask.flatMap(t => t.metrics)
+            for (m <- metrics) {
+              m.executorRunTime = serviceTime
+              m.jvmGCTime = gcTime - startGCTime
+            }
+            val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
+            execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))

-          // TODO: Should we exit the whole executor here? On the one hand, the failed task may
-          // have left some weird state around depending on when the exception was thrown, but on
-          // the other hand, maybe we could detect that when future tasks fail and exit then.
-          logError("Exception in task ID " + taskId, t)
+            // Don't forcibly exit unless the exception was inherently fatal, to avoid
+            // stopping other tasks unnecessarily.
+            if (Utils.isFatalError(t)) {
+              ExecutorUncaughtExceptionHandler.uncaughtException(t)
+            }
+          } catch {
+            case t2: Throwable =>
+              ExecutorUncaughtExceptionHandler.uncaughtException(t2)
--- End diff --

Hmm, good point.
I kind of like being explicit over relying on the globally set uncaught exception handler. I could be happy with getting rid of this and replacing it with a comment, though.
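The pattern discussed in this diff can be sketched in miniature: try to report the failure to the driver, and escalate to the uncaught exception handler only for fatal errors or when reporting itself fails. The sketch below is Java with hypothetical names; it is not Spark's actual Executor code, and `isFatalError` only approximates the spirit of `Utils.isFatalError`.

```java
class FatalErrorHandling {
    // Rough analogue of Utils.isFatalError: errors the JVM cannot
    // reasonably recover from should terminate the executor process.
    static boolean isFatalError(Throwable t) {
        return t instanceof OutOfMemoryError
            || t instanceof StackOverflowError
            || t instanceof LinkageError;
    }

    // Hand the error to the globally registered handler, if any.
    static void escalate(Throwable t) {
        Thread.UncaughtExceptionHandler h = Thread.getDefaultUncaughtExceptionHandler();
        if (h != null) {
            h.uncaughtException(Thread.currentThread(), t);
        }
    }

    static String runTask(Runnable task) {
        try {
            task.run();
            return "SUCCESS";
        } catch (Throwable t) {
            try {
                // First, attempt a clean status report (stands in for
                // execBackend.statusUpdate in the diff above).
                String reason = t.getClass().getName() + ": " + t.getMessage();
                // Only force an exit for inherently fatal errors, so other
                // running tasks are not stopped unnecessarily.
                if (isFatalError(t)) {
                    escalate(t);
                }
                return "FAILED: " + reason;
            } catch (Throwable t2) {
                // Reporting failed too; nothing left but to escalate.
                escalate(t2);
                return "LOST";
            }
        }
    }
}
```

The inner try/catch is the point under review: even the failure-reporting path is guarded, so an error while reporting still terminates the process rather than being swallowed.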
[GitHub] spark pull request: Fixed streaming examples docs to use run-examp...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/722#discussion_r12514182

--- Diff: examples/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala ---
@@ -35,8 +35,8 @@ import org.apache.spark.SparkConf
  *   numThreads is the number of threads the kafka consumer should use
  *
  * Example:
- *    `./bin/spark-submit examples.jar \
- *    --class org.apache.spark.examples.streaming.KafkaWordCount local[2] zoo01,zoo02,zoo03 \
+ *    `bin/run-example \
+ *    org.apache.spark.examples.streaming.KafkaWordCount local[2] zoo01,zoo02,zoo03 \
--- End diff --

This is outdated. KafkaWordCount no longer takes in `master`.
[GitHub] spark pull request: [RFC] SPARK-1772 Stop catching Throwable, let ...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/715#discussion_r12515152

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -259,19 +238,30 @@ private[spark] class Executor(
         }

         case t: Throwable => {
-          val serviceTime = System.currentTimeMillis() - taskStart
-          val metrics = attemptedTask.flatMap(t => t.metrics)
-          for (m <- metrics) {
-            m.executorRunTime = serviceTime
-            m.jvmGCTime = gcTime - startGCTime
-          }
-          val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
-          execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
+          // Attempt to exit cleanly by informing the driver of our failure.
+          // If anything goes wrong (or this was a fatal exception), we will delegate to
+          // the default uncaught exception handler, which will terminate the Executor.
+          try {
+            logError("Exception in task ID " + taskId, t)
+
+            val serviceTime = System.currentTimeMillis() - taskStart
+            val metrics = attemptedTask.flatMap(t => t.metrics)
+            for (m <- metrics) {
+              m.executorRunTime = serviceTime
+              m.jvmGCTime = gcTime - startGCTime
+            }
+            val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
+            execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))

-          // TODO: Should we exit the whole executor here? On the one hand, the failed task may
-          // have left some weird state around depending on when the exception was thrown, but on
-          // the other hand, maybe we could detect that when future tasks fail and exit then.
-          logError("Exception in task ID " + taskId, t)
+            // Don't forcibly exit unless the exception was inherently fatal, to avoid
+            // stopping other tasks unnecessarily.
+            if (Utils.isFatalError(t)) {
+              ExecutorUncaughtExceptionHandler.uncaughtException(t)
+            }
+          } catch {
+            case t2: Throwable =>
+              ExecutorUncaughtExceptionHandler.uncaughtException(t2)
--- End diff --

Actually just realized we basically already have that comment, just interpreted in a different way :)
[GitHub] spark pull request: use Iterator#size in RDD#count
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/736

use Iterator#size in RDD#count

In RDD#count, we used a while loop to get the size of the Iterator, because Iterator#size used a for loop, which was slightly slower in that version of Scala. The current version of Scala translates the for loop in Iterator#size into `foreach`, which uses a while loop to iterate, so we can use Iterator#size directly now.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/736.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #736

commit 1ebef72d8ea60a65645ad4c73ed03c9c41aa2c85
Author: Wenchen Fan(Cloud) cloud0...@gmail.com
Date: 2014-05-12T07:32:59Z

    use Iterator#size in RDD#count
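The equivalence this PR relies on can be shown in miniature: a hand-rolled while loop over an iterator counts exactly the same elements as a library-provided count. The sketch below is Java for the sake of a runnable example (the PR itself is Scala); the names are illustrative.

```java
import java.util.Iterator;
import java.util.Spliterators;
import java.util.stream.StreamSupport;

class IterCount {
    // Hand-rolled count, in the style RDD#count originally used.
    static long countWithWhile(Iterator<?> it) {
        long n = 0;
        while (it.hasNext()) {
            it.next();
            n += 1;
        }
        return n;
    }

    // Library-provided count, analogous to delegating to Iterator#size.
    static long countWithLibrary(Iterator<?> it) {
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(it, 0), false).count();
    }
}
```

Note that either variant consumes the iterator, which is also true of Scala's `Iterator#size`; in both Spark versions the iterator is traversed exactly once.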
[GitHub] spark pull request: fix broken in link in python docs
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/650#issuecomment-42752309 Thanks Andy - I've merged this.
[GitHub] spark pull request: [Spark-1461] Deferred Expression Evaluation (s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/446#issuecomment-42805496 Merged build triggered.
[GitHub] spark pull request: [Spark-1461] Deferred Expression Evaluation (s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/446#issuecomment-42805513 Merged build started.
[GitHub] spark pull request: use Iterator#size in RDD#count
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/736#issuecomment-42804113 Can one of the admins verify this patch?
[GitHub] spark pull request: Improve build configuration
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/590#issuecomment-42809856 @srowen It has been removed.
[GitHub] spark pull request: Improve build configuration
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/590#issuecomment-42809563 @srowen In some cases, `commons-lang` has multiple version dependencies. `fairscheduler.xml` and `hive-site.xml` should be ignored.
[GitHub] spark pull request: Improve build configuration
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/590#issuecomment-42808115 This still has some changes that I don't know are intended. commons-lang 2.5 should not be a dependency now. I don't know that conf XML files should be ignored by git?
[GitHub] spark pull request: [Spark-1461] Deferred Expression Evaluation (s...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/446#issuecomment-42805599 Rebased to the latest master; can you test it?
[GitHub] spark pull request: Improve build configuration
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/590#issuecomment-42809838 Where does Spark use commons-lang though? It uses commons-lang3. You would declare it as a dependency if it were used, or to resolve a version conflict, but is there evidence of the latter?
[GitHub] spark pull request: Improve build configuration
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/590#issuecomment-42810565 Yea, are they colliding in the assembly jar? Or does Maven resolve to 2.5? The latter should be fine. If they're colliding, then I agree that we may have to manually manage it for tidiness, and state why in a comment.
[GitHub] spark pull request: [SPARK-1688] Propagate PySpark worker stderr t...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/603#discussion_r12390525

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -161,46 +131,38 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
       workerEnv.put("PYTHONPATH", pythonPath)
       daemon = pb.start()

-      // Redirect the stderr to ours
-      new Thread("stderr reader for " + pythonExec) {
-        setDaemon(true)
-        override def run() {
-          scala.util.control.Exception.ignoring(classOf[IOException]) {
-            // FIXME: We copy the stream on the level of bytes to avoid encoding problems.
-            val in = daemon.getErrorStream
-            val buf = new Array[Byte](1024)
-            var len = in.read(buf)
-            while (len != -1) {
-              System.err.write(buf, 0, len)
-              len = in.read(buf)
-            }
-          }
-        }
-      }.start()

       val in = new DataInputStream(daemon.getInputStream)
       daemonPort = in.readInt()

-      // Redirect further stdout output to our stderr
-      new Thread("stdout reader for " + pythonExec) {
-        setDaemon(true)
-        override def run() {
-          scala.util.control.Exception.ignoring(classOf[IOException]) {
-            // FIXME: We copy the stream on the level of bytes to avoid encoding problems.
-            val buf = new Array[Byte](1024)
-            var len = in.read(buf)
-            while (len != -1) {
-              System.err.write(buf, 0, len)
-              len = in.read(buf)
-            }
-          }
-        }
-      }.start()
+      // Redirect worker stdout and stderr
+      redirectWorkerStreams(in, daemon.getErrorStream)
     } catch {
-      case e: Throwable => {
+      case e: Throwable =>
+
+        // If the daemon exists, wait for it to finish and get its stderr
+        val stderr = Option(daemon)
+          .flatMap { d => Utils.getStderr(d, PROCESS_WAIT_TIMEOUT_MS) }
+          .getOrElse("")
+
        stopDaemon()
-        throw e
-      }
+
+        if (stderr != "") {
+          val formattedStderr = stderr.replace("\n", "\n  ")
+          val errorMessage = s"""
+            |Error from python worker:
+            |  $formattedStderr
+            |PYTHONPATH was:
+            |  $pythonPath
+            |$e
+
+          // Append error message from python daemon, but keep original stack trace
--- End diff --

We're not hiding the exception; all we're doing is tacking a message on top of it. Not exactly sure what you mean?
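The reader threads removed in this diff follow a common pattern: copy a child process's output byte-by-byte on a daemon thread (to avoid encoding problems), ignoring the IOException raised when the process goes away. A minimal Java sketch of that pattern, with illustrative names rather than the PR's actual code:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamRedirect {
    // Copy a stream byte-by-byte on a daemon thread, swallowing IOException,
    // in the style of the original PythonWorkerFactory reader threads.
    static Thread redirect(InputStream in, OutputStream out, String name) {
        Thread t = new Thread(() -> {
            try {
                byte[] buf = new byte[1024];
                int len;
                while ((len = in.read(buf)) != -1) {
                    out.write(buf, 0, len);
                }
            } catch (IOException ignored) {
                // The worker died; there is nothing useful to do with the error here.
            }
        }, name);
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Copying raw bytes rather than decoding to characters is what the original `// FIXME` comments alluded to: it sidesteps charset mismatches between the Python worker and the JVM.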
[GitHub] spark pull request: [SPARK-1745] Move interrupted flag from TaskCo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/675#issuecomment-42462673 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14776/
[GitHub] spark pull request: [Docs] Update YARN docs
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/701#issuecomment-42613881 Merged build started.
[GitHub] spark pull request: [Spark-1461] Deferred Expression Evaluation (s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/446#issuecomment-42811549 Merged build finished.
[GitHub] spark pull request: SPARK-1544 Add support for deep decision trees...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/475#issuecomment-42459534 Build triggered.
[GitHub] spark pull request: [Spark-1461] Deferred Expression Evaluation (s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/446#issuecomment-42811551 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14896/
[GitHub] spark pull request: Use numpy directly for matrix multiply.
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/687#issuecomment-42512204 LGTM. Thanks!
[GitHub] spark pull request: SPARK-1668: Add implicit preference as an opti...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/597#discussion_r12388603

--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MovieLensALS.scala ---
@@ -88,7 +92,27 @@ object MovieLensALS {

     val ratings = sc.textFile(params.input).map { line =>
       val fields = line.split("::")
-      Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble)
+      if (params.implicitPrefs) {
+        /**
--- End diff --

This is not JavaDoc, so please remove the last `*`.
[GitHub] spark pull request: [SPARK-1754] [SQL] Add missing arithmetic DSL ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/689#discussion_r12416177

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala ---
@@ -381,6 +381,30 @@ class ExpressionEvaluationSuite extends FunSuite {
     checkEvaluation(Add(c1, Literal(null, IntegerType)), null, row)
     checkEvaluation(Add(Literal(null, IntegerType), c2), null, row)
     checkEvaluation(Add(Literal(null, IntegerType), Literal(null, IntegerType)), null, row)
+
+    checkEvaluation(-c1, -1, row)
+    checkEvaluation(c1 + c2, 3, row)
+    checkEvaluation(c1 - c2, -1, row)
+    checkEvaluation(c1 * c2, 2, row)
+    checkEvaluation(c1 / c2, 0, row)
+    checkEvaluation(c1 % c2, 1, row)
+  }
+
+  test("BinaryPredicate") {
--- End diff --

Ah, I see, you are right. I'll remove the test.
[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/686#discussion_r12408196

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1055,10 +1055,16 @@ class DAGScheduler(
           // This is the only job that uses this stage, so fail the stage if it is running.
           val stage = stageIdToStage(stageId)
           if (runningStages.contains(stage)) {
-            taskScheduler.cancelTasks(stageId, shouldInterruptThread)
-            val stageInfo = stageToInfos(stage)
-            stageInfo.stageFailed(failureReason)
-            listenerBus.post(SparkListenerStageCompleted(stageToInfos(stage)))
+            try { // cancelTasks will fail if a SchedulerBackend does not implement killTask
+              taskScheduler.cancelTasks(stageId, shouldInterruptThread)
+            } catch {
+              case e: UnsupportedOperationException =>
+                logInfo(s"Could not cancel tasks for stage $stageId", e)
+            } finally {
+              val stageInfo = stageToInfos(stage)
+              stageInfo.stageFailed(failureReason)
--- End diff --

Why do this part even when the SchedulerBackend doesn't support cancellation?
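The control flow being questioned reduces to a small sketch: the catch absorbs the UnsupportedOperationException, and the finally block marks the stage failed whether or not cancellation succeeded. The Java sketch below uses hypothetical names, not Spark's DAGScheduler API:

```java
class StageCancel {
    // Mirrors the try/catch/finally shape in the diff above:
    // cancellation may be unsupported, but the stage is marked
    // failed in either case.
    static String failStage(Runnable cancelTasks) {
        StringBuilder log = new StringBuilder();
        try {
            cancelTasks.run();
        } catch (UnsupportedOperationException e) {
            log.append("could not cancel tasks; ");
        } finally {
            // Runs whether or not cancelTasks succeeded, which is
            // exactly the behavior the review comment questions.
            log.append("stage failed");
        }
        return log.toString();
    }
}
```

The review question is whether the finally branch should run at all when cancellation is unsupported, since the stage's tasks keep running even after it is marked failed.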
[GitHub] spark pull request: Improve build configuration
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/590#issuecomment-42811589 @srowen I will submit a new Pull Request to solve this problem.
[GitHub] spark pull request: Improve build configuration
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/590#issuecomment-42809975

```
[INFO] |  +- org.apache.hadoop:hadoop-client:jar:1.0.4:compile
[INFO] |  |  \- org.apache.hadoop:hadoop-core:jar:1.0.4:compile
[INFO] |  |     +- xmlenc:xmlenc:jar:0.52:compile
[INFO] |  |     +- org.apache.commons:commons-math:jar:2.1:compile
[INFO] |  |     +- commons-configuration:commons-configuration:jar:1.6:compile
[INFO] |  |     |  +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |  |     |  +- commons-lang:commons-lang:jar:2.4:compile
[INFO] |  |     |  +- commons-digester:commons-digester:jar:1.8:compile
[INFO] |  |     |  |  \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] |  |     |  \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] |  |     +- commons-el:commons-el:jar:1.0:compile
[INFO] |  |     +- hsqldb:hsqldb:jar:1.8.0.10:compile
[INFO] |  |     \- oro:oro:jar:2.0.8:compile
```
[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...
GitHub user larvaboy opened a pull request: https://github.com/apache/spark/pull/737

Implement ApproximateCountDistinct for SparkSql

Add the implementation of ApproximateCountDistinct to SparkSql. We use the HyperLogLog algorithm implemented in stream-lib and do the count in two phases: 1) counting the number of distinct elements in each partition, and 2) merging the HyperLogLog results from the different partitions. A simple serializer and test cases are added as well.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/larvaboy/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/737.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #737

commit 871abec814fa15e9507a98ca1b4718429781efd7
Author: larvaboy larva...@gmail.com
Date: 2014-05-10T23:20:10Z

    Fix a couple of minor typos.

commit f73651c8dc23fdd83b1bfb35bda135449f84c5c5
Author: larvaboy larva...@gmail.com
Date: 2014-05-11T23:15:35Z

    Fix a minor typo in the toString method of the Count case class.

commit 25b46046c5e7a772dd25f2bd7ae711c9dabd3959
Author: larvaboy larva...@gmail.com
Date: 2014-05-12T09:25:59Z

    Add SparkSql serializer for HyperLogLog.

commit 80f1da4a48d3929272a4436aee26531f03eab4aa
Author: larvaboy larva...@gmail.com
Date: 2014-05-12T09:38:16Z

    Add ApproximateCountDistinct aggregates and functions. We use stream-lib's HyperLogLog to approximately count the number of distinct elements in each partition, and merge the HyperLogLogs to compute the final result. If the expressions can not be successfully broken apart, we fall back to the exact CountDistinct.

commit 234a270a5e6766ad41b4fb49a54d42ddb4643264
Author: larvaboy larva...@gmail.com
Date: 2014-05-12T04:58:54Z

    Add the parser for the approximate count.
commit cf73b921cfa901ffb40c848ca1961378475fea1a
Author: larvaboy larva...@gmail.com
Date: 2014-05-12T05:05:15Z

    Add a test case for count distinct and approximate count distinct.
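The two-phase plan described above (per-partition HyperLogLog sketches, merged at the end) works because merging two sketches is just an element-wise max over their register arrays. The toy Java sketch below illustrates that merge property; it stands in for stream-lib's implementation, uses hypothetical names, and omits the cardinality-estimation formula.

```java
// Toy HyperLogLog register sketch: phase 1 fills per-partition registers,
// phase 2 merges them with an element-wise max. Illustrative only; the PR
// uses stream-lib's full implementation.
class Hll {
    static final int P = 10;             // 2^10 = 1024 registers
    final byte[] regs = new byte[1 << P];

    // Phase 1: record one (already hashed) element.
    void add(long hash) {
        int idx = (int) (hash >>> (64 - P));            // top P bits pick a register
        long rest = hash << P;                          // remaining bits
        int rank = Long.numberOfLeadingZeros(rest) + 1; // position of first 1-bit
        if (rank > regs[idx]) {
            regs[idx] = (byte) rank;
        }
    }

    // Phase 2: merging two sketches is an element-wise register max,
    // so partial results from different partitions combine losslessly.
    Hll merge(Hll other) {
        Hll out = new Hll();
        for (int i = 0; i < regs.length; i++) {
            out.regs[i] = (byte) Math.max(regs[i], other.regs[i]);
        }
        return out;
    }
}
```

Because max is associative and commutative, partitions can be merged in any order, which is what makes the aggregate safe to compute distributively.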
[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/737#issuecomment-42817556 Can one of the admins verify this patch?
[GitHub] spark pull request: L-BFGS Documentation
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/702#issuecomment-42735249 Merged build started.
[GitHub] spark pull request: SPARK-1786: Edge Partition Serialization
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/724#issuecomment-42756922 Build started.
[GitHub] spark pull request: SPARK-1786: Edge Partition Serialization
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/724#issuecomment-42790068 Alright, sounds good. @ankurdave or @rxin can you take a quick look?
[GitHub] spark pull request: [SPARK-1690] Tolerating empty elements when sa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/644#issuecomment-42753678 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14871/
[GitHub] spark pull request: [SPARK-1774] Respect SparkSubmit --jars on YAR...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/710#issuecomment-42700238 Merged build started.
[GitHub] spark pull request: Typo fix: fetchting -> fetching
GitHub user ash211 opened a pull request: https://github.com/apache/spark/pull/680

Typo fix: fetchting -> fetching

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ash211/spark patch-3

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/680.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #680

commit 9ce3746c31f4ad66b5aa0f82d9fd59bb8e92e759
Author: Andrew Ash and...@andrewash.com
Date: 2014-05-07T08:46:16Z

    Typo fix: fetchting -> fetching
[GitHub] spark pull request: SPARK-897: preemptively serialize closures
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-42703885 Merged build triggered.
[GitHub] spark pull request: SPARK-1565 (Addendum): Replace `run-example` w...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/704#issuecomment-42635008 Merged build finished. All automated tests passed.
[GitHub] spark pull request: [SPARK-1460] Returning SchemaRDD instead of no...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/448#issuecomment-42393186 Merged build triggered.
[GitHub] spark pull request: Added SparkGCE Script for Version 0.9.1
Github user sigmoidanalytics commented on the pull request: https://github.com/apache/spark/pull/681#issuecomment-42833253 Did any of the admins have a chance to check it out? Let me know if you want me to modify anything in it.
[GitHub] spark pull request: Merge addWithoutResize and rehashIfNeeded into...
GitHub user ArcherShao opened a pull request: https://github.com/apache/spark/pull/738 Merge addWithoutResize and rehashIfNeeded into one function. It will be safer to add an element; users will no longer need to call rehashIfNeeded() after addWithoutResize(). You can merge this pull request into a Git repository by running: $ git pull https://github.com/ArcherShao/sparksc branch_graphx Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/738.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #738 commit a8ee11664da05691ce72c79115a409e27a250a8e Author: ArcherShao hunany...@gmail.com Date: 2014-05-12T14:05:03Z Merge addWithoutResize and rehashIfNeeded into one function.
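The safety argument in the PR description can be sketched with a toy open-addressing set. This is an illustrative simplification, not Spark's actual OpenHashSet; all names here are hypothetical:

```scala
// Simplified open-addressing set showing why merging addWithoutResize and
// rehashIfNeeded into a single add() is safer: callers can no longer forget
// the rehash step and overfill the table.
class SimpleOpenHashSet(initialCapacity: Int = 8) {
  private var data = new Array[Long](initialCapacity)
  private var used = new Array[Boolean](initialCapacity)
  private var size = 0

  private def insertNoResize(k: Long): Unit = {
    var pos = (k % data.length).toInt.abs
    while (used(pos) && data(pos) != k) pos = (pos + 1) % data.length
    if (!used(pos)) { used(pos) = true; data(pos) = k; size += 1 }
  }

  private def rehashIfNeeded(): Unit = if (size * 2 >= data.length) {
    val (oldData, oldUsed) = (data, used)
    data = new Array[Long](oldData.length * 2)
    used = new Array[Boolean](oldUsed.length * 2)
    size = 0
    for (i <- oldData.indices if oldUsed(i)) insertNoResize(oldData(i))
  }

  // The merged operation proposed by the PR: insert, then resize if needed.
  def add(k: Long): Unit = { insertNoResize(k); rehashIfNeeded() }

  def contains(k: Long): Boolean = {
    var pos = (k % data.length).toInt.abs
    var probes = 0
    while (probes < data.length) {
      if (used(pos) && data(pos) == k) return true
      if (!used(pos)) return false
      pos = (pos + 1) % data.length; probes += 1
    }
    false
  }
}
```

Because `add` always finishes with the load-factor check, the invariant "table never fills up" holds after every public call, which is the point the PR author is making.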
[GitHub] spark pull request: Merge addWithoutResize and rehashIfNeeded into...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/738#issuecomment-42837187 Can one of the admins verify this patch?
[GitHub] spark pull request: L-BFGS Documentation
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/702#issuecomment-42844008 LGTM. Thanks!
[GitHub] spark pull request: SPARK-1803 Replaced colon in filenames with a ...
GitHub user sslavic opened a pull request: https://github.com/apache/spark/pull/739 SPARK-1803 Replaced colon in filenames with a dash This patch replaces colon in several filenames with dash to make these filenames Windows compatible. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sslavic/spark SPARK-1803 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/739.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #739 commit 9e1467dd04dda4bf7886c33870232fdbb0bf70bd Author: Stevo Slavić ssla...@gmail.com Date: 2014-05-12T14:25:50Z Replaced colon in file name with dash This patch replaces colon in file name with dash to make file name Windows compatible. commit 2fc785454fb8e45095bcae47aecda7905969573b Author: Stevo Slavić ssla...@gmail.com Date: 2014-05-12T14:27:14Z Replaced colon in file name with dash This patch replaces colon in file name with dash to make file name Windows compatible. commit 84f5d2fd168829474a28e0b3a4edb75067df1c25 Author: Stevo Slavić ssla...@gmail.com Date: 2014-05-12T14:28:17Z Replaced colon in file name with dash This patch replaces colon in file name with dash to make file name Windows compatible. commit ece0507fa498a232f24ad1ce903536a976cf271b Author: Stevo Slavić ssla...@gmail.com Date: 2014-05-12T14:29:27Z Replaced colon in file name with dash This patch replaces colon in file name with dash to make file name Windows compatible. commit 028e48af7ff105150a3f375c0076677efe78a7ac Author: Stevo Slavić ssla...@gmail.com Date: 2014-05-12T14:30:20Z Replaced colon in file name with dash This patch replaces colon in file name with dash to make file name Windows compatible.
commit b58512617957a9a06aa1fb288815aa82e1ce40d2 Author: Stevo Slavić ssla...@gmail.com Date: 2014-05-12T14:32:23Z Replaced colon in file name with dash This patch replaces colon in file name with dash to make file name Windows compatible. commit d6a3e2cf957582be7615f9db8ebd661fadcc9c78 Author: Stevo Slavić ssla...@gmail.com Date: 2014-05-12T14:34:37Z Replaced colon in file name with dash This patch replaces colon in file name with dash to make file name Windows compatible. commit 004f8bb0496eb65c57098ade96e685995e8cd660 Author: Stevo Slavić ssla...@gmail.com Date: 2014-05-12T14:36:14Z Replaced colon in file name with dash This patch replaces colon in file name with dash to make file name Windows compatible. commit 4774580ab7ac8d609b6505fb27afab2c3d20e1d1 Author: Stevo Slavić ssla...@gmail.com Date: 2014-05-12T14:40:25Z Replaced colon in file name with dash This patch replaces colon in file name with dash to make file name Windows compatible. commit 40a962103a946b9135b5645e25c486fe43287bb2 Author: Stevo Slavić ssla...@gmail.com Date: 2014-05-12T14:41:38Z Replaced colon in file name with dash This patch replaces colon in file name with dash to make file name Windows compatible. commit 401d99eb733d889e3adb5a9874b52b163d7a17ce Author: Stevo Slavić ssla...@gmail.com Date: 2014-05-12T14:42:27Z Replaced colon in file name with dash This patch replaces colon in file name with dash to make file name Windows compatible. commit a49801fc820b6e58b2ce49cdff211c1fd16648a5 Author: Stevo Slavić ssla...@gmail.com Date: 2014-05-12T14:45:46Z Replaced colon in file name with dash This patch replaces colon in file name with dash to make file name Windows compatible. commit c5b5083e1f6b8fc9c9cb586e0ee44c997e4c03e8 Author: Stevo Slavić ssla...@gmail.com Date: 2014-05-12T14:51:56Z Replaced colon in file name with dash This patch replaces colon in file name with dash to make file name Windows compatible.
commit 8f5bf7fdb3743ea02093c64bb966f7d3c2d4a8fe Author: Stevo Slavić ssla...@gmail.com Date: 2014-05-12T14:54:19Z Replaced colon in file name with dash This patch replaces colon in file name with dash to make file name Windows compatible. commit 1c5dfff57129aaf0566d15016712798328d9e069 Author: Stevo Slavić ssla...@gmail.com Date: 2014-05-12T14:58:37Z Replaced colon in file name with dash This patch replaces colon in file name with dash to make file name Windows compatible. commit 2b12776b5b2381f2da695c3e3fa272c2a8a89a2a Author: Stevo Slavić ssla...@gmail.com Date: 2014-05-12T15:01:25Z Fixed a typo in file name This patch fixes a typo in file name - 'Partiton' is replaced with 'Partition'.
[GitHub] spark pull request: Add a function that can build an EdgePartition...
GitHub user ArcherShao opened a pull request: https://github.com/apache/spark/pull/740 Add a function that can build an EdgePartition faster. If the user can ensure that edges are added in order, this function builds an EdgePartition faster. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ArcherShao/sparksc branch_graphx_Edge Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/740.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #740 commit f73072127e6f4d99e9d2c03a850053cecbb1e2a7 Author: ArcherShao hunany...@gmail.com Date: 2014-05-12T15:03:19Z Add a function that can build an EdgePartion faster.
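The idea in this PR can be sketched as follows: when edges arrive pre-sorted by source id, a per-source index can be built in one linear pass with no sort step. `Edge` and the `Map` index below are simplified stand-ins for GraphX's EdgePartition internals, not the actual API:

```scala
// Minimal sketch: build a "first position of each source id" index over a
// pre-sorted edge array in a single pass, skipping the sort an unordered
// builder would need.
final case class Edge(src: Long, dst: Long)

def buildIndexFromSorted(edges: IndexedSeq[Edge]): Map[Long, Int] = {
  require(edges.indices.drop(1).forall(i => edges(i - 1).src <= edges(i).src),
    "edges must be pre-sorted by source id")
  var index = Map.empty[Long, Int]
  // Record the position where each source id's run of edges begins.
  for (i <- edges.indices if !index.contains(edges(i).src))
    index += edges(i).src -> i
  index
}
```

The `require` makes the caller's promise explicit, which is the contract the PR description relies on ("if user can make sure every edge is added in order").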
[GitHub] spark pull request: Add a function that can build an EdgePartition...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/740#issuecomment-42845140 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-1779] add warning when memoryFractio...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/714#issuecomment-42729062 Thanks, this looks good.
[GitHub] spark pull request: SPARK-1772 Stop catching Throwable, let Execut...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/715#issuecomment-42858235 Merged build triggered.
[GitHub] spark pull request: Fixed streaming examples docs to use run-examp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/722#issuecomment-42739222 Merged build finished. All automated tests passed.
[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/741#issuecomment-42861465 Jenkins, test this please. Thanks! This is just in time.
[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1
GitHub user berngp opened a pull request: https://github.com/apache/spark/pull/741 SPARK-1806: Upgrade Mesos dependency to 0.18.1 Enabled Mesos (0.18.1) dependency with shaded protobuf Why is this needed? Avoids any protobuf version collision between Mesos and any other dependency in Spark e.g. Hadoop HDFS 2.2+ or 1.0.4. Ticket: https://issues.apache.org/jira/browse/SPARK-1806 * Should close https://issues.apache.org/jira/browse/SPARK-1433 Author berngp You can merge this pull request into a Git repository by running: $ git pull https://github.com/berngp/spark feature/SPARK-1806 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/741.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #741 commit 5d706469a1accada1c43471003c773d4e0e9 Author: Bernardo Gomez Palacio bernardo.gomezpala...@gmail.com Date: 2014-05-09T20:37:09Z SPARK-1806: Upgrade Mesos dependency to 0.18.1 Enabled Mesos (0.18.1) dependency with shaded protobuf Why is this needed? Avoids any protobuf version collision between Mesos and any other dependency in Spark e.g. Hadoop HDFS 2.2+ or 1.0.4. Ticket: https://issues.apache.org/jira/browse/SPARK-1806 * Should close https://issues.apache.org/jira/browse/SPARK-1433 Author berngp
[GitHub] spark pull request: SPARK-1772 Stop catching Throwable, let Execut...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/715#issuecomment-42863004 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14897/
[GitHub] spark pull request: [SPARK-1519] Support minPartitions param of wh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/697#issuecomment-42584240 Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-1772 Stop catching Throwable, let Execut...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/715#issuecomment-42863001 Merged build finished.
[GitHub] spark pull request: [SPARK-1755] Respect SparkSubmit --name on YAR...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/699#issuecomment-42594094 Merged build triggered.
[GitHub] spark pull request: SPARK-1565, update examples to be used with sp...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/552#issuecomment-42527532 @pwendell Done !
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/717#issuecomment-42719784 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14859/
[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/741#issuecomment-42861305 Can one of the admins verify this patch?
[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/737#issuecomment-42862270 This patch duplicates some logic that already exists elsewhere in Spark - would you mind updating it to use this class?: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/SerializableHyperLogLog.scala
[GitHub] spark pull request: SPARK-1786: Edge Partition Serialization
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/724#issuecomment-42865046 To fix this we can just add the `org.apache.spark.graphx.util.collection.PrimitiveKeyOpenHashMap` class here: https://github.com/apache/spark/blob/master/project/MimaBuild.scala#L77 Joey - mind re-opening this?
[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/741#issuecomment-42866824 Merged build finished.
[GitHub] spark pull request: SPARK-1772 Stop catching Throwable, let Execut...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/715#issuecomment-42866864 This passed all tests except for the (bogus) MIMA issue, so I'll merge it.
[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/741#issuecomment-42866826 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14898/
[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/741#issuecomment-42867063 This is an incorrect failure due to a bad merge in master. I'm going to merge this.
[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/741
[GitHub] spark pull request: [SPARK-1620] Handle uncaught exceptions in fun...
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/622#issuecomment-42861387 Yes, I'll do a little refactoring after https://github.com/apache/spark/pull/715 is merged.
[GitHub] spark pull request: SPARK-1786: Reopening PR 724
GitHub user jegonzal opened a pull request: https://github.com/apache/spark/pull/742 SPARK-1786: Reopening PR 724 Addressing issue in MimaBuild.scala. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jegonzal/spark edge_partition_serialization Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/742.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #742 commit 67dac22884b098b72c277dbe6e344da796a5321c Author: Joseph E. Gonzalez joseph.e.gonza...@gmail.com Date: 2014-05-10T01:54:56Z Making EdgePartition serializable. commit bb7f548542d58ee6ac2dbdf868fea165fdf4f415 Author: Ankur Dave ankurd...@gmail.com Date: 2014-05-10T03:09:48Z Add failing test for EdgePartition Kryo serialization commit b0a525a7f48a6b13cf8687e5e6d8ba3d3bf852f5 Author: Ankur Dave ankurd...@gmail.com Date: 2014-05-10T03:12:38Z Disable reference tracking to fix serialization test commit d8b70fbca17534eb8f60e8feb4a9fdd5996fdcd8 Author: Joseph E. Gonzalez joseph.e.gonza...@gmail.com Date: 2014-05-12T18:20:49Z addressing missing exclusion in MimaBuild.scala
[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/686#issuecomment-42868032 Merged build started.
[GitHub] spark pull request: SPARK-1786: Reopening PR 724
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/742#issuecomment-42869225 Merged build triggered.
[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/741#issuecomment-42861915 Merged build started.
[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/737#issuecomment-42869356 @pwendell, I don't think that will work as Spark SQL does its own serialization for shuffles sometimes using Kryo and I don't think that SerializableHyperLogLog works with Kryo.
[GitHub] spark pull request: SPARK-1786: Reopening PR 724
Github user jegonzal commented on the pull request: https://github.com/apache/spark/pull/742#issuecomment-42868913 @ankurdave and @pwendell I am reopening the PR 724 to address the issue with MimaBuild. I believe I made the required changes but how can I verify?
[GitHub] spark pull request: SPARK-1786: Reopening PR 724
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/742#issuecomment-42869243 Merged build started.
[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/737#discussion_r12545901 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala --- @@ -269,6 +308,34 @@ case class CountFunction(expr: Expression, base: AggregateExpression) extends Ag override def eval(input: Row): Any = count } +case class ApproxCountDistinctPartitionFunction(expr: Expression, base: AggregateExpression) +extends AggregateFunction { + def this() = this(null, null) // Required for serialization. + + private val hyperLogLog = new HyperLogLog(ApproxCountDistinct.RelativeSD) + + override def update(input: Row): Unit = { +val evaluatedExpr = expr.eval(input) +Option(evaluatedExpr).foreach(hyperLogLog.offer(_)) --- End diff -- I'm normally all for the Option pattern, but in this case you are probably incurring more object allocations than we want in the critical path of query execution. I'd just use an `if` here.
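The review comment above can be illustrated side by side: `Option(...)` allocates a `Some` for every non-null row, while a plain null check does not. `offer` below is a stand-in for `HyperLogLog.offer`; the surrounding types are simplified away:

```scala
// Both update paths have identical semantics; only the per-row allocation
// behavior differs, which matters in a query-execution hot loop.
val seen = scala.collection.mutable.ArrayBuffer[Any]()
def offer(v: Any): Unit = seen += v

// The pattern in the PR: one Some allocation per non-null input.
def updateWithOption(evaluatedExpr: Any): Unit =
  Option(evaluatedExpr).foreach(offer)

// The suggested alternative: same null-skipping behavior, no allocation.
def updateWithIf(evaluatedExpr: Any): Unit =
  if (evaluatedExpr != null) offer(evaluatedExpr)
```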
[GitHub] spark pull request: Add a function that can build an EdgePartition...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/740#issuecomment-42869409 Jenkins, add to whitelist.
[GitHub] spark pull request: Add a function that can build an EdgePartition...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/740#issuecomment-42869892 Merged build triggered.
[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/737#discussion_r12546014 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala --- @@ -269,6 +308,34 @@ case class CountFunction(expr: Expression, base: AggregateExpression) extends Ag override def eval(input: Row): Any = count } +case class ApproxCountDistinctPartitionFunction(expr: Expression, base: AggregateExpression) +extends AggregateFunction { + def this() = this(null, null) // Required for serialization. + + private val hyperLogLog = new HyperLogLog(ApproxCountDistinct.RelativeSD) + + override def update(input: Row): Unit = { +val evaluatedExpr = expr.eval(input) +Option(evaluatedExpr).foreach(hyperLogLog.offer(_)) + } + + override def eval(input: Row): Any = hyperLogLog +} + +case class ApproxCountDistinctMergeFunction(expr: Expression, base: AggregateExpression) +extends AggregateFunction { + def this() = this(null, null) // Required for serialization. + + private val hyperLogLog = new HyperLogLog(ApproxCountDistinct.RelativeSD) + + override def update(input: Row): Unit = { +val evaluatedExpr = expr.eval(input) + Option(evaluatedExpr.asInstanceOf[HyperLogLog]).foreach(hyperLogLog.addAll(_)) --- End diff -- Will this ever be null?
[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/737#discussion_r12546829 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala --- @@ -166,10 +167,48 @@ case class CountDistinct(expressions: Seq[Expression]) extends AggregateExpressi override def references = expressions.flatMap(_.references).toSet override def nullable = false override def dataType = IntegerType - override def toString = s"COUNT(DISTINCT ${expressions.mkString(",")}})" + override def toString = s"COUNT(DISTINCT ${expressions.mkString(",")})" override def newInstance() = new CountDistinctFunction(expressions, this) } +case class ApproxCountDistinctPartition(child: Expression) +extends AggregateExpression with trees.UnaryNode[Expression] { --- End diff -- style feedback: 2 space indenting for extends (We only do 4 space indenting for arguments)
[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/737#discussion_r12546769 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala --- @@ -166,10 +167,48 @@ case class CountDistinct(expressions: Seq[Expression]) extends AggregateExpressi override def references = expressions.flatMap(_.references).toSet override def nullable = false override def dataType = IntegerType - override def toString = s"COUNT(DISTINCT ${expressions.mkString(",")}})" + override def toString = s"COUNT(DISTINCT ${expressions.mkString(",")})" override def newInstance() = new CountDistinctFunction(expressions, this) } +case class ApproxCountDistinctPartition(child: Expression) +extends AggregateExpression with trees.UnaryNode[Expression] { + override def references = child.references + override def nullable = false + override def dataType = child.dataType + override def toString = s"APPROXIMATE COUNT(DISTINCT $child)" + override def newInstance() = new ApproxCountDistinctPartitionFunction(child, this) +} + +case class ApproxCountDistinctMerge(child: Expression) +extends AggregateExpression with trees.UnaryNode[Expression] { + override def references = child.references + override def nullable = false + override def dataType = IntegerType + override def toString = s"APPROXIMATE COUNT(DISTINCT $child)" + override def newInstance() = new ApproxCountDistinctMergeFunction(child, this) +} + +object ApproxCountDistinct { + val RelativeSD = 0.05 --- End diff -- Having a default here is reasonable, but we should probably expose this to the user as well. Maybe two versions in the parser?
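One way to act on this suggestion, sketched in isolation (this is a hypothetical shape, not the API that was eventually merged; `Expression` here is a stand-in trait so the snippet compiles on its own):

```scala
// Stand-in for Catalyst's Expression, just so the sketch compiles alone.
trait Expression

object ApproxCountDistinct {
  // Same default the PR hardcodes, kept in the companion object.
  val DefaultRelativeSD = 0.05
}

// The precision rides along as a constructor parameter with a default,
// so a parser could accept both APPROXIMATE COUNT(DISTINCT x) and a
// second form that passes an explicit relative standard deviation.
case class ApproxCountDistinct(
    child: Expression,
    relativeSD: Double = ApproxCountDistinct.DefaultRelativeSD)
  extends Expression
```

With this shape, the "two versions in the parser" the reviewer mentions would just construct the expression with or without the second argument.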
[GitHub] spark pull request: Add a function that can build an EdgePartition...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/740#issuecomment-42872474 Merged build finished.
[GitHub] spark pull request: SPARK-1786: Reopening PR 724
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/742#issuecomment-42873807 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14900/
[GitHub] spark pull request: SPARK-1786: Reopening PR 724
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/742#issuecomment-42873806 Merged build finished.
[GitHub] spark pull request: SPARK-1786: Reopening PR 724
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/742#discussion_r12547490 --- Diff: project/MimaBuild.scala --- @@ -75,6 +75,8 @@ object MimaBuild { excludeSparkClass("rdd.ClassTags") ++ excludeSparkClass("util.XORShiftRandom") ++ excludeSparkClass("graphx.EdgeRDD") ++ + excludeSparkClass("graphx.util.collection.PrimitiveKeyOpenHashMap") --- End diff -- You need to have `++` operators here... as is I think this has removed a bunch of the other MIMA checks :)
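The reason a missing `++` silently drops checks: a Scala block's value is its last expression, so earlier expressions that are not chained onto it are evaluated and discarded. A minimal illustration in plain Scala (unrelated to the MiMa DSL itself):

```scala
object ChainDemo {
  def brokenChain: Seq[Int] = {
    Seq(1, 2) ++
    Seq(3)        // chained pair: evaluated, then thrown away
    Seq(4, 5)     // NOT chained: only this last expression is returned
  }

  def fixedChain: Seq[Int] =
    Seq(1, 2) ++
    Seq(3) ++
    Seq(4, 5)

  def main(args: Array[String]): Unit = {
    println(brokenChain)  // List(4, 5)
    println(fixedChain)   // List(1, 2, 3, 4, 5)
  }
}
```

In the MiMa build file the same effect would keep only the trailing exclusions and drop everything before the break in the `++` chain, which is exactly what the reviewer is pointing out.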
[GitHub] spark pull request: SPARK-1806: Upgrade Mesos dependency to 0.18.1
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/741#issuecomment-42861900 Merged build triggered.
[GitHub] spark pull request: Add a function that can build an EdgePartition...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/740#issuecomment-42872475 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14901/
[GitHub] spark pull request: Typo: resond -> respond
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/743#issuecomment-42875483 Merged build started.
[GitHub] spark pull request: [Spark-1461] Deferred Expression Evaluation (s...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/446#issuecomment-42874850 @pwendell any idea what is wrong with MIMA?
[GitHub] spark pull request: SPARK-1786: Reopening PR 724
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/742#issuecomment-42874869 Merged build triggered.
[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/737#issuecomment-42870675 Bypassing SerializableHyperLogLog has a few benefits: 1. Less memory usage because we don't need the wrapper. 2. Works with Spark SQL's internal serializer. 3. stream-lib will actually make HyperLogLog serializable next release - so SerializableHyperLogLog will be gone
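For reference, the wrapper existed because `HyperLogLog` was not `java.io.Serializable` at the time. A sketch of round-tripping a sketch through its own byte representation instead, with no wrapper (method names `getBytes` and `Builder.build` as in stream-lib 2.x; treat them as assumptions):

```scala
import com.clearspring.analytics.stream.cardinality.HyperLogLog

object HllRoundTrip {
  // Serialize a sketch to its compact byte form and rebuild it,
  // with no SerializableHyperLogLog wrapper in between.
  def roundTripMatches(): Boolean = {
    val hll = new HyperLogLog(0.05)
    (1 to 1000).foreach(i => hll.offer(i))

    val bytes: Array[Byte] = hll.getBytes
    val restored = HyperLogLog.Builder.build(bytes)

    // The restored sketch carries the same registers, so its
    // estimate matches the original exactly.
    restored.cardinality() == hll.cardinality()
  }

  def main(args: Array[String]): Unit =
    println(roundTripMatches())
}
```

A custom serializer (such as Spark SQL's internal one) can apply the same byte-level round trip, which is benefit 2 in the comment above.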
[GitHub] spark pull request: Typo: resond -> respond
GitHub user ash211 opened a pull request: https://github.com/apache/spark/pull/743 Typo: resond -> respond You can merge this pull request into a Git repository by running: $ git pull https://github.com/ash211/spark patch-4 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/743.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #743 commit c959f3be4ee85f41392875760612671f452bc843 Author: Andrew Ash and...@andrewash.com Date: 2014-05-12T19:16:16Z Typo: resond -> respond
[GitHub] spark pull request: Typo: resond -> respond
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/743#issuecomment-42875468 Merged build triggered.
[GitHub] spark pull request: SPARK-1772 Stop catching Throwable, let Execut...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/715#issuecomment-42858109 Addressed all comments and took RFC out of the PR title.
[GitHub] spark pull request: SPARK-1786: Reopening PR 724
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/742#issuecomment-42879473 Merged build finished. All automated tests passed.
[GitHub] spark pull request: Implement ApproximateCountDistinct for SparkSq...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/737#discussion_r12546858 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala --- @@ -166,10 +167,48 @@ case class CountDistinct(expressions: Seq[Expression]) extends AggregateExpressi override def references = expressions.flatMap(_.references).toSet override def nullable = false override def dataType = IntegerType - override def toString = s"COUNT(DISTINCT ${expressions.mkString(",")}})" + override def toString = s"COUNT(DISTINCT ${expressions.mkString(",")})" override def newInstance() = new CountDistinctFunction(expressions, this) } +case class ApproxCountDistinctPartition(child: Expression) +extends AggregateExpression with trees.UnaryNode[Expression] { + override def references = child.references + override def nullable = false + override def dataType = child.dataType + override def toString = s"APPROXIMATE COUNT(DISTINCT $child)" + override def newInstance() = new ApproxCountDistinctPartitionFunction(child, this) +} + +case class ApproxCountDistinctMerge(child: Expression) +extends AggregateExpression with trees.UnaryNode[Expression] { + override def references = child.references + override def nullable = false + override def dataType = IntegerType + override def toString = s"APPROXIMATE COUNT(DISTINCT $child)" + override def newInstance() = new ApproxCountDistinctMergeFunction(child, this) +} + +object ApproxCountDistinct { + val RelativeSD = 0.05 +} + +case class ApproxCountDistinct(child: Expression) +extends PartialAggregate with trees.UnaryNode[Expression] { + override def references = child.references + override def nullable = false + override def dataType = IntegerType + override def toString = s"APPROXIMATE COUNT(DISTINCT $child)" + + override def asPartial: SplitEvaluation = { +val partialCount = Alias(ApproxCountDistinctPartition(child), + "PartialApproxCountDistinct")() --- End diff -- style feedback: just indent this line using 2 spaces instead of aligning them.
[GitHub] spark pull request: SPARK-1802. Audit dependency graph when Spark ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/744#issuecomment-42879578 Merged build triggered.
[GitHub] spark pull request: SPARK-1802. Audit dependency graph when Spark ...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/744 SPARK-1802. Audit dependency graph when Spark is built with -Phive This initial commit resolves the conflicts in the Hive profiles as noted in https://issues.apache.org/jira/browse/SPARK-1802 . Most of the fix was to note that Hive drags in Avro, and so if the hive module depends on Spark's version of the `avro-*` dependencies, it will pull in our exclusions as needed too. But I found we need to copy some exclusions between the two Avro dependencies to get this right. And then had to squash some commons-logging intrusions. This turned up another annoying find, that `hive-exec` is basically an assembly artifact that _also_ packages all of its transitive dependencies. This means the final assembly shows lots of collisions between itself and its dependencies, and even other project dependencies. I have a TODO to examine whether that is going to be a deal-breaker or not. In the meantime I'm going to tack on a second commit to this PR that will also fix some similar, last collisions in the YARN profile. You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-1802 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/744.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #744 commit a856604cfc67cb58146ada01fda6dbbb2515fa00 Author: Sean Owen so...@cloudera.com Date: 2014-05-12T10:08:21Z Resolve JAR version conflicts specific to Hive profile
[GitHub] spark pull request: SPARK-1772 Stop catching Throwable, let Execut...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/715
[GitHub] spark pull request: SPARK-1786: Reopening PR 724
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/742#issuecomment-42879475 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14902/
[GitHub] spark pull request: SPARK-1802. Audit dependency graph when Spark ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/744#issuecomment-42879595 Merged build started.
[GitHub] spark pull request: SPARK-1786: Reopening PR 724
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/742#issuecomment-42881037 Thanks - I pulled this in.
[GitHub] spark pull request: SPARK-1798. Tests should clean up temp files
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/732#discussion_r12550787 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/MLUtilsSuite.scala --- @@ -90,7 +92,7 @@ class MLUtilsSuite extends FunSuite with LocalSparkContext { assert(multiclassPoints(1).label === -1.0) assert(multiclassPoints(2).label === -1.0) -deleteQuietly(tempDir) +Utils.deleteRecursively(tempDir) --- End diff -- This changes the behavior to not swallow exceptions. This was added originally by @mengxr... is there a reason this squashes exceptions? https://github.com/jegonzal/spark/commit/98750a74#diff-006677f6b8222b96d21bc3e46ac9fe77R161
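The behavioral difference under discussion, sketched with hypothetical helpers (`deleteQuietly` below is a stand-in for the removed test utility, and `deleteRecursively` approximates the propagating behavior; neither is Spark's actual implementation):

```scala
import java.io.{File, IOException}
import java.nio.file.Files

object DeleteHelpers {
  // Old style: swallow any failure, so a test never learns that
  // its temp directory could not be removed.
  def deleteQuietly(f: File): Unit =
    try deleteRecursively(f) catch { case _: Exception => }

  // New style: recurse and propagate failures, so a locked or
  // leaked file fails the test instead of disappearing silently.
  def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) {
      Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    if (f.exists() && !f.delete()) {
      throw new IOException(s"Failed to delete: ${f.getAbsolutePath}")
    }
  }

  // Returns true when the directory is gone after cleanup.
  def demo(): Boolean = {
    val dir = Files.createTempDirectory("mlutils-test").toFile
    new File(dir, "data.txt").createNewFile()
    deleteRecursively(dir)
    !dir.exists()
  }

  def main(args: Array[String]): Unit =
    println(demo())
}
```

The review question is precisely whether the quiet variant's exception-swallowing was intentional; switching to the propagating variant surfaces cleanup failures in the test run.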