[GitHub] spark pull request: [SPARK-4939] revive offers periodically in Loc...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4147
[GitHub] spark pull request: [SPARK-5578][SQL][DataFrame] Provide a conveni...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4345
[GitHub] spark pull request: [SPARK-4795][Core] Redesign the primitive typ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3642
[GitHub] spark pull request: [SPARK-4587] [mllib] ML model import/export
Github user selvinsource commented on a diff in the pull request: https://github.com/apache/spark/pull/4233#discussion_r24066997

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala ---
@@ -17,14 +17,17 @@ package org.apache.spark.mllib.classification

+import org.apache.spark.SparkContext
 import org.apache.spark.annotation.Experimental
+import org.apache.spark.mllib.classification.impl.GLMClassificationModel
 import org.apache.spark.mllib.linalg.BLAS.dot
 import org.apache.spark.mllib.linalg.{DenseVector, Vector}
 import org.apache.spark.mllib.optimization._
 import org.apache.spark.mllib.regression._
-import org.apache.spark.mllib.util.{DataValidators, MLUtils}
+import org.apache.spark.mllib.util.{DataValidators, Exportable, Importable}
--- End diff --

Good for me too.
[GitHub] spark pull request: [SPARK-5484] Checkpoint every 25 iterations in...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/4273#issuecomment-72804614

Also, this issue seems to be related to SPARK-5561.
[GitHub] spark pull request: [SPARK-5068] [SQL] Fix bug query data when pat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4356#issuecomment-72805635

[Test build #26729 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26729/consoleFull) for PR 4356 at commit [`1f033cd`](https://github.com/apache/spark/commit/1f033cd8901bd97c8a4677e284847a2e975c6987).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5068] [SQL] Fix bug query data when pat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4356#issuecomment-72805642

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26729/
Test PASSed.
[GitHub] spark pull request: [SPARK-5068] [SQL] Fix bug query data when pat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4356#issuecomment-72799255

[Test build #26729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26729/consoleFull) for PR 4356 at commit [`1f033cd`](https://github.com/apache/spark/commit/1f033cd8901bd97c8a4677e284847a2e975c6987).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/3779#discussion_r24065502

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -506,13 +506,59 @@ private[spark] class TaskSetManager(
    * Get the level we can launch tasks according to delay scheduling, based on current wait time.
    */
   private def getAllowedLocalityLevel(curTime: Long): TaskLocality.TaskLocality = {
-    while (curTime - lastLaunchTime >= localityWaits(currentLocalityIndex) &&
-        currentLocalityIndex < myLocalityLevels.length - 1)
-    {
-      // Jump to the next locality level, and remove our waiting time for the current one since
-      // we don't want to count it again on the next one
-      lastLaunchTime += localityWaits(currentLocalityIndex)
-      currentLocalityIndex += 1
+    // Remove the scheduled or finished tasks lazily
+    def hasNotScheduledTasks(taskIndexes: ArrayBuffer[Int]): Boolean = {
+      var indexOffset = taskIndexes.size
+      while (indexOffset > 0) {
+        indexOffset -= 1
+        val index = taskIndexes(indexOffset)
+        if (copiesRunning(index) == 0 && !successful(index)) {
+          return true
+        } else {
+          taskIndexes.remove(indexOffset)
+        }
+      }
+      false
+    }
+    // It removes the empty lists after check
+    def hasMoreTasks(pendingTasks: HashMap[String, ArrayBuffer[Int]]): Boolean = {
+      val emptyKeys = new ArrayBuffer[String]
+      val hasTasks = pendingTasks.exists {
+        case (id: String, tasks: ArrayBuffer[Int]) =>
+          if (hasNotScheduledTasks(tasks)) {
+            true
+          } else {
+            emptyKeys += id
+            false
+          }
+      }
+      emptyKeys.foreach(x => pendingTasks.remove(x))
+      hasTasks
+    }
+
+    while (currentLocalityIndex < myLocalityLevels.length - 1) {
+      val moreTasks = myLocalityLevels(currentLocalityIndex) match {
+        case TaskLocality.PROCESS_LOCAL => hasMoreTasks(pendingTasksForExecutor)
+        case TaskLocality.NODE_LOCAL => hasMoreTasks(pendingTasksForHost)
+        case TaskLocality.NO_PREF => pendingTasksWithNoPrefs.isEmpty
+        case TaskLocality.RACK_LOCAL => hasMoreTasks(pendingTasksForRack)
+      }
+      if (!moreTasks) {
+        // Move to next locality level if there is no task for current level
+        lastLaunchTime = curTime
+        logDebug(s"No tasks for locality level ${myLocalityLevels(currentLocalityIndex)}, " +
+          s"move to ${myLocalityLevels(currentLocalityIndex + 1)}")
+        currentLocalityIndex += 1
+      } else if (curTime - lastLaunchTime >= localityWaits(currentLocalityIndex)) {
+        // Jump to the next locality level, and remove our waiting time for the current one since
+        // we don't want to count it again on the next one
--- End diff --

I know this existed before your change, but can you change this comment to: "Jump to the next locality level, and reset lastLaunchTime so that the next locality wait timer doesn't immediately expire"? I think that would make this easier to understand.
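For readers following the thread, the timer semantics behind that suggested comment can be shown in isolation. The following is a minimal self-contained sketch, not the actual patch: the field names mirror TaskSetManager, but the state here is hypothetical.

```scala
object DelaySchedulingSketch {
  // Hypothetical stand-ins for TaskSetManager's fields: one wait (ms) per locality level.
  private val localityWaits = Array(3000L, 3000L, 0L)
  private var currentLocalityIndex = 0
  private var lastLaunchTime = System.currentTimeMillis()

  def allowedLevel(curTime: Long): Int = {
    while (currentLocalityIndex < localityWaits.length - 1 &&
        curTime - lastLaunchTime >= localityWaits(currentLocalityIndex)) {
      // Jump to the next locality level, and reset lastLaunchTime so that the
      // next locality wait timer doesn't immediately expire.
      lastLaunchTime += localityWaits(currentLocalityIndex)
      currentLocalityIndex += 1
    }
    currentLocalityIndex
  }
}
```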
[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3779#discussion_r24066639

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -506,13 +506,59 @@ private[spark] class TaskSetManager(
    * Get the level we can launch tasks according to delay scheduling, based on current wait time.
    */
   private def getAllowedLocalityLevel(curTime: Long): TaskLocality.TaskLocality = {
-    while (curTime - lastLaunchTime >= localityWaits(currentLocalityIndex) &&
-        currentLocalityIndex < myLocalityLevels.length - 1)
-    {
-      // Jump to the next locality level, and remove our waiting time for the current one since
-      // we don't want to count it again on the next one
-      lastLaunchTime += localityWaits(currentLocalityIndex)
-      currentLocalityIndex += 1
+    // Remove the scheduled or finished tasks lazily
+    def hasNotScheduledTasks(taskIndexes: ArrayBuffer[Int]): Boolean = {
+      var indexOffset = taskIndexes.size
+      while (indexOffset > 0) {
+        indexOffset -= 1
+        val index = taskIndexes(indexOffset)
+        if (copiesRunning(index) == 0 && !successful(index)) {
+          return true
+        } else {
+          taskIndexes.remove(indexOffset)
+        }
+      }
+      false
+    }
+    // It removes the empty lists after check
+    def hasMoreTasks(pendingTasks: HashMap[String, ArrayBuffer[Int]]): Boolean = {
+      val emptyKeys = new ArrayBuffer[String]
+      val hasTasks = pendingTasks.exists {
+        case (id: String, tasks: ArrayBuffer[Int]) =>
+          if (hasNotScheduledTasks(tasks)) {
+            true
+          } else {
+            emptyKeys += id
+            false
+          }
+      }
+      emptyKeys.foreach(x => pendingTasks.remove(x))
--- End diff --

Also, the key could be executorId, host, or rackId.
[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/3779#issuecomment-72803247

@kayousterhout @markhamstra @JoshRosen I should have addressed all your comments; they were very helpful, thank you all!
[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3779#issuecomment-72803322

[Test build #26732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26732/consoleFull) for PR 3779 at commit [`1550668`](https://github.com/apache/spark/commit/1550668b047f8702a6c7b19aad56cfd7a56e47c3).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-72799033

[Test build #26725 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26725/consoleFull) for PR 3803 at commit [`b676534`](https://github.com/apache/spark/commit/b676534067a626260b6921ba17a04b6e03ff587a).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class FileInputDStream[K, V, F <: NewInputFormat[K,V]](`
[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/3779#discussion_r24065454

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -506,13 +506,59 @@ private[spark] class TaskSetManager(
    * Get the level we can launch tasks according to delay scheduling, based on current wait time.
    */
   private def getAllowedLocalityLevel(curTime: Long): TaskLocality.TaskLocality = {
-    while (curTime - lastLaunchTime >= localityWaits(currentLocalityIndex) &&
-        currentLocalityIndex < myLocalityLevels.length - 1)
-    {
-      // Jump to the next locality level, and remove our waiting time for the current one since
-      // we don't want to count it again on the next one
-      lastLaunchTime += localityWaits(currentLocalityIndex)
-      currentLocalityIndex += 1
+    // Remove the scheduled or finished tasks lazily
+    def hasNotScheduledTasks(taskIndexes: ArrayBuffer[Int]): Boolean = {
+      var indexOffset = taskIndexes.size
+      while (indexOffset > 0) {
+        indexOffset -= 1
+        val index = taskIndexes(indexOffset)
+        if (copiesRunning(index) == 0 && !successful(index)) {
+          return true
+        } else {
+          taskIndexes.remove(indexOffset)
+        }
+      }
+      false
+    }
+    // It removes the empty lists after check
+    def hasMoreTasks(pendingTasks: HashMap[String, ArrayBuffer[Int]]): Boolean = {
+      val emptyKeys = new ArrayBuffer[String]
+      val hasTasks = pendingTasks.exists {
+        case (id: String, tasks: ArrayBuffer[Int]) =>
+          if (hasNotScheduledTasks(tasks)) {
+            true
+          } else {
+            emptyKeys += id
+            false
+          }
+      }
+      emptyKeys.foreach(x => pendingTasks.remove(x))
+      hasTasks
+    }
+
+    while (currentLocalityIndex < myLocalityLevels.length - 1) {
+      val moreTasks = myLocalityLevels(currentLocalityIndex) match {
+        case TaskLocality.PROCESS_LOCAL => hasMoreTasks(pendingTasksForExecutor)
+        case TaskLocality.NODE_LOCAL => hasMoreTasks(pendingTasksForHost)
+        case TaskLocality.NO_PREF => pendingTasksWithNoPrefs.isEmpty
--- End diff --

+1 (is this a mistake?)
[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3779#discussion_r24065965

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -506,13 +506,59 @@ private[spark] class TaskSetManager(
    * Get the level we can launch tasks according to delay scheduling, based on current wait time.
    */
   private def getAllowedLocalityLevel(curTime: Long): TaskLocality.TaskLocality = {
-    while (curTime - lastLaunchTime >= localityWaits(currentLocalityIndex) &&
-        currentLocalityIndex < myLocalityLevels.length - 1)
-    {
-      // Jump to the next locality level, and remove our waiting time for the current one since
-      // we don't want to count it again on the next one
-      lastLaunchTime += localityWaits(currentLocalityIndex)
-      currentLocalityIndex += 1
+    // Remove the scheduled or finished tasks lazily
+    def hasNotScheduledTasks(taskIndexes: ArrayBuffer[Int]): Boolean = {
+      var indexOffset = taskIndexes.size
+      while (indexOffset > 0) {
+        indexOffset -= 1
+        val index = taskIndexes(indexOffset)
+        if (copiesRunning(index) == 0 && !successful(index)) {
+          return true
+        } else {
+          taskIndexes.remove(indexOffset)
+        }
+      }
+      false
+    }
+    // It removes the empty lists after check
+    def hasMoreTasks(pendingTasks: HashMap[String, ArrayBuffer[Int]]): Boolean = {
+      val emptyKeys = new ArrayBuffer[String]
+      val hasTasks = pendingTasks.exists {
+        case (id: String, tasks: ArrayBuffer[Int]) =>
+          if (hasNotScheduledTasks(tasks)) {
+            true
+          } else {
+            emptyKeys += id
+            false
+          }
+      }
+      emptyKeys.foreach(x => pendingTasks.remove(x))
+      hasTasks
+    }
+
+    while (currentLocalityIndex < myLocalityLevels.length - 1) {
+      val moreTasks = myLocalityLevels(currentLocalityIndex) match {
+        case TaskLocality.PROCESS_LOCAL => hasMoreTasks(pendingTasksForExecutor)
+        case TaskLocality.NODE_LOCAL => hasMoreTasks(pendingTasksForHost)
+        case TaskLocality.NO_PREF => pendingTasksWithNoPrefs.isEmpty
--- End diff --

Oh, yes, it's really a mistake! Will fix it.
[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/3779#issuecomment-72800486

Echoing the comments from @markhamstra and @JoshRosen, this change looks correct but is very subtle, so it would be great to improve the commenting throughout to avoid confusion for others who read this code later. I suggested a few comment improvements; if you have time to make these soon, that would be great, since I know you wanted this in for 1.3 and I know the merge window for that is closing rapidly. Thanks @davies!
[GitHub] spark pull request: [SPARK-5587][SQL] Support change database owne...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4357#issuecomment-72800621

[Test build #26730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26730/consoleFull) for PR 4357 at commit [`79413c6`](https://github.com/apache/spark/commit/79413c6eeb4031a676e278fd6aa10679c9ab48a5).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3779#discussion_r24067444

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -506,13 +506,63 @@ private[spark] class TaskSetManager(
    * Get the level we can launch tasks according to delay scheduling, based on current wait time.
    */
   private def getAllowedLocalityLevel(curTime: Long): TaskLocality.TaskLocality = {
-    while (curTime - lastLaunchTime >= localityWaits(currentLocalityIndex) &&
-        currentLocalityIndex < myLocalityLevels.length - 1)
-    {
-      // Jump to the next locality level, and remove our waiting time for the current one since
-      // we don't want to count it again on the next one
-      lastLaunchTime += localityWaits(currentLocalityIndex)
-      currentLocalityIndex += 1
+    // Remove the scheduled or finished tasks lazily
+    def tasksNeedToBeScheduledFrom(pendingTaskIds: ArrayBuffer[Int]): Boolean = {
+      var indexOffset = pendingTaskIds.size
+      while (indexOffset > 0) {
+        indexOffset -= 1
+        val index = pendingTaskIds(indexOffset)
+        if (copiesRunning(index) == 0 && !successful(index)) {
+          return true
+        } else {
+          pendingTaskIds.remove(indexOffset)
+        }
+      }
+      false
+    }
+    // Walk through the list of tasks that can be scheduled at each location and returns true
+    // if there are any tasks that still need to be scheduled. Lazily cleans up tasks that have
+    // already been scheduled.
+    def noMoreTasksToRunIn(pendingTasks: HashMap[String, ArrayBuffer[Int]]): Boolean = {
+      val emptyKeys = new ArrayBuffer[String]
+      val hasTasks = pendingTasks.exists {
+        case (id: String, tasks: ArrayBuffer[Int]) =>
+          if (tasksNeedToBeScheduledFrom(tasks)) {
+            true
+          } else {
+            emptyKeys += id
+            false
+          }
+      }
+      emptyKeys.foreach(x => pendingTasks.remove(x))
+      hasTasks
+    }
+
+    while (currentLocalityIndex < myLocalityLevels.length - 1) {
+      val moreTasks = myLocalityLevels(currentLocalityIndex) match {
+        case TaskLocality.PROCESS_LOCAL => noMoreTasksToRunIn(pendingTasksForExecutor)
+        case TaskLocality.NODE_LOCAL => noMoreTasksToRunIn(pendingTasksForHost)
+        case TaskLocality.NO_PREF => pendingTasksWithNoPrefs.nonEmpty
--- End diff --

`noMoreTasksToRunIn` => `moreTasksToRunIn()`
[GitHub] spark pull request: [SPARK-2945][YARN][Doc]add doc for spark.execu...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/4350#issuecomment-72799695

Seems like the UT is broken, but I think it is unrelated to this patch.
[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3779#discussion_r24066617

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -506,13 +506,59 @@ private[spark] class TaskSetManager(
    * Get the level we can launch tasks according to delay scheduling, based on current wait time.
    */
   private def getAllowedLocalityLevel(curTime: Long): TaskLocality.TaskLocality = {
-    while (curTime - lastLaunchTime >= localityWaits(currentLocalityIndex) &&
-        currentLocalityIndex < myLocalityLevels.length - 1)
-    {
-      // Jump to the next locality level, and remove our waiting time for the current one since
-      // we don't want to count it again on the next one
-      lastLaunchTime += localityWaits(currentLocalityIndex)
-      currentLocalityIndex += 1
+    // Remove the scheduled or finished tasks lazily
+    def hasNotScheduledTasks(taskIndexes: ArrayBuffer[Int]): Boolean = {
+      var indexOffset = taskIndexes.size
+      while (indexOffset > 0) {
+        indexOffset -= 1
+        val index = taskIndexes(indexOffset)
+        if (copiesRunning(index) == 0 && !successful(index)) {
+          return true
+        } else {
+          taskIndexes.remove(indexOffset)
+        }
+      }
+      false
+    }
+    // It removes the empty lists after check
+    def hasMoreTasks(pendingTasks: HashMap[String, ArrayBuffer[Int]]): Boolean = {
+      val emptyKeys = new ArrayBuffer[String]
+      val hasTasks = pendingTasks.exists {
+        case (id: String, tasks: ArrayBuffer[Int]) =>
+          if (hasNotScheduledTasks(tasks)) {
+            true
+          } else {
+            emptyKeys += id
--- End diff --

The `id` could be executorId, host or rackId.
[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3779#discussion_r24067073

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -506,13 +506,63 @@ private[spark] class TaskSetManager(
    * Get the level we can launch tasks according to delay scheduling, based on current wait time.
    */
   private def getAllowedLocalityLevel(curTime: Long): TaskLocality.TaskLocality = {
-    while (curTime - lastLaunchTime >= localityWaits(currentLocalityIndex) &&
-        currentLocalityIndex < myLocalityLevels.length - 1)
-    {
-      // Jump to the next locality level, and remove our waiting time for the current one since
-      // we don't want to count it again on the next one
-      lastLaunchTime += localityWaits(currentLocalityIndex)
-      currentLocalityIndex += 1
+    // Remove the scheduled or finished tasks lazily
+    def tasksNeedToBeScheduledFrom(pendingTaskIds: ArrayBuffer[Int]): Boolean = {
+      var indexOffset = pendingTaskIds.size
+      while (indexOffset > 0) {
+        indexOffset -= 1
+        val index = pendingTaskIds(indexOffset)
+        if (copiesRunning(index) == 0 && !successful(index)) {
+          return true
+        } else {
+          pendingTaskIds.remove(indexOffset)
+        }
+      }
+      false
+    }
+    // Walk through the list of tasks that can be scheduled at each location and returns true
+    // if there are any tasks that still need to be scheduled. Lazily cleans up tasks that have
+    // already been scheduled.
+    def noMoreTasksToRunIn(pendingTasks: HashMap[String, ArrayBuffer[Int]]): Boolean = {
+      val emptyKeys = new ArrayBuffer[String]
+      val hasTasks = pendingTasks.exists {
+        case (id: String, tasks: ArrayBuffer[Int]) =>
+          if (tasksNeedToBeScheduledFrom(tasks)) {
+            true
+          } else {
+            emptyKeys += id
+            false
+          }
+      }
+      emptyKeys.foreach(x => pendingTasks.remove(x))
+      hasTasks
+    }
+
+    while (currentLocalityIndex < myLocalityLevels.length - 1) {
+      val moreTasks = myLocalityLevels(currentLocalityIndex) match {
+        case TaskLocality.PROCESS_LOCAL => noMoreTasksToRunIn(pendingTasksForExecutor)
+        case TaskLocality.NODE_LOCAL => noMoreTasksToRunIn(pendingTasksForHost)
+        case TaskLocality.NO_PREF => pendingTasksWithNoPrefs.nonEmpty
--- End diff --

Wait, are we going with moreTasks or noMoreTasks? I think this needs to be `val noMoreTasks`, `case TaskLocality.NO_PREF => pendingTasksWithNoPrefs.isEmpty` and `if (noMoreTasks)` a couple lines down -- or consistently reverse the boolean sense of all that.
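The naming concern is easy to state abstractly: a predicate and its call sites must agree in polarity, or readers end up stacking negations. A minimal sketch of the positively-named variant (the helper bodies are simplified stand-ins, not the final patch):

```scala
import scala.collection.mutable.{ArrayBuffer, HashMap}

object BooleanPolaritySketch {
  // Simplified stand-in for the per-location pending-task check.
  def tasksNeedToBeScheduledFrom(pendingTaskIds: ArrayBuffer[Int]): Boolean =
    pendingTaskIds.nonEmpty

  // Positively named: true when some location still has runnable tasks, so a call
  // site reads `if (!moreTasksToRunIn(...))` instead of a double negative.
  def moreTasksToRunIn(pendingTasks: HashMap[String, ArrayBuffer[Int]]): Boolean =
    pendingTasks.values.exists(tasksNeedToBeScheduledFrom)
}
```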
[GitHub] spark pull request: [SQL] Minor changes for DataFrame Implementati...
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/4339

[SQL] Minor changes for DataFrame Implementation

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark dataframe_minor2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4339.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #4339

commit f63614bee7fe098ddc6d41ab0b6bd31b0abb5eca
Author: Cheng Hao <hao.ch...@intel.com>
Date:   2015-02-03T14:44:45Z

    minor change for DataFrame Implementation
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user dibbhatt commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24010956

--- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.kafka
+
+import scala.util.control.NonFatal
+import scala.util.Random
+import scala.collection.mutable.ArrayBuffer
+import java.util.Properties
+import kafka.api._
+import kafka.common.{ErrorMapping, OffsetMetadataAndError, TopicAndPartition}
+import kafka.consumer.{ConsumerConfig, SimpleConsumer}
+import org.apache.spark.SparkException
+
+/**
+ * Convenience methods for interacting with a Kafka cluster.
+ * @param kafkaParams Kafka <a href="http://kafka.apache.org/documentation.html#configuration">
+ * configuration parameters</a>.
+ *   Requires "metadata.broker.list" or "bootstrap.servers" to be set with Kafka broker(s),
+ *   NOT zookeeper servers, specified in host1:port1,host2:port2 form
+ */
+private[spark]
+class KafkaCluster(val kafkaParams: Map[String, String]) extends Serializable {
+  import KafkaCluster.{Err, LeaderOffset}
+
+  val seedBrokers: Array[(String, Int)] =
+    kafkaParams.get("metadata.broker.list")
+      .orElse(kafkaParams.get("bootstrap.servers"))
+      .getOrElse(throw new SparkException("Must specify metadata.broker.list or bootstrap.servers"))
+      .split(",").map { hp =>
+        val hpa = hp.split(":")
+        (hpa(0), hpa(1).toInt)
+      }
+
+  // ConsumerConfig isn't serializable
+  @transient private var _config: ConsumerConfig = null
+
+  def config: ConsumerConfig = this.synchronized {
+    if (_config == null) {
+      _config = KafkaCluster.consumerConfig(kafkaParams)
+    }
+    _config
+  }
+
+  def connect(host: String, port: Int): SimpleConsumer =
+    new SimpleConsumer(host, port, config.socketTimeoutMs,
+      config.socketReceiveBufferBytes, config.clientId)
+
+  def connectLeader(topic: String, partition: Int): Either[Err, SimpleConsumer] =
+    findLeader(topic, partition).right.map(hp => connect(hp._1, hp._2))
+
+  // Metadata api
+  // scalastyle:off
+  // https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-MetadataAPI
+  // scalastyle:on
+
+  def findLeader(topic: String, partition: Int): Either[Err, (String, Int)] = {
+    val req = TopicMetadataRequest(TopicMetadataRequest.CurrentVersion,
+      0, config.clientId, Seq(topic))
+    val errs = new Err
+    withBrokers(Random.shuffle(seedBrokers), errs) { consumer =>
+      val resp: TopicMetadataResponse = consumer.send(req)
+      resp.topicsMetadata.find(_.topic == topic).flatMap { tm: TopicMetadata =>
+        tm.partitionsMetadata.find(_.partitionId == partition)
+      }.foreach { pm: PartitionMetadata =>
+        pm.leader.foreach { leader =>
+          return Right((leader.host, leader.port))
+        }
+      }
+    }
+    Left(errs)
+  }
+
+  def findLeaders(
+      topicAndPartitions: Set[TopicAndPartition]
+    ): Either[Err, Map[TopicAndPartition, (String, Int)]] = {
+    val topics = topicAndPartitions.map(_.topic)
+    val response = getPartitionMetadata(topics).right
+    val answer = response.flatMap { tms: Set[TopicMetadata] =>
+      val leaderMap = tms.flatMap { tm: TopicMetadata =>
+        tm.partitionsMetadata.flatMap { pm: PartitionMetadata =>
+          val tp = TopicAndPartition(tm.topic, pm.partitionId)
+          if (topicAndPartitions(tp)) {
+            pm.leader.map { l =>
+              tp -> (l.host -> l.port)
+            }
+          } else {
+            None
+          }
+        }
+      }.toMap
+
+      if (leaderMap.keys.size == topicAndPartitions.size) {
+        Right(leaderMap)
+      } else {
+        val
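To ground the discussion, here is a minimal usage sketch for the class quoted above. Note that `KafkaCluster` is `private[spark]`, so this would only compile inside the `org.apache.spark` namespace; the broker address and topic are placeholders.

```scala
package org.apache.spark.streaming.kafka

object KafkaClusterSketch {
  def main(args: Array[String]): Unit = {
    val kc = new KafkaCluster(Map("metadata.broker.list" -> "broker1:9092"))
    // Either-based error handling, as in the class above: Left carries accumulated errors.
    kc.findLeader("events", 0) match {
      case Right((host, port)) => println(s"leader for events/0: $host:$port")
      case Left(errs) => println(s"could not find leader: $errs")
    }
  }
}
```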
[GitHub] spark pull request: [SQL] Minor changes for DataFrame Implementati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4339#issuecomment-72664280

[Test build #26655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26655/consoleFull) for PR 4339 at commit [`f63614b`](https://github.com/apache/spark/commit/f63614bee7fe098ddc6d41ab0b6bd31b0abb5eca).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5498][SQL]fix bug when query the data w...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4289#issuecomment-72671869

Sorry for the late reply @jeanlyn! I think it's a bug of Hive DDL, which probably was resolved in Hive 0.14 / 0.15, and I am not sure we really want to fix that in Spark SQL. @yhuai, do you have any comment on this?

However, in this particular case, there is another workaround for your production environment:
1) Rename the existing table;
2) Create a new table with the schema you altered, and also the partitions;
3) Manually move the previous data into the new table folder in HDFS;
4) Drop the old table.
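A sketch of those four steps, assuming `sql` is bound to a HiveContext as in the test suite quoted later in this thread; the table name, schema, and warehouse paths are hypothetical, and step 3 happens outside SQL:

```scala
sql("ALTER TABLE events RENAME TO events_old")            // 1) rename the existing table
sql("CREATE TABLE events (key BIGINT, value STRING) " +
  "PARTITIONED BY (ds STRING)")                           // 2) recreate with the altered schema
sql("ALTER TABLE events ADD PARTITION (ds = '1')")        //    ...and its partitions
// 3) outside SQL: hadoop fs -mv /warehouse/events_old/ds=1/* /warehouse/events/ds=1/
sql("DROP TABLE events_old")                              // 4) drop the old table
```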
[GitHub] spark pull request: [STREAMING] SPARK-4986 Wait for receivers to d...
GitHub user cleaton opened a pull request: https://github.com/apache/spark/pull/4338

[STREAMING] SPARK-4986 Wait for receivers to deregister and receiver job to terminate

A slow receiver might not have enough time to shut down cleanly even when graceful shutdown is used. This PR extends the graceful wait to make sure all receivers have deregistered and that the receiver job has terminated.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cleaton/spark stopreceivers

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4338.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #4338

commit d179372ae1d93adf67f8cd7ea818c443c584a8a0
Author: Jesper Lundgren <jesper.lundg...@vpon.com>
Date:   2015-01-25T14:31:19Z

    Add graceful shutdown unit test covering slow receiver onStop
    Signed-off-by: Jesper Lundgren <jesper.lundg...@vpon.com>

commit 9a9ff884d878824548f9a08d724f60b0c14f8310
Author: Jesper Lundgren <jesper.lundg...@vpon.com>
Date:   2015-01-30T07:34:03Z

    wait for receivers to shutdown and receiver job to terminate
    Signed-off-by: Jesper Lundgren <jesper.lundg...@vpon.com>

commit 3d0bd3501d24a8ab5e10e1867b464d471b3bbb67
Author: Jesper Lundgren <jesper.lundg...@vpon.com>
Date:   2015-01-30T07:41:00Z

    switch booleans to match running status instead of terminated
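The core idea in the PR description can be sketched independently of the Spark internals: poll until every receiver has deregistered or a deadline passes. The names below are hypothetical helpers, not the actual ReceiverTracker API.

```scala
object GracefulStopSketch {
  // Polls a caller-supplied count of active receivers until it reaches zero or the
  // deadline passes; returns whether the shutdown completed cleanly in time.
  def awaitReceiversDeregistered(timeoutMs: Long)(numActiveReceivers: () => Int): Boolean = {
    val deadline = System.currentTimeMillis() + timeoutMs
    while (numActiveReceivers() > 0 && System.currentTimeMillis() < deadline) {
      Thread.sleep(100)
    }
    numActiveReceivers() == 0
  }
}
```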
[GitHub] spark pull request: [STREAMING] SPARK-4986 Wait for receivers to d...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4338#issuecomment-72662235

Can one of the admins verify this patch?
[GitHub] spark pull request: [SQL] Minor changes for dataframe implementati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4336#issuecomment-72667971

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26653/
Test PASSed.
[GitHub] spark pull request: [SQL] Minor changes for dataframe implementati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4336#issuecomment-72667963

[Test build #26653 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26653/consoleFull) for PR 4336 at commit [`3293408`](https://github.com/apache/spark/commit/3293408b812944735e03f2a41221851faffb3669).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [STREAMING] SPARK-4986 Wait for receivers to d...
Github user cleaton commented on the pull request: https://github.com/apache/spark/pull/3868#issuecomment-72662664

@tdas I have created a new master branch PR. You can find it here: https://github.com/apache/spark/pull/4338
[GitHub] spark pull request: [SPARK-5498][SQL]fix bug when query the data w...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4289#discussion_r24010127

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
@@ -172,4 +172,19 @@ class InsertIntoHiveTableSuite extends QueryTest {
     sql("DROP TABLE hiveTableWithStructValue")
   }
+
+  test("SPARK-5498: partition schema does not match table schema") {
+    val testData = TestHive.sparkContext.parallelize(
+      (1 to 10).map(i => TestData(i, i.toString)))
+    testData.registerTempTable("testData")
+    val tmpDir = Files.createTempDir()
+    sql(s"CREATE TABLE table_with_partition(key int,value string) PARTITIONED by (ds string) location '${tmpDir.toURI.toString}' ")
+    sql("INSERT OVERWRITE TABLE table_with_partition partition (ds='1') SELECT key,value FROM testData")
+    sql("ALTER TABLE table_with_partition CHANGE COLUMN key key BIGINT")
--- End diff --

I just checked the [Hive Document](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable). It says:

> The CASCADE|RESTRICT clause is available in Hive 0.15.0. ALTER TABLE CHANGE COLUMN with CASCADE command changes the columns of a table's metadata, and cascades the same change to all the partition metadata. RESTRICT is the default, limiting column change only to table metadata.

I guess in Hive 0.13.1, when the table schema is changed via `alter table`, only the table metadata will be updated. Can you double check whether the above query works for Hive 0.13.1?
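Per the Hive documentation quoted above, on Hive 0.15.0 or later the same statement could push the column change into the partition metadata as well. A sketch only, since Spark's bundled Hive version may not support the clause:

```scala
sql("ALTER TABLE table_with_partition CHANGE COLUMN key key BIGINT CASCADE")
```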
[GitHub] spark pull request: [SPARK-5559] [Streaming] [Test] Remove oppotun...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4337#issuecomment-72669182

[Test build #26654 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26654/consoleFull) for PR 4337 at commit [`8212e42`](https://github.com/apache/spark/commit/8212e42cfd79b5b92c7c664a9ecff7da68e062a5).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5559] [Streaming] [Test] Remove oppotun...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4337#issuecomment-72669193

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26654/
Test PASSed.
[GitHub] spark pull request: [SPARK-1405][MLLIB] LDA on GraphX
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2388
[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4047
[GitHub] spark pull request: [minor] update streaming linear algorithms
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4329#issuecomment-72610576

This is a minor update. I've merged this into master.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72611319

[Test build #26622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26622/consoleFull) for PR 3820 at commit [`b1e2a0d`](https://github.com/apache/spark/commit/b1e2a0d8b40f6651a0a2b36cdc9070e67e9d6bf3).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r23988557

--- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala ---
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import scala.collection.mutable
+
+import akka.actor.{ActorRef, Actor}
+
+import org.apache.spark._
+import org.apache.spark.util.{AkkaUtils, ActorLogReceive}
+
+private[spark] sealed trait OutputCommitCoordinationMessage extends Serializable
+
+private[spark] case class StageStarted(stage: Int) extends OutputCommitCoordinationMessage
+private[spark] case class StageEnded(stage: Int) extends OutputCommitCoordinationMessage
+private[spark] case object StopCoordinator extends OutputCommitCoordinationMessage
+
+private[spark] case class AskPermissionToCommitOutput(
+    stage: Int,
+    task: Long,
+    taskAttempt: Long)
+  extends OutputCommitCoordinationMessage
+
+private[spark] case class TaskCompleted(
+    stage: Int,
+    task: Long,
+    attempt: Long,
+    reason: TaskEndReason)
+  extends OutputCommitCoordinationMessage
+
+/**
+ * Authority that decides whether tasks can commit output to HDFS.
+ *
+ * This lives on the driver, but the actor allows the tasks that commit
+ * to Hadoop to invoke it.
+ */
+private[spark] class OutputCommitCoordinator(conf: SparkConf) extends Logging {
+
+  // Initialized by SparkEnv
+  var coordinatorActor: Option[ActorRef] = None
+  private val timeout = AkkaUtils.askTimeout(conf)
+  private val maxAttempts = AkkaUtils.numRetries(conf)
+  private val retryInterval = AkkaUtils.retryWaitMs(conf)
+
+  private type StageId = Int
+  private type TaskId = Long
+  private type TaskAttemptId = Long
+  private type CommittersByStageMap = mutable.Map[StageId, mutable.Map[TaskId, TaskAttemptId]]
+
+  private val authorizedCommittersByStage: CommittersByStageMap = mutable.Map()
+
+  def stageStart(stage: StageId) {
+    sendToActor(StageStarted(stage))
+  }
+
+  def stageEnd(stage: StageId) {
+    sendToActor(StageEnded(stage))
+  }
+
+  def canCommit(
+      stage: StageId,
+      task: TaskId,
+      attempt: TaskAttemptId): Boolean = {
+    askActor(AskPermissionToCommitOutput(stage, task, attempt))
+  }
+
+  def taskCompleted(
+      stage: StageId,
+      task: TaskId,
+      attempt: TaskAttemptId,
+      reason: TaskEndReason) {
+    sendToActor(TaskCompleted(stage, task, attempt, reason))
+  }
+
+  def stop() {
+    sendToActor(StopCoordinator)
+    coordinatorActor = None
+    authorizedCommittersByStage.foreach(_._2.clear)
+    authorizedCommittersByStage.clear
+  }
+
+  private def handleStageStart(stage: StageId): Unit = {
+    authorizedCommittersByStage(stage) = mutable.HashMap[TaskId, TaskAttemptId]()
+  }
+
+  private def handleStageEnd(stage: StageId): Unit = {
+    authorizedCommittersByStage.remove(stage)
+  }
+
+  private def handleAskPermissionToCommit(
+      stage: StageId,
+      task: TaskId,
+      attempt: TaskAttemptId): Boolean = {
+    authorizedCommittersByStage.get(stage) match {
+      case Some(authorizedCommitters) =>
+        authorizedCommitters.get(stage) match {
--- End diff --

Alas, StageId and TaskId are mere type aliases, not actual types, so it looks like there's a subtle typo here that could have been caught by stronger typechecking: `authorizedCommitters` is indexed by TaskId, not StageId. I think this is why the streaming checkpoint suite tests were failing.
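The stronger typechecking alluded to here could come, for example, from value classes instead of type aliases; a sketch of the idea, not the actual fix (which simply corrects the lookup key):

```scala
import scala.collection.mutable

object TypedIdSketch {
  // Value classes: zero-runtime-cost wrappers that are distinct types to the compiler.
  case class StageId(id: Int) extends AnyVal
  case class TaskId(id: Long) extends AnyVal
  case class TaskAttemptId(id: Long) extends AnyVal

  val authorizedCommitters = mutable.Map[TaskId, TaskAttemptId]()

  def handleAskPermissionToCommit(stage: StageId, task: TaskId): Boolean = {
    // With the aliases, `authorizedCommitters.get(stage)` compiled silently (Int widens
    // to Long); with value classes it is a type error, and only a TaskId key is accepted.
    authorizedCommitters.contains(task)
  }
}
```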
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23988445

--- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala ---
@@ -144,4 +150,174 @@ object KafkaUtils {
     createStream[K, V, U, T](
       jssc.ssc, kafkaParams.toMap, Map(topics.mapValues(_.intValue()).toSeq: _*), storageLevel)
   }
+
+  /** A batch-oriented interface for consuming from Kafka.
+   * Starting and ending offsets are specified in advance,
+   * so that you can control exactly-once semantics.
+   * @param sc SparkContext object
+   * @param kafkaParams Kafka <a href="http://kafka.apache.org/documentation.html#configuration">
+   * configuration parameters</a>.
+   *   Requires "metadata.broker.list" or "bootstrap.servers" to be set with Kafka broker(s),
+   *   NOT zookeeper servers, specified in host1:port1,host2:port2 form.
+   * @param offsetRanges Each OffsetRange in the batch corresponds to a
+   *   range of offsets for a given Kafka topic/partition
+   */
+  @Experimental
+  def createRDD[
+    K: ClassTag,
+    V: ClassTag,
+    U <: Decoder[_]: ClassTag,
+    T <: Decoder[_]: ClassTag] (
+      sc: SparkContext,
+      kafkaParams: Map[String, String],
+      offsetRanges: Array[OffsetRange]
+    ): RDD[(K, V)] with HasOffsetRanges = {
--- End diff --

I noticed that KafkaRDD isn't exposed, so maybe this is why. Not sure I see a big issue with exposing KafkaRDD and its constructor given that it's basically the same level of visibility as this static factory function.
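For context, a usage sketch of the factory under review; the broker address, topic, and offsets are placeholders, and `OffsetRange.create` is the companion factory shown later in this thread:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkContext
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

object CreateRDDSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "kafka-rdd-sketch")
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    // Read exactly offsets [0, 100) of partition 0 of the hypothetical "events" topic.
    val ranges = Array(OffsetRange.create("events", 0, fromOffset = 0L, untilOffset = 100L))
    val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
      sc, kafkaParams, ranges)
    println(s"read ${rdd.count()} messages")
    sc.stop()
  }
}
```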
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23988871

--- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/OffsetRange.scala ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.kafka
+
+import kafka.common.TopicAndPartition
+
+/** Something that has a collection of OffsetRanges */
+trait HasOffsetRanges {
+  def offsetRanges: Array[OffsetRange]
+}
+
+/** Represents a range of offsets from a single Kafka TopicAndPartition */
+final class OffsetRange private(
+    /** kafka topic name */
+    val topic: String,
+    /** kafka partition id */
+    val partition: Int,
+    /** inclusive starting offset */
+    val fromOffset: Long,
+    /** exclusive ending offset */
+    val untilOffset: Long) extends Serializable {
+  import OffsetRange.OffsetRangeTuple
+
+  /** this is to avoid ClassNotFoundException during checkpoint restore */
+  private[streaming]
+  def toTuple: OffsetRangeTuple = (topic, partition, fromOffset, untilOffset)
+}
+
+object OffsetRange {
+  private[spark]
+  type OffsetRangeTuple = (String, Int, Long, Long)
+
+  def create(topic: String, partition: Int, fromOffset: Long, untilOffset: Long): OffsetRange =
--- End diff --

It's confusing to have both `create` and the apply methods here. Why not just have one way of creating these?
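A sketch of the single-factory shape the comment argues for, with only `apply` on the companion; illustrative only (renamed to avoid clashing with the real class), not the merged API:

```scala
final class OffsetRangeSketch private (
    val topic: String,
    val partition: Int,
    val fromOffset: Long,
    val untilOffset: Long) extends Serializable

object OffsetRangeSketch {
  // One public way to construct instances: OffsetRangeSketch(topic, partition, from, until)
  def apply(topic: String, partition: Int, fromOffset: Long, untilOffset: Long): OffsetRangeSketch =
    new OffsetRangeSketch(topic, partition, fromOffset, untilOffset)
}
```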
[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r23988843 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -105,24 +106,61 @@ class SparkHadoopWriter(@transient jobConf: JobConf) def commit() { val taCtxt = getTaskContext() val cmtr = getOutputCommitter() -if (cmtr.needsTaskCommit(taCtxt)) { + +// Called after we have decided to commit +def performCommit(): Unit = { try { cmtr.commitTask(taCtxt) -logInfo (taID + ": Committed") +logInfo (s"$taID: Committed") } catch { -case e: IOException => { +case e: IOException => logError("Error committing the output of task: " + taID.value, e) cmtr.abortTask(taCtxt) throw e + } +} + +// First, check whether the task's output has already been committed by some other attempt +if (cmtr.needsTaskCommit(taCtxt)) { + // The task output needs to be committed, but we don't know whether some other task attempt + // might be racing to commit the same output partition. Therefore, coordinate with the driver + // in order to determine whether this attempt can commit. + val shouldCoordinateWithDriver: Boolean = { +val sparkConf = SparkEnv.get.conf +// We only need to coordinate with the driver if there are multiple concurrent task +// attempts, which should only occur if speculation is enabled --- End diff -- Can anyone think of cases where this assumption would be violated? Can this ever be violated due to, say, transient network failures?
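To illustrate the coordination being discussed, here is a hypothetical first-attempt-wins arbiter of the kind a driver could run; all names here are illustrative, not the actual Spark implementation:

```scala
import scala.collection.mutable

// Hypothetical driver-side arbiter: the first attempt that asks to commit a
// given (stage, partition) output wins; later racing attempts are refused.
class CommitArbiter {
  private val winners = mutable.Map.empty[(Int, Int), Long] // (stageId, partitionId) -> attemptId

  def canCommit(stageId: Int, partitionId: Int, attemptId: Long): Boolean = synchronized {
    winners.getOrElseUpdate((stageId, partitionId), attemptId) == attemptId
  }
}

val arbiter = new CommitArbiter
arbiter.canCommit(stageId = 1, partitionId = 0, attemptId = 7L) // true: first asker wins
arbiter.canCommit(stageId = 1, partitionId = 0, attemptId = 8L) // false: speculative duplicate refused
```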
[GitHub] spark pull request: [SPARK-5551][SQL] Create type alias for Schema...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4327
[GitHub] spark pull request: [SQL][DataFrame] Remove DataFrameApi, Expressi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4328
[GitHub] spark pull request: [SPARK-4795][Core] Redesign the primitive typ...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/3642#discussion_r23989500 --- Diff: graphx/src/test/scala/org/apache/spark/graphx/lib/ShortestPathsSuite.scala --- @@ -40,7 +40,7 @@ class ShortestPathsSuite extends FunSuite with LocalSparkContext { val graph = Graph.fromEdgeTuples(edges, 1) val landmarks = Seq(1, 4).map(_.toLong) val results = ShortestPaths.run(graph, landmarks).vertices.collect.map { -case (v, spMap) => (v, spMap.mapValues(_.get)) --- End diff -- Do you have an example? I added `implicit` to them and the code compiled successfully.
[GitHub] spark pull request: [SPARK-5549] Define TaskContext interface in S...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4324#issuecomment-72613931 Going to merge since tests passed previously, and the latest failure was due to a flaky test in streaming.
[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4068#issuecomment-72614165 [Test build #26641 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26641/consoleFull) for PR 4068 at commit [`3295858`](https://github.com/apache/spark/commit/3295858b6d7e1738c11837207f564b4f2c4c503c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4325#issuecomment-72614700 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26639/ Test FAILed.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4325#issuecomment-72614694 [Test build #26639 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26639/consoleFull) for PR 4325 at commit [`e46735c`](https://github.com/apache/spark/commit/e46735c612879bb46317efe73155b4611bb51afc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5426][SQL] Add SparkSQL Java API helper...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4243#issuecomment-72614630 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23989829 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils { createStream[K, V, U, T]( jssc.ssc, kafkaParams.toMap, Map(topics.mapValues(_.intValue()).toSeq: _*), storageLevel) } + + /** A batch-oriented interface for consuming from Kafka. + * Starting and ending offsets are specified in advance, + * so that you can control exactly-once semantics. + * @param sc SparkContext object + * @param kafkaParams Kafka <a href="http://kafka.apache.org/documentation.html#configuration"> + * configuration parameters</a>. + * Requires "metadata.broker.list" or "bootstrap.servers" to be set with Kafka broker(s), + * NOT zookeeper servers, specified in host1:port1,host2:port2 form. + * @param offsetRanges Each OffsetRange in the batch corresponds to a + * range of offsets for a given Kafka topic/partition + */ + @Experimental + def createRDD[ +K: ClassTag, +V: ClassTag, +U <: Decoder[_]: ClassTag, +T <: Decoder[_]: ClassTag] ( + sc: SparkContext, + kafkaParams: Map[String, String], + offsetRanges: Array[OffsetRange] + ): RDD[(K, V)] with HasOffsetRanges = { +val messageHandler = (mmd: MessageAndMetadata[K, V]) => (mmd.key, mmd.message) +val kc = new KafkaCluster(kafkaParams) +val topics = offsetRanges.map(o => TopicAndPartition(o.topic, o.partition)).toSet +val leaders = kc.findLeaders(topics).fold( + errs => throw new SparkException(errs.mkString("\n")), + ok => ok +) +new KafkaRDD[K, V, U, T, (K, V)](sc, kafkaParams, offsetRanges, leaders, messageHandler) + } + + /** A batch-oriented interface for consuming from Kafka. + * Starting and ending offsets are specified in advance, + * so that you can control exactly-once semantics. + * @param sc SparkContext object + * @param kafkaParams Kafka <a href="http://kafka.apache.org/documentation.html#configuration"> + * configuration parameters</a>. + * Requires "metadata.broker.list" or "bootstrap.servers" to be set with Kafka broker(s), + * NOT zookeeper servers, specified in host1:port1,host2:port2 form. + * @param offsetRanges Each OffsetRange in the batch corresponds to a + * range of offsets for a given Kafka topic/partition + * @param leaders Kafka leaders for each offset range in batch + * @param messageHandler function for translating each message into the desired type --- End diff -- How is this message handler different than having the user just call a map function on a returned RDD? It seems a little risky because this is exposing a Kafka class in the byte code signature, which Kafka could relocate in a future release in a way that causes this to break for callers.
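The question can be illustrated without any Kafka types: for a plain transformation the two shapes below produce the same result, and only the first bakes a third-party type into the public method signature. This is a schematic sketch; `Record` and `fetch` are stand-ins, not real Kafka or Spark classes:

```scala
// Record stands in for Kafka's MessageAndMetadata
case class Record(key: String, value: String)

def fetch(): Seq[Record] = Seq(Record("k1", "v1"), Record("k2", "v2"))

// Shape 1: handler applied inside the factory; Record leaks into the signature
def createWithHandler[R](handler: Record => R): Seq[R] = fetch().map(handler)

// Shape 2: return raw pairs and let the caller map; no third-party type leaks
def createRaw(): Seq[(String, String)] = fetch().map(r => (r.key, r.value))

val a = createWithHandler(r => r.value.toUpperCase)
val b = createRaw().map { case (_, v) => v.toUpperCase } // same result
```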
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23989786 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils { createStream[K, V, U, T]( jssc.ssc, kafkaParams.toMap, Map(topics.mapValues(_.intValue()).toSeq: _*), storageLevel) } + + /** A batch-oriented interface for consuming from Kafka. + * Starting and ending offsets are specified in advance, + * so that you can control exactly-once semantics. + * @param sc SparkContext object + * @param kafkaParams Kafka <a href="http://kafka.apache.org/documentation.html#configuration"> + * configuration parameters</a>. + * Requires "metadata.broker.list" or "bootstrap.servers" to be set with Kafka broker(s), + * NOT zookeeper servers, specified in host1:port1,host2:port2 form. + * @param offsetRanges Each OffsetRange in the batch corresponds to a + * range of offsets for a given Kafka topic/partition + */ + @Experimental + def createRDD[ +K: ClassTag, +V: ClassTag, +U <: Decoder[_]: ClassTag, +T <: Decoder[_]: ClassTag] ( + sc: SparkContext, + kafkaParams: Map[String, String], + offsetRanges: Array[OffsetRange] + ): RDD[(K, V)] with HasOffsetRanges = { +val messageHandler = (mmd: MessageAndMetadata[K, V]) => (mmd.key, mmd.message) +val kc = new KafkaCluster(kafkaParams) +val topics = offsetRanges.map(o => TopicAndPartition(o.topic, o.partition)).toSet +val leaders = kc.findLeaders(topics).fold( + errs => throw new SparkException(errs.mkString("\n")), + ok => ok +) +new KafkaRDD[K, V, U, T, (K, V)](sc, kafkaParams, offsetRanges, leaders, messageHandler) + } + + /** A batch-oriented interface for consuming from Kafka. + * Starting and ending offsets are specified in advance, + * so that you can control exactly-once semantics. + * @param sc SparkContext object + * @param kafkaParams Kafka <a href="http://kafka.apache.org/documentation.html#configuration"> + * configuration parameters</a>. + * Requires "metadata.broker.list" or "bootstrap.servers" to be set with Kafka broker(s), + * NOT zookeeper servers, specified in host1:port1,host2:port2 form. + * @param offsetRanges Each OffsetRange in the batch corresponds to a + * range of offsets for a given Kafka topic/partition + * @param leaders Kafka leaders for each offset range in batch + * @param messageHandler function for translating each message into the desired type + */ + @Experimental + def createRDD[ +K: ClassTag, +V: ClassTag, +U <: Decoder[_]: ClassTag, +T <: Decoder[_]: ClassTag, +R: ClassTag] ( + sc: SparkContext, + kafkaParams: Map[String, String], + offsetRanges: Array[OffsetRange], + leaders: Array[Leader], --- End diff -- Is this version of the constructor assuming that the caller has their own code for finding the leaders? From what I can tell, we've locked down the utility function for doing that.
[GitHub] spark pull request: [SPARK-4795][Core] Redesign the primitive typ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3642#issuecomment-72615909 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26636/ Test FAILed.
[GitHub] spark pull request: [SQL][DataFrame] Remove DataFrameApi, Expressi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4328#issuecomment-72615989 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26634/ Test PASSed.
[GitHub] spark pull request: [SQL][DataFrame] Remove DataFrameApi, Expressi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4328#issuecomment-72615985 [Test build #26634 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26634/consoleFull) for PR 4328 at commit [`723d600`](https://github.com/apache/spark/commit/723d60054b369573bbe8035cad266aef9300356b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23990370 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils { createStream[K, V, U, T]( jssc.ssc, kafkaParams.toMap, Map(topics.mapValues(_.intValue()).toSeq: _*), storageLevel) } + + /** A batch-oriented interface for consuming from Kafka. + * Starting and ending offsets are specified in advance, + * so that you can control exactly-once semantics. + * @param sc SparkContext object + * @param kafkaParams Kafka <a href="http://kafka.apache.org/documentation.html#configuration"> + * configuration parameters</a>. + * Requires "metadata.broker.list" or "bootstrap.servers" to be set with Kafka broker(s), + * NOT zookeeper servers, specified in host1:port1,host2:port2 form. + * @param offsetRanges Each OffsetRange in the batch corresponds to a + * range of offsets for a given Kafka topic/partition + */ + @Experimental + def createRDD[ +K: ClassTag, +V: ClassTag, +U <: Decoder[_]: ClassTag, +T <: Decoder[_]: ClassTag] ( + sc: SparkContext, + kafkaParams: Map[String, String], + offsetRanges: Array[OffsetRange] + ): RDD[(K, V)] with HasOffsetRanges = { +val messageHandler = (mmd: MessageAndMetadata[K, V]) => (mmd.key, mmd.message) +val kc = new KafkaCluster(kafkaParams) +val topics = offsetRanges.map(o => TopicAndPartition(o.topic, o.partition)).toSet +val leaders = kc.findLeaders(topics).fold( + errs => throw new SparkException(errs.mkString("\n")), + ok => ok +) +new KafkaRDD[K, V, U, T, (K, V)](sc, kafkaParams, offsetRanges, leaders, messageHandler) + } + + /** A batch-oriented interface for consuming from Kafka. + * Starting and ending offsets are specified in advance, + * so that you can control exactly-once semantics. + * @param sc SparkContext object + * @param kafkaParams Kafka <a href="http://kafka.apache.org/documentation.html#configuration"> + * configuration parameters</a>. + * Requires "metadata.broker.list" or "bootstrap.servers" to be set with Kafka broker(s), + * NOT zookeeper servers, specified in host1:port1,host2:port2 form. + * @param offsetRanges Each OffsetRange in the batch corresponds to a + * range of offsets for a given Kafka topic/partition + * @param leaders Kafka leaders for each offset range in batch + * @param messageHandler function for translating each message into the desired type + */ + @Experimental + def createRDD[ +K: ClassTag, +V: ClassTag, +U <: Decoder[_]: ClassTag, +T <: Decoder[_]: ClassTag, +R: ClassTag] ( + sc: SparkContext, + kafkaParams: Map[String, String], + offsetRanges: Array[OffsetRange], + leaders: Array[Leader], + messageHandler: MessageAndMetadata[K, V] => R + ): RDD[R] with HasOffsetRanges = { + +val leaderMap = leaders + .map(l => TopicAndPartition(l.topic, l.partition) -> (l.host, l.port)) + .toMap +new KafkaRDD[K, V, U, T, R](sc, kafkaParams, offsetRanges, leaderMap, messageHandler) + } + + /** + * This stream can guarantee that each message from Kafka is included in transformations + * (as opposed to output actions) exactly once, even in most failure situations. + * + * Points to note: + * + * Failure Recovery - You must checkpoint this stream, or save offsets yourself and provide them + * as the fromOffsets parameter on restart. + * Kafka must have sufficient log retention to obtain messages after failure. + * + * Getting offsets from the stream - see programming guide + * + * Zookeeper - This does not use Zookeeper to store offsets. For interop with Kafka monitors + * that depend on Zookeeper, you must store offsets in ZK yourself. + * + * End-to-end semantics - This does not guarantee that any output operation will push each record + * exactly once. To ensure end-to-end exactly-once semantics (that is, receiving exactly once and + * outputting exactly once), you have to either ensure that the output operation is + * idempotent, or transactionally store offsets with the output. See the programming guide for + * more details. + * + * @param ssc StreamingContext object + * @param kafkaParams Kafka <a href="http://kafka.apache.org/documentation.html#configuration"> + * configuration parameters</a>. + * Requires "metadata.broker.list" or "bootstrap.servers" to be set with Kafka broker(s), + * NOT zookeeper servers, specified in
[GitHub] spark pull request: [SPARK-4795][Core] Redesign the primitive typ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3642#issuecomment-72615904 [Test build #26636 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26636/consoleFull) for PR 3642 at commit [`0b9017f`](https://github.com/apache/spark/commit/0b9017fef57e5512d539146fafd9aa1e12b966ae). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4047#issuecomment-72609253 LGTM. Merged into master. Thanks everyone for collaborating on LDA! @jkbradley Please create follow-up JIRAs and see who is interested in working on LDA features.
[GitHub] spark pull request: [SPARK-5550] [SQL] Support the case insensitiv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4326#issuecomment-72609562 [Test build #26625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26625/consoleFull) for PR 4326 at commit [`485cf66`](https://github.com/apache/spark/commit/485cf66a92ce31b53153b87f178e335cd52a2f97). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class SimpleFunctionRegistry(val caseSensitive: Boolean) extends FunctionRegistry ` * `class StringKeyHashMap[T](normalizer: (String) => String) `
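A hedged sketch of what a normalizing string-keyed map like the `StringKeyHashMap` listed above might look like, inferred from the class signature alone (this is an illustration, not the actual Spark source):

```scala
import scala.collection.mutable

class StringKeyHashMap[T](normalizer: String => String) {
  private val base = mutable.HashMap.empty[String, T]

  // Every lookup and insertion goes through the same normalizer
  def get(key: String): Option[T] = base.get(normalizer(key))
  def put(key: String, value: T): Option[T] = base.put(normalizer(key), value)
}

// A case-insensitive function registry, as the caseSensitive flag suggests
val registry = new StringKeyHashMap[Int](_.toLowerCase)
registry.put("SUM", 1)
registry.get("sum") // Some(1)
```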
[GitHub] spark pull request: [SQL][DataFrame] Remove DataFrameApi, Expressi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4328#issuecomment-72610285 [Test build #26626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26626/consoleFull) for PR 4328 at commit [`e00ffcb`](https://github.com/apache/spark/commit/e00ffcb83bc3485ed7381ec2ab8bbc6d2878ee9e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5554] [SQL] [PySpark] add more tests fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4331#issuecomment-72610330 [Test build #26635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26635/consoleFull) for PR 4331 at commit [`3ab2661`](https://github.com/apache/spark/commit/3ab26614b5278edce6e8571e5c51fe0b67e3124e). * This patch merges cleanly.
[GitHub] spark pull request: [minor] update streaming linear algorithms
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4329#issuecomment-72610392 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26628/ Test PASSed.
[GitHub] spark pull request: [SPARK-4795][Core] Redesign the primitive typ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3642#issuecomment-72610354 [Test build #26636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26636/consoleFull) for PR 3642 at commit [`0b9017f`](https://github.com/apache/spark/commit/0b9017fef57e5512d539146fafd9aa1e12b966ae). * This patch merges cleanly.
[GitHub] spark pull request: [minor] update streaming linear algorithms
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4329#issuecomment-72610387 [Test build #26628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26628/consoleFull) for PR 4329 at commit [`78731e1`](https://github.com/apache/spark/commit/78731e165b370eafd575c93d9601237e211ec75c). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SQL][DataFrame] Remove DataFrameApi, Expressi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4328#issuecomment-72610291 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26626/ Test PASSed.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/4325#discussion_r23988701 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DateUtils.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.types + +import java.sql.Date +import java.util.{Calendar, TimeZone} + +import org.apache.spark.sql.catalyst.expressions.Cast + +/** + * Helper functions to convert between an Int value of days since 1970-01-01 and java.sql.Date + */ +object DateUtils { --- End diff -- If it's not necessary, just do not make it public; otherwise we cannot change it anymore.
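The conversion the doc comment describes, an Int of days since 1970-01-01 to and from `java.sql.Date`, can be sketched as below. This is only an illustration of the idea: it fixes the clock to UTC and skips the time-zone handling (note the `Calendar`/`TimeZone` imports in the diff) that a real helper needs.

```scala
import java.sql.Date
import java.util.concurrent.TimeUnit

val MillisPerDay: Long = TimeUnit.DAYS.toMillis(1)

// Days since epoch -> java.sql.Date (UTC-only sketch)
def toJavaDate(daysSinceEpoch: Int): Date = new Date(daysSinceEpoch * MillisPerDay)

// java.sql.Date -> days since epoch (UTC-only sketch)
def fromJavaDate(date: Date): Int = (date.getTime / MillisPerDay).toInt

fromJavaDate(toJavaDate(16100)) // 16100, i.e. 2014-01-30 in UTC
```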
[GitHub] spark pull request: [SPARK-5182] [SPARK-5528] [SPARK-5509] [SPARK-...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/4308#discussion_r23988104 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala --- @@ -175,7 +175,10 @@ case class EqualTo(left: Expression, right: Expression) extends BinaryComparison null } else { val r = right.eval(input) - if (r == null) null else l == r + if (r == null) null + else if (left.dataType != BinaryType) l == r + else BinaryType.ordering.compare( +l.asInstanceOf[Array[Byte]], r.asInstanceOf[Array[Byte]]) == 0 --- End diff -- Filed SPARK-5553 to track this. I'd like to make sure equality comparison for binary types works properly in this PR. Also, we're already using `Ordering` to compare binary values in `LessThan` and `GreaterThan` etc., so at least this isn't a regression.
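The underlying gotcha here is that Scala's `==` on arrays is reference equality, so a plain `l == r` silently misbehaves for `BinaryType` values. A minimal demonstration:

```scala
val a = Array[Byte](1, 2, 3)
val b = Array[Byte](1, 2, 3)

a == b                        // false: arrays compare by reference
a.sameElements(b)             // true: element-wise comparison
java.util.Arrays.equals(a, b) // true: the equivalent Java idiom
```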
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23988318 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils { createStream[K, V, U, T]( jssc.ssc, kafkaParams.toMap, Map(topics.mapValues(_.intValue()).toSeq: _*), storageLevel) } + + /** A batch-oriented interface for consuming from Kafka. + * Starting and ending offsets are specified in advance, + * so that you can control exactly-once semantics. + * @param sc SparkContext object + * @param kafkaParams Kafka <a href="http://kafka.apache.org/documentation.html#configuration"> + * configuration parameters</a>. + * Requires "metadata.broker.list" or "bootstrap.servers" to be set with Kafka broker(s), + * NOT zookeeper servers, specified in host1:port1,host2:port2 form. + * @param offsetRanges Each OffsetRange in the batch corresponds to a + * range of offsets for a given Kafka topic/partition + */ + @Experimental + def createRDD[ +K: ClassTag, +V: ClassTag, +U <: Decoder[_]: ClassTag, +T <: Decoder[_]: ClassTag] ( + sc: SparkContext, + kafkaParams: Map[String, String], + offsetRanges: Array[OffsetRange] + ): RDD[(K, V)] with HasOffsetRanges = { --- End diff -- I've never seen a trait mixin in a return type. What does this actually mean? I looked at the compiled byte code and the byte code signature is still RDD. Can we just return a `KafkaRDD` here? If this is enforced somehow by the scala compiler, returning an interface here ties our hands in the future, because we can't add functionality to the returned type without breaking binary compatibility. For instance, we may want to return an RDD that has additional methods beyond just accessing its offset ranges. I ran a simple example and I couldn't see any byte code reference to the mixed-in trait: ``` trait Trait {} class Class extends Trait {} object Object { def getTrait: Class with Trait = {new Class()} } javap -v Object public static Class getTrait(); flags: ACC_PUBLIC, ACC_STATIC Code: stack=1, locals=0, args_size=0 0: getstatic #16 // Field Object$.MODULE$:LObject$; 3: invokevirtual #18 // Method Object$.getTrait:()LClass; 6: areturn ```
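A self-contained version of the experiment with toy types shows both halves of the answer: the refinement is erased from the byte-code descriptor (as the javap dump above shows), yet the Scala compiler still lets callers use the trait's members without a cast:

```scala
trait HasRanges { def ranges: Array[Long] }

class Result // plain class with no ranges member

object Factory {
  // Erased return type in byte code is just Result, but the static Scala
  // type also carries the trait
  def make(): Result with HasRanges =
    new Result with HasRanges { val ranges = Array(0L, 100L) }
}

val r = Factory.make()
r.ranges // compiles only because the refinement is part of the declared type
```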
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/4325#discussion_r23988264 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DateUtils.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.types + +import java.sql.Date +import java.util.{Calendar, TimeZone} + +import org.apache.spark.sql.catalyst.expressions.Cast + +/** + * Helper functions to convert between an Int value of days since 1970-01-01 and java.sql.Date + */ +object DateUtils { --- End diff -- I think this could be useful even outside of Spark.
[GitHub] spark pull request: [minor] update streaming linear algorithms
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4329
[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-72612066 The Python parts look good to me, thanks!
[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-72611873 [Test build #26637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26637/consoleFull) for PR 4066 at commit [`92e6dc9`](https://github.com/apache/spark/commit/92e6dc96530351b54cb8eb9944d90b7664776a79). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/4068#issuecomment-72613023 Hi @yhuai, I have updated this PR introducing `UnresolvedGetField` to fix this issue. Do you have time to review it? Thanks!
[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23989226 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -808,6 +810,7 @@ class DAGScheduler( // will be posted, which should always come after a corresponding SparkListenerStageSubmitted // event. stage.latestInfo = StageInfo.fromStage(stage, Some(partitionsToCompute.size)) +outputCommitCoordinator.stageStart(stage.id) --- End diff -- I think that this introduces a race between commit requests and the stage start event. If the listener bus is slow in delivering events, then it's possible that the output commit coordinator could receive a commit request via Akka for a stage that it doesn't know about yet.
[GitHub] spark pull request: [SPARK-5345][DEPLOY] Fix unstable test case in...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4133#issuecomment-72614051 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-5549] Define TaskContext interface in S...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4324
[GitHub] spark pull request: [SPARK-5345][DEPLOY] Fix unstable test case in...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4133#issuecomment-72614122 I think we've observed this test's flakiness even after fixing #4220, so we should continue investigating it.
[GitHub] spark pull request: [SPARK-5345][DEPLOY] Fix unstable test case in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4133#issuecomment-72614113 [Test build #26640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26640/consoleFull) for PR 4133 at commit [`77678fe`](https://github.com/apache/spark/commit/77678fe01372ec272005900ae701ca553d0ee01e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4304#issuecomment-72614765 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26632/ Test PASSed.
[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4304#issuecomment-72614756 [Test build #26632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26632/consoleFull) for PR 4304 at commit [`8b29946`](https://github.com/apache/spark/commit/8b29946d0b1aabfcb393f71f9de6202cee51a39d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class PCA(val k: Int) `
[GitHub] spark pull request: [SPARK-1405][MLLIB] LDA on GraphX
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2388#issuecomment-72616260 @witgo We've merged #4047 and closed this PR. Thanks for your contribution! Please create JIRAs and propose new features that can be added to the LDA implementation in master.
[GitHub] spark pull request: [SPARK-5550] [SQL] Support the case insensitiv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4326#issuecomment-72609568 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26625/ Test PASSed.
[GitHub] spark pull request: [SPARK-5554] [SQL] [PySpark] add more tests fo...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/4331 [SPARK-5554] [SQL] [PySpark] add more tests for DataFrame Add more tests and docs for DataFrame Python API, improve test coverage, fix bugs. You can merge this pull request into a Git repository by running: $ git pull https://github.com/davies/spark fix_df Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4331.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4331 commit 3ab26614b5278edce6e8571e5c51fe0b67e3124e Author: Davies Liu dav...@databricks.com Date: 2015-02-03T08:08:00Z add more tests for DataFrame
[GitHub] spark pull request: [SPARK-5551][SQL] Create type alias for Schema...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4327#issuecomment-72610169 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26627/ Test PASSed.
[GitHub] spark pull request: [SPARK-5551][SQL] Create type alias for Schema...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4327#issuecomment-72610160 [Test build #26627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26627/consoleFull) for PR 4327 at commit [`e5a8ff3`](https://github.com/apache/spark/commit/e5a8ff3f7a5c6aebe84f180aca590d3ba41610f3). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23988994 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/Leader.scala --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.kafka + +import kafka.common.TopicAndPartition + +/** Host info for the leader of a Kafka TopicAndPartition */ +final class Leader private( +/** kafka topic name */ +val topic: String, +/** kafka partition id */ +val partition: Int, +/** kafka hostname */ +val host: String, +/** kafka host's port */ +val port: Int) extends Serializable + +object Leader { + def create(topic: String, partition: Int, host: String, port: Int): Leader = --- End diff -- As with offset ranges, can't we just have a single way to construct these?
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23988976 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/OffsetRange.scala --- @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.kafka + +import kafka.common.TopicAndPartition + +/** Something that has a collection of OffsetRanges */ +trait HasOffsetRanges { + def offsetRanges: Array[OffsetRange] +} + +/** Represents a range of offsets from a single Kafka TopicAndPartition */ +final class OffsetRange private( +/** kafka topic name */ +val topic: String, +/** kafka partition id */ +val partition: Int, +/** inclusive starting offset */ +val fromOffset: Long, +/** exclusive ending offset */ +val untilOffset: Long) extends Serializable { + import OffsetRange.OffsetRangeTuple + + /** this is to avoid ClassNotFoundException during checkpoint restore */ --- End diff -- This comment might be more helpful to include where `OffsetRangeTuple` is defined rather than here. I spent a long time trying to figure out why this extra class existed. Also, can you give a bit more detail? Not sure I see why you can't recover from a checkpoint safely, provided that the recovering JVM has the class `OffsetRangeTuple` defined.
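The pattern being asked about can be sketched independently of Kafka: checkpoint state is stored as a plain Scala tuple so that deserialization only needs `scala.Tuple4`, never the library-specific class. The `fromTuple` below is hypothetical, added only to show the round trip:

```scala
final class Range4(val topic: String, val partition: Int,
                   val from: Long, val until: Long) extends Serializable {
  // What actually goes into the checkpoint: a type always on the classpath
  def toTuple: (String, Int, Long, Long) = (topic, partition, from, until)
}

object Range4 {
  // Hypothetical inverse used when restoring from a checkpointed tuple
  def fromTuple(t: (String, Int, Long, Long)): Range4 =
    new Range4(t._1, t._2, t._3, t._4)
}

val restored = Range4.fromTuple(new Range4("events", 0, 100L, 200L).toTuple)
```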
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23989107

--- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/OffsetRange.scala ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.kafka
+
+import kafka.common.TopicAndPartition
+
+/** Something that has a collection of OffsetRanges */
+trait HasOffsetRanges {
+  def offsetRanges: Array[OffsetRange]
+}
+
+/** Represents a range of offsets from a single Kafka TopicAndPartition */
+final class OffsetRange private(
+    /** kafka topic name */
+    val topic: String,
+    /** kafka partition id */
+    val partition: Int,
+    /** inclusive starting offset */
+    val fromOffset: Long,
+    /** exclusive ending offset */
+    val untilOffset: Long) extends Serializable {
+  import OffsetRange.OffsetRangeTuple
+
+  /** this is to avoid ClassNotFoundException during checkpoint restore */
+  private[streaming]
+  def toTuple: OffsetRangeTuple = (topic, partition, fromOffset, untilOffset)
+}
+
+object OffsetRange {
+  private[spark]
+  type OffsetRangeTuple = (String, Int, Long, Long)
--- End diff --

Can you group this at the bottom with the related `apply` method?
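A sketch of the layout being requested, keeping the checkpoint-related pieces together at the bottom of the companion object (the `apply` signatures are assumed from the rest of the PR; the grouping itself is my reading of the comment):

    object OffsetRange {
      def apply(topic: String, partition: Int, fromOffset: Long, untilOffset: Long): OffsetRange =
        new OffsetRange(topic, partition, fromOffset, untilOffset)

      // Grouped per the review comment: the tuple form exists only to make
      // checkpoint restore robust, so the alias and the factory that rebuilds
      // an OffsetRange from it sit side by side.
      private[spark] type OffsetRangeTuple = (String, Int, Long, Long)

      private[streaming] def apply(t: OffsetRangeTuple): OffsetRange =
        new OffsetRange(t._1, t._2, t._3, t._4)
    }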
[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-72613283

[Test build #26629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26629/consoleFull) for PR 4066 at commit [`63a7707`](https://github.com/apache/spark/commit/63a7707cad01f4dcc2c74c4a6bffded9c887f9d4).

* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds the following public classes _(experimental)_:
  * `case class AskPermissionToCommitOutput(`
[GitHub] spark pull request: [SPARK-5549] Define TaskContext interface in S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4324#issuecomment-72613282 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26633/
[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-72613292 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26629/
[GitHub] spark pull request: [SPARK-5549] Define TaskContext interface in S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4324#issuecomment-72613272

[Test build #26633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26633/consoleFull) for PR 4324 at commit [`2480a17`](https://github.com/apache/spark/commit/2480a177df2023238778fb217fb9b3e4794a9b82).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4325#issuecomment-72613604 [Test build #26639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26639/consoleFull) for PR 4325 at commit [`e46735c`](https://github.com/apache/spark/commit/e46735c612879bb46317efe73155b4611bb51afc). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4795][Core] Redesign the primitive typ...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/3642#issuecomment-72614319

> @zsxwing if we merge this one, are there any other use cases for importing SparkContext._?

There will be no implicit methods/objects left in the SparkContext object, so people won't need to import `SparkContext._` for them. Since compatibility is very important, it's better to get more pairs of eyes to look at this.
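To illustrate the compatibility point, here is a rough sketch of what the change means for user code, assuming the primitive-to-Writable implicits resolve without the old import after the redesign (the exact destination of the implicits is not stated in this thread):

    import org.apache.spark.{SparkConf, SparkContext}

    object WritableImplicitsExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("writables").setMaster("local[*]"))

        // Saving an RDD of primitives as a SequenceFile relies on implicit
        // "primitive => Writable" conversions. Before this PR they lived in
        // the SparkContext object and required:
        //   import org.apache.spark.SparkContext._
        // After it, the call is expected to compile without that import.
        sc.parallelize(Seq((1, "a"), (2, "b")))
          .saveAsSequenceFile("/tmp/writables-example") // path is illustrative

        sc.stop()
      }
    }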
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23989943

--- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala ---
@@ -144,4 +150,249 @@ object KafkaUtils {
     createStream[K, V, U, T](
       jssc.ssc, kafkaParams.toMap, Map(topics.mapValues(_.intValue()).toSeq: _*), storageLevel)
   }
+
+  /** A batch-oriented interface for consuming from Kafka.
+   * Starting and ending offsets are specified in advance,
+   * so that you can control exactly-once semantics.
+   * @param sc SparkContext object
+   * @param kafkaParams Kafka <a href="http://kafka.apache.org/documentation.html#configuration">
+   * configuration parameters</a>.
+   *   Requires "metadata.broker.list" or "bootstrap.servers" to be set with Kafka broker(s),
+   *   NOT zookeeper servers, specified in host1:port1,host2:port2 form.
+   * @param batch Each OffsetRange in the batch corresponds to a
+   *   range of offsets for a given Kafka topic/partition
+   */
+  @Experimental
+  def createRDD[
+    K: ClassTag,
+    V: ClassTag,
+    U <: Decoder[_]: ClassTag,
+    T <: Decoder[_]: ClassTag,
+    R: ClassTag] (
--- End diff --

Isn't the returned RDD of type `RDD[R]`?
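For reference, a sketch of the signature shape the question is about; everything after the type-parameter list is my guess at the part elided from the diff, with a stub body so the sketch compiles:

    import scala.reflect.ClassTag

    import kafka.message.MessageAndMetadata
    import kafka.serializer.Decoder

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.kafka.OffsetRange

    object CreateRDDSignatureSketch {
      def createRDD[
          K: ClassTag,
          V: ClassTag,
          U <: Decoder[_]: ClassTag,
          T <: Decoder[_]: ClassTag,
          R: ClassTag](
            sc: SparkContext,
            kafkaParams: Map[String, String],
            batch: Array[OffsetRange],
            // Each raw Kafka message is mapped through this handler,
            // which is why the result would be RDD[R].
            messageHandler: MessageAndMetadata[K, V] => R): RDD[R] = ???
    }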
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/4325#discussion_r23990278

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DateUtils.scala ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types
+
+import java.sql.Date
+import java.util.{Calendar, TimeZone}
+
+import org.apache.spark.sql.catalyst.expressions.Cast
+
+/**
+ * Helper functions to convert between the Int value of days since 1970-01-01 and java.sql.Date
+ */
+object DateUtils {
--- End diff --

Code gen needs that.
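Since the object's body is elided above, here is a minimal sketch of the conversions a DateUtils like this typically provides, assuming the Int counts days since 1970-01-01 normalized to the local time zone (method names are my guesses, and DST edge cases are ignored):

    import java.sql.Date
    import java.util.TimeZone

    object DateUtilsSketch {
      private val MILLIS_PER_DAY = 86400000L

      // java.sql.Date => days since 1970-01-01, shifting local midnight to UTC.
      def fromJavaDate(date: Date): Int = {
        val millisLocal = date.getTime + TimeZone.getDefault.getOffset(date.getTime)
        (millisLocal / MILLIS_PER_DAY).toInt
      }

      // days since 1970-01-01 => java.sql.Date at local midnight.
      def toJavaDate(days: Int): Date = {
        val millisUtc = days.toLong * MILLIS_PER_DAY
        new Date(millisUtc - TimeZone.getDefault.getOffset(millisUtc))
      }
    }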
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4325#issuecomment-72615808 [Test build #26642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26642/consoleFull) for PR 4325 at commit [`096e20d`](https://github.com/apache/spark/commit/096e20d5de068157910372a03a6face9edc829e6). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/4325#issuecomment-72615722 retest this please.
[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-72744869 [Test build #26677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26677/consoleFull) for PR 4066 at commit [`dd00b7c`](https://github.com/apache/spark/commit/dd00b7c83fd0a4fa1cbd9115f2e0a8e69bc519b9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5554] [SQL] [PySpark] add more tests fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4331#issuecomment-72745468

[Test build #26672 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26672/consoleFull) for PR 4331 at commit [`467332c`](https://github.com/apache/spark/commit/467332cacca8754f04271a70bbaf15c8f2afd5c6).

* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class Dsl(object):`
  * `class ExamplePointUDT(UserDefinedType):`
  * `class SQLTests(ReusedPySparkTestCase):`
[GitHub] spark pull request: [SPARK-5554] [SQL] [PySpark] add more tests fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4331#issuecomment-72745480 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26672/