[GitHub] spark pull request #15901: [SPARK-18467][SQL] Extracts method for preparing ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15901#discussion_r88612839 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -33,6 +33,79 @@ import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData} import org.apache.spark.sql.types._ /** + * Common base class for [[StaticInvoke]], [[Invoke]], and [[NewInstance]]. + */ +trait InvokeLike extends Expression with NonSQLExpression { + + def arguments: Seq[Expression] + + def propagateNull: Boolean + + protected lazy val needNullCheck: Boolean = propagateNull && arguments.exists(_.nullable) + + /** + * Prepares codes for arguments. + * + * - generate codes for argument. + * - use ctx.splitExpressions() to not exceed 64kb JVM limit while preparing arguments. + * - avoid some of nullabilty checking which are not needed because the expression is not + * nullable. + * - when needNullCheck == true, short circuit if we found one of arguments is null because + * preparing rest of arguments can be skipped in the case. + * + * @param ctx a [[CodegenContext]] + * @param ev an [[ExprCode]] with unique terms. + * @return (code to prepare arguments, argument string, result of argument null check) + */ + def prepareArguments(ctx: CodegenContext, ev: ExprCode): (String, String, String) = { --- End diff -- Thanks, removed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
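The null handling described in the doc comment above can be illustrated outside of code generation. The following is only a minimal sketch under stated assumptions: `InvokeLikeSketch`, `Arg`, and this runtime `prepareArguments` are hypothetical names that model the idea with plain values, whereas the method in the diff generates Java source strings.

```scala
// Hypothetical, simplified model of the idea; not Spark's Expression API.
object InvokeLikeSketch {
  // An argument knows whether it may be null and how to compute its value.
  final case class Arg(nullable: Boolean, eval: () => Any)

  // Mirrors the diff: a null check is needed only if nulls propagate
  // and at least one argument can actually be null.
  def needNullCheck(propagateNull: Boolean, args: Seq[Arg]): Boolean =
    propagateNull && args.exists(_.nullable)

  // Evaluates arguments left to right. Returns None as soon as a null is
  // found (when a null check is needed), so later arguments are skipped.
  def prepareArguments(propagateNull: Boolean, args: Seq[Arg]): Option[Seq[Any]] = {
    val check = needNullCheck(propagateNull, args)
    val values = Seq.newBuilder[Any]
    val it = args.iterator
    while (it.hasNext) {
      val arg = it.next()
      val v = arg.eval()
      // Non-nullable arguments are never checked.
      if (check && arg.nullable && v == null) return None
      values += v
    }
    Some(values.result())
  }
}
```

In this sketch, a `None` result plays the role of the "result of argument null check" in the returned tuple; the 64kb splitting done through `ctx.splitExpressions()` is not modeled.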
[GitHub] spark pull request #15889: [SPARK-18445][BUILD][DOCS] Fix the markdown for `...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15889#discussion_r88611836 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala --- @@ -234,6 +234,9 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)]) * In addition, users can control the partitioning of the output RDD, the serializer that is use * for the shuffle, and whether to perform map-side aggregation (if a mapper can produce multiple * items with the same key). + * + * @note V and C can be different -- for example, one might group a RDD of type (Int, Int) into + * an RDD of type (Int, List[Int]). --- End diff -- It seems fine now. ![2016-11-18 4 47 33](https://cloud.githubusercontent.com/assets/6477701/20422606/c57c1618-adae-11e6-8d65-b6f15a9ddb3a.png) ![2016-11-18 4 47 17](https://cloud.githubusercontent.com/assets/6477701/20422605/c579f63a-adae-11e6-98ca-b072924be75a.png)
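The `@note` in the diff above says the value type V and the combined type C may differ. A minimal sketch of that case follows, assuming a local stand-in for `combineByKey` that works on plain Scala collections. The object name and the two-function signature are illustrative assumptions; the RDD method also takes a `mergeCombiners` function, which is left out here because there is only one partition.

```scala
object CombineByKeySketch {
  // Local stand-in for combineByKey on an in-memory sequence (an assumption
  // for illustration; the real method runs on a distributed RDD).
  def combineByKey[K, V, C](
      pairs: Seq[(K, V)],
      createCombiner: V => C,
      mergeValue: (C, V) => C): Map[K, C] =
    pairs.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
      val combined = acc.get(k) match {
        case Some(c) => mergeValue(c, v) // key seen before: fold the value in
        case None => createCombiner(v)   // first value for this key
      }
      acc.updated(k, combined)
    }

  // V = Int, C = List[Int]: pairs of (Int, Int) become (Int, List[Int]).
  val input: Seq[(Int, Int)] = Seq(1 -> 10, 2 -> 20, 1 -> 11)
  val grouped: Map[Int, List[Int]] =
    combineByKey[Int, Int, List[Int]](input, v => List(v), (c, v) => c :+ v)
}
```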
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r88611024 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -574,7 +574,8 @@ object ScalaReflection extends ScalaReflection { "cannot be used as field name\n" + walkedTypePath.mkString("\n")) } - val fieldValue = Invoke(inputObject, fieldName, dataTypeFor(fieldType)) + val fieldValue = Invoke(inputObject, fieldName, dataTypeFor(fieldType), +returnNullable = inputObject.nullable || !fieldType.typeSymbol.asClass.isPrimitive) --- End diff -- we can just write `returnNullable = !fieldType.typeSymbol.asClass.isPrimitive` and check `inputObject.nullable` in `Invoke.nullable`
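One possible reading of the suggestion above, sketched with hypothetical stand-in classes rather than Spark's real `Invoke` expression: the call site passes only what it knows about the method's return type, and the expression itself accounts for a nullable target object.

```scala
// Hypothetical stand-ins for illustration; these are not Spark's classes.
object InvokeNullableSketch {
  final case class Target(nullable: Boolean)

  final case class Invoke(target: Target, returnNullable: Boolean) {
    // The result may be null if the target object may be null, or if the
    // invoked method itself may return null.
    def nullable: Boolean = target.nullable || returnNullable
  }

  // Call-site side: only the field type matters here (assumed helper).
  def returnNullableFor(isPrimitive: Boolean): Boolean = !isPrimitive
}
```

With this split, callers do not need to repeat the `inputObject.nullable ||` part at every call site.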
[GitHub] spark issue #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15913 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68829/ Test FAILed.
[GitHub] spark issue #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15913 Merged build finished. Test FAILed.
[GitHub] spark issue #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15913 **[Test build #68829 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68829/consoleFull)** for PR 15913 at commit [`9a8c926`](https://github.com/apache/spark/commit/9a8c92691e7ec0b9d37eed0cf6f9dbcc4d4d622f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15620: [SPARK-18091] [SQL] Deep if expressions cause Generated ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15620 **[Test build #68832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68832/consoleFull)** for PR 15620 at commit [`6059572`](https://github.com/apache/spark/commit/60595725e009b3ab5839e18694c96f1acf1c19b7).
[GitHub] spark issue #15620: [SPARK-18091] [SQL] Deep if expressions cause Generated ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15620 retest this please
[GitHub] spark pull request #15644: [SPARK-18117][CORE] Add test for TaskSetBlacklist
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/15644#discussion_r88609776 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala --- @@ -282,6 +317,188 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B assert(!failedTaskSet) } + test("scheduled tasks obey task and stage blacklists") { +taskScheduler = setupSchedulerWithMockTaskSetBlacklist() +(0 to 2).foreach {stageId => + val taskSet = FakeTask.createTaskSet(numTasks = 2, stageId = stageId, stageAttemptId = 0) + taskScheduler.submitTasks(taskSet) +} + +// Setup our mock blacklist: +// * stage 0 is blacklisted on node "host1" +// * stage 1 is blacklisted on executor "executor3" +// * stage 0, partition 0 is blacklisted on executor 0 +// (mocked methods default to returning false, ie. no blacklisting) + when(stageToMockTaskSetBlacklist(0).isNodeBlacklistedForTaskSet("host1")).thenReturn(true) + when(stageToMockTaskSetBlacklist(1).isExecutorBlacklistedForTaskSet("executor3")) + .thenReturn(true) + when(stageToMockTaskSetBlacklist(0).isExecutorBlacklistedForTask("executor0", 0)) + .thenReturn(true) + +val offers = IndexedSeq( + new WorkerOffer("executor0", "host0", 1), + new WorkerOffer("executor1", "host1", 1), + new WorkerOffer("executor2", "host1", 1), + new WorkerOffer("executor3", "host2", 10) +) +val firstTaskAttempts = taskScheduler.resourceOffers(offers).flatten +// We should schedule all tasks. +assert(firstTaskAttempts.size === 6) +// Whenever we schedule a task, we must consult the node and executor blacklist. (The test +// doesn't check exactly what checks are made because the offers get shuffled.) 
+(0 to 2).foreach { stageId => + verify(stageToMockTaskSetBlacklist(stageId), atLeast(1)) +.isNodeBlacklistedForTaskSet(anyString()) + verify(stageToMockTaskSetBlacklist(stageId), atLeast(1)) +.isExecutorBlacklistedForTaskSet(anyString()) +} + +def tasksForStage(stageId: Int): Seq[TaskDescription] = { + firstTaskAttempts.filter{_.name.contains(s"stage $stageId")} +} +tasksForStage(0).foreach { task => + // executors 1 & 2 blacklisted for node + // executor 0 blacklisted just for partition 0 + if (task.index == 0) { +assert(task.executorId === "executor3") + } else { +assert(Set("executor0", "executor3").contains(task.executorId)) + } +} +tasksForStage(1).foreach { task => + // executor 3 blacklisted + assert("executor3" != task.executorId) +} +// no restrictions on stage 2 + +// Finally, just make sure that we can still complete tasks as usual with blacklisting +// in effect. Finish each of the tasksets -- taskset 0 & 1 complete successfully, taskset 2 +// fails. +(0 to 2).foreach { stageId => + val tasks = tasksForStage(stageId) + val tsm = taskScheduler.taskSetManagerForAttempt(stageId, 0).get + val valueSer = SparkEnv.get.serializer.newInstance() + if (stageId == 2) { +// Just need to make one task fail 4 times. +var task = tasks(0) +val taskIndex = task.index +(0 until 4).foreach { attempt => + assert(task.attemptNumber === attempt) + tsm.handleFailedTask(task.taskId, TaskState.FAILED, TaskResultLost) + val nextAttempts = + taskScheduler.resourceOffers(IndexedSeq(WorkerOffer("executor4", "host4", 1))).flatten + if (attempt < 3) { +assert(nextAttempts.size === 1) +task = nextAttempts(0) +assert(task.index === taskIndex) + } else { +assert(nextAttempts.size === 0) + } +} +// End the other task of the taskset, doesn't matter whether it succeeds or fails. 
+val otherTask = tasks(1) +val result = new DirectTaskResult[Int](valueSer.serialize(otherTask.taskId), Seq()) +tsm.handleSuccessfulTask(otherTask.taskId, result) + } else { +tasks.foreach { task => + val result = new DirectTaskResult[Int](valueSer.serialize(task.taskId), Seq()) + tsm.handleSuccessfulTask(task.taskId, result) +} + } + assert(tsm.isZombie) +} + } + + /** + * Helper for performance tests. Takes the explicitly blacklisted nodes and executors; verifies + * that the blacklists are used efficiently to ensure scheduling is not O(numPendingTasks).
[GitHub] spark issue #15852: Spark-18187 [SQL] CompactibleFileStreamLog should not us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15852 Merged build finished. Test PASSed.
[GitHub] spark issue #15852: Spark-18187 [SQL] CompactibleFileStreamLog should not us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15852 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68827/ Test PASSed.
[GitHub] spark issue #15852: Spark-18187 [SQL] CompactibleFileStreamLog should not us...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15852 **[Test build #68827 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68827/consoleFull)** for PR 15852 at commit [`6537e7c`](https://github.com/apache/spark/commit/6537e7cf6186f6fcd4a5d21832572560cbd72895). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15644: [SPARK-18117][CORE] Add test for TaskSetBlacklist
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/15644#discussion_r88609206 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala --- @@ -282,6 +317,188 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B assert(!failedTaskSet) } + test("scheduled tasks obey task and stage blacklists") { +taskScheduler = setupSchedulerWithMockTaskSetBlacklist() +(0 to 2).foreach {stageId => + val taskSet = FakeTask.createTaskSet(numTasks = 2, stageId = stageId, stageAttemptId = 0) + taskScheduler.submitTasks(taskSet) +} + +// Setup our mock blacklist: +// * stage 0 is blacklisted on node "host1" +// * stage 1 is blacklisted on executor "executor3" +// * stage 0, partition 0 is blacklisted on executor 0 +// (mocked methods default to returning false, ie. no blacklisting) + when(stageToMockTaskSetBlacklist(0).isNodeBlacklistedForTaskSet("host1")).thenReturn(true) + when(stageToMockTaskSetBlacklist(1).isExecutorBlacklistedForTaskSet("executor3")) + .thenReturn(true) + when(stageToMockTaskSetBlacklist(0).isExecutorBlacklistedForTask("executor0", 0)) + .thenReturn(true) + +val offers = IndexedSeq( + new WorkerOffer("executor0", "host0", 1), + new WorkerOffer("executor1", "host1", 1), + new WorkerOffer("executor2", "host1", 1), + new WorkerOffer("executor3", "host2", 10) +) +val firstTaskAttempts = taskScheduler.resourceOffers(offers).flatten +// We should schedule all tasks. +assert(firstTaskAttempts.size === 6) +// Whenever we schedule a task, we must consult the node and executor blacklist. (The test +// doesn't check exactly what checks are made because the offers get shuffled.) 
+(0 to 2).foreach { stageId => + verify(stageToMockTaskSetBlacklist(stageId), atLeast(1)) +.isNodeBlacklistedForTaskSet(anyString()) + verify(stageToMockTaskSetBlacklist(stageId), atLeast(1)) +.isExecutorBlacklistedForTaskSet(anyString()) +} + +def tasksForStage(stageId: Int): Seq[TaskDescription] = { + firstTaskAttempts.filter{_.name.contains(s"stage $stageId")} +} +tasksForStage(0).foreach { task => + // executors 1 & 2 blacklisted for node + // executor 0 blacklisted just for partition 0 + if (task.index == 0) { +assert(task.executorId === "executor3") + } else { +assert(Set("executor0", "executor3").contains(task.executorId)) + } +} +tasksForStage(1).foreach { task => + // executor 3 blacklisted + assert("executor3" != task.executorId) +} +// no restrictions on stage 2 + +// Finally, just make sure that we can still complete tasks as usual with blacklisting +// in effect. Finish each of the tasksets -- taskset 0 & 1 complete successfully, taskset 2 +// fails. +(0 to 2).foreach { stageId => + val tasks = tasksForStage(stageId) + val tsm = taskScheduler.taskSetManagerForAttempt(stageId, 0).get + val valueSer = SparkEnv.get.serializer.newInstance() + if (stageId == 2) { +// Just need to make one task fail 4 times. +var task = tasks(0) +val taskIndex = task.index +(0 until 4).foreach { attempt => + assert(task.attemptNumber === attempt) + tsm.handleFailedTask(task.taskId, TaskState.FAILED, TaskResultLost) + val nextAttempts = + taskScheduler.resourceOffers(IndexedSeq(WorkerOffer("executor4", "host4", 1))).flatten + if (attempt < 3) { +assert(nextAttempts.size === 1) +task = nextAttempts(0) +assert(task.index === taskIndex) + } else { +assert(nextAttempts.size === 0) + } +} +// End the other task of the taskset, doesn't matter whether it succeeds or fails. 
+val otherTask = tasks(1) +val result = new DirectTaskResult[Int](valueSer.serialize(otherTask.taskId), Seq()) +tsm.handleSuccessfulTask(otherTask.taskId, result) + } else { +tasks.foreach { task => + val result = new DirectTaskResult[Int](valueSer.serialize(task.taskId), Seq()) + tsm.handleSuccessfulTask(task.taskId, result) +} + } + assert(tsm.isZombie) +} + } + + /** + * Helper for performance tests. Takes the explicitly blacklisted nodes and executors; verifies + * that the blacklists are used efficiently to ensure scheduling is not
[GitHub] spark pull request #15644: [SPARK-18117][CORE] Add test for TaskSetBlacklist
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/15644#discussion_r88608752 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala --- @@ -282,6 +317,188 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B assert(!failedTaskSet) } + test("scheduled tasks obey task and stage blacklists") { +taskScheduler = setupSchedulerWithMockTaskSetBlacklist() +(0 to 2).foreach {stageId => + val taskSet = FakeTask.createTaskSet(numTasks = 2, stageId = stageId, stageAttemptId = 0) + taskScheduler.submitTasks(taskSet) +} + +// Setup our mock blacklist: +// * stage 0 is blacklisted on node "host1" +// * stage 1 is blacklisted on executor "executor3" +// * stage 0, partition 0 is blacklisted on executor 0 +// (mocked methods default to returning false, ie. no blacklisting) + when(stageToMockTaskSetBlacklist(0).isNodeBlacklistedForTaskSet("host1")).thenReturn(true) + when(stageToMockTaskSetBlacklist(1).isExecutorBlacklistedForTaskSet("executor3")) + .thenReturn(true) + when(stageToMockTaskSetBlacklist(0).isExecutorBlacklistedForTask("executor0", 0)) + .thenReturn(true) + +val offers = IndexedSeq( + new WorkerOffer("executor0", "host0", 1), + new WorkerOffer("executor1", "host1", 1), + new WorkerOffer("executor2", "host1", 1), + new WorkerOffer("executor3", "host2", 10) +) +val firstTaskAttempts = taskScheduler.resourceOffers(offers).flatten +// We should schedule all tasks. +assert(firstTaskAttempts.size === 6) +// Whenever we schedule a task, we must consult the node and executor blacklist. (The test +// doesn't check exactly what checks are made because the offers get shuffled.) 
+(0 to 2).foreach { stageId => + verify(stageToMockTaskSetBlacklist(stageId), atLeast(1)) +.isNodeBlacklistedForTaskSet(anyString()) + verify(stageToMockTaskSetBlacklist(stageId), atLeast(1)) +.isExecutorBlacklistedForTaskSet(anyString()) +} + +def tasksForStage(stageId: Int): Seq[TaskDescription] = { + firstTaskAttempts.filter{_.name.contains(s"stage $stageId")} +} +tasksForStage(0).foreach { task => + // executors 1 & 2 blacklisted for node + // executor 0 blacklisted just for partition 0 + if (task.index == 0) { +assert(task.executorId === "executor3") + } else { +assert(Set("executor0", "executor3").contains(task.executorId)) + } +} +tasksForStage(1).foreach { task => + // executor 3 blacklisted + assert("executor3" != task.executorId) +} +// no restrictions on stage 2 + +// Finally, just make sure that we can still complete tasks as usual with blacklisting +// in effect. Finish each of the tasksets -- taskset 0 & 1 complete successfully, taskset 2 +// fails. +(0 to 2).foreach { stageId => + val tasks = tasksForStage(stageId) + val tsm = taskScheduler.taskSetManagerForAttempt(stageId, 0).get + val valueSer = SparkEnv.get.serializer.newInstance() + if (stageId == 2) { +// Just need to make one task fail 4 times. +var task = tasks(0) +val taskIndex = task.index +(0 until 4).foreach { attempt => + assert(task.attemptNumber === attempt) + tsm.handleFailedTask(task.taskId, TaskState.FAILED, TaskResultLost) + val nextAttempts = + taskScheduler.resourceOffers(IndexedSeq(WorkerOffer("executor4", "host4", 1))).flatten + if (attempt < 3) { +assert(nextAttempts.size === 1) +task = nextAttempts(0) +assert(task.index === taskIndex) + } else { +assert(nextAttempts.size === 0) + } +} +// End the other task of the taskset, doesn't matter whether it succeeds or fails. 
+val otherTask = tasks(1) +val result = new DirectTaskResult[Int](valueSer.serialize(otherTask.taskId), Seq()) +tsm.handleSuccessfulTask(otherTask.taskId, result) + } else { +tasks.foreach { task => + val result = new DirectTaskResult[Int](valueSer.serialize(task.taskId), Seq()) + tsm.handleSuccessfulTask(task.taskId, result) +} + } + assert(tsm.isZombie) +} + } + + /** + * Helper for performance tests. Takes the explicitly blacklisted nodes and executors; verifies + * that the blacklists are used efficiently to ensure scheduling is not O(numPendingTasks).
[GitHub] spark issue #15927: [SPARK-18500][SQL] Make GenericStrategy be able to prune...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15927 **[Test build #68831 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68831/consoleFull)** for PR 15927 at commit [`89bf202`](https://github.com/apache/spark/commit/89bf20275283d7690df4951e0a6ae54b941c78e4).
[GitHub] spark issue #15927: [SPARK-18500][SQL] Make GenericStrategy be able to prune...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/15927 @marmbrus Could you review this pr please?
[GitHub] spark pull request #15927: [SPARK-18500][SQL] Make GenericStrategy be able t...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/15927 [SPARK-18500][SQL] Make GenericStrategy be able to prune plans by itself after placeholders are replaced. ## What changes were proposed in this pull request? This PR adds functionality to `GenericStrategy` so that it can prune bad physical plans by itself after their placeholders are replaced. ## How was this patch tested? Added a test to check that the strategy can prune plans by itself. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ueshin/apache-spark issues/SPARK-18500 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15927.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15927 commit 80ca645f6ebbbc5b4de0a9ff8196f5c08a44a825 Author: Takuya UESHIN Date: 2016-11-18T04:19:47Z Make GenericStrategy be able to prune plans by itself after placeholders are replaced. commit 526973cfb1cb14052359d71ebf12409c78344cf5 Author: Takuya UESHIN Date: 2016-11-18T04:19:58Z Add a test to check if the strategy can prune plans by itself. commit 89bf20275283d7690df4951e0a6ae54b941c78e4 Author: Takuya UESHIN Date: 2016-11-18T04:31:31Z Modify to set back to original strategies.
[GitHub] spark pull request #15644: [SPARK-18117][CORE] Add test for TaskSetBlacklist
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/15644#discussion_r88607110 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala --- @@ -282,6 +317,188 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B assert(!failedTaskSet) } + test("scheduled tasks obey task and stage blacklists") { +taskScheduler = setupSchedulerWithMockTaskSetBlacklist() +(0 to 2).foreach {stageId => + val taskSet = FakeTask.createTaskSet(numTasks = 2, stageId = stageId, stageAttemptId = 0) + taskScheduler.submitTasks(taskSet) +} + +// Setup our mock blacklist: +// * stage 0 is blacklisted on node "host1" +// * stage 1 is blacklisted on executor "executor3" +// * stage 0, partition 0 is blacklisted on executor 0 +// (mocked methods default to returning false, ie. no blacklisting) + when(stageToMockTaskSetBlacklist(0).isNodeBlacklistedForTaskSet("host1")).thenReturn(true) + when(stageToMockTaskSetBlacklist(1).isExecutorBlacklistedForTaskSet("executor3")) + .thenReturn(true) + when(stageToMockTaskSetBlacklist(0).isExecutorBlacklistedForTask("executor0", 0)) + .thenReturn(true) + +val offers = IndexedSeq( + new WorkerOffer("executor0", "host0", 1), + new WorkerOffer("executor1", "host1", 1), + new WorkerOffer("executor2", "host1", 1), + new WorkerOffer("executor3", "host2", 10) +) +val firstTaskAttempts = taskScheduler.resourceOffers(offers).flatten +// We should schedule all tasks. +assert(firstTaskAttempts.size === 6) +// Whenever we schedule a task, we must consult the node and executor blacklist. (The test +// doesn't check exactly what checks are made because the offers get shuffled.) 
+(0 to 2).foreach { stageId => + verify(stageToMockTaskSetBlacklist(stageId), atLeast(1)) +.isNodeBlacklistedForTaskSet(anyString()) + verify(stageToMockTaskSetBlacklist(stageId), atLeast(1)) +.isExecutorBlacklistedForTaskSet(anyString()) +} + +def tasksForStage(stageId: Int): Seq[TaskDescription] = { + firstTaskAttempts.filter{_.name.contains(s"stage $stageId")} +} +tasksForStage(0).foreach { task => + // executors 1 & 2 blacklisted for node + // executor 0 blacklisted just for partition 0 + if (task.index == 0) { +assert(task.executorId === "executor3") + } else { +assert(Set("executor0", "executor3").contains(task.executorId)) + } +} +tasksForStage(1).foreach { task => + // executor 3 blacklisted + assert("executor3" != task.executorId) +} +// no restrictions on stage 2 + +// Finally, just make sure that we can still complete tasks as usual with blacklisting +// in effect. Finish each of the tasksets -- taskset 0 & 1 complete successfully, taskset 2 +// fails. +(0 to 2).foreach { stageId => + val tasks = tasksForStage(stageId) + val tsm = taskScheduler.taskSetManagerForAttempt(stageId, 0).get + val valueSer = SparkEnv.get.serializer.newInstance() + if (stageId == 2) { +// Just need to make one task fail 4 times. +var task = tasks(0) +val taskIndex = task.index +(0 until 4).foreach { attempt => + assert(task.attemptNumber === attempt) + tsm.handleFailedTask(task.taskId, TaskState.FAILED, TaskResultLost) + val nextAttempts = + taskScheduler.resourceOffers(IndexedSeq(WorkerOffer("executor4", "host4", 1))).flatten + if (attempt < 3) { +assert(nextAttempts.size === 1) +task = nextAttempts(0) +assert(task.index === taskIndex) + } else { +assert(nextAttempts.size === 0) + } +} +// End the other task of the taskset, doesn't matter whether it succeeds or fails. 
+val otherTask = tasks(1) +val result = new DirectTaskResult[Int](valueSer.serialize(otherTask.taskId), Seq()) +tsm.handleSuccessfulTask(otherTask.taskId, result) + } else { +tasks.foreach { task => + val result = new DirectTaskResult[Int](valueSer.serialize(task.taskId), Seq()) + tsm.handleSuccessfulTask(task.taskId, result) +} + } + assert(tsm.isZombie) +} + } + + /** + * Helper for performance tests. Takes the explicitly blacklisted nodes and executors; verifies + * that the blacklists are used efficiently to ensure scheduling is not O(numPendingTasks).
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user dasbipulkumar commented on the issue: https://github.com/apache/spark/pull/15880 Yes. As developers we can handle these things, but as Spark SQL moves towards more abstracted APIs, it can be painful for general users. I hope that handling these issues can be planned for upcoming major releases.
[GitHub] spark issue #15620: [SPARK-18091] [SQL] Deep if expressions cause Generated ...
Github user kapilsingh5050 commented on the issue: https://github.com/apache/spark/pull/15620 I ran the tests on my machine and they passed. @cloud-fan Can you please trigger a re-run of the test build?
[GitHub] spark pull request #15907: [SPARK-18458][CORE] Fix signed integer overflow p...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15907#discussion_r88604890

--- Diff: core/src/test/scala/org/apache/spark/util/collection/unsafe/sort/RadixSortSuite.scala ---
@@ -73,22 +73,22 @@ class RadixSortSuite extends SparkFunSuite with Logging {
       },
       2, 4, false, false, true))

-  private def generateTestData(size: Int, rand: => Long): (Array[JLong], LongArray) = {
-    val ref = Array.tabulate[Long](size) { i => rand }
-    val extended = ref ++ Array.fill[Long](size)(0)
+  private def generateTestData(size: Long, rand: => Long): (Array[JLong], LongArray) = {
+    val ref = Array.tabulate[Long](size.toInt) { i => rand }
+    val extended = ref ++ Array.fill[Long](size.toInt)(0)
     (ref.map(i => new JLong(i)), new LongArray(MemoryBlock.fromLongArray(extended)))
   }

-  private def generateKeyPrefixTestData(size: Int, rand: => Long): (LongArray, LongArray) = {
-    val ref = Array.tabulate[Long](size * 2) { i => rand }
-    val extended = ref ++ Array.fill[Long](size * 2)(0)
+  private def generateKeyPrefixTestData(size: Long, rand: => Long): (LongArray, LongArray) = {
+    val ref = Array.tabulate[Long]((size * 2).toInt) { i => rand }
+    val extended = ref ++ Array.fill[Long]((size * 2).toInt)(0)
     (new LongArray(MemoryBlock.fromLongArray(ref)),
       new LongArray(MemoryBlock.fromLongArray(extended)))
   }

-  private def collectToArray(array: LongArray, offset: Int, length: Int): Array[Long] = {
+  private def collectToArray(array: LongArray, offset: Int, length: Long): Array[Long] = {
     var i = 0
-    val out = new Array[Long](length)
+    val out = new Array[Long](length.toInt)
--- End diff --

Sorry, I posted this comment before pushing the change. You can now see [it](https://github.com/apache/spark/pull/15907/commits/3f9efdb036edb72993a6ee64f326ccab3da695ac#diff-c5436dae59341d3506e442c3eb7e07e7L91).
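The overflow concern behind this diff can be illustrated with a minimal sketch. It is not code from the PR: the object and method names are hypothetical, and the range check is an assumption about one reasonable way to guard the narrowing. It shows that `Int` multiplication wraps silently, and that widening to `Long` only helps if the later `.toInt` is checked rather than allowed to truncate.

```scala
object SizeOverflowSketch {
  // Int arithmetic wraps silently: the product is computed in 32 bits.
  def doubledAsInt(size: Int): Int = size * 2

  // Hypothetical guarded variant: validate the range first, so the
  // multiplication and the narrowing to Int can never lose information.
  def doubledChecked(size: Long): Int = {
    require(size >= 0 && size <= Int.MaxValue / 2,
      s"2 * $size does not fit in an Int array length")
    (size * 2).toInt
  }
}
```

With this shape, an oversized request fails with an `IllegalArgumentException` at the call site instead of producing a negative or truncated array length.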
[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15780 **[Test build #68830 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68830/consoleFull)** for PR 15780 at commit [`e0d1fda`](https://github.com/apache/spark/commit/e0d1fdafc3e8944e5bf00fec342bb6872d075222).
[GitHub] spark issue #15921: [SPARK-18493] Add missing python APIs: withWatermark and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15921 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68826/ Test FAILed.
[GitHub] spark issue #15921: [SPARK-18493] Add missing python APIs: withWatermark and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15921 Merged build finished. Test FAILed.
[GitHub] spark issue #15921: [SPARK-18493] Add missing python APIs: withWatermark and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15921 **[Test build #68826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68826/consoleFull)** for PR 15921 at commit [`da5de14`](https://github.com/apache/spark/commit/da5de1457fd5613847009a43dc7480cc62fedb95). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15913 **[Test build #68829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68829/consoleFull)** for PR 15913 at commit [`9a8c926`](https://github.com/apache/spark/commit/9a8c92691e7ec0b9d37eed0cf6f9dbcc4d4d622f).
[GitHub] spark issue #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods f...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15913 Jenkins, test this please
[GitHub] spark issue #15925: [SPARK-18436][SQL]isin with a empty list throw exception
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/15925 Yes, I think it is better for them to be consistent, and the exception should be thrown before connecting to the SQL server.
[GitHub] spark issue #15889: [SPARK-18445][BUILD][DOCS] Fix the markdown for `Note:`/...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15889 **[Test build #68828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68828/consoleFull)** for PR 15889 at commit [`1ef229a`](https://github.com/apache/spark/commit/1ef229aa72e63e7a9ad95d1412f2ce122c3a6d6e).
[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15916 Merged build finished. Test PASSed.
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/15915#discussion_r88602061

--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,12 @@ private[spark] class MemoryStore(
     var unrollMemoryUsedByThisBlock = 0L
     // Underlying buffer for unrolling the block
     val redirectableStream = new RedirectableOutputStream
-    val bbos = new ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+    val chunkSize = if (initialMemoryThreshold >= Integer.MAX_VALUE) {
--- End diff --

Thanks for your feedback. Let us listen to @joshrosen's advice.
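The quoted diff is cut off after the `if` condition, so the branches are not visible here. As a generic illustration of the technique only (clamping a `Long` byte count to the `Int` range instead of calling `.toInt` directly), a sketch might look like the following. The object and method names are hypothetical and nothing below is taken from the PR.

```scala
object ChunkSizeSketch {
  // Plain narrowing keeps only the low 32 bits, so a large Long can turn
  // into a much smaller (or negative) Int without any error.
  def truncated(bytes: Long): Int = bytes.toInt

  // Hypothetical clamp: saturate at the Int bounds instead of truncating.
  def clampToInt(bytes: Long): Int =
    if (bytes >= Int.MaxValue) Int.MaxValue
    else if (bytes <= Int.MinValue) Int.MinValue
    else bytes.toInt
}
```

For a 5 GiB threshold (`5L << 30`), `truncated` yields 1 GiB worth of bytes, while `clampToInt` saturates at `Int.MaxValue`.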
[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15916 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68824/ Test PASSed.
[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15916 **[Test build #68824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68824/consoleFull)** for PR 15916 at commit [`fa1d1fd`](https://github.com/apache/spark/commit/fa1d1fd8736ba8297de30121acd392ef270b3704). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15907: [SPARK-18458][CORE] Fix signed integer overflow problem ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15907 Merged build finished. Test PASSed.
[GitHub] spark issue #15907: [SPARK-18458][CORE] Fix signed integer overflow problem ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15907 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68821/ Test PASSed.
[GitHub] spark pull request #15898: [SPARK-18457][SQL] ORC and other columnar formats...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15898#discussion_r88601375

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---
@@ -577,4 +579,28 @@ class OrcQuerySuite extends QueryTest with BeforeAndAfterAll with OrcTest {
       assert(spark.table(tableName).schema == schema.copy(fields = expectedFields))
     }
   }
+
+  test("Empty schema does not read data from ORC file") {
+    val data = Seq((1, 1), (2, 2))
+    withOrcFile(data) { path =>
+      val requestedSchema = StructType(Nil)
+      val conf = new Configuration()
+      val physicalSchema = OrcFileOperator.readSchema(Seq(path), Some(conf)).get
+      OrcRelation.setRequiredColumns(conf, physicalSchema, requestedSchema)
+      val maybeOrcReader = OrcFileOperator.getFileReader(path, Some(conf))
+      assert(maybeOrcReader.isDefined)
+      val orcRecordReader = new SparkOrcNewRecordReader(
+        maybeOrcReader.get, conf, 0, maybeOrcReader.get.getContentLength)
+
+      val recordsIterator = new RecordReaderIterator[OrcStruct](orcRecordReader)
+      try {
+        assert(recordsIterator.next().toString == "{null, null}")
+      } catch {
+        case e: Exception => fail(e)
--- End diff --

Why bother catching? The test case will fail anyway, wouldn't it?
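The reviewer's point can be sketched without a test framework. Everything below is a hypothetical stand-in: `passes` plays the role of a test runner that marks a test as failed when its body throws, and `nextRecord` stands in for the iterator call in the quoted test. Under those assumptions, wrapping the assertion in a `catch` that only rethrows as a failure changes nothing about the outcome.

```scala
object RedundantCatchSketch {
  // Hypothetical stand-in for a test runner: a test passes unless its body throws.
  def passes(body: => Unit): Boolean =
    try { body; true } catch { case _: Throwable => false }

  // Stand-in for the record iterator call; next() on an empty iterator throws.
  def nextRecord(): String = Iterator.empty.next()

  // Mirrors the shape in the diff: catch the exception only to fail the test.
  def withCatch(): Unit =
    try {
      assert(nextRecord() == "{null, null}")
    } catch {
      case e: Exception => throw new AssertionError(e) // plays the role of fail(e)
    }

  // The simpler form: let the exception propagate to the runner.
  def withoutCatch(): Unit =
    assert(nextRecord() == "{null, null}")
}
```

Both variants are reported as failures by the runner; the version without the `catch` also preserves the original exception type and stack trace.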
[GitHub] spark issue #15907: [SPARK-18458][CORE] Fix signed integer overflow problem ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15907 **[Test build #68821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68821/consoleFull)** for PR 15907 at commit [`f2e2079`](https://github.com/apache/spark/commit/f2e2079b7e43b1209b14d9689a3cca4e92d81b89). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15921: [SPARK-18493] Add missing python APIs: withWatermark and...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15921 @marmbrus Why don't we want to throw exceptions? Wouldn't it help users catch errors early?
[GitHub] spark issue #15907: [SPARK-18458][CORE] Fix signed integer overflow problem ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15907 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68820/ Test PASSed.
[GitHub] spark issue #15907: [SPARK-18458][CORE] Fix signed integer overflow problem ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15907 Merged build finished. Test PASSed.
[GitHub] spark issue #15907: [SPARK-18458][CORE] Fix signed integer overflow problem ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15907 **[Test build #68820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68820/consoleFull)** for PR 15907 at commit [`3f9efdb`](https://github.com/apache/spark/commit/3f9efdb036edb72993a6ee64f326ccab3da695ac). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15899: [SPARK-18466] added withFilter method to RDD
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15899 I don't get it. The only thing you can do here is add some simple syntactic sugar, and the sugar doesn't even work in general. Isn't it more surprising to fail in some cases?
[GitHub] spark issue #15925: [SPARK-18436][SQL]isin with a empty list throw exception
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15925

When we use the Spark SQL interface, we issue an error.

```
spark-sql> select * from t1 where c1 in ();
16/11/17 21:08:08 INFO SparkSqlParser: Parsing command: select * from t1 where c1 in ()
Error in query: mismatched input 'from' expecting {, 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 9)
```

This is consistent with Hive.

```
hive> select * from t where col1 in ();
NoViableAltException(307@[()* loopback of 438:5: ( ( KW_NOT precedenceEqualNegatableOperator notExpr= precedenceBitwiseOrExpression ) -> ^( KW_NOT ^( precedenceEqualNegatableOperator $precedenceEqualExpression $notExpr) ) | ( precedenceEqualOperator equalExpr= precedenceBitwiseOrExpression ) -> ^( precedenceEqualOperator $precedenceEqualExpression $equalExpr) | ( KW_NOT KW_IN LPAREN KW_SELECT )=> ( KW_NOT KW_IN subQueryExpression ) -> ^( KW_NOT ^( TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_IN ) subQueryExpression $precedenceEqualExpression) ) | ( KW_NOT KW_IN expressions ) -> ^( KW_NOT ^( TOK_FUNCTION KW_IN $precedenceEqualExpression expressions ) ) | ( KW_IN LPAREN KW_SELECT )=> ( KW_IN subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_IN ) subQueryExpression $precedenceEqualExpression) | ( KW_IN expressions ) -> ^( TOK_FUNCTION KW_IN $precedenceEqualExpression expressions ) | ( KW_NOT KW_BETWEEN (min= precedenceBitwiseOrExpression ) KW_AND (max= precedenceBitwiseOrExpression ) ) -> ^( TOK_FUNCTION Identifier["between"] KW_TRUE $left $min $max) | ( KW_BETWEEN (min= precedenceBitwiseOrExpression ) KW_AND (max= precedenceBitwiseOrExpression ) ) -> ^( TOK_FUNCTION Identifier["between"] KW_FALSE $left $min $max) )*])
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA43.specialStateTransition(HiveParser_IdentifiersParser.java)
```

However, when we use the DataFrame API, no error is reported. Instead, we return an empty result set.

```Scala
val df = Seq((1, "x"), (2, "y"), (3, "z")).toDF("a", "b")
df.filter($"a".isin()).show()
```

@windpiger You want to make the DataFrame APIs consistent with the Spark SQL interface, right?
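The two behaviours being compared can be mimicked with plain Scala collections. This is only an analogy under stated assumptions: it does not use Spark, and `filterLenient` and `filterStrict` are hypothetical names. The lenient variant treats membership in an empty list as false for every row, which gives an empty result; the strict variant rejects the empty list before any data is touched, which is closer to what the SQL parsers do.

```scala
object EmptyInListSketch {
  val rows: Seq[(Int, String)] = Seq((1, "x"), (2, "y"), (3, "z"))

  // Lenient: nothing is a member of an empty list, so no row matches.
  def filterLenient(values: Seq[Int]): Seq[(Int, String)] =
    rows.filter { case (a, _) => values.contains(a) }

  // Strict: fail fast on an empty list instead of returning an empty result.
  def filterStrict(values: Seq[Int]): Seq[(Int, String)] = {
    require(values.nonEmpty, "IN list must not be empty")
    filterLenient(values)
  }
}
```

The trade-off discussed in the thread is visible here: the lenient form never surprises callers with an exception, while the strict form surfaces a probable mistake early.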
[GitHub] spark issue #15926: [SPARK-16803] [SQL] SaveAsTable does not work when targe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15926 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68822/ Test PASSed.
[GitHub] spark issue #15926: [SPARK-16803] [SQL] SaveAsTable does not work when targe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15926 Merged build finished. Test PASSed.
[GitHub] spark issue #15926: [SPARK-16803] [SQL] SaveAsTable does not work when targe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15926 **[Test build #68822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68822/consoleFull)** for PR 15926 at commit [`061c0d3`](https://github.com/apache/spark/commit/061c0d34aa7b5b64b7fa490e969d11c7733bc26f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15852: Spark-18187 [SQL] CompactibleFileStreamLog should not us...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15852 **[Test build #68827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68827/consoleFull)** for PR 15852 at commit [`6537e7c`](https://github.com/apache/spark/commit/6537e7cf6186f6fcd4a5d21832572560cbd72895).
[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15916 Yeah, I see. As I said in a previous comment, the memory is released at the end anyway. I would guess that the setting defaults to true in order to find potential memory leaks during development. So is turning it to false a good idea? This patch comes from #15874, where @sethah hit the exception. Although `taskMemoryManager.cleanUpAllAllocatedMemory` can release memory for us, I think it is just a safety net. Operators should release memory themselves. If you still think this is not necessary, I can close this.
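The idea of an operator releasing its own memory through a completion listener can be sketched with a toy model. The classes below are hypothetical and heavily simplified; they are not Spark's `TaskContext` or `TaskMemoryManager`, and the byte count is made up for illustration. The sketch only shows the pattern: the operator registers a callback when it is created, and the callback frees the operator's memory when the task is marked complete.

```scala
import scala.collection.mutable.ArrayBuffer

object CompletionListenerSketch {
  // Hypothetical, much-simplified task context: listeners run when the task completes.
  final class MiniTaskContext {
    private val listeners = ArrayBuffer.empty[() => Unit]
    def addCompletionListener(f: () => Unit): Unit = { listeners += f; () }
    def markCompleted(): Unit = listeners.foreach(f => f())
  }

  // Hypothetical operator that holds memory and frees it itself on completion,
  // rather than relying on a task-level cleanup as the only mechanism.
  final class MiniAggregation(ctx: MiniTaskContext) {
    var allocatedBytes: Long = 1024L
    def free(): Unit = { allocatedBytes = 0L }
    ctx.addCompletionListener(() => free())
  }
}
```

In this model a task-level cleanup would free the same memory at roughly the same time, which is the reviewer's objection; the listener mainly keeps the operator responsible for its own allocations.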
[GitHub] spark issue #15907: [SPARK-18458][CORE] Fix signed integer overflow problem ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15907 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68819/ Test PASSed.
[GitHub] spark issue #15907: [SPARK-18458][CORE] Fix signed integer overflow problem ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15907 Merged build finished. Test PASSed.
[GitHub] spark issue #15907: [SPARK-18458][CORE] Fix signed integer overflow problem ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15907 **[Test build #68819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68819/consoleFull)** for PR 15907 at commit [`022e5b3`](https://github.com/apache/spark/commit/022e5b319d95a066b04e2c418992073102b7d22a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15889: [SPARK-18445][BUILD][DOCS] Fix the markdown for `...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15889#discussion_r88599858

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala ---
@@ -396,7 +396,7 @@ class JavaStreamingContext(val ssc: StreamingContext) extends Closeable {
    * Create an input stream from a queue of RDDs. In each batch,
    * it will process either one or all of the RDDs returned by the queue.
    *
-   * NOTE:
+   * @note
--- End diff --

Yup, it seems to keep the original format at least.

**Before**
- Scala: https://cloud.githubusercontent.com/assets/6477701/20419048/c067da80-ad96-11e6-84e1-6f54e6e559a7.png
- Java: https://cloud.githubusercontent.com/assets/6477701/20419049/c067f18c-ad96-11e6-9653-947e49f8b978.png

**After**
- Scala: https://cloud.githubusercontent.com/assets/6477701/20419066/e5aefada-ad96-11e6-860c-d298f96bd18b.png
- Java: https://cloud.githubusercontent.com/assets/6477701/20419065/e5adbc88-ad96-11e6-8139-72059b3c7341.png
[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15916 I'd maybe set it to false. You are just adding a completion listener, which is the same as `taskMemoryManager.cleanUpAllAllocatedMemory` anyway ...
[GitHub] spark issue #15925: [SPARK-18436][SQL]isin with a empty list throw exception
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15925 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68817/ Test PASSed.
[GitHub] spark issue #15925: [SPARK-18436][SQL]isin with a empty list throw exception
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15925 Merged build finished. Test PASSed.
[GitHub] spark issue #15925: [SPARK-18436][SQL]isin with a empty list throw exception
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15925 **[Test build #68817 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68817/consoleFull)** for PR 15925 at commit [`c3287ce`](https://github.com/apache/spark/commit/c3287ce769c68b1f8168f036435d50c6b0800851). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15916 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68818/ Test PASSed.
[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15916 Merged build finished. Test PASSed.
[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15916 **[Test build #68818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68818/consoleFull)** for PR 15916 at commit [`2f304f0`](https://github.com/apache/spark/commit/2f304f07efafb37eeae45a00ebd671fcedceb97a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15889: [SPARK-18445][BUILD][DOCS] Fix the markdown for `...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15889#discussion_r88598964 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -1014,7 +1015,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) * Output the RDD to any Hadoop-supported file system, using a Hadoop `OutputFormat` class * supporting the key and value types K and V in this RDD. * - * Note that, we should make sure our tasks are idempotent when speculation is enabled, i.e. do + * @note We should make sure our tasks are idempotent when speculation is enabled, i.e. do --- End diff -- Ah, I left and removed a useless comment. Yes, it seems it goes together. https://cloud.githubusercontent.com/assets/6477701/20418861/d2527450-ad94-11e6-9cac-fded62830d90.png https://cloud.githubusercontent.com/assets/6477701/20418862/d256caaa-ad94-11e6-9b4c-c0add3555d4a.png
[GitHub] spark pull request #15901: [SPARK-18467][SQL] Extracts method for preparing ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15901#discussion_r88598628 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -245,51 +311,35 @@ case class NewInstance( override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val javaType = ctx.javaType(dataType) -val argIsNulls = ctx.freshName("argIsNulls") -ctx.addMutableState("boolean[]", argIsNulls, - s"$argIsNulls = new boolean[${arguments.size}];") -val argValues = arguments.zipWithIndex.map { case (e, i) => - val argValue = ctx.freshName("argValue") - ctx.addMutableState(ctx.javaType(e.dataType), argValue, "") - argValue -} -val argCodes = arguments.zipWithIndex.map { case (e, i) => - val expr = e.genCode(ctx) - expr.code + s""" - $argIsNulls[$i] = ${expr.isNull}; - ${argValues(i)} = ${expr.value}; - """ -} -val argCode = ctx.splitExpressions(ctx.INPUT_ROW, argCodes) +val (argCode, argString, resultIsNull) = prepareArguments(ctx, ev) val outer = outerPointer.map(func => Literal.fromObject(func()).genCode(ctx)) var isNull = ev.isNull -val setIsNull = if (propagateNull && arguments.nonEmpty) { - s""" - boolean $isNull = false; - for (int idx = 0; idx < ${arguments.length}; idx++) { - if ($argIsNulls[idx]) { $isNull = true; break; } - } - """ +val prepareIsNull = if (needNullCheck) { + s"boolean $isNull = $resultIsNull;" } else { isNull = "false" "" } val constructorCall = outer.map { gen => - s"""${gen.value}.new ${cls.getSimpleName}(${argValues.mkString(", ")})""" + s"${gen.value}.new ${cls.getSimpleName}($argString)" }.getOrElse { - s"new $className(${argValues.mkString(", ")})" + s"new $className($argString)" } val code = s""" $argCode ${outer.map(_.code).getOrElse("")} - $setIsNull - final $javaType ${ev.value} = $isNull ? ${ctx.defaultValue(javaType)} : $constructorCall; - """ + $prepareIsNull +""" + + (if (needNullCheck) { +s"final $javaType ${ev.value} = $isNull ? 
${ctx.defaultValue(javaType)} : $constructorCall;" --- End diff -- sorry I mean "java compiler"... I think janino is smart enough about it.
[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68825/ Test PASSed.
[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Merged build finished. Test PASSed.
[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68825/consoleFull)** for PR 15874 at commit [`2c264b7`](https://github.com/apache/spark/commit/2c264b7660d8be68428f573be67f2720ee9a3c51). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15924: [Spark-18498] [SQL] Revise HDFSMetadataLog API for bette...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15924 Merged build finished. Test PASSed.
[GitHub] spark issue #15924: [Spark-18498] [SQL] Revise HDFSMetadataLog API for bette...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15924 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68815/ Test PASSed.
[GitHub] spark issue #15924: [Spark-18498] [SQL] Revise HDFSMetadataLog API for bette...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15924 **[Test build #68815 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68815/consoleFull)** for PR 15924 at commit [`8e3d705`](https://github.com/apache/spark/commit/8e3d7051673f30863a938e8b06299b8aec227886). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15889: [SPARK-18445][BUILD][DOCS] Fix the markdown for `...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15889#discussion_r88597141 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -1014,7 +1015,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) * Output the RDD to any Hadoop-supported file system, using a Hadoop `OutputFormat` class * supporting the key and value types K and V in this RDD. * - * Note that, we should make sure our tasks are idempotent when speculation is enabled, i.e. do + * @note We should make sure our tasks are idempotent when speculation is enabled, i.e. do --- End diff -- Ah, I just checked, and it seems both ``` We should make sure our tasks are idempotent when speculation is enabled, i.e. do not use output committer that writes data directly. ``` and ``` There is an example in https://issues.apache.org/jira/browse/SPARK-10063 to show the bad result of using direct output committer with speculation enabled. ``` are relevant (both seem to be related to a direct output committer). The original documentation also concatenates both sentences. ![2016-11-18 1 05 24](https://cloud.githubusercontent.com/assets/6477701/20418388/605d112e-ad90-11e6-901c-403acdd26ee6.png) ![2016-11-18 1 05 55](https://cloud.githubusercontent.com/assets/6477701/20418387/605cc19c-ad90-11e6-82e0-afd576766010.png) So, let me leave it as is if it looks okay.
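For readers unfamiliar with the tag under discussion, here is a minimal sketch of a Scaladoc comment that keeps both sentences under a single `@note`, as the comment above proposes. The object `NoteTagSketch` and the method `saveSketch` are hypothetical stand-ins, not the real `PairRDDFunctions` API; only the comment text is taken from the thread.

```scala
object NoteTagSketch {
  /**
   * Pretends to output the given records (hypothetical stand-in for a save method).
   *
   * @note We should make sure our tasks are idempotent when speculation is enabled, i.e. do
   * not use output committer that writes data directly.
   * There is an example in https://issues.apache.org/jira/browse/SPARK-10063 to show the bad
   * result of using direct output committer with speculation enabled.
   */
  def saveSketch(records: Seq[(String, Int)]): Int = records.size // returns the record count only
}
```

Because the second sentence continues the same `@note` block, documentation tools that understand the tag should render both sentences together, which matches the intent of leaving the text "as is".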
[GitHub] spark issue #15921: [SPARK-18493] Add missing python APIs: withWatermark and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15921 **[Test build #68826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68826/consoleFull)** for PR 15921 at commit [`da5de14`](https://github.com/apache/spark/commit/da5de1457fd5613847009a43dc7480cc62fedb95).
[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68823/ Test PASSed.
[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Merged build finished. Test PASSed.
[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68823/consoleFull)** for PR 15874 at commit [`257ef19`](https://github.com/apache/spark/commit/257ef1955696b937a0b53feb0ebde136f482dae1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15901: [SPARK-18467][SQL] Extracts method for preparing ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15901#discussion_r88596091 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -245,51 +311,35 @@ case class NewInstance( override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val javaType = ctx.javaType(dataType) -val argIsNulls = ctx.freshName("argIsNulls") -ctx.addMutableState("boolean[]", argIsNulls, - s"$argIsNulls = new boolean[${arguments.size}];") -val argValues = arguments.zipWithIndex.map { case (e, i) => - val argValue = ctx.freshName("argValue") - ctx.addMutableState(ctx.javaType(e.dataType), argValue, "") - argValue -} -val argCodes = arguments.zipWithIndex.map { case (e, i) => - val expr = e.genCode(ctx) - expr.code + s""" - $argIsNulls[$i] = ${expr.isNull}; - ${argValues(i)} = ${expr.value}; - """ -} -val argCode = ctx.splitExpressions(ctx.INPUT_ROW, argCodes) +val (argCode, argString, resultIsNull) = prepareArguments(ctx, ev) val outer = outerPointer.map(func => Literal.fromObject(func()).genCode(ctx)) var isNull = ev.isNull -val setIsNull = if (propagateNull && arguments.nonEmpty) { - s""" - boolean $isNull = false; - for (int idx = 0; idx < ${arguments.length}; idx++) { - if ($argIsNulls[idx]) { $isNull = true; break; } - } - """ +val prepareIsNull = if (needNullCheck) { + s"boolean $isNull = $resultIsNull;" } else { isNull = "false" "" } val constructorCall = outer.map { gen => - s"""${gen.value}.new ${cls.getSimpleName}(${argValues.mkString(", ")})""" + s"${gen.value}.new ${cls.getSimpleName}($argString)" }.getOrElse { - s"new $className(${argValues.mkString(", ")})" + s"new $className($argString)" } val code = s""" $argCode ${outer.map(_.code).getOrElse("")} - $setIsNull - final $javaType ${ev.value} = $isNull ? ${ctx.defaultValue(javaType)} : $constructorCall; - """ + $prepareIsNull +""" + + (if (needNullCheck) { +s"final $javaType ${ev.value} = $isNull ? 
${ctx.defaultValue(javaType)} : $constructorCall;" --- End diff -- Do we use janino instead of javac?
[GitHub] spark issue #15717: [SPARK-17910][SQL] Allow users to update the comment of ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15717 Will review this PR tomorrow.
[GitHub] spark issue #15852: Spark-18187 [SQL] CompactibleFileStreamLog should not us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15852 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68816/ Test FAILed.
[GitHub] spark issue #15852: Spark-18187 [SQL] CompactibleFileStreamLog should not us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15852 Merged build finished. Test FAILed.
[GitHub] spark issue #15852: Spark-18187 [SQL] CompactibleFileStreamLog should not us...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15852 **[Test build #68816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68816/consoleFull)** for PR 15852 at commit [`dbd8b67`](https://github.com/apache/spark/commit/dbd8b6788aae2a563ad282931323f0b0c04d4d48). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15358: [SPARK-17783] [SQL] Hide Credentials in CREATE and DESC ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15358 cc @rxin @hvanhovell
[GitHub] spark issue #15868: [SPARK-18413][SQL] Add `maxConnections` JDBCOption
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15868 > We cannot call repartition in SQL language environment This is a good point; we need to provide a workaround for SQL users, and this PR does.
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r88594560 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -603,7 +603,14 @@ case class ExternalMapToCatalyst private( override def foldable: Boolean = false - override def dataType: MapType = MapType(keyConverter.dataType, valueConverter.dataType) + override def dataType: MapType = { +val isPrimitiveType = valueType match { + case BooleanType | ByteType | ShortType | IntegerType | LongType | +FloatType | DoubleType => true + case _ => false +} +MapType(keyConverter.dataType, valueConverter.dataType, !isPrimitiveType) --- End diff -- Yes, it returns `true` now. I think that this is because `LambdaVariable.nullable` always returns `true`.
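The rule in the diff above can be sketched in isolation: map values are marked as possibly null unless the value type is one of the listed primitive types. This is only a toy model; the small `DataType` hierarchy below merely reuses the names from the diff and is not Spark's `org.apache.spark.sql.types` package, and `isPrimitive` and `mapTypeFor` are hypothetical helper names.

```scala
object ValueNullabilitySketch {
  // Toy stand-ins for the Catalyst types named in the diff (assumption, not Spark's classes).
  sealed trait DataType
  case object BooleanType extends DataType
  case object ByteType extends DataType
  case object ShortType extends DataType
  case object IntegerType extends DataType
  case object LongType extends DataType
  case object FloatType extends DataType
  case object DoubleType extends DataType
  case object StringType extends DataType
  final case class MapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean)
    extends DataType

  /** Same match as in the diff: true only for the seven primitive value types. */
  def isPrimitive(dt: DataType): Boolean = dt match {
    case BooleanType | ByteType | ShortType | IntegerType | LongType |
         FloatType | DoubleType => true
    case _ => false
  }

  /** Primitive values can never be null, so valueContainsNull is the negation. */
  def mapTypeFor(keyType: DataType, valueType: DataType): MapType =
    MapType(keyType, valueType, valueContainsNull = !isPrimitive(valueType))
}
```

Under this sketch, a map with `IntegerType` values reports `valueContainsNull = false`, which is the answer expected in the question, while a map with `StringType` values stays nullable.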
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r88593822 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -603,7 +603,14 @@ case class ExternalMapToCatalyst private( override def foldable: Boolean = false - override def dataType: MapType = MapType(keyConverter.dataType, valueConverter.dataType) + override def dataType: MapType = { +val isPrimitiveType = valueType match { + case BooleanType | ByteType | ShortType | IntegerType | LongType | +FloatType | DoubleType => true + case _ => false +} +MapType(keyConverter.dataType, valueConverter.dataType, !isPrimitiveType) --- End diff -- it should be false, do we return true now?
[GitHub] spark issue #15901: [SPARK-18467][SQL] Extracts method for preparing argumen...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15901 LGTM except some minor comments, thanks for working on it.
[GitHub] spark pull request #15901: [SPARK-18467][SQL] Extracts method for preparing ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15901#discussion_r88593388 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -164,23 +233,20 @@ case class Invoke( """ } -val setIsNull = if (propagateNull && arguments.nonEmpty) { - s"boolean ${ev.isNull} = ${obj.isNull} || ${argGen.map(_.isNull).mkString(" || ")};" -} else { - s"boolean ${ev.isNull} = ${obj.isNull};" -} - // If the function can return null, we do an extra check to make sure our null bit is still set // correctly. val postNullCheck = if (ctx.defaultValue(dataType) == "null") { s"${ev.isNull} = ${ev.value} == null;" } else { "" } + val code = s""" ${obj.code} - ${argGen.map(_.code).mkString("\n")} - $setIsNull + if (!${obj.isNull}) { +$argCode + } + boolean ${ev.isNull} = ${obj.isNull} || $resultIsNull; --- End diff -- I know `resultIsNull` must be a member variable now, but in the future we may make it a local variable if the argument code is not long enough to exceed the 64kb limit. Why don't we assume here that `resultIsNull` can be a local variable, to make this more robust?
[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15916 Of course it is not actually a real memory leak because the memory is released at the end by calling `taskMemoryManager.cleanUpAllAllocatedMemory` in `Executor`. But with `spark.unsafe.exceptionOnMemoryLeak` set to true by default, we will see the exception. Or should we just set `spark.unsafe.exceptionOnMemoryLeak` to false by default?
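The trade-off described above can be illustrated with a toy model, under loud assumptions: `ToyTask` and all of its methods are hypothetical and only mimic the behaviour described in this thread (listeners run at task completion, leftover memory is then cleaned up, and an exception is raised if anything was left and the flag is on). It does not use Spark's real `TaskContext` or `TaskMemoryManager`.

```scala
object LeakCheckSketch {
  /** Hypothetical task with a byte counter instead of real managed memory. */
  final class ToyTask(exceptionOnMemoryLeak: Boolean) {
    private var allocated = 0L
    private var listeners = List.empty[() => Unit]

    def allocate(bytes: Long): Unit = allocated += bytes
    def free(bytes: Long): Unit = allocated -= bytes
    def addCompletionListener(f: () => Unit): Unit = listeners ::= f

    /** Runs the listeners, cleans up whatever is left, and returns the leaked byte count. */
    def finish(): Long = {
      listeners.foreach(_.apply())
      val leaked = allocated
      allocated = 0L // analogue of the final clean-up of all allocated memory
      if (leaked > 0 && exceptionOnMemoryLeak) {
        throw new IllegalStateException(s"Managed memory leak detected: $leaked bytes")
      }
      leaked
    }
  }
}
```

In this model a listener that frees the aggregation buffer makes `finish()` report zero leaked bytes, so the flag never fires; without the listener the same memory is still reclaimed, but a task with the flag enabled throws. That is the sense in which the listener and the final clean-up are "the same" except for the exception.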
[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68825/consoleFull)** for PR 15874 at commit [`2c264b7`](https://github.com/apache/spark/commit/2c264b7660d8be68428f573be67f2720ee9a3c51).
[GitHub] spark pull request #15901: [SPARK-18467][SQL] Extracts method for preparing ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15901#discussion_r88593070
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -164,23 +233,20 @@ case class Invoke(
       """
     }
 
-    val setIsNull = if (propagateNull && arguments.nonEmpty) {
-      s"boolean ${ev.isNull} = ${obj.isNull} || ${argGen.map(_.isNull).mkString(" || ")};"
-    } else {
-      s"boolean ${ev.isNull} = ${obj.isNull};"
-    }
-
     // If the function can return null, we do an extra check to make sure our null bit is still set
     // correctly.
     val postNullCheck = if (ctx.defaultValue(dataType) == "null") {
       s"${ev.isNull} = ${ev.value} == null;"
     } else {
       ""
     }
+
     val code = s"""
       ${obj.code}
-      ${argGen.map(_.code).mkString("\n")}
-      $setIsNull
+      if (!${obj.isNull}) {
+        $argCode
+      }
+      boolean ${ev.isNull} = ${obj.isNull} || $resultIsNull;
--- End diff --
I'm afraid not, because if argument evaluation is split into several methods, `resultIsNull` can't be referenced from the split methods.
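The constraint described in this reply can be illustrated with a small Java sketch. It is not the code Spark generates: the class name, `evalArg0`, `evalArg1`, and `argValues` are hypothetical stand-ins for what a generator might emit once argument evaluation is split into separate methods. The point is only that a flag written by several helper methods has to be a field, since a local declared in the caller would be out of scope inside them.

```java
// Hypothetical stand-in for a generated class; none of these names come from Spark.
final class SplitArgsSketch {
    // Member field: visible to every helper method the generator emits.
    private boolean resultIsNull;
    private final Object[] argValues = new Object[2];

    // Stand-in for the first split method: evaluates argument 0.
    private void evalArg0(Object input) {
        argValues[0] = input;
        resultIsNull = (input == null);
    }

    // Stand-in for the second split method: skipped once a null has been seen.
    private void evalArg1(Object input) {
        if (!resultIsNull) {
            argValues[1] = input;
            resultIsNull = (input == null);
        }
    }

    // Resets the flag, runs the split methods, and reports whether any argument was null.
    boolean prepareArguments(Object a, Object b) {
        resultIsNull = false;
        evalArg0(a);
        evalArg1(b);
        return resultIsNull;
    }

    public static void main(String[] args) {
        SplitArgsSketch s = new SplitArgsSketch();
        System.out.println(s.prepareArguments("x", "y"));
        System.out.println(s.prepareArguments("x", null));
    }
}
```

If `resultIsNull` were declared as a local inside `prepareArguments`, neither helper could assign to it, which is the situation the reply refers to.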
[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15916 **[Test build #68824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68824/consoleFull)** for PR 15916 at commit [`fa1d1fd`](https://github.com/apache/spark/commit/fa1d1fd8736ba8297de30121acd392ef270b3704).
[GitHub] spark issue #15921: [SPARK-18493] Add missing python APIs: withWatermark and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15921 Merged build finished. Test FAILed.
[GitHub] spark issue #15921: [SPARK-18493] Add missing python APIs: withWatermark and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15921 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68814/ Test FAILed.
[GitHub] spark issue #15921: [SPARK-18493] Add missing python APIs: withWatermark and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15921 **[Test build #68814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68814/consoleFull)** for PR 15921 at commit [`7d7bc4d`](https://github.com/apache/spark/commit/7d7bc4d9c77a952c51db40e3070e5c50bf2d88bd). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15901: [SPARK-18467][SQL] Extracts method for preparing ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15901#discussion_r88592721
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -164,23 +233,20 @@ case class Invoke(
       """
     }
 
-    val setIsNull = if (propagateNull && arguments.nonEmpty) {
-      s"boolean ${ev.isNull} = ${obj.isNull} || ${argGen.map(_.isNull).mkString(" || ")};"
-    } else {
-      s"boolean ${ev.isNull} = ${obj.isNull};"
-    }
-
     // If the function can return null, we do an extra check to make sure our null bit is still set
     // correctly.
     val postNullCheck = if (ctx.defaultValue(dataType) == "null") {
       s"${ev.isNull} = ${ev.value} == null;"
     } else {
       ""
     }
+
     val code = s"""
       ${obj.code}
-      ${argGen.map(_.code).mkString("\n")}
-      $setIsNull
+      if (!${obj.isNull}) {
+        $argCode
+      }
+      boolean ${ev.isNull} = ${obj.isNull} || $resultIsNull;
--- End diff --
This assumes `resultIsNull` is a member variable rather than a local variable. Can we avoid doing this? We can make the code clearer and more robust, like:
```
javaType ${ev.value} = defaultValue;
boolean ${ev.isNull} = true;
if (!${obj.isNull}) {
  $argCode
  ${ev.isNull} = $resultIsNull
  ...
  $postNullCheck
}
```
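As a concrete reading of the template suggested above, here is a hand-written Java sketch of what one expansion might look like. Everything specific in it is an assumption made for illustration: the target object is a `String`, the invoked method is `concat`, and the step elided as `...` in the template is filled with a guarded call. The sketch declares `resultIsNull` as a plain local inside the `if` block, which compiles because the flag is only read in the same scope where the argument code runs.

```java
// Illustrative expansion of the suggested template; not output from Spark's code generator.
final class InvokeShapeSketch {
    static String invoke(String obj, String rawArg) {
        String value = null;       // javaType ${ev.value} = defaultValue;
        boolean isNull = true;     // boolean ${ev.isNull} = true;
        if (obj != null) {         // if (!${obj.isNull}) {
            // $argCode: assumed here to declare resultIsNull as a local variable.
            String arg = rawArg;
            boolean resultIsNull = (arg == null);
            isNull = resultIsNull; // ${ev.isNull} = $resultIsNull
            if (!isNull) {
                // Assumed body for the elided step: the actual method call.
                value = obj.concat(arg);
                isNull = (value == null); // $postNullCheck
            }
        }
        return isNull ? null : value;
    }

    public static void main(String[] args) {
        System.out.println(invoke("ab", "cd"));
        System.out.println(invoke(null, "cd"));
        System.out.println(invoke("ab", null));
    }
}
```

With the shape in the diff, where `boolean ${ev.isNull} = ${obj.isNull} || $resultIsNull;` sits after the closing brace of the `if`, a local declared by the argument code would no longer be in scope at the point where it is read.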
[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15916
@rxin An exception will be thrown, not just a warning message. This exception is thrown at `Executor` after it calls `taskMemoryManager.cleanUpAllAllocatedMemory` and finds there is memory not released after task completion. The exception looks like:
[info] - SPARK-18487: Consume all elements for show/take to avoid memory leak *** FAILED *** (1 second, 73 milliseconds)
[info] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 179.0 failed 1 times, most recent failure: Lost task 0.0 in stage 179.0 (TID 501, localhost, executor driver): org.apache.spark.SparkException: Managed memory leak detected; size = 33816576 bytes, TID = 501
[info]   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:295)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[info]   at java.lang.Thread.run(Thread.java:745)
[info]
[info] Driver stacktrace:
[info]   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1436)
[info]   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1424)
[info]   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
[info]   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
[info]   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
[info]   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1423)
[info]   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
[info]   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
[info]   at scala.Option.foreach(Option.scala:257)
...
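The behavior described in this message (clean up at task end, then either throw or warn depending on a flag) can be sketched in Java as follows. This is a simplified model, not Spark's `Executor` code: `TaskMemory`, `finishTask`, and the `exceptionOnLeak` parameter are hypothetical names, and the flag merely plays the role that `spark.unsafe.exceptionOnMemoryLeak` plays in the discussion. Only the message text is taken from the log above.

```java
// Simplified model of an end-of-task leak check; all names are illustrative assumptions.
final class LeakCheckSketch {
    /** Hypothetical stand-in for a per-task memory manager. */
    static final class TaskMemory {
        private long allocated;

        void acquire(long bytes) { allocated += bytes; }

        void release(long bytes) { allocated -= bytes; }

        /** Frees whatever is still held and returns how many bytes that was. */
        long cleanUpAll() {
            long freed = allocated;
            allocated = 0;
            return freed;
        }
    }

    /**
     * Runs the end-of-task cleanup. Returns a warning message if memory was still
     * held and exceptionOnLeak is false, null if nothing was held, and throws if
     * memory was still held and exceptionOnLeak is true.
     */
    static String finishTask(TaskMemory mem, long taskId, boolean exceptionOnLeak) {
        long freed = mem.cleanUpAll();
        if (freed > 0) {
            String msg = "Managed memory leak detected; size = " + freed
                + " bytes, TID = " + taskId;
            if (exceptionOnLeak) {
                throw new IllegalStateException(msg);
            }
            return msg;
        }
        return null;
    }

    public static void main(String[] args) {
        TaskMemory mem = new TaskMemory();
        mem.acquire(33816576L);
        System.out.println(finishTask(mem, 501L, false));
    }
}
```

In this model the memory is always reclaimed by `cleanUpAll`, so the flag only decides whether unreleased memory is reported as a failure or as a warning.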
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r88592405
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -603,7 +603,14 @@ case class ExternalMapToCatalyst private(
   override def foldable: Boolean = false
 
-  override def dataType: MapType = MapType(keyConverter.dataType, valueConverter.dataType)
+  override def dataType: MapType = {
+    val isPrimitiveType = valueType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType |
+        FloatType | DoubleType => true
+      case _ => false
+    }
+    MapType(keyConverter.dataType, valueConverter.dataType, !isPrimitiveType)
--- End diff --
`(Tuple1(Map(2 -> 3)) :: Nil).toDF("m")` returns `true` in `valueConverter.nullable`. Am I wrong?
[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68823/consoleFull)** for PR 15874 at commit [`257ef19`](https://github.com/apache/spark/commit/257ef1955696b937a0b53feb0ebde136f482dae1).
[GitHub] spark issue #15925: [SPARK-18436][SQL]isin with a empty list throw exception
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/15925 Not generating `isin` would mean that `select * from test where key in()` is equivalent to `select * from test`? I don't think this is the behavior we expect; we should not cover this situation by ignoring it. Also, spark.sql("") throws an exception as well.
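The concern raised here, that silently dropping an empty `isin` predicate changes the result of the query, can be shown with a small Java sketch over an in-memory list. It does not use any Spark API; `filterIn` and `dropPredicateWhenEmpty` are hypothetical helpers that model the two behaviors being compared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Models two ways of handling a membership predicate with an empty value list.
final class EmptyInSketch {
    /** Evaluates the predicate as written: a key is kept only if it is in the set. */
    static List<Integer> filterIn(List<Integer> keys, Set<Integer> allowed) {
        return keys.stream().filter(allowed::contains).collect(Collectors.toList());
    }

    /** Drops the predicate entirely when the set is empty, so every key is kept. */
    static List<Integer> dropPredicateWhenEmpty(List<Integer> keys, Set<Integer> allowed) {
        return allowed.isEmpty() ? new ArrayList<>(keys) : filterIn(keys, allowed);
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(1, 2, 3);
        Set<Integer> none = Collections.emptySet();
        System.out.println(filterIn(keys, none).size());
        System.out.println(dropPredicateWhenEmpty(keys, none).size());
    }
}
```

The two helpers disagree on an empty set (no rows versus all rows), which is why skipping the predicate is not a neutral choice; whether the right behavior is an empty result or an exception is the question the thread is debating.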
[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15916 What do you mean by "report"? A warning message was logged? If a warning message was logged, it is generated by the callback itself which just releases the memory.
[GitHub] spark pull request #15922: [SPARK-18462] Fix ClassCastException in SparkList...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15922
[GitHub] spark pull request #15901: [SPARK-18467][SQL] Extracts method for preparing ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15901#discussion_r88591775
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -33,6 +33,79 @@ import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData}
 import org.apache.spark.sql.types._
 
 /**
+ * Common base class for [[StaticInvoke]], [[Invoke]], and [[NewInstance]].
+ */
+trait InvokeLike extends Expression with NonSQLExpression {
+
+  def arguments: Seq[Expression]
+
+  def propagateNull: Boolean
+
+  protected lazy val needNullCheck: Boolean = propagateNull && arguments.exists(_.nullable)
+
+  /**
+   * Prepares codes for arguments.
+   *
+   * - generate codes for argument.
+   * - use ctx.splitExpressions() to not exceed 64kb JVM limit while preparing arguments.
+   * - avoid some of nullabilty checking which are not needed because the expression is not
+   *   nullable.
+   * - when needNullCheck == true, short circuit if we found one of arguments is null because
+   *   preparing rest of arguments can be skipped in the case.
+   *
+   * @param ctx a [[CodegenContext]]
+   * @param ev an [[ExprCode]] with unique terms.
+   * @return (code to prepare arguments, argument string, result of argument null check)
+   */
+  def prepareArguments(ctx: CodegenContext, ev: ExprCode): (String, String, String) = {
--- End diff --
`ev` is not used.
[GitHub] spark issue #15922: [SPARK-18462] Fix ClassCastException in SparkListenerDri...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15922 Merging in master/branch-2.1/branch-2.0.
[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15916 @rxin Yeah, the added test case will report a memory leak failure.