[GitHub] spark issue #14850: [SPARK-17279][SQL] better error message for exceptions d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14850

**[Test build #64833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64833/consoleFull)** for PR 14850 at commit [`4efb6fc`](https://github.com/apache/spark/commit/4efb6fc86b5fdcb7023aa9a2ee7063d4338b51da).
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77296679

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala ---

@@ -0,0 +1,503 @@

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.scheduler

import org.mockito.Mockito.when
import org.scalatest.BeforeAndAfterEach
import org.scalatest.mock.MockitoSugar

import org.apache.spark._
import org.apache.spark.internal.config
import org.apache.spark.util.ManualClock

class BlacklistTrackerSuite extends SparkFunSuite with BeforeAndAfterEach with MockitoSugar
    with LocalSparkContext {

  private val clock = new ManualClock(0)

  private var blacklistTracker: BlacklistTracker = _

  override def afterEach(): Unit = {
    if (blacklistTracker != null) {
      blacklistTracker = null
    }
    super.afterEach()
  }

  val allOptions = (('A' to 'Z').map("host" + _) ++ (1 to 100).map(_.toString)).toSet

  /**
   * Its easier to write our tests as if we could directly look at the sets of nodes & executors
   * in the blacklist. However the api doesn't expose a set (for thread-safety), so this is a
   * simple way to test something similar, since we know the universe of values that might appear
   * in these sets.
   */
  def assertEquivalentToSet(f: String => Boolean, expected: Set[String]): Unit = {
    allOptions.foreach { opt =>
      val actual = f(opt)
      val exp = expected.contains(opt)
      assert(actual === exp, raw"""for string "$opt" """)
    }
  }

  def mockTaskSchedWithConf(conf: SparkConf): TaskSchedulerImpl = {
    sc = new SparkContext(conf)
    val scheduler = mock[TaskSchedulerImpl]
    when(scheduler.sc).thenReturn(sc)
    when(scheduler.mapOutputTracker).thenReturn(SparkEnv.get.mapOutputTracker)
    scheduler
  }

  test("Blacklisting individual tasks") {
    val conf = new SparkConf().setAppName("test").setMaster("local")
      .set(config.BLACKLIST_ENABLED.key, "true")
    val scheduler = mockTaskSchedWithConf(conf)
    // Task 1 failed on executor 1
    blacklistTracker = new BlacklistTracker(conf, clock)
    val taskSet = FakeTask.createTaskSet(10)
    val tsm = new TaskSetManager(scheduler, Some(blacklistTracker), taskSet, 4, clock)
    tsm.updateBlacklistForFailedTask("hostA", "1", 0)
    for {
      executor <- (1 to 4).map(_.toString)
      index <- 0 until 10
    } {
      val exp = (executor == "1" && index == 0)
      assert(tsm.isExecutorBlacklistedForTask(executor, index) === exp)
    }
    assert(blacklistTracker.nodeBlacklist() === Set())
    assertEquivalentToSet(blacklistTracker.isNodeBlacklisted(_), Set())
    assertEquivalentToSet(tsm.isNodeBlacklistedForTaskSet, Set())
    assertEquivalentToSet(tsm.isExecutorBlacklistedForTaskSet, Set())

    // Task 1 & 2 failed on both executor 1 & 2, so we blacklist all executors on that host,
    // for all tasks for the stage. Note the api expects multiple checks for each type of
    // blacklist -- this actually fits naturally with its use in the scheduler
    tsm.updateBlacklistForFailedTask("hostA", "1", 1)
    tsm.updateBlacklistForFailedTask("hostA", "2", 0)
    tsm.updateBlacklistForFailedTask("hostA", "2", 1)
    // we don't explicitly return the executors in hostA here, but that is OK
    for {
      executor <- (1 to 4).map(_.toString)
      index <- 0 until 10
    } {
      withClue(s"exec = $executor; index = $index") {
        val badExec = (executor == "1" || executor == "2")
        val badPart = (index == 0 || index == 1)
        val taskExp = (badExec && badPart)
        assert(
          tsm.isExecutorBlacklistedForTask(executor, index) === taskExp)
```
[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/12601

It sounds like you understand my points. Please update your PR based on my latest changes, and then we can review it. Thanks!
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77296487

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above)
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77296415

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above)
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77296312

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above)
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77296176

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above)
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77296120

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above, ending at)

```scala
val taskExp = (badExec && badPart)
```

--- End diff --

expectTaskBlacklisted? (This is too short to guess the meaning.) Or just inline it.
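The two options the reviewer floats, sketched as drop-in replacements for the quoted lines (`expectTaskBlacklisted` is the reviewer's proposed name, not code from the PR):

```scala
// Option 1: rename so the variable states the expectation being asserted.
val expectTaskBlacklisted = badExec && badPart
assert(tsm.isExecutorBlacklistedForTask(executor, index) === expectTaskBlacklisted)

// Option 2: inline the condition so the assertion is self-describing.
assert(tsm.isExecutorBlacklistedForTask(executor, index) === (badExec && badPart))
```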
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77296072

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above, ending at)

```scala
val badPart = (index == 0 || index == 1)
```

--- End diff --

badIndex? badTaskIndex? (I'm guessing this is leftover from the old partition naming.)
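Either suggested name is a one-identifier change against the quoted line, for example:

```scala
// The value refers to task indices, not partitions, so name the flag accordingly.
val badTaskIndex = index == 0 || index == 1
```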
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77295971

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above, ending at)

```scala
tsm.updateBlacklistForFailedTask("hostA", "1", 1)
```

--- End diff --

Don't bother fixing all of these, but in the future, I find tests like this much easier to read if you pass in the parameters as named parameters, so it's obvious what everything means (e.g., `updateBlacklistForFailedTask("hostA", exec = "1", index = 1)`).
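A before/after sketch of the suggestion; the parameter names `exec` and `index` come from the reviewer's example and assume the method declares them under those names:

```scala
// Before: positional arguments force the reader to recall the signature.
tsm.updateBlacklistForFailedTask("hostA", "1", 1)

// After: named arguments make the call site self-documenting.
tsm.updateBlacklistForFailedTask("hostA", exec = "1", index = 1)
```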
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77296004

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above, ending at)

```scala
// Task 1 & 2 failed on both executor 1 & 2, so we blacklist all executors on that host,
```

--- End diff --

Can you add "should" before "blacklist"? (So it's clear what the test is doing versus what the test is verifying.)
[GitHub] spark pull request #14921: [SPARK-17361][SQL] file-based external table with...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14921#discussion_r77295817

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---

```diff
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
```

--- End diff --

After a discussion with Wenchen: `resolveRelation` will be invoked by `CREATE TABLE ... USING ...`, although the write path in the `DataFrameWriter` APIs does not invoke it. Thanks! @clockfly
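For illustration, a hypothetical statement of the kind that reaches `resolveRelation` (the table name and path are made up):

```scala
// CREATE TABLE ... USING resolves the relation eagerly, so it goes through
// resolveRelation(); DataFrameWriter.save() takes the write path and does not.
spark.sql("CREATE TABLE t USING parquet OPTIONS (path '/tmp/t')")
```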
[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14809#discussion_r77295778

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---

```diff
@@ -241,10 +241,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }

   // converts the table metadata to Hive compatible format, i.e. set the serde information.
-  def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = {
+  def newHiveCompatibleMetastoreTable(serde: HiveSerDe): CatalogTable = {
+    val location = if (tableDefinition.tableType == EXTERNAL) {
+      // When we hit this branch, we are saving an external data source table with hive
+      // compatible format, which means the data source is file-based and must have a `path`.
+      val map = new CaseInsensitiveMap(tableDefinition.storage.properties)
+      assert(map.contains("path"),
```

--- End diff --

How about `require`?
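A sketch of the `require` variant; the error message is illustrative, since the original assertion message is truncated in the quote:

```scala
val map = new CaseInsensitiveMap(tableDefinition.storage.properties)
// Unlike assert, require throws IllegalArgumentException and is never
// elided by the compiler, so the check always runs in production.
require(map.contains("path"),
  "File-based external data source tables must have a `path` option.")
```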
[GitHub] spark issue #14921: [SPARK-17361][SQL] file-based external table without pat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14921

@clockfly Updated. The only external behavior change is the one this PR fixes: creating a file-based external table without a path will now fail.
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77295487

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above, ending at)

```scala
val allOptions = (('A' to 'Z').map("host" + _) ++ (1 to 100).map(_.toString)).toSet
```

--- End diff --

allExecutorAndHostIds? allExecutorAndHostNames?
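Either proposed name drops straight in, since only the identifier changes:

```scala
val allExecutorAndHostNames =
  (('A' to 'Z').map("host" + _) ++ (1 to 100).map(_.toString)).toSet
```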
[GitHub] spark pull request #14773: [SPARK-17203][SQL] data source options should alw...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/14773#discussion_r77295324

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala ---

```diff
@@ -90,17 +90,31 @@ case class RefreshResource(path: String)

 /**
  * Builds a map in which keys are case insensitive
  */
-class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String]
+class CaseInsensitiveMap[T] private(baseMap: Map[String, T]) extends Map[String, T]
```

--- End diff --

If we decide to make all usage of options case-insensitive, we probably don't need the `CaseInsensitiveMap` adapter class any longer. Instead, we could convert the user-provided options to lower case directly, as in the sketch below.
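A minimal sketch of that alternative, assuming the options arrive as a plain `Map[String, String]` (`userOptions` is a made-up name):

```scala
// Normalize keys once at the boundary instead of wrapping every lookup
// in a case-insensitive adapter.
val normalizedOptions: Map[String, String] =
  userOptions.map { case (key, value) => key.toLowerCase -> value }
```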
[GitHub] spark pull request #14931: [SPARK-17370] Shuffle service files not invalidat...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/14931#discussion_r77294974

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---

```diff
@@ -346,15 +346,16 @@ private[spark] class TaskSchedulerImpl(
   def statusUpdate(tid: Long, state: TaskState, serializedData: ByteBuffer) {
     var failedExecutor: Option[String] = None
+    var reason: ExecutorLossReason = null
```

--- End diff --

Nit: it would be clearer to make this an `Option`, similar to `failedExecutor`, rather than a null. The two variables are used together.
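The nit, sketched; this assumes the rest of `statusUpdate` is updated to set and read the `Option`:

```scala
var failedExecutor: Option[String] = None
// Track the loss reason the same way, so the two are set and consumed together.
var reason: Option[ExecutorLossReason] = None
```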
[GitHub] spark issue #14915: [SPARK-17356][SQL] Fix out of memory issue when generati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14915

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64828/
[GitHub] spark issue #14915: [SPARK-17356][SQL] Fix out of memory issue when generati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14915

Merged build finished. Test PASSed.
[GitHub] spark pull request #14773: [SPARK-17203][SQL] data source options should alw...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/14773#discussion_r77294893

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala ---

```diff
@@ -90,17 +90,31 @@ case class RefreshResource(path: String)

 /**
  * Builds a map in which keys are case insensitive
  */
-class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String]
+class CaseInsensitiveMap[T] private(baseMap: Map[String, T]) extends Map[String, T]
   with Serializable {

-  val baseMap = map.map(kv => kv.copy(_1 = kv._1.toLowerCase))
+  override def get(k: String): Option[T] = baseMap.get(k.toLowerCase)

-  override def get(k: String): Option[String] = baseMap.get(k.toLowerCase)
+  override def + [B1 >: T](kv: (String, B1)): Map[String, B1] =
+    new CaseInsensitiveMap(baseMap + kv.copy(_1 = kv._1.toLowerCase))

-  override def + [B1 >: String](kv: (String, B1)): Map[String, B1] =
-    baseMap + kv.copy(_1 = kv._1.toLowerCase)
+  override def iterator: Iterator[(String, T)] = baseMap.iterator

-  override def iterator: Iterator[(String, String)] = baseMap.iterator
+  override def -(key: String): Map[String, T] =
+    new CaseInsensitiveMap(baseMap - key.toLowerCase)
+}

-  override def -(key: String): Map[String, String] = baseMap - key.toLowerCase
+object CaseInsensitiveMap {
+  def apply[T](map: Map[String, T]): CaseInsensitiveMap[T] = {
+    val lowercaseKeys = map.keys.map(_.toLowerCase)
+    if (lowercaseKeys.toSet.size != map.size) {
+      val duplicatedKeys = lowercaseKeys.groupBy(identity).collect {
+        case (x, ys) if ys.size > 1 => x
+      }
+      throw new AnalysisException(
```

--- End diff --

Why is this an `AnalysisException`? Is this class only supposed to be used on the driver?
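For reference, a hypothetical call showing the case the new factory rejects (the values are made up):

```scala
// "Path" and "PATH" both normalize to "path", so the factory throws
// rather than silently dropping one of the values.
val opts = CaseInsensitiveMap(Map("Path" -> "/tmp/a", "PATH" -> "/tmp/b"))
```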
[GitHub] spark issue #14915: [SPARK-17356][SQL] Fix out of memory issue when generati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14915

**[Test build #64828 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64828/consoleFull)** for PR 14915 at commit [`39f3c63`](https://github.com/apache/spark/commit/39f3c63cbde086f44b8777d4ff708daa3bef2f18).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/14809#discussion_r77294000

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---

```diff
@@ -272,17 +282,11 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
             "Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. "
           (None, message)

-        case (Some(serde), Some(path)) =>
-          val message =
-            s"Persisting file based data source table $qualifiedTableName with an input path " +
-              s"into Hive metastore in Hive compatible format."
-          (Some(newHiveCompatibleMetastoreTable(serde, path)), message)
-
-        case (Some(_), None) =>
-          val message =
-            s"Data source table $qualifiedTableName is not file based. Persisting it into " +
-              s"Hive metastore in Spark SQL specific format, which is NOT compatible with Hive."
-          (None, message)
+        case Some(serde) =>
+          val message =
+            s"Persisting file based data source table $qualifiedTableName into " +
```

--- End diff --

I think it is good to have the path in the log message.
[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...
Github user JustinPihony commented on the issue: https://github.com/apache/spark/pull/12601

Yes, I realize that `SchemaRelationProvider` is not necessary, but the legwork has already been done, so why not take advantage of it? Even so, this original PR is actually not far from your duplicate; it only adds a few lines in `JDBCRelation.scala`, so the code changes are close to the same size. All I need is a final review, as I have made the changes requested in the original review.
[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/14809#discussion_r77293800

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- (the same quoted diff as in the r77295778 comment above, ending at)

```scala
assert(map.contains("path"),
```

--- End diff --

`assert` calls can be compiled away, so the check may not run at runtime; can we use an explicit exception?
[GitHub] spark issue #14850: [SPARK-17279][SQL] better error message for exceptions d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14850 Merged build finished. Test FAILed.
[GitHub] spark issue #14850: [SPARK-17279][SQL] better error message for exceptions d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14850 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64829/
[GitHub] spark issue #14850: [SPARK-17279][SQL] better error message for exceptions d...
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14850

**[Test build #64829 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64829/consoleFull)** for PR 14850 at commit [`b7c459b`](https://github.com/apache/spark/commit/b7c459b0231055f6e9915a42c72bc39610b1fe03).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14924: [SPARK-17299] TRIM/LTRIM/RTRIM should not strips charact...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14924 Merged build finished. Test PASSed.
[GitHub] spark issue #14924: [SPARK-17299] TRIM/LTRIM/RTRIM should not strips charact...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14924 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64827/
[GitHub] spark issue #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression interf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14834 **[Test build #64832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64832/consoleFull)** for PR 14834 at commit [`c52ef66`](https://github.com/apache/spark/commit/c52ef666e40b511351ea7dd45b075be7f001416f).
[GitHub] spark issue #14924: [SPARK-17299] TRIM/LTRIM/RTRIM should not strips charact...
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14924

**[Test build #64827 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64827/consoleFull)** for PR 14924 at commit [`554f46c`](https://github.com/apache/spark/commit/554f46cb9cb3a8d5e9bc0be458f8e3ec5c78d809).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14850: [SPARK-17279][SQL] better error message for excep...
Github user clockfly commented on a diff in the pull request:
https://github.com/apache/spark/pull/14850#discussion_r77292898

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
@@ -1039,9 +1033,18 @@ case class ScalaUDF(
       (convert, argTerm)
     }.unzip

-    val callFunc = s"${ctx.boxedType(dataType)} $resultTerm = " +
-      s"(${ctx.boxedType(dataType)})${catalystConverterTerm}" +
-      s".apply($funcTerm.apply(${funcArguments.mkString(", ")}));"
+    val getFuncResult = s"$funcTerm.apply(${funcArguments.mkString(", ")})"
+    val rethrowException = "throw new org.apache.spark.SparkException" +
+      """("Exception happens when execute user code in Scala UDF.", e);"""
--- End diff --

How about "Exception happens when executing user defined function (className: input argument type => output argument type)", or "Failed to execute user defined function (className: input argument type => output argument type)"?
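Not the actual generated code, just an illustration of the wrapping the diff introduces, assuming a Spark classpath; the `callUdf` helper and `udfDescription` parameter are hypothetical names for this sketch. The point is that failures in user code resurface as a descriptive `SparkException` rather than a bare exception:

```scala
import org.apache.spark.SparkException

// Run a UDF body and rethrow any failure with a message naming the function.
def callUdf[T](udfDescription: String)(body: => T): T =
  try body catch {
    case e: Exception =>
      throw new SparkException(s"Failed to execute user defined function ($udfDescription)", e)
  }

// e.g. callUdf("MyUdf: String => Int") { myFunc("not a number") }
```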
[GitHub] spark issue #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression interf...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/14834 @yanboliang Thanks for the tip. Done.
[GitHub] spark pull request #14921: [SPARK-17361][SQL] file-based external table with...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77292804

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

Based on my understanding, `resolveRelation` is not invoked by the write path of the non-file-based data sources.
[GitHub] spark issue #14921: [SPARK-17361][SQL] createExternalTable should fail if pa...
Github user clockfly commented on the issue:
https://github.com/apache/spark/pull/14921

@cloud-fan Can you update the PR title and PR description to be more user-facing? For example, it is better to use "CREATE TABLE ... USING ..." in the PR title. It is also better to list the user-facing behavior changes in the PR description.
[GitHub] spark issue #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression interf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14834 **[Test build #64831 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64831/consoleFull)** for PR 14834 at commit [`5bce1ba`](https://github.com/apache/spark/commit/5bce1ba7381d207c51005628319574d40a2211aa).
[GitHub] spark pull request #14921: [SPARK-17361][SQL] createExternalTable should fai...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77292635

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

When a data source wants to implement a write path (the `save` API), it needs to extend the trait `CreatableRelationProvider`. That is what my PR https://github.com/apache/spark/pull/14077 does.
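For context, this is the shape of the trait being referenced, as it appears in `org.apache.spark.sql.sources` in Spark 2.x; it is reproduced here for readability, so see the Spark sources for the authoritative definition:

```scala
import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}
import org.apache.spark.sql.sources.BaseRelation

trait CreatableRelationProvider {
  // Saves `data` according to `mode` and returns a relation describing the written data.
  def createRelation(
      sqlContext: SQLContext,
      mode: SaveMode,
      parameters: Map[String, String],
      data: DataFrame): BaseRelation
}
```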
[GitHub] spark pull request #14921: [SPARK-17361][SQL] createExternalTable should fai...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77292575

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

What I said before is wrong: a managed table still needs to call `resolveRelation` to do some validation, because the data source may not be file-based but something else.
[GitHub] spark pull request #14921: [SPARK-17361][SQL] createExternalTable should fai...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77292542

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

`createRelation` here is from the trait `RelationProvider`:

```scala
override def createRelation(
    sqlContext: SQLContext,
    parameters: Map[String, String]): BaseRelation
```

`RelationProvider` is only used for the read path.
[GitHub] spark pull request #14921: [SPARK-17361][SQL] createExternalTable should fai...
Github user clockfly commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77292448

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

@gatorsmile I mean the write path. When `createRelation()` is called on a `RelationProvider`, the provider may do some extra checks to make sure the options provided are valid. We'd better enforce those checks when trying to create a managed table. For example, `JdbcRelationProvider` validates the options:

```scala
class JdbcRelationProvider extends RelationProvider with DataSourceRegister {

  override def shortName(): String = "jdbc"

  /** Returns a new base relation with the given parameters. */
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    val jdbcOptions = new JDBCOptions(parameters)
    if (jdbcOptions.partitionColumn != null
        && (jdbcOptions.lowerBound == null
          || jdbcOptions.upperBound == null
          || jdbcOptions.numPartitions == null)) {
      sys.error("Partitioning incompletely specified")
    }

    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
      null
    } else {
      JDBCPartitioningInfo(
        jdbcOptions.partitionColumn,
        jdbcOptions.lowerBound.toLong,
        jdbcOptions.upperBound.toLong,
        jdbcOptions.numPartitions.toInt)
    }
    val parts = JDBCRelation.columnPartition(partitionInfo)
    val properties = new Properties() // Additional properties that we will pass to getConnection
    parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
  }
}
```
[GitHub] spark pull request #14921: [SPARK-17361][SQL] createExternalTable should fai...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77292357

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

FYI, today I just updated the write path for the JDBC connection: https://github.com/apache/spark/pull/14077
[GitHub] spark issue #14921: [SPARK-17361][SQL] createExternalTable should fail if pa...
Github user clockfly commented on the issue: https://github.com/apache/spark/pull/14921 +1
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14883#discussion_r77292196

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,17 @@ abstract class ExternalCatalog {

   def listFunctions(db: String, pattern: String): Seq[String]

+  // --
+  // Resources
+  // --
+
+  /**
+   * Add a JAR resource to the underlying external catalog for DDL (e.g. CREATE TABLE) and DML
+   * (e.g., LOAD TABLE) operations.
+   *
+   * For example, when users create a Hive serde table, they can specify a custom
+   * Serializer-Deserializer (SerDe) class. When Hive metastore is unable to access the custom SerDe
+   * JAR (e.g., not on the Hive classpath), the JAR file must be added at runtime using this API.
+   */
+  def addJar(path: String): Unit
--- End diff --

We still need an `addJar` API in the `ExternalCatalog` for passing the JAR to the underlying Hive metastore, even if we can specify the jar in the data source options. Here, we are not storing the jar info in the metastore; the Hive metastore needs it for creating/loading the tables.
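A hypothetical end-to-end usage showing why the catalog must forward the jar: the custom SerDe class has to be visible to the Hive client before the DDL that references it runs. The jar path and SerDe class name below are illustrative placeholders, and `spark` is assumed to be a Hive-enabled `SparkSession`:

```scala
// Make the custom SerDe available at runtime, then create a table that uses it.
spark.sql("ADD JAR /tmp/custom-serde.jar")
spark.sql(
  """CREATE TABLE logs (key INT, value STRING)
    |ROW FORMAT SERDE 'com.example.MyCustomSerDe'
  """.stripMargin)
```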
[GitHub] spark pull request #14921: [SPARK-17361][SQL] createExternalTable should fai...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77291908

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

@clockfly Sorry, I did not get your point. What you said above is only for the read path, right?
[GitHub] spark issue #14933: [SPARK-16533][CORE] - backport driver deadlock fix to 2....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14933 Can one of the admins verify this patch?
[GitHub] spark pull request #14933: [SPARK-16533][CORE] - backport driver deadlock fi...
GitHub user angolon opened a pull request:
https://github.com/apache/spark/pull/14933

[SPARK-16533][CORE] - backport driver deadlock fix to 2.0

## What changes were proposed in this pull request?

Backport changes from #14710 and #14925 to 2.0.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/angolon/spark SPARK-16533-2.0

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14933.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14933

commit 45a3f220b5f1b08fbe4f8d390755041dd2738e67
Author: Angus Gerry
Date: 2016-09-01T17:35:31Z

[SPARK-16533][CORE] resolve deadlocking in driver when executors die

This pull request reverts the changes made as a part of #14605, which simply side-steps the deadlock issue. Instead, I propose the following approach:

* Use `scheduleWithFixedDelay` when calling `ExecutorAllocationManager.schedule` for scheduling executor requests. The intent of this is that if invocations are delayed beyond the default schedule interval on account of lock contention, then we avoid a situation where calls to `schedule` are made back-to-back, potentially releasing and then immediately reacquiring these locks - further exacerbating contention.
* Replace a number of calls to `askWithRetry` with `ask` inside of message handling code in `CoarseGrainedSchedulerBackend` and its ilk. This allows us to queue messages with the relevant endpoints, release whatever locks we might be holding, and then block whilst awaiting the response. This change is made at the cost of being able to retry should sending the message fail, as retrying outside of the lock could easily cause race conditions if other conflicting messages have been sent whilst awaiting a response. I believe this to be the lesser of two evils, as in many cases these RPC calls are to process-local components, and so failures are more likely to be deterministic, and timeouts are more likely to be caused by lock contention.

Existing tests, and manual tests under yarn-client mode.

Author: Angus Gerry
Closes #14710 from angolon/SPARK-16533.

commit de488ce0a0025d3c9736a1df6e45d90e265a84d4
Author: Marcelo Vanzin
Date: 2016-09-01T21:02:58Z

[SPARK-16533][HOTFIX] Fix compilation on Scala 2.10.

No idea why it was failing (the needed import was there), but this makes things work.

Author: Marcelo Vanzin
Closes #14925 from vanzin/SPARK-16533.
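A minimal, self-contained sketch of the scheduling point made in the first bullet above; the interval and the runnable body are illustrative, not the values Spark uses. `scheduleWithFixedDelay` measures the interval from the end of each run, so a run that stalls on lock contention cannot be followed by an immediate back-to-back invocation the way a fixed-rate schedule allows:

```scala
import java.util.concurrent.{Executors, TimeUnit}

val scheduler = Executors.newSingleThreadScheduledExecutor()
// Each tick starts 100ms after the PREVIOUS tick finishes, so slow ticks
// (e.g. ones blocked on a contended lock) never pile up back-to-back.
scheduler.scheduleWithFixedDelay(
  new Runnable { override def run(): Unit = println("schedule() tick") },
  0L, 100L, TimeUnit.MILLISECONDS)
```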
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/14883#discussion_r77291638

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,17 @@ abstract class ExternalCatalog {

   def listFunctions(db: String, pattern: String): Seq[String]

+  // --
+  // Resources
+  // --
+
+  /**
+   * Add a JAR resource to the underlying external catalog for DDL (e.g. CREATE TABLE) and DML
+   * (e.g., LOAD TABLE) operations.
+   *
+   * For example, when users create a Hive serde table, they can specify a custom
+   * Serializer-Deserializer (SerDe) class. When Hive metastore is unable to access the custom SerDe
+   * JAR (e.g., not on the Hive classpath), the JAR file must be added at runtime using this API.
+   */
+  def addJar(path: String): Unit
--- End diff --

This is a good point. Once Hive tables are consolidated into data source tables, we can specify the jar in the data source way, e.g., as an option.
[GitHub] spark issue #14921: [SPARK-17361][SQL] createExternalTable should fail if pa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14921 **[Test build #64830 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64830/consoleFull)** for PR 14921 at commit [`eefe3bc`](https://github.com/apache/spark/commit/eefe3bc28fedeafabd8b3739f9bdea796b74ea9b).
[GitHub] spark issue #13584: [SPARK-15509][ML][SparkR] R MLlib algorithms should supp...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/13584 LGTM - @felixcheung Feel free to merge when its ready
[GitHub] spark issue #14850: [SPARK-17279][SQL] better error message for exceptions d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14850 **[Test build #64829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64829/consoleFull)** for PR 14850 at commit [`b7c459b`](https://github.com/apache/spark/commit/b7c459b0231055f6e9915a42c72bc39610b1fe03).
[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14854 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64826/
[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14854 Merged build finished. Test FAILed.
[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14854

**[Test build #64826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64826/consoleFull)** for PR 14854 at commit [`9865d32`](https://github.com/apache/spark/commit/9865d3207ec4db9d6832d240690cf9c47742361f).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13584: [SPARK-15509][ML][SparkR] R MLlib algorithms should supp...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/13584 LGTM. @shivaram do you have any other comment?
[GitHub] spark pull request #13584: [SPARK-15509][ML][SparkR] R MLlib algorithms shou...
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/13584#discussion_r77290933

--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/RWrapperUtils.scala ---
@@ -35,13 +35,37 @@ object RWrapperUtils extends Logging {
    */
   def checkDataColumns(rFormula: RFormula, data: Dataset[_]): Unit = {
     if (data.schema.fieldNames.contains(rFormula.getLabelCol)) {
-      logWarning("data containing 'label' column, so change its name to avoid conflict")
-      rFormula.setLabelCol(rFormula.getLabelCol + "_output")
+      val newLabelName = convertToUniqueName(rFormula.getLabelCol, data.schema.fieldNames)
--- End diff --

Fair enough, that makes sense. Thanks.
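A sketch of what a collision-avoiding helper like the `convertToUniqueName` referenced in the diff might look like; the real implementation lives in `RWrapperUtils`, so treat this purely as an illustration:

```scala
// Append a suffix (and a counter if needed) until the candidate name no
// longer collides with any existing column name in the schema.
def convertToUniqueName(name: String, fieldNames: Array[String]): String = {
  var candidate = name + "_output"
  var i = 1
  while (fieldNames.contains(candidate)) {
    candidate = s"${name}_output$i"
    i += 1
  }
  candidate
}
```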
[GitHub] spark pull request #14921: [SPARK-17361][SQL] createExternalTable should fai...
Github user clockfly commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77290803

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

For example, if it is a JDBC relation provider, we will call `dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions)`, which does some extra checks:

```scala
def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
  val caseInsensitiveOptions = new CaseInsensitiveMap(options)
  val relation = (providingClass.newInstance(), userSpecifiedSchema) match {
    // TODO: Throw when too much is given.
    case (dataSource: SchemaRelationProvider, Some(schema)) =>
      dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions, schema)
    case (dataSource: RelationProvider, None) =>
      dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions)
```
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14883#discussion_r77290783

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,17 @@ abstract class ExternalCatalog {

   def listFunctions(db: String, pattern: String): Seq[String]

+  // --
+  // Resources
+  // --
+
+  /**
+   * Add a JAR resource to the underlying external catalog for DDL (e.g. CREATE TABLE) and DML
+   * (e.g., LOAD TABLE) operations.
+   *
+   * For example, when users create a Hive serde table, they can specify a custom
+   * Serializer-Deserializer (SerDe) class. When Hive metastore is unable to access the custom SerDe
+   * JAR (e.g., not on the Hive classpath), the JAR file must be added at runtime using this API.
+   */
+  def addJar(path: String): Unit
--- End diff --

Let me copy and paste the reasons why users need the customization capabilities:

- When to add a new File Format?
  > User has files with special file formats not supported by Hive yet, and users don't want to convert the files before loading into Hive.
  > User has a more efficient way of storing data on disk.
- When to add a new SerDe?
  > User has data with a special serialized format not supported by Hive yet, and users don't want to convert the data before loading into Hive.
  > User has a more efficient way of serializing the data on disk.

Also, the existing UDFs, UDAFs and UDTFs can be reused without any rewriting.
[GitHub] spark pull request #14921: [SPARK-17361][SQL] createExternalTable should fai...
Github user clockfly commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77290773

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

Checked with Wenchen: it is not safe to skip calling `resolveRelation()` when it is a managed table.
[GitHub] spark pull request #14866: [SPARK-17298][SQL] Require explicit CROSS join fo...
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/14866#discussion_r77290718

--- Diff: R/pkg/R/DataFrame.R ---
@@ -2265,7 +2265,7 @@ setMethod("join",
           signature(x = "SparkDataFrame", y = "SparkDataFrame"),
           function(x, y, joinExpr = NULL, joinType = NULL) {
             if (is.null(joinExpr)) {
-              sdf <- callJMethod(x@sdf, "join", y@sdf)
+              sdf <- callJMethod(x@sdf, "crossJoin", y@sdf)
             } else {
               if (class(joinExpr) != "Column") stop("joinExpr must be a Column")
               if (is.null(joinType)) {
--- End diff --

Should the next line be "crossJoin" too?
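For context on what the R wrapper delegates to: with this PR's change, an unconditioned join in the Scala Dataset API must be requested explicitly as a cross join, while conditioned joins still go through `join`. A hedged sketch, where `df1` and `df2` are placeholders for any two DataFrames with an `id` column:

```scala
// Cartesian product must now be asked for explicitly.
val unconditioned = df1.crossJoin(df2)

// Keyed joins are unchanged.
val conditioned = df1.join(df2, df1("id") === df2("id"))
```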
[GitHub] spark issue #14932: [SPARK-17371] Resubmitted shuffle outputs can get delete...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14932 Merged build finished. Test PASSed.
[GitHub] spark issue #14932: [SPARK-17371] Resubmitted shuffle outputs can get delete...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14932 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64824/
[GitHub] spark issue #14932: [SPARK-17371] Resubmitted shuffle outputs can get delete...
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14932

**[Test build #64824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64824/consoleFull)** for PR 14932 at commit [`350d3be`](https://github.com/apache/spark/commit/350d3be3d77408a85d894bccb746fb603782ca1f).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14883#discussion_r77290382

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,17 @@ abstract class ExternalCatalog {

   def listFunctions(db: String, pattern: String): Seq[String]

+  // --
+  // Resources
+  // --
+
+  /**
+   * Add a JAR resource to the underlying external catalog for DDL (e.g. CREATE TABLE) and DML
+   * (e.g., LOAD TABLE) operations.
+   *
+   * For example, when users create a Hive serde table, they can specify a custom
+   * Serializer-Deserializer (SerDe) class. When Hive metastore is unable to access the custom SerDe
+   * JAR (e.g., not on the Hive classpath), the JAR file must be added at runtime using this API.
+   */
+  def addJar(path: String): Unit
--- End diff --

Yeah, we still need it after Hive and data source tables are consolidated. The major reason is for supporting custom Hive SerDes, custom Hive file formats, and custom UDF/UDAFs. They are major features for existing Hive users, and this support might be critical for migration from Hive to Spark.
[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14931 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64823/
[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14931 Merged build finished. Test PASSed.
[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14931

**[Test build #64823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64823/consoleFull)** for PR 14931 at commit [`a704376`](https://github.com/apache/spark/commit/a704376b57b488ce0ce1b0ba8ed13d36e5debfd4).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14915: [SPARK-17356][SQL] Fix out of memory issue when generati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14915 **[Test build #64828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64828/consoleFull)** for PR 14915 at commit [`39f3c63`](https://github.com/apache/spark/commit/39f3c63cbde086f44b8777d4ff708daa3bef2f18).
[GitHub] spark pull request #14915: [SPARK-17356][SQL] Fix out of memory issue when g...
Github user clockfly commented on a diff in the pull request:
https://github.com/apache/spark/pull/14915#discussion_r77288334

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala ---
@@ -274,6 +274,13 @@ abstract class QueryTest extends PlanTest {
     val normalized1 = logicalPlan.transformAllExpressions {
       case udf: ScalaUDF => udf.copy(function = null)
       case gen: UserDefinedGenerator => gen.copy(function = null)
+      // SPARK-17356: In usage of mllib, Metadata may store a huge vector of data, transforming
+      // it to JSON may trigger OutOfMemoryError.
+      case a @ Alias(child, name) if a.explicitMetadata.isDefined =>
+        Alias(child, name)(a.exprId, a.qualifier, Some(Metadata.empty), a.isGenerated)
+      case a: AttributeReference if a.metadata != Metadata.empty =>
+        AttributeReference(a.name, a.dataType, a.nullable, Metadata.empty)(a.exprId, a.qualifier,
+          a.isGenerated)
--- End diff --

Fixed.
[GitHub] spark pull request #14915: [SPARK-17356][SQL] Fix out of memory issue when g...
Github user clockfly commented on a diff in the pull request:
https://github.com/apache/spark/pull/14915#discussion_r77287956

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala ---
@@ -274,6 +274,13 @@ abstract class QueryTest extends PlanTest {
     val normalized1 = logicalPlan.transformAllExpressions {
       case udf: ScalaUDF => udf.copy(function = null)
       case gen: UserDefinedGenerator => gen.copy(function = null)
+      // SPARK-17356: In usage of mllib, Metadata may store a huge vector of data, transforming
+      // it to JSON may trigger OutOfMemoryError.
+      case a @ Alias(child, name) if a.explicitMetadata.isDefined =>
+        Alias(child, name)(a.exprId, a.qualifier, Some(Metadata.empty), a.isGenerated)
+      case a: AttributeReference if a.metadata != Metadata.empty =>
+        AttributeReference(a.name, a.dataType, a.nullable, Metadata.empty)(a.exprId, a.qualifier,
+          a.isGenerated)
--- End diff --

OK.
[GitHub] spark pull request #14930: [SPARK-16926] [SQL] Add unit test to compare tabl...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14930
[GitHub] spark issue #14930: [SPARK-16926] [SQL] Add unit test to compare table and p...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14930 thanks, merging to master!
[GitHub] spark pull request #14930: [SPARK-16926] [SQL] Add unit test to compare tabl...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/14930#discussion_r77287554

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala ---
@@ -143,4 +144,38 @@ class HiveTableScanSuite extends HiveComparisonTest with SQLTestUtils with TestH
       }
     }
   }
+
+  test("SPARK-16926: number of table and partition columns match for new partitioned table") {
+    val view = "src"
+    withTempView(view) {
+      spark.range(1, 5).createOrReplaceTempView(view)
+      val table = "table_with_partition"
+      withTable(table) {
+        sql(
+          s"""
+             |CREATE TABLE $table(id string)
+             |PARTITIONED BY (p1 string,p2 string,p3 string,p4 string,p5 string)
+           """.stripMargin)
+        sql(
+          s"""
+             |FROM $view v
+             |INSERT INTO TABLE $table
+             |PARTITION (p1='a',p2='b',p3='c',p4='d',p5='e')
+             |SELECT v.id
+             |INSERT INTO TABLE $table
--- End diff --

Why do we insert the same value into the table twice?
[GitHub] spark issue #14930: [SPARK-16926] [SQL] Add unit test to compare table and p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14930 Merged build finished. Test PASSed.
[GitHub] spark issue #14930: [SPARK-16926] [SQL] Add unit test to compare table and p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14930 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64822/
[GitHub] spark issue #14930: [SPARK-16926] [SQL] Add unit test to compare table and p...
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14930

**[Test build #64822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64822/consoleFull)** for PR 14930 at commit [`7ac4d63`](https://github.com/apache/spark/commit/7ac4d63cafc854aa4a5c3622b668474da1255022).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14858: [SPARK-17219][ML] Add NaN value handling in Bucketizer
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14858 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64825/
[GitHub] spark issue #14858: [SPARK-17219][ML] Add NaN value handling in Bucketizer
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14858 Merged build finished. Test PASSed.
[GitHub] spark issue #14858: [SPARK-17219][ML] Add NaN value handling in Bucketizer
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14858
**[Test build #64825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64825/consoleFull)** for PR 14858 at commit [`cc5a1e7`](https://github.com/apache/spark/commit/cc5a1e7342e2dbd43734a414a0ace0465eaecfe8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14924: [SPARK-17299] TRIM/LTRIM/RTRIM should not strips charact...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14924 **[Test build #64827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64827/consoleFull)** for PR 14924 at commit [`554f46c`](https://github.com/apache/spark/commit/554f46cb9cb3a8d5e9bc0be458f8e3ec5c78d809).
[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14900 Merged build finished. Test PASSed.
[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14900 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64820/
[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14900
**[Test build #64820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64820/consoleFull)** for PR 14900 at commit [`c4ad6a1`](https://github.com/apache/spark/commit/c4ad6a1b813c5553e9c86c86ee234ffa43daff5e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14924: [SPARK-17299] TRIM/LTRIM/RTRIM should not strips charact...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/14924 @rxin Done
[GitHub] spark issue #14921: [SPARK-17361][SQL] createExternalTable should fail if pa...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14921 LGTM
[GitHub] spark pull request #14913: [SPARK-17358][SQL] Cached table(parquet/orc) shou...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14913#discussion_r77285873
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -169,6 +169,13 @@ case class HadoopFsRelation(
 location.allFiles().map(_.getPath.toUri.toString).toArray
 override def sizeInBytes: Long = location.allFiles().map(_.getLen).sum
+
+ override def equals(other: Any): Boolean = other match {
+case r: HadoopFsRelation => location == r.location
--- End diff --
+1
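As background on the pattern being endorsed: when equality is delegated to a single field like `location`, `hashCode` should be overridden to match, otherwise hash-based lookups (for example, a cache keyed on the relation) can miss entries that compare equal. A hypothetical sketch of the idiom, not the PR's actual code:

// Hypothetical sketch: equality and hashing both delegated to one field,
// keeping the equals/hashCode contract intact.
class FileCatalog(val rootPaths: Seq[String]) {
  override def equals(other: Any): Boolean = other match {
    case c: FileCatalog => rootPaths == c.rootPaths
    case _ => false
  }
  override def hashCode(): Int = rootPaths.hashCode()
}

class Relation(val location: FileCatalog) {
  override def equals(other: Any): Boolean = other match {
    case r: Relation => location == r.location
    case _ => false
  }
  override def hashCode(): Int = location.hashCode()
}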
[GitHub] spark issue #14921: [SPARK-17361][SQL] createExternalTable should fail if pa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14921 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64819/
[GitHub] spark issue #14921: [SPARK-17361][SQL] createExternalTable should fail if pa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14921 Merged build finished. Test PASSed.
[GitHub] spark issue #14921: [SPARK-17361][SQL] createExternalTable should fail if pa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14921
**[Test build #64819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64819/consoleFull)** for PR 14921 at commit [`2533d65`](https://github.com/apache/spark/commit/2533d656e1fba7c814aaa248761508599150337f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14854 **[Test build #64826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64826/consoleFull)** for PR 14854 at commit [`9865d32`](https://github.com/apache/spark/commit/9865d3207ec4db9d6832d240690cf9c47742361f).
[GitHub] spark issue #14929: [Don't merge][WIP] Better error message for JSON file fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14929 Merged build finished. Test PASSed.
[GitHub] spark issue #14929: [Don't merge][WIP] Better error message for JSON file fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14929 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64821/
[GitHub] spark issue #14929: [Don't merge][WIP] Better error message for JSON file fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14929
**[Test build #64821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64821/consoleFull)** for PR 14929 at commit [`3491f15`](https://github.com/apache/spark/commit/3491f15f6050126904b05b8e5f5beb25c9d8f1a8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14930: [SPARK-16926] [SQL] Add unit test to compare tabl...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14930#discussion_r77285020
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala ---
@@ -143,4 +144,38 @@ class HiveTableScanSuite extends HiveComparisonTest with SQLTestUtils with TestH
 }
 }
 }
+
+ test("SPARK-16926: number of table and partition columns match for new partitioned table") {
+val view = "src"
+withTempView(view) {
+ spark.range(1, 5).createOrReplaceTempView(view)
--- End diff --
oh sorry I read the code wrong...
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14883#discussion_r77284957
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,17 @@ abstract class ExternalCatalog {
 def listFunctions(db: String, pattern: String): Seq[String]
+
+ // --
+ // Resources
+ // --
+
+ /**
+ * Add a JAR resource to the underlying external catalog for DDL (e.g. CREATE TABLE) and DML
+ * (e.g., LOAD TABLE) operations.
+ *
+ * For example, when users create a Hive serde table, they can specify a custom
+ * Serializer-Deserializer (SerDe) class. When Hive metastore is unable to access the custom SerDe
+ * JAR (e.g., not on the Hive classpath), the JAR file must be added at runtime using this API.
+ */
+ def addJar(path: String): Unit
--- End diff --
I'm still a little hesitant to add this concept. After we consolidate Hive and data source tables, do we still need it? Data source resolution is done before we save the table metadata to the metastore.
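For readers unfamiliar with the scenario the doc comment describes: the custom SerDe class lives in a JAR that is not on the Hive classpath, so the JAR has to be registered at runtime before the DDL that references the class. A hypothetical usage sketch, assuming a Hive-enabled `spark` session; the path and class name are made up for illustration:

// Hypothetical sketch: register the JAR at runtime, then reference the
// custom SerDe class in Hive DDL.
spark.sql("ADD JAR /tmp/my-custom-serde.jar")
spark.sql(
  """
    |CREATE TABLE events (id string)
    |ROW FORMAT SERDE 'com.example.MyCustomSerDe'
  """.stripMargin)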
[GitHub] spark issue #14929: [Don't merge][WIP] Better error message for JSON file fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14929 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64818/
[GitHub] spark issue #14929: [Don't merge][WIP] Better error message for JSON file fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14929 Merged build finished. Test PASSed.
[GitHub] spark issue #14929: [Don't merge][WIP] Better error message for JSON file fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14929
**[Test build #64818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64818/consoleFull)** for PR 14929 at commit [`50e312f`](https://github.com/apache/spark/commit/50e312f414d34783f33ca890a697aa654c8e0847).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14866 Merged build finished. Test PASSed.
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14866 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64814/
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14866
**[Test build #64814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64814/consoleFull)** for PR 14866 at commit [`703a554`](https://github.com/apache/spark/commit/703a554a342d91fba3040a769a8f7ff786569f0c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.