[GitHub] spark issue #14850: [SPARK-17279][SQL] better error message for exceptions d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14850

**[Test build #64833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64833/consoleFull)** for PR 14850 at commit [`4efb6fc`](https://github.com/apache/spark/commit/4efb6fc86b5fdcb7023aa9a2ee7063d4338b51da).
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77296679

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala ---

@@ -0,0 +1,503 @@

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.scheduler

import org.mockito.Mockito.when
import org.scalatest.BeforeAndAfterEach
import org.scalatest.mock.MockitoSugar

import org.apache.spark._
import org.apache.spark.internal.config
import org.apache.spark.util.ManualClock

class BlacklistTrackerSuite extends SparkFunSuite with BeforeAndAfterEach with MockitoSugar
    with LocalSparkContext {

  private val clock = new ManualClock(0)

  private var blacklistTracker: BlacklistTracker = _

  override def afterEach(): Unit = {
    if (blacklistTracker != null) {
      blacklistTracker = null
    }
    super.afterEach()
  }

  val allOptions = (('A' to 'Z').map("host" + _) ++ (1 to 100).map(_.toString)).toSet

  /**
   * Its easier to write our tests as if we could directly look at the sets of nodes & executors
   * in the blacklist. However the api doesn't expose a set (for thread-safety), so this is a
   * simple way to test something similar, since we know the universe of values that might appear
   * in these sets.
   */
  def assertEquivalentToSet(f: String => Boolean, expected: Set[String]): Unit = {
    allOptions.foreach { opt =>
      val actual = f(opt)
      val exp = expected.contains(opt)
      assert(actual === exp, raw"""for string "$opt" """)
    }
  }

  def mockTaskSchedWithConf(conf: SparkConf): TaskSchedulerImpl = {
    sc = new SparkContext(conf)
    val scheduler = mock[TaskSchedulerImpl]
    when(scheduler.sc).thenReturn(sc)
    when(scheduler.mapOutputTracker).thenReturn(SparkEnv.get.mapOutputTracker)
    scheduler
  }

  test("Blacklisting individual tasks") {
    val conf = new SparkConf().setAppName("test").setMaster("local")
      .set(config.BLACKLIST_ENABLED.key, "true")
    val scheduler = mockTaskSchedWithConf(conf)
    // Task 1 failed on executor 1
    blacklistTracker = new BlacklistTracker(conf, clock)
    val taskSet = FakeTask.createTaskSet(10)
    val tsm = new TaskSetManager(scheduler, Some(blacklistTracker), taskSet, 4, clock)
    tsm.updateBlacklistForFailedTask("hostA", "1", 0)
    for {
      executor <- (1 to 4).map(_.toString)
      index <- 0 until 10
    } {
      val exp = (executor == "1" && index == 0)
      assert(tsm.isExecutorBlacklistedForTask(executor, index) === exp)
    }
    assert(blacklistTracker.nodeBlacklist() === Set())
    assertEquivalentToSet(blacklistTracker.isNodeBlacklisted(_), Set())
    assertEquivalentToSet(tsm.isNodeBlacklistedForTaskSet, Set())
    assertEquivalentToSet(tsm.isExecutorBlacklistedForTaskSet, Set())

    // Task 1 & 2 failed on both executor 1 & 2, so we blacklist all executors on that host,
    // for all tasks for the stage. Note the api expects multiple checks for each type of
    // blacklist -- this actually fits naturally with its use in the scheduler
    tsm.updateBlacklistForFailedTask("hostA", "1", 1)
    tsm.updateBlacklistForFailedTask("hostA", "2", 0)
    tsm.updateBlacklistForFailedTask("hostA", "2", 1)
    // we don't explicitly return the executors in hostA here, but that is OK
    for {
      executor <- (1 to 4).map(_.toString)
      index <- 0 until 10
    } {
      withClue(s"exec = $executor; index = $index") {
        val badExec = (executor == "1" || executor == "2")
        val badPart = (index == 0 || index == 1)
        val taskExp = (badExec && badPart)
        assert(
          tsm.isExecutorBlacklistedForTask(executor, index) === taskExp)
```
[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/12601

It sounds like you understand my points. Please update your PR based on my latest changes, and then we can review it. Thanks!
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77296487

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above)
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77296415

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above)
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77296312

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above)
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77296176

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above)
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77296120

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above, ending at)

```scala
val taskExp = (badExec && badPart)
```

--- End diff --

expectTaskBlacklisted? (This is too short to guess the meaning.) Or just inline it.
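The two options the reviewer floats, sketched as drop-in replacements for the quoted lines (`expectTaskBlacklisted` is the reviewer's proposed name, not code from the PR):

```scala
// Option 1: rename so the variable states the expectation being asserted.
val expectTaskBlacklisted = badExec && badPart
assert(tsm.isExecutorBlacklistedForTask(executor, index) === expectTaskBlacklisted)

// Option 2: inline the condition so the assertion is self-describing.
assert(tsm.isExecutorBlacklistedForTask(executor, index) === (badExec && badPart))
```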
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77296072

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above, ending at)

```scala
val badPart = (index == 0 || index == 1)
```

--- End diff --

badIndex? badTaskIndex? (I'm guessing this is leftover from the old partition naming.)
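Either suggested name is a one-identifier change against the quoted line, for example:

```scala
// The value refers to task indices, not partitions, so name the flag accordingly.
val badTaskIndex = index == 0 || index == 1
```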
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77295971

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above, ending at)

```scala
tsm.updateBlacklistForFailedTask("hostA", "1", 1)
```

--- End diff --

Don't bother fixing all of these, but in the future, I find tests like this much easier to read if you pass in the parameters as named parameters, so it's obvious what everything means (e.g., `updateBlacklistForFailedTask("hostA", exec = "1", index = 1)`).
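A before/after sketch of the suggestion; the parameter names `exec` and `index` come from the reviewer's example and assume the method declares them under those names:

```scala
// Before: positional arguments force the reader to recall the signature.
tsm.updateBlacklistForFailedTask("hostA", "1", 1)

// After: named arguments make the call site self-documenting.
tsm.updateBlacklistForFailedTask("hostA", exec = "1", index = 1)
```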
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77296004

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above, ending at)

```scala
// Task 1 & 2 failed on both executor 1 & 2, so we blacklist all executors on that host,
```

--- End diff --

Can you add "should" before "blacklist"? (So it's clear what the test is doing versus what the test is verifying.)
[GitHub] spark pull request #14921: [SPARK-17361][SQL] file-based external table with...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14921#discussion_r77295817

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---

```diff
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
```

--- End diff --

After a discussion with Wenchen: `resolveRelation` will be invoked by `CREATE TABLE ... USING ...`, although the write path in the `DataFrameWriter` APIs does not invoke it. Thanks! @clockfly
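For illustration, a hypothetical statement of the kind that reaches `resolveRelation` (the table name and path are made up):

```scala
// CREATE TABLE ... USING resolves the relation eagerly, so it goes through
// resolveRelation(); DataFrameWriter.save() takes the write path and does not.
spark.sql("CREATE TABLE t USING parquet OPTIONS (path '/tmp/t')")
```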
[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14809#discussion_r77295778

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---

```diff
@@ -241,10 +241,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }

   // converts the table metadata to Hive compatible format, i.e. set the serde information.
-  def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = {
+  def newHiveCompatibleMetastoreTable(serde: HiveSerDe): CatalogTable = {
+    val location = if (tableDefinition.tableType == EXTERNAL) {
+      // When we hit this branch, we are saving an external data source table with hive
+      // compatible format, which means the data source is file-based and must have a `path`.
+      val map = new CaseInsensitiveMap(tableDefinition.storage.properties)
+      assert(map.contains("path"),
```

--- End diff --

How about `require`?
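A sketch of the `require` variant; the error message is illustrative, since the original assertion message is truncated in the quote:

```scala
val map = new CaseInsensitiveMap(tableDefinition.storage.properties)
// Unlike assert, require throws IllegalArgumentException and is never
// elided by the compiler, so the check always runs in production.
require(map.contains("path"),
  "File-based external data source tables must have a `path` option.")
```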
[GitHub] spark issue #14921: [SPARK-17361][SQL] file-based external table without pat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14921

@clockfly Updated. The only external behavior change is the one this PR fixes: creating a file-based external table without a path will now fail.
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r77295487

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- (the same quoted diff as in the r77296679 comment above, ending at)

```scala
val allOptions = (('A' to 'Z').map("host" + _) ++ (1 to 100).map(_.toString)).toSet
```

--- End diff --

allExecutorAndHostIds? allExecutorAndHostNames?
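Either proposed name drops straight in, since only the identifier changes:

```scala
val allExecutorAndHostNames =
  (('A' to 'Z').map("host" + _) ++ (1 to 100).map(_.toString)).toSet
```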
[GitHub] spark pull request #14773: [SPARK-17203][SQL] data source options should alw...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/14773#discussion_r77295324

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala ---

```diff
@@ -90,17 +90,31 @@ case class RefreshResource(path: String)

 /**
  * Builds a map in which keys are case insensitive
  */
-class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String]
+class CaseInsensitiveMap[T] private(baseMap: Map[String, T]) extends Map[String, T]
```

--- End diff --

If we decide to make all usage of options case-insensitive, we probably don't need the `CaseInsensitiveMap` adapter class any longer. Instead, we could convert the user-provided options to lower case directly, as in the sketch below.
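A minimal sketch of that alternative, assuming the options arrive as a plain `Map[String, String]` (`userOptions` is a made-up name):

```scala
// Normalize keys once at the boundary instead of wrapping every lookup
// in a case-insensitive adapter.
val normalizedOptions: Map[String, String] =
  userOptions.map { case (key, value) => key.toLowerCase -> value }
```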
[GitHub] spark pull request #14931: [SPARK-17370] Shuffle service files not invalidat...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/14931#discussion_r77294974

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---

```diff
@@ -346,15 +346,16 @@ private[spark] class TaskSchedulerImpl(
   def statusUpdate(tid: Long, state: TaskState, serializedData: ByteBuffer) {
     var failedExecutor: Option[String] = None
+    var reason: ExecutorLossReason = null
```

--- End diff --

Nit: it would be clearer to make this an `Option`, similar to `failedExecutor`, rather than a null. The two variables are used together.
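The nit, sketched; this assumes the rest of `statusUpdate` is updated to set and read the `Option`:

```scala
var failedExecutor: Option[String] = None
// Track the loss reason the same way, so the two are set and consumed together.
var reason: Option[ExecutorLossReason] = None
```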
[GitHub] spark issue #14915: [SPARK-17356][SQL] Fix out of memory issue when generati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14915

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64828/
[GitHub] spark issue #14915: [SPARK-17356][SQL] Fix out of memory issue when generati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14915

Merged build finished. Test PASSed.
[GitHub] spark pull request #14773: [SPARK-17203][SQL] data source options should alw...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/14773#discussion_r77294893

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala ---

```diff
@@ -90,17 +90,31 @@ case class RefreshResource(path: String)

 /**
  * Builds a map in which keys are case insensitive
  */
-class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String]
+class CaseInsensitiveMap[T] private(baseMap: Map[String, T]) extends Map[String, T]
   with Serializable {

-  val baseMap = map.map(kv => kv.copy(_1 = kv._1.toLowerCase))
+  override def get(k: String): Option[T] = baseMap.get(k.toLowerCase)

-  override def get(k: String): Option[String] = baseMap.get(k.toLowerCase)
+  override def + [B1 >: T](kv: (String, B1)): Map[String, B1] =
+    new CaseInsensitiveMap(baseMap + kv.copy(_1 = kv._1.toLowerCase))

-  override def + [B1 >: String](kv: (String, B1)): Map[String, B1] =
-    baseMap + kv.copy(_1 = kv._1.toLowerCase)
+  override def iterator: Iterator[(String, T)] = baseMap.iterator

-  override def iterator: Iterator[(String, String)] = baseMap.iterator
+  override def -(key: String): Map[String, T] =
+    new CaseInsensitiveMap(baseMap - key.toLowerCase)
+}

-  override def -(key: String): Map[String, String] = baseMap - key.toLowerCase
+object CaseInsensitiveMap {
+  def apply[T](map: Map[String, T]): CaseInsensitiveMap[T] = {
+    val lowercaseKeys = map.keys.map(_.toLowerCase)
+    if (lowercaseKeys.toSet.size != map.size) {
+      val duplicatedKeys = lowercaseKeys.groupBy(identity).collect {
+        case (x, ys) if ys.size > 1 => x
+      }
+      throw new AnalysisException(
```

--- End diff --

Why is this an `AnalysisException`? Is this class only supposed to be used on the driver?
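For reference, a hypothetical call showing the case the new factory rejects (the values are made up):

```scala
// "Path" and "PATH" both normalize to "path", so the factory throws
// rather than silently dropping one of the values.
val opts = CaseInsensitiveMap(Map("Path" -> "/tmp/a", "PATH" -> "/tmp/b"))
```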
[GitHub] spark issue #14915: [SPARK-17356][SQL] Fix out of memory issue when generati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14915

**[Test build #64828 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64828/consoleFull)** for PR 14915 at commit [`39f3c63`](https://github.com/apache/spark/commit/39f3c63cbde086f44b8777d4ff708daa3bef2f18).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/14809#discussion_r77294000

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---

```diff
@@ -272,17 +282,11 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
             "Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. "
           (None, message)

-        case (Some(serde), Some(path)) =>
-          val message =
-            s"Persisting file based data source table $qualifiedTableName with an input path " +
-              s"into Hive metastore in Hive compatible format."
-          (Some(newHiveCompatibleMetastoreTable(serde, path)), message)
-
-        case (Some(_), None) =>
-          val message =
-            s"Data source table $qualifiedTableName is not file based. Persisting it into " +
-              s"Hive metastore in Spark SQL specific format, which is NOT compatible with Hive."
-          (None, message)
+        case Some(serde) =>
+          val message =
+            s"Persisting file based data source table $qualifiedTableName into " +
```

--- End diff --

I think it is good to have the path in the log message.
[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...
Github user JustinPihony commented on the issue: https://github.com/apache/spark/pull/12601

Yes, I realize that `SchemaRelationProvider` is not necessary, but the legwork has already been done, so why not take advantage of it? Even so, this original PR is actually not far from your duplicate; it only adds a few lines in `JDBCRelation.scala`, so the code changes are close to the same size. All I need is a final review, as I have made the changes requested in the original review.
[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/14809#discussion_r77293800

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- (the same quoted diff as in the r77295778 comment above, ending at)

```scala
assert(map.contains("path"),
```

--- End diff --

`assert` calls can be compiled away, so the check may not run at runtime; can we use an explicit exception?
[GitHub] spark issue #14850: [SPARK-17279][SQL] better error message for exceptions d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14850 Merged build finished. Test FAILed.
[GitHub] spark issue #14850: [SPARK-17279][SQL] better error message for exceptions d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14850 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64829/
[GitHub] spark issue #14850: [SPARK-17279][SQL] better error message for exceptions d...
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14850

**[Test build #64829 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64829/consoleFull)** for PR 14850 at commit [`b7c459b`](https://github.com/apache/spark/commit/b7c459b0231055f6e9915a42c72bc39610b1fe03).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14924: [SPARK-17299] TRIM/LTRIM/RTRIM should not strips charact...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14924 Merged build finished. Test PASSed.
[GitHub] spark issue #14924: [SPARK-17299] TRIM/LTRIM/RTRIM should not strips charact...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14924 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64827/
[GitHub] spark issue #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression interf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14834 **[Test build #64832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64832/consoleFull)** for PR 14834 at commit [`c52ef66`](https://github.com/apache/spark/commit/c52ef666e40b511351ea7dd45b075be7f001416f).
[GitHub] spark issue #14924: [SPARK-17299] TRIM/LTRIM/RTRIM should not strips charact...
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14924

**[Test build #64827 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64827/consoleFull)** for PR 14924 at commit [`554f46c`](https://github.com/apache/spark/commit/554f46cb9cb3a8d5e9bc0be458f8e3ec5c78d809).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14850: [SPARK-17279][SQL] better error message for excep...
Github user clockfly commented on a diff in the pull request:
https://github.com/apache/spark/pull/14850#discussion_r77292898

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
@@ -1039,9 +1033,18 @@ case class ScalaUDF(
       (convert, argTerm)
     }.unzip

-    val callFunc = s"${ctx.boxedType(dataType)} $resultTerm = " +
-      s"(${ctx.boxedType(dataType)})${catalystConverterTerm}" +
-      s".apply($funcTerm.apply(${funcArguments.mkString(", ")}));"
+    val getFuncResult = s"$funcTerm.apply(${funcArguments.mkString(", ")})"
+    val rethrowException = "throw new org.apache.spark.SparkException" +
+      """("Exception happens when execute user code in Scala UDF.", e);"""
--- End diff --

How about "Exception happens when executing user defined function (className: input argument type => output argument type)", or "Failed to execute user defined function (className: input argument type => output argument type)"?
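Not the actual generated code, just an illustration of the wrapping the diff introduces, assuming a Spark classpath; the `callUdf` helper and `udfDescription` parameter are hypothetical names for this sketch. The point is that failures in user code resurface as a descriptive `SparkException` rather than a bare exception:

```scala
import org.apache.spark.SparkException

// Run a UDF body and rethrow any failure with a message naming the function.
def callUdf[T](udfDescription: String)(body: => T): T =
  try body catch {
    case e: Exception =>
      throw new SparkException(s"Failed to execute user defined function ($udfDescription)", e)
  }

// e.g. callUdf("MyUdf: String => Int") { myFunc("not a number") }
```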
[GitHub] spark issue #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression interf...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/14834 @yanboliang Thanks for the tip. Done.
[GitHub] spark pull request #14921: [SPARK-17361][SQL] file-based external table with...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77292804

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

Based on my understanding, `resolveRelation` is not invoked by the write path of the non-file-based data sources.
[GitHub] spark issue #14921: [SPARK-17361][SQL] createExternalTable should fail if pa...
Github user clockfly commented on the issue:
https://github.com/apache/spark/pull/14921

@cloud-fan Can you update the PR title and PR description to be more user-facing? For example, it is better to use "CREATE TABLE ... USING ..." in the PR title. It is also better to list the user-facing behavior changes in the PR description.
[GitHub] spark issue #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression interf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14834 **[Test build #64831 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64831/consoleFull)** for PR 14834 at commit [`5bce1ba`](https://github.com/apache/spark/commit/5bce1ba7381d207c51005628319574d40a2211aa).
[GitHub] spark pull request #14921: [SPARK-17361][SQL] createExternalTable should fai...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77292635

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

When a data source wants to implement a write path (the `save` API), it needs to extend the trait `CreatableRelationProvider`. That is what my PR https://github.com/apache/spark/pull/14077 does.
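For context, this is the shape of the trait being referenced, as it appears in `org.apache.spark.sql.sources` in Spark 2.x; it is reproduced here for readability, so see the Spark sources for the authoritative definition:

```scala
import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}
import org.apache.spark.sql.sources.BaseRelation

trait CreatableRelationProvider {
  // Saves `data` according to `mode` and returns a relation describing the written data.
  def createRelation(
      sqlContext: SQLContext,
      mode: SaveMode,
      parameters: Map[String, String],
      data: DataFrame): BaseRelation
}
```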
[GitHub] spark pull request #14921: [SPARK-17361][SQL] createExternalTable should fai...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77292575

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

What I said before is wrong: a managed table still needs to call `resolveRelation` to do some validation, because the data source may not be file-based but something else.
[GitHub] spark pull request #14921: [SPARK-17361][SQL] createExternalTable should fai...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77292542

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

`createRelation` here is from the trait `RelationProvider`:

```scala
override def createRelation(
    sqlContext: SQLContext,
    parameters: Map[String, String]): BaseRelation
```

`RelationProvider` is only used for the read path.
[GitHub] spark pull request #14921: [SPARK-17361][SQL] createExternalTable should fai...
Github user clockfly commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77292448

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

@gatorsmile I mean the write path. When `createRelation()` is called on a `RelationProvider`, the provider may do some extra checks to make sure the options provided are valid. We'd better enforce those checks when trying to create a managed table. For example, `JdbcRelationProvider` validates the options:

```scala
class JdbcRelationProvider extends RelationProvider with DataSourceRegister {

  override def shortName(): String = "jdbc"

  /** Returns a new base relation with the given parameters. */
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    val jdbcOptions = new JDBCOptions(parameters)
    if (jdbcOptions.partitionColumn != null
        && (jdbcOptions.lowerBound == null
          || jdbcOptions.upperBound == null
          || jdbcOptions.numPartitions == null)) {
      sys.error("Partitioning incompletely specified")
    }

    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
      null
    } else {
      JDBCPartitioningInfo(
        jdbcOptions.partitionColumn,
        jdbcOptions.lowerBound.toLong,
        jdbcOptions.upperBound.toLong,
        jdbcOptions.numPartitions.toInt)
    }
    val parts = JDBCRelation.columnPartition(partitionInfo)
    val properties = new Properties() // Additional properties that we will pass to getConnection
    parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
  }
}
```
[GitHub] spark pull request #14921: [SPARK-17361][SQL] createExternalTable should fai...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77292357

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

FYI, today I just updated the write path for the JDBC connection: https://github.com/apache/spark/pull/14077
[GitHub] spark issue #14921: [SPARK-17361][SQL] createExternalTable should fail if pa...
Github user clockfly commented on the issue: https://github.com/apache/spark/pull/14921 +1
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14883#discussion_r77292196

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,17 @@ abstract class ExternalCatalog {

   def listFunctions(db: String, pattern: String): Seq[String]

+  // --
+  // Resources
+  // --
+
+  /**
+   * Add a JAR resource to the underlying external catalog for DDL (e.g. CREATE TABLE) and DML
+   * (e.g., LOAD TABLE) operations.
+   *
+   * For example, when users create a Hive serde table, they can specify a custom
+   * Serializer-Deserializer (SerDe) class. When Hive metastore is unable to access the custom SerDe
+   * JAR (e.g., not on the Hive classpath), the JAR file must be added at runtime using this API.
+   */
+  def addJar(path: String): Unit
--- End diff --

We still need an `addJar` API in the `ExternalCatalog` for passing the JAR to the underlying Hive metastore, even if we can specify the jar in the data source options. Here, we are not storing the jar info in the metastore; the Hive metastore needs it for creating/loading the tables.
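A hypothetical end-to-end usage showing why the catalog must forward the jar: the custom SerDe class has to be visible to the Hive client before the DDL that references it runs. The jar path and SerDe class name below are illustrative placeholders, and `spark` is assumed to be a Hive-enabled `SparkSession`:

```scala
// Make the custom SerDe available at runtime, then create a table that uses it.
spark.sql("ADD JAR /tmp/custom-serde.jar")
spark.sql(
  """CREATE TABLE logs (key INT, value STRING)
    |ROW FORMAT SERDE 'com.example.MyCustomSerDe'
  """.stripMargin)
```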
[GitHub] spark pull request #14921: [SPARK-17361][SQL] createExternalTable should fai...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77291908

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

@clockfly Sorry, I did not get your point. What you said above is only for the read path, right?
[GitHub] spark issue #14933: [SPARK-16533][CORE] - backport driver deadlock fix to 2....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14933 Can one of the admins verify this patch?
[GitHub] spark pull request #14933: [SPARK-16533][CORE] - backport driver deadlock fi...
GitHub user angolon opened a pull request:
https://github.com/apache/spark/pull/14933

[SPARK-16533][CORE] - backport driver deadlock fix to 2.0

## What changes were proposed in this pull request?

Backport changes from #14710 and #14925 to 2.0.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/angolon/spark SPARK-16533-2.0

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14933.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14933

commit 45a3f220b5f1b08fbe4f8d390755041dd2738e67
Author: Angus Gerry
Date: 2016-09-01T17:35:31Z

[SPARK-16533][CORE] resolve deadlocking in driver when executors die

This pull request reverts the changes made as a part of #14605, which simply side-steps the deadlock issue. Instead, I propose the following approach:

* Use `scheduleWithFixedDelay` when calling `ExecutorAllocationManager.schedule` for scheduling executor requests. The intent of this is that if invocations are delayed beyond the default schedule interval on account of lock contention, then we avoid a situation where calls to `schedule` are made back-to-back, potentially releasing and then immediately reacquiring these locks - further exacerbating contention.
* Replace a number of calls to `askWithRetry` with `ask` inside of message handling code in `CoarseGrainedSchedulerBackend` and its ilk. This allows us to queue messages with the relevant endpoints, release whatever locks we might be holding, and then block whilst awaiting the response. This change is made at the cost of being able to retry should sending the message fail, as retrying outside of the lock could easily cause race conditions if other conflicting messages have been sent whilst awaiting a response. I believe this to be the lesser of two evils, as in many cases these RPC calls are to process-local components, and so failures are more likely to be deterministic, and timeouts are more likely to be caused by lock contention.

Existing tests, and manual tests under yarn-client mode.

Author: Angus Gerry
Closes #14710 from angolon/SPARK-16533.

commit de488ce0a0025d3c9736a1df6e45d90e265a84d4
Author: Marcelo Vanzin
Date: 2016-09-01T21:02:58Z

[SPARK-16533][HOTFIX] Fix compilation on Scala 2.10.

No idea why it was failing (the needed import was there), but this makes things work.

Author: Marcelo Vanzin
Closes #14925 from vanzin/SPARK-16533.
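A minimal, self-contained sketch of the scheduling point made in the first bullet above; the interval and the runnable body are illustrative, not the values Spark uses. `scheduleWithFixedDelay` measures the interval from the end of each run, so a run that stalls on lock contention cannot be followed by an immediate back-to-back invocation the way a fixed-rate schedule allows:

```scala
import java.util.concurrent.{Executors, TimeUnit}

val scheduler = Executors.newSingleThreadScheduledExecutor()
// Each tick starts 100ms after the PREVIOUS tick finishes, so slow ticks
// (e.g. ones blocked on a contended lock) never pile up back-to-back.
scheduler.scheduleWithFixedDelay(
  new Runnable { override def run(): Unit = println("schedule() tick") },
  0L, 100L, TimeUnit.MILLISECONDS)
```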
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/14883#discussion_r77291638

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,17 @@ abstract class ExternalCatalog {

   def listFunctions(db: String, pattern: String): Seq[String]

+  // --
+  // Resources
+  // --
+
+  /**
+   * Add a JAR resource to the underlying external catalog for DDL (e.g. CREATE TABLE) and DML
+   * (e.g., LOAD TABLE) operations.
+   *
+   * For example, when users create a Hive serde table, they can specify a custom
+   * Serializer-Deserializer (SerDe) class. When Hive metastore is unable to access the custom SerDe
+   * JAR (e.g., not on the Hive classpath), the JAR file must be added at runtime using this API.
+   */
+  def addJar(path: String): Unit
--- End diff --

This is a good point. Once Hive tables are consolidated into data source tables, we can specify the jar in the data source way, e.g., as an option.
[GitHub] spark issue #14921: [SPARK-17361][SQL] createExternalTable should fail if pa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14921 **[Test build #64830 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64830/consoleFull)** for PR 14921 at commit [`eefe3bc`](https://github.com/apache/spark/commit/eefe3bc28fedeafabd8b3739f9bdea796b74ea9b).
[GitHub] spark issue #13584: [SPARK-15509][ML][SparkR] R MLlib algorithms should supp...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/13584 LGTM - @felixcheung Feel free to merge when its ready
[GitHub] spark issue #14850: [SPARK-17279][SQL] better error message for exceptions d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14850 **[Test build #64829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64829/consoleFull)** for PR 14850 at commit [`b7c459b`](https://github.com/apache/spark/commit/b7c459b0231055f6e9915a42c72bc39610b1fe03).
[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14854 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64826/
[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14854 Merged build finished. Test FAILed.
[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14854

**[Test build #64826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64826/consoleFull)** for PR 14854 at commit [`9865d32`](https://github.com/apache/spark/commit/9865d3207ec4db9d6832d240690cf9c47742361f).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13584: [SPARK-15509][ML][SparkR] R MLlib algorithms should supp...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/13584 LGTM. @shivaram do you have any other comment?
[GitHub] spark pull request #13584: [SPARK-15509][ML][SparkR] R MLlib algorithms shou...
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/13584#discussion_r77290933

--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/RWrapperUtils.scala ---
@@ -35,13 +35,37 @@ object RWrapperUtils extends Logging {
    */
   def checkDataColumns(rFormula: RFormula, data: Dataset[_]): Unit = {
     if (data.schema.fieldNames.contains(rFormula.getLabelCol)) {
-      logWarning("data containing 'label' column, so change its name to avoid conflict")
-      rFormula.setLabelCol(rFormula.getLabelCol + "_output")
+      val newLabelName = convertToUniqueName(rFormula.getLabelCol, data.schema.fieldNames)
--- End diff --

Fair enough, that makes sense. Thanks.
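A sketch of what a collision-avoiding helper like the `convertToUniqueName` referenced in the diff might look like; the real implementation lives in `RWrapperUtils`, so treat this purely as an illustration:

```scala
// Append a suffix (and a counter if needed) until the candidate name no
// longer collides with any existing column name in the schema.
def convertToUniqueName(name: String, fieldNames: Array[String]): String = {
  var candidate = name + "_output"
  var i = 1
  while (fieldNames.contains(candidate)) {
    candidate = s"${name}_output$i"
    i += 1
  }
  candidate
}
```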
[GitHub] spark pull request #14921: [SPARK-17361][SQL] createExternalTable should fai...
Github user clockfly commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77290803

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

For example, if it is a JDBC relation provider, we will call `dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions)`, which does some extra checks:

```scala
def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
  val caseInsensitiveOptions = new CaseInsensitiveMap(options)
  val relation = (providingClass.newInstance(), userSpecifiedSchema) match {
    // TODO: Throw when too much is given.
    case (dataSource: SchemaRelationProvider, Some(schema)) =>
      dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions, schema)
    case (dataSource: RelationProvider, None) =>
      dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions)
```
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14883#discussion_r77290783

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,17 @@ abstract class ExternalCatalog {

   def listFunctions(db: String, pattern: String): Seq[String]

+  // --
+  // Resources
+  // --
+
+  /**
+   * Add a JAR resource to the underlying external catalog for DDL (e.g. CREATE TABLE) and DML
+   * (e.g., LOAD TABLE) operations.
+   *
+   * For example, when users create a Hive serde table, they can specify a custom
+   * Serializer-Deserializer (SerDe) class. When Hive metastore is unable to access the custom SerDe
+   * JAR (e.g., not on the Hive classpath), the JAR file must be added at runtime using this API.
+   */
+  def addJar(path: String): Unit
--- End diff --

Let me copy and paste the reasons why users need the customization capabilities:

- When to add a new File Format?
  > User has files with special file formats not supported by Hive yet, and users don't want to convert the files before loading into Hive.
  > User has a more efficient way of storing data on disk.
- When to add a new SerDe?
  > User has data with a special serialized format not supported by Hive yet, and users don't want to convert the data before loading into Hive.
  > User has a more efficient way of serializing the data on disk.

Also, the existing UDFs, UDAFs and UDTFs can be reused without any rewriting.
[GitHub] spark pull request #14921: [SPARK-17361][SQL] createExternalTable should fai...
Github user clockfly commented on a diff in the pull request:
https://github.com/apache/spark/pull/14921#discussion_r77290773

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -314,12 +314,8 @@ case class DataSource(
   /**
    * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
    * [[DataSource]]
-   *
-   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
-   *                       This flag will be set to false when we create an empty table (the
-   *                       path of the table does not exist).
    */
-  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
+  def resolveRelation(): BaseRelation = {
--- End diff --

Checked with Wenchen: it is not safe to skip calling `resolveRelation()` when it is a managed table.
[GitHub] spark pull request #14866: [SPARK-17298][SQL] Require explicit CROSS join fo...
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/14866#discussion_r77290718

--- Diff: R/pkg/R/DataFrame.R ---
@@ -2265,7 +2265,7 @@ setMethod("join",
           signature(x = "SparkDataFrame", y = "SparkDataFrame"),
           function(x, y, joinExpr = NULL, joinType = NULL) {
             if (is.null(joinExpr)) {
-              sdf <- callJMethod(x@sdf, "join", y@sdf)
+              sdf <- callJMethod(x@sdf, "crossJoin", y@sdf)
             } else {
               if (class(joinExpr) != "Column") stop("joinExpr must be a Column")
               if (is.null(joinType)) {
--- End diff --

Should the next line be "crossJoin" too?
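For context on what the R wrapper delegates to: with this PR's change, an unconditioned join in the Scala Dataset API must be requested explicitly as a cross join, while conditioned joins still go through `join`. A hedged sketch, where `df1` and `df2` are placeholders for any two DataFrames with an `id` column:

```scala
// Cartesian product must now be asked for explicitly.
val unconditioned = df1.crossJoin(df2)

// Keyed joins are unchanged.
val conditioned = df1.join(df2, df1("id") === df2("id"))
```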
[GitHub] spark issue #14932: [SPARK-17371] Resubmitted shuffle outputs can get delete...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14932 Merged build finished. Test PASSed.
[GitHub] spark issue #14932: [SPARK-17371] Resubmitted shuffle outputs can get delete...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14932 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64824/
[GitHub] spark issue #14932: [SPARK-17371] Resubmitted shuffle outputs can get delete...
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14932

**[Test build #64824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64824/consoleFull)** for PR 14932 at commit [`350d3be`](https://github.com/apache/spark/commit/350d3be3d77408a85d894bccb746fb603782ca1f).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14883#discussion_r77290382

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,17 @@ abstract class ExternalCatalog {

   def listFunctions(db: String, pattern: String): Seq[String]

+  // --
+  // Resources
+  // --
+
+  /**
+   * Add a JAR resource to the underlying external catalog for DDL (e.g. CREATE TABLE) and DML
+   * (e.g., LOAD TABLE) operations.
+   *
+   * For example, when users create a Hive serde table, they can specify a custom
+   * Serializer-Deserializer (SerDe) class. When Hive metastore is unable to access the custom SerDe
+   * JAR (e.g., not on the Hive classpath), the JAR file must be added at runtime using this API.
+   */
+  def addJar(path: String): Unit
--- End diff --

Yeah, we still need it after Hive and data source tables are consolidated. The major reason is for supporting custom Hive SerDes, custom Hive file formats, and custom UDF/UDAFs. They are major features for existing Hive users, and this support might be critical for migration from Hive to Spark.
[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14931 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64823/
[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14931 Merged build finished. Test PASSed.
[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14931

**[Test build #64823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64823/consoleFull)** for PR 14931 at commit [`a704376`](https://github.com/apache/spark/commit/a704376b57b488ce0ce1b0ba8ed13d36e5debfd4).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14915: [SPARK-17356][SQL] Fix out of memory issue when generati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14915 **[Test build #64828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64828/consoleFull)** for PR 14915 at commit [`39f3c63`](https://github.com/apache/spark/commit/39f3c63cbde086f44b8777d4ff708daa3bef2f18).
[GitHub] spark pull request #14915: [SPARK-17356][SQL] Fix out of memory issue when g...
Github user clockfly commented on a diff in the pull request:
https://github.com/apache/spark/pull/14915#discussion_r77288334

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala ---
@@ -274,6 +274,13 @@ abstract class QueryTest extends PlanTest {
     val normalized1 = logicalPlan.transformAllExpressions {
       case udf: ScalaUDF => udf.copy(function = null)
       case gen: UserDefinedGenerator => gen.copy(function = null)
+      // SPARK-17356: In usage of mllib, Metadata may store a huge vector of data, transforming
+      // it to JSON may trigger OutOfMemoryError.
+      case a @ Alias(child, name) if a.explicitMetadata.isDefined =>
+        Alias(child, name)(a.exprId, a.qualifier, Some(Metadata.empty), a.isGenerated)
+      case a: AttributeReference if a.metadata != Metadata.empty =>
+        AttributeReference(a.name, a.dataType, a.nullable, Metadata.empty)(a.exprId, a.qualifier,
+          a.isGenerated)
--- End diff --

Fixed.
[GitHub] spark pull request #14915: [SPARK-17356][SQL] Fix out of memory issue when g...
Github user clockfly commented on a diff in the pull request:
https://github.com/apache/spark/pull/14915#discussion_r77287956

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala ---
@@ -274,6 +274,13 @@ abstract class QueryTest extends PlanTest {
     val normalized1 = logicalPlan.transformAllExpressions {
       case udf: ScalaUDF => udf.copy(function = null)
       case gen: UserDefinedGenerator => gen.copy(function = null)
+      // SPARK-17356: In usage of mllib, Metadata may store a huge vector of data, transforming
+      // it to JSON may trigger OutOfMemoryError.
+      case a @ Alias(child, name) if a.explicitMetadata.isDefined =>
+        Alias(child, name)(a.exprId, a.qualifier, Some(Metadata.empty), a.isGenerated)
+      case a: AttributeReference if a.metadata != Metadata.empty =>
+        AttributeReference(a.name, a.dataType, a.nullable, Metadata.empty)(a.exprId, a.qualifier,
+          a.isGenerated)
--- End diff --

OK.
[GitHub] spark pull request #14930: [SPARK-16926] [SQL] Add unit test to compare tabl...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14930
[GitHub] spark issue #14930: [SPARK-16926] [SQL] Add unit test to compare table and p...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14930 thanks, merging to master!
[GitHub] spark pull request #14930: [SPARK-16926] [SQL] Add unit test to compare tabl...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/14930#discussion_r77287554

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala ---
@@ -143,4 +144,38 @@ class HiveTableScanSuite extends HiveComparisonTest with SQLTestUtils with TestH
       }
     }
   }
+
+  test("SPARK-16926: number of table and partition columns match for new partitioned table") {
+    val view = "src"
+    withTempView(view) {
+      spark.range(1, 5).createOrReplaceTempView(view)
+      val table = "table_with_partition"
+      withTable(table) {
+        sql(
+          s"""
+             |CREATE TABLE $table(id string)
+             |PARTITIONED BY (p1 string,p2 string,p3 string,p4 string,p5 string)
+           """.stripMargin)
+        sql(
+          s"""
+             |FROM $view v
+             |INSERT INTO TABLE $table
+             |PARTITION (p1='a',p2='b',p3='c',p4='d',p5='e')
+             |SELECT v.id
+             |INSERT INTO TABLE $table
--- End diff --

Why do we insert the same value into the table twice?
[GitHub] spark issue #14930: [SPARK-16926] [SQL] Add unit test to compare table and p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14930 Merged build finished. Test PASSed.
[GitHub] spark issue #14930: [SPARK-16926] [SQL] Add unit test to compare table and p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14930 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64822/
[GitHub] spark issue #14930: [SPARK-16926] [SQL] Add unit test to compare table and p...
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14930

**[Test build #64822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64822/consoleFull)** for PR 14930 at commit [`7ac4d63`](https://github.com/apache/spark/commit/7ac4d63cafc854aa4a5c3622b668474da1255022).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14858: [SPARK-17219][ML] Add NaN value handling in Bucketizer
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14858 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64825/
[GitHub] spark issue #14858: [SPARK-17219][ML] Add NaN value handling in Bucketizer
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14858 Merged build finished. Test PASSed.
[GitHub] spark issue #14858: [SPARK-17219][ML] Add NaN value handling in Bucketizer
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14858
**[Test build #64825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64825/consoleFull)** for PR 14858 at commit [`cc5a1e7`](https://github.com/apache/spark/commit/cc5a1e7342e2dbd43734a414a0ace0465eaecfe8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14924: [SPARK-17299] TRIM/LTRIM/RTRIM should not strips charact...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14924 **[Test build #64827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64827/consoleFull)** for PR 14924 at commit [`554f46c`](https://github.com/apache/spark/commit/554f46cb9cb3a8d5e9bc0be458f8e3ec5c78d809).
[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14900 Merged build finished. Test PASSed.
[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14900 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64820/
[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14900
**[Test build #64820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64820/consoleFull)** for PR 14900 at commit [`c4ad6a1`](https://github.com/apache/spark/commit/c4ad6a1b813c5553e9c86c86ee234ffa43daff5e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14924: [SPARK-17299] TRIM/LTRIM/RTRIM should not strips charact...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/14924 @rxin Done
[GitHub] spark issue #14921: [SPARK-17361][SQL] createExternalTable should fail if pa...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14921 LGTM
[GitHub] spark pull request #14913: [SPARK-17358][SQL] Cached table(parquet/orc) shou...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14913#discussion_r77285873
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -169,6 +169,13 @@ case class HadoopFsRelation(
 location.allFiles().map(_.getPath.toUri.toString).toArray
 override def sizeInBytes: Long = location.allFiles().map(_.getLen).sum
+
+ override def equals(other: Any): Boolean = other match {
+case r: HadoopFsRelation => location == r.location
--- End diff --
+1
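As background on the pattern being endorsed: when equality is delegated to a single field like `location`, `hashCode` should be overridden to match, otherwise hash-based lookups (for example, a cache keyed on the relation) can miss entries that compare equal. A hypothetical sketch of the idiom, not the PR's actual code:

// Hypothetical sketch: equality and hashing both delegated to one field,
// keeping the equals/hashCode contract intact.
class FileCatalog(val rootPaths: Seq[String]) {
  override def equals(other: Any): Boolean = other match {
    case c: FileCatalog => rootPaths == c.rootPaths
    case _ => false
  }
  override def hashCode(): Int = rootPaths.hashCode()
}

class Relation(val location: FileCatalog) {
  override def equals(other: Any): Boolean = other match {
    case r: Relation => location == r.location
    case _ => false
  }
  override def hashCode(): Int = location.hashCode()
}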
[GitHub] spark issue #14921: [SPARK-17361][SQL] createExternalTable should fail if pa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14921 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64819/
[GitHub] spark issue #14921: [SPARK-17361][SQL] createExternalTable should fail if pa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14921 Merged build finished. Test PASSed.
[GitHub] spark issue #14921: [SPARK-17361][SQL] createExternalTable should fail if pa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14921
**[Test build #64819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64819/consoleFull)** for PR 14921 at commit [`2533d65`](https://github.com/apache/spark/commit/2533d656e1fba7c814aaa248761508599150337f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14854 **[Test build #64826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64826/consoleFull)** for PR 14854 at commit [`9865d32`](https://github.com/apache/spark/commit/9865d3207ec4db9d6832d240690cf9c47742361f).
[GitHub] spark issue #14929: [Don't merge][WIP] Better error message for JSON file fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14929 Merged build finished. Test PASSed.
[GitHub] spark issue #14929: [Don't merge][WIP] Better error message for JSON file fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14929 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64821/
[GitHub] spark issue #14929: [Don't merge][WIP] Better error message for JSON file fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14929
**[Test build #64821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64821/consoleFull)** for PR 14929 at commit [`3491f15`](https://github.com/apache/spark/commit/3491f15f6050126904b05b8e5f5beb25c9d8f1a8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14930: [SPARK-16926] [SQL] Add unit test to compare tabl...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14930#discussion_r77285020
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala ---
@@ -143,4 +144,38 @@ class HiveTableScanSuite extends HiveComparisonTest with SQLTestUtils with TestH
 }
 }
 }
+
+ test("SPARK-16926: number of table and partition columns match for new partitioned table") {
+val view = "src"
+withTempView(view) {
+ spark.range(1, 5).createOrReplaceTempView(view)
--- End diff --
oh sorry I read the code wrong...
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14883#discussion_r77284957
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,17 @@ abstract class ExternalCatalog {
 def listFunctions(db: String, pattern: String): Seq[String]
+
+ // --
+ // Resources
+ // --
+
+ /**
+ * Add a JAR resource to the underlying external catalog for DDL (e.g. CREATE TABLE) and DML
+ * (e.g., LOAD TABLE) operations.
+ *
+ * For example, when users create a Hive serde table, they can specify a custom
+ * Serializer-Deserializer (SerDe) class. When Hive metastore is unable to access the custom SerDe
+ * JAR (e.g., not on the Hive classpath), the JAR file must be added at runtime using this API.
+ */
+ def addJar(path: String): Unit
--- End diff --
I'm still a little hesitant to add this concept. After we consolidate Hive and data source tables, do we still need it? Data source resolution is done before we save the table metadata to the metastore.
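For readers unfamiliar with the scenario the doc comment describes: the custom SerDe class lives in a JAR that is not on the Hive classpath, so the JAR has to be registered at runtime before the DDL that references the class. A hypothetical usage sketch, assuming a Hive-enabled `spark` session; the path and class name are made up for illustration:

// Hypothetical sketch: register the JAR at runtime, then reference the
// custom SerDe class in Hive DDL.
spark.sql("ADD JAR /tmp/my-custom-serde.jar")
spark.sql(
  """
    |CREATE TABLE events (id string)
    |ROW FORMAT SERDE 'com.example.MyCustomSerDe'
  """.stripMargin)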
[GitHub] spark issue #14929: [Don't merge][WIP] Better error message for JSON file fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14929 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64818/
[GitHub] spark issue #14929: [Don't merge][WIP] Better error message for JSON file fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14929 Merged build finished. Test PASSed.
[GitHub] spark issue #14929: [Don't merge][WIP] Better error message for JSON file fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14929
**[Test build #64818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64818/consoleFull)** for PR 14929 at commit [`50e312f`](https://github.com/apache/spark/commit/50e312f414d34783f33ca890a697aa654c8e0847).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14866 Merged build finished. Test PASSed.
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14866 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64814/
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14866
**[Test build #64814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64814/consoleFull)** for PR 14866 at commit [`703a554`](https://github.com/apache/spark/commit/703a554a342d91fba3040a769a8f7ff786569f0c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.