[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-02-21 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/11601
  
Sent an update to add multi-column support. Let me know if this is not what you have in mind.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11601
  
**[Test build #73268 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73268/testReport)** for PR 11601 at commit [`e86d919`](https://github.com/apache/spark/commit/e86d9198c65c3b289b091150b52708deda37f090).





[GitHub] spark issue #16832: [WIP][SPARK-19490][SQL] ignore case sensitivity when fil...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16832
  
**[Test build #73267 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73267/testReport)** for PR 16832 at commit [`f790821`](https://github.com/apache/spark/commit/f7908219735137fe5402201f6cad45fa279f3f5f).





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17001
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17001
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73263/
Test FAILed.





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17001
  
**[Test build #73263 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73263/testReport)** for PR 17001 at commit [`96dcc7d`](https://github.com/apache/spark/commit/96dcc7ddb0de7c903f3fa8373ada317a760057d7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102399167
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
@@ -54,11 +57,29 @@ case class Statistics(

   /** Readable string representation for the Statistics. */
   def simpleString: String = {
-Seq(s"sizeInBytes=$sizeInBytes",
-  if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
+Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
+  if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
   s"isBroadcastable=$isBroadcastable"
 ).filter(_.nonEmpty).mkString(", ")
   }
+
+  /** Show the given number in a readable format. */
+  def format(number: BigInt, isSize: Boolean): String = {
+val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
+if (isSize) {
+  // The largest unit in Utils.bytesToString is TB
--- End diff --

yea, I also think TB is a little small





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102398658
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -282,7 +282,8 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
 if (statement == null) {
   null  // This is enough since ParseException will raise later.
 } else if (isExplainableStatement(statement)) {
-  ExplainCommand(statement, extended = ctx.EXTENDED != null, codegen = ctx.CODEGEN != null)
+  ExplainCommand(statement, extended = ctx.EXTENDED != null, codegen = ctx.CODEGEN != null,
+cost = ctx.COST != null)
--- End diff --

Could you give me a hint about the expected style?





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102398371
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
@@ -54,11 +57,29 @@ case class Statistics(

   /** Readable string representation for the Statistics. */
   def simpleString: String = {
-Seq(s"sizeInBytes=$sizeInBytes",
-  if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
+Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
+  if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
   s"isBroadcastable=$isBroadcastable"
 ).filter(_.nonEmpty).mkString(", ")
   }
+
+  /** Show the given number in a readable format. */
+  def format(number: BigInt, isSize: Boolean): String = {
+val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
+if (isSize) {
+  // The largest unit in Utils.bytesToString is TB
+  val PB = 1L << 50
+  if (number < 2 * PB) {
+// The number is not very large, so we can use Utils.bytesToString to show it.
+Utils.bytesToString(number.toLong)
+  } else {
+// The number is too large, show it in scientific notation.
+decimalValue.toString() + " B"
+  }
+} else {
+  decimalValue.toString()
--- End diff --

I'm not sure; would that be more readable than scientific notation if there is no unit?
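For readers following this thread, here is a Python sketch of the thresholding idea discussed in the diff above. It is hypothetical and not Spark's actual implementation (which is Scala): sizes below roughly 2 PB get a `Utils.bytesToString`-style readable unit, while larger sizes, and any non-size count, fall back to scientific notation with about three significant digits, mirroring `MathContext(3, HALF_UP)`.

```python
def format_stat(number: int, is_size: bool) -> str:
    """Illustrative sketch (not Spark's code) of the formatting under
    discussion: readable byte units for moderate sizes, scientific
    notation beyond the 2 PB threshold or for unit-less counts."""
    PB = 1 << 50
    if is_size and number < 2 * PB:
        # Walk up binary units the way Utils.bytesToString does (its
        # largest unit is TB; PB is included here so the loop always
        # terminates for values just under the threshold).
        value = float(number)
        for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
            if value < 1024:
                return f"{value:.1f} {unit}"
            value /= 1024
    # Roughly BigDecimal(number, MathContext(3, HALF_UP)) rendered compactly.
    return f"{number:.2E}" + (" B" if is_size else "")
```

This keeps small statistics human-readable while making astronomically large row counts or sizes compact instead of printing dozens of digits.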





[GitHub] spark issue #16476: [SPARK-19084][SQL] Implement expression field

2017-02-21 Thread gczsjdy
Github user gczsjdy commented on the issue:

https://github.com/apache/spark/pull/16476
  
@cloud-fan Could you please help me review this PR?





[GitHub] spark issue #16691: [SPARK-19349][DStreams]improve resource ready check to a...

2017-02-21 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/16691
  
I don't like copying code from Spark core. Maybe let's just keep using a dummy Spark job.





[GitHub] spark issue #16936: [SPARK-19605][DStream] Fail it if existing resource is n...

2017-02-21 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/16936
  
I agree with @srowen. Streaming should not need to know the details of the deployment mode, and when someone changes other modules, they won't know about this code inside Streaming. I would like to see a cleaner solution.

In addition, when there are not enough resources in a cluster, even if the user sets a large core number, they will still not be able to run the streaming job. Hence, they should always be aware of this limitation of Streaming receivers, and it does not seem worth fixing this issue.





[GitHub] spark issue #17017: [SPARK-19682][SparkR] Issue warning (or error) when subs...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17017
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17017: [SPARK-19682][SparkR] Issue warning (or error) when subs...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17017
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73264/
Test PASSed.





[GitHub] spark issue #17017: [SPARK-19682][SparkR] Issue warning (or error) when subs...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17017
  
**[Test build #73264 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73264/testReport)** for PR 17017 at commit [`91204aa`](https://github.com/apache/spark/commit/91204aa36b7b9456a9b7f27d5eb97c1c92a61699).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16714: [SPARK-16333][Core] Enable EventLoggingListener t...

2017-02-21 Thread jasonmoore2k
Github user jasonmoore2k commented on a diff in the pull request:

https://github.com/apache/spark/pull/16714#discussion_r102396498
  
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -62,18 +62,21 @@ private[spark] object JsonProtocol {
* JSON serialization methods for SparkListenerEvents |
* -- */
 
-  def sparkEventToJson(event: SparkListenerEvent): JValue = {
+  def sparkEventToJson(
+event: SparkListenerEvent,
+omitInternalAccums: Boolean = false,
+omitUpdatedBlockStatuses: Boolean = false): JValue = {
 event match {
--- End diff --

stageSubmitted
stageCompleted
jobStart
jobEnd

You didn't seem to use omitInternalAccums/omitUpdatedBlockStatuses in these cases, although you had changed the underlying methods to support these flags. Intended?





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-21 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/17001
  
Great, I will take a look at it.





[GitHub] spark pull request #16714: [SPARK-16333][Core] Enable EventLoggingListener t...

2017-02-21 Thread jasonmoore2k
Github user jasonmoore2k commented on a diff in the pull request:

https://github.com/apache/spark/pull/16714#discussion_r102395516
  
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -97,61 +100,80 @@ private[spark] object JsonProtocol {
   case logStart: SparkListenerLogStart =>
 logStartToJson(logStart)
   case metricsUpdate: SparkListenerExecutorMetricsUpdate =>
-executorMetricsUpdateToJson(metricsUpdate)
+executorMetricsUpdateToJson(metricsUpdate, omitInternalAccums)
   case blockUpdated: SparkListenerBlockUpdated =>
 throw new MatchError(blockUpdated)  // TODO(ekl) implement this
   case _ => parse(mapper.writeValueAsString(event))
 }
   }
 
-  def stageSubmittedToJson(stageSubmitted: SparkListenerStageSubmitted): JValue = {
-val stageInfo = stageInfoToJson(stageSubmitted.stageInfo)
+  def stageSubmittedToJson(
+stageSubmitted: SparkListenerStageSubmitted,
+omitInternalAccums: Boolean = false): JValue = {
+val stageInfo = stageInfoToJson(stageSubmitted.stageInfo, omitInternalAccums)
 val properties = propertiesToJson(stageSubmitted.properties)
 ("Event" -> SPARK_LISTENER_EVENT_FORMATTED_CLASS_NAMES.stageSubmitted) ~
 ("Stage Info" -> stageInfo) ~
 ("Properties" -> properties)
   }

-  def stageCompletedToJson(stageCompleted: SparkListenerStageCompleted): JValue = {
+  def stageCompletedToJson(
+stageCompleted: SparkListenerStageCompleted,
+omitInternalAccums: Boolean = false): JValue = {
 val stageInfo = stageInfoToJson(stageCompleted.stageInfo)
--- End diff --

Were you intending to pass omitInternalAccums into this stageInfoToJson call?
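To illustrate the pitfall this comment points at, here is a hypothetical Python sketch (the names are invented; the real code is Scala in JsonProtocol): when a serializer takes an omit flag with a default value, forgetting to forward it to a nested call means the default silently wins and the flag has no effect for that event type.

```python
def stage_info_to_json(stage_info: dict, omit_internal_accums: bool = False) -> dict:
    # Drop accumulator entries marked internal when asked to.
    accums = [a for a in stage_info["accumulables"]
              if not (omit_internal_accums and a.get("internal", False))]
    return {"Stage ID": stage_info["id"], "Accumulables": accums}

def stage_completed_to_json(event: dict, omit_internal_accums: bool = False) -> dict:
    # Forwarding the flag is easy to miss: calling
    # stage_info_to_json(event["stage_info"]) without the second argument
    # would silently fall back to the default (False), which is exactly the
    # kind of omission the review comment above is asking about.
    return {
        "Event": "SparkListenerStageCompleted",
        "Stage Info": stage_info_to_json(event["stage_info"], omit_internal_accums),
    }
```

Because the default makes the un-forwarded call type-check and run, nothing fails loudly; the only symptom is that the serialized output still contains the data the caller asked to omit.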





[GitHub] spark issue #17012: [SPARK-19677][SS] Renaming a file atop an existing one s...

2017-02-21 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/17012
  
The fix looks good to me. But the test doesn't test anything. Right?





[GitHub] spark pull request #16910: [SPARK-19575][SQL]Reading from or writing to a hi...

2017-02-21 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16910#discussion_r102395323
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -1494,4 +1495,148 @@ class HiveDDLSuite
   }
 }
   }
+
+  test("insert data to a hive serde table which has a not existed location should succeed") {
+withTable("t") {
+  withTempDir { dir =>
+spark.sql(
+  s"""
+ |CREATE TABLE t(a string, b int)
+ |USING hive
+ |OPTIONS(path "file:${dir.getCanonicalPath}")
+   """.stripMargin)
+val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+assert(table.location.stripSuffix("/") == expectedPath)
+
+val tableLocFile = new File(table.location.stripPrefix("file:"))
+tableLocFile.delete()
+assert(!tableLocFile.exists())
+spark.sql("INSERT INTO TABLE t SELECT 'c', 1")
+assert(tableLocFile.exists())
+checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+
+Utils.deleteRecursively(dir)
+assert(!tableLocFile.exists())
+spark.sql("INSERT OVERWRITE TABLE t SELECT 'c', 1")
+assert(tableLocFile.exists())
+checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+
+val newDirFile = new File(dir, "x")
+spark.sql(s"ALTER TABLE t SET LOCATION '${newDirFile.getAbsolutePath}'")
+spark.sessionState.catalog.refreshTable(TableIdentifier("t"))
+
+val table1 = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+assert(table1.location.stripSuffix("/") == newDirFile.getAbsolutePath.stripSuffix("/"))
+assert(!newDirFile.exists())
+
+spark.sql("INSERT INTO TABLE t SELECT 'c', 1")
+checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+assert(newDirFile.exists())
+  }
+}
+  }
+
+  test("insert into a hive serde table with no existed partition location should succeed") {
+withTable("t") {
+  withTempDir { dir =>
+spark.sql(
+  s"""
+ |CREATE TABLE t(a int, b int, c int, d int)
+ |USING hive
+ |PARTITIONED BY(a, b)
+ |LOCATION "file:${dir.getCanonicalPath}"
+   """.stripMargin)
+val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+assert(table.location.stripSuffix("/") == expectedPath)
+
+spark.sql("INSERT INTO TABLE t PARTITION(a=1, b=2) SELECT 3, 4")
+checkAnswer(spark.table("t"), Row(3, 4, 1, 2) :: Nil)
+
+val partLoc = new File(s"${dir.getAbsolutePath}/a=1")
+Utils.deleteRecursively(partLoc)
+assert(!partLoc.exists())
+// insert overwrite into a partition which location has been deleted.
+spark.sql("INSERT OVERWRITE TABLE t PARTITION(a=1, b=2) SELECT 7, 8")
+assert(partLoc.exists())
+checkAnswer(spark.table("t"), Row(7, 8, 1, 2) :: Nil)
+
+val newDirFile = new File(dir, "x")
+spark.sql(s"ALTER TABLE t PARTITION(a=1, b=2) SET LOCATION " +
+  s"'${newDirFile.getAbsolutePath}'")
+assert(!newDirFile.exists())
+
+// insert into a partition which location does not exists.
+spark.sql("INSERT INTO TABLE t PARTITION(a=1, b=2) SELECT 9, 10")
+assert(newDirFile.exists())
+checkAnswer(spark.table("t"), Row(9, 10, 1, 2) :: Nil)
+  }
+}
+  }
+
+  test("read data from a hive serde table which has a not existed location should succeed") {
+withTable("t") {
+  withTempDir { dir =>
+spark.sql(
+  s"""
+ |CREATE TABLE t(a string, b int)
+ |USING hive
+ |OPTIONS(path "file:${dir.getAbsolutePath}")
+   """.stripMargin)
+val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+assert(table.location.stripSuffix("/") == expectedPath)
+
+dir.delete()
+checkAnswer(spark.table("t"), Nil)
+
+val newDirFile = new File(dir, "x")
+spark.sql(s"ALTER TABLE t SET LOCATION '${newDirFile.getAbsolutePath}'")
+
+val table1 = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+assert(table1.location.stripSuffix("/") == 

[GitHub] spark issue #16714: [SPARK-16333][Core] Enable EventLoggingListener to log l...

2017-02-21 Thread jasonmoore2k
Github user jasonmoore2k commented on the issue:

https://github.com/apache/spark/pull/16714
  
Would some of the other recent contributors to this area (e.g. @zsxwing or @JoshRosen) be able to comment on any use for these internal accumulables / block status updates, and on whether they can be removed from the event log? I couldn't see anything that goes wrong, and my event log file went from 22 GB to 91 MB, so it makes a big difference.





[GitHub] spark pull request #16910: [SPARK-19575][SQL]Reading from or writing to a hi...

2017-02-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16910#discussion_r102395141
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -1494,4 +1495,148 @@ class HiveDDLSuite
   }
 }
   }
+
+  test("insert data to a hive serde table which has a not existed location should succeed") {
+withTable("t") {
+  withTempDir { dir =>
+spark.sql(
+  s"""
+ |CREATE TABLE t(a string, b int)
+ |USING hive
+ |OPTIONS(path "file:${dir.getCanonicalPath}")
+   """.stripMargin)
+val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+assert(table.location.stripSuffix("/") == expectedPath)
+
+val tableLocFile = new File(table.location.stripPrefix("file:"))
+tableLocFile.delete()
+assert(!tableLocFile.exists())
+spark.sql("INSERT INTO TABLE t SELECT 'c', 1")
+assert(tableLocFile.exists())
+checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+
+Utils.deleteRecursively(dir)
+assert(!tableLocFile.exists())
+spark.sql("INSERT OVERWRITE TABLE t SELECT 'c', 1")
+assert(tableLocFile.exists())
+checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+
+val newDirFile = new File(dir, "x")
+spark.sql(s"ALTER TABLE t SET LOCATION '${newDirFile.getAbsolutePath}'")
+spark.sessionState.catalog.refreshTable(TableIdentifier("t"))
+
+val table1 = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+assert(table1.location.stripSuffix("/") == newDirFile.getAbsolutePath.stripSuffix("/"))
+assert(!newDirFile.exists())
+
+spark.sql("INSERT INTO TABLE t SELECT 'c', 1")
+checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+assert(newDirFile.exists())
+  }
+}
+  }
+
+  test("insert into a hive serde table with no existed partition location should succeed") {
+withTable("t") {
+  withTempDir { dir =>
+spark.sql(
+  s"""
+ |CREATE TABLE t(a int, b int, c int, d int)
+ |USING hive
+ |PARTITIONED BY(a, b)
+ |LOCATION "file:${dir.getCanonicalPath}"
+   """.stripMargin)
+val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+assert(table.location.stripSuffix("/") == expectedPath)
+
+spark.sql("INSERT INTO TABLE t PARTITION(a=1, b=2) SELECT 3, 4")
+checkAnswer(spark.table("t"), Row(3, 4, 1, 2) :: Nil)
+
+val partLoc = new File(s"${dir.getAbsolutePath}/a=1")
+Utils.deleteRecursively(partLoc)
+assert(!partLoc.exists())
+// insert overwrite into a partition which location has been deleted.
+spark.sql("INSERT OVERWRITE TABLE t PARTITION(a=1, b=2) SELECT 7, 8")
+assert(partLoc.exists())
+checkAnswer(spark.table("t"), Row(7, 8, 1, 2) :: Nil)
+
+val newDirFile = new File(dir, "x")
+spark.sql(s"ALTER TABLE t PARTITION(a=1, b=2) SET LOCATION " +
+  s"'${newDirFile.getAbsolutePath}'")
+assert(!newDirFile.exists())
+
+// insert into a partition which location does not exists.
+spark.sql("INSERT INTO TABLE t PARTITION(a=1, b=2) SELECT 9, 10")
+assert(newDirFile.exists())
+checkAnswer(spark.table("t"), Row(9, 10, 1, 2) :: Nil)
+  }
+}
+  }
+
+  test("read data from a hive serde table which has a not existed location should succeed") {
+withTable("t") {
+  withTempDir { dir =>
+spark.sql(
+  s"""
+ |CREATE TABLE t(a string, b int)
+ |USING hive
+ |OPTIONS(path "file:${dir.getAbsolutePath}")
+   """.stripMargin)
+val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+assert(table.location.stripSuffix("/") == expectedPath)
+
+dir.delete()
+checkAnswer(spark.table("t"), Nil)
+
+val newDirFile = new File(dir, "x")
+spark.sql(s"ALTER TABLE t SET LOCATION '${newDirFile.getAbsolutePath}'")
+
+val table1 = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+assert(table1.location.stripSuffix("/") == 

[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-21 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17001
  
Personally, I think we should improve the test case. Instead of doing it in HiveDDLSuite, we can do it in HiveSparkSubmitSuite.scala. Basically, when using the same metastore, you just need to verify whether the table location depends on `spark.sql.warehouse.dir`. Below are the test cases you can follow: https://github.com/apache/spark/pull/16388





[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16910
  
**[Test build #73266 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73266/testReport)** for PR 16910 at commit [`119fa64`](https://github.com/apache/spark/commit/119fa64b42de98b8e242586e5e218bde907d5a54).





[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16158
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #16910: [SPARK-19575][SQL]Reading from or writing to a hi...

2017-02-21 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16910#discussion_r102394591
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -1494,4 +1495,148 @@ class HiveDDLSuite
   }
 }
   }
+
+  test("insert data to a hive serde table which has a not existed location should succeed") {
+withTable("t") {
+  withTempDir { dir =>
+spark.sql(
+  s"""
+ |CREATE TABLE t(a string, b int)
+ |USING hive
+ |OPTIONS(path "file:${dir.getCanonicalPath}")
+   """.stripMargin)
+val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+assert(table.location.stripSuffix("/") == expectedPath)
+
+val tableLocFile = new File(table.location.stripPrefix("file:"))
+tableLocFile.delete()
+assert(!tableLocFile.exists())
+spark.sql("INSERT INTO TABLE t SELECT 'c', 1")
+assert(tableLocFile.exists())
+checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+
+Utils.deleteRecursively(dir)
+assert(!tableLocFile.exists())
+spark.sql("INSERT OVERWRITE TABLE t SELECT 'c', 1")
+assert(tableLocFile.exists())
+checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+
+val newDirFile = new File(dir, "x")
+spark.sql(s"ALTER TABLE t SET LOCATION '${newDirFile.getAbsolutePath}'")
+spark.sessionState.catalog.refreshTable(TableIdentifier("t"))
+
+val table1 = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+assert(table1.location.stripSuffix("/") == newDirFile.getAbsolutePath.stripSuffix("/"))
+assert(!newDirFile.exists())
+
+spark.sql("INSERT INTO TABLE t SELECT 'c', 1")
+checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+assert(newDirFile.exists())
+  }
+}
+  }
+
+  test("insert into a hive serde table with no existed partition location should succeed") {
+withTable("t") {
+  withTempDir { dir =>
+spark.sql(
+  s"""
+ |CREATE TABLE t(a int, b int, c int, d int)
+ |USING hive
+ |PARTITIONED BY(a, b)
+ |LOCATION "file:${dir.getCanonicalPath}"
+   """.stripMargin)
+val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+assert(table.location.stripSuffix("/") == expectedPath)
+
+spark.sql("INSERT INTO TABLE t PARTITION(a=1, b=2) SELECT 3, 4")
+checkAnswer(spark.table("t"), Row(3, 4, 1, 2) :: Nil)
+
+val partLoc = new File(s"${dir.getAbsolutePath}/a=1")
+Utils.deleteRecursively(partLoc)
+assert(!partLoc.exists())
+// insert overwrite into a partition which location has been 
deleted.
+spark.sql("INSERT OVERWRITE TABLE t PARTITION(a=1, b=2) SELECT 7, 
8")
+assert(partLoc.exists())
+checkAnswer(spark.table("t"), Row(7, 8, 1, 2) :: Nil)
+
+val newDirFile = new File(dir, "x")
+spark.sql(s"ALTER TABLE t PARTITION(a=1, b=2) SET LOCATION " +
+  s"'${newDirFile.getAbsolutePath}'")
+assert(!newDirFile.exists())
+
+// insert into a partition which location does not exists.
+spark.sql("INSERT INTO TABLE t PARTITION(a=1, b=2) SELECT 9, 10")
+assert(newDirFile.exists())
+checkAnswer(spark.table("t"), Row(9, 10, 1, 2) :: Nil)
+  }
+}
+  }
+
+  test("read data from a hive serde table which has a not existed location 
should succeed") {
+withTable("t") {
+  withTempDir { dir =>
+spark.sql(
+  s"""
+ |CREATE TABLE t(a string, b int)
+ |USING hive
+ |OPTIONS(path "file:${dir.getAbsolutePath}")
+   """.stripMargin)
+val table = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+assert(table.location.stripSuffix("/") == expectedPath)
+
+dir.delete()
+checkAnswer(spark.table("t"), Nil)
+
+val newDirFile = new File(dir, "x")
+spark.sql(s"ALTER TABLE t SET LOCATION 
'${newDirFile.getAbsolutePath}'")
+
+val table1 = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+assert(table1.location.stripSuffix("/") == 

[GitHub] spark pull request #16970: [SPARK-19497][SS]Implement streaming deduplicatio...

2017-02-21 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16970#discussion_r102394580
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala
 ---
@@ -321,3 +327,66 @@ case class MapGroupsWithStateExec(
   }
   }
 }
+
+
+/** Physical operator for executing streaming Deduplication. */
+case class DeduplicationExec(
+keyExpressions: Seq[Attribute],
+child: SparkPlan,
+stateId: Option[OperatorStateId] = None,
+eventTimeWatermark: Option[Long] = None)
+  extends UnaryExecNode with StateStoreWriter with WatermarkSupport {
+
+  /** Distribute by grouping attributes */
+  override def requiredChildDistribution: Seq[Distribution] =
+ClusteredDistribution(keyExpressions) :: Nil
+
+  override protected def doExecute(): RDD[InternalRow] = {
+metrics // force lazy init at driver
+
+child.execute().mapPartitionsWithStateStore(
+  getStateId.checkpointLocation,
+  operatorId = getStateId.operatorId,
+  storeVersion = getStateId.batchId,
+  keyExpressions.toStructType,
+  child.output.toStructType,
+  sqlContext.sessionState,
+  Some(sqlContext.streams.stateStoreCoordinator)) { (store, iter) =>
+  val getKey = GenerateUnsafeProjection.generate(keyExpressions, 
child.output)
+  val numOutputRows = longMetric("numOutputRows")
+  val numTotalStateRows = longMetric("numTotalStateRows")
+  val numUpdatedStateRows = longMetric("numUpdatedStateRows")
+
+
+  val baseIterator = watermarkPredicate match {
+case Some(predicate) => iter.filter((row: InternalRow) => 
!predicate.eval(row))
+case None => iter
+  }
+
+  while (baseIterator.hasNext) {
+val row = baseIterator.next().asInstanceOf[UnsafeRow]
+val key = getKey(row)
+val value = store.get(key)
+if (value.isEmpty) {
+  store.put(key.copy(), row.copy())
--- End diff --

Cool! Updated.
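The per-batch logic of `DeduplicationExec` above — filter out rows that fall behind the watermark, then emit only rows whose key is not yet in the state store — can be sketched in plain Python (illustration only, not Spark code; `state` is a dict standing in for the state store):

```python
def dedup_batch(rows, key_fn, state, watermark=None, event_time_fn=None):
    """Sketch of streaming deduplication over one micro-batch.

    Rows older than the watermark are dropped first; of the rest, only
    rows whose key has not been seen before are emitted and recorded
    (mirroring store.get / store.put in the operator above).
    """
    out = []
    for row in rows:
        # Drop late data that falls behind the watermark, if one is set.
        if watermark is not None and event_time_fn(row) < watermark:
            continue
        key = key_fn(row)
        if key not in state:      # first time this key is seen
            state[key] = row      # analogous to store.put(key, row)
            out.append(row)
    return out
```

Because the state dict persists across calls, a key emitted in one batch is suppressed in all later batches, which is the semantics the operator provides across micro-batches.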





[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16158
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73262/
Test PASSed.





[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16158
  
**[Test build #73262 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73262/testReport)**
 for PR 16158 at commit 
[`2a0af1d`](https://github.com/apache/spark/commit/2a0af1d43891c20a5a8a16a656261f8570cdc90a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73265 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73265/testReport)**
 for PR 16970 at commit 
[`7a7c0c7`](https://github.com/apache/spark/commit/7a7c0c781c236f8421304ab17403f7347eededcb).





[GitHub] spark pull request #16970: [SPARK-19497][SS]Implement streaming deduplicatio...

2017-02-21 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16970#discussion_r102394378
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1996,7 +1996,7 @@ class Dataset[T] private[sql](
   def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
--- End diff --

Done





[GitHub] spark issue #17017: [SPARK-19682][SparkR] Issue warning (or error) when subs...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17017
  
**[Test build #73264 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73264/testReport)**
 for PR 17017 at commit 
[`91204aa`](https://github.com/apache/spark/commit/91204aa36b7b9456a9b7f27d5eb97c1c92a61699).





[GitHub] spark issue #17017: [SPARK-19682][SparkR] Issue warning (or error) when subs...

2017-02-21 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17017
  
That's one weird test failure.
Looks good





[GitHub] spark issue #16980: [SPARK-19617][SS]fix structured streaming restart bug

2017-02-21 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16980
  
Shouldn't the JIRA ID be SPARK-19645?





[GitHub] spark issue #17017: [SPARK-19682][SparkR] Issue warning (or error) when subs...

2017-02-21 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17017
  
Jenkins, retest this please





[GitHub] spark pull request #16910: [SPARK-19575][SQL]Reading from or writing to a hi...

2017-02-21 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16910#discussion_r102393239
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -114,22 +114,30 @@ class HadoopTableReader(
 val tablePath = hiveTable.getPath
 val inputPathStr = applyFilterIfNeeded(tablePath, filterOpt)
 
-// logDebug("Table input: %s".format(tablePath))
-val ifc = hiveTable.getInputFormatClass
-  .asInstanceOf[java.lang.Class[InputFormat[Writable, Writable]]]
-val hadoopRDD = createHadoopRdd(tableDesc, inputPathStr, ifc)
-
-val attrsWithIndex = attributes.zipWithIndex
-val mutableRow = new SpecificInternalRow(attributes.map(_.dataType))
-
-val deserializedHadoopRDD = hadoopRDD.mapPartitions { iter =>
-  val hconf = broadcastedHadoopConf.value.value
-  val deserializer = deserializerClass.newInstance()
-  deserializer.initialize(hconf, tableDesc.getProperties)
-  HadoopTableReader.fillObject(iter, deserializer, attrsWithIndex, 
mutableRow, deserializer)
-}
+val locationPath = new Path(inputPathStr)
+val fs = 
locationPath.getFileSystem(sparkSession.sessionState.newHadoopConf())
--- End diff --

ok~
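The fix discussed above resolves the table location's file system before building the Hadoop RDD so a missing location can yield an empty result instead of a read failure. A minimal Python sketch of that shape (illustration only, not Spark code; `read_table_rows` and `parse_line` are hypothetical names):

```python
import os

def read_table_rows(location, parse_line):
    """If the table's location no longer exists on the file system,
    return an empty result instead of letting the reader fail
    (analogous to returning an empty RDD)."""
    if not os.path.exists(location):
        return []
    rows = []
    for name in sorted(os.listdir(location)):
        path = os.path.join(location, name)
        if os.path.isfile(path):
            with open(path) as f:
                rows.extend(parse_line(line.rstrip("\n")) for line in f)
    return rows
```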





[GitHub] spark issue #16980: [SPARK-19617][SS]fix structured streaming restart bug

2017-02-21 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16980
  
retest this please.





[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...

2017-02-21 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r102392451
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -311,11 +313,12 @@ private[hive] class HiveClientImpl(
   override def createDatabase(
   database: CatalogDatabase,
   ignoreIfExists: Boolean): Unit = withHiveState {
+// the default database's location always uses the warehouse path; here it is 
set to an empty string
 client.createDatabase(
   new HiveDatabase(
 database.name,
 database.description,
-database.locationUri,
+if (database.name == SessionCatalog.DEFAULT_DATABASE) "" else 
database.locationUri,
--- End diff --

Sorry, actually it will throw an exception. My local default database had 
already been created, so it did not hit the exception. I will just replace the 
default database's location when it is reloaded from the metastore, and drop 
the logic that sets the location to an empty string when creating the database.
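The approach described here — ignore whatever location the metastore persisted for the `default` database and substitute the current warehouse path when reloading — can be sketched as follows (hypothetical helper names, not the actual Spark API):

```python
DEFAULT_DATABASE = "default"

def resolve_db_location(db_name, stored_location, warehouse_path):
    """When the default database is reloaded from the metastore, use
    the current warehouse path instead of the persisted location, so
    its location always tracks the session's warehouse config."""
    if db_name == DEFAULT_DATABASE:
        return warehouse_path
    return stored_location
```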





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17001
  
**[Test build #73263 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73263/testReport)**
 for PR 17001 at commit 
[`96dcc7d`](https://github.com/apache/spark/commit/96dcc7ddb0de7c903f3fa8373ada317a760057d7).





[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...

2017-02-21 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16910
  
LGTM except a few minor comments.





[GitHub] spark pull request #16970: [SPARK-19497][SS]Implement streaming deduplicatio...

2017-02-21 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16970#discussion_r102391414
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1996,7 +1996,7 @@ class Dataset[T] private[sql](
   def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
 val resolver = sparkSession.sessionState.analyzer.resolver
 val allColumns = queryExecution.analyzed.output
-val groupCols = colNames.flatMap { colName =>
+val groupCols = colNames.toSet.toSeq.flatMap { (colName: String) =>
--- End diff --

The results will be the same. It's just pretty weird that it depends on the 
optimizer to remove duplicated columns.
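The change under discussion de-duplicates the requested column names eagerly instead of leaving duplicates for the optimizer. One detail worth noting: a plain set round-trip (like `toSet.toSeq`) does not guarantee the caller's ordering. An order-preserving de-duplication, as a minimal Python sketch (not Spark code):

```python
def dedup_preserving_order(col_names):
    """Drop duplicate column names while keeping first-seen order.

    Unlike a set round-trip, this keeps the caller's ordering, which
    matters if the grouping keys must line up with the requested
    columns.
    """
    seen = set()
    result = []
    for name in col_names:
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result
```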





[GitHub] spark pull request #16910: [SPARK-19575][SQL]Reading from or writing to a hi...

2017-02-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16910#discussion_r102390808
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -1494,4 +1495,148 @@ class HiveDDLSuite
   }
 }
   }
+
+  test("insert data to a hive serde table which has a not existed location 
should succeed") {
+withTable("t") {
+  withTempDir { dir =>
+spark.sql(
+  s"""
+ |CREATE TABLE t(a string, b int)
+ |USING hive
+ |OPTIONS(path "file:${dir.getCanonicalPath}")
+   """.stripMargin)
+val table = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+assert(table.location.stripSuffix("/") == expectedPath)
+
+val tableLocFile = new File(table.location.stripPrefix("file:"))
+tableLocFile.delete()
+assert(!tableLocFile.exists())
+spark.sql("INSERT INTO TABLE t SELECT 'c', 1")
+assert(tableLocFile.exists())
+checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+
+Utils.deleteRecursively(dir)
+assert(!tableLocFile.exists())
+spark.sql("INSERT OVERWRITE TABLE t SELECT 'c', 1")
+assert(tableLocFile.exists())
+checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+
+val newDirFile = new File(dir, "x")
+spark.sql(s"ALTER TABLE t SET LOCATION 
'${newDirFile.getAbsolutePath}'")
+spark.sessionState.catalog.refreshTable(TableIdentifier("t"))
+
+val table1 = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+assert(table1.location.stripSuffix("/") == 
newDirFile.getAbsolutePath.stripSuffix("/"))
+assert(!newDirFile.exists())
+
+spark.sql("INSERT INTO TABLE t SELECT 'c', 1")
+checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+assert(newDirFile.exists())
+  }
+}
+  }
+
+  test("insert into a hive serde table with no existed partition location 
should succeed") {
+withTable("t") {
+  withTempDir { dir =>
+spark.sql(
+  s"""
+ |CREATE TABLE t(a int, b int, c int, d int)
+ |USING hive
+ |PARTITIONED BY(a, b)
+ |LOCATION "file:${dir.getCanonicalPath}"
+   """.stripMargin)
+val table = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+assert(table.location.stripSuffix("/") == expectedPath)
+
+spark.sql("INSERT INTO TABLE t PARTITION(a=1, b=2) SELECT 3, 4")
+checkAnswer(spark.table("t"), Row(3, 4, 1, 2) :: Nil)
+
+val partLoc = new File(s"${dir.getAbsolutePath}/a=1")
+Utils.deleteRecursively(partLoc)
+assert(!partLoc.exists())
+// insert overwrite into a partition which location has been 
deleted.
+spark.sql("INSERT OVERWRITE TABLE t PARTITION(a=1, b=2) SELECT 7, 
8")
+assert(partLoc.exists())
+checkAnswer(spark.table("t"), Row(7, 8, 1, 2) :: Nil)
+
+val newDirFile = new File(dir, "x")
+spark.sql(s"ALTER TABLE t PARTITION(a=1, b=2) SET LOCATION " +
+  s"'${newDirFile.getAbsolutePath}'")
--- End diff --

shorten it to a single line?





[GitHub] spark issue #17023: [SPARK-19695][SQL] Throw an exception if a `columnNameOf...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17023
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17023: [SPARK-19695][SQL] Throw an exception if a `columnNameOf...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17023
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73259/
Test PASSed.





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-21 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/17001
  
The related JIRA is [HIVE-1537](https://issues.apache.org/jira/browse/HIVE-1537) and the related 
[PR](https://github.com/apache/hive/commit/ea2fe2e41ddf91bce911d72b3576e5464f0d88a6#diff-6adfd789ed4f2c907b9813301d9debc5R158),
but it seems it does not have related comments.





[GitHub] spark issue #17023: [SPARK-19695][SQL] Throw an exception if a `columnNameOf...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17023
  
**[Test build #73259 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73259/testReport)**
 for PR 17023 at commit 
[`11c2850`](https://github.com/apache/spark/commit/11c2850ee053f251b34ce55cdd053e54ff9ab1cf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17013: [SPARK-19666][SQL] Skip a property without getter in Jav...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17013
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17013: [SPARK-19666][SQL] Skip a property without getter in Jav...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17013
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73258/
Test PASSed.





[GitHub] spark issue #17013: [SPARK-19666][SQL] Skip a property without getter in Jav...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17013
  
**[Test build #73258 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73258/testReport)**
 for PR 17013 at commit 
[`ed686fa`](https://github.com/apache/spark/commit/ed686fae82fcd8984817615955ff5b2caf24ea08).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  public static class BeanWithoutGetter implements Serializable `





[GitHub] spark pull request #16972: [SPARK-19556][CORE][WIP] Broadcast data is not en...

2017-02-21 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16972#discussion_r102390214
  
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -73,17 +81,52 @@ private[spark] class DiskStore(conf: SparkConf, 
diskManager: DiskBlockManager) e
   }
 
   def putBytes(blockId: BlockId, bytes: ChunkedByteBuffer): Unit = {
+val bytesToStore = if (serializerManager.encryptionEnabled) {
+  try {
+val data = bytes.toByteBuffer
+val in = new ByteBufferInputStream(data, true)
+val byteBufOut = new ByteBufferOutputStream(data.remaining())
+val out = CryptoStreamUtils.createCryptoOutputStream(byteBufOut, 
conf,
+  serializerManager.encryptionKey.get)
+try {
+  ByteStreams.copy(in, out)
+} finally {
+  in.close()
+  out.close()
+}
+new ChunkedByteBuffer(byteBufOut.toByteBuffer)
+  } finally {
+bytes.dispose()
+  }
+} else {
+  bytes
+}
+
 put(blockId) { fileOutputStream =>
   val channel = fileOutputStream.getChannel
   Utils.tryWithSafeFinally {
-bytes.writeFully(channel)
+bytesToStore.writeFully(channel)
   } {
 channel.close()
   }
 }
   }
 
   def getBytes(blockId: BlockId): ChunkedByteBuffer = {
+val bytes = readBytes(blockId)
+
+val in = 
serializerManager.wrapForEncryption(bytes.toInputStream(dispose = true))
+new ChunkedByteBuffer(ByteBuffer.wrap(IOUtils.toByteArray(in)))
--- End diff --

@vanzin After taking some time to think about it, I find it may complicate the 
issue if we separate a `MemoryStore` holding un-encrypted data from a 
`DiskStore` holding encrypted data. When we fetch data from a remote node, we 
would have to encrypt it if it is stored in memory un-encrypted. Besides, in 
`maybeCacheDiskBytesInMemory` we would decrypt it again. I've thought about 
caching disk data into memory in encrypted form and then decrypting it lazily 
when used, but that makes things much more complicated. Maybe it is better to 
keep the original style, i.e. keep data encrypted (where possible) both in 
memory and on disk. We should narrow this problem. Any suggestion?
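The `putBytes`/`getBytes` change in the diff keeps the block encrypted at rest by wrapping the write path in a crypto stream and applying the symmetric wrap on read. The shape of that encrypt-on-write / decrypt-on-read pairing can be illustrated with a toy XOR transform (illustration only — Spark uses commons-crypto via `CryptoStreamUtils`, not XOR; `store` is a dict standing in for the disk):

```python
def xor_stream(data, key):
    """Toy stand-in for a real cipher stream: XOR is its own inverse,
    so applying it twice with the same key recovers the plaintext."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def put_bytes(store, block_id, data, key=None):
    """Encrypt-on-write: bytes are transformed before hitting 'disk'."""
    store[block_id] = xor_stream(data, key) if key else data

def get_bytes(store, block_id, key=None):
    """Decrypt-on-read: the same wrap applied symmetrically."""
    raw = store[block_id]
    return xor_stream(raw, key) if key else raw
```

The point of the pairing is that nothing between `put_bytes` and `get_bytes` ever sees plaintext on disk, which is the invariant the PR is trying to preserve for encrypted blocks.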





[GitHub] spark pull request #16910: [SPARK-19575][SQL]Reading from or writing to a hi...

2017-02-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16910#discussion_r102390174
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -1494,4 +1495,148 @@ class HiveDDLSuite
   }
 }
   }
+
+  test("insert data to a hive serde table which has a not existed location 
should succeed") {
+withTable("t") {
+  withTempDir { dir =>
+spark.sql(
+  s"""
+ |CREATE TABLE t(a string, b int)
+ |USING hive
+ |OPTIONS(path "file:${dir.getCanonicalPath}")
+   """.stripMargin)
+val table = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+assert(table.location.stripSuffix("/") == expectedPath)
+
+val tableLocFile = new File(table.location.stripPrefix("file:"))
+tableLocFile.delete()
+assert(!tableLocFile.exists())
+spark.sql("INSERT INTO TABLE t SELECT 'c', 1")
+assert(tableLocFile.exists())
+checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+
+Utils.deleteRecursively(dir)
+assert(!tableLocFile.exists())
+spark.sql("INSERT OVERWRITE TABLE t SELECT 'c', 1")
+assert(tableLocFile.exists())
+checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+
+val newDirFile = new File(dir, "x")
+spark.sql(s"ALTER TABLE t SET LOCATION 
'${newDirFile.getAbsolutePath}'")
+spark.sessionState.catalog.refreshTable(TableIdentifier("t"))
+
+val table1 = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+assert(table1.location.stripSuffix("/") == 
newDirFile.getAbsolutePath.stripSuffix("/"))
+assert(!newDirFile.exists())
+
+spark.sql("INSERT INTO TABLE t SELECT 'c', 1")
+checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+assert(newDirFile.exists())
+  }
+}
+  }
+
+  test("insert into a hive serde table with no existed partition location 
should succeed") {
+withTable("t") {
+  withTempDir { dir =>
+spark.sql(
+  s"""
+ |CREATE TABLE t(a int, b int, c int, d int)
+ |USING hive
+ |PARTITIONED BY(a, b)
+ |LOCATION "file:${dir.getCanonicalPath}"
+   """.stripMargin)
+val table = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+assert(table.location.stripSuffix("/") == expectedPath)
+
+spark.sql("INSERT INTO TABLE t PARTITION(a=1, b=2) SELECT 3, 4")
+checkAnswer(spark.table("t"), Row(3, 4, 1, 2) :: Nil)
+
+val partLoc = new File(s"${dir.getAbsolutePath}/a=1")
+Utils.deleteRecursively(partLoc)
+assert(!partLoc.exists())
+// insert overwrite into a partition which location has been 
deleted.
+spark.sql("INSERT OVERWRITE TABLE t PARTITION(a=1, b=2) SELECT 7, 
8")
+assert(partLoc.exists())
+checkAnswer(spark.table("t"), Row(7, 8, 1, 2) :: Nil)
+
+val newDirFile = new File(dir, "x")
+spark.sql(s"ALTER TABLE t PARTITION(a=1, b=2) SET LOCATION " +
+  s"'${newDirFile.getAbsolutePath}'")
+assert(!newDirFile.exists())
+
+// insert into a partition which location does not exists.
+spark.sql("INSERT INTO TABLE t PARTITION(a=1, b=2) SELECT 9, 10")
+assert(newDirFile.exists())
+checkAnswer(spark.table("t"), Row(9, 10, 1, 2) :: Nil)
+  }
+}
+  }
+
+  test("read data from a hive serde table which has a not existed location 
should succeed") {
+withTable("t") {
+  withTempDir { dir =>
+spark.sql(
+  s"""
+ |CREATE TABLE t(a string, b int)
+ |USING hive
+ |OPTIONS(path "file:${dir.getAbsolutePath}")
+   """.stripMargin)
+val table = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+assert(table.location.stripSuffix("/") == expectedPath)
+
+dir.delete()
+checkAnswer(spark.table("t"), Nil)
+
+val newDirFile = new File(dir, "x")
+spark.sql(s"ALTER TABLE t SET LOCATION 
'${newDirFile.getAbsolutePath}'")
+
+val table1 = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+assert(table1.location.stripSuffix("/") == 

[GitHub] spark pull request #16910: [SPARK-19575][SQL]Reading from or writing to a hi...

2017-02-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16910#discussion_r102389966
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
[GitHub] spark pull request #16910: [SPARK-19575][SQL]Reading from or writing to a hi...

2017-02-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16910#discussion_r102389843
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -114,22 +114,30 @@ class HadoopTableReader(
 val tablePath = hiveTable.getPath
 val inputPathStr = applyFilterIfNeeded(tablePath, filterOpt)
 
-// logDebug("Table input: %s".format(tablePath))
-val ifc = hiveTable.getInputFormatClass
-  .asInstanceOf[java.lang.Class[InputFormat[Writable, Writable]]]
-val hadoopRDD = createHadoopRdd(tableDesc, inputPathStr, ifc)
-
-val attrsWithIndex = attributes.zipWithIndex
-val mutableRow = new SpecificInternalRow(attributes.map(_.dataType))
-
-val deserializedHadoopRDD = hadoopRDD.mapPartitions { iter =>
-  val hconf = broadcastedHadoopConf.value.value
-  val deserializer = deserializerClass.newInstance()
-  deserializer.initialize(hconf, tableDesc.getProperties)
-  HadoopTableReader.fillObject(iter, deserializer, attrsWithIndex, 
mutableRow, deserializer)
-}
+val locationPath = new Path(inputPathStr)
+val fs = 
locationPath.getFileSystem(sparkSession.sessionState.newHadoopConf())
--- End diff --

How about replacing `sparkSession.sessionState.newHadoopConf()` by 
`broadcastedHadoopConf.value.value`?





[GitHub] spark issue #17013: [SPARK-19666][SQL] Skip a property without getter in Jav...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17013
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17013: [SPARK-19666][SQL] Skip a property without getter in Jav...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17013
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73256/
Test PASSed.





[GitHub] spark issue #17013: [SPARK-19666][SQL] Skip a property without getter in Jav...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17013
  
**[Test build #73256 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73256/testReport)**
 for PR 17013 at commit 
[`91cee26`](https://github.com/apache/spark/commit/91cee264e936421e33ecfb0cd5d3c3b4a474d4f2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16158
  
**[Test build #73262 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73262/testReport)**
 for PR 16158 at commit 
[`2a0af1d`](https://github.com/apache/spark/commit/2a0af1d43891c20a5a8a16a656261f8570cdc90a).





[GitHub] spark issue #17014: [SPARK-18608][ML][WIP] Fix double-caching in ML algorith...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17014
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73253/
Test PASSed.





[GitHub] spark issue #17014: [SPARK-18608][ML][WIP] Fix double-caching in ML algorith...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17014
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17014: [SPARK-18608][ML][WIP] Fix double-caching in ML algorith...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17014
  
**[Test build #73253 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73253/testReport)**
 for PR 17014 at commit 
[`a3f3bb6`](https://github.com/apache/spark/commit/a3f3bb66acbbd670369eb9939f73a2968ecc7649).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-21 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17001
  
> In Hive2.0.0, it is allowed to create a table in default database which 
shared between clusters 

Are you able to find the specific Hive JIRA for this?





[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...

2017-02-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r102387112
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -311,11 +313,12 @@ private[hive] class HiveClientImpl(
   override def createDatabase(
   database: CatalogDatabase,
   ignoreIfExists: Boolean): Unit = withHiveState {
+// the default database's location always uses the warehouse path; here 
set it to an empty string
 client.createDatabase(
   new HiveDatabase(
 database.name,
 database.description,
-database.locationUri,
+if (database.name == SessionCatalog.DEFAULT_DATABASE) "" else 
database.locationUri,
--- End diff --

If it is empty, metastore will set it for us, right?





[GitHub] spark issue #17013: [SPARK-19666][SQL] Skip a property without getter in Jav...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17013
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17013: [SPARK-19666][SQL] Skip a property without getter in Jav...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17013
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73257/
Test FAILed.





[GitHub] spark issue #17013: [SPARK-19666][SQL] Skip a property without getter in Jav...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17013
  
**[Test build #73257 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73257/testReport)**
 for PR 17013 at commit 
[`5808d71`](https://github.com/apache/spark/commit/5808d71c5284ce9fcaaaffb7ada6d91e34e0b29e).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17024: [SPARK-19525][CORE] Compressing checkpoints.

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17024
  
Can one of the admins verify this patch?





[GitHub] spark pull request #17024: [SPARK-19525][CORE] Compressing checkpoints.

2017-02-21 Thread aramesh117
GitHub user aramesh117 opened a pull request:

https://github.com/apache/spark/pull/17024

[SPARK-19525][CORE] Compressing checkpoints.

Spark's performance improves greatly if we enable compression of
checkpoints.

## What changes were proposed in this pull request?

- Compress each partition before writing to persistent file system.
- Decompress each partition before reading from persistent file system.
- Default behavior should be to not compress.
- Add logging for checkpoint durations for A/B testing with and without 
compression enabled.
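
As a rough sketch of the first three bullets (the codec choice, helper 
names, and flag are illustrative only — the actual patch would go through 
Spark's `CompressionCodec` machinery rather than raw gzip):

```scala
// Sketch only: optionally gzip partition bytes on checkpoint write and
// un-gzip them on read; compression is off by default, as proposed.
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object CheckpointCompressionSketch {
  // Compress each partition's bytes before writing to the file system.
  def writePartition(bytes: Array[Byte], compress: Boolean): Array[Byte] =
    if (!compress) bytes
    else {
      val buf = new ByteArrayOutputStream()
      val gz  = new GZIPOutputStream(buf)
      gz.write(bytes); gz.close()
      buf.toByteArray
    }

  // Decompress each partition's bytes after reading them back.
  def readPartition(stored: Array[Byte], compressed: Boolean): Array[Byte] =
    if (!compressed) stored
    else new GZIPInputStream(new ByteArrayInputStream(stored)).readAllBytes()

  def main(args: Array[String]): Unit = {
    val data   = Array.fill[Byte](4096)(7.toByte)
    val stored = writePartition(data, compress = true)
    assert(stored.length < data.length)  // repetitive data compresses well
    assert(java.util.Arrays.equals(readPartition(stored, compressed = true), data))
  }
}
```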

## How was this patch tested?

This was tested using existing unit tests for backwards compatibility and 
with new tests for this functionality. It has also been used in our production 
system for almost a year.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aramesh117/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17024.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17024


commit 7837b0c6052fa20bd1a6cf823947e95379d6d3b8
Author: Aaditya Ramesh 
Date:   2017-02-22T05:05:48Z

[SPARK-19525][CORE] Compressing checkpoints.

Spark's performance improves greatly if we enable compression of
checkpoints.







[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...

2017-02-21 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/16938
  
@cloud-fan 
Situation 2, `CREATE TABLE ...(PARTITIONED BY ...) LOCATION path AS SELECT 
...`, behaves differently when the path exists, which is what this PR is 
going to resolve. Is it ok to make it consistent with Hive when using 
HiveExternalCatalog in Spark?

Situation 3, `CREATE TABLE ... (PARTITIONED BY ...) AS SELECT ...`, also 
behaves differently when the default warehouse table path exists. Do you 
mean that the Parquet behavior of throwing an already-exists exception is 
expected, and Hive should be made consistent with it?





[GitHub] spark issue #17020: [SPARK-19693][SQL] Make the SET mapreduce.job.reduces au...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17020
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17001
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17001
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73261/
Test FAILed.





[GitHub] spark issue #17020: [SPARK-19693][SQL] Make the SET mapreduce.job.reduces au...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17020
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73254/
Test PASSed.





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17001
  
**[Test build #73261 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73261/testReport)**
 for PR 17001 at commit 
[`3f6e061`](https://github.com/apache/spark/commit/3f6e06195412bad45d84114df68c729d7b0fa237).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17020: [SPARK-19693][SQL] Make the SET mapreduce.job.reduces au...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17020
  
**[Test build #73254 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73254/testReport)**
 for PR 17020 at commit 
[`7948466`](https://github.com/apache/spark/commit/79484664490cb24ae3cf51667758902edfb6b896).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17022: Aqp 271

2017-02-21 Thread ahshahid
Github user ahshahid closed the pull request at:

https://github.com/apache/spark/pull/17022





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17001
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73260/
Test FAILed.





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17001
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17001
  
**[Test build #73260 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73260/testReport)**
 for PR 17001 at commit 
[`bacd528`](https://github.com/apache/spark/commit/bacd528749ef26b99bf813081bef480ab4c24f97).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...

2017-02-21 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/16938
  
@tejasapatil 

* The thrown exception is the result of the test; it really does happen on the
current Spark master branch.

* Hive CTAS does not support partitioned tables
[hive-doc](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS))

* `default warehouse table path exist` means that a table path already exists
under the warehouse path before we create the table.





[GitHub] spark issue #17022: Aqp 271

2017-02-21 Thread ahshahid
Github user ahshahid commented on the issue:

https://github.com/apache/spark/pull/17022
  
Sorry, I used the wrong base to generate the pull request. I will close this
and later push an appropriate branch, as the bug does exist in master.

On 21 Feb 2017 21:00, "Hyukjin Kwon" wrote:

> @ahshahid, it seems mistakenly open. Could you close this please?





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17001
  
**[Test build #73261 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73261/testReport)** for PR 17001 at commit [`3f6e061`](https://github.com/apache/spark/commit/3f6e06195412bad45d84114df68c729d7b0fa237).





[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...

2017-02-21 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/16910
  
@gatorsmile I have addressed the review comments above~





[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...

2017-02-21 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/16996
  
@yhuai could you help to review this? thanks~





[GitHub] spark issue #17022: Aqp 271

2017-02-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17022
  
@ahshahid, it seems mistakenly open. Could you close this please?





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-21 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/17001
  
Agreed, I now handle the logic in create/get database in HiveClientImpl.





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17001
  
**[Test build #73260 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73260/testReport)** for PR 17001 at commit [`bacd528`](https://github.com/apache/spark/commit/bacd528749ef26b99bf813081bef480ab4c24f97).





[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2017-02-21 Thread jisookim0513
Github user jisookim0513 commented on the issue:

https://github.com/apache/spark/pull/12436
  
@sitalkedia have you had a chance to work on this issue and open a new PR?





[GitHub] spark issue #17021: [SPARK-19694][ML] Add missing 'setTopicDistributionCol' ...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17021
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17021: [SPARK-19694][ML] Add missing 'setTopicDistributionCol' ...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17021
  
**[Test build #73255 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73255/testReport)** for PR 17021 at commit [`367a681`](https://github.com/apache/spark/commit/367a681ffa457f906a1f54cced4b8f8219e2f888).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17021: [SPARK-19694][ML] Add missing 'setTopicDistributionCol' ...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17021
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73255/
Test PASSed.





[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16744
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73248/
Test PASSed.





[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16744
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16744
  
**[Test build #73248 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73248/testReport)** for PR 16744 at commit [`da18da0`](https://github.com/apache/spark/commit/da18da0d98d1f4e433480de8df6f6c34b1e0fb39).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16928: [SPARK-18699][SQL] Put malformed tokens into a new field...

2017-02-21 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16928
  
@HyukjinKwon @cloud-fan okay, all tests passed. Also, I made a PR to fix
the JSON behaviour: #17023.





[GitHub] spark issue #16928: [SPARK-18699][SQL] Put malformed tokens into a new field...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16928
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16928: [SPARK-18699][SQL] Put malformed tokens into a new field...

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16928
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73251/
Test PASSed.





[GitHub] spark issue #16928: [SPARK-18699][SQL] Put malformed tokens into a new field...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16928
  
**[Test build #73251 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73251/testReport)** for PR 16928 at commit [`619094a`](https://github.com/apache/spark/commit/619094a4dbb0e400daac0d94905b40df07b650b6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16970: [SPARK-19497][SS]Implement streaming deduplicatio...

2017-02-21 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16970#discussion_r102381390
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1996,7 +1996,7 @@ class Dataset[T] private[sql](
   def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
--- End diff --

You have to add more documentation for streaming usage! Especially, you have
to document that this will keep all past data as intermediate state, and that
you can use `withWatermark` to limit how late the duplicate data can be, so
the system will accordingly limit the state.

Also, double-check the docs on `withWatermark` and make sure it's consistent.
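The trade-off described here (unbounded dedup state without a watermark, bounded state with one) can be illustrated with a toy simulation. This is a hedged sketch of the general idea only, not Spark's implementation; the function name, the `(key, event_time)` stream shape, and the `max_lateness` parameter are all illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple

def dedup_with_watermark(events: Iterable[Tuple[str, int]],
                         max_lateness: int):
    """Toy model of watermark-bounded streaming deduplication.

    Without a watermark, every key ever seen must be kept as state
    forever. With one, the watermark = max_event_time - max_lateness,
    events older than it are dropped, and state older than it is evicted.
    """
    seen: Dict[str, int] = {}   # key -> latest event_time kept as state
    max_event_time = 0
    out: List[Tuple[str, int]] = []
    for key, t in events:
        max_event_time = max(max_event_time, t)
        watermark = max_event_time - max_lateness
        if t < watermark:
            continue                      # too late: dropped outright
        if key not in seen:
            out.append((key, t))          # first occurrence: emit
        seen[key] = max(seen.get(key, 0), t)
        # evict state the watermark guarantees is no longer needed
        seen = {k: v for k, v in seen.items() if v >= watermark}
    return out, seen

out, state = dedup_with_watermark(
    [("a", 1), ("b", 2), ("a", 3), ("c", 10), ("a", 11)], max_lateness=5)
# ("a", 3) is deduplicated; after ("c", 10) advances the watermark to 5,
# the state for "a" is evicted, so the late ("a", 11) is emitted again.
```

Note the consequence worth documenting: once the watermark evicts a key's state, a duplicate of that key arriving later is emitted again — bounded state is bought at the price of only deduplicating within the lateness window.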





[GitHub] spark issue #17023: [SPARK-19695][SQL] Throw an exception if a `columnNameOf...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17023
  
**[Test build #73259 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73259/testReport)** for PR 17023 at commit [`11c2850`](https://github.com/apache/spark/commit/11c2850ee053f251b34ce55cdd053e54ff9ab1cf).





[GitHub] spark issue #17022: Aqp 271

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17022
  
Can one of the admins verify this patch?




