[GitHub] spark issue #17308: [SPARK-19968][SS] Use a cached instance of `KafkaProduce...

2017-03-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17308
  
**[Test build #74644 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74644/testReport)**
 for PR 17308 at commit 
[`febf387`](https://github.com/apache/spark/commit/febf3874cf07bad04e574b571f1caa839c9c28b7).





[GitHub] spark issue #17291: [SPARK-19949][SQL][WIP] unify bad record handling in CSV...

2017-03-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17291
  
**[Test build #74645 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74645/testReport)**
 for PR 17291 at commit 
[`23c1c3e`](https://github.com/apache/spark/commit/23c1c3e01b64879e5889d6d08c8f824283574574).





[GitHub] spark pull request #17308: [SPARK-19968][SS] Use a cached instance of `Kafka...

2017-03-15 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request:

https://github.com/apache/spark/pull/17308

[SPARK-19968][SS] Use a cached instance of `KafkaProducer` instead of 
creating one every batch.

## What changes were proposed in this pull request?
A new API for cleaning up resources in KafkaSink is added to the Sink trait.

In summary, the cost of recreating a KafkaProducer for every batch write is high: it starts many threads, opens connections, and then closes them all again. The Kafka docs promise that a KafkaProducer instance is thread-safe, and reusing a single instance across multiple writing threads is encouraged.

Furthermore, I measured a 10x latency improvement with this patch.

TODO: post exact results.
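
For context, a minimal sketch of the caching idea, assuming a hypothetical `CachedKafkaProducer` helper keyed by producer config (names and structure are illustrative, not this PR's actual code):

```scala
import java.{util => ju}
import java.util.concurrent.ConcurrentHashMap
import org.apache.kafka.clients.producer.KafkaProducer

// Sketch: keep one thread-safe KafkaProducer per distinct config, instead of
// creating (and tearing down) a new producer for every micro-batch.
object CachedKafkaProducer {
  private type Producer = KafkaProducer[Array[Byte], Array[Byte]]
  private val cache = new ConcurrentHashMap[ju.Map[String, Object], Producer]()

  // Assumes `params` already contains the serializer settings.
  def getOrCreate(params: ju.Map[String, Object]): Producer =
    cache.computeIfAbsent(params,
      new ju.function.Function[ju.Map[String, Object], Producer] {
        override def apply(p: ju.Map[String, Object]): Producer = new Producer(p)
      })

  // Hook for the new Sink cleanup API: close and evict the cached producer.
  def close(params: ju.Map[String, Object]): Unit = {
    val producer = cache.remove(params)
    if (producer != null) producer.close()
  }
}
```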

## How was this patch tested?
Running distributed benchmarks comparing runs with this patch and without 
it.
Added relevant unit tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ScrapCodes/spark cached-kafka-producer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17308


commit febf3874cf07bad04e574b571f1caa839c9c28b7
Author: Prashant Sharma 
Date:   2017-03-15T11:03:45Z

[SPARK-19968][SS] Use a cached instance of KafkaProducer instead of 
creating one every batch.







[GitHub] spark issue #17270: [SPARK-19929] [SQL] Showing Hive Managed table's LOCATION...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17270
  
Yeah, you need to close the PR yourself; we are unable to close it. Thanks!





[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive serde and ...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16626
  
Could you add a scenario where a user adds a column name that already exists in the table schema?
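
For instance, a sketch of such a test (assuming the duplicate name should be rejected with an `AnalysisException`, in the style of the suite's existing cases):

```scala
test("alter table add columns -- duplicate column name") {
  withTable("tab") {
    sql("CREATE TABLE tab (c1 INT)")
    // Adding a column that already exists should fail analysis.
    intercept[AnalysisException] {
      sql("ALTER TABLE tab ADD COLUMNS (c1 INT)")
    }
  }
}
```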





[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...

2017-03-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17287
  
**[Test build #74643 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74643/testReport)**
 for PR 17287 at commit 
[`d82e8ed`](https://github.com/apache/spark/commit/d82e8eda4eed494604b131f1448fd93be3c1e33a).





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r106342233
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala 
---
@@ -71,7 +71,6 @@ class JDBCSuite extends SparkFunSuite
 conn.prepareStatement("insert into test.people values ('mary', 
2)").executeUpdate()
 conn.prepareStatement(
   "insert into test.people values ('joe ''foo'' \"bar\"', 
3)").executeUpdate()
-conn.commit()
--- End diff --

Why?





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r106342123
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -1860,4 +1860,72 @@ class HiveDDLSuite
   }
 }
   }
+
+  Seq("PARQUET", "ORC", "TEXTFILE", "SEQUENCEFILE", "RCFILE", 
"AVRO").foreach { tableType =>
+test(s"alter hive serde table add columns -- partitioned - 
$tableType") {
+  withTable("alter_add_partitioned") {
--- End diff --

The name is confusing. Let us just simplify it to `tab`; we can already tell the scenario from the test case name.





[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...

2017-03-15 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17287#discussion_r106342082
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
 ---
@@ -1270,6 +1376,7 @@ class SessionCatalogSuite extends PlanTest {
   }
 
   assert(cause.getMessage.contains("Undefined function: 
'undefined_fn'"))
+  catalog.reset()
--- End diff --

Yes, you are right; let me add a try/catch.
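
For example, the cleanup can be made exception-safe with a loan-pattern helper like the `withBasicCatalog` this PR introduces elsewhere (a sketch of the pattern):

```scala
// Sketch: run a test body against a fresh catalog and always reset it,
// so reset() is not skipped when an assertion throws.
private def withBasicCatalog(f: SessionCatalog => Unit): Unit = {
  val catalog = new SessionCatalog(newBasicCatalog())
  try f(catalog) finally catalog.reset()
}
```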





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r106341966
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -1860,4 +1860,72 @@ class HiveDDLSuite
   }
 }
   }
+
+  Seq("PARQUET", "ORC", "TEXTFILE", "SEQUENCEFILE", "RCFILE", 
"AVRO").foreach { tableType =>
--- End diff --

If the list is complete, we can create a variable and reuse it in future test cases in `HiveCatalogedDDLSuite`. Shall we create it now?





[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...

2017-03-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17287
  
**[Test build #74642 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74642/testReport)**
 for PR 17287 at commit 
[`25da5f6`](https://github.com/apache/spark/commit/25da5f6bfe99e1bf81856a353e7d572a8594a759).





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r106341499
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -175,6 +178,78 @@ case class AlterTableRenameCommand(
 }
 
 /**
+ * A command that add columns to a table
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   ALTER TABLE table_identifier
+ *   ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
+ * }}}
+*/
+case class AlterTableAddColumnsCommand(
+table: TableIdentifier,
+columns: Seq[StructField]) extends RunnableCommand {
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val catalogTable = verifyAlterTableAddColumn(catalog, table)
+
+// If an exception is thrown here we can just assume the table is 
uncached;
+// this can happen with Hive tables when the underlying catalog is 
in-memory.
+val wasCached = 
Try(sparkSession.catalog.isCached(table.unquotedString)).getOrElse(false)
+if (wasCached) {
+  try {
+sparkSession.catalog.uncacheTable(table.unquotedString)
+  } catch {
+case NonFatal(e) => log.warn(e.toString, e)
+  }
+}
--- End diff --

No need to check if it is cached or not. Just uncache it. 
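
In other words, the suggestion is to drop the `isCached` pre-check and rely on the existing catch (a sketch, assuming the same `sparkSession`, `table`, and `log` are in scope):

```scala
import scala.util.control.NonFatal

// Sketch of the suggested simplification: just attempt the uncache and
// swallow non-fatal failures, instead of probing isCached first.
try {
  sparkSession.catalog.uncacheTable(table.unquotedString)
} catch {
  case NonFatal(e) => log.warn(e.toString, e)
}
```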





[GitHub] spark issue #17242: [SPARK-19902][SQL] Support more expression canonicalizat...

2017-03-15 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17242
  
Anyway, I will move it to the optimizer in the next update.





[GitHub] spark pull request #17286: [SPARK-19915][SQL] Exclude cartesian product cand...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17286#discussion_r106341181
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -696,6 +696,13 @@ object SQLConf {
   .intConf
   .createWithDefault(12)
 
+  val JOIN_REORDER_CARD_WEIGHT =
+buildConf("spark.sql.cbo.joinReorder.card.weight")
+  .doc("The weight of cardinality (number of rows) for plan cost 
comparison in join reorder: " +
+"rows * weight + size * (1 - weight).")
+  .doubleConf
+  .createWithDefault(0.7)
--- End diff --

What is the boundary of this? Should we add a `check`?
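
For instance, since the weight is a convex-combination coefficient, `[0, 1]` is the natural bound, which the conf builder can enforce via `checkValue` (a sketch):

```scala
val JOIN_REORDER_CARD_WEIGHT =
  buildConf("spark.sql.cbo.joinReorder.card.weight")
    .doc("The weight of cardinality (number of rows) for plan cost comparison in join reorder: " +
      "rows * weight + size * (1 - weight).")
    .doubleConf
    // Reject values outside the meaningful range for a weighted average.
    .checkValue(weight => weight >= 0 && weight <= 1, "The weight value must be in [0, 1].")
    .createWithDefault(0.7)
```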





[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17287#discussion_r106340900
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
 ---
@@ -1270,6 +1376,7 @@ class SessionCatalogSuite extends PlanTest {
   }
 
   assert(cause.getMessage.contains("Undefined function: 
'undefined_fn'"))
+  catalog.reset()
--- End diff --

Then this `reset()` would be skipped if an exception is hit.





[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-03-15 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/17084
  
Thanks for the PR. I think this is helpful. Will take a look next week. 
Quite swamped recently. 





[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-15 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16867
  
@kayousterhout 
Thanks a lot for the comments :) very helpful.
I've refined the change; please take another look when you have time.





[GitHub] spark issue #16722: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2017-03-15 Thread imatiach-msft
Github user imatiach-msft commented on the issue:

https://github.com/apache/spark/pull/16722
  
@jkbradley might you be able to take a look at the changes from @sethah ?  
Thank you!





[GitHub] spark pull request #11780: [SPARK-8884][MLlib] 1-sample Anderson-Darling Goo...

2017-03-15 Thread hhbyyh
Github user hhbyyh closed the pull request at:

https://github.com/apache/spark/pull/11780





[GitHub] spark issue #11780: [SPARK-8884][MLlib] 1-sample Anderson-Darling Goodness-o...

2017-03-15 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/11780
  
Closing this.





[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16867#discussion_r106340513
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -893,6 +893,7 @@ class TaskSetManagerSuite extends SparkFunSuite with 
LocalSparkContext with Logg
 val taskSet = FakeTask.createTaskSet(4)
 // Set the speculation multiplier to be 0 so speculative tasks are 
launched immediately
 sc.conf.set("spark.speculation.multiplier", "0.0")
+sc.conf.set("spark.speculation", "true")
--- End diff --

This should be set, because the duration is inserted into `MedianHeap` only when `spark.speculation` is enabled (e.g. if I remove this, `MedianHeap` will be empty when `checkSpeculatableTasks` is called).





[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16867#discussion_r106340321
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -172,7 +172,7 @@ private[spark] class TaskSchedulerImpl 
private[scheduler](
 
 if (!isLocal && conf.getBoolean("spark.speculation", false)) {
   logInfo("Starting speculative execution thread")
-  speculationScheduler.scheduleAtFixedRate(new Runnable {
+  speculationScheduler.scheduleWithFixedDelay(new Runnable {
--- End diff --

I was thinking that `checkSpeculatableTasks` synchronizes on `TaskSchedulerImpl`. If `checkSpeculatableTasks` doesn't finish within 100ms, the thread could release the lock and then immediately re-acquire it. Should this be included in this PR?
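
For reference, the two scheduling modes being compared here differ as follows (a generic `ScheduledExecutorService` sketch, not Spark code):

```scala
import java.util.concurrent.{Executors, TimeUnit}

val scheduler = Executors.newSingleThreadScheduledExecutor()
val check = new Runnable { def run(): Unit = () /* e.g. checkSpeculatableTasks */ }

// Fixed rate: the next run is scheduled relative to the previous *start*, so a
// run that overruns its 100ms period fires again immediately after finishing.
scheduler.scheduleAtFixedRate(check, 1000, 100, TimeUnit.MILLISECONDS)

// Fixed delay: the next run starts 100ms after the previous run *finishes*,
// which guarantees a gap for other threads to grab the lock in between.
scheduler.scheduleWithFixedDelay(check, 1000, 100, TimeUnit.MILLISECONDS)
```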





[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17287#discussion_r106340219
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
 ---
@@ -999,257 +1083,279 @@ class SessionCatalogSuite extends PlanTest {
   expectedParts: CatalogTablePartition*): Boolean = {
 // ExternalCatalog may set a default location for partitions, here we 
ignore the partition
 // location when comparing them.
-actualParts.map(p => p.copy(storage = p.storage.copy(locationUri = 
None))).toSet ==
-  expectedParts.map(p => p.copy(storage = p.storage.copy(locationUri = 
None))).toSet
+val actualPartsNormalize = actualParts.map(p =>
--- End diff --

You need to leave a comment to explain it.





[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...

2017-03-15 Thread imatiach-msft
Github user imatiach-msft commented on the issue:

https://github.com/apache/spark/pull/17123
  
@crackcell I'm not sure about changing the UDF to operate on a row instead of a column; I've found that the serialization costs are much higher and the Spark code performs much worse. Maybe an expert like @cloud-fan can comment more here? Can you keep the UDF on a column instead of a row?





[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2017-03-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/17123#discussion_r106339981
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala 
---
@@ -171,34 +173,34 @@ object Bucketizer extends 
DefaultParamsReadable[Bucketizer] {
* Binary searching in several buckets to place each data point.
* @param splits array of split points
* @param feature data point
-   * @param keepInvalid NaN flag.
-   *Set "true" to make an extra bucket for NaN values;
-   *Set "false" to report an error for NaN values
+   * @param keepInvalid NaN/NULL flag.
+   *Set "true" to make an extra bucket for NaN/NULL 
values;
+   *Set "false" to report an error for NaN/NULL values
* @return bucket for each data point
* @throws SparkException if a feature is < splits.head or > splits.last
*/
 
   private[feature] def binarySearchForBuckets(
   splits: Array[Double],
-  feature: Double,
+  feature: Option[Double],
   keepInvalid: Boolean): Double = {
-if (feature.isNaN) {
+if (feature.getOrElse(Double.NaN).isNaN) {
--- End diff --

I think you can equivalently write this as `if (feature.isEmpty) {`.





[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...

2017-03-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17251
  
**[Test build #74641 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74641/testReport)**
 for PR 17251 at commit 
[`c951084`](https://github.com/apache/spark/commit/c9510847c8eeb5f5da3b63c38ac835d1c3491815).





[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2017-03-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/17123#discussion_r106339731
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala 
---
@@ -105,20 +106,21 @@ final class Bucketizer @Since("1.4.0") 
(@Since("1.4.0") override val uid: String
 transformSchema(dataset.schema)
 val (filteredDataset, keepInvalid) = {
   if (getHandleInvalid == Bucketizer.SKIP_INVALID) {
-// "skip" NaN option is set, will filter out NaN values in the 
dataset
+// "skip" NaN/NULL option is set, will filter out NaN/NULL values 
in the dataset
 (dataset.na.drop().toDF(), false)
   } else {
 (dataset.toDF(), getHandleInvalid == Bucketizer.KEEP_INVALID)
   }
 }
 
-val bucketizer: UserDefinedFunction = udf { (feature: Double) =>
+val bucketizer: UserDefinedFunction = udf { (row: Row) =>
--- End diff --

I believe you should try to avoid using a UDF on a row because the serialization costs will be higher... hmm, how could we make this perform well and still handle nulls? Does it work with Option[Double] instead of Row?
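
One way to accept SQL NULLs while keeping the UDF column-based is a boxed `java.lang.Double` argument, which Spark passes as `null` for missing values (a sketch; the splits and bucket indices are illustrative, not Bucketizer's actual logic):

```scala
import org.apache.spark.sql.functions.udf

// Illustrative splits; the last bucket index is reserved for NaN/NULL
// when invalid values are kept.
val splits = Array(Double.NegativeInfinity, 0.0, 10.0, Double.PositiveInfinity)
val invalidBucket = (splits.length - 1).toDouble

// A boxed Double receives SQL NULL as null, so no Row (de)serialization is needed.
val bucketizer = udf { (feature: java.lang.Double) =>
  if (feature == null || feature.isNaN) invalidBucket
  else (splits.indexWhere(s => feature < s) - 1).toDouble // linear stand-in for binary search
}
```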





[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...

2017-03-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/17251
  
Thank you, @cloud-fan ! I updated the PR according to the review comments.





[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16867
  
**[Test build #74640 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74640/testReport)**
 for PR 16867 at commit 
[`104e867`](https://github.com/apache/spark/commit/104e86773d9e688e35a2273ce71379e8d03b9f81).





[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-03-15 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/17130
  
Refined some comments and minor things. This should be ready for review. 
Thanks.





[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16209
  
@sureshthalamati https://github.com/apache/spark/pull/17171 has been 
resolved. Can you update your PR to allow users to specify the schema in DDL 
format?





[GitHub] spark issue #17085: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-03-15 Thread imatiach-msft
Github user imatiach-msft commented on the issue:

https://github.com/apache/spark/pull/17085
  
ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang 
@srowen could you please take a look?  thank you!





[GitHub] spark pull request #17251: [SPARK-19910][SQL] `stack` should not reject NULL...

2017-03-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/17251#discussion_r106338768
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -590,6 +591,23 @@ object TypeCoercion {
   }
 
   /**
+   * Coerces NullTypes of a Stack function to the corresponding column 
types.
+   */
+  object StackCoercion extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+  case s @ Stack(children) if s.childrenResolved && 
s.children.head.dataType == IntegerType &&
+  s.children.head.foldable =>
+val schema = s.elementSchema
+Stack(children.zipWithIndex.map {
+  case (e, 0) => e
+  case (Literal(null, NullType), index: Int) =>
+Literal.create(null, schema.fields((index - 1) % 
schema.length).dataType)
--- End diff --

Yep.





[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-03-15 Thread imatiach-msft
Github user imatiach-msft commented on the issue:

https://github.com/apache/spark/pull/17086
  
ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang 
@srowen could you please take a look?  thank you!





[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-03-15 Thread imatiach-msft
Github user imatiach-msft commented on the issue:

https://github.com/apache/spark/pull/17084
  
ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang 
@srowen could you please take a look?  thank you!





[GitHub] spark issue #17289: [SPARK-19948] Document that saveAsTable uses catalog as ...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17289
  
This is the design decision we need to make here. 

Spark SQL is kind of a federation system, and the two write APIs behave 
differently. The `saveAsTable` API expects the table to be registered in the 
**global catalog** before usage, while the `save` API skips the global catalog 
and relies on the connectors to communicate with the **local catalog**. Users 
might not realize the difference.

```Scala
df.write.format("xyz").mode(SaveMode.ErrorIfExists)
  .saveAsTable("j1")
```

```Scala
df.write.format("xyz").mode(SaveMode.ErrorIfExists)
  .save()
```






[GitHub] spark pull request #17286: [SPARK-19915][SQL] Exclude cartesian product cand...

2017-03-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17286#discussion_r106338345
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
 ---
@@ -128,38 +128,43 @@ case class CostBasedJoinReorder(conf: CatalystConf) 
extends Rule[LogicalPlan] wi
 object JoinReorderDP extends PredicateHelper {
 
   def search(
-  conf: CatalystConf,
+  conf: SQLConf,
   items: Seq[LogicalPlan],
   conditions: Set[Expression],
   topOutput: AttributeSet): Option[LogicalPlan] = {
 
 // Level i maintains all found plans for i + 1 items.
 // Create the initial plans: each plan is a single item with zero cost.
-val itemIndex = items.zipWithIndex
+val itemIndex = items.zipWithIndex.map(_.swap).toMap
 val foundPlans = mutable.Buffer[JoinPlanMap](itemIndex.map {
-  case (item, id) => Set(id) -> JoinPlan(Set(id), item, Set(), Cost(0, 
0))
-}.toMap)
+  case (id, item) => Set(id) -> JoinPlan(Set(id), item, Set(), Cost(0, 
0))
+})
 
-for (lev <- 1 until items.length) {
+// Build plans for next levels until the last level has only one plan. 
This plan contains
+// all items that can be joined, so there's no need to continue.
+while (foundPlans.size < items.length && foundPlans.last.size > 1) {
   // Build plans for the next level.
   foundPlans += searchLevel(foundPlans, conf, conditions, topOutput)
 }
 
-val plansLastLevel = foundPlans(items.length - 1)
-if (plansLastLevel.isEmpty) {
-  // Failed to find a plan, fall back to the original plan
-  None
-} else {
-  // There must be only one plan at the last level, which contains all 
items.
-  assert(plansLastLevel.size == 1 && plansLastLevel.head._1.size == 
items.length)
-  Some(plansLastLevel.head._2.plan)
+// Find the best plan
+assert(foundPlans.last.size <= 1)
--- End diff --

how about
```
while (foundPlans.size < items.length && foundPlans.last.size > 0)
```
When the while loop ends, either we have reached level n or the current level 
has 0 entries. Then we pick the last level that has non-zero entries, take the 
best entry from that level, and construct the final join plan.





[GitHub] spark pull request #17307: [SPARK-13369] Make number of consecutive fetch fa...

2017-03-15 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/17307#discussion_r106337950
  
--- Diff: docs/configuration.md ---
@@ -1506,6 +1506,11 @@ Apart from these, the following properties are also 
available, and may be useful
 of this setting is to act as a safety-net to prevent runaway 
uncancellable tasks from rendering
 an executor unusable.
   
+  spark.stage.maxConsecutiveAttempts
+  4
+  
+Number of consecutive stage retries allowed before a stage is aborted.
--- End diff --

Hah sorry for all of the comment changes from the combination of Imran and 
me!! But I agree that this was an issue before and would be good to update.  
Thanks for the many updates here @sitalkedia.





[GitHub] spark issue #17297: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2017-03-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17297
  
**[Test build #74631 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74631/testReport)**
 for PR 17297 at commit 
[`901c9bf`](https://github.com/apache/spark/commit/901c9bf55247f0489519d976ca9729e5babbd292).
 * This patch **fails from timeout after a configured wait of `250m`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17297: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17297
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74631/
Test FAILed.





[GitHub] spark issue #17297: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17297
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16476: [SPARK-19084][SQL] Implement expression field

2017-03-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16476
  
**[Test build #74639 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74639/testReport)**
 for PR 16476 at commit 
[`4e60b7c`](https://github.com/apache/spark/commit/4e60b7c52c0ca9e20296256607ce78741d80cea3).





[GitHub] spark pull request #17251: [SPARK-19910][SQL] `stack` should not reject NULL...

2017-03-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/17251#discussion_r106337249
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -156,9 +156,21 @@ case class Stack(children: Seq[Expression]) extends 
Generator {
 }
   }
 
+  private def findDataType(column: Integer): DataType = {
--- End diff --

Right.





[GitHub] spark pull request #17251: [SPARK-19910][SQL] `stack` should not reject NULL...

2017-03-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/17251#discussion_r106337141
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -590,6 +591,23 @@ object TypeCoercion {
   }
 
   /**
+   * Coerces NullTypes of a Stack function to the corresponding column 
types.
+   */
+  object StackCoercion extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+  case s @ Stack(children) if s.childrenResolved && 
s.children.head.dataType == IntegerType &&
--- End diff --

Yep.





[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...

2017-03-15 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17287#discussion_r106336735
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
 ---
@@ -76,468 +102,500 @@ class SessionCatalogSuite extends PlanTest {
   }
 
   test("create databases using invalid names") {
-val catalog = new SessionCatalog(newEmptyCatalog())
-testInvalidName(name => catalog.createDatabase(newDb(name), 
ignoreIfExists = true))
+withEmptyCatalog { catalog =>
+  testInvalidName(
+name => catalog.createDatabase(newDb(name), ignoreIfExists = true))
+}
   }
 
   test("get database when a database exists") {
-val catalog = new SessionCatalog(newBasicCatalog())
-val db1 = catalog.getDatabaseMetadata("db1")
-assert(db1.name == "db1")
-assert(db1.description.contains("db1"))
+withBasicCatalog { catalog =>
+  val db1 = catalog.getDatabaseMetadata("db1")
+  assert(db1.name == "db1")
+  assert(db1.description.contains("db1"))
+}
   }
 
   test("get database should throw exception when the database does not 
exist") {
-val catalog = new SessionCatalog(newBasicCatalog())
-intercept[NoSuchDatabaseException] {
-  catalog.getDatabaseMetadata("db_that_does_not_exist")
+withBasicCatalog { catalog =>
+  intercept[NoSuchDatabaseException] {
+catalog.getDatabaseMetadata("db_that_does_not_exist")
+  }
 }
   }
 
   test("list databases without pattern") {
-val catalog = new SessionCatalog(newBasicCatalog())
-assert(catalog.listDatabases().toSet == Set("default", "db1", "db2", 
"db3"))
+withBasicCatalog { catalog =>
+  assert(catalog.listDatabases().toSet == Set("default", "db1", "db2", 
"db3"))
+}
   }
 
   test("list databases with pattern") {
-val catalog = new SessionCatalog(newBasicCatalog())
-assert(catalog.listDatabases("db").toSet == Set.empty)
-assert(catalog.listDatabases("db*").toSet == Set("db1", "db2", "db3"))
-assert(catalog.listDatabases("*1").toSet == Set("db1"))
-assert(catalog.listDatabases("db2").toSet == Set("db2"))
+withBasicCatalog { catalog =>
+  assert(catalog.listDatabases("db").toSet == Set.empty)
+  assert(catalog.listDatabases("db*").toSet == Set("db1", "db2", 
"db3"))
+  assert(catalog.listDatabases("*1").toSet == Set("db1"))
+  assert(catalog.listDatabases("db2").toSet == Set("db2"))
+}
   }
 
   test("drop database") {
-val catalog = new SessionCatalog(newBasicCatalog())
-catalog.dropDatabase("db1", ignoreIfNotExists = false, cascade = false)
-assert(catalog.listDatabases().toSet == Set("default", "db2", "db3"))
+withBasicCatalog { catalog =>
+  catalog.dropDatabase("db1", ignoreIfNotExists = false, cascade = 
false)
+  assert(catalog.listDatabases().toSet == Set("default", "db2", "db3"))
+}
   }
 
   test("drop database when the database is not empty") {
 // Throw exception if there are functions left
-val externalCatalog1 = newBasicCatalog()
-val sessionCatalog1 = new SessionCatalog(externalCatalog1)
-externalCatalog1.dropTable("db2", "tbl1", ignoreIfNotExists = false, 
purge = false)
-externalCatalog1.dropTable("db2", "tbl2", ignoreIfNotExists = false, 
purge = false)
-intercept[AnalysisException] {
-  sessionCatalog1.dropDatabase("db2", ignoreIfNotExists = false, 
cascade = false)
+withBasicCatalog { catalog =>
+  catalog.externalCatalog.dropTable("db2", "tbl1", ignoreIfNotExists = 
false, purge = false)
+  catalog.externalCatalog.dropTable("db2", "tbl2", ignoreIfNotExists = 
false, purge = false)
+  intercept[AnalysisException] {
+catalog.dropDatabase("db2", ignoreIfNotExists = false, cascade = 
false)
+  }
 }
-
-// Throw exception if there are tables left
-val externalCatalog2 = newBasicCatalog()
-val sessionCatalog2 = new SessionCatalog(externalCatalog2)
-externalCatalog2.dropFunction("db2", "func1")
-intercept[AnalysisException] {
-  sessionCatalog2.dropDatabase("db2", ignoreIfNotExists = false, 
cascade = false)
+withBasicCatalog { catalog =>
+  // Throw exception if there are tables left
+  catalog.externalCatalog.dropFunction("db2", "func1")
+  intercept[AnalysisException] {
+catalog.dropDatabase("db2", ignoreIfNotExists = false, cascade = 
false)
+  }
 }
 
-// When cascade is true, it should drop them
-val externalCatalog3 = newBasicCatalog()
-val sessionCatalog3 = new SessionCatalog(externalCatalog3)
 

[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...

2017-03-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17287
  
**[Test build #74638 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74638/testReport)**
 for PR 17287 at commit 
[`4214379`](https://github.com/apache/spark/commit/421437951df5d3bb551dc62428bbd3c23cd94f4e).





[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...

2017-03-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17287
  
**[Test build #74637 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74637/testReport)**
 for PR 17287 at commit 
[`80df8c7`](https://github.com/apache/spark/commit/80df8c74fc2280d9ca3d9fa2c6a624c6970ed6da).





[GitHub] spark issue #16981: [SPARK-19637][SQL] Add to_json in FunctionRegistry

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16981
  
cc @maropu https://github.com/apache/spark/pull/17171 is merged. Are you 
interested in working on `from_json`?

JIRA: https://issues.apache.org/jira/browse/SPARK-19967





[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...

2017-03-15 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17287#discussion_r106336045
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
 ---
@@ -999,257 +1083,279 @@ class SessionCatalogSuite extends PlanTest {
   expectedParts: CatalogTablePartition*): Boolean = {
 // ExternalCatalog may set a default location for partitions, here we 
ignore the partition
 // location when comparing them.
-actualParts.map(p => p.copy(storage = p.storage.copy(locationUri = 
None))).toSet ==
-  expectedParts.map(p => p.copy(storage = p.storage.copy(locationUri = 
None))).toSet
+val actualPartsNormalize = actualParts.map(p =>
+  p.copy(parameters = Map.empty, storage = p.storage.copy(
+properties = Map.empty, locationUri = None, serde = None))).toSet
+
+val expectedPartsNormalize = expectedParts.map(p =>
+p.copy(parameters = Map.empty, storage = p.storage.copy(
+  properties = Map.empty, locationUri = None, serde = None))).toSet
+
+actualPartsNormalize == expectedPartsNormalize
+//actualParts.map(p =>
+//  p.copy(storage = p.storage.copy(
+//properties = Map.empty, locationUri = None))).toSet ==
+//  expectedParts.map(p =>
+//p.copy(storage = p.storage.copy(properties = Map.empty, 
locationUri = None))).toSet
--- End diff --

sorry, let me remove it





[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...

2017-03-15 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17287#discussion_r106335967
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
 ---
@@ -1270,6 +1376,7 @@ class SessionCatalogSuite extends PlanTest {
   }
 
   assert(cause.getMessage.contains("Undefined function: 
'undefined_fn'"))
+  catalog.reset()
--- End diff --

Here the `SessionCatalog` is instantiated with a different `conf` parameter; 
in `withBasicCatalog` it is just left at the default.





[GitHub] spark pull request #17088: [SPARK-19753][CORE] Un-register all shuffle outpu...

2017-03-15 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/17088#discussion_r106335824
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1365,19 +1369,27 @@ class DAGScheduler(
*/
   private[scheduler] def handleExecutorLost(
   execId: String,
-  filesLost: Boolean,
+  fileLost: Boolean,
+  hostLost: Boolean = false,
+  maybeHost: Option[String] = None,
--- End diff --

I find this method pretty confusing now, but it was also confusing before,
and I'm not sure how to clean it up yet. One minor thing: instead of having
both a `hostLost` and a `maybeHost`, could there be a single
`hostToDeregisterAllShuffleOutput: Option[String]`, so that `if (hostLost) {...}`
becomes `hostToDeregisterAllShuffleOutput.foreach {...}`, etc.?
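
A minimal sketch of that refactor, with placeholder bodies rather than the
actual DAGScheduler logic:

```scala
// Sketch only: the (hostLost, maybeHost) pair collapses into one Option.
// The method body here is illustrative, not the real code.
private[scheduler] def handleExecutorLost(
    execId: String,
    fileLost: Boolean,
    hostToDeregisterAllShuffleOutput: Option[String] = None): Unit = {
  // ... existing executor-level cleanup ...
  // The Option is both the flag and the payload: None means "no host lost",
  // so the invalid state (hostLost = true, maybeHost = None) cannot arise.
  hostToDeregisterAllShuffleOutput.foreach { host =>
    // ... deregister all shuffle output registered on this host ...
  }
}
```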





[GitHub] spark pull request #17088: [SPARK-19753][CORE] Un-register all shuffle outpu...

2017-03-15 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/17088#discussion_r106330358
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1331,7 +1328,14 @@ class DAGScheduler(
 
   // TODO: mark the executor as failed only if there were lots of 
fetch failures on it
   if (bmAddress != null) {
-handleExecutorLost(bmAddress.executorId, filesLost = true, 
Some(task.epoch))
+if (!env.blockManager.externalShuffleServiceEnabled) {
--- End diff --

I think these two cases are reversed, aren't they?

It's a bit harder to keep straight with a negation in there; rather than
switching the bodies, I'd just change it to
`if (env.blockManager.externalShuffleServiceEnabled)`.
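
A sketch of that restructuring, on the assumption (from this review) that the
external-shuffle-service branch is the one that should treat the whole host as
lost; the bodies are placeholders:

```scala
// Illustrative only -- not the actual DAGScheduler code.
if (env.blockManager.externalShuffleServiceEnabled) {
  // Shuffle files outlive the executor on the shuffle service, so a fetch
  // failure makes all output on the host suspect.
  handleExecutorLost(bmAddress.executorId, fileLost = false,
    hostLost = true, Some(bmAddress.host))
} else {
  // No external shuffle service: only this executor's files are lost.
  handleExecutorLost(bmAddress.executorId, fileLost = true)
}
```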





[GitHub] spark pull request #17088: [SPARK-19753][CORE] Un-register all shuffle outpu...

2017-03-15 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/17088#discussion_r106335566
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -394,6 +394,32 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with Timeou
 assertDataStructuresEmpty()
   }
 
+  test("All shuffle files should on the slave should be cleaned up when 
slave lost") {
+// reset the test context with the right shuffle service config
+afterEach()
+val conf = new SparkConf()
+conf.set("spark.shuffle.service.enabled", "true")
+init(conf)
+runEvent(ExecutorAdded("exec-hostA1", "hostA"))
+runEvent(ExecutorAdded("exec-hostA2", "hostA"))
+runEvent(ExecutorAdded("exec-hostB", "hostB"))
+val shuffleMapRdd = new MyRDD(sc, 3, Nil)
+val shuffleDep = new ShuffleDependency(shuffleMapRdd, new 
HashPartitioner(1))
+val shuffleId = shuffleDep.shuffleId
+val reduceRdd = new MyRDD(sc, 1, List(shuffleDep), tracker = 
mapOutputTracker)
+submit(reduceRdd, Array(0))
+complete(taskSets(0), Seq(
+  (Success, makeMapStatus("hostA", 1)),
+  (Success, makeMapStatus("hostA", 1)),
+  (Success, makeMapStatus("hostB", 1
+scheduler.handleExecutorLost("exec-hostA1", fileLost = false, hostLost 
= true, Some("hostA"))
+runEvent(ExecutorLost("exec-hostA1", SlaveLost("", true)))
+val mapStatus = mapOutputTracker.mapStatuses.get(0).get.filter(_!= 
null)
--- End diff --

I think there are a couple of problems with this test.
* you are trying to change the behavior on a fetch failure, so really you should have tasks completing with a `FetchFailed`
* `makeMapStatus` is actually doing the wrong thing in this case, since it's expecting executor ids to be "exec-$host", but you've got a "1" or "2" appended to some of them

I think this is better:

```scala
submit(reduceRdd, Array(0))
// map stage completes successfully, with one task on each executor
complete(taskSets(0), Seq(
  (Success,
    MapStatus(BlockManagerId("exec-hostA1", "hostA", 12345), Array.fill[Long](1)(2))),
  (Success,
    MapStatus(BlockManagerId("exec-hostA2", "hostA", 12345), Array.fill[Long](1)(2))),
  (Success, makeMapStatus("hostB", 1))
))
// make sure our test setup is correct
val initialMapStatus = mapOutputTracker.mapStatuses.get(0).get
assert(initialMapStatus.count(_ != null) === 3)
assert(initialMapStatus.map{_.location.executorId}.toSet ===
  Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
// reduce stage fails with a fetch failure from one host
complete(taskSets(1), Seq(
  (FetchFailed(BlockManagerId("exec-hostA2", "hostA", 12345), shuffleId, 0, 0, "ignored"),
    null)
))
// Here is the main assertion -- make sure that we de-register the map output
// from both executors on hostA
val mapStatus = mapOutputTracker.mapStatuses.get(0).get
assert(mapStatus.count(_ != null) === 1)
assert(mapStatus(2).location.executorId === "exec-hostB")
assert(mapStatus(2).location.host === "hostB")
```

this version fails until you reverse the if / else I pointed out in the 
dagscheduler.

it would also be nice if this included map output from multiple stages 
registered on the given host, so you could check that *all* output is 
deregistered, not just the one shuffleId which had an error.
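
A hypothetical sketch of that extension, reusing the scaffolding above (the
`MapOutputTrackerMaster` registration calls exist, but the exact wiring in the
suite may differ):

```scala
// Register map output for a second shuffle on the same host before the
// fetch failure...
val shuffleDep2 = new ShuffleDependency(new MyRDD(sc, 1, Nil), new HashPartitioner(1))
mapOutputTracker.registerShuffle(shuffleDep2.shuffleId, 1)
mapOutputTracker.registerMapOutput(shuffleDep2.shuffleId, 0,
  MapStatus(BlockManagerId("exec-hostA1", "hostA", 12345), Array.fill[Long](1)(2)))
// ...then, after the FetchFailed on hostA, assert it was also deregistered:
assert(mapOutputTracker.mapStatuses.get(shuffleDep2.shuffleId).get.forall(_ == null))
```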





[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...

2017-03-15 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17287#discussion_r106335778
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
 ---
@@ -999,257 +1083,279 @@ class SessionCatalogSuite extends PlanTest {
   expectedParts: CatalogTablePartition*): Boolean = {
 // ExternalCatalog may set a default location for partitions, here we 
ignore the partition
 // location when comparing them.
-actualParts.map(p => p.copy(storage = p.storage.copy(locationUri = 
None))).toSet ==
-  expectedParts.map(p => p.copy(storage = p.storage.copy(locationUri = 
None))).toSet
+val actualPartsNormalize = actualParts.map(p =>
--- End diff --

Yes, it is~





[GitHub] spark pull request #17171: [SPARK-19830] [SQL] Add parseTableSchema API to P...

2017-03-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17171





[GitHub] spark issue #17171: [SPARK-19830] [SQL] Add parseTableSchema API to ParserIn...

2017-03-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17171
  
thanks, merging to master!





[GitHub] spark issue #17289: [SPARK-19948] Document that saveAsTable uses catalog as ...

2017-03-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17289
  
It's pretty weird that we need to check whether the table exists in the Spark
catalog and then check whether the data exists in the data source, both driven
by the same user-specified save mode. I think the new behavior is more
reasonable; otherwise we should ask users to provide 2 save modes.





[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16971#discussion_r106335040
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
 ---
@@ -245,7 +245,7 @@ object ApproximatePercentile {
 val result = new Array[Double](percentages.length)
 var i = 0
 while (i < percentages.length) {
-  result(i) = summaries.query(percentages(i))
+  result(i) = summaries.query(percentages(i)).get
--- End diff --

Thank you!





[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16971
  
ping @zhengruifeng  





[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field

2017-03-15 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16476#discussion_r106334932
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
 ---
@@ -340,3 +343,105 @@ object CaseKeyWhen {
 CaseWhen(cases, elseValue)
   }
 }
+
+/**
+ * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType.
+ * It's also acceptable to give parameters of different types. When the parameters have different
+ * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType,
+ * while 999's type is IntegerType, so that no further comparison need to be done since they have
+ * different types.
+ * If the search expression is NULL, the return value is 0 because NULL fails equality comparison
+ * with any value.
+ * To also point out, no implicit cast will be done in this expression.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.",
+  extended = """
+Examples:
+  > SELECT _FUNC_(10, 9, 3, 10, 4);
+   3
+  > SELECT _FUNC_('a', 'b', 'c', 'd', 'a');
+   4
+  > SELECT _FUNC_('999', 'a', 999, 9.99, '999');
+   4
+  """)
+// scalastyle:on line.size.limit
+case class Field(children: Seq[Expression]) extends Expression {
+
+  /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */
+  override def nullable: Boolean = false
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType)
+
+  private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter(
+_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray
+
+  override def checkInputDataTypes(): TypeCheckResult = {
--- End diff --

If we try to cast all types to `DoubleType`, the test case where every
parameter is a `StringType` will fail. But if we try to cast all types to
`StringType`, parameters like '3' and '3.0' won't be considered equal.
I have a compromise: look at the 1st parameter; if it's of `NumericType`, we
implicitly cast all parameters to `DoubleType`, otherwise we cast them all to
`StringType`. A sketch of that rule is below.
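
A rough sketch of that compromise as a coercion rule, assuming the PR's
`Field` expression and the standard Catalyst types; illustrative only, not the
actual rule:

```scala
// Sketch: pick the target type from the first (search) parameter, then cast
// every child to it. `Field` is the expression class from this PR.
object FieldCoercion extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
    case f @ Field(children) if f.childrenResolved =>
      val target = children.head.dataType match {
        case _: NumericType => DoubleType // numeric search key: compare as doubles
        case _ => StringType              // otherwise: compare as strings
      }
      Field(children.map(c => if (c.dataType == target) c else Cast(c, target)))
  }
}
```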





[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17287#discussion_r106334939
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
 ---
@@ -999,257 +1083,279 @@ class SessionCatalogSuite extends PlanTest {
   expectedParts: CatalogTablePartition*): Boolean = {
 // ExternalCatalog may set a default location for partitions, here we ignore the partition
 // location when comparing them.
-actualParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet ==
-  expectedParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet
+val actualPartsNormalize = actualParts.map(p =>
--- End diff --

Because the Hive metastore fills in the values after we call the Hive APIs?





[GitHub] spark issue #17171: [SPARK-19830] [SQL] Add parseTableSchema API to ParserIn...

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17171
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74635/
Test PASSed.





[GitHub] spark issue #17171: [SPARK-19830] [SQL] Add parseTableSchema API to ParserIn...

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17171
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17171: [SPARK-19830] [SQL] Add parseTableSchema API to ParserIn...

2017-03-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17171
  
**[Test build #74635 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74635/testReport)**
 for PR 17171 at commit 
[`b18ae84`](https://github.com/apache/spark/commit/b18ae84c1f0485d929e58d217c1881d037721881).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17287#discussion_r106334827
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
 ---
@@ -999,257 +1094,279 @@ class SessionCatalogSuite extends PlanTest {
   expectedParts: CatalogTablePartition*): Boolean = {
 // ExternalCatalog may set a default location for partitions, here we ignore the partition
 // location when comparing them.
-actualParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet ==
-  expectedParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet
+val actualPartsNormalize = actualParts.map(p =>
--- End diff --

Because the Hive metastore fills in the values after we call the Hive APIs?





[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17287#discussion_r106334626
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
 ---
@@ -999,257 +1083,279 @@ class SessionCatalogSuite extends PlanTest {
   expectedParts: CatalogTablePartition*): Boolean = {
 // ExternalCatalog may set a default location for partitions, here we ignore the partition
 // location when comparing them.
-actualParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet ==
-  expectedParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet
+val actualPartsNormalize = actualParts.map(p =>
+  p.copy(parameters = Map.empty, storage = p.storage.copy(
+properties = Map.empty, locationUri = None, serde = None))).toSet
+
+val expectedPartsNormalize = expectedParts.map(p =>
+p.copy(parameters = Map.empty, storage = p.storage.copy(
+  properties = Map.empty, locationUri = None, serde = None))).toSet
+
+actualPartsNormalize == expectedPartsNormalize
+//actualParts.map(p =>
+//  p.copy(storage = p.storage.copy(
+//properties = Map.empty, locationUri = None))).toSet ==
+//  expectedParts.map(p =>
+//p.copy(storage = p.storage.copy(properties = Map.empty, locationUri = None))).toSet
--- End diff --

?





[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17287#discussion_r106334485
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
 ---
@@ -1270,6 +1376,7 @@ class SessionCatalogSuite extends PlanTest {
   }
 
   assert(cause.getMessage.contains("Undefined function: 
'undefined_fn'"))
+  catalog.reset()
--- End diff --

Instead of adding `reset`, why not use your new function `withBasicCatalog`?
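
For illustration, a hypothetical loan-pattern shape for such a helper; the
actual `withBasicCatalog` in this PR may differ in signature and setup:

```scala
// Sketch only: build a catalog, run the test body, and always reset it.
private def withBasicCatalog(f: SessionCatalog => Unit): Unit = {
  val catalog = new SessionCatalog(newBasicCatalog())
  try f(catalog) finally catalog.reset()
}

// Usage: the reset happens even if an assertion inside the body throws,
// which a bare trailing `catalog.reset()` call cannot guarantee.
withBasicCatalog { catalog =>
  // ... exercise the catalog ...
}
```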





[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field

2017-03-15 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16476#discussion_r106334083
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
 ---
@@ -340,3 +343,105 @@ object CaseKeyWhen {
 CaseWhen(cases, elseValue)
   }
 }
+
+/**
+ * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType.
+ * It's also acceptable to give parameters of different types. When the parameters have different
+ * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType,
+ * while 999's type is IntegerType, so that no further comparison need to be done since they have
+ * different types.
+ * If the search expression is NULL, the return value is 0 because NULL fails equality comparison
+ * with any value.
+ * To also point out, no implicit cast will be done in this expression.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.",
+  extended = """
+Examples:
+  > SELECT _FUNC_(10, 9, 3, 10, 4);
+   3
+  > SELECT _FUNC_('a', 'b', 'c', 'd', 'a');
+   4
+  > SELECT _FUNC_('999', 'a', 999, 9.99, '999');
+   4
+  """)
+// scalastyle:on line.size.limit
+case class Field(children: Seq[Expression]) extends Expression {
+
+  /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */
+  override def nullable: Boolean = false
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType)
+
+  private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter(
+_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray
+
+  override def checkInputDataTypes(): TypeCheckResult = {
--- End diff --

I ran into a problem: if I first try to cast all parameters to `DoubleType` in
`TypeCoercion` (as described in the 2nd paragraph of my last comment), I would
actually have to perform the cast to know whether it can succeed, but that
would 'execute' in the analysis stage, which seems wrong.





[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17287#discussion_r106333835
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalSessionCatalogSuite.scala
 ---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import org.apache.spark.sql.catalyst.catalog.{CatalogTestUtils, 
ExternalCatalog, SessionCatalogSuite}
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+
+class HiveExternalSessionCatalogSuite extends SessionCatalogSuite with TestHiveSingleton {
+
+  protected override val isHiveExternalCatalog = true
+
+  private val externalCatalog = {
+val catalog = spark.sharedState.externalCatalog
+catalog.asInstanceOf[HiveExternalCatalog].client.reset()
+catalog
+  }
+
+  protected val utils = new CatalogTestUtils {
+override val tableInputFormat: String = "org.apache.hadoop.mapred.SequenceFileInputFormat"
+override val tableOutputFormat: String =
+  "org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat"
+override val defaultProvider: String = "parquet"
--- End diff --

The above input and output formats do not match what you specified here.
Let us change it to `hive`.





[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive serde and ...

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16626
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive serde and ...

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16626
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74634/
Test PASSed.





[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive serde and ...

2017-03-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16626
  
**[Test build #74634 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74634/testReport)**
 for PR 16626 at commit 
[`7fbfc71`](https://github.com/apache/spark/commit/7fbfc7165e3bce388d4dc6e2c58487d4abf8d098).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17289: [SPARK-19948] Document that saveAsTable uses catalog as ...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17289
  
```Scala
  test("saveAsTable API with SaveMode.Overwrite") {
val df = spark.createDataFrame(sparkContext.parallelize(arr1x2), schema2)
spark.read.jdbc(url1, "test.people", properties).show()

df.write.format("jdbc").mode(SaveMode.ErrorIfExists)
  .option("url", url1)
  .option("dbtable", "test.people")
  .options(properties.asScala)
  .saveAsTable("j1")
spark.read.jdbc(url1, "test.people", properties).show()
  }
```

This is the test case I used. Previously, we respected the user-specified mode
`SaveMode.ErrorIfExists`. Now, we are not sending [the mode to the
`createRelation` API](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala#L181).
It might be an unexpected behavior change for external data source connectors.





[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17307
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17307
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74633/
Test PASSed.





[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...

2017-03-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17307
  
**[Test build #74633 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74633/testReport)**
 for PR 17307 at commit 
[`ffd6bde`](https://github.com/apache/spark/commit/ffd6bdeb543556d5e7f448c888ff4f00b5ba152d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17242: [SPARK-19902][SQL] Support more expression canonicalizat...

2017-03-15 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17242
  
hmm, so you don't think the canonicalizer should use this?





[GitHub] spark issue #17242: [SPARK-19902][SQL] Support more expression canonicalizat...

2017-03-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17242
  
not "integration", but "move". I think this logic belongs to optimizer 
instead of canonicalizer





[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...

2017-03-15 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/16905
  
reopened https://issues.apache.org/jira/browse/SPARK-7420 for the failure

Jenkins, retest this please





[GitHub] spark issue #17289: [SPARK-19948] Document that saveAsTable uses catalog as ...

2017-03-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17289
  
it's probably OK, since we turn an error case into a working one. But we
should document this in the release notes.





[GitHub] spark pull request #17251: [SPARK-19910][SQL] `stack` should not reject NULL...

2017-03-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17251#discussion_r106329490
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -590,6 +591,23 @@ object TypeCoercion {
   }
 
   /**
+   * Coerces NullTypes of a Stack function to the corresponding column types.
+   */
+  object StackCoercion extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+  case s @ Stack(children) if s.childrenResolved && s.children.head.dataType == IntegerType &&
+  s.children.head.foldable =>
+val schema = s.elementSchema
+Stack(children.zipWithIndex.map {
+  case (e, 0) => e
+  case (Literal(null, NullType), index: Int) =>
+Literal.create(null, schema.fields((index - 1) % schema.length).dataType)
--- End diff --

we can call `findDataType((index - 1) % s.numFields)` here
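
In context, the suggested change would look roughly like this (assuming the
`findDataType` helper and a `numFields` accessor from the PR's `Stack`
changes):

```scala
// Sketch of the revised case inside StackCoercion:
case (Literal(null, NullType), index: Int) =>
  Literal.create(null, s.findDataType((index - 1) % s.numFields))
```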





[GitHub] spark issue #17289: [SPARK-19948] Document that saveAsTable uses catalog as ...

2017-03-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17289
  
We introduced a behavior change in Spark 2.2. In Spark 2.1, we reported an 
error if the underlying JDBC table exists. We changed [the mode to 
`SaveMode.Overwrite`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala#L159)
 if the table does not exist in the catalog.





[GitHub] spark issue #17242: [SPARK-19902][SQL] Support more expression canonicalizat...

2017-03-15 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17242
  
ping @cloud-fan Except for the optimization integration, do you have more 
comments on this change? Thanks. 





[GitHub] spark pull request #17307: [SPARK-13369] Make number of consecutive fetch fa...

2017-03-15 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/17307#discussion_r106328854
  
--- Diff: docs/configuration.md ---
@@ -1506,6 +1506,11 @@ Apart from these, the following properties are also available, and may be useful
 of this setting is to act as a safety-net to prevent runaway uncancellable tasks from rendering
 an executor unusable.
   
+  spark.stage.maxConsecutiveAttempts
+  4
+  
+Number of consecutive stage retries allowed before a stage is aborted.
--- End diff --

there is an off-by-one difference between "attempts" and "retries" -- eg. if
this is set to 1, do you allow one retry, or do you give up after one attempt?
I realize this is super minor, but I remember dealing with confusion about
this for task failures. I don't think which one we use matters a ton, but the
implementation here is "attempt", so how about just rewording the doc to
"Number of consecutive stage attempts ...".

Same goes for the comments which use "retries".

(I see now this wording was my fault, from making one suggestion in one place,
and another elsewhere, sorry about that.)
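
To make the "attempts" semantics concrete, a hypothetical sketch (the field
and method names are illustrative, not the actual DAGScheduler code):

```scala
// With "attempts" semantics, a limit of 1 means zero retries: the stage is
// aborted as soon as its first attempt fails.
val maxConsecutiveAttempts = conf.getInt("spark.stage.maxConsecutiveAttempts", 4)
if (consecutiveFailedAttempts >= maxConsecutiveAttempts) {
  abortStage(stage, s"aborted after $consecutiveFailedAttempts consecutive failed attempts", None)
}
```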





[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-03-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17130
  
**[Test build #74636 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74636/testReport)**
 for PR 17130 at commit 
[`de1bfc8`](https://github.com/apache/spark/commit/de1bfc8eb48015ea629dea5bdc72ba913b76d234).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17130
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74636/
Test PASSed.





[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17130
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #17251: [SPARK-19910][SQL] `stack` should not reject NULL...

2017-03-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17251#discussion_r106328813
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -156,9 +156,21 @@ case class Stack(children: Seq[Expression]) extends Generator {
 }
   }
 
+  private def findDataType(column: Integer): DataType = {
--- End diff --

`column: Int`





[GitHub] spark pull request #17251: [SPARK-19910][SQL] `stack` should not reject NULL...

2017-03-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17251#discussion_r106328573
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -590,6 +591,23 @@ object TypeCoercion {
   }
 
   /**
+   * Coerces NullTypes of a Stack function to the corresponding column types.
+   */
+  object StackCoercion extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+  case s @ Stack(children) if s.childrenResolved && s.children.head.dataType == IntegerType &&
--- End diff --

we can put `s.children.head.dataType == IntegerType && 
s.children.head.foldable` in `Stack` as a method
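
For example, a sketch of such a helper (the method name is made up here):

```scala
// Inside the Stack expression: true when the first child (the row count)
// is a foldable integer, so the coercion rule can key off one predicate.
def hasFoldableNumRows: Boolean =
  children.head.dataType == IntegerType && children.head.foldable
```

The rule's guard would then read
`case s @ Stack(children) if s.childrenResolved && s.hasFoldableNumRows =>`.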





[GitHub] spark issue #17171: [SPARK-19830] [SQL] Add parseTableSchema API to ParserIn...

2017-03-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17171
  
LGTM





[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-03-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17295
  
> The penalty comes when transferring that encrypted data from disk. If the
> data ends up in memory again, it is as efficient as before; but if the
> evicted block needs to be transferred directly to a remote executor, then
> there's now a performance penalty, since the code now uses a custom
> FileRegion implementation to decrypt the data before transferring.

What's the actual difference? Did we previously transfer encrypted data?





[GitHub] spark issue #15363: [SPARK-17791][SQL] Join reordering using star schema det...

2017-03-15 Thread ioana-delaney
Github user ioana-delaney commented on the issue:

https://github.com/apache/spark/pull/15363
  
@gatorsmile @hvanhovell @wzhfy @ron8hu Please let me know if we can move 
forward with this review. @wzhfy  I removed the star-join call from 
CostBasedJoinReorder until the two are integrated. Because of that, our 
previous discussion is now hidden. Please take a look at my comment and let me 
know if you agree. 





[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16905
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74629/
Test FAILed.





[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16905
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...

2017-03-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16905
  
**[Test build #74629 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74629/testReport)**
 for PR 16905 at commit 
[`479c01d`](https://github.com/apache/spark/commit/479c01d43de71d03b3276cdd59f12083e7da31c9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class FakeSchedulerBackend extends SchedulerBackend `





[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17307
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17307
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74624/
Test PASSed.





[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...

2017-03-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17307
  
**[Test build #74624 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74624/testReport)**
 for PR 17307 at commit 
[`88800fa`](https://github.com/apache/spark/commit/88800fa933f2b036a4e58ec744e3e57482d5095b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17307
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74625/
Test PASSed.





[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17307
  
Merged build finished. Test PASSed.




