date:20160831

[GitHub] spark issue #14910: [SPARK-17271] [SQL] Remove redundant `semanticEquals()` ...

2016-08-31 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14910
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14905: [SPARK-17318][Tests]Fix ReplSuite replicating blocks of ...

2016-08-31 Thread ericl

Github user ericl commented on the issue:

https://github.com/apache/spark/pull/14905
  
Ah, too bad then. Lgtm



[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...

2016-08-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14883#discussion_r77117696
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,9 @@ abstract class ExternalCatalog {
 
   def listFunctions(db: String, pattern: String): Seq[String]
 
+  // --
+  // Resources
+  // --
+
+  def addJar(path: String): Unit
--- End diff --

This also implies that `InMemoryCatalog` can't work if users specify a
custom SerDe class in CREATE TABLE. Considering this, should we throw an
exception in `InMemoryCatalog.addJar`?
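
A minimal sketch of what throwing from the in-memory implementation could look like. The class names `SketchExternalCatalog` and `SketchInMemoryCatalog` are hypothetical, simplified stand-ins, not the classes in the patch, and the exception type is an assumption:

```scala
// Hypothetical, simplified stand-ins for the catalog classes under discussion;
// the real ExternalCatalog has many more members.
abstract class SketchExternalCatalog {
  def addJar(path: String): Unit
}

// An in-memory catalog has no metastore to hand the JAR to, so this sketch
// fails fast instead of silently ignoring the request.
class SketchInMemoryCatalog extends SketchExternalCatalog {
  override def addJar(path: String): Unit =
    throw new UnsupportedOperationException(
      s"addJar($path) is not supported by the in-memory catalog")
}
```

Failing fast makes the limitation visible at the `addJar` call rather than later, when a table with a custom SerDe fails to load its class.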



[GitHub] spark issue #14911: [SPARK-17355] Workaround for HIVE-14684 / HiveResultSetM...

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14911
  
**[Test build #64761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64761/consoleFull)** for PR 14911 at commit [`6b56880`](https://github.com/apache/spark/commit/6b56880aa78a599fdf255d3668a848d9ad09691b).



[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...

2016-08-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14883#discussion_r77117555
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,9 @@ abstract class ExternalCatalog {
 
   def listFunctions(db: String, pattern: String): Seq[String]
 
+  // --
+  // Resources
+  // --
+
+  def addJar(path: String): Unit
--- End diff --

LGTM



[GitHub] spark issue #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionState to...

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14883
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract...

2016-08-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14864#discussion_r77117501
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
@@ -156,24 +156,56 @@ case class FileSourceScanExec(
     false
   }
 
-  override val outputPartitioning: Partitioning = {
+  override val (outputPartitioning, outputOrdering): (Partitioning, Seq[SortOrder]) = {
     val bucketSpec = if (relation.sparkSession.sessionState.conf.bucketingEnabled) {
       relation.bucketSpec
     } else {
       None
     }
-    bucketSpec.map { spec =>
-      val numBuckets = spec.numBuckets
-      val bucketColumns = spec.bucketColumnNames.flatMap { n =>
-        output.find(_.name == n)
-      }
-      if (bucketColumns.size == spec.bucketColumnNames.size) {
-        HashPartitioning(bucketColumns, numBuckets)
-      } else {
-        UnknownPartitioning(0)
-      }
-    }.getOrElse {
-      UnknownPartitioning(0)
+    bucketSpec match {
+      case Some(spec) =>
+        val numBuckets = spec.numBuckets
+        val bucketColumns = spec.bucketColumnNames.flatMap { n =>
+          output.find(_.name == n)
+        }
+        if (bucketColumns.size == spec.bucketColumnNames.size) {
+          val partitioning = HashPartitioning(bucketColumns, numBuckets)
+
+          val sortOrder = if (spec.sortColumnNames.nonEmpty) {
+            // In case of bucketing, its possible to have multiple files belonging to the
+            // same bucket in a given relation. Each of these files are locally sorted
+            // but those files combined together are not globally sorted. Given that,
+            // the RDD partition will not be sorted even if the relation has sort columns set
+            // Current solution is to check if all the buckets have a single file in it
+
+            val files =
+              relation.location.listFiles(partitionFilters).flatMap(partition => partition.files)
+            val bucketToFilesGrouping =
+              files.map(_.getPath.getName).groupBy(file => BucketingUtils.getBucketId(file))
+            val singleFilePartitions = bucketToFilesGrouping.forall(p => p._2.length <= 1)
--- End diff --

Listing files and grouping them by bucket id can be expensive if there are
a lot of files. What's worse, we will do it again in `createBucketedReadRDD`.

Instead of doing this, I'd like to fix the sorting problem for bucketed
tables first; then we don't need to scan file names to get the `outputOrdering`.
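
To make the cost concrete, here is a standalone sketch of the check being discussed. It is not the code from the patch: `BucketFileCheck`, its regex, and the file-name convention it parses are assumptions made only for illustration.

```scala
object BucketFileCheck {
  // Assumed file-name convention for this sketch: "..._<bucketId>" just before
  // an optional extension, e.g. "part-00000_00003.parquet" is bucket 3.
  private val BucketedFileName = """.*_(\d+)(?:\..*)?$""".r

  def bucketId(fileName: String): Option[Int] = fileName match {
    case BucketedFileName(id) => Some(id.toInt)
    case _ => None
  }

  // True when no bucket id is shared by two or more files. Files without a
  // recognizable bucket id all fall into the same `None` group.
  def singleFilePerBucket(fileNames: Seq[String]): Boolean =
    fileNames.groupBy(bucketId).forall { case (_, files) => files.length <= 1 }
}
```

Every file name is parsed and grouped once per call, so the work grows with the number of files in the relation, which is the overhead the comment above objects to.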





[GitHub] spark issue #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionState to...

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14883
  
**[Test build #64755 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64755/consoleFull)** for PR 14883 at commit [`813d987`](https://github.com/apache/spark/commit/813d987816c037becbe0515353a100b1cdc4bb44).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14907: [SPARK-17351] Refactor JDBCRDD to expose ResultSet -> Se...

2016-08-31 Thread JoshRosen

Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/14907
  
Please merge #14911 ahead of this so that I can bring this up-to-date with 
that change. Merging in this order reduces the amount of work to backport 
#14911.



[GitHub] spark pull request #14911: [SPARK-17355] Workaround for HIVE-14684 / HiveRes...

2016-08-31 Thread JoshRosen

GitHub user JoshRosen opened a pull request:

https://github.com/apache/spark/pull/14911

[SPARK-17355] Workaround for HIVE-14684 / HiveResultSetMetaData.isSigned 
exception

## What changes were proposed in this pull request?

Attempting to use Spark SQL's JDBC data source against the Hive
ThriftServer results in a `java.sql.SQLException: Method not supported`
exception from `org.apache.hive.jdbc.HiveResultSetMetaData.isSigned`. Here are
two user reports of this issue:

- https://stackoverflow.com/questions/34067686/spark-1-5-1-not-working-with-hive-jdbc-1-2-0
- https://stackoverflow.com/questions/32195946/method-not-supported-in-spark

I have filed HIVE-14684 to attempt to fix this in Hive by implementing the 
isSigned method, but in the meantime / for compatibility with older JDBC 
drivers I think we should add special-case error handling to work around this 
bug.

This patch updates `JDBCRDD`'s `ResultSetMetaData`-to-schema conversion to
catch the "Method not supported" exception from Hive and return `isSigned =
true`. I believe that this is safe because, as far as I know, Hive does not
support unsigned numeric types.
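
The shape of such a workaround can be sketched as follows. This is not the code in the patch: `IsSignedWorkaround` and `isSignedOrDefault` are hypothetical names, and matching on the exception message text is an assumption of the sketch.

```scala
import java.sql.SQLException

object IsSignedWorkaround {
  // `query` stands in for a call such as `metadata.isSigned(columnIndex)`.
  // Matching on the message text is an assumption of this sketch.
  def isSignedOrDefault(query: => Boolean): Boolean =
    try query
    catch {
      case e: SQLException
          if e.getMessage != null && e.getMessage.contains("Method not supported") =>
        // Fall back to "signed", on the PR's stated belief that Hive has no
        // unsigned numeric types.
        true
    }
}
```

Any other `SQLException` does not match the guard and propagates to the caller, so unrelated driver failures are not hidden by the fallback.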

## How was this patch tested?

Tested manually against a Spark Thrift Server.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JoshRosen/spark hive-jdbc-workaround

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14911.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14911


commit 6b56880aa78a599fdf255d3668a848d9ad09691b
Author: Josh Rosen 
Date:   2016-09-01T05:43:51Z

Workaround for HIVE-14684





[GitHub] spark issue #14910: [SPARK-17271] [SQL] Remove redundant `semanticEquals()` ...

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14910
  
**[Test build #64760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64760/consoleFull)** for PR 14910 at commit [`56eb557`](https://github.com/apache/spark/commit/56eb55711581d68c9dbd6c01004f6f4cb45a7b6f).



[GitHub] spark pull request #14841: [SPARK-17271] [SQL] Planner adds un-necessary Sor...

2016-08-31 Thread tejasapatil

Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/14841#discussion_r77117090
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala ---
@@ -61,6 +61,9 @@ case class SortOrder(child: Expression, direction: SortDirection)
   override def sql: String = child.sql + " " + direction.sql
 
   def isAscending: Boolean = direction == Ascending
+
+  def semanticEquals(other: SortOrder): Boolean =
--- End diff --

@cloud-fan: I see what you were trying to say before. I tried that and it
worked. I have created a PR to clean it up:
https://github.com/apache/spark/pull/14910
Thanks for pointing this out!



[GitHub] spark issue #14910: [SPARK-17271] [SQL] Remove redundant `semanticEquals()` ...

2016-08-31 Thread tejasapatil

Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/14910
  
ok to test



[GitHub] spark pull request #14910: [SPARK-17271] [SQL] Remove redundant `semanticEqu...

2016-08-31 Thread tejasapatil

GitHub user tejasapatil opened a pull request:

https://github.com/apache/spark/pull/14910

[SPARK-17271] [SQL] Remove redundant `semanticEquals()` from `SortOrder`

## What changes were proposed in this pull request?

Remove `semanticEquals()` from `SortOrder` because it can use the
`semanticEquals()` provided by its parent class (`Expression`). This follows
the suggestion by @cloud-fan at
https://github.com/apache/spark/pull/14841/files/7192418b3a26a14642fc04fc92bf496a954ffa5d#r77106801
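
As a toy illustration of why the override is redundant (these are not Spark's real classes, and this thread does not show how `Expression.semanticEquals` is implemented), a parent can define the comparison once over a normalized form that each subclass supplies. `Expr`, `Column`, `Order`, and `canonical` are hypothetical names:

```scala
// Toy model, not Spark's real classes: the parent defines the comparison once
// over a normalized form, and subclasses inherit it unchanged.
abstract class Expr {
  def canonical: String
  def semanticEquals(other: Expr): Boolean = canonical == other.canonical
}

// Case-insensitive column names are an assumption of this toy model.
case class Column(name: String) extends Expr {
  def canonical: String = name.toLowerCase
}

// No semanticEquals override here: the inherited one already accounts for
// both the child and the direction through `canonical`.
case class Order(child: Expr, ascending: Boolean) extends Expr {
  def canonical: String = s"${child.canonical} ${if (ascending) "ASC" else "DESC"}"
}
```

With this layout, `Order(Column("a"), true).semanticEquals(Order(Column("A"), true))` holds without any code in `Order` beyond its normalized form.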

## How was this patch tested?

Ran the test added in https://github.com/apache/spark/pull/14841

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tejasapatil/spark SPARK-17271_remove_semantic_ordering

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14910.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14910


commit 56eb55711581d68c9dbd6c01004f6f4cb45a7b6f
Author: Tejas Patil 
Date:   2016-09-01T05:44:14Z

[SPARK-17271] [SQL] Remove redundant `semanticEquals()` from `SortOrder`





[GitHub] spark pull request #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] F...

2016-08-31 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14531#discussion_r77116198
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -660,6 +662,236 @@ class HiveDDLSuite
     }
   }
 
+  test("CREATE TABLE LIKE a temporary view") {
+    val sourceViewName = "tab1"
+    val targetTabName = "tab2"
+    withTempView(sourceViewName) {
+      withTable(targetTabName) {
+        spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+          .createTempView(sourceViewName)
+        sql(s"CREATE TABLE $targetTabName LIKE $sourceViewName")
+
+        val sourceTable = spark.sessionState.catalog.getTableMetadata(
+          TableIdentifier(sourceViewName, None))
+        val targetTable = spark.sessionState.catalog.getTableMetadata(
+          TableIdentifier(targetTabName, Some("default")))
+
+        checkCreateTableLike(sourceTable, targetTable)
+      }
+    }
+  }
+
+  test("CREATE TABLE LIKE a data source table") {
+    val sourceTabName = "tab1"
+    val targetTabName = "tab2"
+    withTable(sourceTabName, targetTabName) {
+      spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+        .write.format("json").saveAsTable(sourceTabName)
+      sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName")
+
+      val sourceTable =
+        spark.sessionState.catalog.getTableMetadata(TableIdentifier(sourceTabName, Some("default")))
+      val targetTable =
+        spark.sessionState.catalog.getTableMetadata(TableIdentifier(targetTabName, Some("default")))
+      // The table type of the source table should be a Hive-managed data source table
+      assert(DDLUtils.isDatasourceTable(sourceTable))
+      assert(sourceTable.tableType == CatalogTableType.MANAGED)
+
+      checkCreateTableLike(sourceTable, targetTable)
+    }
+  }
+
+  test("CREATE TABLE LIKE an external data source table") {
+    val sourceTabName = "tab1"
+    val targetTabName = "tab2"
+    withTable(sourceTabName, targetTabName) {
+      withTempPath { dir =>
+        val path = dir.getCanonicalPath
+        spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+          .write.format("parquet").save(path)
+        sql(s"CREATE TABLE $sourceTabName USING parquet OPTIONS (PATH '$path')")
+        sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName")
+
+        // The source table should be an external data source table
+        val sourceTable = spark.sessionState.catalog.getTableMetadata(
+          TableIdentifier(sourceTabName, Some("default")))
+        val targetTable = spark.sessionState.catalog.getTableMetadata(
+          TableIdentifier(targetTabName, Some("default")))
+        // The table type of the source table should be an external data source table
+        assert(DDLUtils.isDatasourceTable(sourceTable))
+        assert(sourceTable.tableType == CatalogTableType.EXTERNAL)
+
+        checkCreateTableLike(sourceTable, targetTable)
+      }
+    }
+  }
+
+  test("CREATE TABLE LIKE a managed Hive serde table") {
+    val catalog = spark.sessionState.catalog
+    val sourceTabName = "tab1"
+    val targetTabName = "tab2"
+    withTable(sourceTabName, targetTabName) {
+      sql(s"CREATE TABLE $sourceTabName TBLPROPERTIES('prop1'='value1') AS SELECT 1 key, 'a'")
+      sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName")
+
+      val sourceTable = catalog.getTableMetadata(TableIdentifier(sourceTabName, Some("default")))
+      assert(sourceTable.tableType == CatalogTableType.MANAGED)
+      assert(sourceTable.properties.get("prop1").nonEmpty)
+      val targetTable = catalog.getTableMetadata(TableIdentifier(targetTabName, Some("default")))
+
+      checkCreateTableLike(sourceTable, targetTable)
+    }
+  }
+
+  test("CREATE TABLE LIKE an external Hive serde table") {
+    val catalog = spark.sessionState.catalog
+    withTempDir { tmpDir =>
+      val basePath = tmpDir.getCanonicalPath
+      val sourceTabName = "tab1"
+      val targetTabName = "tab2"
+      withTable(sourceTabName, targetTabName) {
+        assert(tmpDir.listFiles.isEmpty)
+        sql(
+          s"""
+             |CREATE EXTERNAL TABLE $sourceTabName (key INT comment 'test', value STRING)
+             |COMMENT 'Apache Spark'
+             |PARTITIONED BY (ds STRING, hr STRING)
+             |LOCATION '$basePath'
+           """.stripMargin)
+        for (ds <- Seq("2008-04-08", "2008-04-09"); hr <- Seq("11", "12")) {
+          sql(
+            s"""
+               |INSERT OVERWRITE TABLE

[GitHub] spark pull request #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] F...

2016-08-31 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14531#discussion_r77116211
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -660,6 +662,236 @@ class HiveDDLSuite
     }
   }
 
+  test("CREATE TABLE LIKE a temporary view") {
+    val sourceViewName = "tab1"
+    val targetTabName = "tab2"
+    withTempView(sourceViewName) {
+      withTable(targetTabName) {
+        spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+          .createTempView(sourceViewName)
+        sql(s"CREATE TABLE $targetTabName LIKE $sourceViewName")
+
+        val sourceTable = spark.sessionState.catalog.getTableMetadata(
+          TableIdentifier(sourceViewName, None))
+        val targetTable = spark.sessionState.catalog.getTableMetadata(
+          TableIdentifier(targetTabName, Some("default")))
+
+        checkCreateTableLike(sourceTable, targetTable)
+      }
+    }
+  }
+
+  test("CREATE TABLE LIKE a data source table") {
+    val sourceTabName = "tab1"
+    val targetTabName = "tab2"
+    withTable(sourceTabName, targetTabName) {
+      spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+        .write.format("json").saveAsTable(sourceTabName)
+      sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName")
+
+      val sourceTable =
+        spark.sessionState.catalog.getTableMetadata(TableIdentifier(sourceTabName, Some("default")))
+      val targetTable =
+        spark.sessionState.catalog.getTableMetadata(TableIdentifier(targetTabName, Some("default")))
+      // The table type of the source table should be a Hive-managed data source table
+      assert(DDLUtils.isDatasourceTable(sourceTable))
+      assert(sourceTable.tableType == CatalogTableType.MANAGED)
+
+      checkCreateTableLike(sourceTable, targetTable)
+    }
+  }
+
+  test("CREATE TABLE LIKE an external data source table") {
+    val sourceTabName = "tab1"
+    val targetTabName = "tab2"
+    withTable(sourceTabName, targetTabName) {
+      withTempPath { dir =>
+        val path = dir.getCanonicalPath
+        spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+          .write.format("parquet").save(path)
+        sql(s"CREATE TABLE $sourceTabName USING parquet OPTIONS (PATH '$path')")
+        sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName")
+
+        // The source table should be an external data source table
+        val sourceTable = spark.sessionState.catalog.getTableMetadata(
+          TableIdentifier(sourceTabName, Some("default")))
+        val targetTable = spark.sessionState.catalog.getTableMetadata(
+          TableIdentifier(targetTabName, Some("default")))
+        // The table type of the source table should be an external data source table
+        assert(DDLUtils.isDatasourceTable(sourceTable))
+        assert(sourceTable.tableType == CatalogTableType.EXTERNAL)
+
+        checkCreateTableLike(sourceTable, targetTable)
+      }
+    }
+  }
+
+  test("CREATE TABLE LIKE a managed Hive serde table") {
+    val catalog = spark.sessionState.catalog
+    val sourceTabName = "tab1"
+    val targetTabName = "tab2"
+    withTable(sourceTabName, targetTabName) {
+      sql(s"CREATE TABLE $sourceTabName TBLPROPERTIES('prop1'='value1') AS SELECT 1 key, 'a'")
+      sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName")
+
+      val sourceTable = catalog.getTableMetadata(TableIdentifier(sourceTabName, Some("default")))
+      assert(sourceTable.tableType == CatalogTableType.MANAGED)
+      assert(sourceTable.properties.get("prop1").nonEmpty)
+      val targetTable = catalog.getTableMetadata(TableIdentifier(targetTabName, Some("default")))
+
+      checkCreateTableLike(sourceTable, targetTable)
+    }
+  }
+
+  test("CREATE TABLE LIKE an external Hive serde table") {
+    val catalog = spark.sessionState.catalog
+    withTempDir { tmpDir =>
+      val basePath = tmpDir.getCanonicalPath
+      val sourceTabName = "tab1"
+      val targetTabName = "tab2"
+      withTable(sourceTabName, targetTabName) {
+        assert(tmpDir.listFiles.isEmpty)
+        sql(
+          s"""
+             |CREATE EXTERNAL TABLE $sourceTabName (key INT comment 'test', value STRING)
+             |COMMENT 'Apache Spark'
+             |PARTITIONED BY (ds STRING, hr STRING)
+             |LOCATION '$basePath'
+           """.stripMargin)
+        for (ds <- Seq("2008-04-08", "2008-04-09"); hr <- Seq("11", "12")) {
+          sql(
+            s"""
+               |INSERT OVERWRITE TABLE

[GitHub] spark issue #14823: [SPARK-17257][SQL] the physical plan of CREATE TABLE or ...

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14823
  
**[Test build #64759 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64759/consoleFull)** for PR 14823 at commit [`00bf25b`](https://github.com/apache/spark/commit/00bf25b86f8d0f854013f17ae1850552156eda8e).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14823: [SPARK-17257][SQL] the physical plan of CREATE TABLE or ...

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14823
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64759/
Test FAILed.



[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...

2016-08-31 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14883#discussion_r77115989
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,9 @@ abstract class ExternalCatalog {
 
   def listFunctions(db: String, pattern: String): Seq[String]
 
+  // --
+  // Resources
+  // --
+
+  def addJar(path: String): Unit
--- End diff --

Let me rephrase it. 

> Add a JAR resource to the underlying external catalog for DDL (e.g., CREATE TABLE) and DML (e.g., LOAD TABLE) operations.

> For example, when users create a Hive serde table, they can specify a custom Serializer-Deserializer (SerDe) class. When the Hive metastore is unable to access the custom SerDe JAR (e.g., it is not on the Hive classpath), the JAR file must be added at runtime using this API.





[GitHub] spark issue #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] Fix mult...

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14531
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64754/
Test PASSed.



[GitHub] spark issue #14823: [SPARK-17257][SQL] the physical plan of CREATE TABLE or ...

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14823
  
**[Test build #64759 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64759/consoleFull)** for PR 14823 at commit [`00bf25b`](https://github.com/apache/spark/commit/00bf25b86f8d0f854013f17ae1850552156eda8e).



[GitHub] spark issue #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] Fix mult...

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14531
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] Fix mult...

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14531
  
**[Test build #64754 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64754/consoleFull)** for PR 14531 at commit [`4ce96e6`](https://github.com/apache/spark/commit/4ce96e62adaa28965fb7c85e246ce2e1c86eba60).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...

2016-08-31 Thread kevinyu98

Github user kevinyu98 commented on the issue:

https://github.com/apache/spark/pull/12646
  
@chenghao-intel I have updated the code based on your comments. Thanks a
lot.

Sure, I will work on that JIRA. So the fix is to remove only the space
character and nothing else, right? Will that break existing applications that
rely on this function to remove spaces and other characters less than 0x20 and
greater than 0?
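
For context on the compatibility question, a minimal sketch of the difference on the JVM: `java.lang.String.trim` strips every leading and trailing character at or below U+0020, not only the space. `trimSpacesOnly` is a hypothetical helper written for this illustration; it is not a Spark function and not the change proposed in the PR.

```scala
object TrimSketch {
  // Hypothetical helper (not part of Spark): strips only U+0020 from both ends.
  def trimSpacesOnly(s: String): String =
    s.dropWhile(_ == ' ').reverse.dropWhile(_ == ' ').reverse

  def main(args: Array[String]): Unit = {
    val input = " \thello\t "
    // String.trim also removes the tabs, because '\t' (U+0009) is below U+0020.
    println(input.trim == "hello")
    // The space-only variant leaves the tabs in place.
    println(trimSpacesOnly(input) == "\thello\t")
  }
}
```

An application that depends on tabs or newlines being stripped would see different results under the space-only behavior.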




[GitHub] spark issue #14909: revert PR#10896 and PR#14865

2016-08-31 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/14909
  
okay, thanks!



[GitHub] spark pull request #14909: revert PR#10896 and PR#14865

2016-08-31 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14909



[GitHub] spark issue #14876: showcase, DO NOT MERGE

2016-08-31 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14876
  
closing, @maropu will take over



[GitHub] spark pull request #14876: showcase, DO NOT MERGE

2016-08-31 Thread cloud-fan

Github user cloud-fan closed the pull request at:

https://github.com/apache/spark/pull/14876



[GitHub] spark issue #14909: revert PR#10896 and PR#14865

2016-08-31 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14909
  
merging to master!



[GitHub] spark pull request #14841: [SPARK-17271] [SQL] Planner adds un-necessary Sor...

2016-08-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14841#discussion_r77114998
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala ---
@@ -61,6 +61,9 @@ case class SortOrder(child: Expression, direction: SortDirection)
   override def sql: String = child.sql + " " + direction.sql
 
   def isAscending: Boolean = direction == Ascending
+
+  def semanticEquals(other: SortOrder): Boolean =
--- End diff --

Yeah, I understand that in `EnsureRequirements` we should use `semanticEquals`
instead of `==` to compare `SortOrder`, but why do we need to implement
`semanticEquals` again in `SortOrder`? What's wrong with the default
implementation?

I mean, there is no need to "introduce" a `semanticEquals` in `SortOrder`;
it already has one.



[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-31 Thread Sherry302

Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
@steveloughran Thank you very much. I have updated the PR based on your
comments. I have also added a unit test.



[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14783
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14783
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64756/
Test PASSed.



[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14783
  
**[Test build #64756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64756/consoleFull)** for PR 14783 at commit [`77fa9b4`](https://github.com/apache/spark/commit/77fa9b4bb121455d51b43ba8705d876e2549850c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-31 Thread Sherry302

Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14659
  
@srowen Thanks all the same.



[GitHub] spark pull request #14868: [SPARK-16283][SQL] Implements percentile_approx a...

2016-08-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14868#discussion_r77114814
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala ---
@@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import java.nio.ByteBuffer
+
+import com.google.common.primitives.{Doubles, Ints, Longs}
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.{InternalRow}
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile.{PercentileDigest}
+import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
+import org.apache.spark.sql.catalyst.util.QuantileSummaries
+import org.apache.spark.sql.catalyst.util.QuantileSummaries.{defaultCompressThreshold, Stats}
+import org.apache.spark.sql.types._
+
+/**
+ * The ApproximatePercentile function returns the approximate percentile(s) of a column at the given
+ * percentage(s). A percentile is a watermark value below which a given percentage of the column
+ * values fall. For example, the percentile of column `col` at percentage 50% is the median of
+ * column `col`.
+ *
+ * This function supports partial aggregation.
+ *
+ * @param child child expression that can produce column value with `child.eval(inputRow)`
+ * @param percentageExpression Expression that represents a single percentage value or
+ *                             an array of percentage values. Each percentage value must be between
+ *                             0.0 and 1.0.
+ * @param accuracyExpression Integer literal expression of approximation accuracy. Higher value
+ *                           yields better accuracy, the default value is
+ *                           DEFAULT_PERCENTILE_ACCURACY.
+ */
+@ExpressionDescription(
+  usage =
+    """
+      _FUNC_(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
+      column `col` at the given percentage. The value of percentage must be between 0.0
+      and 1.0. The `accuracy` parameter (default: 1) is a positive integer literal which
+      controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
+      better accuracy, `1.0/accuracy` is the relative error of the approximation.
+
+      _FUNC_(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate
+      percentile array of column `col` at the given percentage array. Each value of the
+      percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 1) is
+       a positive integer literal which controls approximation accuracy at the cost of memory.
+       Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of
+       the approximation.
+    """)
+case class ApproximatePercentile(
+    child: Expression,
+    percentageExpression: Expression,
+    accuracyExpression: Expression,
+    override val mutableAggBufferOffset: Int,
+    override val inputAggBufferOffset: Int) extends TypedImperativeAggregate[PercentileDigest] {
+
+  def this(child: Expression, percentageExpression: Expression, accuracyExpression: Expression) = {
+    this(child, percentageExpression, accuracyExpression, 0, 0)
+  }
+
+  def this(child: Expression, percentageExpression: Expression) = {
+    this(child, percentageExpression, Literal(ApproximatePercentile.DEFAULT_PERCENTILE_ACCURACY))
+  }
+
+  // Mark as lazy so that accuracyExpression is not evaluated during tree transformation.
+  private lazy
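
The percentile definition in the quoted Scaladoc, and the `1.0/accuracy` relative-error relation from the usage text, can be sketched as follows. This is an exact nearest-rank computation by sorting, chosen only for illustration; it is not the approximate algorithm the PR implements, and `exactPercentile`, `relativeError`, and the accuracy value used in the usage note are assumptions.

```scala
object PercentileSketch {
  // Nearest-rank percentile: the smallest value with at least `percentage`
  // of the data at or below it. One of several common conventions.
  def exactPercentile(values: Seq[Double], percentage: Double): Double = {
    require(values.nonEmpty, "values must be non-empty")
    require(percentage >= 0.0 && percentage <= 1.0, "percentage must be in [0.0, 1.0]")
    val sorted = values.sorted
    val rank = math.ceil(percentage * sorted.length).toInt
    sorted(math.max(rank - 1, 0))
  }

  // Relative error implied by the usage text: 1.0 / accuracy.
  def relativeError(accuracy: Int): Double = {
    require(accuracy > 0, "accuracy must be a positive integer")
    1.0 / accuracy
  }
}
```

With five values 1.0 through 5.0, `exactPercentile(values, 0.5)` picks the third sorted value, which is the median; an illustrative accuracy of 100 corresponds to a relative error of 0.01.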

[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14452
  
**[Test build #64758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64758/consoleFull)** for PR 14452 at commit [`e9b0952`](https://github.com/apache/spark/commit/e9b09527ca98b3f99b43be3a028f04a207422389).



[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14659
  
**[Test build #64757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64757/consoleFull)** for PR 14659 at commit [`ae42093`](https://github.com/apache/spark/commit/ae42093e59e37d0a4fda4280f2bbffec18c594d3).



[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...

2016-08-31 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14883#discussion_r77114469
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,9 @@ abstract class ExternalCatalog {
 
   def listFunctions(db: String, pattern: String): Seq[String]
 
+  // --
+  // Resources
+  // --
+
+  def addJar(path: String): Unit
--- End diff --

Add a jar resource to the underlying external catalog system for DDL
operations, followed by the Hive example.



[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...

2016-08-31 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14883#discussion_r77114400
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,9 @@ abstract class ExternalCatalog {
 
   def listFunctions(db: String, pattern: String): Seq[String]
 
+  // --
+  // Resources
+  // --
+
+  def addJar(path: String): Unit
--- End diff --

yea, I don't think we should limit `addJar` semantics to Hive.



[GitHub] spark pull request #14868: [SPARK-16283][SQL] Implements percentile_approx a...

2016-08-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14868#discussion_r77114139
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala ---
@@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import java.nio.ByteBuffer
+
+import com.google.common.primitives.{Doubles, Ints, Longs}
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.{InternalRow}
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile.{PercentileDigest}
+import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
+import org.apache.spark.sql.catalyst.util.QuantileSummaries
+import org.apache.spark.sql.catalyst.util.QuantileSummaries.{defaultCompressThreshold, Stats}
+import org.apache.spark.sql.types._
+
+/**
+ * The ApproximatePercentile function returns the approximate percentile(s) of a column at the given
+ * percentage(s). A percentile is a watermark value below which a given percentage of the column
+ * values fall. For example, the percentile of column `col` at percentage 50% is the median of
+ * column `col`.
+ *
+ * This function supports partial aggregation.
+ *
+ * @param child child expression that can produce column value with `child.eval(inputRow)`
+ * @param percentageExpression Expression that represents a single percentage value or
+ *                             an array of percentage values. Each percentage value must be between
+ *                             0.0 and 1.0.
+ * @param accuracyExpression Integer literal expression of approximation accuracy. Higher value
+ *                           yields better accuracy, the default value is
+ *                           DEFAULT_PERCENTILE_ACCURACY.
+ */
+@ExpressionDescription(
+  usage =
+    """
+      _FUNC_(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
+      column `col` at the given percentage. The value of percentage must be between 0.0
+      and 1.0. The `accuracy` parameter (default: 1) is a positive integer literal which
+      controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
+      better accuracy, `1.0/accuracy` is the relative error of the approximation.
+
+      _FUNC_(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate
+      percentile array of column `col` at the given percentage array. Each value of the
+      percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 1) is
+       a positive integer literal which controls approximation accuracy at the cost of memory.
+       Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of
+       the approximation.
+    """)
+case class ApproximatePercentile(
+    child: Expression,
+    percentageExpression: Expression,
+    accuracyExpression: Expression,
+    override val mutableAggBufferOffset: Int,
+    override val inputAggBufferOffset: Int) extends TypedImperativeAggregate[PercentileDigest] {
+
+  def this(child: Expression, percentageExpression: Expression, accuracyExpression: Expression) = {
+    this(child, percentageExpression, accuracyExpression, 0, 0)
+  }
+
+  def this(child: Expression, percentageExpression: Expression) = {
+    this(child, percentageExpression, Literal(ApproximatePercentile.DEFAULT_PERCENTILE_ACCURACY))
+  }
+
+  // Mark as lazy so that accuracyExpression is not evaluated during tree transformation.
+  private lazy

[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...

2016-08-31 Thread angolon

Github user angolon commented on the issue:

https://github.com/apache/spark/pull/14710
  
retest this please



[GitHub] spark issue #14909: revert PR#10896 and PR#14865

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14909
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14909: revert PR#10896 and PR#14865

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14909
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64752/
Test PASSed.



[GitHub] spark issue #14909: revert PR#10896 and PR#14865

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14909
  
**[Test build #64752 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64752/consoleFull)** for PR 14909 at commit [`78cf93b`](https://github.com/apache/spark/commit/78cf93bf7c7aafd2fdbfe8d1e3f7c3c6391a0429).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #11119: [SPARK-10780][ML][WIP] Add initial model to kmeans

2016-08-31 Thread yinxusen

Github user yinxusen commented on the issue:

https://github.com/apache/spark/pull/9
  
Thanks @sethah and @dbtsai, I'll fix them soon.



[GitHub] spark pull request #14841: [SPARK-17271] [SQL] Planner adds un-necessary Sor...

2016-08-31 Thread tejasapatil

Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/14841#discussion_r77113690
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala ---
@@ -61,6 +61,9 @@ case class SortOrder(child: Expression, direction: SortDirection)
   override def sql: String = child.sql + " " + direction.sql
 
   def isAscending: Boolean = direction == Ascending
+
+  def semanticEquals(other: SortOrder): Boolean =
--- End diff --

@cloud-fan: If you look at the old version of `EnsureRequirements` below
at L253, it compared raw `SortOrder` objects, which uses the `equals()`
generated for the case class. In Scala, `equals()` for a case class merely
calls `equals()` on each of its fields, so that led to `Expression`'s
`equals()` being used instead of its `semanticEquals()`.

My fix here was to introduce a `semanticEquals` in `SortOrder` which
compares the underlying `Expression` semantically.
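
The distinction described above can be sketched with toy classes. `Expr`, `Col`, and `Order` are stand-ins invented for this illustration, not Catalyst's `Expression` or `SortOrder`, and case-insensitive name matching is only an assumed placeholder for whatever Catalyst treats as a cosmetic difference.

```scala
object SemanticEqualsSketch {
  // Toy stand-in for an expression tree node.
  sealed trait Expr {
    // Semantic equality ignores cosmetic differences (here: letter case of the name).
    def semanticEquals(other: Expr): Boolean
  }

  case class Col(name: String) extends Expr {
    def semanticEquals(other: Expr): Boolean = other match {
      case Col(n) => n.equalsIgnoreCase(name)
      case _ => false
    }
  }

  case class Order(child: Expr, ascending: Boolean) {
    // Delegates to the child's semantic comparison instead of the generated equals().
    def semanticEquals(other: Order): Boolean =
      ascending == other.ascending && child.semanticEquals(other.child)
  }

  def main(args: Array[String]): Unit = {
    val a = Order(Col("id"), ascending = true)
    val b = Order(Col("ID"), ascending = true)
    println(a == b)               // generated equals(): field-by-field comparison
    println(a.semanticEquals(b))  // semantic comparison of the child
  }
}
```

In this sketch the generated `equals()` treats the two orders as different because their `Col` fields differ structurally, while `semanticEquals` treats them as the same ordering.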



[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...

2016-08-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14883#discussion_r77113584
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,9 @@ abstract class ExternalCatalog {
 
   def listFunctions(db: String, pattern: String): Seq[String]
 
+  // --
+  // Resources
+  // --
+
+  def addJar(path: String): Unit
--- End diff --

Do we have to mention Hive here? I'd like to add some documentation here to
describe the semantics, which can explain why `InMemoryCatalog` can do nothing
while `HiveExternalCatalog` needs some extra logic.
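
One way such documented semantics could look, as a minimal sketch: the classes below are simplified stand-ins invented for illustration, not the real `ExternalCatalog`, `InMemoryCatalog`, or `HiveExternalCatalog`, and the buffer only fakes the step of forwarding a jar to an external system.

```scala
import scala.collection.mutable.ArrayBuffer

object AddJarSketch {
  // Simplified stand-in for the abstract catalog; only the method under discussion is modeled.
  abstract class CatalogSketch {
    /** Makes a jar available to the underlying catalog system for later DDL operations. */
    def addJar(path: String): Unit
  }

  // An in-memory catalog has no external system to inform, so this is a no-op.
  class InMemorySketch extends CatalogSketch {
    override def addJar(path: String): Unit = ()
  }

  // A metastore-backed catalog has to forward the jar; the forwarding is faked with a buffer.
  class MetastoreBackedSketch extends CatalogSketch {
    val forwardedJars: ArrayBuffer[String] = ArrayBuffer.empty[String]
    override def addJar(path: String): Unit = {
      forwardedJars += path
      ()
    }
  }
}
```

Phrasing the contract in terms of "the underlying catalog system" keeps the abstract method free of Hive-specific wording while still explaining why one implementation is empty.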



[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...

2016-08-31 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14883#discussion_r77113302
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,9 @@ abstract class ExternalCatalog {
 
   def listFunctions(db: String, pattern: String): Seq[String]
 
+  // --
+  // Resources
+  // --
+
+  def addJar(path: String): Unit
--- End diff --

Add a resource to the underlying Hive metastore for DDL operations.

For example, if we do not use HiveClient to pass the `ADD JAR` command to
the Hive metastore, we are unable to create the table. Thus, it sounds fine to
put `addJar` into `ExternalCatalog`.
```Scala
val testJar = TestHive.getHiveFile("hive-hcatalog-core-0.13.1.jar").getCanonicalPath
sql(s"ADD JAR $testJar")
sql(
  """
    |CREATE TABLE t1(a string, b string)
    |ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
  """.stripMargin)
```




[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...

2016-08-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14712#discussion_r77113222
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala ---
@@ -88,24 +85,53 @@ case class AnalyzeTableCommand(tableName: String) extends RunnableCommand {
             }
           }.getOrElse(0L)
 
-        // Update the Hive metastore if the total size of the table is different than the size
-        // recorded in the Hive metastore.
-        // This logic is based on org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats().
-        if (newTotalSize > 0 && newTotalSize != oldTotalSize) {
-          sessionState.catalog.alterTable(
-            catalogTable.copy(
-              properties = relation.catalogTable.properties +
-                (AnalyzeTableCommand.TOTAL_SIZE_FIELD -> newTotalSize.toString)))
-        }
+        updateTableStats(
+          catalogTable,
+          oldTotalSize = catalogTable.stats.map(_.sizeInBytes.toLong).getOrElse(0L),
+          oldRowCount = catalogTable.stats.flatMap(_.rowCount.map(_.toLong)).getOrElse(-1L),
+          newTotalSize = newTotalSize)
+
+      // data source tables have been converted into LogicalRelations
+      case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined =>
+        updateTableStats(
+          logicalRel.catalogTable.get,
+          oldTotalSize = logicalRel.statistics.sizeInBytes.toLong,
+          oldRowCount = logicalRel.statistics.rowCount.map(_.toLong).getOrElse(-1L),
+          newTotalSize = logicalRel.relation.sizeInBytes)
--- End diff --

It looks like `logicalRel.relation.sizeInBytes` is always equal to
`logicalRel.statistics.sizeInBytes.toLong`?
```
@transient override lazy val statistics: Statistics = {
  catalogTable.flatMap(_.stats.map(_.copy(sizeInBytes = relation.sizeInBytes))).getOrElse(
    Statistics(sizeInBytes = relation.sizeInBytes))
}
```
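
To make the observation concrete, here is a minimal, self-contained sketch. The case classes are simplified stand-ins (assumptions, not Spark's real `Statistics`, `CatalogTable`, or `LogicalRelation`), kept only detailed enough to mirror the quoted `statistics` definition. Under those assumptions, whatever size the catalog recorded is overwritten by `relation.sizeInBytes`, so the `oldTotalSize` and `newTotalSize` arguments in the diff above would carry the same value.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's classes.
final case class Statistics(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)
final case class CatalogTable(stats: Option[Statistics])
final case class BaseRelation(sizeInBytes: Long)

final case class LogicalRelation(relation: BaseRelation, catalogTable: Option[CatalogTable]) {
  // Same shape as the quoted definition: catalog stats are kept, except that
  // sizeInBytes is always replaced with relation.sizeInBytes.
  lazy val statistics: Statistics =
    catalogTable
      .flatMap(_.stats.map(_.copy(sizeInBytes = BigInt(relation.sizeInBytes))))
      .getOrElse(Statistics(sizeInBytes = BigInt(relation.sizeInBytes)))
}

object StatsDemo {
  def main(args: Array[String]): Unit = {
    // The catalog holds a stale size (10 bytes) and a row count of 3.
    val stale = CatalogTable(Some(Statistics(BigInt(10), Some(BigInt(3)))))
    val rel = LogicalRelation(BaseRelation(sizeInBytes = 4096L), Some(stale))

    // The stale size never surfaces; both expressions yield the relation's size.
    assert(rel.statistics.sizeInBytes.toLong == rel.relation.sizeInBytes)
    // Other catalog statistics, such as the row count, are preserved.
    assert(rel.statistics.rowCount == Some(BigInt(3)))
  }
}
```

If that holds in the real class as well, the previous size would have to be read from `catalogTable.stats` directly for the comparison to be meaningful.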



[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...

2016-08-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14712#discussion_r77113174
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -52,7 +52,8 @@ case class LogicalRelation(
 
   // Logical Relations are distinct if they have different output for the sake of transformations.
   override def equals(other: Any): Boolean = other match {
-case l @ LogicalRelation(otherRelation, _, _) => relation == otherRelation && output == l.output
+case l @ LogicalRelation(otherRelation, _, _) =>
+  relation == otherRelation && output == l.output
--- End diff --

unnecessary change?



[GitHub] spark pull request #14856: [SPARK-17241][SparkR][MLlib] SparkR spark.glm sho...

2016-08-31 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14856



[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...

2016-08-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14712#discussion_r77113054
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala ---
@@ -88,24 +85,53 @@ case class AnalyzeTableCommand(tableName: String) extends RunnableCommand {
 }
   }.getOrElse(0L)
 
-// Update the Hive metastore if the total size of the table is different than the size
-// recorded in the Hive metastore.
-// This logic is based on org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats().
-if (newTotalSize > 0 && newTotalSize != oldTotalSize) {
-  sessionState.catalog.alterTable(
-catalogTable.copy(
-  properties = relation.catalogTable.properties +
-(AnalyzeTableCommand.TOTAL_SIZE_FIELD -> newTotalSize.toString)))
-}
+updateTableStats(
+  catalogTable,
+  oldTotalSize = catalogTable.stats.map(_.sizeInBytes.toLong).getOrElse(0L),
+  oldRowCount = catalogTable.stats.flatMap(_.rowCount.map(_.toLong)).getOrElse(-1L),
+  newTotalSize = newTotalSize)
+
+  // data source tables have been converted into LogicalRelations
+  case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined =>
+updateTableStats(
+  logicalRel.catalogTable.get,
+  oldTotalSize = logicalRel.statistics.sizeInBytes.toLong,
+  oldRowCount = logicalRel.statistics.rowCount.map(_.toLong).getOrElse(-1L),
+  newTotalSize = logicalRel.relation.sizeInBytes)
--- End diff --

It looks like `logicalRel.relation.sizeInBytes` is always equal to
`logicalRel.statistics.sizeInBytes.toLong`?
```
@transient override lazy val statistics: Statistics = Statistics(
  sizeInBytes = BigInt(relation.sizeInBytes)
)
```



[GitHub] spark issue #14856: [SPARK-17241][SparkR][MLlib] SparkR spark.glm should hav...

2016-08-31 Thread shivaram

Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/14856
  
Thanks @keypointt for the PR, and @junyangq and @felixcheung for reviewing.
Merging this into master.



[GitHub] spark issue #14905: [SPARK-17318][Tests]Fix ReplSuite replicating blocks of ...

2016-08-31 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/14905
  
> sc.jobProgressListener.waitUntilExecutorsUp(2, 3)

It's not a public API, so I cannot use it in the REPL.
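
Since `waitUntilExecutorsUp` is not reachable from REPL code, one possible workaround is to poll a condition built only from public APIs. The helper below is a generic sketch and not necessarily what this PR does: `WaitUntil`, `waitUntil`, its parameters, and the example condition in the trailing comment are illustrative assumptions.

```scala
import scala.annotation.tailrec

object WaitUntil {
  /**
   * Re-evaluates `condition` every `intervalMs` milliseconds until it holds
   * or `timeoutMs` milliseconds have elapsed. Returns true if the condition
   * held before the deadline, false otherwise.
   */
  def waitUntil(timeoutMs: Long, intervalMs: Long = 100L)(condition: => Boolean): Boolean = {
    val deadline = System.nanoTime() + timeoutMs * 1000000L

    @tailrec
    def loop(): Boolean =
      if (condition) true
      else if (System.nanoTime() >= deadline) false
      else {
        Thread.sleep(intervalMs)
        loop()
      }

    loop()
  }
}

// Hypothetical use inside a REPL test, assuming a live SparkContext `sc` and
// that counting entries of the public `getExecutorMemoryStatus` map (which
// also includes the driver) is an acceptable proxy for "executors are up":
//
//   WaitUntil.waitUntil(timeoutMs = 60000L)(sc.getExecutorMemoryStatus.size >= 3)
```

Passing the condition by name keeps the helper independent of Spark, so the same loop can wrap whichever public check turns out to be reliable.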



[GitHub] spark issue #14823: [SPARK-17257][SQL] the physical plan of CREATE TABLE or ...

2016-08-31 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14823
  
LGTM except one minor comment.



[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14783
  
**[Test build #64756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64756/consoleFull)** for PR 14783 at commit [`77fa9b4`](https://github.com/apache/spark/commit/77fa9b4bb121455d51b43ba8705d876e2549850c).



[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2016-08-31 Thread shivaram

Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/14783
  
Jenkins, retest this please



[GitHub] spark pull request #14903: [SparkR][Minor] Fix windowPartitionBy example

2016-08-31 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14903



[GitHub] spark issue #14903: [SparkR][Minor] Fix windowPartitionBy example

2016-08-31 Thread shivaram

Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/14903
  
Merging this into master and branch-2.0



[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...

2016-08-31 Thread angolon

Github user angolon commented on the issue:

https://github.com/apache/spark/pull/14710
  
...*sigh*



[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14710
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14710
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64751/
Test FAILed.



[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14710
  
**[Test build #64751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64751/consoleFull)** for PR 14710 at commit [`0772e81`](https://github.com/apache/spark/commit/0772e8195443566d37c9837798ef075eaa79c66b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class AlterViewAsCommand(`



[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-08-31 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14388
  
@mallman Thanks. I will not share that file.



[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14712
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64750/
Test PASSed.



[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14712
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14712
  
**[Test build #64750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64750/consoleFull)** for PR 14712 at commit [`aa438c4`](https://github.com/apache/spark/commit/aa438c43f78d5edd679fd3e6294d953181a40268).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14823: [SPARK-17257][SQL] the physical plan of CREATE TA...

2016-08-31 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14823#discussion_r77111446
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -123,10 +108,7 @@ case class CreateDataSourceTableCommand(
  * }}}
  */
 case class CreateDataSourceTableAsSelectCommand(
-tableIdent: TableIdentifier,
-provider: String,
-partitionColumns: Array[String],
-bucketSpec: Option[BucketSpec],
+table: CatalogTable,
 mode: SaveMode,
 options: Map[String, String],
--- End diff --

This can be removed. Not used after this PR.



[GitHub] spark issue #14908: [WEBUI][SPARK-17352]Executor computing time can be negat...

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14908
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64747/
Test PASSed.



[GitHub] spark issue #14908: [WEBUI][SPARK-17352]Executor computing time can be negat...

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14908
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] F...

2016-08-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14531#discussion_r7752
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -660,6 +662,236 @@ class HiveDDLSuite
 }
   }
 
+  test("CREATE TABLE LIKE a temporary view") {
+val sourceViewName = "tab1"
+val targetTabName = "tab2"
+withTempView(sourceViewName) {
+  withTable(targetTabName) {
+spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+  .createTempView(sourceViewName)
+sql(s"CREATE TABLE $targetTabName LIKE $sourceViewName")
+
+val sourceTable = spark.sessionState.catalog.getTableMetadata(
+  TableIdentifier(sourceViewName, None))
+val targetTable = spark.sessionState.catalog.getTableMetadata(
+  TableIdentifier(targetTabName, Some("default")))
+
+checkCreateTableLike(sourceTable, targetTable)
+  }
+}
+  }
+
+  test("CREATE TABLE LIKE a data source table") {
+val sourceTabName = "tab1"
+val targetTabName = "tab2"
+withTable(sourceTabName, targetTabName) {
+  spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+.write.format("json").saveAsTable(sourceTabName)
+  sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName")
+
+  val sourceTable =
+spark.sessionState.catalog.getTableMetadata(TableIdentifier(sourceTabName, Some("default")))
+  val targetTable =
+spark.sessionState.catalog.getTableMetadata(TableIdentifier(targetTabName, Some("default")))
+  // The table type of the source table should be a Hive-managed data source table
+  assert(DDLUtils.isDatasourceTable(sourceTable))
+  assert(sourceTable.tableType == CatalogTableType.MANAGED)
+
+  checkCreateTableLike(sourceTable, targetTable)
+}
+  }
+
+  test("CREATE TABLE LIKE an external data source table") {
+val sourceTabName = "tab1"
+val targetTabName = "tab2"
+withTable(sourceTabName, targetTabName) {
+  withTempPath { dir =>
+val path = dir.getCanonicalPath
+spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+  .write.format("parquet").save(path)
+sql(s"CREATE TABLE $sourceTabName USING parquet OPTIONS (PATH '$path')")
+sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName")
+
+// The source table should be an external data source table
+val sourceTable = spark.sessionState.catalog.getTableMetadata(
+  TableIdentifier(sourceTabName, Some("default")))
+val targetTable = spark.sessionState.catalog.getTableMetadata(
+  TableIdentifier(targetTabName, Some("default")))
+// The table type of the source table should be an external data source table
+assert(DDLUtils.isDatasourceTable(sourceTable))
+assert(sourceTable.tableType == CatalogTableType.EXTERNAL)
+
+checkCreateTableLike(sourceTable, targetTable)
+  }
+}
+  }
+
+  test("CREATE TABLE LIKE a managed Hive serde table") {
+val catalog = spark.sessionState.catalog
+val sourceTabName = "tab1"
+val targetTabName = "tab2"
+withTable(sourceTabName, targetTabName) {
+  sql(s"CREATE TABLE $sourceTabName TBLPROPERTIES('prop1'='value1') AS SELECT 1 key, 'a'")
+  sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName")
+
+  val sourceTable = catalog.getTableMetadata(TableIdentifier(sourceTabName, Some("default")))
+  assert(sourceTable.tableType == CatalogTableType.MANAGED)
+  assert(sourceTable.properties.get("prop1").nonEmpty)
+  val targetTable = catalog.getTableMetadata(TableIdentifier(targetTabName, Some("default")))
+
+  checkCreateTableLike(sourceTable, targetTable)
+}
+  }
+
+  test("CREATE TABLE LIKE an external Hive serde table") {
+val catalog = spark.sessionState.catalog
+withTempDir { tmpDir =>
+  val basePath = tmpDir.getCanonicalPath
+  val sourceTabName = "tab1"
+  val targetTabName = "tab2"
+  withTable(sourceTabName, targetTabName) {
+assert(tmpDir.listFiles.isEmpty)
+sql(
+  s"""
+ |CREATE EXTERNAL TABLE $sourceTabName (key INT comment 'test', value STRING)
+ |COMMENT 'Apache Spark'
+ |PARTITIONED BY (ds STRING, hr STRING)
+ |LOCATION '$basePath'
+   """.stripMargin)
+for (ds <- Seq("2008-04-08", "2008-04-09"); hr <- Seq("11", "12")) {
+  sql(
+s"""
+   |INSERT OVERWRITE TABLE

[GitHub] spark issue #14908: [WEBUI][SPARK-17352]Executor computing time can be negat...

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14908
  
**[Test build #64747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64747/consoleFull)** for PR 14908 at commit [`0908a36`](https://github.com/apache/spark/commit/0908a365970ced444fea0b9107c37484189d209d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14900
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64749/
Test PASSed.



[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14900
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] F...

2016-08-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14531#discussion_r77111051
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -660,6 +662,236 @@ class HiveDDLSuite
 }
   }
 
+  test("CREATE TABLE LIKE a temporary view") {
+val sourceViewName = "tab1"
+val targetTabName = "tab2"
+withTempView(sourceViewName) {
+  withTable(targetTabName) {
+spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+  .createTempView(sourceViewName)
+sql(s"CREATE TABLE $targetTabName LIKE $sourceViewName")
+
+val sourceTable = spark.sessionState.catalog.getTableMetadata(
+  TableIdentifier(sourceViewName, None))
+val targetTable = spark.sessionState.catalog.getTableMetadata(
+  TableIdentifier(targetTabName, Some("default")))
+
+checkCreateTableLike(sourceTable, targetTable)
+  }
+}
+  }
+
+  test("CREATE TABLE LIKE a data source table") {
+val sourceTabName = "tab1"
+val targetTabName = "tab2"
+withTable(sourceTabName, targetTabName) {
+  spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+.write.format("json").saveAsTable(sourceTabName)
+  sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName")
+
+  val sourceTable =
+spark.sessionState.catalog.getTableMetadata(TableIdentifier(sourceTabName, Some("default")))
+  val targetTable =
+spark.sessionState.catalog.getTableMetadata(TableIdentifier(targetTabName, Some("default")))
+  // The table type of the source table should be a Hive-managed data source table
+  assert(DDLUtils.isDatasourceTable(sourceTable))
+  assert(sourceTable.tableType == CatalogTableType.MANAGED)
+
+  checkCreateTableLike(sourceTable, targetTable)
+}
+  }
+
+  test("CREATE TABLE LIKE an external data source table") {
+val sourceTabName = "tab1"
+val targetTabName = "tab2"
+withTable(sourceTabName, targetTabName) {
+  withTempPath { dir =>
+val path = dir.getCanonicalPath
+spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
+  .write.format("parquet").save(path)
+sql(s"CREATE TABLE $sourceTabName USING parquet OPTIONS (PATH '$path')")
+sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName")
+
+// The source table should be an external data source table
+val sourceTable = spark.sessionState.catalog.getTableMetadata(
+  TableIdentifier(sourceTabName, Some("default")))
+val targetTable = spark.sessionState.catalog.getTableMetadata(
+  TableIdentifier(targetTabName, Some("default")))
+// The table type of the source table should be an external data source table
+assert(DDLUtils.isDatasourceTable(sourceTable))
+assert(sourceTable.tableType == CatalogTableType.EXTERNAL)
+
+checkCreateTableLike(sourceTable, targetTable)
+  }
+}
+  }
+
+  test("CREATE TABLE LIKE a managed Hive serde table") {
+val catalog = spark.sessionState.catalog
+val sourceTabName = "tab1"
+val targetTabName = "tab2"
+withTable(sourceTabName, targetTabName) {
+  sql(s"CREATE TABLE $sourceTabName TBLPROPERTIES('prop1'='value1') AS SELECT 1 key, 'a'")
+  sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName")
+
+  val sourceTable = catalog.getTableMetadata(TableIdentifier(sourceTabName, Some("default")))
+  assert(sourceTable.tableType == CatalogTableType.MANAGED)
+  assert(sourceTable.properties.get("prop1").nonEmpty)
+  val targetTable = catalog.getTableMetadata(TableIdentifier(targetTabName, Some("default")))
+
+  checkCreateTableLike(sourceTable, targetTable)
+}
+  }
+
+  test("CREATE TABLE LIKE an external Hive serde table") {
+val catalog = spark.sessionState.catalog
+withTempDir { tmpDir =>
+  val basePath = tmpDir.getCanonicalPath
+  val sourceTabName = "tab1"
+  val targetTabName = "tab2"
+  withTable(sourceTabName, targetTabName) {
+assert(tmpDir.listFiles.isEmpty)
+sql(
+  s"""
+ |CREATE EXTERNAL TABLE $sourceTabName (key INT comment 'test', value STRING)
+ |COMMENT 'Apache Spark'
+ |PARTITIONED BY (ds STRING, hr STRING)
+ |LOCATION '$basePath'
+   """.stripMargin)
+for (ds <- Seq("2008-04-08", "2008-04-09"); hr <- Seq("11", "12")) {
+  sql(
+s"""
+   |INSERT OVERWRITE TABLE

[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14900
  
**[Test build #64749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64749/consoleFull)** for PR 14900 at commit [`d32d1e1`](https://github.com/apache/spark/commit/d32d1e1596cd44ccfcfc9d262d1f3ddeb263d31e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14900
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14900
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64748/
Test PASSed.



[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14900
  
**[Test build #64748 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64748/consoleFull)** for PR 14900 at commit [`d32d1e1`](https://github.com/apache/spark/commit/d32d1e1596cd44ccfcfc9d262d1f3ddeb263d31e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate bui...

2016-08-31 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14859
  
@shivaram @felixcheung Thanks for your feedback.

I will also test whether your suggestions (building nightly and filtering 
commits) are actually feasible.
Then I will clean up and double-check the comment and turn it into a .md 
file with more details filled in.

I do like the detection, but to be honest I would like to avoid adding a lot 
of logic here, although it seems feasible. So please let me handle the commit 
filtering here first.



[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...

2016-08-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14883#discussion_r77110718
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
 ---
@@ -184,4 +184,9 @@ abstract class ExternalCatalog {
 
   def listFunctions(db: String, pattern: String): Seq[String]
 
+  // 
--
+  // Resources
+  // 
--
+
+  def addJar(path: String): Unit
--- End diff --

I'm thinking about how to define the semantics of `ExternalCatalog.addJar`. 
Any ideas?



[GitHub] spark issue #14858: [SPARK-17219][ML] Add NaN value handling in Bucketizer

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14858
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64753/
Test PASSed.



[GitHub] spark issue #14858: [SPARK-17219][ML] Add NaN value handling in Bucketizer

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14858
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14858: [SPARK-17219][ML] Add NaN value handling in Bucketizer

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14858
  
**[Test build #64753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64753/consoleFull)** for PR 14858 at commit [`a16ea15`](https://github.com/apache/spark/commit/a16ea154aa5ea3680ada20639c6b4696adb537f3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionState to...

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14883
  
**[Test build #64755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64755/consoleFull)** for PR 14883 at commit [`813d987`](https://github.com/apache/spark/commit/813d987816c037becbe0515353a100b1cdc4bb44).



[GitHub] spark issue #14553: [SPARK-16963] Changes to Source trait and related implem...

2016-08-31 Thread ScrapCodes

Github user ScrapCodes commented on the issue:

https://github.com/apache/spark/pull/14553
  
retest this please



[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...

2016-08-31 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14883#discussion_r77110077
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala 
---
@@ -37,13 +38,13 @@ case class AddJarCommand(path: String) extends RunnableCommand {
   }
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
-sparkSession.sessionState.addJar(path)
+sparkSession.sharedState.addJar(path)
 Seq(Row(0))
   }
 }
 
 /**
- * Adds a file to the current session so it can be used.
+ * Adds a cross-session file so it can be used.
--- End diff --

Also updated the `ADD FILE` command.



[GitHub] spark pull request #10225: [SPARK-12196][Core] Store/retrieve blocks in diff...

2016-08-31 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/10225#discussion_r77109895
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala ---
@@ -50,35 +50,98 @@ private[spark] class DiskBlockManager(conf: SparkConf, 
deleteFilesOnStop: Boolea
 
   private val shutdownHook = addShutdownHook()
 
+  private abstract class FileAllocationStrategy {
+def apply(filename: String): File
+
+protected def getFile(filename: String, storageDirs: Array[File]): File = {
+  require(storageDirs.nonEmpty, "could not find file when the directories are empty")
+
+  // Figure out which local directory it hashes to, and which subdirectory in that
+  val hash = Utils.nonNegativeHash(filename)
+  val dirId = localDirs.indexOf(storageDirs(hash % storageDirs.length))
+  val subDirId = (hash / storageDirs.length) % subDirsPerLocalDir
+
+  // Create the subdirectory if it doesn't already exist
+  val subDir = subDirs(dirId).synchronized {
+val old = subDirs(dirId)(subDirId)
+if (old != null) {
+  old
+} else {
+  val newDir = new File(localDirs(dirId), "%02x".format(subDirId))
+  if (!newDir.exists() && !newDir.mkdir()) {
--- End diff --

I see. This may not be an important issue.
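
For readers following the diff above, the hashing step in `getFile` can be illustrated with a small standalone sketch. It is not the code from the pull request: `HashedFileLayout`, `locate`, and `pathFor` are hypothetical names, `nonNegativeHash` is a local stand-in for Spark's `Utils.nonNegativeHash`, and the sketch uses a single directory list instead of mapping `storageDirs` back into `localDirs`.

```scala
import java.io.File

object HashedFileLayout {
  // Local stand-in for Utils.nonNegativeHash: maps a string to a hash >= 0.
  def nonNegativeHash(s: String): Int = {
    val h = s.hashCode
    if (h == Int.MinValue) 0 else math.abs(h)
  }

  // Returns (index of the top-level directory, index of the subdirectory in it).
  def locate(filename: String, numDirs: Int, subDirsPerDir: Int): (Int, Int) = {
    require(numDirs > 0, "need at least one storage directory")
    val hash = nonNegativeHash(filename)
    val dirId = hash % numDirs
    val subDirId = (hash / numDirs) % subDirsPerDir
    (dirId, subDirId)
  }

  // Builds the target path; the subdirectory name is two hex digits, as in the diff.
  def pathFor(filename: String, dirs: Seq[File], subDirsPerDir: Int): File = {
    val (dirId, subDirId) = locate(filename, dirs.length, subDirsPerDir)
    new File(new File(dirs(dirId), "%02x".format(subDirId)), filename)
  }
}
```

Because the placement depends only on the file name and the directory counts, the same name always resolves to the same path, so a lookup needs no separate index. Unlike the code in the diff, this sketch only computes the path and does not create the subdirectory.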



[GitHub] spark pull request #14861: [SPARK-17287] [PYSPARK] Add recursive kwarg to Py...

2016-08-31 Thread jpiper

Github user jpiper commented on a diff in the pull request:

https://github.com/apache/spark/pull/14861#discussion_r77109772
  
--- Diff: python/test_support/test_folder/test_folder2/hello.txt ---
@@ -0,0 +1 @@
+Hello World!
--- End diff --

I wanted to ensure that the recursion was working, and it seemed a bit 
heavy-handed to distribute the entire `/test_support/sql/` folder using 
`addFile`. However, I'm happy to just use that if you think it's better 
practice.



[GitHub] spark issue #14531: [SPARK-16943] [SPARK-16942] [SQL] Fix multiple bugs in C...

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14531
  
**[Test build #64754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64754/consoleFull)** for PR 14531 at commit [`4ce96e6`](https://github.com/apache/spark/commit/4ce96e62adaa28965fb7c85e246ce2e1c86eba60).



[GitHub] spark pull request #14861: [SPARK-17287] [PYSPARK] Add recursive kwarg to Py...

2016-08-31 Thread zjffdu

Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14861#discussion_r77109177
  
--- Diff: python/test_support/test_folder/test_folder2/hello.txt ---
@@ -0,0 +1 @@
+Hello World!
--- End diff --

Sorry, I didn't notice this is for a test.



[GitHub] spark pull request #14861: [SPARK-17287] [PYSPARK] Add recursive kwarg to Py...

2016-08-31 Thread zjffdu

Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14861#discussion_r77109142
  
--- Diff: python/test_support/test_folder/test_folder2/hello.txt ---
@@ -0,0 +1 @@
+Hello World!
--- End diff --

Please remove this file



[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...

2016-08-31 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14712
  
Looks much better now.



[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...

2016-08-31 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14712#discussion_r77108572
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala
 ---
@@ -72,9 +73,11 @@ case class LogicalRelation(
   // expId can be different but the relation is still the same.
   override lazy val cleanArgs: Seq[Any] = Seq(relation)
 
-  @transient override lazy val statistics: Statistics = Statistics(
-sizeInBytes = BigInt(relation.sizeInBytes)
-  )
+  // inheritedStats is inherited from a CatalogRelation
--- End diff --

The comment is not correct now.



[GitHub] spark issue #14686: [SPARK-16253][SQL] make spark sql compatible with hive s...

2016-08-31 Thread zenglinxi0615

Github user zenglinxi0615 commented on the issue:

https://github.com/apache/spark/pull/14686
  
Sorry for the long delay in responding. 
Yes, you are right: if you can change the SQL from using '/temp/test.py' 
to using 'python /temp/test.py', there is no need to change the Spark source 
code.
However, this patch is for the case where there are already many Hive SQL 
queries that use '/temp/test.py'. Modifying all of them would cost too much 
time, so we want Spark SQL to be compatible with Hive SQL that invokes a 
Python script transform as 'xxx.py'.
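
As an illustration of the compatibility gap described above, the following sketch shows one way a bare script path could be rewritten into an interpreter invocation. It is an assumption made for illustration, not the approach taken in the patch: `ScriptCommand`, `normalize`, and the extension-to-interpreter table are all hypothetical.

```scala
object ScriptCommand {
  // Hypothetical mapping from script extension to interpreter; not from the patch.
  private val interpreters = Map(".py" -> "python")

  // Prepends an interpreter when the first token of the command is a script
  // with a known extension; otherwise returns the trimmed command unchanged.
  def normalize(command: String): String = {
    val trimmed = command.trim
    val firstToken = trimmed.split("\\s+").headOption.getOrElse("")
    interpreters.collectFirst {
      case (ext, interp) if firstToken.endsWith(ext) => s"$interp $trimmed"
    }.getOrElse(trimmed)
  }
}
```

With this sketch, `ScriptCommand.normalize("/temp/test.py")` yields `python /temp/test.py`, while a command that already starts with `python` passes through unchanged, so existing queries of either form would resolve to the same invocation.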



[GitHub] spark pull request #14452: [SPARK-16849][SQL] Improve subquery execution by ...

2016-08-31 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14452#discussion_r77108228
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/subquery/CommonSubquery.scala
 ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.subquery
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.QueryPlan
+import org.apache.spark.sql.catalyst.plans.logical
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.util.Utils
+
+private[sql] case class CommonSubquery(
+output: Seq[Attribute],
+@transient child: SparkPlan)(
+@transient val logicalChild: LogicalPlan,
+private[sql] val _statistics: Statistics,
+@transient private[sql] var _computedOutput: RDD[InternalRow] = null)
--- End diff --

I had thought that `_computedOutput` would not be kept for all 
`CommonSubquery` instances sharing it, but that is not true: it will be. So I 
think there is no problem.



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14866
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64746/
Test FAILed.



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-08-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14866
  
**[Test build #3242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3242/consoleFull)** for PR 14866 at commit [`d5113f3`](https://github.com/apache/spark/commit/d5113f33c012f58bb079474296fd6cef6f583b1f).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-08-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14866
  
Merged build finished. Test FAILed.

