[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...

2017-01-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16528



[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...

2017-01-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/16528#discussion_r96314156
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala ---
@@ -131,17 +131,15 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR
   return 0;
 """
   },
-  foldFunctions = { funCalls =>
-funCalls.zipWithIndex.map { case (funCall, i) =>
-  val comp = ctx.freshName("comp")
-  s"""
-int $comp = $funCall;
-if ($comp != 0) {
-  return $comp;
-}
-  """
-}.mkString
-  })
+  foldFunctions = _.map { funCall =>
--- End diff --

How about we revert this change?



[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...

2017-01-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16528#discussion_r96108967
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -138,7 +139,27 @@ def listColumns(self, tableName, dbName=None):
 
 @since(2.0)
def createExternalTable(self, tableName, path=None, source=None, schema=None, **options):
-"""Creates an external table based on the dataset in a data source.
+"""Creates a table based on the dataset in a data source.
+
+It returns the DataFrame associated with the external table.
+
+The data source is specified by the ``source`` and a set of ``options``.
+If ``source`` is not specified, the default data source configured by
+``spark.sql.sources.default`` will be used.
+
+Optionally, a schema can be provided as the schema of the returned :class:`DataFrame` and
+created external table.
+
+:return: :class:`DataFrame`
+"""
+warnings.warn(
+"createExternalTable is deprecated since Spark 2.2, please use createTable instead.",
+DeprecationWarning)
+return self.createTable(tableName, path, source, schema, **options)
--- End diff --

Yeah. Got it. I also manually tried it in `pyspark`. It works fine. 



[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...

2017-01-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16528#discussion_r96108357
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -138,7 +139,27 @@ def listColumns(self, tableName, dbName=None):
 
 @since(2.0)
def createExternalTable(self, tableName, path=None, source=None, schema=None, **options):
-"""Creates an external table based on the dataset in a data source.
+"""Creates a table based on the dataset in a data source.
+
+It returns the DataFrame associated with the external table.
+
+The data source is specified by the ``source`` and a set of ``options``.
+If ``source`` is not specified, the default data source configured by
+``spark.sql.sources.default`` will be used.
+
+Optionally, a schema can be provided as the schema of the returned :class:`DataFrame` and
+created external table.
+
+:return: :class:`DataFrame`
+"""
+warnings.warn(
+"createExternalTable is deprecated since Spark 2.2, please use createTable instead.",
+DeprecationWarning)
+return self.createTable(tableName, path, source, schema, **options)
--- End diff --

It's Python syntax for forwarding the keyword arguments, similar to what we do in Scala with `func(options: _*)`.
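
For readers following along, here is a small self-contained Scala illustration of the analogy mentioned above; the function and option names are hypothetical, not part of the Spark API:

```scala
// Hypothetical example: `options: _*` spreads a Seq into a varargs parameter,
// which is the Scala analogue of forwarding `**options` in Python.
object VarargsSpreadExample {
  // A varargs method standing in for any API that takes repeated (key, value) options.
  def describe(options: (String, String)*): String =
    options.map { case (k, v) => s"$k=$v" }.mkString(", ")

  def main(args: Array[String]): Unit = {
    val options = Seq("path" -> "/tmp/data", "header" -> "true")
    // The `: _*` ascription expands the collection into individual arguments.
    println(describe(options: _*)) // prints: path=/tmp/data, header=true
  }
}
```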



[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...

2017-01-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16528#discussion_r96033171
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -138,7 +139,27 @@ def listColumns(self, tableName, dbName=None):
 
 @since(2.0)
def createExternalTable(self, tableName, path=None, source=None, schema=None, **options):
-"""Creates an external table based on the dataset in a data source.
+"""Creates a table based on the dataset in a data source.
+
+It returns the DataFrame associated with the external table.
+
+The data source is specified by the ``source`` and a set of ``options``.
+If ``source`` is not specified, the default data source configured by
+``spark.sql.sources.default`` will be used.
+
+Optionally, a schema can be provided as the schema of the returned :class:`DataFrame` and
+created external table.
+
+:return: :class:`DataFrame`
+"""
+warnings.warn(
+"createExternalTable is deprecated since Spark 2.2, please use createTable instead.",
+DeprecationWarning)
+return self.createTable(tableName, path, source, schema, **options)
--- End diff --

`**options` -> `options`?



[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...

2017-01-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16528#discussion_r95505435
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala ---
@@ -187,82 +189,169 @@ abstract class Catalog {
   def functionExists(dbName: String, functionName: String): Boolean

   /**
-   * :: Experimental ::
-   * Creates an external table from the given path and returns the corresponding DataFrame.
+   * Creates a table from the given path and returns the corresponding DataFrame.
 * It will use the default data source configured by spark.sql.sources.default.
 *
 * @since 2.0.0
 */
+  @deprecated("use createTable instead.", "2.2.0")
+  def createExternalTable(tableName: String, path: String): DataFrame = {
+createTable(tableName, path)
+  }
+
+  /**
+   * :: Experimental ::
+   * Creates a table from the given path and returns the corresponding DataFrame.
--- End diff --

We already documented this in the programming guide. If we want to explain the custom path here, there are a lot of similar places where we would need to add comments too, so I'd like to leave it as it was.



[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...

2017-01-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16528#discussion_r95431635
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala ---
@@ -187,82 +189,169 @@ abstract class Catalog {
   def functionExists(dbName: String, functionName: String): Boolean

   /**
-   * :: Experimental ::
-   * Creates an external table from the given path and returns the corresponding DataFrame.
+   * Creates a table from the given path and returns the corresponding DataFrame.
 * It will use the default data source configured by spark.sql.sources.default.
 *
 * @since 2.0.0
 */
+  @deprecated("use createTable instead.", "2.2.0")
+  def createExternalTable(tableName: String, path: String): DataFrame = {
+createTable(tableName, path)
+  }
+
+  /**
+   * :: Experimental ::
+   * Creates a table from the given path and returns the corresponding DataFrame.
--- End diff --

Maybe we can explain here that the data files will not be dropped when the table is dropped?
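
For context, a minimal usage sketch of the behavior being discussed, assuming a local SparkSession and a placeholder Parquet path; this is an illustration, not code from the PR:

```scala
import org.apache.spark.sql.SparkSession

// Hedged sketch (table name and path are placeholders): registering existing
// data with createTable and a path, then dropping the table, removes the
// catalog entry but leaves the files at the path untouched.
object CreateTableDropSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("create-table-drop-sketch")
      .getOrCreate()

    // Register a pre-existing Parquet directory as a catalog table.
    val df = spark.catalog.createTable("events", "/tmp/events_parquet")
    df.printSchema()

    // Dropping the table only removes the metadata; /tmp/events_parquet stays on disk.
    spark.sql("DROP TABLE events")

    spark.stop()
  }
}
```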



[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...

2017-01-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16528#discussion_r95355659
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -71,15 +71,6 @@ case class CreateDataSourceTableCommand(table: CatalogTable, ignoreIfExists: Boo
 options = table.storage.properties ++ pathOption,
 catalogTable = Some(tableWithDefaultOptions)).resolveRelation()

-dataSource match {
-  case fs: HadoopFsRelation =>
-if (table.tableType == CatalogTableType.EXTERNAL && fs.location.rootPaths.isEmpty) {
-  throw new AnalysisException(
-"Cannot create a file-based external data source table without path")
--- End diff --

We will never hit this branch after this PR. There is no public API to set the table type, and users can only set a custom table path, so we will never create an external table without a path.



[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...

2017-01-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16528#discussion_r95355325
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala ---
@@ -131,17 +131,15 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR
   return 0;
 """
   },
-  foldFunctions = { funCalls =>
-funCalls.zipWithIndex.map { case (funCall, i) =>
-  val comp = ctx.freshName("comp")
-  s"""
-int $comp = $funCall;
-if ($comp != 0) {
-  return $comp;
-}
-  """
-}.mkString
-  })
+  foldFunctions = _.map { funCall =>
--- End diff --

minor code cleanup, not related to this PR.
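
To make the cleanup concrete, here is a self-contained sketch of the pattern: the index bound by `zipWithIndex` was never used, so mapping over the function calls directly produces code of the same shape. The `freshName` helper below is a hypothetical stand-in for `ctx.freshName`, not the real codegen context:

```scala
// Stand-alone sketch of the refactoring discussed above.
object FoldFunctionsSketch {
  private var counter = 0
  // Hypothetical stand-in for CodegenContext.freshName: unique variable names.
  private def freshName(prefix: String): String = { counter += 1; s"${prefix}_$counter" }

  // Old shape: the index `i` bound by zipWithIndex is never referenced.
  def foldWithIndex(funCalls: Seq[String]): String =
    funCalls.zipWithIndex.map { case (funCall, i) =>
      val comp = freshName("comp")
      s"""
        int $comp = $funCall;
        if ($comp != 0) {
          return $comp;
        }
      """
    }.mkString

  // Cleaned-up shape: map over the calls directly; the generated snippet has the same shape.
  def foldDirect(funCalls: Seq[String]): String =
    funCalls.map { funCall =>
      val comp = freshName("comp")
      s"""
        int $comp = $funCall;
        if ($comp != 0) {
          return $comp;
        }
      """
    }.mkString
}
```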



[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...

2017-01-10 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/16528

[SPARK-19148][SQL] do not expose the external table concept in Catalog

## What changes were proposed in this pull request?

In https://github.com/apache/spark/pull/16296, we reached a consensus that we should hide the external/managed table concept from users and only expose the custom table path.

This PR renames `Catalog.createExternalTable` to `createTable` (the old versions are kept for backward compatibility), and only sets the table type to EXTERNAL if `path` is specified in the options.
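
A minimal sketch of that rule (an illustration of the described behavior, assuming spark-catalyst on the classpath; not the actual code in this PR):

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogTableType

// Illustration only: the external/managed distinction is derived from whether
// the caller supplied a `path` option, so it never has to be exposed in the API.
object TableTypeSketch {
  def tableTypeFor(options: Map[String, String]): CatalogTableType =
    if (options.contains("path")) CatalogTableType.EXTERNAL
    else CatalogTableType.MANAGED
}
```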

## How was this patch tested?

New tests in `CatalogSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark create-table

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16528.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16528


commit f06ed16e184117ce7b748a53abc4209215df3699
Author: Wenchen Fan 
Date:   2017-01-10T12:29:36Z

do not expose the external table concept in Catalog



