[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16528

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16528#discussion_r96314156

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala ---
@@ -131,17 +131,15 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR
           return 0;
         """
       },
-      foldFunctions = { funCalls =>
-        funCalls.zipWithIndex.map { case (funCall, i) =>
-          val comp = ctx.freshName("comp")
-          s"""
-            int $comp = $funCall;
-            if ($comp != 0) {
-              return $comp;
-            }
-          """
-        }.mkString
-      })
+      foldFunctions = _.map { funCall =>
--- End diff --

How about we revert this change?
[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16528#discussion_r96108967

--- Diff: python/pyspark/sql/catalog.py ---
@@ -138,7 +139,27 @@ def listColumns(self, tableName, dbName=None):
     @since(2.0)
     def createExternalTable(self, tableName, path=None, source=None, schema=None, **options):
-        """Creates an external table based on the dataset in a data source.
+        """Creates a table based on the dataset in a data source.
+
+        It returns the DataFrame associated with the external table.
+
+        The data source is specified by the ``source`` and a set of ``options``.
+        If ``source`` is not specified, the default data source configured by
+        ``spark.sql.sources.default`` will be used.
+
+        Optionally, a schema can be provided as the schema of the returned :class:`DataFrame` and
+        created external table.
+
+        :return: :class:`DataFrame`
+        """
+        warnings.warn(
+            "createExternalTable is deprecated since Spark 2.2, please use createTable instead.",
+            DeprecationWarning)
+        return self.createTable(tableName, path, source, schema, **options)
--- End diff --

Yeah. Got it. I also manually tried it in `pyspark`. It works fine.
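The deprecation pattern in the diff above can be exercised in plain Python, without Spark. This is a minimal sketch, assuming hypothetical stand-in functions for `Catalog.createTable` / `createExternalTable`:

```python
import warnings

def create_table(name, **options):
    # Stand-in for the new API the deprecated wrapper delegates to.
    return f"created {name}"

def create_external_table(name, **options):
    # Same pattern as the pyspark diff: warn, then delegate.
    warnings.warn(
        "createExternalTable is deprecated since Spark 2.2, please use createTable instead.",
        DeprecationWarning)
    return create_table(name, **options)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = create_external_table("t", path="/tmp/t")

print(result)  # the call still succeeds, and one DeprecationWarning was recorded
```

This is what "manually tried it in `pyspark`" verifies: the old entry point keeps working while steering callers to the new name.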
[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16528#discussion_r96108357

--- Diff: python/pyspark/sql/catalog.py ---
@@ -138,7 +139,27 @@ def listColumns(self, tableName, dbName=None):
     @since(2.0)
     def createExternalTable(self, tableName, path=None, source=None, schema=None, **options):
-        """Creates an external table based on the dataset in a data source.
+        """Creates a table based on the dataset in a data source.
+
+        It returns the DataFrame associated with the external table.
+
+        The data source is specified by the ``source`` and a set of ``options``.
+        If ``source`` is not specified, the default data source configured by
+        ``spark.sql.sources.default`` will be used.
+
+        Optionally, a schema can be provided as the schema of the returned :class:`DataFrame` and
+        created external table.
+
+        :return: :class:`DataFrame`
+        """
+        warnings.warn(
+            "createExternalTable is deprecated since Spark 2.2, please use createTable instead.",
+            DeprecationWarning)
+        return self.createTable(tableName, path, source, schema, **options)
--- End diff --

It's Python syntax, like what we do in Scala: `func(options: _*)`.
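The `**options` idiom cloud-fan refers to can be shown in a self-contained Python sketch (the `create_table` / `create_external_table` names here are hypothetical stand-ins, not the pyspark implementation):

```python
def create_table(table_name, path=None, source=None, **options):
    # **options in a signature collects extra keyword arguments into a dict.
    return {"table": table_name, "path": path, "source": source, "options": options}

def create_external_table(table_name, path=None, source=None, **options):
    # **options at a call site unpacks that dict back into keyword
    # arguments -- the Python analogue of Scala's `func(options: _*)`.
    return create_table(table_name, path, source, **options)

result = create_external_table("t", path="/tmp/t", fmt="parquet", header="true")
print(result["options"])  # the extra keywords arrive as a dict
```

So writing `self.createTable(..., **options)` forwards the caller's keyword options unchanged, which is why gatorsmile's suggested `options` (without the stars) would instead pass the dict as a single positional value.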
[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16528#discussion_r96033171

--- Diff: python/pyspark/sql/catalog.py ---
@@ -138,7 +139,27 @@ def listColumns(self, tableName, dbName=None):
     @since(2.0)
     def createExternalTable(self, tableName, path=None, source=None, schema=None, **options):
-        """Creates an external table based on the dataset in a data source.
+        """Creates a table based on the dataset in a data source.
+
+        It returns the DataFrame associated with the external table.
+
+        The data source is specified by the ``source`` and a set of ``options``.
+        If ``source`` is not specified, the default data source configured by
+        ``spark.sql.sources.default`` will be used.
+
+        Optionally, a schema can be provided as the schema of the returned :class:`DataFrame` and
+        created external table.
+
+        :return: :class:`DataFrame`
+        """
+        warnings.warn(
+            "createExternalTable is deprecated since Spark 2.2, please use createTable instead.",
+            DeprecationWarning)
+        return self.createTable(tableName, path, source, schema, **options)
--- End diff --

`**options` -> `options`?
[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16528#discussion_r95505435

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala ---
@@ -187,82 +189,169 @@ abstract class Catalog {
   def functionExists(dbName: String, functionName: String): Boolean

   /**
-   * :: Experimental ::
-   * Creates an external table from the given path and returns the corresponding DataFrame.
+   * Creates a table from the given path and returns the corresponding DataFrame.
    * It will use the default data source configured by spark.sql.sources.default.
    *
    * @since 2.0.0
    */
+  @deprecated("use createTable instead.", "2.2.0")
+  def createExternalTable(tableName: String, path: String): DataFrame = {
+    createTable(tableName, path)
+  }
+
+  /**
+   * :: Experimental ::
+   * Creates a table from the given path and returns the corresponding DataFrame.
--- End diff --

We already documented this in the programming guide. If we wanted to explain the custom path here, there are a lot of similar places where we would need to add comments too. So I'd like to leave it as it was.
[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16528#discussion_r95431635

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala ---
@@ -187,82 +189,169 @@ abstract class Catalog {
   def functionExists(dbName: String, functionName: String): Boolean

   /**
-   * :: Experimental ::
-   * Creates an external table from the given path and returns the corresponding DataFrame.
+   * Creates a table from the given path and returns the corresponding DataFrame.
    * It will use the default data source configured by spark.sql.sources.default.
    *
    * @since 2.0.0
    */
+  @deprecated("use createTable instead.", "2.2.0")
+  def createExternalTable(tableName: String, path: String): DataFrame = {
+    createTable(tableName, path)
+  }
+
+  /**
+   * :: Experimental ::
+   * Creates a table from the given path and returns the corresponding DataFrame.
--- End diff --

Maybe we can explain here that the data files will not be dropped when dropping the table?
[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16528#discussion_r95355659

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -71,15 +71,6 @@ case class CreateDataSourceTableCommand(table: CatalogTable, ignoreIfExists: Boo
       options = table.storage.properties ++ pathOption,
       catalogTable = Some(tableWithDefaultOptions)).resolveRelation()

-    dataSource match {
-      case fs: HadoopFsRelation =>
-        if (table.tableType == CatalogTableType.EXTERNAL && fs.location.rootPaths.isEmpty) {
-          throw new AnalysisException(
-            "Cannot create a file-based external data source table without path")
--- End diff --

We will never hit this branch after this PR. There is no public API to set the table type, and users can only set a custom table path, so we will never create an external table without a path.
[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16528#discussion_r95355325

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala ---
@@ -131,17 +131,15 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR
           return 0;
         """
       },
-      foldFunctions = { funCalls =>
-        funCalls.zipWithIndex.map { case (funCall, i) =>
-          val comp = ctx.freshName("comp")
-          s"""
-            int $comp = $funCall;
-            if ($comp != 0) {
-              return $comp;
-            }
-          """
-        }.mkString
-      })
+      foldFunctions = _.map { funCall =>
--- End diff --

Minor code cleanup, not related to this PR.
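The cleanup drops the unused `zipWithIndex` index: what `foldFunctions` does here is emit one early-return comparison block of generated Java per comparison call, then concatenate them. A rough Python sketch of that code-generation idea (the `fresh_name` / `fold_functions` names are illustrative, not Spark's API):

```python
import itertools

_counter = itertools.count()

def fresh_name(prefix):
    # Mimics ctx.freshName: a unique identifier on every call.
    return f"{prefix}{next(_counter)}"

def fold_functions(fun_calls):
    # For each generated comparison call, emit a Java snippet that
    # returns early as soon as one comparison is non-zero; the index
    # of the call is never needed, hence the simplification.
    blocks = []
    for fun_call in fun_calls:
        comp = fresh_name("comp")
        blocks.append(
            f"int {comp} = {fun_call};\n"
            f"if ({comp} != 0) {{ return {comp}; }}\n")
    return "".join(blocks)

code = fold_functions(["compare_0(a, b)", "compare_1(a, b)"])
print(code)
```

The generated text falls through to the surrounding `return 0;` when every comparison is equal, which matches the ordering semantics the template builds.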
[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...
GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/16528

    [SPARK-19148][SQL] do not expose the external table concept in Catalog

## What changes were proposed in this pull request?

In https://github.com/apache/spark/pull/16296, we reached a consensus that we should hide the external/managed table concept from users and only expose the custom table path. This PR renames `Catalog.createExternalTable` to `createTable` (still keeping the old versions for backward compatibility), and only sets the table type to EXTERNAL if `path` is specified in the options.

## How was this patch tested?

New tests in `CatalogSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark create-table

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16528.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16528

commit f06ed16e184117ce7b748a53abc4209215df3699
Author: Wenchen Fan
Date: 2017-01-10T12:29:36Z

    do not expose the external table concept in Catalog
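The core behavioural change described above — infer the table type from whether a `path` option is present, rather than letting callers pick it — can be sketched in a few lines of plain Python (the `resolve_table_type` name is illustrative, not Spark's internals):

```python
def resolve_table_type(options):
    # After this PR, users never choose MANAGED vs EXTERNAL directly:
    # a table is EXTERNAL exactly when a custom path is supplied in the
    # options, and MANAGED (data lives under the warehouse dir) otherwise.
    return "EXTERNAL" if "path" in options else "MANAGED"

print(resolve_table_type({"path": "/data/events", "format": "parquet"}))  # EXTERNAL
print(resolve_table_type({"format": "parquet"}))                          # MANAGED
```

This is also why the `HadoopFsRelation` check removed in `createDataSourceTables.scala` becomes dead code: with the type derived from the path, an external table without a path can no longer be constructed through the public API.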