[GitHub] spark issue #22813: [SPARK-25818][CORE] WorkDirCleanup should only remove th...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/22813 @srowen Your suggestion is very good, but sometimes the maintenance engineers have limited skills in this area. If they configure the operating system root directory as `SPARK_WORK_DIR` because of disk damage, it will cause a catastrophic accident. So I think it is necessary to add this condition to avoid such production accidents. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22876: [SPARK-25869] [YARN] the original diagnostics is missing...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/22876 @srowen CC
[GitHub] spark issue #22813: [SPARK-25818][CORE] WorkDirCleanup should only remove th...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/22813 @dongjoon-hyun Yes, you can think of it that way. So I want to solve this problem on the Spark side, to reduce the risk of misoperations by operations and maintenance engineers. `WorkDirCleanUp` is only responsible for cleaning up the directories that it generates.
[GitHub] spark issue #22813: [SPARK-25818][CORE] WorkDirCleanup should only remove th...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/22813 As far as I know, when a Spark program is submitted to the cluster, a directory is created under `SPARK_WORK_DIR`. The directory name consists of an application prefix, a timestamp, and a five-digit serial number. `WorkDirCleanUp` should only delete expired application directories. @srowen Could you tell me if there are other types of directories or files created under `SPARK_WORK_DIR`, besides the application directories? Thanks!
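As a rough sketch of the condition being proposed (plain Scala, not Spark's actual `WorkDirCleanup` code; `isAppDirName` and `cleanupCandidates` are invented names), restricting cleanup to directories whose names start with `app-` could look like:

```scala
import java.io.File

// Hypothetical helper: only names matching the application-directory
// naming (e.g. "app-20181024064824-00001") are eligible for cleanup.
def isAppDirName(name: String): Boolean = name.startsWith("app-")

// Hypothetical filter over SPARK_WORK_DIR: skip anything that is not an
// application directory, so unrelated folders sharing the path survive.
def cleanupCandidates(workDir: File): Seq[File] =
  Option(workDir.listFiles).getOrElse(Array.empty[File])
    .filter(d => d.isDirectory && isAppDirName(d.getName))
    .toSeq
```

With such a filter, a `Thriftserver` temp folder or an OS directory accidentally placed under the same path would never be selected for deletion.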
[GitHub] spark issue #22813: [SPARK-25818][CORE] WorkDirCleanup should only remove th...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/22813 Yes, it happened in our real environment. The scenario is as follows: some disk corruption occurred in the production cluster, which is normal. SPARK_LOCAL_DIRS = /data1/bigdata/spark/tmp; disk `data1` was broken, so the maintenance engineer changed `data1` to `data2` or another disk. Unfortunately, the config SPARK_WORK_DIR = /data2/bigdata/spark/tmp, and when we then started a `Thriftserver` process, some temporary folders were created at the new path, which is the same as SPARK_WORK_DIR. When the cleanup cycle was reached, the folders created by `Thriftserver` were removed by `WorkDirCleanUP`, causing Beeline and JDBC queries to fail. There is also a very extreme situation where the user configures the operating system directory, which will cause a lot of trouble. So I think adding this condition could reduce some unnecessary risks.
[GitHub] spark pull request #22813: [SPARK-25818][CORE] WorkDirCleanup should only re...
GitHub user ouyangxiaochen opened a pull request: https://github.com/apache/spark/pull/22813 [SPARK-25818][CORE] WorkDirCleanup should only remove the directory at the beginning of the… ## What changes were proposed in this pull request? The cleanup mechanism will clear all the eligible directories under SPARK_WORK_DIR. If other configured paths are the same as the SPARK_WORK_DIR configuration, this will cause the file directories of those other configuration items to be deleted by mistake, for example when the SPARK_LOCAL_DIRS and SPARK_WORK_DIR settings are the same. We should add another condition checking that the name starts with 'app-' when removing the app-* directories in SPARK_WORK_DIR. ## How was this patch tested? manual test You can merge this pull request into a Git repository by running: $ git pull https://github.com/ouyangxiaochen/spark SPARK-25818 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22813.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22813 commit cf8b30f64f8855ea1574fa47955a7ce42c7d0703 Author: ouyangxiaochen Date: 2018-10-24T06:48:24Z WorkDirCleanup should only remove the directory at the beginning of the app-
[GitHub] spark pull request #22220: [SPARK-25229][SQL] Partition fields are uniformly...
Github user ouyangxiaochen closed the pull request at: https://github.com/apache/spark/pull/22220
[GitHub] spark pull request #22220: [SPARK-25229][SQL] Partition fields are uniformly...
GitHub user ouyangxiaochen opened a pull request: https://github.com/apache/spark/pull/22220 [SPARK-25229][SQL] Partition fields are uniformly processed into lowercase ## What changes were proposed in this pull request? `scala> spark.version res0: String = 2.3.0 scala> spark.sql("create table t(id int,name string) partitioned by(aA string)") res1: org.apache.spark.sql.DataFrame = [] scala> spark.sql("insert into table t values(1,'Donahue','US')") res2: org.apache.spark.sql.DataFrame = [] scala> spark.sql("select id,name from t where aA = 'US'").show(1) org.apache.spark.sql.AnalysisException: Expected only partition pruning predicates: List(isnotnull(aA#25), (aA#25 = US)); at org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.prunePartitionsByFilter(ExternalCatalogUtils.scala:145) at org.apache.spark.sql.hive.MetastoreRelation.getHiveQlPartitions(MetastoreRelation.scala:158) at org.apache.spark.sql.hive.execution.HiveTableScanExec$$anonfun$10.apply(HiveTableScanExec.scala:151) at org.apache.spark.sql.hive.execution.HiveTableScanExec$$anonfun$10.apply(HiveTableScanExec.scala:150) at org.apache.spark.util.Utils$.withDummyCallSite(Utils.scala:2393) at org.apache.spark.sql.hive.execution.HiveTableScanExec.doExecute(HiveTableScanExec.scala:149) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114) at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:240) at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:323) at 
org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:39) at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2194) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2547) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2193) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2200) at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1936) at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1935) at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2577) at org.apache.spark.sql.Dataset.head(Dataset.scala:1935) at org.apache.spark.sql.Dataset.take(Dataset.scala:2150) at org.apache.spark.sql.Dataset.showString(Dataset.scala:240) at org.apache.spark.sql.Dataset.show(Dataset.scala:527) at org.apache.spark.sql.Dataset.show(Dataset.scala:487) ... 48 elided` ## How was this patch tested? update existing test case Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ouyangxiaochen/spark SPARK-CaseSensitive Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22220.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22220 commit 426dd2e8c9aba6bff15cc5d4c1755730bc378fca Author: ouyangxiaochen Date: 2018-08-24T13:15:49Z ExternalCatalogUtils.prunePartitionsByFilter throw an AnalysisException when partition name contains upper letter
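A minimal sketch of the direction such a fix could take (a hypothetical helper, not Spark's actual analyzer code): resolve user-supplied partition column names against the catalog's lowercased names before building pruning predicates, so `aA` matches the stored `aa`.

```scala
// Hypothetical helper: map each user-supplied partition column name to the
// catalog's stored (lowercased) name, matching case-insensitively; unknown
// names are passed through unchanged.
def normalizePartitionSpec(
    spec: Map[String, String],
    catalogPartitionCols: Seq[String]): Map[String, String] = {
  val byLower = catalogPartitionCols.map(c => c.toLowerCase -> c).toMap
  spec.map { case (col, value) =>
    byLower.getOrElse(col.toLowerCase, col) -> value
  }
}
```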
[GitHub] spark pull request #21881: [SPARK-24930][SQL] Improve exception information ...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/21881#discussion_r206377644 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -337,7 +337,11 @@ case class LoadDataCommand( new File(file.getAbsolutePath).exists() } if (!exists) { - throw new AnalysisException(s"LOAD DATA input path does not exist: $path") + // If user have no permission to access the given input path, `File.exists()` return false + // , `LOAD DATA input path does not exist` can confuse users. + throw new AnalysisException(s"LOAD DATA input path does not exist: `$path` or current " + --- End diff -- OK, Thanks!
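The point of the diff above is that `java.io.File.exists()` returns false both when the path is missing and when the process lacks permission to reach it, so the error message should mention both possibilities. A hedged sketch (names and wording are made up, not the PR's actual code):

```scala
import java.io.File

// Hypothetical message that covers both failure modes behind exists() == false.
def loadDataError(path: String): String =
  s"LOAD DATA input path does not exist: `$path`, or the current user " +
    "has no permission to access it"

// Hypothetical check: None means the path is usable; Some(msg) carries the error.
def checkLoadPath(file: File): Option[String] =
  if (file.exists()) None else Some(loadDataError(file.getPath))
```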
[GitHub] spark issue #21881: [SPARK-24930][SQL] Improve exception information when us...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/21881 @gatorsmile Hi, I am not sure how to construct this scenario in a test case; should I just assert that the exception info contains the key message `have no permission to access the input path`?
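One common way to test this kind of change, sketched in plain Scala (not Spark's actual test suite; `errorMessageOf` and the thrown message are illustrative stand-ins), is to capture the exception and assert only on a stable key phrase so later wording tweaks do not break the test:

```scala
// Hypothetical helper: run the body, return the error message if it throws.
def errorMessageOf(body: => Unit): Option[String] =
  try { body; None } catch { case e: Exception => Option(e.getMessage) }

// Illustrative stand-in for the failing LOAD DATA call.
def loadFromUnreadablePath(): Unit =
  throw new IllegalArgumentException(
    "LOAD DATA input path does not exist or the current user " +
      "has no permission to access the input path")
```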
[GitHub] spark pull request #21881: [SPARK-24930][SQL] Improve exception information ...
GitHub user ouyangxiaochen opened a pull request: https://github.com/apache/spark/pull/21881 [SPARK-24930][SQL] Improve exception information when using LOAD DATA LOCAL INPATH ## What changes were proposed in this pull request? ## How was this patch tested? existing test case Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ouyangxiaochen/spark SPARK-24930 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21881.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21881 commit 0eb648cb981d5e2c8bb988d4d1948e8199cdba90 Author: ouyangxiaochen Date: 2018-07-26T09:10:24Z Improve exception information when using LOAD DATA LOCAL INPATH
[GitHub] spark issue #19380: [SPARK-22157] [SQL] The unix_timestamp method handles t...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/19380 In fact, there are many scenarios that need to be accurate to milliseconds; should we try to solve this problem together?
[GitHub] spark issue #19380: [SPARK-22157] [SQL] The unix_timestamp method handles t...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/19380 Since RDBMSs keep the milliseconds, we should follow suit. This proposal LGTM. @gatorsmile CC
[GitHub] spark pull request #17681: [SPARK-20383][SQL] Supporting Create [temporary] ...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/17681#discussion_r125651028 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -1132,6 +1132,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat client.dropFunction(db, name) } + override protected def doAlterFunction( + db: String, funcDefinition: CatalogFunction): Unit = withClient { +requireDbExists(db) +val functionName = funcDefinition.identifier.funcName.toLowerCase(Locale.ROOT) --- End diff -- LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark pull request #17681: [SPARK-20383][SQL] Supporting Create [temporary] ...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/17681#discussion_r125575412 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala --- @@ -46,26 +46,47 @@ case class CreateFunctionCommand( functionName: String, className: String, resources: Seq[FunctionResource], -isTemp: Boolean) +isTemp: Boolean, +ifNotExists: Boolean, +replace: Boolean) extends RunnableCommand { + if (ifNotExists && replace) { +throw new AnalysisException("CREATE FUNCTION with both IF NOT EXISTS and REPLACE" + + " is not allowed.") + } + + // Disallows 'CREATE TEMPORARY FUNCTION IF NOT EXISTS' to be consistent + // with 'CREATE TEMPORARY FUNCTION' --- End diff -- OK. Thanks!
[GitHub] spark pull request #17681: [SPARK-20383][SQL] Supporting Create [temporary] ...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/17681#discussion_r125567012 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala --- @@ -46,26 +46,47 @@ case class CreateFunctionCommand( functionName: String, className: String, resources: Seq[FunctionResource], -isTemp: Boolean) +isTemp: Boolean, +ifNotExists: Boolean, +replace: Boolean) extends RunnableCommand { + if (ifNotExists && replace) { +throw new AnalysisException("CREATE FUNCTION with both IF NOT EXISTS and REPLACE" + + " is not allowed.") + } + + // Disallows 'CREATE TEMPORARY FUNCTION IF NOT EXISTS' to be consistent + // with 'CREATE TEMPORARY FUNCTION' --- End diff -- @cloud-fan The logic of the syntax `CREATE TEMPORARY FUNCTION IF NOT EXISTS` is: 1. If the function already exists, we do nothing (no override). 2. If the function does not exist, we create a new one. The logic of the syntax `CREATE OR REPLACE TEMPORARY FUNCTION` is: 1. If the function already exists, we override it. 2. If the function does not exist, we create a new one.
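The two rules above form a small decision table. As a self-contained sketch (hypothetical names, not the actual `CreateFunctionCommand` code):

```scala
// Hypothetical actions for CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS],
// following the semantics described in the comment above.
sealed trait Action
case object Create extends Action    // function absent: create it
case object Skip extends Action      // IF NOT EXISTS and present: no-op
case object Overwrite extends Action // OR REPLACE and present: override
case object Fail extends Action      // present, no modifier: error

def createAction(exists: Boolean, ifNotExists: Boolean, replace: Boolean): Action = {
  // Mirrors the guard in the diff: both modifiers together are rejected.
  require(!(ifNotExists && replace),
    "CREATE FUNCTION with both IF NOT EXISTS and REPLACE is not allowed")
  if (!exists) Create
  else if (ifNotExists) Skip
  else if (replace) Overwrite
  else Fail
}
```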
[GitHub] spark pull request #17681: [SPARK-20383][SQL] Supporting Create [temporary] ...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/17681#discussion_r125564468 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala --- @@ -46,26 +46,47 @@ case class CreateFunctionCommand( functionName: String, className: String, resources: Seq[FunctionResource], -isTemp: Boolean) +isTemp: Boolean, +ifNotExists: Boolean, +replace: Boolean) extends RunnableCommand { + if (ifNotExists && replace) { +throw new AnalysisException("CREATE FUNCTION with both IF NOT EXISTS and REPLACE" + + " is not allowed.") + } + + // Disallows 'CREATE TEMPORARY FUNCTION IF NOT EXISTS' to be consistent + // with 'CREATE TEMPORARY FUNCTION' --- End diff -- Ah, do you mean that the value of the `overrideIfExists` parameter can be `ifNotExists` or `replace`?
[GitHub] spark pull request #17681: [SPARK-20383][SQL] Supporting Create [temporary] ...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/17681#discussion_r125563990 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala --- @@ -1316,8 +1316,10 @@ abstract class SessionCatalogSuite extends AnalysisTest { val funcMeta2 = newFunc("yes_me", None) val tempFunc1 = (e: Seq[Expression]) => e.head val tempFunc2 = (e: Seq[Expression]) => e.last - catalog.createFunction(newFunc("func2", Some("db2")), ignoreIfExists = false) - catalog.createFunction(newFunc("not_me", Some("db2")), ignoreIfExists = false) --- End diff -- Sorry about that, I made a mistake when resolving the conflicts. I'll revert this later.
[GitHub] spark pull request #18510: [SPARK-21284][SQL] rename SessionCatalog.register...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/18510#discussion_r125258324 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -1104,10 +1104,10 @@ class SessionCatalog( */ def registerFunction( --- End diff -- OK, I will do this change after your PR is merged into master.
[GitHub] spark pull request #17681: [SPARK-20383][SQL] Supporting Create [temporary] ...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/17681#discussion_r125253313 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -590,6 +590,14 @@ class InMemoryCatalog( catalog(db).functions.remove(funcName) } + override protected def doAlterFunction(db: String, func: CatalogFunction): Unit = synchronized { +requireDbExists(db) +requireFunctionExists(db, func.identifier.funcName) +catalog(db).functions.remove(func.identifier.funcName) +requireFunctionNotExists(db, func.identifier.funcName) --- End diff -- Ah, there is really no need to do this. OK, I'll remove it.
[GitHub] spark pull request #17681: [SPARK-20383][SQL] Supporting Create [temporary] ...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/17681#discussion_r125250502 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -1056,6 +1056,27 @@ class SessionCatalog( } /** + * overwirte a metastore function in the database specified in `funcDefinition`.. + * If no database is specified, assume the function is in the current database. + */ + def alterFunction(funcDefinition: CatalogFunction): Unit = { +val db = formatDatabaseName(funcDefinition.identifier.database.getOrElse(getCurrentDatabase)) +requireDbExists(db) +val identifier = FunctionIdentifier(funcDefinition.identifier.funcName, Some(db)) +val newFuncDefinition = funcDefinition.copy(identifier = identifier) +if (functionExists(identifier)) { + if (functionRegistry.functionExists(identifier)) { +// If we have loaded this function into the FunctionRegistry, +// also drop it from there. +// For a permanent function, because we loaded it to the FunctionRegistry +// when it's first used, we also need to drop it from the FunctionRegistry. +functionRegistry.dropFunction(identifier) + } + externalCatalog.alterFunction(db, newFuncDefinition) +} --- End diff -- Yes.
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 @gatorsmile @cloud-fan @HyukjinKwon Could you review this PR when you have some time? Thank you!
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 @felixcheung Thank you very much!
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 @dongjoon-hyun Thank you! Your suspicions about the implementation of `InMemoryCatalog` are correct. If we add an alterFunction test case in ExternalCatalogSuite, then we should implement this in both InMemoryCatalog and HiveExternalCatalog; otherwise the test case will fail.
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 @SparkQA Please rebuild it; Jenkins encountered some problems, and all test cases run successfully. @gatorsmile CC
[GitHub] spark pull request #17681: [SPARK-20383][SQL] Supporting Create [temporary] ...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/17681#discussion_r124490206 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -590,6 +590,14 @@ class InMemoryCatalog( catalog(db).functions.remove(funcName) } + override protected def doAlterFunction(db: String, func: CatalogFunction): Unit = synchronized { +requireDbExists(db) +requireFunctionExists(db, func.identifier.funcName) +catalog(db).functions.remove(func.identifier.funcName) +requireFunctionNotExists(db, func.identifier.funcName) --- End diff -- Hi @gatorsmile, if we add an `alterFunction` test case in `ExternalCatalogSuite`, then we should implement this in both `InMemoryCatalog` and `HiveExternalCatalog`; otherwise the test case will fail.
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 @SparkQA Why did it fail? Retest it, please.
[GitHub] spark pull request #17681: [SPARK-20383][SQL] Supporting Create [temporary] ...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/17681#discussion_r124442925 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -590,6 +590,10 @@ class InMemoryCatalog( catalog(db).functions.remove(funcName) } + override protected def doAlterFunction(db: String, func: CatalogFunction): Unit = synchronized { +// do nothing --- End diff -- /** * An in-memory (ephemeral) implementation of the system catalog. * * This is a dummy implementation that does not require setting up external systems. * It is intended for testing or exploration purposes only and should not be used * in production. * * All public methods should be synchronized for thread-safety. */ These are the comments in `InMemoryCatalog`, so I think there is no need to implement this. @dongjoon-hyun
[GitHub] spark pull request #17681: [SPARK-20383][SQL] Supporting Create [temporary] ...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/17681#discussion_r124435476 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -590,6 +590,10 @@ class InMemoryCatalog( catalog(db).functions.remove(funcName) } + override protected def doAlterFunction(db: String, func: CatalogFunction): Unit = synchronized { +// do nothing --- End diff -- I implemented this in my first commit, but gatorsmile suggested that there is no need to implement it, so I'm not sure whether we should implement this in `InMemoryCatalog` or not.
[GitHub] spark pull request #17681: [SPARK-20383][SQL] Supporting Create [temporary] ...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/17681#discussion_r124429467 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/events.scala --- @@ -140,6 +140,16 @@ case class DropFunctionPreEvent(database: String, name: String) extends Function case class DropFunctionEvent(database: String, name: String) extends FunctionEvent /** + * Event fired before a function is altered. + */ +case class AlterFunctionPreEvent(database: String, name: String) extends FunctionEvent + +/** + * Event fired after a function has been dropped. --- End diff -- Yes, I am sorry about that.
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 OK, I fixed the error in the test case; please retest it. Thanks!
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 I found only one test case named `create temporary function with if not exists`, but Jenkins throws the error `org.scalatest.exceptions.DuplicateTestNameException: Duplicate test name: create temporary function with if not exists`. Could you give me a hand? @gatorsmile
[GitHub] spark pull request #17681: [SPARK-20383][SQL] Supporting Create [temporary] ...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/17681#discussion_r124183913 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala --- @@ -46,27 +46,53 @@ case class CreateFunctionCommand( functionName: String, className: String, resources: Seq[FunctionResource], -isTemp: Boolean) +isTemp: Boolean, +ifNotExists: Boolean, +replace: Boolean) extends RunnableCommand { + if (ifNotExists && replace) { +throw new AnalysisException("CREATE FUNCTION with both IF NOT EXISTS and REPLACE" + + " is not allowed.") + } + + // Disallows 'CREATE TEMPORARY FUNCTION IF NOT EXISTS' to be consistent + // with 'CREATE TEMPORARY FUNCTION' + if (ifNotExists && isTemp) { +throw new AnalysisException( + "It is not allowed to define a TEMPORARY function with IF NOT EXISTS.") + } + + // Temporary function names should not contain database prefix like "database.function" + if (databaseName.isDefined && isTemp) { +throw new AnalysisException(s"Specifying a database in CREATE TEMPORARY FUNCTION " + + s"is not allowed: '${databaseName.get}'") + } + override def run(sparkSession: SparkSession): Seq[Row] = { val catalog = sparkSession.sessionState.catalog -val func = CatalogFunction(FunctionIdentifier(functionName, databaseName), className, resources) +val func = FunctionIdentifier(functionName, databaseName) if (isTemp) { - if (databaseName.isDefined) { -throw new AnalysisException(s"Specifying a database in CREATE TEMPORARY FUNCTION " + - s"is not allowed: '${databaseName.get}'") - } // We first load resources and then put the builder in the function registry. // Please note that it is allowed to overwrite an existing temp function. catalog.loadFunctionResources(resources) - catalog.registerFunction(func, ignoreIfExists = false) + // Handles `CREATE OR REPLACE TEMPORARY FUNCTION AS ... USING ...` + // We drop the temp function in FunctionRegistry firstly and then create a new one + if (replace && catalog.functionExists(func)) { +catalog.dropTempFunction(functionName, ignoreIfNotExists = true) + } + catalog.registerFunction(CatalogFunction(func, className, resources), ignoreIfExists = false) } else { - // For a permanent, we will store the metadata into underlying external catalog. - // This function will be loaded into the FunctionRegistry when a query uses it. - // We do not load it into FunctionRegistry right now. - // TODO: should we also parse "IF NOT EXISTS"? - catalog.createFunction(func, ignoreIfExists = false) + // Handles `CREATE OR REPLACE FUNCTION AS ... USING ...` + if (replace && catalog.functionExists(func)) { +// alter the function in the metastore +catalog.alterFunction(CatalogFunction(func, className, resources)) --- End diff -- I added the test case in `SessionCatalogSuite` at line 1230.
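As a usage sketch of the two branches handled in this diff (the function, class, and jar names below are made up for illustration):

```sql
-- Temporary branch: the old definition is dropped from the FunctionRegistry,
-- then the new one is registered.
CREATE OR REPLACE TEMPORARY FUNCTION my_udf AS 'com.example.MyUDF'
  USING JAR '/tmp/my_udf.jar';

-- Permanent branch: the existing definition is altered in place in the metastore.
CREATE OR REPLACE FUNCTION mydb.my_udf AS 'com.example.MyUDF'
  USING JAR '/tmp/my_udf.jar';
```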
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 Retest this please. Thanks!
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 Jenkins, please build and run tests. @gatorsmile please review the added test cases. Thanks!
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 Please review it. Thanks @gatorsmile
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 OK, I'll work on this PR after your PR #18142 is merged into master. @gatorsmile
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 Sorry, I could not find the implementation of this PR in #18142 .
[GitHub] spark pull request #17681: [SPARK-20383][SQL] Supporting Create [temporary] ...
GitHub user ouyangxiaochen reopened a pull request: https://github.com/apache/spark/pull/17681 [SPARK-20383][SQL] Supporting Create [temporary] Function with the keyword 'IF NOT EXISTS' ## What changes were proposed in this pull request? Support creating a [temporary] function with the keyword 'IF NOT EXISTS' ## How was this patch tested? Manual test and added test case Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ouyangxiaochen/spark spark-419 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17681.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17681 commit 35139e92c6c2c941d7c9c089e8aa889bfad1058c Author: ouyangxiaochen <ou.yangxiaoc...@zte.com.cn> Date: 2017-04-19T07:23:08Z support to create function with keyword 'IF NOT EXISTS'
[GitHub] spark pull request #17681: [SPARK-20383][SQL] Supporting Create [temporary] ...
Github user ouyangxiaochen closed the pull request at: https://github.com/apache/spark/pull/17681
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 I am sorry about it.
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 Sorry, I could not find the implementation of this PR in #18142, so it should not be closed. At the very least, the file SqlBase.g4 was not modified there.
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 Since the issue is resolved, I'll close this PR later.
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 OK. After that, leave me a message and I will work on this. Thanks!
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 The new syntax for creating a function looks like this: `CREATE (OR REPLACE)? TEMPORARY? FUNCTION (IF NOT EXISTS)? qualifiedName AS className=STRING (USING resource (',' resource)*)?` OK, let's go through the logic: 1. Creating a function with both `IF NOT EXISTS` and `REPLACE` is not allowed. 2. It is not allowed to define a `TEMPORARY` function with `IF NOT EXISTS`. 3. Specifying a database in `CREATE TEMPORARY FUNCTION` is not allowed. 4. Creating a function that already exists in the Hive metastore and FunctionRegistry: a. with only `IF NOT EXISTS`, we keep the existing function; b. with only `OR REPLACE`, we replace it with the new one. @gatorsmile @cloud-fan CC, please give me some suggestions. Thanks!
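The rules above can be illustrated with a few statements (the function and class names are made up for illustration):

```sql
CREATE OR REPLACE FUNCTION IF NOT EXISTS f1 AS 'com.example.F1'; -- rule 1: rejected
CREATE TEMPORARY FUNCTION IF NOT EXISTS f2 AS 'com.example.F2';  -- rule 2: rejected
CREATE TEMPORARY FUNCTION mydb.f3 AS 'com.example.F3';           -- rule 3: rejected
CREATE FUNCTION IF NOT EXISTS f4 AS 'com.example.F4';            -- rule 4a: keep existing f4
CREATE OR REPLACE FUNCTION f4 AS 'com.example.F4v2';             -- rule 4b: replace f4
```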
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 It seems more reasonable.
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 The syntax in other DBMSs is `CREATE OR REPLACE FUNCTION`, which means that if the function exists we replace it, and otherwise we create one. But I think we'd better not destroy an existing permanent function in Spark. The purpose of supporting this keyword is that users can choose a way to avoid the `FunctionAlreadyExistsException`.
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 I think it is necessary to support this keyword in the 'create [temporary] function ...' clause. For example, I have an application with three steps: 1. insert into table t1 as select id from t where ... 2. create function AA as ... using jar ... 3. insert into table t2 select AA(id) from t1. My application runs successfully the first time, but throws an exception at step 2 when it runs again. So I think it is reasonable and user-friendly to support this keyword.
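The three-step job above can be sketched as follows (the table, column, class, and jar names are illustrative); on a second run, step 2 is what fails today without `IF NOT EXISTS`:

```sql
INSERT INTO t1 SELECT id FROM t WHERE ...;           -- step 1
CREATE FUNCTION IF NOT EXISTS AA AS 'com.example.AA' -- step 2: no longer fails on re-runs
  USING JAR '/path/to/aa.jar';
INSERT INTO t2 SELECT AA(id) FROM t1;                -- step 3
```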
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 @gatorsmile CC: is it reasonable?
[GitHub] spark pull request #17681: [SPARK-20383][SQL] Supporting Create [temporary] ...
GitHub user ouyangxiaochen opened a pull request: https://github.com/apache/spark/pull/17681 [SPARK-20383][SQL] Supporting Create [temporary] Function with the keyword 'IF NOT EXISTS' ## What changes were proposed in this pull request? Support creating a [temporary] function with the keyword 'IF NOT EXISTS' ## How was this patch tested? Manual test and added test case Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ouyangxiaochen/spark spark-419 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17681.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17681 commit 4a05538ac1c770b77b5b49b32edd74b3928fe47f Author: ouyangxiaochen <ou.yangxiaoc...@zte.com.cn> Date: 2017-04-19T07:23:08Z support to create function with keyword 'IF NOT EXISTS' commit 98eb520c23d9282f5cc95de9c650d3498b2ab663 Author: ouyangxiaochen <ou.yangxiaoc...@zte.com.cn> Date: 2017-04-19T07:33:56Z dd commit d77ff39323e7477e5a440f737e8d81b846beb89f Author: ouyangxiaochen <ou.yangxiaoc...@zte.com.cn> Date: 2017-04-19T07:41:21Z dd
[GitHub] spark issue #17628: [SPARK-20316][SQL] Val and Var should strictly follow th...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17628 OK, I got it.
[GitHub] spark issue #17628: [SPARK-20316][SQL] Val and Var should strictly follow th...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17628 @srowen Do you mean that my regex cannot match all variable names, because a Scala variable name can consist of letters, digits, `_`, and `$`? I searched for instances of `var foo = "string"` with the new regex `\s*var [a-z|A-Z|$|_]* = "*"` and found 33 of them. Maybe there really are no other instances.
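For reproducibility, such a search can be run with `grep`. Note that inside a bracket expression `|` is a literal character, so `[a-z|A-Z|$|_]` also matches `|` itself. A minimal sketch under that assumption (the identifier class below is my own approximation, not the exact regex quoted above):

```shell
# Search Scala sources for `var` bindings initialized with a string literal.
# [A-Za-z_$][A-Za-z0-9_$]* approximates a plain Scala identifier; backticked
# or symbolic identifiers are not covered by this sketch.
grep -rnE 'var [A-Za-z_$][A-Za-z0-9_$]* = "' --include='*.scala' .
```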
[GitHub] spark issue #17628: [SPARK-20316][SQL] Variable and Method should strictly f...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17628 I searched for instances of `var foo = "string"` with the regex `\s*var [a-z|A-Z]* = "*"` across the whole project and found 31 of them. Their usage is almost all reasonable, except in the file `SparkSQLCLIDriver`. @srowen cc
[GitHub] spark issue #17628: [SPARK-20316][SQL] Variable and Method should strictly f...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17628 OK, I will work on this again. Thanks.
[GitHub] spark issue #17628: [SPARK-20316][SQL] Variable and Method should strictly f...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17628 I tried to search for instances of `var foo = "string"` with the regex `var [a-z|A-Z] = "*` across the whole project and found 740 of them. Their usage is almost all reasonable. As @viirya commented above, this is very old code; maybe no committer has paid attention to it. @srowen
[GitHub] spark issue #17628: [SPARK-20316][SQL] Variable and Method should strictly f...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17628 This is a very heavy task. So far, I have only found a problem with this while reading the SparkSQL source code. So, should we merge this into master or not? @srowen @viirya As for the keyword `return`, I found that many Scala methods use it, so that change can be reverted to the original.
[GitHub] spark pull request #17629: [SPARK-20317] [REPL] Removing 'return' keyword in...
Github user ouyangxiaochen closed the pull request at: https://github.com/apache/spark/pull/17629
[GitHub] spark issue #17628: [SPARK-20316][SQL] val and var should strictly follow th...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17628 OK, I'll merge such changes together into this PR.
[GitHub] spark pull request #17629: [SPARK-20317] [REPL] Removing 'return' keyword in...
GitHub user ouyangxiaochen opened a pull request: https://github.com/apache/spark/pull/17629 [SPARK-20317] [REPL] Removing 'return' keyword in Scala method ## What changes were proposed in this pull request? Make the code cleaner and remove the redundant code ## How was this patch tested? Manual test and existing test cases You can merge this pull request into a Git repository by running: $ git pull https://github.com/ouyangxiaochen/spark spark-413-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17629.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17629 commit 421709a9c38075e0fbe0c7865da76f8b308f2e82 Author: ouyangxiaochen <ou.yangxiaoc...@zte.com.cn> Date: 2017-04-13T06:18:54Z use of the 'return' keyword is redundant.
[GitHub] spark pull request #17628: [SPARK-20316][SQL] val and var should strictly fo...
GitHub user ouyangxiaochen opened a pull request: https://github.com/apache/spark/pull/17628 [SPARK-20316][SQL] val and var should strictly follow the Scala syntax ## What changes were proposed in this pull request? val and var should strictly follow the Scala syntax ## How was this patch tested? Manual test and existing test cases You can merge this pull request into a Git repository by running: $ git pull https://github.com/ouyangxiaochen/spark spark-413 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17628.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17628 commit 32b12a031b56b07b47d5ffb728038f82eb0cd597 Author: ouyangxiaochen <ou.yangxiaoc...@zte.com.cn> Date: 2017-04-13T05:03:50Z val and var should strictly follow the Scala syntax
[GitHub] spark pull request #17270: [SPARK-19929] [SQL] Showing Hive Managed table's ...
Github user ouyangxiaochen closed the pull request at: https://github.com/apache/spark/pull/17270
[GitHub] spark issue #17270: [SPARK-19929] [SQL] Showing Hive Managed table's LOCATION...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17270 Yeah, you are right. In Hive, a table created by specifying LOCATION will be a MANAGED table. In Spark 2.x, a table created by specifying LOCATION will be an EXTERNAL table. So both Hive's logic and Spark 2.x's logic are reasonable. Please close the issue. Thanks!
[GitHub] spark issue #17270: [SPARK-19929] [SQL] Showing Hive Managed table's LOCATION...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17270 @gatorsmile cc, is it reasonable? Thanks!
[GitHub] spark issue #17270: [SPARK-19929] [SQL] Showing Hive Managed table's LOCATION...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17270 As application developers, we want to know a Hive table's location: we can use the command 'show create table ...' to get the table's data location when the Hive table is EXTERNAL, but not when it is MANAGED. So, I think both Hive EXTERNAL tables and Hive MANAGED tables should show the LOCATION property; that would be convenient and friendly for the user.
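A sketch of the behavior being requested (the table name and path below are assumptions for illustration). Today the `LOCATION` clause is printed for EXTERNAL tables, and the ask is to print it for MANAGED tables too:

```sql
SHOW CREATE TABLE mydb.my_table;
-- CREATE EXTERNAL TABLE `mydb`.`my_table` (...)
-- LOCATION 'hdfs://namenode/user/hive/warehouse/mydb.db/my_table'
```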
[GitHub] spark pull request #17270: [SPARK-19929] [SQL] Showing Hive Managed table's ...
GitHub user ouyangxiaochen opened a pull request: https://github.com/apache/spark/pull/17270 [SPARK-19929] [SQL] Showing Hive Managed table's LOCATION when Using 'show create table ...' ## What changes were proposed in this pull request? Show a Hive table's LOCATION property when using the command 'show create table ...' on a Hive MANAGED table. ## How was this patch tested? Manual test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ouyangxiaochen/spark spark-19929 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17270.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17270 commit cf5bf30bea8ec2c6e760abce84022616df63951b Author: 欧阳晓晨10208179 <ou.yangxiaoc...@zte.com.cn> Date: 2017-03-10T03:12:42Z fvff commit f8396b67c220158b1bb444748fcd2ade27ccb369 Author: ouyangxiaochen <ou.yangxiaoc...@zte.com.cn> Date: 2017-03-13T02:40:49Z Hive EXTERNAL table and Hive Managed table should show LOCATION properties
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 Very thoughtful consideration. Thanks for your explanation and suggestion! @tejasapatil what do you think? @gatorsmile @cloud-fan
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 Very careful consideration. Thanks for your explanation and suggestion! What do you think? @gatorsmile @cloud-fan
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 Do you mean that we don't need to check whether targetTable.storage.locationUri is the same as sourceTable.storage.locationUri? @tejasapatil
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 Regarding @tejasapatil's comment: do we need to match Hive's behavior exactly? @gatorsmile
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 I think @tejasapatil's suggestion is reasonable. Because the location is specified by users, sourceTable.storage.locationUri and targetTable.storage.locationUri can be the same or different. Do we need to match Hive's behavior exactly?
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 I think there is no need to do this validation. Because the location is specified by users, targetTable.storage.locationUri and sourceTable.storage.locationUri can be the same or different. @tejasapatil
[GitHub] spark pull request #16868: [SPARK-19115] [SQL] Supporting Create External Ta...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/16868#discussion_r100712792
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -969,15 +1053,19 @@ class HiveDDLSuite
         val targetTable = spark.sessionState.catalog.getTableMetadata(
           TableIdentifier(targetTabName, Some("default")))
-        checkCreateTableLike(sourceView, targetTable)
+        checkCreateTableLike(sourceView, targetTable, tableType)
       }
     }
   }

-  private def checkCreateTableLike(sourceTable: CatalogTable, targetTable: CatalogTable): Unit = {
-    // The created table should be a MANAGED table with empty view text and original text.
-    assert(targetTable.tableType == CatalogTableType.MANAGED,
-      "the created table must be a Hive managed table")
+  private def checkCreateTableLike(
+      sourceTable: CatalogTable,
+      targetTable: CatalogTable,
+      tableType: CatalogTableType): Unit = {
+    // The created table should be a MANAGED table or EXTERNAL table with empty view text
+    // and original text.
+    assert(targetTable.tableType == tableType,
+      s"the created table must be a Hive ${tableType.name} table")
--- End diff --
Sorry for the late reply. OK, I will follow your suggestion.
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 I have run the test cases successfully. Please run the test cases again. Thanks a lot! @SparkQA
[GitHub] spark pull request #16868: [SPARK-19115] [SQL] Supporting Create External Ta...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/16868#discussion_r100450487
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -926,38 +1006,63 @@ class HiveDDLSuite
           |CREATE EXTERNAL TABLE $sourceTabName (key INT comment 'test', value STRING)
           |COMMENT 'Apache Spark'
           |PARTITIONED BY (ds STRING, hr STRING)
-          |LOCATION '$basePath'
-        """.stripMargin)
+          |LOCATION '$basePath1'
+      """.stripMargin)
       for (ds <- Seq("2008-04-08", "2008-04-09"); hr <- Seq("11", "12")) {
         sql(
           s"""
             |INSERT OVERWRITE TABLE $sourceTabName
             |partition (ds='$ds',hr='$hr')
             |SELECT 1, 'a'
-          """.stripMargin)
+        """.stripMargin)
+      }
+
+      if ( location.isEmpty ) {
+        sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName")
+      } else {
+        sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName" +
+          s" LOCATION '${location.getOrElse("")}'")
--- End diff --
OK
[GitHub] spark pull request #16868: [SPARK-19115] [SQL] Supporting Create External Ta...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/16868#discussion_r100257474
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -833,54 +833,107 @@ class HiveDDLSuite
   }

   test("CREATE TABLE LIKE a temporary view") {
+    // CREATE TABLE LIKE a temporary view.
+    withCreateTableLikeTempView(None)
+
+    // CREATE TABLE LIKE a temporary view location ...
+    withTempDir { tmpDir =>
+      withCreateTableLikeTempView(Some(tmpDir.getPath))
+    }
+  }
+
+  private def withCreateTableLikeTempView(location : Option[String]): Unit = {
     val sourceViewName = "tab1"
     val targetTabName = "tab2"
+    var createdTableType = CatalogTableType.MANAGED
--- End diff --
Fine, I'll update it.
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 I have created a PR at https://github.com/apache/spark/pull/16868, please review it. Thanks! @gatorsmile @cloud-fan
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 I have created a new PR. Please review it, thanks! @gatorsmile @cloud-fan
[GitHub] spark pull request #16868: [SPARK-19115] [SQL] Supporting Create External Ta...
GitHub user ouyangxiaochen opened a pull request: https://github.com/apache/spark/pull/16868 [SPARK-19115] [SQL] Supporting Create External Table Like Location ## What changes were proposed in this pull request? Support the CREATE [EXTERNAL] TABLE LIKE ... LOCATION ... syntax for Hive tables. In this PR, we follow the Spark SQL design rules: creating a table LIKE a view, physical table, or temporary view with a location is supported; creating an external table without a location throws an OperationNotAllowed exception; creating a managed table with a location produces an external table rather than a managed table. ## How was this patch tested? Add new test cases and update existing test cases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ouyangxiaochen/spark spark19115 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16868.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16868 commit 222180901fd65c01bd04844a548cf046f671478d Author: ouyangxiaochen <ou.yangxiaoc...@zte.com.cn> Date: 2017-02-09T07:30:21Z resolve the conflicts
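Under the rules described in this PR, the new syntax would behave roughly as follows (table and path names are hypothetical):

```sql
-- Supported: copy the schema of src_tbl into an external table at a given path.
CREATE EXTERNAL TABLE IF NOT EXISTS gen_tbl LIKE src_tbl
LOCATION '/warehouse/gen_tbl';

-- Also supported: CREATE TABLE LIKE with LOCATION but without EXTERNAL,
-- which still produces an EXTERNAL table per Spark 2.x semantics.
CREATE TABLE gen_tbl2 LIKE src_tbl LOCATION '/warehouse/gen_tbl2';

-- Rejected: EXTERNAL without LOCATION throws an OperationNotAllowed error.
CREATE EXTERNAL TABLE gen_tbl3 LIKE src_tbl;
```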
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 OK. I'll try it immediately. Thank you very much!
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 Oh, I see. I missed a step, "git remote add upstream ...". But now I have deleted the repository in my profile, so this PR can't know which repository it should be associated with. Do you have a way to help me solve this problem?
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 Here's how I created the PR: 1. fork the Apache master; 2. create a new branch from my master branch; 3. select my new branch and create a new PR; 4. edit my new branch code; 5. commit and push. Can you point out the missing or mistaken steps for me? Thank you for your guidance! @gatorsmile
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 My master branch was not synchronized with the Apache master. I did a pull, but my master branch was still not synchronized, so finally I removed my remote repository. But I do not know how to associate a new branch with this PR. I think I made a misoperation. @gatorsmile
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 Should I delete my local master repository first, and fork a new one again? @cloud-fan
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 I met some trouble when resolving the conflict. Can you give me some guidance? Thanks a lot! @cloud-fan
[GitHub] spark pull request #16638: [SPARK-19115] [SQL] Supporting Create External Ta...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/16638#discussion_r99984076
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -812,150 +812,234 @@ class HiveDDLSuite
     }
   }

-  test("CREATE TABLE LIKE a temporary view") {
-    val sourceViewName = "tab1"
-    val targetTabName = "tab2"
-    withTempView(sourceViewName) {
-      withTable(targetTabName) {
-        spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd)
-          .createTempView(sourceViewName)
-        sql(s"CREATE TABLE $targetTabName LIKE $sourceViewName")
-
-        val sourceTable = spark.sessionState.catalog.getTempViewOrPermanentTableMetadata(
-          TableIdentifier(sourceViewName))
-        val targetTable = spark.sessionState.catalog.getTableMetadata(
-          TableIdentifier(targetTabName, Some("default")))
-
-        checkCreateTableLike(sourceTable, targetTable)
+  test("CREATE TABLE LIKE a temporary view [LOCATION]...") {
+    var createdTableType = "MANAGED"
+    for ( i <- 0 to 1 ) {
--- End diff --
I wrote it this way in order to reuse this piece of common code, because the basic logic of the two scenarios is almost the same.
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 ping @gatorsmile
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 Happy Chinese New Year! @gatorsmile The Spring Festival holiday just ended and I returned to work today. What work do I need to do?
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 I have fixed the errors in the test cases and they run successfully. Please run the test cases again. Thanks a lot! @SparkQA
[GitHub] spark pull request #16638: [SPARK-19115] [SQL] Supporting Create External Ta...
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/16638#discussion_r97454893
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -1140,14 +1140,18 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    *
    * For example:
    * {{{
-   * CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
-   * LIKE [other_db_name.]existing_table_name
+   * CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+   * LIKE [other_db_name.]existing_table_name [locationSpec]
    * }}}
    */
   override def visitCreateTableLike(ctx: CreateTableLikeContext): LogicalPlan = withOrigin(ctx) {
     val targetTable = visitTableIdentifier(ctx.target)
     val sourceTable = visitTableIdentifier(ctx.source)
-    CreateTableLikeCommand(targetTable, sourceTable, ctx.EXISTS != null)
+    val location = Option(ctx.locationSpec).map(visitLocationSpec)
+    if (ctx.EXTERNAL != null && location.isEmpty) {
--- End diff --
OK, I'll do it later. Thanks!
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 I am sorry that I didn't grasp the key point of your question. In Hive, if there are data files under the specified path when creating an external table, Hive will treat those files as the table's data files. In many Spark applications, external table data is generated by other applications under the external table's path. So Hive does nothing to the directory specified in LOCATION. Thank you for your patience and guidance. @gatorsmile
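The Hive behavior described above can be sketched as follows (table and path names are hypothetical):

```sql
-- Suppose another application has already written data files under
--   /warehouse/events/
-- Creating an external table over that path makes the existing files
-- queryable immediately; Hive does not touch or rewrite the directory.
CREATE EXTERNAL TABLE events (id INT, payload STRING)
LOCATION '/warehouse/events';

-- Reads whatever files are already present under the LOCATION path.
SELECT COUNT(*) FROM events;
```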
[GitHub] spark pull request #16638: spark-19115
Github user ouyangxiaochen commented on a diff in the pull request: https://github.com/apache/spark/pull/16638#discussion_r96993195
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -58,6 +58,7 @@ import org.apache.spark.util.Utils
 case class CreateTableLikeCommand(
--- End diff --
OK, I will update it later. Thanks!
[GitHub] spark issue #16638: spark-19115
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 Here are the differences between Hive and Spark 2.x:
1. Hive:
create table test(id int); --> MANAGED_TABLE
create table test(id int) location '/warehouse/test'; --> MANAGED_TABLE
create external table test(id int) location '/warehouse/test'; --> EXTERNAL_TABLE
create external table test(id int); --> EXTERNAL_TABLE
2. Spark 2.x:
create table test(id int); --> MANAGED_TABLE
create table test(id int) location '/warehouse/test'; --> EXTERNAL_TABLE
create external table test(id int) location '/warehouse/test'; --> EXTERNAL_TABLE
create external table test(id int); --> operationNotAllowed("CREATE EXTERNAL TABLE must be accompanied by LOCATION", ctx)
So, this PR follows Spark 2.x's design rules. Thanks!
[GitHub] spark pull request #16638: spark-19115
GitHub user ouyangxiaochen opened a pull request: https://github.com/apache/spark/pull/16638 spark-19115 ## What changes were proposed in this pull request? Spark SQL supports the command: create external table if not exists gen_tbl like src_tbl location '/warehouse/gen_tbl' in Spark 2.x. ## How was this patch tested? Manual tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ouyangxiaochen/spark spark19115 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16638.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16638 commit adde008588cf8e05cf261c086201c27a8dd5584f Author: ouyangxiaochen <ou.yangxiaoc...@zte.com.cn> Date: 2017-01-19T03:15:17Z spark-19115