[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14426

Rebased to resolve the conflicts.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15399: [SPARK-17819][SQL] Specified database in JDBC URL...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/15399

[SPARK-17819][SQL] Specified database in JDBC URL is ignored when connecting to thriftserver

## What changes were proposed in this pull request?

```sql
$ bin/beeline -u jdbc:hive2://localhost:1 -e "create database testdb"
$ bin/beeline -u jdbc:hive2://localhost:1/testdb -e "create table t(a int)"
$ bin/beeline -u jdbc:hive2://localhost:1/testdb -e "show tables"
...
+------------+--------------+--+
| tableName  | isTemporary  |
+------------+--------------+--+
| t          | false        |
+------------+--------------+--+
1 row selected (0.347 seconds)
$ bin/beeline -u jdbc:hive2://localhost:1 -e "show tables"
...
+------------+--------------+--+
| tableName  | isTemporary  |
+------------+--------------+--+
+------------+--------------+--+
No rows selected (0.098 seconds)
```

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-17819

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15399.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15399

commit f9a294e815a31e4a3ca2d8d00fdfa53028efae52
Author: Dongjoon Hyun
Date: 2016-10-08T01:05:04Z

    [SPARK-17819][SQL] Specified database in JDBC URL is ignored when connecting to thriftserver
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15399

Hi, @rxin . Sorry, but could you give me some advice about a proper test suite for this kind of issue?
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15399

Oops. Sorry. I'll fix it soon.
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15399

Retest this please.
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15399

Hi, @gatorsmile . Could you review this PR when you have some time?
[GitHub] spark pull request #14527: [SPARK-16938][SQL] `drop/dropDuplicate` should ha...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/14527
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15399

Thank you for commenting, @gatorsmile . So far, I have asked @rxin and you. I'll find another committer who is closer to this part.
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15399

Hi, @liancheng . Could you review this PR about Thrift Server?
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15399

Thank you for your advice, @rxin !! Which suite do you mean? The suites inside `HiveThriftServer2Suites.scala` do not look suitable to me for this kind of case.
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15399

Or, should I create a new test suite for this kind of test case?
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15399

Hi, @rxin . Since this kind of test needs to change the connection URI and also assumes that a non-default database exists beforehand, I made a new test suite for it. Could you review this again?
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15399

Hi, @rxin . Could you review this again?
[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15376

Oh. Indeed. I thought it was a supported usage since it works on Hive 1.2.

```sql
hive> load data local inpath '/data/t/*.txt' INTO TABLE x;
Loading data to table default.x
Table default.x stats: [numFiles=12, totalSize=613224000]
OK
Time taken: 3.712 seconds
```

According to your advice and the URL, it seems this is not a normal or recommended way. Thank you, @srowen . Do you prefer to close this PR and the issue?
[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15376

Thank you for the review, @srowen . I'll update the PR. Also, I'll investigate further whether there is a reason not to recommend this approach.
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15399

Hi, @rxin or @liancheng . Could you review this PR when you have some time?
[GitHub] spark issue #15399: [SPARK-17819][SQL] Support default database in connectio...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15399

Thank you, @rxin ! Sure. I'll create a backport PR.
[GitHub] spark pull request #15507: [SPARK-17819][SQL][BRANCH-2.0] Support default da...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/15507

[SPARK-17819][SQL][BRANCH-2.0] Support default database in connection URIs for Spark Thrift Server

## What changes were proposed in this pull request?

Currently, Spark Thrift Server ignores the default database specified in the connection URI. This PR adds support for it, as follows.

```sql
$ bin/beeline -u jdbc:hive2://localhost:1 -e "create database testdb"
$ bin/beeline -u jdbc:hive2://localhost:1/testdb -e "create table t(a int)"
$ bin/beeline -u jdbc:hive2://localhost:1/testdb -e "show tables"
...
+------------+--------------+--+
| tableName  | isTemporary  |
+------------+--------------+--+
| t          | false        |
+------------+--------------+--+
1 row selected (0.347 seconds)
$ bin/beeline -u jdbc:hive2://localhost:1 -e "show tables"
...
+------------+--------------+--+
| tableName  | isTemporary  |
+------------+--------------+--+
+------------+--------------+--+
No rows selected (0.098 seconds)
```

## How was this patch tested?

Pass Jenkins with a newly added test suite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-17819-BACK

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15507.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15507

commit 43a9d48a70b8400a4692e5f8d2b44686e7f50b4e
Author: Dongjoon Hyun
Date: 2016-10-17T03:45:25Z

    [SPARK-17819][SQL] Support default database in connection URIs for Spark Thrift Server
[GitHub] spark issue #15507: [SPARK-17819][SQL][BRANCH-2.0] Support default database ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15507

@rxin . This is the backport for `branch-2.0`.
[GitHub] spark issue #15507: [SPARK-17819][SQL][BRANCH-2.0] Support default database ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15507

Thank you again, @rxin .
[GitHub] spark pull request #15507: [SPARK-17819][SQL][BRANCH-2.0] Support default da...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/15507
[GitHub] spark pull request #15376: [SPARK-17796][SQL] Support wildcard character in ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15376#discussion_r83582717

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---

```diff
@@ -246,7 +247,27 @@ case class LoadDataCommand(
     val loadPath = if (isLocal) {
       val uri = Utils.resolveURI(path)
-      if (!new File(uri.getPath()).exists()) {
+      val filePath = uri.getPath()
+      val exists = if (filePath.contains("*")) {
+        val splitPath = filePath.split(File.separator)
+        val filePattern = splitPath.last
+        val dir = splitPath.dropRight(1).mkString(File.separator)
```

--- End diff --

Now, it uses `java.nio.file.Path` APIs.
[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15376

The `KafkaSourceSuite` failure seems unrelated to this PR.

```
[info] KafkaSourceSuite:
[info] - cannot stop Kafka stream (1 minute, 1 second)
[info] - subscribing topic by name from latest offsets *** FAILED *** (10 seconds, 511 milliseconds)
[info]   The code passed to eventually never returned normally. Attempted 669 times over 10.01201477801 seconds. Last failure message: assertion failed: Partition [topic-2, 0] metadata not propagated after timeout. (KafkaTestUtils.scala:312)
[info]   org.scalatest.exceptions.TestFailedDueToTimeoutException:
```
[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15376

Retest this please.
[GitHub] spark issue #15302: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15302

Hi, @hvanhovell . When using `Expression`, I faced two issues.

- `checkAnalysis` raises exceptions because the column is unresolved, e.g., `country` is unresolved.
- As a workaround, I tried to use the string literal `'country'`, but then the optimizer rule `ConstantFolding` replaces the predicate with `false`, because `'country' < 'KR'` is `false`.

```sql
ALTER TABLE sales DROP PARTITION (country < 'KR')
```

To avoid these situations, I could add a rule to `checkAnalysis`, but that does not seem like a good idea. Could you give me some advice on this?
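As an aside, the `ConstantFolding` behavior described above can be reproduced outside Spark: once `'country'` is parsed as a string literal rather than a column reference, `'country' < 'KR'` is a comparison of two constants, which folds to `false` under lexicographic ordering. The class below is a minimal plain-Java illustration of that ordering, not Spark code.

```java
public class ConstantFoldingDemo {
    public static void main(String[] args) {
        // With 'country' taken as a string literal instead of a column
        // reference, the predicate 'country' < 'KR' compares two constants.
        // Lowercase 'c' (99) sorts after uppercase 'K' (75) in lexicographic
        // order, so the comparison is constant-false, and an optimizer may
        // fold the whole predicate away.
        boolean predicate = "country".compareTo("KR") < 0;
        System.out.println(predicate); // prints "false"
    }
}
```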
[GitHub] spark pull request #15376: [SPARK-17796][SQL] Support wildcard character in ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15376#discussion_r83707763

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---

```diff
@@ -246,7 +247,28 @@ case class LoadDataCommand(
     val loadPath = if (isLocal) {
       val uri = Utils.resolveURI(path)
-      if (!new File(uri.getPath()).exists()) {
+      val filePath = uri.getPath()
+      val exists = if (filePath.contains("*")) {
+        val fileSystem = FileSystems.getDefault
+        val pathPattern = fileSystem.getPath(filePath)
+        val dir = pathPattern.getParent.toString
+        val filePattern = pathPattern.getName(pathPattern.getNameCount - 1).toString
```

--- End diff --

Thanks. I'll use that.
[GitHub] spark pull request #15376: [SPARK-17796][SQL] Support wildcard character in ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15376#discussion_r83707938

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---

```diff
@@ -1886,6 +1887,37 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }
+
+  test("SPARK-17796 Support wildcard character in filename for LOAD DATA LOCAL INPATH") {
+    withTempDir { dir =>
+      for (i <- 1 to 3) {
+        val writer = new PrintWriter(new File(s"$dir/part-r-$i"))
```

--- End diff --

Sure, I'll use the Guava one here, too.
[GitHub] spark pull request #15376: [SPARK-17796][SQL] Support wildcard character in ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15376#discussion_r83710046

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---

```diff
@@ -246,7 +247,28 @@ case class LoadDataCommand(
     val loadPath = if (isLocal) {
       val uri = Utils.resolveURI(path)
-      if (!new File(uri.getPath()).exists()) {
+      val filePath = uri.getPath()
+      val exists = if (filePath.contains("*")) {
+        val fileSystem = FileSystems.getDefault
+        val pathPattern = fileSystem.getPath(filePath)
+        val dir = pathPattern.getParent.toString
+        val filePattern = pathPattern.getName(pathPattern.getNameCount - 1).toString
+        if (dir.contains("*")) {
+          throw new AnalysisException(
+            s"LOAD DATA input path allows only filename wildcard: $path")
+        }
+
+        val files = new File(dir).listFiles()
+        if (files == null) {
+          false
+        } else {
+          val matcher = fileSystem.getPathMatcher("glob:" + filePattern)
```

--- End diff --

Yes. It matches the whole absolute path.

```scala
scala> val fs = java.nio.file.FileSystems.getDefault
fs: java.nio.file.FileSystem = sun.nio.fs.MacOSXFileSystem@782dc5

scala> fs.getPathMatcher("glob:/x/1.dat").matches(fs.getPath("/x/1.dat"))
res0: Boolean = true

scala> fs.getPathMatcher("glob:/x/*.dat").matches(fs.getPath("/x/1.dat"))
res1: Boolean = true
```
[GitHub] spark pull request #15376: [SPARK-17796][SQL] Support wildcard character in ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15376#discussion_r83712851

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---

```diff
@@ -246,7 +247,28 @@ case class LoadDataCommand(
     val loadPath = if (isLocal) {
       val uri = Utils.resolveURI(path)
-      if (!new File(uri.getPath()).exists()) {
+      val filePath = uri.getPath()
+      val exists = if (filePath.contains("*")) {
+        val fileSystem = FileSystems.getDefault
+        val pathPattern = fileSystem.getPath(filePath)
+        val dir = pathPattern.getParent.toString
+        val filePattern = pathPattern.getName(pathPattern.getNameCount - 1).toString
+        if (dir.contains("*")) {
+          throw new AnalysisException(
+            s"LOAD DATA input path allows only filename wildcard: $path")
+        }
+
+        val files = new File(dir).listFiles()
+        if (files == null) {
+          false
+        } else {
+          val matcher = fileSystem.getPathMatcher("glob:" + filePattern)
```

--- End diff --

Ah, I think I missed your point. I will update the code to use the absolute path here, too.
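For reference, the absolute-path matching behavior discussed in this thread can be reproduced with plain `java.nio.file` APIs, the same calls the patch uses. The sketch below is illustrative only; the `/x/...` paths are made-up examples.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;

public class GlobMatchDemo {
    public static void main(String[] args) {
        FileSystem fs = FileSystems.getDefault();

        // A glob pattern applied to the whole absolute path.
        PathMatcher matcher = fs.getPathMatcher("glob:/x/*.dat");

        // '*' matches "1" within a single path segment.
        System.out.println(matcher.matches(fs.getPath("/x/1.dat")));

        // '*' does not cross directory boundaries, so a file one level
        // deeper does not match the pattern.
        System.out.println(matcher.matches(fs.getPath("/x/y/1.dat")));
    }
}
```

On a Unix-like default file system this prints `true` then `false`, which is why matching against the absolute path (rather than only the last name component) is the safer choice.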
[GitHub] spark issue #15302: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15302

With today's master, it behaves like the following. Should we use an expression in `AlterTableDropPartitionCommand`?

```scala
org.apache.spark.sql.AnalysisException: cannot resolve '`country`' given input columns: []; line 1 pos 23;
'AlterTableDropPartitionCommand `sales`, [('country < KR)], false, false
```
[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15376

Retest this please.
[GitHub] spark pull request #15546: [SPARK-17892][SQL] SQLBuilder should wrap the gen...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/15546

[SPARK-17892][SQL] SQLBuilder should wrap the generated SQL with parenthesis for LIMIT

## What changes were proposed in this pull request?

Currently, `SQLBuilder` handles `LIMIT` by simply appending `LIMIT` at the end of the generated sub-SQL. This causes `RuntimeException`s like the following. This PR adds parentheses to prevent that.

**Before**
```scala
scala> sql("CREATE TABLE tbl(id INT)")
scala> sql("CREATE VIEW v1(id2) AS SELECT id FROM tbl LIMIT 2")
java.lang.RuntimeException: Failed to analyze the canonicalized SQL: ...
```

**After**
```scala
scala> sql("CREATE TABLE tbl(id INT)")
scala> sql("CREATE VIEW v1(id2) AS SELECT id FROM tbl LIMIT 2")
scala> sql("SELECT id2 FROM v1")
res4: org.apache.spark.sql.DataFrame = [id2: int]
```

## How was this patch tested?

Pass the Jenkins test with a newly added test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-17982

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15546.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15546

commit 85c0686a90395f3a7b56f22d78654e83e7ede7a6
Author: Dongjoon Hyun
Date: 2016-10-19T03:15:48Z

    [SPARK-17892][SQL] SQLBuilder should wrap the generated SQL with parenthesis for LIMIT
[GitHub] spark issue #15546: [SPARK-17982][SQL] SQLBuilder should wrap the generated ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15546

Oops. Thank you for fixing it, @gatorsmile !
[GitHub] spark issue #15546: [SPARK-17982][SQL] SQLBuilder should wrap the generated ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15546

Thank you for the review, @rxin . Here, `CREATE VIEW` is not supported by the SQL generation test suite:

CREATE VIEW **v1(id2)** AS SELECT id FROM tbl **LIMIT 2**

This case is a view with explicit column names.
[GitHub] spark issue #15546: [SPARK-17982][SQL] SQLBuilder should wrap the generated ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15546 Sorry, but what do you mean by `the output has ( ) no?`?
[GitHub] spark issue #15546: [SPARK-17982][SQL] SQLBuilder should wrap the generated ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15546 Ah, I see. `( )` means parentheses, literally. I'll replace the `CREATE VIEW` test case with one in the SQL generation suite. BTW, as of the last commit, the PR adds parentheses only when the child of `Limit` is a `Project`.
[GitHub] spark pull request #15546: [SPARK-17982][SQL] SQLBuilder should wrap the gen...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15546#discussion_r84031195 --- Diff: sql/hive/src/test/resources/sqlgen/limit.sql --- @@ -0,0 +1,4 @@ +-- This file is automatically generated by LogicalPlanToSQLSuite. +SELECT * FROM (SELECT id FROM tbl LIMIT 2) + --- End diff -- Without this PR, this test case fails because it generates the following query. Note the third line. ``` SELECT `gen_attr_0` AS `id` FROM (SELECT `gen_attr_0` FROM SELECT `gen_attr_0` FROM (SELECT `id` AS `gen_attr_0`, `name` AS `gen_attr_1` FROM `default`.`tbl`) AS gen_subquery_0 LIMIT 2) AS tbl ```
[GitHub] spark pull request #15546: [SPARK-17982][SQL] SQLBuilder should wrap the gen...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15546#discussion_r84031451 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala --- @@ -45,7 +45,7 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils { // Used for generating new query answer files by saving private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1" - private val goldenSQLPath = getTestResourcePath("sqlgen") + private val goldenSQLPath = "src/test/resources/sqlgen/" --- End diff -- This should be an absolute path. It seems to have been accidentally changed by the following commit. * https://github.com/apache/spark/commit/8a6bbe095b6a9aa33989c0deaa5ed0128d70320f#diff-d3bb909d073a402405e6089cd9ee17efR48
[GitHub] spark issue #15546: [SPARK-17982][SQL] SQLBuilder should wrap the generated ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15546 @rxin . I moved the test case and fixed a bug in `LogicalPlanToSQLSuite`.
[GitHub] spark pull request #15546: [SPARK-17982][SQL] SQLBuilder should wrap the gen...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15546#discussion_r84032070 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala --- @@ -45,7 +45,7 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils { // Used for generating new query answer files by saving private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1" - private val goldenSQLPath = getTestResourcePath("sqlgen") + private val goldenSQLPath = "src/test/resources/sqlgen/" --- End diff -- Up to now, `SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "hive/test-only *LogicalPlanToSQLSuite"` didn't update the golden files.
[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15376 Thank you for your review and approval, @srowen !
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/14116
[GitHub] spark issue #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHEMA
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14116 The issue might be tackled later, after Catalog Federation. For now, I'm closing this PR since it's too stale. Thank you all for spending time on this PR.
[GitHub] spark issue #15546: [SPARK-17982][SQL] SQLBuilder should wrap the generated ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15546 The currently running build has one test case failure, `when schema inference is turned on, should read partition data`. It seems unrelated.
[GitHub] spark issue #15546: [SPARK-17982][SQL] SQLBuilder should wrap the generated ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15546 Retest this please.
[GitHub] spark issue #15302: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15302 Hi, @hvanhovell . In DDL, do we have an example of using `Expression` like this?
[GitHub] spark issue #15382: [SPARK-17810] [SQL] Default spark.sql.warehouse.dir is r...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15382 Great. Is it going to 2.0.x, too?
[GitHub] spark pull request #15546: [SPARK-17982][SQL] SQLBuilder should wrap the gen...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15546#discussion_r84205834 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala --- @@ -138,6 +138,9 @@ class SQLBuilder private ( case g: Generate => generateToSQL(g) +case Limit(limitExpr, child @ Project(_, _)) => + s"(${toSQL(child)} LIMIT ${limitExpr.sql})" + case Limit(limitExpr, child) => --- End diff -- The previous code is designed to append the `LIMIT` string **without** parentheses to handle most cases, and Spark does not allow double parentheses. - ORDER BY: Limit(_, Sort) - GROUP BY: Limit(_, Aggr) ... `Project` is the only case observed in `CREATE VIEW` or `SELECT * FROM (SELECT ... LIMIT ..)`.
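The dispatch described in this comment can be sketched in miniature (an illustrative Python sketch with a made-up `limit_to_sql` helper; the real code is the Scala pattern match in `SQLBuilder` shown in the diff): wrap only when the child of `Limit` is a `Project`, because other children such as `Sort` (ORDER BY) and `Aggregate` (GROUP BY) keep the old unwrapped form and double parentheses are not allowed.

```python
# Hypothetical sketch of the rule: parenthesize the generated child SQL only
# when the child of Limit is a plain projection; other plan kinds keep the
# unwrapped "... LIMIT n" form.
def limit_to_sql(child_kind, child_sql, limit_expr):
    if child_kind == "Project":
        return f"({child_sql} LIMIT {limit_expr})"
    return f"{child_sql} LIMIT {limit_expr}"
```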
[GitHub] spark issue #15546: [SPARK-17982][SQL] SQLBuilder should wrap the generated ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15546 @rxin . I'll add more test cases for this and figure out the correct answer to your question. > is it always safe to just add limit to any plan?
[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15376 Thank you, @srowen !
[GitHub] spark pull request #15376: [SPARK-17796][SQL] Support wildcard character in ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15376#discussion_r84347126 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -246,7 +247,27 @@ case class LoadDataCommand( val loadPath = if (isLocal) { val uri = Utils.resolveURI(path) -if (!new File(uri.getPath()).exists()) { +val filePath = uri.getPath() +val exists = if (filePath.contains("*")) { + val fileSystem = FileSystems.getDefault + val pathPattern = fileSystem.getPath(filePath) + val dir = pathPattern.getParent.toString + if (dir.contains("*")) { --- End diff -- Thank you, @cloud-fan and @srowen . Yes. It's the intended behavior of this PR.
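The behavior being confirmed here can be sketched as follows (an illustrative Python sketch with a made-up `local_path_exists` helper; the actual implementation is the Scala diff above in `tables.scala`): a wildcard is allowed in the file-name part of a local `LOAD DATA` path and is expanded as a glob, while a wildcard in the directory part is rejected.

```python
import glob
import os

# Hypothetical sketch: existence check for a local path that may contain '*'.
def local_path_exists(path):
    if "*" in path:
        if "*" in os.path.dirname(path):
            return False  # wildcards in the directory part are not supported
        return len(glob.glob(path)) > 0  # expand the file-name glob
    return os.path.exists(path)
```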
[GitHub] spark pull request #15568: [SPARK-18028][SQL] simplify TableFileCatalog
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15568#discussion_r84359055 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -756,6 +756,20 @@ class SessionCatalog( } /** + * List the metadata of partitions that belong to the specified table, assuming it exists, that + * satisfy the given partition-pruning predicate expressions. + */ + def listPartitionsByFilter( + tableName: TableIdentifier, + predicates: Seq[Expression]): Seq[CatalogTablePartition] = { --- End diff -- Thank you for adding this, @cloud-fan ! I can use this in https://github.com/apache/spark/pull/15302 .
[GitHub] spark issue #17191: [SPARK-14471][SQL] Aliases in SELECT could be used in GR...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17191 @maropu . I think you had better ask the committers before closing this issue.
[GitHub] spark issue #17273: [MINOR][CORE] Fix a info message of `prunePartitions`
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17273 Thank you, @rxin and @srowen .
[GitHub] spark pull request #17251: [SPARK-19910][SQL] `stack` should not reject NULL...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17251#discussion_r106337141 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -590,6 +591,23 @@ object TypeCoercion { } /** + * Coerces NullTypes of a Stack function to the corresponding column types. + */ + object StackCoercion extends Rule[LogicalPlan] { +def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions { + case s @ Stack(children) if s.childrenResolved && s.children.head.dataType == IntegerType && --- End diff -- Yep.
[GitHub] spark pull request #17251: [SPARK-19910][SQL] `stack` should not reject NULL...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17251#discussion_r106337249 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -156,9 +156,21 @@ case class Stack(children: Seq[Expression]) extends Generator { } } + private def findDataType(column: Integer): DataType = { --- End diff -- Right.
[GitHub] spark pull request #17251: [SPARK-19910][SQL] `stack` should not reject NULL...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17251#discussion_r106338768 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -590,6 +591,23 @@ object TypeCoercion { } /** + * Coerces NullTypes of a Stack function to the corresponding column types. + */ + object StackCoercion extends Rule[LogicalPlan] { +def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions { + case s @ Stack(children) if s.childrenResolved && s.children.head.dataType == IntegerType && + s.children.head.foldable => +val schema = s.elementSchema +Stack(children.zipWithIndex.map { + case (e, 0) => e + case (Literal(null, NullType), index: Int) => +Literal.create(null, schema.fields((index - 1) % schema.length).dataType) --- End diff -- Yep.
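The index arithmetic in the `StackCoercion` rule above can be played back in miniature (an illustrative Python sketch with a made-up `coerce_stack_nulls` helper; the real rule is the Scala code in `TypeCoercion.scala`): the first child of `stack` is the row count, the remaining children are values laid out row-major, and an untyped NULL at position `index` picks up the data type of column `(index - 1) % numColumns`.

```python
# Hypothetical miniature of the coercion: None stands in for
# Literal(null, NullType); typed values are left as-is.
def coerce_stack_nulls(children, column_types):
    coerced = [children[0]]  # the row count, left untouched
    for index, value in enumerate(children[1:], start=1):
        if value is None:
            # Re-type the NULL to the column it lands in.
            coerced.append(("NULL", column_types[(index - 1) % len(column_types)]))
        else:
            coerced.append(value)
    return coerced
```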
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17251 Thank you, @cloud-fan ! I updated the PR according to the review comments.
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17251 Retest this please
[GitHub] spark pull request #17311: [SPARK-19970][SQL] Table owner should be USER ins...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/17311 [SPARK-19970][SQL] Table owner should be USER instead of PRINCIPAL in kerberized clusters ## What changes were proposed in this pull request? In a kerberized Hadoop cluster, when Spark creates tables, the owners of the tables are filled with PRINCIPAL strings instead of USER names. This is inconsistent with Hive and causes problems when using ROLE in Hive. We had better fix this. **BEFORE** ```scala scala> sql("create table t(a int)").show scala> sql("desc formatted t").show(false) ... |Owner: |sp...@example.com | | ``` **AFTER** ```scala scala> sql("create table t(a int)").show scala> sql("desc formatted t").show(false) ... |Owner: |spark | | ``` ## How was this patch tested? Manually ran `create table` and `desc formatted`, because this happens only in kerberized clusters. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-19970 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17311.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17311 commit 9dfbf828be5fc99811529ae988b3ffa8fa39fcb7 Author: Dongjoon Hyun Date: 2017-03-16T09:12:37Z [SPARK-19970][SQL] Table owner should be USER instead of PRINCIPAL in kerberized clusters
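The intent of the fix can be sketched as follows (an illustrative Python sketch with a made-up `short_user_name` helper; the actual patch does no string parsing — it asks Hive's `SessionState` authenticator for the user name): a Kerberos principal such as `alice@EXAMPLE.COM` or `alice/host1@EXAMPLE.COM` corresponds to the short user name `alice`, which is what should be stored as the table owner.

```python
# Hypothetical helper, for illustration only: reduce a Kerberos principal to
# its short user name by dropping the instance (after '/') and realm ('@...').
def short_user_name(principal):
    return principal.split("@", 1)[0].split("/", 1)[0]
```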
[GitHub] spark issue #17311: [SPARK-19970][SQL] Table owner should be USER instead of...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17311 Hi, @vanzin . Could you review this PR?
[GitHub] spark issue #17311: [SPARK-19970][SQL] Table owner should be USER instead of...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17311 Oh, there is a version issue. Let me fix that.
[GitHub] spark pull request #17311: [SPARK-19970][SQL] Table owner should be USER ins...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17311#discussion_r106385533 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -851,7 +851,7 @@ private[hive] object HiveClientImpl { hiveTable.setFields(schema.asJava) } hiveTable.setPartCols(partCols.asJava) -conf.foreach(c => hiveTable.setOwner(c.getUser)) +conf.foreach(c => hiveTable.setOwner(SessionState.getUserFromAuthenticator())) --- End diff -- `getUserFromAuthenticator` was added in 0.13.
[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17335 Thank you, @jerryshao . I'll test this.
[GitHub] spark pull request #17311: [SPARK-19970][SQL] Table owner should be USER ins...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17311#discussion_r106685501 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -851,7 +851,7 @@ private[hive] object HiveClientImpl { hiveTable.setFields(schema.asJava) } hiveTable.setPartCols(partCols.asJava) -conf.foreach(c => hiveTable.setOwner(c.getUser)) +conf.foreach(c => hiveTable.setOwner(SessionState.get().getAuthenticator().getUserName())) --- End diff -- We cannot use the following since it was introduced in 0.13. ``` conf.foreach(c => hiveTable.setOwner(SessionState.getUserFromAuthenticator())) ```
[GitHub] spark issue #17311: [SPARK-19970][SQL] Table owner should be USER instead of...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17311 Hi, @gatorsmile . Could you review this issue?
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17251 Hi, @cloud-fan . Could you review the `stack` PR? If there is anything to do, please let me know.
[GitHub] spark pull request #17311: [SPARK-19970][SQL] Table owner should be USER ins...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17311#discussion_r106699621 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -851,7 +851,7 @@ private[hive] object HiveClientImpl { hiveTable.setFields(schema.asJava) } hiveTable.setPartCols(partCols.asJava) -conf.foreach(c => hiveTable.setOwner(c.getUser)) +conf.foreach(c => hiveTable.setOwner(SessionState.get().getAuthenticator().getUserName())) --- End diff -- Thank you for the review, @vanzin ! Which `object` do you mean for `.foreach` here? For `conf`, we already use it. For `SessionState.get()`, it's not an `Option`.
[GitHub] spark pull request #17311: [SPARK-19970][SQL] Table owner should be USER ins...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17311#discussion_r106700592 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -851,7 +851,7 @@ private[hive] object HiveClientImpl { hiveTable.setFields(schema.asJava) } hiveTable.setPartCols(partCols.asJava) -conf.foreach(c => hiveTable.setOwner(c.getUser)) +conf.foreach(c => hiveTable.setOwner(SessionState.get().getAuthenticator().getUserName())) --- End diff -- Oh, I see. It's about removing `.foreach`.
[GitHub] spark pull request #17311: [SPARK-19970][SQL] Table owner should be USER ins...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17311#discussion_r106701151 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -851,7 +851,7 @@ private[hive] object HiveClientImpl { hiveTable.setFields(schema.asJava) } hiveTable.setPartCols(partCols.asJava) -conf.foreach(c => hiveTable.setOwner(c.getUser)) +conf.foreach(c => hiveTable.setOwner(SessionState.get().getAuthenticator().getUserName())) --- End diff -- Ah. Sorry again.
[GitHub] spark issue #17311: [SPARK-19970][SQL] Table owner should be USER instead of...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17311 I fixed that, @vanzin . Thank you again.
[GitHub] spark pull request #17311: [SPARK-19970][SQL] Table owner should be USER ins...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17311#discussion_r106704898 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -851,7 +851,7 @@ private[hive] object HiveClientImpl { hiveTable.setFields(schema.asJava) } hiveTable.setPartCols(partCols.asJava) -conf.foreach(c => hiveTable.setOwner(c.getUser)) +conf.foreach { _ => hiveTable.setOwner(SessionState.get().getAuthenticator().getUserName()) } --- End diff -- Oh, I think we can remove `conf` here.
[GitHub] spark pull request #17311: [SPARK-19970][SQL] Table owner should be USER ins...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17311#discussion_r106706282 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -851,7 +851,7 @@ private[hive] object HiveClientImpl { hiveTable.setFields(schema.asJava) } hiveTable.setPartCols(partCols.asJava) -conf.foreach(c => hiveTable.setOwner(c.getUser)) +conf.foreach { _ => hiveTable.setOwner(SessionState.get().getAuthenticator().getUserName()) } --- End diff -- Yep!
[GitHub] spark pull request #17311: [SPARK-19970][SQL] Table owner should be USER ins...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17311#discussion_r106741052 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -851,7 +851,7 @@ private[hive] object HiveClientImpl { hiveTable.setFields(schema.asJava) } hiveTable.setPartCols(partCols.asJava) -conf.foreach(c => hiveTable.setOwner(c.getUser)) +conf.foreach { _ => hiveTable.setOwner(SessionState.get().getAuthenticator().getUserName()) } --- End diff -- @vanzin . Some test cases seem to be related to `conf` being `None`, which causes failures. In this PR, I'll focus only on putting the correct name in place, under exactly the same conditions as the existing logic.
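The point of keeping the `conf.foreach` guard can be sketched in plain Scala (the stub type and `currentUserName` below are hypothetical stand-ins, not Spark or Hive APIs): when `conf` is `None`, as in the failing test setups, the owner is simply left unset instead of the call running at all.

```scala
// Hypothetical stand-in for Hive's Table; only the owner field matters here.
final case class HiveTableStub(var owner: String = null)

object OwnerSketch {
  // Hypothetical stand-in for SessionState.get().getAuthenticator().getUserName()
  def currentUserName(): String = "spark"

  def setOwner(conf: Option[AnyRef], table: HiveTableStub): Unit =
    // `_ =>` because the conf value itself is no longer used -- only its
    // presence gates whether we set the owner, preserving the old behavior
    // for the conf-is-None test environments.
    conf.foreach { _ => table.owner = currentUserName() }
}
```

With `None` the table's owner stays `null`; with any `Some(...)` it is set from the session's authenticator, which is the behavior the PR settles on.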
[GitHub] spark issue #17311: [SPARK-19970][SQL] Table owner should be USER instead of...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17311 If there is anything more to do, please let me know.
[GitHub] spark pull request #17182: [SPARK-19840][SQL] Disallow creating permanent fu...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/17182
[GitHub] spark issue #17182: [SPARK-19840][SQL] Disallow creating permanent functions...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17182 Thank you for the review, @holdenk and @gatorsmile . Per the review comments, I'll close this PR and the JIRA issue as WON'T FIX for now.
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17251 Hi, @cloud-fan . Is it possible that Spark 2.1.1 includes this fix?
[GitHub] spark issue #17311: [SPARK-19970][SQL] Table owner should be USER instead of...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17311 Hi, @vanzin and @srowen . Could you review this PR (again) when you have some time? I feel guilty because this PR needs to be verified manually on kerberized clusters.
[GitHub] spark issue #17182: [SPARK-19840][SQL] Disallow creating permanent functions...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17182 Thank you, @gatorsmile !
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17251 Retest this please
[GitHub] spark issue #17311: [SPARK-19970][SQL] Table owner should be USER instead of...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17311 Retest this please
[GitHub] spark pull request #17311: [SPARK-19970][SQL] Table owner should be USER ins...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17311#discussion_r106956674 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -851,7 +851,7 @@ private[hive] object HiveClientImpl { hiveTable.setFields(schema.asJava) } hiveTable.setPartCols(partCols.asJava) -conf.foreach(c => hiveTable.setOwner(c.getUser)) +conf.foreach { _ => hiveTable.setOwner(SessionState.get().getAuthenticator().getUserName()) } --- End diff -- `state` is in `class HiveClientImpl`, but this function is in `object HiveClientImpl`.
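The class-versus-companion-object scoping the comment refers to can be illustrated with plain Scala (the names below are hypothetical, not the actual `HiveClientImpl` code): an instance field like `state` is simply not in scope inside companion-object methods.

```scala
// Each Client instance carries its own `state`, analogous to the `state`
// member of class HiveClientImpl.
class Client(val state: String)

object Client {
  // A companion-object method has no instance in scope: a bare reference to
  // `state` would not compile here. It must either receive an instance
  // explicitly, or use some global accessor -- the role SessionState.get()
  // plays in the master-branch version of the fix.
  def describe(c: Client): String = s"state=${c.state}"
}
```

This is why the branch-2.1 backport, where the conversion lives in the class, can call `state` directly, while the master-branch `object HiveClientImpl` version cannot.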
[GitHub] spark issue #17311: [SPARK-19970][SQL] Table owner should be USER instead of...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17311 Thank you so much, @vanzin !
[GitHub] spark issue #17311: [SPARK-19970][SQL] Table owner should be USER instead of...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17311 Oh, I'll make a backport then.
[GitHub] spark issue #17311: [SPARK-19970][SQL] Table owner should be USER instead of...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17311 The difference is due to https://github.com/apache/spark/commit/3881f342b49efdb1e0d5ee27f616451ea1928c5d#diff-6fd847124f8eae45ba2de1cf7d6296feR855 .
[GitHub] spark pull request #17363: [SPARK-19970][SQL][BRANCH-2.1] Table owner should...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/17363 [SPARK-19970][SQL][BRANCH-2.1] Table owner should be USER instead of PRINCIPAL in kerberized clusters

## What changes were proposed in this pull request?

In a kerberized Hadoop cluster, when Spark creates tables, the table owners are filled with PRINCIPAL strings instead of USER names. This is inconsistent with Hive and causes problems when using [ROLE](https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization) in Hive. We had better fix this.

**BEFORE**
```scala
scala> sql("create table t(a int)").show
scala> sql("desc formatted t").show(false)
...
|Owner: |sp...@example.com | |
```

**AFTER**
```scala
scala> sql("create table t(a int)").show
scala> sql("desc formatted t").show(false)
...
|Owner: |spark | |
```

## How was this patch tested?

Manually ran `create table` and `desc formatted`, because this only happens in kerberized clusters.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-19970-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17363.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17363
[GitHub] spark pull request #17363: [SPARK-19970][SQL][BRANCH-2.1] Table owner should...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17363#discussion_r106973465 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -828,7 +828,7 @@ private[hive] class HiveClientImpl( hiveTable.setFields(schema.asJava) } hiveTable.setPartCols(partCols.asJava) -hiveTable.setOwner(conf.getUser) +hiveTable.setOwner(state.getAuthenticator().getUserName()) --- End diff -- @vanzin . I made a backport for branch-2.1 and tested it in a kerberized cluster. This one uses `state` as you advised.
[GitHub] spark issue #17363: [SPARK-19970][SQL][BRANCH-2.1] Table owner should be USE...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17363 This was also tested manually in a kerberized cluster, @vanzin . BTW, Spark 1.6 has the same issue at [ClientWrapper](https://github.com/apache/spark/blob/branch-1.6/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala#L379). According to the email, there is still demand for more Apache Spark 1.6.x releases. May I create a backport for branch-1.6? What do you think?
[GitHub] spark pull request #17366: [SPARK-19970][SQL][BRANCH-1.6] Table owner should...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/17366 [SPARK-19970][SQL][BRANCH-1.6] Table owner should be USER instead of PRINCIPAL in kerberized clusters

## What changes were proposed in this pull request?

In a kerberized Hadoop cluster, when Spark creates tables, the table owners are filled with PRINCIPAL strings instead of USER names. This is inconsistent with Hive and causes problems when using [ROLE](https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization) in Hive. We had better fix this. For Apache Spark 1.6, it happens only with the `CREATE TABLE ... AS SELECT` (CTAS) statement.

**BEFORE**
```scala
scala> sql("create table t_ctas as select 1").show
scala> sql("desc formatted t_ctas").show(false)
...
|Owner: |sp...@example.com | |
```

**AFTER**
```scala
scala> sql("create table t_ctas as select 1").show
scala> sql("desc formatted t_ctas").show(false)
...
|Owner: |spark | |
```

## How was this patch tested?

Manually ran `create table` and `desc formatted`, because this only happens in kerberized clusters.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-19970-BRANCH-1.6 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17366.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17366
[GitHub] spark pull request #17354: [SPARK-20024] [SQL] [test-maven] SessionCatalog A...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17354#discussion_r107032174 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -226,6 +226,7 @@ class SessionCatalog( s"${globalTempViewManager.database}.viewName.") } requireDbExists(dbName) +externalCatalog.setCurrentDatabase(dbName) --- End diff -- Hi, @gatorsmile . Apache Spark [2.1.1](https://github.com/apache/spark/blob/branch-2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L211-L213) should have this fix, right?
[GitHub] spark issue #17266: [SPARK-19912][SQL] String literals should be escaped for...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17266 Oh, thank you, @cloud-fan !
[GitHub] spark issue #17363: [SPARK-19970][SQL][BRANCH-2.1] Table owner should be USE...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17363 Hi, @vanzin . Could you review this backport when you have some time?
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17251 Retest this please
[GitHub] spark issue #17363: [SPARK-19970][SQL][BRANCH-2.1] Table owner should be USE...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17363 Retest this please
[GitHub] spark issue #17363: [SPARK-19970][SQL][BRANCH-2.1] Table owner should be USE...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17363 Oo. Thank you so much, @vanzin !
[GitHub] spark pull request #17363: [SPARK-19970][SQL][BRANCH-2.1] Table owner should...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/17363
[GitHub] spark issue #17366: [SPARK-19970][SQL][BRANCH-1.6] Table owner should be USE...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17366 I'm closing this for branch-1.6 according to the [advice](https://github.com/apache/spark/pull/17363#issuecomment-288872570). Thank you!
[GitHub] spark pull request #17366: [SPARK-19970][SQL][BRANCH-1.6] Table owner should...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/17366
[GitHub] spark pull request #17271: [SPARK-19912][SQL][NOT FOR MERGE] String literals...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/17271