[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/22356 Thanks for taking my code. Looks good. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21638: [SPARK-22357][CORE] SparkContext.binaryFiles ignore minP...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/21638 Here is the test code; I am not sure whether it is right:

```
test("Number of partitions") {
  sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local")
    .set("spark.files.maxPartitionBytes", "10")
    .set("spark.files.openCostInBytes", "0")
    .set("spark.default.parallelism", "1"))
  val dir1 = Utils.createTempDir()
  val dirpath1 = dir1.getAbsolutePath
  val file1 = new File(dir1, "part-0")
  val file2 = new File(dir1, "part-1")
  Files.write("someline1 in file1\nsomeline2 in file1\nsomeline3 in file1",
    file1, StandardCharsets.UTF_8)
  Files.write("someline1 in file2\nsomeline2 in file2\nsomeline3 in file2",
    file2, StandardCharsets.UTF_8)
  assert(sc.binaryFiles(dirpath1, minPartitions = 1).getNumPartitions == 2)
  assert(sc.binaryFiles(dirpath1, minPartitions = 2).getNumPartitions == 2)
  assert(sc.binaryFiles(dirpath1, minPartitions = 50).getNumPartitions == 2)
}
```
[GitHub] spark pull request #21638: [SPARK-22357][CORE] SparkContext.binaryFiles igno...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/21638#discussion_r215022562

--- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---

```
@@ -47,7 +47,7 @@ private[spark] abstract class StreamFileInputFormat[T]
   def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
     val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
     val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
-    val defaultParallelism = sc.defaultParallelism
+    val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)
```

--- End diff --

From the code, you can see the calculation is just an intermediate result, and this method does not return a value. Checking the split size does not make sense for this test case, because it depends on multiple variables and this is just one of them.
[GitHub] spark pull request #21638: [SPARK-22357][CORE] SparkContext.binaryFiles igno...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/21638#discussion_r215010040

--- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---

```
@@ -47,7 +47,7 @@ private[spark] abstract class StreamFileInputFormat[T]
   def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
     val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
     val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
-    val defaultParallelism = sc.defaultParallelism
+    val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)
```

--- End diff --

I agree it is hard to test. I would appreciate it if anyone could give me some hints on how to do this (how to verify it and where to put my test cases).
[GitHub] spark pull request #22276: [SPARK-25242][SQL] make sql config setting fluent
Github user bomeng closed the pull request at: https://github.com/apache/spark/pull/22276
[GitHub] spark issue #22276: [SPARK-25242][SQL] make sql config setting fluent
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/22276 Ok, closing
[GitHub] spark issue #22276: [SPARK-25242][SQL] make sql config setting fluent
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/22276 The tests failed due to the method signature changes, but they should not affect the existing test cases or existing usages.
[GitHub] spark pull request #22276: [SPARK-25242][SQL] make sql config setting fluent
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/22276 [SPARK-25242][SQL] make sql config setting fluent

## What changes were proposed in this pull request?

Users can now set configs more fluently:

```
sparkSession.conf.set(...).set(...).unset(...)
```

## How was this patch tested?

More tests for this usage were added to the existing test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bomeng/spark 25242

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22276.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22276

commit 45f438c650ae44662341f656378106bc31667f4d Author: Bo Meng Date: 2018-08-29T23:10:09Z SPARK-25242: make sql config setting fluent
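The fluent style works when each setter returns the config object itself. A minimal Python sketch of the pattern (the class and method names here are illustrative, not Spark's actual API):

```python
class RuntimeConfig:
    """Toy stand-in for a session config that supports fluent chaining."""

    def __init__(self):
        self._entries = {}

    def set(self, key, value):
        self._entries[key] = value
        return self  # returning self is what enables .set(...).set(...).unset(...)

    def unset(self, key):
        self._entries.pop(key, None)
        return self

    def get(self, key, default=None):
        return self._entries.get(key, default)


# chained calls operate on the same object
conf = RuntimeConfig().set("a", "1").set("b", "2").unset("a")
```

The design cost is that the setters no longer return Unit/None, which is the kind of signature change the test failures above refer to.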
[GitHub] spark pull request #22127: [SPARK-25032][SQL] fix drop database issue
Github user bomeng closed the pull request at: https://github.com/apache/spark/pull/22127
[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/22127 Good points. I will leave it open for any suggestions for improving the user experience.
[GitHub] spark pull request #22127: [SPARK-25032][SQL] fix drop database issue
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/22127 [SPARK-25032][SQL] fix drop database issue

## What changes were proposed in this pull request?

When a user drops the current database (other than the default database), after the database is deleted we should set the current database back to default.

## How was this patch tested?

A new test case is added to cover this scenario.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bomeng/spark 25032

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22127.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22127

commit 825533682c98598409e537fa866dcdab915e3948 Author: Bo Meng Date: 2018-08-16T21:58:17Z fix drop database issue
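The intended behavior can be modeled with a toy catalog: dropping the current database resets the current database to default. A simplified Python sketch, not Spark's actual SessionCatalog:

```python
class Catalog:
    """Minimal catalog model: tracks databases and the current database."""

    DEFAULT_DB = "default"

    def __init__(self):
        self.databases = {self.DEFAULT_DB}
        self.current_db = self.DEFAULT_DB

    def create_database(self, name):
        self.databases.add(name)

    def set_current_database(self, name):
        if name not in self.databases:
            raise ValueError(f"no such database: {name}")
        self.current_db = name

    def drop_database(self, name):
        if name == self.DEFAULT_DB:
            raise ValueError("cannot drop the default database")
        self.databases.discard(name)
        # the fix: fall back to default instead of leaving a dangling current db
        if self.current_db == name:
            self.current_db = self.DEFAULT_DB


cat = Catalog()
cat.create_database("mydb")
cat.set_current_database("mydb")
cat.drop_database("mydb")
```

Without the fallback in drop_database, subsequent statements would resolve tables against a database that no longer exists.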
[GitHub] spark issue #22115: [SPARK-25082] [SQL] improve the javadoc for expm1()
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/22115 I have already done a global search. That is the only place that needs the change.
[GitHub] spark pull request #22115: [SPARK-25082] [SQL] improve the javadoc for expm1...
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/22115 [SPARK-25082] [SQL] improve the javadoc for expm1()

## What changes were proposed in this pull request?

Correct the javadoc for the expm1() function.

## How was this patch tested?

None. It is a minor issue.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bomeng/spark 25082

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22115.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22115

commit 089c31fcff1a5b84634f5de78c1bd440f738b2f4 Author: Bo Meng Date: 2018-08-16T00:09:32Z improve the javadoc
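For reference, expm1(x) computes e^x - 1, and its reason for existing is better accuracy than computing exp(x) - 1 directly for x near zero; Python's standard library exposes the same function:

```python
import math

x = 1e-10
naive = math.exp(x) - 1.0   # subtracts two nearly equal numbers: precision loss
accurate = math.expm1(x)    # computes e**x - 1 without the cancellation

# For small x, expm1(x) is approximately x + x**2/2, so the result stays close to x.
```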
[GitHub] spark pull request #21638: [SPARK-22357][CORE] SparkContext.binaryFiles igno...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/21638#discussion_r204517923

--- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---

```
@@ -47,7 +47,7 @@ private[spark] abstract class StreamFileInputFormat[T]
   def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
     val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
     val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
-    val defaultParallelism = sc.defaultParallelism
+    val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)
```

--- End diff --

BinaryFileRDD will set minPartitions, which will be either defaultMinPartitions or the value you set via the binaryFiles(path, minPartitions) method. Eventually, this minPartitions value is passed to the setMinPartitions() method.
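The surrounding computation in setMinPartitions derives a maximum split size from these values. A Python sketch of that arithmetic (an approximation of the Scala logic with a hypothetical function name, not Spark's API):

```python
def max_split_size(total_bytes, default_max_split_bytes, open_cost_in_bytes,
                   default_parallelism, min_partitions):
    # the fix under discussion: let minPartitions raise the effective parallelism
    parallelism = max(default_parallelism, min_partitions)
    bytes_per_core = total_bytes // parallelism
    # clamp: never exceed the configured max, never go below the per-file open cost
    return min(default_max_split_bytes, max(open_cost_in_bytes, bytes_per_core))
```

A higher min_partitions shrinks bytes_per_core and therefore the split size, which is what produces more partitions downstream.

```python
max_split_size(1000, 128, 4, 2, 50)  # parallelism 50 -> 20-byte splits
max_split_size(1000, 128, 4, 2, 1)   # parallelism 2 -> capped at 128
```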
[GitHub] spark issue #21638: [SPARK-22357][CORE] SparkContext.binaryFiles ignore minP...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/21638 Either way works for me, but since this is not a private method, people may use it in their own way. The minimal change would be best.
[GitHub] spark pull request #21638: [SPARK-22357][CORE] SparkContext.binaryFiles igno...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/21638#discussion_r202907829

--- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---

```
@@ -47,7 +47,7 @@ private[spark] abstract class StreamFileInputFormat[T]
   def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
     val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
     val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
-    val defaultParallelism = sc.defaultParallelism
+    val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)
```

--- End diff --

You need to pass in minPartitions to use this method; what do you mean by minPartitions not being set?
[GitHub] spark issue #21638: [SPARK-22357][CORE] SparkContext.binaryFiles ignore minP...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/21638 @HyukjinKwon please review. thanks.
[GitHub] spark pull request #21638: [SPARK-22357][CORE] SparkContext.binaryFiles igno...
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/21638 [SPARK-22357][CORE] SparkContext.binaryFiles ignore minPartitions parameter

## What changes were proposed in this pull request?

Fix the issue that minPartitions was not used in the method.

## How was this patch tested?

I have not provided an additional test since the fix is very straightforward.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bomeng/spark 22357

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21638.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21638

commit b9eea4994c3ad151aa75ed03bbcf807bc3c4ded8 Author: Bo Meng Date: 2018-06-25T20:02:43Z fix: SparkContext.binaryFiles ignore minPartitions parameter

commit 0fc35d4e0db34239cd3c52b0cf21445c59d2dede Author: Bo Meng Date: 2018-06-25T20:04:58Z should be max()
[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/19614 I will fix the style shortly.
[GitHub] spark pull request #19614: update the location of reference paper
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/19614 update the location of reference paper

## What changes were proposed in this pull request?

Update the URL of the reference paper.

## How was this patch tested?

It is a comment-only change, so nothing to test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bomeng/spark 22399

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19614.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19614

commit ddc97efed418698b81cce70e8cd0498e46dbcd88 Author: bomeng <bm...@us.ibm.com> Date: 2017-10-30T22:31:05Z update the location of reference paper
[GitHub] spark pull request #17470: [SPARK-20146][SQL] fix comment missing issue for ...
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/17470 [SPARK-20146][SQL] fix comment missing issue for thrift server

## What changes were proposed in this pull request?

The column comment was missing while constructing the Hive TableSchema. This fix preserves the original comment.

## How was this patch tested?

I have added a new test case to test columns with and without comments.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bomeng/spark SPARK-20146

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17470.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17470

commit 69f2172e0c2e422aa88c1365c68786ab8abf1113 Author: bomeng <bm...@us.ibm.com> Date: 2017-03-29T18:28:41Z fix comment missing issue for thrift server

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark issue #13720: [SPARK-16004] [SQL] Correctly display "Last Access Time"...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13720 @cloud-fan please review again, thanks.
[GitHub] spark issue #13720: [SPARK-16004] [SQL] improve the display of CatalogTable ...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13720 OK, I will work on it based on the comments. Thanks.
[GitHub] spark issue #13720: [SPARK-16004] [SQL] improve the display of CatalogTable ...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13720 @cloud-fan Is this one worth fixing?
[GitHub] spark issue #12739: [SPARK-14955] [SQL] avoid stride value equals to zero
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/12739 Closing this PR, since it was fixed by another PR.
[GitHub] spark pull request #12739: [SPARK-14955] [SQL] avoid stride value equals to ...
Github user bomeng closed the pull request at: https://github.com/apache/spark/pull/12739
[GitHub] spark issue #13140: [SPARK-15230] [SQL] distinct() does not handle column na...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13140 I do not know what happened to Jenkins; the failure looks unrelated.
[GitHub] spark issue #13140: [SPARK-15230] [SQL] distinct() does not handle column na...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13140 @cloud-fan thanks for your concise code!
[GitHub] spark pull request #13720: [SPARK-16004] [SQL] improve the display of Catalo...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/13720#discussion_r67958472

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---

```
@@ -127,7 +127,7 @@ case class CatalogTable(
     sortColumnNames: Seq[String] = Seq.empty,
     bucketColumnNames: Seq[String] = Seq.empty,
     numBuckets: Int = -1,
-    owner: String = "",
+    owner: String = System.getProperty("user.name"),
```

--- End diff --

The user name would be complicated. It is set from the current SessionState authenticator, and I do not believe we have reached that yet. I will revert this part.
[GitHub] spark pull request #13720: [SPARK-16004] [SQL] improve the display of Catalo...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/13720#discussion_r67957269

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---

```
@@ -180,7 +180,8 @@ case class CatalogTable(
     Seq(s"Table: ${identifier.quotedString}",
       if (owner.nonEmpty) s"Owner: $owner" else "",
       s"Created: ${new Date(createTime).toString}",
-      s"Last Access: ${new Date(lastAccessTime).toString}",
+      "Last Access: " +
+        (if (lastAccessTime == -1) "UNKNOWN" else new Date(lastAccessTime).toString),
```

--- End diff --

Here is the code from Hive (it uses 0 as the initial last-access value):
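The display fix amounts to treating the sentinel initial value as "never recorded". A hypothetical Python helper sketching the idea (per the discussion, Spark initializes lastAccessTime to -1 while Hive uses 0; this helper and its name are illustrative, not Spark code):

```python
import datetime

def format_last_access(last_access_ms):
    # both sentinels (-1 in Spark, 0 in Hive) mean the time was never recorded,
    # so render "UNKNOWN" instead of a bogus 1969/1970 epoch date
    if last_access_ms <= 0:
        return "UNKNOWN"
    return str(datetime.datetime.fromtimestamp(last_access_ms / 1000))
```

Without the sentinel check, -1 milliseconds formats as a date just before the Unix epoch, which is exactly the "Wed Dec 31 15:59:59 PST 1969" artifact reported in the PR.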
[GitHub] spark pull request #13720: [SPARK-16004] [SQL] improve the display of Catalo...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/13720#discussion_r67812926

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---

```
@@ -127,7 +127,7 @@ case class CatalogTable(
     sortColumnNames: Seq[String] = Seq.empty,
     bucketColumnNames: Seq[String] = Seq.empty,
     numBuckets: Int = -1,
-    owner: String = "",
+    owner: String = System.getProperty("user.name"),
```

--- End diff --

Let me check what Hive does tomorrow and get back to you.
[GitHub] spark pull request #13791: [SPARK-16084] [SQL] Minor Javadoc update for "DES...
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/13791 [SPARK-16084] [SQL] Minor Javadoc update for "DESCRIBE" table

## What changes were proposed in this pull request?

1. FORMATTED is actually supported, but partitions are not;
2. Remove the parentheses, as they are not necessary, just like anywhere else.

## How was this patch tested?

Minor issue; I do not think it needs a test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bomeng/spark SPARK-16084

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13791.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13791

commit 3638ffd0dbb93feb58c96b3163c52220aacf3981 Author: bomeng <bm...@us.ibm.com> Date: 2016-06-20T22:40:50Z minor comments fix

commit 5db284dab1aaced5f86cc6bed3e23e42e2c79b74 Author: bomeng <bm...@us.ibm.com> Date: 2016-06-20T22:43:53Z Revert "minor comments fix" This reverts commit 3638ffd0dbb93feb58c96b3163c52220aacf3981.

commit e1a5f5421f92dc3ef5d39a189bdd0017b7633662 Author: bomeng <bm...@us.ibm.com> Date: 2016-06-20T22:45:55Z fix java doc issue
[GitHub] spark issue #13720: [SPARK-16004] [SQL] improve the display of CatalogTable ...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13720 @srowen please review. thanks!
[GitHub] spark pull request #13720: [SPARK-16004] [SQL] improve the display of Catalog...
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/13720 [SPARK-16004] [SQL] improve the display of CatalogTable information

## What changes were proposed in this pull request?

A few issues found when running the "describe extended | formatted [tableName]" command:

1. The last access time is incorrectly displayed as something like "Last Access Time: Wed Dec 31 15:59:59 PST 1969"; I think we should display "UNKNOWN" as Hive does;
2. Owner is always empty, instead of the current login user who created the table;
3. The comment field displays "null" instead of an empty string when the comment is None.

## How was this patch tested?

Currently I have manually tested them. They are very straightforward to test manually, but hard to write test cases for.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bomeng/spark SPARK-16004

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13720.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13720

commit 358ac0d2e9b27bcf7c3d0448555497b60fc20dd5 Author: bomeng <bm...@us.ibm.com> Date: 2016-06-16T23:21:39Z improve the display of CatalogTable information
[GitHub] spark issue #12739: [SPARK-14955] [SQL] avoid stride value equals to zero
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/12739 @andrewor14 Hey Andrew, could you please review this one as well?
[GitHub] spark issue #13695: [SPARK-15978] [SQL] improve 'show tables' command relate...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13695 Thanks for merging!
[GitHub] spark issue #13695: [SPARK-15978] [SQL] improve 'show tables' command relate...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13695 @rxin could you please review it again?
[GitHub] spark pull request #13695: [SPARK-15978] [SQL] remove unnecessary format
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/13695 [SPARK-15978] [SQL] remove unnecessary format

## What changes were proposed in this pull request?

I've found some minor issues in the "show tables" command:

1. In `SessionCatalog.scala`, the `listTables(db: String)` method calls `listTables(formatDatabaseName(db), "*")` to list all the tables for a certain db, but in the method `listTables(db: String, pattern: String)` this db name is formatted once more. So I think we should remove `formatDatabaseName()` in the caller.
2. I suggest adding a sort to `listTables(db: String)` in `InMemoryCatalog.scala`, just like `listDatabases()`.

## How was this patch tested?

The existing test cases should cover it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bomeng/spark SPARK-15978

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13695.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13695

commit 2bbc919105e20a9c766f156e80fad18052395215 Author: bomeng <bm...@us.ibm.com> Date: 2016-06-15T23:04:06Z remove unnecessary format
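The double-formatting is harmless only because the normalization is idempotent, but it should still happen exactly once, in the callee. A Python sketch of the pattern (the names and lower-casing rule here are illustrative, not Spark's actual implementation):

```python
import fnmatch

def format_database_name(db):
    # normalization (lower-casing) is idempotent: applying it twice is redundant
    return db.lower()

def list_tables(catalog, db, pattern="*"):
    # normalize exactly once, here, instead of in every caller
    db = format_database_name(db)
    # return a sorted list, mirroring the suggested change to listTables
    return sorted(t for t in catalog[db] if fnmatch.fnmatch(t, pattern))


catalog = {"mydb": ["t2", "t1", "other"]}
```

Callers can then pass the raw name (e.g. "MyDB") without pre-formatting it.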
[GitHub] spark issue #13671: [SPARK-15952] [SQL] fix "show databases" ordering issue
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13671 thanks for merging!
[GitHub] spark issue #13671: [SPARK-15952] [SQL] fix "show databases" ordering issue
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13671 For issue 1, I have updated the existing test case to cover this (the original one only tested the count of the result). For issue 2, it is minor and just a text change.
[GitHub] spark pull request #13671: [SPARK-15952] [SQL] fix "show databases" ordering...
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/13671

[SPARK-15952] [SQL] fix "show databases" ordering issue

## What changes were proposed in this pull request?

Two issues I've found with the "show databases" command:

1. The returned database name list is not sorted; it only comes back sorted when "like" is used together with it. (Hive always returns a sorted list.)
2. When used as `sql("show databases").show`, it outputs a table with a column named "result", but `sql("show tables").show` outputs the column name as "tableName". I think we should be consistent and use "databaseName" at least.

## How was this patch tested?

Updated the existing test case to test the ordering as well.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-15952

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13671.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13671

commit d6b0f860352cf9e4a71e746c7f9bd035e9e243e5
Author: bomeng <bm...@us.ibm.com>
Date: 2016-06-14T21:29:57Z

fix the ordering issue
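The first issue amounts to sorting the name list before returning it, whatever the pattern. A minimal sketch, assuming a simplified catalog pattern where `*` is the only wildcard (Spark's real filtering lives elsewhere in the catalog code; this is an illustration, not that code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ShowDatabasesSketch {
    // Filter database names against a simplified '*'-wildcard pattern,
    // then sort the result so SHOW DATABASES is always ordered,
    // with or without a LIKE clause.
    static List<String> listDatabases(List<String> all, String pattern) {
        String regex = pattern.replace("*", ".*"); // '*' -> regex wildcard
        return all.stream()
                .filter(db -> db.matches(regex))
                .sorted()                          // the fix: always sort
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> dbs = Arrays.asList("default", "db2", "db1");
        System.out.println(listDatabases(dbs, "*"));   // [db1, db2, default]
        System.out.println(listDatabases(dbs, "db*")); // [db1, db2]
    }
}
```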
[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13543 @srowen Thanks for merging.
[GitHub] spark issue #13533: [SPARK-15781] [Documentation] remove deprecated environm...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13533 @srowen Thanks for merging.
[GitHub] spark pull request #13543: [SPARK-15806] [Documentation] update doc for SPAR...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/13543#discussion_r66701686

```
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala ---
@@ -20,18 +20,24 @@ package org.apache.spark.deploy.master

 import scala.annotation.tailrec

 import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
 import org.apache.spark.util.{IntParam, Utils}

 /**
  * Command-line parser for the master.
  */
-private[master] class MasterArguments(args: Array[String], conf: SparkConf) {
+private[master] class MasterArguments(args: Array[String], conf: SparkConf) extends Logging {

   var host = Utils.localHostName()
   var port = 7077
   var webUiPort = 8080
   var propertiesFile: String = null

   // Check for settings in environment variables
+  if (System.getenv("SPARK_MASTER_IP") != null) {
+    logWarning("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST")
+    host = System.getenv("SPARK_MASTER_IP")
+  }
+  if (System.getenv("SPARK_MASTER_HOST") != null) {
```

Master.scala creates an instance of MasterArguments (line 1008), and MasterArguments reads the environment for its initial values (including SPARK_MASTER_HOST); that is the original logic. A user may not pass in --host and may just use SPARK_MASTER_HOST to set the value.
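The precedence discussed above (default hostname, then the deprecated SPARK_MASTER_IP with a warning, then SPARK_MASTER_HOST, then the --host argument) can be sketched as follows. This is a simplified Java illustration, not Spark's actual Scala code; the env lookups are passed in as parameters so the logic is testable:

```java
public class MasterArgumentsSketch {
    String host;

    // Apply the initial-value precedence described in the review thread.
    // ipEnv/hostEnv stand in for System.getenv("SPARK_MASTER_IP") and
    // System.getenv("SPARK_MASTER_HOST"); hostArg stands in for --host.
    MasterArgumentsSketch(String ipEnv, String hostEnv, String hostArg) {
        host = "localhost";   // stands in for Utils.localHostName()
        if (ipEnv != null) {
            System.err.println(
                "SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST");
            host = ipEnv;     // deprecated variable is still honored
        }
        if (hostEnv != null) {
            host = hostEnv;   // the new variable wins over the old one
        }
        if (hostArg != null) {
            host = hostArg;   // --host overrides both env variables
        }
    }

    public static void main(String[] args) {
        System.out.println(
            new MasterArgumentsSketch("1.2.3.4", "host.example", null).host);
        System.out.println(
            new MasterArgumentsSketch("1.2.3.4", null, "cli.example").host);
    }
}
```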
[GitHub] spark pull request #13543: [SPARK-15806] [Documentation] update doc for SPAR...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/13543#discussion_r66647339

```
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala ---
@@ -20,18 +20,24 @@ package org.apache.spark.deploy.master

 import scala.annotation.tailrec

 import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
 import org.apache.spark.util.{IntParam, Utils}

 /**
  * Command-line parser for the master.
  */
-private[master] class MasterArguments(args: Array[String], conf: SparkConf) {
+private[master] class MasterArguments(args: Array[String], conf: SparkConf) extends Logging {

   var host = Utils.localHostName()
   var port = 7077
   var webUiPort = 8080
   var propertiesFile: String = null

   // Check for settings in environment variables
+  if (System.getenv("SPARK_MASTER_IP") != null) {
+    logWarning("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST")
+    host = System.getenv("SPARK_MASTER_IP")
+  }
+  if (System.getenv("SPARK_MASTER_HOST") != null) {
```

As I noted before, MasterArguments.scala is currently used by Master.scala, so I think we need to keep SPARK_MASTER_HOST for now. Please let me know how we should proceed with this one.
[GitHub] spark pull request #13543: [SPARK-15806] [Documentation] update doc for SPAR...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/13543#discussion_r66493098

```
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala ---
@@ -20,18 +20,24 @@ package org.apache.spark.deploy.master

 import scala.annotation.tailrec

 import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
 import org.apache.spark.util.{IntParam, Utils}

 /**
  * Command-line parser for the master.
  */
-private[master] class MasterArguments(args: Array[String], conf: SparkConf) {
+private[master] class MasterArguments(args: Array[String], conf: SparkConf) extends Logging {

   var host = Utils.localHostName()
   var port = 7077
   var webUiPort = 8080
   var propertiesFile: String = null

   // Check for settings in environment variables
+  if (System.getenv("SPARK_MASTER_IP") != null) {
+    logWarning("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST")
+    host = System.getenv("SPARK_MASTER_IP")
+  }
+  if (System.getenv("SPARK_MASTER_HOST") != null) {
```

MasterArguments.scala is used by Master.scala's main() method, so there is a way to use `SPARK_MASTER_HOST`.
[GitHub] spark pull request #13543: [SPARK-15806] [Documentation] update doc for SPAR...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/13543#discussion_r66488409

```
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala ---
@@ -20,18 +20,24 @@ package org.apache.spark.deploy.master

 import scala.annotation.tailrec

 import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
 import org.apache.spark.util.{IntParam, Utils}

 /**
  * Command-line parser for the master.
  */
-private[master] class MasterArguments(args: Array[String], conf: SparkConf) {
+private[master] class MasterArguments(args: Array[String], conf: SparkConf) extends Logging {

   var host = Utils.localHostName()
   var port = 7077
   var webUiPort = 8080
   var propertiesFile: String = null

   // Check for settings in environment variables
+  if (System.getenv("SPARK_MASTER_IP") != null) {
+    logWarning("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST")
+    host = System.getenv("SPARK_MASTER_IP")
+  }
+  if (System.getenv("SPARK_MASTER_HOST") != null) {
```

The code here just sets the initial values, which may be overridden by the --host option, so I think we should keep it for now. For the warning message, we generally use the logger; I am not sure it is a good idea to put it into the script. I am open to your decision.
[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13543 Yes. I can add a warning if SPARK_MASTER_IP is set. Ideally we should use SPARK_MASTER_HOST in all places to avoid confusion.
[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13543 Here is the link: [MasterArguments.scala](https://github.com/bomeng/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala#L56-L59)
[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13543 Please note that there are still some places using SPARK_MASTER_IP, for example start-master.sh, etc. I did not replace them, because that might break currently running scripts.
[GitHub] spark pull request #13543: [SPARK-15806] [Documentation] update doc for SPAR...
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/13543

[SPARK-15806] [Documentation] update doc for SPARK_MASTER_IP

## What changes were proposed in this pull request?

SPARK_MASTER_IP is a deprecated environment variable. It is replaced by SPARK_MASTER_HOST according to MasterArguments.scala.

## How was this patch tested?

Manually verified.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-15806

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13543.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13543

commit 239cdfc08e5ad28864574f9ddbcf8240dd5a51ff
Author: bomeng <bm...@us.ibm.com>
Date: 2016-06-07T16:03:19Z

update doc
[GitHub] spark issue #13533: [SPARK-15781] [Documentation] remove deprecated environm...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13533 That could be another JIRA, as we do not want to use one JIRA to fix all issues. Please file one if desired.
[GitHub] spark pull request #13533: [SPARK-15781] [Documentation] remove deprecated ...
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/13533

[SPARK-15781] [Documentation] remove deprecated environment variable doc

## What changes were proposed in this pull request?

Like `SPARK_JAVA_OPTS` and `SPARK_CLASSPATH`, we will remove the documentation for `SPARK_WORKER_INSTANCES` to discourage users from using it. If it is actually used, SparkConf will show a warning message as before.

## How was this patch tested?

Manually tested.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-15781

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13533.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13533

commit accc6f708059944d0a58c695cfe9f29501a77d0a
Author: bomeng <bm...@us.ibm.com>
Date: 2016-06-06T20:57:40Z

update doc
[GitHub] spark pull request #13475: [SPARK-15737] [CORE] fix jetty warning
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/13475

[SPARK-15737] [CORE] fix jetty warning

## What changes were proposed in this pull request?

After upgrading Jetty to 9.2, we always see "WARN org.eclipse.jetty.server.handler.AbstractHandler: No Server set for org.eclipse.jetty.server.handler.ErrorHandler" while running test cases. This PR fixes it.

## How was this patch tested?

The existing test cases cover it.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-15737

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13475.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13475

commit 03707a2f3fbebbeec68bb4adbbe4b026d3ef9a69
Author: bomeng <bm...@us.ibm.com>
Date: 2016-06-02T21:27:39Z

fix jetty warning
[GitHub] spark pull request #13141: [SPARK-14752] [SQL] fix kryo ordering serializati...
Github user bomeng closed the pull request at: https://github.com/apache/spark/pull/13141
[GitHub] spark pull request: [SPARK-15537] [SQL] fix dir delete issue
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/13304#discussion_r64665909

```
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcSourceSuite.scala ---
@@ -38,12 +39,12 @@ abstract class OrcSuite extends QueryTest with TestHiveSingleton with BeforeAndA
     super.beforeAll()

     orcTableAsDir = File.createTempFile("orctests", "sparksql")
-    orcTableAsDir.delete()
+    Utils.deleteRecursively(orcTableAsDir)
```

Thanks for the comments. I will update the code shortly.
[GitHub] spark pull request: [SPARK-15537] [SQL] fix dir delete issue
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/13304

[SPARK-15537] [SQL] fix dir delete issue

## What changes were proposed in this pull request?

Some of the test cases, e.g. OrcSourceSuite, create temp folders with temp files inside them, but after the tests finish, the folders are not removed. This leaves lots of temp files behind and occupies space if we keep running the test cases. The reason is that dir.delete() won't work if dir is not empty. We need to recursively delete the contents before deleting the folder.

## How was this patch tested?

Manually checked the temp folder to make sure the temp files were deleted.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-15537

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13304.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13304

commit 878b0cca355b21e84f08e8fc32f195485f1df14a
Author: Bo Meng <men...@hotmail.com>
Date: 2016-05-25T21:35:28Z

fix dir delete issue
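The failure mode described above, File.delete() returning false on a non-empty directory, and the recursive fix can be sketched as follows. This is a minimal Java illustration of the technique, not Spark's actual Utils.deleteRecursively implementation:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class RecursiveDelete {
    // Delete a directory and everything under it. A plain File.delete()
    // fails (returns false) on a non-empty directory, which is exactly
    // the bug described in the PR.
    public static void deleteRecursively(File file) {
        File[] children = file.listFiles(); // null for plain files
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        file.delete(); // now empty (or a plain file), so this succeeds
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("orctests").toFile();
        File inner = new File(dir, "part-0");
        Files.write(inner.toPath(), "data".getBytes());

        boolean plainDelete = dir.delete(); // false: directory not empty
        deleteRecursively(dir);             // removes file, then the dir
        System.out.println(plainDelete + " " + dir.exists()); // false false
    }
}
```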
[GitHub] spark pull request: [SPARK-15468] [SQL] fix some typos
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/13246#discussion_r64142270

```
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -227,8 +227,8 @@ object IntegerIndex {
  *  - Unnamed grouping expressions are named so that they can be referred to across phases of
  *    aggregation
  *  - Aggregations that appear multiple times are deduplicated.
- *  - The compution of the aggregations themselves is separated from the final result. For example,
- *    the `count` in `count + 1` will be split into an [[AggregateExpression]] and a final
+ *  - The computation of the aggregations themselves is separated from the final result. For
+ *    example, the `count` in `count + 1` will be split into an [[AggregateExpression]] and a final
```

This is just needed for the 100-char line limit, like the previous line's fix.
[GitHub] spark pull request: [SPARK-15468] [SQL] fix some typos
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/13246

[SPARK-15468] [SQL] fix some typos

## What changes were proposed in this pull request?

Fix some typos found while browsing the code.

## How was this patch tested?

None; the changes are obvious.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark typo

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13246.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13246

commit ff73a8ddc036e1d8edf7eaa3be2e39db4b17d67f
Author: bomeng <bm...@us.ibm.com>
Date: 2016-05-19T01:32:27Z

fix typo

commit 6b05bc95623483f96757a917508fc3737b20bc90
Author: Bo Meng <men...@hotmail.com>
Date: 2016-05-20T18:48:17Z

Merge remote-tracking branch 'upstream/master' into typo

commit 3a5797544792557a6a143784277753f4d93dd031
Author: Bo Meng <men...@hotmail.com>
Date: 2016-05-21T22:32:12Z

Merge remote-tracking branch 'upstream/master' into typo
[GitHub] spark pull request: [SPARK-14752][SQL] LazilyGenerateOrdering thro...
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12661#issuecomment-219582450 Since this one has been here for more than 10 days, I've provided another approach with a new test case. Please take a look. Thanks. [PR for SPARK-14752](https://github.com/apache/spark/pull/13141)
[GitHub] spark pull request: [SPARK-14752] [SQL] fix kryo ordering serializ...
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/13141

[SPARK-14752] [SQL] fix kryo ordering serialization

## What changes were proposed in this pull request?

When using Kryo as the serializer, we get a `NullPointerException` for queries with `ORDER BY`.

## How was this patch tested?

I've added a new test case to HashedRelationSuite.scala, since this issue is related to SPARK-14521.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-14752

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13141.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13141

commit 66f0e6c352bae9e65eadada19b1cfead8b06b3aa
Author: bomeng <bm...@us.ibm.com>
Date: 2016-05-16T23:42:22Z

fix kryo serialization
[GitHub] spark pull request: [SPARK-15230] [SQL] distinct() does not handle...
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/13140

[SPARK-15230] [SQL] distinct() does not handle column name with dot properly

## What changes were proposed in this pull request?

When a table is created with a column name containing a dot, distinct() will fail to run. For example:

```scala
val rowRDD = sparkContext.parallelize(Seq(Row(1), Row(1), Row(2)))
val schema = StructType(Array(StructField("column.with.dot", IntegerType, nullable = false)))
val df = spark.createDataFrame(rowRDD, schema)
```

Running the following works fine:

```scala
df.select(new Column("`column.with.dot`"))
```

but running the query with an additional distinct() causes an exception:

```scala
df.select(new Column("`column.with.dot`")).distinct()
```

The issue is that distinct() tries to resolve the column name, but the column name in the schema does not have backticks around it. So the solution is to add the backticks before passing the column name to resolve().

## How was this patch tested?

Added a new test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-15230

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13140.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13140

commit 2f7ffbd58a3437898f32e7603ca6b603f5fd5088
Author: bomeng <bm...@us.ibm.com>
Date: 2016-05-16T20:37:54Z

fix distinct()
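The described fix boils down to quoting the schema's field name in backticks before handing it to the resolver, so that a name containing dots is treated as one identifier rather than a nested path. A minimal sketch in Java; the `quote` helper is hypothetical (not Spark's actual code), and the backtick-doubling escape mirrors common SQL identifier quoting:

```java
public class ColumnQuoteSketch {
    // Hypothetical helper illustrating the fix: wrap a raw field name in
    // backticks, doubling any embedded backtick so the quoting stays valid.
    static String quote(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    public static void main(String[] args) {
        // "column.with.dot" would otherwise be parsed as column -> with -> dot
        System.out.println(quote("column.with.dot")); // `column.with.dot`
    }
}
```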
[GitHub] spark pull request: [SPARK-14897] [SQL] upgrade to jetty 9.2.16
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12916#discussion_r63060249

```
--- Diff: yarn/pom.xml ---
@@ -102,6 +102,10 @@
     <dependency>
       <groupId>org.eclipse.jetty</groupId>
       <artifactId>jetty-servlet</artifactId>
     </dependency>
+    <dependency>
+      <groupId>org.eclipse.jetty</groupId>
+      <artifactId>jetty-servlets</artifactId>
+    </dependency>
```

Yes, we need both; they have different contents.
[GitHub] spark pull request: [SPARK-14897] [SQL] upgrade to jetty 9.2.16
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12916#discussion_r63003988

```
--- Diff: core/pom.xml ---
@@ -125,12 +125,17 @@
       <artifactId>jetty-servlet</artifactId>
       <scope>compile</scope>
     </dependency>
+    <dependency>
+      <groupId>org.eclipse.jetty</groupId>
+      <artifactId>jetty-servlets</artifactId>
+      <scope>compile</scope>
+    </dependency>
```
[GitHub] spark pull request: [SPARK-14897] [SQL] upgrade to jetty 9.2.16
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12916#issuecomment-218668164 @srowen Sorry for the late reply, I did not notice it. I have run `mvn dependency:tree` and only javax.servlet-api 3.1.0 is listed, so it should be fine.
[GitHub] spark pull request: [SPARK-14897] [SQL] upgrade to jetty 9.2.16
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12916#discussion_r62968602

```
--- Diff: core/pom.xml ---
@@ -125,12 +125,17 @@
       <artifactId>jetty-servlet</artifactId>
       <scope>compile</scope>
     </dependency>
+    <dependency>
+      <groupId>org.eclipse.jetty</groupId>
+      <artifactId>jetty-servlets</artifactId>
+      <scope>compile</scope>
+    </dependency>
```
[GitHub] spark pull request: [SPARK-14897] [SQL] upgrade to jetty 9.2.16
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12916#issuecomment-218340455 @srowen Finally I've got it working. Servlet and Derby were upgraded as well due to Jetty's requirements. Please review. ---
[GitHub] spark pull request: [SPARK-14897] [SQL] [WIP] upgrade to jetty 9.2...
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12916#issuecomment-218320176 retest please ---
[GitHub] spark pull request: [SPARK-14897] [SQL] [WIP] upgrade to jetty 9.2...
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12916#issuecomment-217338702 The test failure was caused by timeouts in HiveThriftHttpServerSuite and SingleSessionSuite... I have not figured out the cause; any suggestions are welcome. ---
[GitHub] spark pull request: [SPARK-14897] [SQL] upgrade to jetty 9.2.16
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/12916 [SPARK-14897] [SQL] upgrade to jetty 9.2.16 ## What changes were proposed in this pull request? Since Jetty 8 is EOL (end of life) and has a critical security issue [http://www.securityweek.com/critical-vulnerability-found-jetty-web-server], I think upgrading to 9 is necessary. I am using the latest 9.2, since 9.3 requires Java 8+. ## How was this patch tested? Manual test; current test cases should cover it. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bomeng/spark SPARK-14897 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12916.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12916 commit adba870ed702d4bd53292f240785fbd86484bb9d Author: bomeng <bm...@us.ibm.com> Date: 2016-05-04T23:55:54Z upgrade to jetty 9.2.16 ---
[GitHub] spark pull request: [SPARK-15062] [SQL] fix list type infer serial...
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12849#issuecomment-216384799 Making the changes based on the comments; will post them shortly. List[_] should be supported as Seq[_]; for now, you can use Seq[_] as a workaround. ---
[GitHub] spark pull request: [SPARK-15062] [SQL] fix list type infer serial...
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/12849 [SPARK-15062] [SQL] fix list type infer serializer issue ## What changes were proposed in this pull request? Make the serializer correctly inferred when the input type is List[_]: since List[_] is a subtype of Seq[_], it should follow the Seq path, but previously it was matched to a different case (case t if definedByConstructorParams(t)). ## How was this patch tested? A new test case was added. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bomeng/spark SPARK-15062 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12849.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12849 commit 5869b95b41e27b90a8bc64d774c93966659f9226 Author: bomeng <bm...@us.ibm.com> Date: 2016-05-02T21:08:08Z fix list type infer serializer issue ---
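The type relationship behind this fix can be shown in plain Scala, with no Spark dependency; this is only an illustration of why List[_] should follow the Seq[_] inference path, not the actual Catalyst matching code.

```scala
// List[_] conforms to Seq[_], so any serializer-inference case that handles
// Seq should also cover List. This sketch just demonstrates the subtyping.
object ListSeqSketch {
  def main(args: Array[String]): Unit = {
    val xs: Seq[Int] = List(1, 2, 3) // List[Int] is assignable to Seq[Int]
    assert(xs.isInstanceOf[List[_]])
    assert(List(1, 2, 3).isInstanceOf[Seq[_]])
    println("List is a Seq")
  }
}
```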
[GitHub] spark pull request: [SPARK-14955] [SQL] avoid stride value equals ...
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12739#issuecomment-215450711 @srowen Please review again. Thanks. ---
[GitHub] spark pull request: [SPARK-14955] [SQL] avoid stride value equals ...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12739#discussion_r61342768 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala --- @@ -54,15 +54,22 @@ private[sql] object JDBCRelation { def columnPartition(partitioning: JDBCPartitioningInfo): Array[Partition] = { if (partitioning == null) return Array[Partition](JDBCPartition(null, 0)) +// make sure the input is valid +val lower = partitioning.lowerBound +val upper = partitioning.upperBound val numPartitions = partitioning.numPartitions val column = partitioning.column +require(lower < upper, "lower bound must be less than upper bound") +require(numPartitions > 0, "number of partition must be great than zero") + if (numPartitions == 1) return Array[Partition](JDBCPartition(null, 0)) -// Overflow and silliness can happen if you subtract then divide. -// Here we get a little roundoff, but that's (hopefully) OK. -val stride: Long = (partitioning.upperBound / numPartitions - - partitioning.lowerBound / numPartitions) + +val stride: Long = { --- End diff -- Cool, I will update it shortly. ---
[GitHub] spark pull request: [SPARK-14955] [SQL] avoid stride value equals ...
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/12739 [SPARK-14955] [SQL] avoid stride value equals to zero ## What changes were proposed in this pull request? In the columnPartition() method of JDBCRelation, stride is used for calculating the increment. But in some cases this value can be zero, for example lowerBound=0, upperBound=7, numOfPartition=8, which puts all the data into one partition (the last partition). This fix makes the stride calculation more robust. I have also added require() to validate the input. equals() was added to override the parent's equals() method, together with hashCode(). I have also fixed some text style, making keywords all uppercase. ## How was this patch tested? New test cases were added to JDBCSuite. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bomeng/spark SPARK-14955 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12739.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12739 commit b4df4b0626a4bab197eb249fea99283ba4afd293 Author: bomeng <bm...@us.ibm.com> Date: 2016-04-27T18:31:28Z fix stride ---
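The zero-stride problem described above can be reproduced with plain integer arithmetic. The following standalone sketch mirrors the old divide-then-subtract formula and one possible robust variant; the function names and the max-with-1 clamp are illustrative, not the exact code merged in the PR.

```scala
// Sketch of the stride issue: with integer division, dividing each bound
// first can round both terms down to zero, so partitions never advance.
object StrideSketch {
  // Old formula: divide each bound by numPartitions, then subtract.
  def oldStride(lower: Long, upper: Long, numPartitions: Int): Long =
    upper / numPartitions - lower / numPartitions

  // A more robust variant (illustrative): subtract first, then divide,
  // and clamp to at least 1 so every partition covers some range.
  def fixedStride(lower: Long, upper: Long, numPartitions: Int): Long =
    math.max((upper - lower) / numPartitions, 1L)

  def main(args: Array[String]): Unit = {
    // The case from the PR description: lowerBound=0, upperBound=7, 8 partitions.
    assert(oldStride(0L, 7L, 8) == 0L)   // stride of zero: all rows fall in one partition
    assert(fixedStride(0L, 7L, 8) == 1L) // stride of one: partitions actually advance
  }
}
```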
[GitHub] spark pull request: [SPARK-14806] [SQL] Alias original Hive option...
Github user bomeng closed the pull request at: https://github.com/apache/spark/pull/12607 ---
[GitHub] spark pull request: [SPARK-14806] [SQL] Alias original Hive option...
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12607#issuecomment-214958322 closing it. thanks. ---
[GitHub] spark pull request: [SPARK-14928] [SQL] support substitution in SE...
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12709#issuecomment-214906151 Yes, I missed that. The parser already handles it. ---
[GitHub] spark pull request: [SPARK-14928] [SQL] support substitution in SE...
Github user bomeng closed the pull request at: https://github.com/apache/spark/pull/12709 ---
[GitHub] spark pull request: [SPARK-14928] [SQL] support substitution in SE...
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/12709 [SPARK-14928] [SQL] support substitution in SET key=value ## What changes were proposed in this pull request? In the `SET key=value` command, the value can be defined as a variable and replaced by substitution. Since we have `VARIABLE_SUBSTITUTE_ENABLED` and `VARIABLE_SUBSTITUTE_DEPTH` defined in the SQLConf, it is nice to use them in the SET command. ## How was this patch tested? A new test case was added to test this function. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bomeng/spark SPARK-14928 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12709.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12709 commit 98f148dc902713df747915ba32703ac6a262226c Author: bomeng <bm...@us.ibm.com> Date: 2016-04-26T20:03:36Z support substitution ---
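The substitution idea above can be sketched without Spark: occurrences of `${name}` in the value are looked up in a variable map before the key is set. This is a hedged illustration only; Spark's actual VariableSubstitution class has more rules (depth limits, multiple prefixes) than this minimal version.

```scala
import java.util.regex.Matcher

// Minimal sketch of variable substitution in `SET key=value`: each ${name}
// in the value is replaced from a map; unknown names are left as-is.
object SubstitutionSketch {
  private val VarPattern = "\\$\\{([^}]+)\\}".r

  def substitute(value: String, vars: Map[String, String]): String =
    VarPattern.replaceAllIn(value, m =>
      // quoteReplacement prevents $ and \ in the result from being
      // re-interpreted as group references by the regex engine
      Matcher.quoteReplacement(vars.getOrElse(m.group(1), m.matched)))

  def main(args: Array[String]): Unit = {
    assert(substitute("${base}/data", Map("base" -> "/tmp")) == "/tmp/data")
    assert(substitute("${missing}", Map.empty) == "${missing}")
  }
}
```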
[GitHub] spark pull request: [SPARK-14806] [SQL] Alias original Hive option...
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12607#issuecomment-214818178 @rxin I am open to your decision. I think it is still useful to allow users to use the `SET` command with `spark.sql.variable.substitute` as the configuration. Currently, the `SET` command does not support that. Part of my code fixes that; do you think that part is still valid? ---
[GitHub] spark pull request: [SPARK-14441] [SQL] Consolidate DDL tests
Github user bomeng closed the pull request at: https://github.com/apache/spark/pull/12347 ---
[GitHub] spark pull request: [SPARK-14441] [SQL] Consolidate DDL tests
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12347#issuecomment-214518666 closing this PR. thanks. ---
[GitHub] spark pull request: [SPARK-14806] [SQL] Alias original Hive option...
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12607#issuecomment-213614415 I think you mean setting the value of `spark.sql.variable.substitute` and reading `spark.sql.variable.substitute` above. I will post another try shortly. ---
[GitHub] spark pull request: [SPARK-14806] [SQL] support substitution in se...
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12607#issuecomment-213533820 @rxin Just want to confirm: you want to let users do `SET hive.variable.substitute=true/false` in SQL? It will logWarning in the `setConfWithCheck()` method, and I just convert it there? ---
[GitHub] spark pull request: [SPARK-14806] [SQL] support substitution in se...
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/12607 [SPARK-14806] [SQL] support substitution in set command ## What changes were proposed in this pull request? Since we have spark.sql.variable.substitute as an alias of hive.variable.substitute, we will use it for the `SET` command. ## How was this patch tested? A test was added to the existing one to cover this new feature. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bomeng/spark SPARK-14806 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12607.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12607 commit 982787607df966d98885eee56b636a9a9f9b208f Author: bomeng <bm...@us.ibm.com> Date: 2016-04-22T10:20:51Z support substitution in set command commit f306301148b957aeb1b48306bd06b4a65bdcc0b8 Author: bomeng <bm...@us.ibm.com> Date: 2016-04-22T10:28:49Z code improvement ---
[GitHub] spark pull request: [SPARK-14819] [SQL] Improve SET / SET -v comma...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12583#discussion_r60685224 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -716,4 +716,8 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach { } } + test("set / set -v") { +checkExistence(sql("set"), true, "env:", "system:") --- End diff -- Found that the existing SQLQuerySuite and HiveQuerySuite test cases already have extensive coverage for the SET command. ---
[GitHub] spark pull request: [SPARK-14819] [SQL] Improve SET / SET -v comma...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12583#discussion_r60669363 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -716,4 +716,8 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach { } } + test("set / set -v") { +checkExistence(sql("set"), true, "env:", "system:") --- End diff -- SPARK_TESTING is in neither sys.env nor sys.props while running the test cases. Any suggestions for adding test cases for that? ---
[GitHub] spark pull request: [SPARK-14819] [SQL] Improve SET / SET -v comma...
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/12583 [SPARK-14819] [SQL] Improve SET / SET -v command ## What changes were proposed in this pull request? Currently `SET` and `SET -v` commands are similar to the Hive `SET` command except for the following differences: 1. The result is not sorted; 2. When using `SET` and `SET -v`, in addition to the Hive related properties, it will also list all the system properties and environment properties, which is very useful in some cases. This JIRA is trying to make the current `SET` command more consistent with Hive's output. ## How was this patch tested? A new test case was added to test the output of SET and SET -v. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bomeng/spark SPARK-14819 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12583.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12583 commit 01d3e5a545eeced71b82d26a8407ea5e1d8f49ab Author: bomeng <bm...@us.ibm.com> Date: 2016-04-21T20:17:53Z improve SET / SET -v command ---
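The shape of the improved output can be sketched as follows. The `env:` and `system:` prefixes come from the test case discussed in this thread; everything else (function names, rendering as `key=value` strings) is an illustrative assumption, not Spark's exact implementation.

```scala
// Sketch of sorted SET output: SQL conf entries plus system and environment
// properties, with the latter two prefixed, all merged and sorted.
object SetOutputSketch {
  def render(conf: Map[String, String],
             sysProps: Map[String, String],
             env: Map[String, String]): Seq[String] =
    (conf.map { case (k, v) => s"$k=$v" } ++
     sysProps.map { case (k, v) => s"system:$k=$v" } ++
     env.map { case (k, v) => s"env:$k=$v" }).toSeq.sorted

  def main(args: Array[String]): Unit = {
    val out = render(Map("spark.app.name" -> "test"),
                     Map("java.version" -> "8"),
                     Map("PATH" -> "/bin"))
    // Sorted lexicographically: env: < spark. < system:
    assert(out == Seq("env:PATH=/bin", "spark.app.name=test", "system:java.version=8"))
  }
}
```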
[GitHub] spark pull request: [SPARK-14541] [SQL] SQL function: IFNULL, NULL...
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12373#issuecomment-212169908 @rxin Could you please take a look if you get a chance? Thanks. ---
[GitHub] spark pull request: [SPARK-14398] [SQL] Audit non-reserved keyword...
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12191#issuecomment-211543283 Yes, the reason for sorting the keywords is ease of searching. I have checked the generated code and see the switch/case for each non-reserved word. To my understanding, `case A: case B: ...` should not perform differently from `case A | B: ...`; this is easily optimized by the compiler. ---
[GitHub] spark pull request: [SPARK-14541] [SQL] SQL function: IFNULL, NULL...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12373#discussion_r59928147 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala --- @@ -128,6 +128,143 @@ case class IsNaN(child: Expression) extends UnaryExpression } /** + * An Expression accepts two parameters and returns null if both parameters are equal. + * If they are not equal, the first parameter value is returned. + */ +@ExpressionDescription( + usage = "_FUNC_(a,b) - Returns null if a equals to b, or a otherwise.") +case class NullIf(left: Expression, right: Expression) extends BinaryExpression { + override def nullable: Boolean = true + override def dataType: DataType = left.dataType + + override def eval(input: InternalRow): Any = { +val valueLeft = left.eval(input) +val valueRight = right.eval(input) +if (valueLeft.equals(valueRight)) { + null +} else { + valueLeft +} + } + + override def genCode(ctx: CodegenContext, ev: ExprCode): String = { +val leftGen = left.gen(ctx) +val rightGen = right.gen(ctx) +s""" + ${leftGen.code} + ${rightGen.code} + boolean ${ev.isNull} = false; + ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)}; + if (${ctx.genEqual(dataType, leftGen.value, rightGen.value)}) { +${ev.isNull} = true; + } else { +${ev.value} = ${leftGen.value}; + } +""" + } +} + +/** + * An Expression accepts two parameters and returns the second parameter if the value + * in the first parameter is null; if the first parameter is any value other than null, + * it is returned unchanged. + */ +@ExpressionDescription( + usage = "_FUNC_(a,b) - Returns b if a is null, or a otherwise.") +case class Nvl(left: Expression, right: Expression) extends BinaryExpression { --- End diff -- Did not notice this. I will do it shortly. ---
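The semantics of the two expressions quoted in this review can be summarized in a small Spark-free sketch, using Option in place of SQL null. The function names mirror the expressions but this is illustrative only, not the Catalyst implementation.

```scala
// Minimal sketch of NULLIF / NVL semantics:
//   NULLIF(a, b) -> null when a equals b, otherwise a
//   NVL(a, b)    -> b when a is null, otherwise a unchanged
object NullFuncSketch {
  def nullIf[A](a: A, b: A): Option[A] = if (a == b) None else Some(a)

  def nvl[A](a: Option[A], b: A): A = a.getOrElse(b)

  def main(args: Array[String]): Unit = {
    assert(nullIf(1, 1).isEmpty)      // equal values yield null
    assert(nullIf(1, 2).contains(1))  // unequal values yield the first argument
    assert(nvl(None, 5) == 5)         // null first argument falls back to second
    assert(nvl(Some(3), 5) == 3)      // non-null first argument is unchanged
  }
}
```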
[GitHub] spark pull request: [SPARK-14541] [SQL] SQL function: IFNULL, NULL...
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12373#issuecomment-210605312 I have revisited the code and made it more robust. Heavily tested against different data types by introducing testAllTypes2Values() with 2 different values. The PR description was updated and the Javadoc was fixed. Please leave comments! ---
[GitHub] spark pull request: [SPARK-14541] [SQL] SQL function: IFNULL, NULL...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12373#discussion_r59904696 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala --- (same NullIf/Nvl hunk as quoted above) --- End diff -- I will say, yes, kind of. Here is what I found: [difference](http://stackoverflow.com/questions/950084/oracle-differences-between-nvl-and-coalesce) ---
[GitHub] spark pull request: [SPARK-14541] [SQL] SQL function: IFNULL, NULL...
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12373#issuecomment-210300785 I will address these issues tomorrow! Thank you all! ---
[GitHub] spark pull request: [SPARK-14460] [SQL] properly handling of colum...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12252#discussion_r59824077 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -246,13 +247,23 @@ object JdbcUtils extends Logging { } /** + * The utility to add quote to the column name based on its dialect + * @param dialect the JDBC dialect + * @param columnName the input column name + * @return the quoted column name + */ + private def quoteColumnName(dialect: JdbcDialect, columnName: String): String = { +dialect.quoteIdentifier(columnName) + } + + /** * Compute the schema string for this RDD. */ - def schemaString(df: DataFrame, url: String): String = { + def schemaString(dialect: JdbcDialect, df: DataFrame, url: String): String = { val sb = new StringBuilder() val dialect = JdbcDialects.get(url) --- End diff -- Thanks for pointing that out. I've modified the code. Please check it out. ---
[GitHub] spark pull request: [SPARK-14460] [SQL] properly handling of colum...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12252#discussion_r59819746

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
```
@@ -246,13 +247,23 @@ object JdbcUtils extends Logging {
   }

   /**
+   * The utility to add quote to the column name based on its dialect
+   * @param dialect the JDBC dialect
+   * @param columnName the input column name
+   * @return the quoted column name
+   */
+  private def quoteColumnName(dialect: JdbcDialect, columnName: String): String = {
+    dialect.quoteIdentifier(columnName)
+  }
+
+  /**
    * Compute the schema string for this RDD.
    */
-  def schemaString(df: DataFrame, url: String): String = {
+  def schemaString(dialect: JdbcDialect, df: DataFrame, url: String): String = {
     val sb = new StringBuilder()
     val dialect = JdbcDialects.get(url)
```
--- End diff --

The purpose of passing in the dialect is to get the proper quoting for columns based on the data source. Any suggestions?
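The dialect-dependent quoting under discussion can be illustrated with a minimal sketch. Note that `Dialect`, `MySQLLike`, and `StandardSQL` below are simplified stand-in types invented for illustration, not Spark's actual `JdbcDialect` classes:

```scala
// Simplified stand-ins for JDBC dialects; Spark's real JdbcDialect
// classes live in org.apache.spark.sql.jdbc and differ from these.
trait Dialect {
  def quoteIdentifier(name: String): String
}

// MySQL-style dialects quote identifiers with backticks...
object MySQLLike extends Dialect {
  override def quoteIdentifier(name: String): String = s"`$name`"
}

// ...while the SQL standard uses double quotes.
object StandardSQL extends Dialect {
  override def quoteIdentifier(name: String): String = "\"" + name + "\""
}

// Build a schema string, quoting each column name for the target dialect
// so reserved words such as "order" remain valid in the generated DDL.
def schemaString(dialect: Dialect, fields: Seq[(String, String)]): String =
  fields.map { case (name, tpe) => s"${dialect.quoteIdentifier(name)} $tpe" }
    .mkString(", ")
```

This is why a single hard-coded quoting scheme is not enough: the same column list must render differently per target database.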
[GitHub] spark pull request: [SPARK-14541] [SQL] [WIP] SQL function: IFNULL...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12373#discussion_r59659273

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala ---
```
@@ -128,6 +128,58 @@ case class IsNaN(child: Expression) extends UnaryExpression
 }

 /**
+ * An Expression accepts two parameters and returns null if both parameters are equal.
+ * If they are not equal, the first parameter value is returned.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a,b) - Returns null if a equals to b, or a otherwise.")
+case class NullIf(left: Expression, right: Expression) extends BinaryExpression {
+  override def nullable: Boolean = true
+  override def dataType: DataType = left.dataType
+
+  override def eval(input: InternalRow): Any = {
+    val valueLeft = left.eval(input)
+    val valueRight = right.eval(input)
+    if (valueLeft.equals(valueRight)) {
+      null
+    } else {
+      valueLeft
+    }
+  }
+
+  override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
+    val leftGen = left.gen(ctx)
+    val rightGen = right.gen(ctx)
+    dataType match {
```
--- End diff --

Thanks, @viirya! That simplifies the logic!
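One hazard in the quoted draft: `eval` calls `valueLeft.equals(valueRight)`, which throws a `NullPointerException` when the left child evaluates to null. The same semantics can be sketched null-safely as a plain Scala helper (hypothetical, not the actual Catalyst expression):

```scala
// Hypothetical null-safe sketch of NULLIF semantics, outside Catalyst:
// returns null when the two values are equal, otherwise the first value.
// Guarding on `a != null` avoids the NPE that `a.equals(b)` would throw
// for a null first argument; a null first argument falls through and is
// returned as-is, which is already null.
def nullIf(a: Any, b: Any): Any =
  if (a != null && a == b) null else a
```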
[GitHub] spark pull request: [SPARK-14541] [SQL] [WIP] SQL function: IFNULL...
GitHub user bomeng opened a pull request:

    https://github.com/apache/spark/pull/12373

    [SPARK-14541] [SQL] [WIP] SQL function: IFNULL, NULLIF, NVL and NVL2

## What changes were proposed in this pull request?

I am trying to implement the `NULLIF` function in this PR. The meaning of NULLIF can be found here: [NULLIF( )](https://oracle-base.com/articles/misc/null-related-functions#nullif)

## How was this patch tested?

Test cases were added.

## JIRA related

[SPARK-14541](https://issues.apache.org/jira/browse/SPARK-14541)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bomeng/spark SPARK-14541

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12373.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #12373

----
commit c479394e1a6aa1588544357a0aa76054cb813088
Author: bomeng <bm...@us.ibm.com>
Date:   2016-04-13T22:53:47Z

    support of NULLIF()
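The other functions named in the PR title have closely related semantics. As a rough sketch, written as plain hypothetical Scala helpers rather than the Catalyst expressions the PR would add:

```scala
// Hypothetical sketches of the SQL null-handling functions from the PR
// title; these mirror standard SQL semantics, not Spark's implementation.

// IFNULL / NVL: return the first argument unless it is null,
// in which case return the second.
def ifNull(a: Any, b: Any): Any = if (a != null) a else b

// NVL2: if the first argument is not null return the second,
// otherwise return the third.
def nvl2(a: Any, b: Any, c: Any): Any = if (a != null) b else c

// NULLIF: return null when both arguments are equal, otherwise the first.
def nullIf(a: Any, b: Any): Any = if (a != null && a == b) null else a
```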
[GitHub] spark pull request: [SPARK-14441] [SQL] Consolidate DDL tests
Github user bomeng commented on the pull request: https://github.com/apache/spark/pull/12347#issuecomment-209622770

Ok, not a problem. Thanks.