[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...

2018-09-06 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/22356
  
Thanks for taking my code. Looks good.


---




[GitHub] spark issue #21638: [SPARK-22357][CORE] SparkContext.binaryFiles ignore minP...

2018-09-04 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/21638
  
Here is the test code; I am not sure whether it is right:
```
// Imports needed by this snippet (Files is Guava's com.google.common.io.Files):
import java.io.File
import java.nio.charset.StandardCharsets
import com.google.common.io.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.Utils

test("Number of partitions") {
  sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local")
    .set("spark.files.maxPartitionBytes", "10")
    .set("spark.files.openCostInBytes", "0")
    .set("spark.default.parallelism", "1"))

  // Two small files in the same temp directory; with such a small
  // maxPartitionBytes each file should end up in its own partition.
  val dir1 = Utils.createTempDir()
  val dirpath1 = dir1.getAbsolutePath

  val file1 = new File(dir1, "part-0")
  val file2 = new File(dir1, "part-1")

  Files.write("someline1 in file1\nsomeline2 in file1\nsomeline3 in file1", file1,
    StandardCharsets.UTF_8)
  Files.write("someline1 in file2\nsomeline2 in file2\nsomeline3 in file2", file2,
    StandardCharsets.UTF_8)

  assert(sc.binaryFiles(dirpath1, minPartitions = 1).getNumPartitions == 2)
  assert(sc.binaryFiles(dirpath1, minPartitions = 2).getNumPartitions == 2)
  assert(sc.binaryFiles(dirpath1, minPartitions = 50).getNumPartitions == 2)
}
```


---




[GitHub] spark pull request #21638: [SPARK-22357][CORE] SparkContext.binaryFiles igno...

2018-09-04 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/21638#discussion_r215022562
  
--- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---
@@ -47,7 +47,7 @@ private[spark] abstract class StreamFileInputFormat[T]
   def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
     val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
     val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
-    val defaultParallelism = sc.defaultParallelism
+    val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)
--- End diff --

From the code, you can see the calculation is just an intermediate result, and this method does not return any value. Checking the split size does not make sense for this test case, because it depends on multiple variables and this is just one of them.


---




[GitHub] spark pull request #21638: [SPARK-22357][CORE] SparkContext.binaryFiles igno...

2018-09-04 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/21638#discussion_r215010040
  
--- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---
@@ -47,7 +47,7 @@ private[spark] abstract class StreamFileInputFormat[T]
   def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
     val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
     val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
-    val defaultParallelism = sc.defaultParallelism
+    val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)
--- End diff --

I agree it is hard to test. I would appreciate it if anyone could give me some hints on how to do this (how to verify the change and where to put my test cases).


---




[GitHub] spark pull request #22276: [SPARK-25242][SQL] make sql config setting fluent

2018-08-31 Thread bomeng
Github user bomeng closed the pull request at:

https://github.com/apache/spark/pull/22276


---




[GitHub] spark issue #22276: [SPARK-25242][SQL] make sql config setting fluent

2018-08-31 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/22276
  
OK, closing.


---




[GitHub] spark issue #22276: [SPARK-25242][SQL] make sql config setting fluent

2018-08-29 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/22276
  
The tests failed due to the method signature changes, but they should not affect the existing test cases or existing usages.


---




[GitHub] spark pull request #22276: [SPARK-25242][SQL] make sql config setting fluent

2018-08-29 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/22276

[SPARK-25242][SQL] make sql config setting fluent

## What changes were proposed in this pull request?

Users can now set conf values more easily by chaining calls:
```
sparkSession.conf.set(...).set(...).unset(...)
```
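For illustration, the fluent pattern simply has each setter return the receiver so calls can be chained. A minimal standalone sketch (a hypothetical `FluentConf`, not Spark's actual `RuntimeConfig` code):
```
class FluentConf {
  private val settings = scala.collection.mutable.Map[String, String]()

  // Each mutator returns `this`, which is what makes chaining possible.
  def set(key: String, value: String): FluentConf = { settings(key) = value; this }
  def unset(key: String): FluentConf = { settings.remove(key); this }
  def get(key: String): Option[String] = settings.get(key)
}

val conf = new FluentConf()
conf.set("spark.sql.shuffle.partitions", "8")
  .set("some.custom.key", "true")
  .unset("some.custom.key")
```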

## How was this patch tested?

More tests for this fluent style have been added to the existing test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark 25242

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22276.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22276


commit 45f438c650ae44662341f656378106bc31667f4d
Author: Bo Meng 
Date:   2018-08-29T23:10:09Z

SPARK-25242: make sql config setting fluent




---




[GitHub] spark pull request #22127: [SPARK-25032][SQL] fix drop database issue

2018-08-17 Thread bomeng
Github user bomeng closed the pull request at:

https://github.com/apache/spark/pull/22127


---




[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue

2018-08-16 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/22127
  
Good points. I will leave it open for any suggestions on improving the user experience.


---




[GitHub] spark pull request #22127: [SPARK-25032][SQL] fix drop database issue

2018-08-16 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/22127

[SPARK-25032][SQL] fix drop database issue

## What changes were proposed in this pull request?
When a user drops the current database (other than the default database), we should set the current database back to default after it is deleted.
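For illustration, the scenario looks roughly like this (a hypothetical spark-shell session, not the added test case):
```
spark.sql("CREATE DATABASE db1")
spark.sql("USE db1")
spark.sql("DROP DATABASE db1")
// Before the fix the session still pointed at the dropped db1;
// with the fix it falls back to the default database.
assert(spark.catalog.currentDatabase == "default")
```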

## How was this patch tested?
A new test case is added to cover this scenario.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark 25032

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22127.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22127


commit 825533682c98598409e537fa866dcdab915e3948
Author: Bo Meng 
Date:   2018-08-16T21:58:17Z

fix drop database issue




---




[GitHub] spark issue #22115: [SPARK-25082] [SQL] improve the javadoc for expm1()

2018-08-15 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/22115
  
I have already done a global search. That is the only place that needs the change.


---




[GitHub] spark pull request #22115: [SPARK-25082] [SQL] improve the javadoc for expm1...

2018-08-15 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/22115

[SPARK-25082] [SQL] improve the javadoc for expm1()

## What changes were proposed in this pull request?
Correct the javadoc for the expm1() function.
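For context, expm1(x) computes e^x - 1 in a way that stays accurate when x is close to zero, which is the point the javadoc should make clear. A small illustration (assumed behavior of the underlying java.lang.Math, not code from this patch):
```
val x = 1e-10
val naive  = math.exp(x) - 1  // loses low-order digits to cancellation near zero
val stable = math.expm1(x)    // stays accurate for small x
println(s"naive=$naive stable=$stable")
```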

## How was this patch tested?
None. It is a minor issue.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark 25082

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22115.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22115


commit 089c31fcff1a5b84634f5de78c1bd440f738b2f4
Author: Bo Meng 
Date:   2018-08-16T00:09:32Z

improve the javadoc




---




[GitHub] spark pull request #21638: [SPARK-22357][CORE] SparkContext.binaryFiles igno...

2018-07-23 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/21638#discussion_r204517923
  
--- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---
@@ -47,7 +47,7 @@ private[spark] abstract class StreamFileInputFormat[T]
   def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
     val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
     val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
-    val defaultParallelism = sc.defaultParallelism
+    val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)
--- End diff --

BinaryFileRDD will set minPartitions, which is either defaultMinPartitions or the value you pass via the binaryFiles(path, minPartitions) method. Eventually, this minPartitions value is passed to the setMinPartitions() method.
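To make the effect concrete, here is a simplified, self-contained sketch of how the split size falls out of that value (names follow the diff above; the surrounding Hadoop plumbing is paraphrased, not quoted from the Spark source):
```
val fileSizes = Seq(64L * 1024 * 1024, 64L * 1024 * 1024)  // two 64 MB files
val defaultMaxSplitBytes = 128L * 1024 * 1024              // spark.files.maxPartitionBytes
val openCostInBytes = 4L * 1024 * 1024                     // spark.files.openCostInBytes
val minPartitions = 50

// The fix: fold the caller's minPartitions into the parallelism.
val defaultParallelism = Math.max(8 /* sc.defaultParallelism */, minPartitions)

val totalBytes = fileSizes.map(_ + openCostInBytes).sum
val bytesPerCore = totalBytes / defaultParallelism
// Larger parallelism => smaller bytesPerCore => smaller splits => more partitions.
val maxSplitSize = Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore))
```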


---




[GitHub] spark issue #21638: [SPARK-22357][CORE] SparkContext.binaryFiles ignore minP...

2018-07-18 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/21638
  
Either way works for me, but since this is not a private method, people may use it in their own code, so the minimal change is best.


---




[GitHub] spark pull request #21638: [SPARK-22357][CORE] SparkContext.binaryFiles igno...

2018-07-17 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/21638#discussion_r202907829
  
--- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---
@@ -47,7 +47,7 @@ private[spark] abstract class StreamFileInputFormat[T]
   def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
     val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
     val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
-    val defaultParallelism = sc.defaultParallelism
+    val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)
--- End diff --

You need to pass in minPartitions to use this method; what do you mean by "minPartitions is not set"?


---




[GitHub] spark issue #21638: [SPARK-22357][CORE] SparkContext.binaryFiles ignore minP...

2018-06-27 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/21638
  
@HyukjinKwon please review. thanks.


---




[GitHub] spark pull request #21638: [SPARK-22357][CORE] SparkContext.binaryFiles igno...

2018-06-25 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/21638

[SPARK-22357][CORE] SparkContext.binaryFiles ignore minPartitions parameter

## What changes were proposed in this pull request?
Fix the issue that minPartitions was not used in the method.
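For example (a hypothetical call, for illustration only):
```
// Before the fix, the requested minPartitions had no effect on the
// computed split size; with it, asking for more partitions can
// actually raise the partition count.
val rdd = sc.binaryFiles("/tmp/data", minPartitions = 50)
println(rdd.getNumPartitions)
```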

## How was this patch tested?
I have not provided an additional test, since the fix is very straightforward.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark 22357

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21638.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21638


commit b9eea4994c3ad151aa75ed03bbcf807bc3c4ded8
Author: Bo Meng 
Date:   2018-06-25T20:02:43Z

fix: SparkContext.binaryFiles ignore minPartitions parameter

commit 0fc35d4e0db34239cd3c52b0cf21445c59d2dede
Author: Bo Meng 
Date:   2018-06-25T20:04:58Z

should be max()




---




[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper

2017-10-30 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/19614
  
I will fix the style shortly. 


---




[GitHub] spark pull request #19614: update the location of reference paper

2017-10-30 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/19614

update the location of reference paper

## What changes were proposed in this pull request?
Update the URL of the reference paper.

## How was this patch tested?
It is a comment-only change, so nothing to test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark 22399

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19614.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19614


commit ddc97efed418698b81cce70e8cd0498e46dbcd88
Author: bomeng <bm...@us.ibm.com>
Date:   2017-10-30T22:31:05Z

update the location of reference paper




---




[GitHub] spark pull request #17470: [SPARK-20146][SQL] fix comment missing issue for ...

2017-03-29 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/17470

[SPARK-20146][SQL] fix comment missing issue for thrift server

## What changes were proposed in this pull request?

The column comment was missing when constructing the Hive TableSchema. This fix preserves the original comment.
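Roughly, the idea is to carry the column comment into Hive's FieldSchema instead of dropping it. A sketch under that assumption (a hypothetical helper, not the patch itself):
```
import org.apache.hadoop.hive.metastore.api.FieldSchema

// Keep the comment when mapping a column to Hive's schema type;
// a missing comment becomes null rather than being silently lost.
def toFieldSchema(name: String, dataType: String, comment: Option[String]): FieldSchema =
  new FieldSchema(name, dataType, comment.orNull)
```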

## How was this patch tested?

I have added a new test case covering columns with and without comments.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-20146

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17470.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17470


commit 69f2172e0c2e422aa88c1365c68786ab8abf1113
Author: bomeng <bm...@us.ibm.com>
Date:   2017-03-29T18:28:41Z

fix comment missing issue for thrift server




---



[GitHub] spark issue #13720: [SPARK-16004] [SQL] Correctly display "Last Access Time"...

2016-06-27 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/13720
  
@cloud-fan please review again, thanks.


---



[GitHub] spark issue #13720: [SPARK-16004] [SQL] improve the display of CatalogTable ...

2016-06-23 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/13720
  
OK, I will work on it based on the comments. Thanks.


---



[GitHub] spark issue #13720: [SPARK-16004] [SQL] improve the display of CatalogTable ...

2016-06-23 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/13720
  
@cloud-fan Is this one worth fixing?


---



[GitHub] spark issue #12739: [SPARK-14955] [SQL] avoid stride value equals to zero

2016-06-23 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/12739
  
Closing this PR, since it was fixed by another PR.


---



[GitHub] spark pull request #12739: [SPARK-14955] [SQL] avoid stride value equals to ...

2016-06-23 Thread bomeng
Github user bomeng closed the pull request at:

https://github.com/apache/spark/pull/12739


---



[GitHub] spark issue #13140: [SPARK-15230] [SQL] distinct() does not handle column na...

2016-06-22 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/13140
  
I do not know what happened to Jenkins; the failure looks unrelated.


---



[GitHub] spark issue #13140: [SPARK-15230] [SQL] distinct() does not handle column na...

2016-06-22 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/13140
  
@cloud-fan thanks for your concise code!


---



[GitHub] spark pull request #13720: [SPARK-16004] [SQL] improve the display of Catalo...

2016-06-21 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13720#discussion_r67958472
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -127,7 +127,7 @@ case class CatalogTable(
     sortColumnNames: Seq[String] = Seq.empty,
     bucketColumnNames: Seq[String] = Seq.empty,
     numBuckets: Int = -1,
-    owner: String = "",
+    owner: String = System.getProperty("user.name"),
--- End diff --

The user name would be complicated. It is set from the current SessionState authenticator, and I do not believe we have reached that point yet. I will revert this part.


---



[GitHub] spark pull request #13720: [SPARK-16004] [SQL] improve the display of Catalo...

2016-06-21 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13720#discussion_r67957269
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -180,7 +180,8 @@ case class CatalogTable(
       Seq(s"Table: ${identifier.quotedString}",
         if (owner.nonEmpty) s"Owner: $owner" else "",
         s"Created: ${new Date(createTime).toString}",
-        s"Last Access: ${new Date(lastAccessTime).toString}",
+        "Last Access: " +
+          (if (lastAccessTime == -1) "UNKNOWN" else new Date(lastAccessTime).toString),
--- End diff --

Here is the code from Hive (it uses 0 as the initial last access value):


---



[GitHub] spark pull request #13720: [SPARK-16004] [SQL] improve the display of Catalo...

2016-06-21 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13720#discussion_r67812926
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -127,7 +127,7 @@ case class CatalogTable(
     sortColumnNames: Seq[String] = Seq.empty,
     bucketColumnNames: Seq[String] = Seq.empty,
     numBuckets: Int = -1,
-    owner: String = "",
+    owner: String = System.getProperty("user.name"),
--- End diff --

Let me check what Hive does tomorrow and get back to you.


---



[GitHub] spark pull request #13791: [SPARK-16084] [SQL] Minor Javadoc update for "DES...

2016-06-20 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/13791

[SPARK-16084] [SQL] Minor Javadoc update for "DESCRIBE" table

## What changes were proposed in this pull request?

1. FORMATTED is actually supported, but partitions are not;
2. Remove the parentheses, as they are not necessary, consistent with usage elsewhere.

## How was this patch tested?

Minor issue. I do not think it needs a test case!

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-16084

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13791.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13791


commit 3638ffd0dbb93feb58c96b3163c52220aacf3981
Author: bomeng <bm...@us.ibm.com>
Date:   2016-06-20T22:40:50Z

minor comments fix

commit 5db284dab1aaced5f86cc6bed3e23e42e2c79b74
Author: bomeng <bm...@us.ibm.com>
Date:   2016-06-20T22:43:53Z

Revert "minor comments fix"

This reverts commit 3638ffd0dbb93feb58c96b3163c52220aacf3981.

commit e1a5f5421f92dc3ef5d39a189bdd0017b7633662
Author: bomeng <bm...@us.ibm.com>
Date:   2016-06-20T22:45:55Z

fix java doc issue




---



[GitHub] spark issue #13720: [SPARK-16004] [SQL] improve the display of CatalogTable ...

2016-06-20 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/13720
  
@srowen please review. thanks!


---



[GitHub] spark pull request #13720: [SPARK-16004] [SQL] improve the display of Catalog...

2016-06-16 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/13720

[SPARK-16004] [SQL] improve the display of CatalogTable information

## What changes were proposed in this pull request?

A few issues found when running the "describe extended | formatted [tableName]" command:

1. The last access time is incorrectly displayed as something like "Last Access Time: |Wed Dec 31 15:59:59 PST 1969"; I think we should display "UNKNOWN" as Hive does;

2. The owner is always empty, instead of the current login user who created the table;

3. The comment field displays "null" instead of an empty string when the comment is None.

## How was this patch tested?

Currently, I have manually tested them. They are very straightforward to test, but hard to write test cases for.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-16004

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13720.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13720


commit 358ac0d2e9b27bcf7c3d0448555497b60fc20dd5
Author: bomeng <bm...@us.ibm.com>
Date:   2016-06-16T23:21:39Z

improve the display of CatalogTable information




---



[GitHub] spark issue #12739: [SPARK-14955] [SQL] avoid stride value equals to zero

2016-06-16 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/12739
  
@andrewor14 Hey Andrew, could you please review this one as well? 


---



[GitHub] spark issue #13695: [SPARK-15978] [SQL] improve 'show tables' command relate...

2016-06-16 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/13695
  
Thanks for merging!


---



[GitHub] spark issue #13695: [SPARK-15978] [SQL] improve 'show tables' command relate...

2016-06-16 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/13695
  
@rxin could you please review it again? 


---



[GitHub] spark pull request #13695: [SPARK-15978] [SQL] remove unnecessary format

2016-06-15 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/13695

[SPARK-15978] [SQL] remove unnecessary format

## What changes were proposed in this pull request?

I've found some minor issues in the "show tables" command:

1. In SessionCatalog.scala, the listTables(db: String) method calls listTables(formatDatabaseName(db), "*") to list all the tables for a given db, but in the method listTables(db: String, pattern: String) the db name is formatted once more. So I think we should remove formatDatabaseName() from the caller, as sketched below.

2. I suggest adding sorting to listTables(db: String) in InMemoryCatalog.scala, just as listDatabases() does.
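A minimal sketch of the duplication in issue 1 (hypothetical stand-ins, not the verbatim SessionCatalog code):
```
def formatDatabaseName(db: String): String = db.toLowerCase

def listTables(db: String): Seq[String] =
  listTables(formatDatabaseName(db), "*")  // formats the name once here...

def listTables(db: String, pattern: String): Seq[String] = {
  val dbName = formatDatabaseName(db)      // ...and once more here
  Seq.empty                                // actual lookup elided
}
```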


## How was this patch tested?

The existing test cases should cover it.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-15978

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13695.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13695


commit 2bbc919105e20a9c766f156e80fad18052395215
Author: bomeng <bm...@us.ibm.com>
Date:   2016-06-15T23:04:06Z

remove unnecessary format




---



[GitHub] spark issue #13671: [SPARK-15952] [SQL] fix "show databases" ordering issue

2016-06-14 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/13671
  
Thanks for merging!


---



[GitHub] spark issue #13671: [SPARK-15952] [SQL] fix "show databases" ordering issue

2016-06-14 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/13671
  
For issue 1, I have updated the existing test case to cover this (the original one only tested the count of the result). For issue 2, it is minor and just a text change.


---



[GitHub] spark pull request #13671: [SPARK-15952] [SQL] fix "show databases" ordering...

2016-06-14 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/13671

[SPARK-15952] [SQL] fix "show databases" ordering issue

## What changes were proposed in this pull request?

Two issues I've found with the "show databases" command:

1. The returned database name list is not sorted; it is only sorted when "like" is used together with it. (Hive always returns a sorted list.)

2. When it is used as sql("show databases").show, it outputs a table with a column named "result", but sql("show tables").show outputs the column name "tableName", so I think we should be consistent and at least use "databaseName", as illustrated below.

## How was this patch tested?

Updated the existing test case to check the ordering as well.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-15952

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13671.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13671


commit d6b0f860352cf9e4a71e746c7f9bd035e9e243e5
Author: bomeng <bm...@us.ibm.com>
Date:   2016-06-14T21:29:57Z

fix the ordering issue




---



[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-12 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/13543
  
@srowen Thanks for merging.


---



[GitHub] spark issue #13533: [SPARK-15781] [Documentation] remove deprecated environm...

2016-06-12 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/13533
  
@srowen Thanks for merging.


---



[GitHub] spark pull request #13543: [SPARK-15806] [Documentation] update doc for SPAR...

2016-06-11 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13543#discussion_r66701686
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala ---
@@ -20,18 +20,24 @@ package org.apache.spark.deploy.master
 import scala.annotation.tailrec
 
 import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
 import org.apache.spark.util.{IntParam, Utils}
 
 /**
  * Command-line parser for the master.
  */
-private[master] class MasterArguments(args: Array[String], conf: SparkConf) {
+private[master] class MasterArguments(args: Array[String], conf: SparkConf) extends Logging {
   var host = Utils.localHostName()
   var port = 7077
   var webUiPort = 8080
   var propertiesFile: String = null
 
   // Check for settings in environment variables
+  if (System.getenv("SPARK_MASTER_IP") != null) {
+    logWarning("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST")
+    host = System.getenv("SPARK_MASTER_IP")
+  }
+
   if (System.getenv("SPARK_MASTER_HOST") != null) {
--- End diff --

Master.scala creates an instance of MasterArguments (line 1008), and MasterArguments reads the environment for its initial values (including SPARK_MASTER_HOST); that is the original logic. A user may not pass in --host and may just use SPARK_MASTER_HOST as the value.


---



[GitHub] spark pull request #13543: [SPARK-15806] [Documentation] update doc for SPAR...

2016-06-10 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13543#discussion_r66647339
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala ---
@@ -20,18 +20,24 @@ package org.apache.spark.deploy.master
 import scala.annotation.tailrec
 
 import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
 import org.apache.spark.util.{IntParam, Utils}
 
 /**
  * Command-line parser for the master.
  */
-private[master] class MasterArguments(args: Array[String], conf: SparkConf) {
+private[master] class MasterArguments(args: Array[String], conf: SparkConf) extends Logging {
   var host = Utils.localHostName()
   var port = 7077
   var webUiPort = 8080
   var propertiesFile: String = null
 
   // Check for settings in environment variables
+  if (System.getenv("SPARK_MASTER_IP") != null) {
+    logWarning("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST")
+    host = System.getenv("SPARK_MASTER_IP")
+  }
+
   if (System.getenv("SPARK_MASTER_HOST") != null) {
--- End diff --

As I found before, MasterArguments.scala is currently used by Master.scala, so I think we need to keep SPARK_MASTER_HOST for now. Please let me know how we should proceed with this one.


---



[GitHub] spark pull request #13543: [SPARK-15806] [Documentation] update doc for SPAR...

2016-06-09 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13543#discussion_r66493098
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala ---
@@ -20,18 +20,24 @@ package org.apache.spark.deploy.master
 import scala.annotation.tailrec
 
 import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
 import org.apache.spark.util.{IntParam, Utils}
 
 /**
  * Command-line parser for the master.
  */
-private[master] class MasterArguments(args: Array[String], conf: SparkConf) {
+private[master] class MasterArguments(args: Array[String], conf: SparkConf) extends Logging {
   var host = Utils.localHostName()
   var port = 7077
   var webUiPort = 8080
   var propertiesFile: String = null
 
   // Check for settings in environment variables
+  if (System.getenv("SPARK_MASTER_IP") != null) {
+    logWarning("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST")
+    host = System.getenv("SPARK_MASTER_IP")
+  }
+
   if (System.getenv("SPARK_MASTER_HOST") != null) {
--- End diff --

MasterArguments.scala is used by Master.scala's main() method, so there is a way to use `SPARK_MASTER_HOST`.


---



[GitHub] spark pull request #13543: [SPARK-15806] [Documentation] update doc for SPAR...

2016-06-09 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13543#discussion_r66488409
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala ---
@@ -20,18 +20,24 @@ package org.apache.spark.deploy.master
 import scala.annotation.tailrec
 
 import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
 import org.apache.spark.util.{IntParam, Utils}
 
 /**
  * Command-line parser for the master.
  */
-private[master] class MasterArguments(args: Array[String], conf: SparkConf) {
+private[master] class MasterArguments(args: Array[String], conf: SparkConf) extends Logging {
   var host = Utils.localHostName()
   var port = 7077
   var webUiPort = 8080
   var propertiesFile: String = null
 
   // Check for settings in environment variables
+  if (System.getenv("SPARK_MASTER_IP") != null) {
+    logWarning("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST")
+    host = System.getenv("SPARK_MASTER_IP")
+  }
+
   if (System.getenv("SPARK_MASTER_HOST") != null) {
--- End diff --

The code here just sets the initial values, which may be overridden by the "--host" argument. I think we should keep it there for now. For the warning message, we generally use the logger; I am not sure it is a good idea to put it into the script. I am open to your decision.


---



[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/13543
  
Yes, I can add a warning if SPARK_MASTER_IP is set. Ideally we should use SPARK_MASTER_HOST everywhere to avoid confusion.


---



[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/13543
  
Here is the link:  
[MasterArguments.scala](https://github.com/bomeng/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala#L56-L59)


---



[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/13543
  
Please note that there are also some places still using SPARK_MASTER_IP, for example start-master.sh. I did not replace those, because doing so may break currently running scripts.


---



[GitHub] spark pull request #13543: [SPARK-15806] [Documentation] update doc for SPAR...

2016-06-07 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/13543

[SPARK-15806] [Documentation] update doc for SPARK_MASTER_IP

## What changes were proposed in this pull request?

SPARK_MASTER_IP is a deprecated environment variable. It is replaced by 
SPARK_MASTER_HOST according to MasterArguments.scala.

## How was this patch tested?

Manually verified.





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-15806

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13543.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13543


commit 239cdfc08e5ad28864574f9ddbcf8240dd5a51ff
Author: bomeng <bm...@us.ibm.com>
Date:   2016-06-07T16:03:19Z

update doc




---



[GitHub] spark issue #13533: [SPARK-15781] [Documentation] remove deprecated environm...

2016-06-06 Thread bomeng
Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/13533
  
That could be another JIRA, as we do not want to use one JIRA to fix all issues. Please file one if desired.


---



[GitHub] spark pull request #13533: [SPARK-15781] [Documentation] remove deprecated ...

2016-06-06 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/13533

[SPARK-15781] [Documentation] remove deprecated environment variable doc

## What changes were proposed in this pull request?

Like `SPARK_JAVA_OPTS` and `SPARK_CLASSPATH`, we will remove the documentation for `SPARK_WORKER_INSTANCES` to discourage users from using it. If it is actually used, SparkConf will show a warning message as before.

## How was this patch tested?

Manually tested.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-15781

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13533.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13533


commit accc6f708059944d0a58c695cfe9f29501a77d0a
Author: bomeng <bm...@us.ibm.com>
Date:   2016-06-06T20:57:40Z

update doc




---



[GitHub] spark pull request #13475: [SPARK-15737] [CORE] fix jetty warning

2016-06-02 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/13475

[SPARK-15737] [CORE] fix jetty warning

## What changes were proposed in this pull request?

After upgrading Jetty to 9.2, we always see "WARN org.eclipse.jetty.server.handler.AbstractHandler: No Server set for org.eclipse.jetty.server.handler.ErrorHandler" while running test cases.

This PR fixes it.
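The warning means an ErrorHandler was created without being attached to a Server. A minimal sketch of the usual remedy (hand-wired Jetty for illustration, not the exact change in this PR):
```
import org.eclipse.jetty.server.Server
import org.eclipse.jetty.server.handler.ErrorHandler

val server = new Server(8080)
val errorHandler = new ErrorHandler()
// Attaching the handler to its Server silences
// "No Server set for ...ErrorHandler".
errorHandler.setServer(server)
server.addBean(errorHandler)
```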

## How was this patch tested?

The existing test cases will cover it.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-15737

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13475.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13475


commit 03707a2f3fbebbeec68bb4adbbe4b026d3ef9a69
Author: bomeng <bm...@us.ibm.com>
Date:   2016-06-02T21:27:39Z

fix jetty warning




---



[GitHub] spark pull request #13141: [SPARK-14752] [SQL] fix kryo ordering serializati...

2016-06-02 Thread bomeng
Github user bomeng closed the pull request at:

https://github.com/apache/spark/pull/13141


---



[GitHub] spark pull request: [SPARK-15537] [SQL] fix dir delete issue

2016-05-25 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13304#discussion_r64665909
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcSourceSuite.scala ---
@@ -38,12 +39,12 @@ abstract class OrcSuite extends QueryTest with TestHiveSingleton with BeforeAndA
     super.beforeAll()
 
     orcTableAsDir = File.createTempFile("orctests", "sparksql")
-    orcTableAsDir.delete()
+    Utils.deleteRecursively(orcTableAsDir)
--- End diff --

Thanks for the comments. I will update the code shortly.


---



[GitHub] spark pull request: [SPARK-15537] [SQL] fix dir delete issue

2016-05-25 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/13304

[SPARK-15537] [SQL] fix dir delete issue

## What changes were proposed in this pull request?

Some of the test cases, e.g. OrcSourceSuite, create temp folders with temp files inside them, but the folders are not removed after the tests finish. This leaves a lot of temp files behind and occupies disk space if we keep running the test cases.

The reason is that dir.delete() does not work if the directory is not empty; we need to recursively delete the contents before deleting the folder, as sketched below.
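A minimal illustration of the fix (plain JDK code; Spark's own helper is Utils.deleteRecursively, which the patch switches to):
```
import java.io.File

// delete() on a non-empty directory silently returns false,
// so the children must be removed first.
def deleteRecursively(f: File): Unit = {
  Option(f.listFiles()).toSeq.flatten.foreach(deleteRecursively)
  f.delete()
}
```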

## How was this patch tested?

Manually checked the temp folder to make sure the temp files were deleted.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-15537

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13304.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13304


commit 878b0cca355b21e84f08e8fc32f195485f1df14a
Author: Bo Meng <men...@hotmail.com>
Date:   2016-05-25T21:35:28Z

fix dir delete issue




---



[GitHub] spark pull request: [SPARK-15468] [SQL] fix some typos

2016-05-21 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13246#discussion_r64142270
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
 ---
@@ -227,8 +227,8 @@ object IntegerIndex {
  *  - Unnamed grouping expressions are named so that they can be referred 
to across phases of
  *aggregation
  *  - Aggregations that appear multiple times are deduplicated.
- *  - The compution of the aggregations themselves is separated from the 
final result. For example,
- *the `count` in `count + 1` will be split into an 
[[AggregateExpression]] and a final
+ *  - The computation of the aggregations themselves is separated from the 
final result. For
+ *example, the `count` in `count + 1` will be split into an 
[[AggregateExpression]] and a final
--- End diff --

This is just needed for the 100-char line limit, same as the fix on the previous line.





[GitHub] spark pull request: [SPARK-15468] [SQL] fix some typos

2016-05-21 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/13246

[SPARK-15468] [SQL] fix some typos

## What changes were proposed in this pull request?

Fix some typos found while browsing the code.

## How was this patch tested?

No tests added; the changes are trivial and obvious.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark typo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13246.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13246


commit ff73a8ddc036e1d8edf7eaa3be2e39db4b17d67f
Author: bomeng <bm...@us.ibm.com>
Date:   2016-05-19T01:32:27Z

fix typo

commit 6b05bc95623483f96757a917508fc3737b20bc90
Author: Bo Meng <men...@hotmail.com>
Date:   2016-05-20T18:48:17Z

Merge remote-tracking branch 'upstream/master' into typo

commit 3a5797544792557a6a143784277753f4d93dd031
Author: Bo Meng <men...@hotmail.com>
Date:   2016-05-21T22:32:12Z

Merge remote-tracking branch 'upstream/master' into typo







[GitHub] spark pull request: [SPARK-14752][SQL] LazilyGenerateOrdering thro...

2016-05-16 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12661#issuecomment-219582450
  
Since this one has been here for more than 10 days, I've provided another approach with a new test case. Please take a look. Thanks.
[PR for SPARK-14752](https://github.com/apache/spark/pull/13141)





[GitHub] spark pull request: [SPARK-14752] [SQL] fix kryo ordering serializ...

2016-05-16 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/13141

[SPARK-14752] [SQL] fix kryo ordering serialization

## What changes were proposed in this pull request?

When using Kryo as the serializer, we get a `NullPointerException` for queries with `ORDER BY`.
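
As a hedged repro sketch (the setup and names are illustrative, not the exact test from this PR; it assumes a 2016-era local Spark):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Enable Kryo, then run a query with ORDER BY; per the description above,
// this threw a NullPointerException before the fix.
val conf = new SparkConf()
  .setMaster("local")
  .setAppName("kryo-orderby-repro")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

Seq(3, 1, 2).toDF("id").registerTempTable("t")
sqlContext.sql("SELECT id FROM t ORDER BY id").collect()
```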

## How was this patch tested?

I've added a new test case to HashedRelationSuite.scala, since this issue is related to SPARK-14521.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-14752

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13141.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13141


commit 66f0e6c352bae9e65eadada19b1cfead8b06b3aa
Author: bomeng <bm...@us.ibm.com>
Date:   2016-05-16T23:42:22Z

fix kryo serialization







[GitHub] spark pull request: [SPARK-15230] [SQL] distinct() does not handle...

2016-05-16 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/13140

[SPARK-15230] [SQL] distinct() does not handle column name with dot properly

## What changes were proposed in this pull request?

When a table is created with a column name containing a dot, distinct() fails to run. For example,
```scala
val rowRDD = sparkContext.parallelize(Seq(Row(1), Row(1), Row(2)))
val schema = StructType(Array(StructField("column.with.dot", IntegerType, 
nullable = false)))
val df = spark.createDataFrame(rowRDD, schema)
```
Running the following works without problems:
```scala
df.select(new Column("`column.with.dot`"))
```
but running the query with an additional distinct() raises an exception:
```scala
df.select(new Column("`column.with.dot`")).distinct()
```

The issue is that distinct() tries to resolve the column name, but the column name taken from the schema is not wrapped in backticks. The solution is to add the backticks before passing the column name to resolve().
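
A minimal sketch of the quoting step (the helper name is hypothetical; the real fix lives in the resolution path):

```scala
// Wrap the raw schema column name in backticks, escaping any embedded
// backticks, so "column.with.dot" resolves as one column rather than a
// nested field path.
def quoteIfNeeded(name: String): String = s"`${name.replace("`", "``")}`"

quoteIfNeeded("column.with.dot")   // returns "`column.with.dot`"
```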

## How was this patch tested?

Added a new test case.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-15230

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13140.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13140


commit 2f7ffbd58a3437898f32e7603ca6b603f5fd5088
Author: bomeng <bm...@us.ibm.com>
Date:   2016-05-16T20:37:54Z

fix distinct()







[GitHub] spark pull request: [SPARK-14897] [SQL] upgrade to jetty 9.2.16

2016-05-12 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12916#discussion_r63060249
  
--- Diff: yarn/pom.xml ---
@@ -102,6 +102,10 @@
       <groupId>org.eclipse.jetty</groupId>
       <artifactId>jetty-servlet</artifactId>
     </dependency>
+    <dependency>
+      <groupId>org.eclipse.jetty</groupId>
+      <artifactId>jetty-servlets</artifactId>
+    </dependency>
--- End diff --

Yes, we need both; they have different contents.





[GitHub] spark pull request: [SPARK-14897] [SQL] upgrade to jetty 9.2.16

2016-05-12 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12916#discussion_r63003988
  
--- Diff: core/pom.xml ---
@@ -125,12 +125,17 @@
       <artifactId>jetty-servlet</artifactId>
       <scope>compile</scope>
     </dependency>
+    <dependency>
+      <groupId>org.eclipse.jetty</groupId>
+      <artifactId>jetty-servlets</artifactId>
+      <scope>compile</scope>
+    </dependency>
 

[GitHub] spark pull request: [SPARK-14897] [SQL] upgrade to jetty 9.2.16

2016-05-12 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12916#issuecomment-218668164
  
@srowen Sorry for the late reply; I did not notice it. I have run `mvn dependency:tree` and only javax.servlet-api 3.1.0 is listed, so it should be fine.





[GitHub] spark pull request: [SPARK-14897] [SQL] upgrade to jetty 9.2.16

2016-05-12 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12916#discussion_r62968602
  
--- Diff: core/pom.xml ---
@@ -125,12 +125,17 @@
       <artifactId>jetty-servlet</artifactId>
       <scope>compile</scope>
     </dependency>
+    <dependency>
+      <groupId>org.eclipse.jetty</groupId>
+      <artifactId>jetty-servlets</artifactId>
+      <scope>compile</scope>
+    </dependency>
 

[GitHub] spark pull request: [SPARK-14897] [SQL] upgrade to jetty 9.2.16

2016-05-10 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12916#issuecomment-218340455
  
@srowen Finally I've got it working. Servlet and Derby were upgraded as well due to Jetty's requirements. Please review.





[GitHub] spark pull request: [SPARK-14897] [SQL] [WIP] upgrade to jetty 9.2...

2016-05-10 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12916#issuecomment-218320176
  
retest please





[GitHub] spark pull request: [SPARK-14897] [SQL] [WIP] upgrade to jetty 9.2...

2016-05-05 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12916#issuecomment-217338702
  
The test failures were caused by timeouts in HiveThriftHttpServerSuite and SingleSessionSuite... I have not figured out the cause yet; any suggestions are welcome.





[GitHub] spark pull request: [SPARK-14897] [SQL] upgrade to jetty 9.2.16

2016-05-04 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/12916

[SPARK-14897] [SQL] upgrade to jetty 9.2.16

## What changes were proposed in this pull request?

Since Jetty 8 is EOL (end of life) and has a critical security issue [http://www.securityweek.com/critical-vulnerability-found-jetty-web-server], I think upgrading to 9 is necessary. I am using the latest 9.2 since 9.3 requires Java 8+.

## How was this patch tested?

Manual testing was done, and the current test cases should cover it.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-14897

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12916.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12916


commit adba870ed702d4bd53292f240785fbd86484bb9d
Author: bomeng <bm...@us.ibm.com>
Date:   2016-05-04T23:55:54Z

upgrade to jetty 9.2.16







[GitHub] spark pull request: [SPARK-15062] [SQL] fix list type infer serial...

2016-05-02 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12849#issuecomment-216384799
  
Making the changes based on the comments; will post them shortly. List[_] should be supported just like Seq[_]; for now, you can use Seq[_] as a workaround.





[GitHub] spark pull request: [SPARK-15062] [SQL] fix list type infer serial...

2016-05-02 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/12849

[SPARK-15062] [SQL] fix list type infer serializer issue

## What changes were proposed in this pull request?

Make the serializer correctly inferred when the input type is List[_]. Since List[_] is a subtype of Seq[_], it should match the Seq case, but it was previously matched to a different case (`case t if definedByConstructorParams(t)`).
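
A hedged illustration of the symptom (the case class and values are made up; assumes a 2016-era local Spark):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Rec(xs: List[Int])

val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("list-ds"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Before the fix, List[Int] fell into the constructor-params case instead of
// the Seq[_] case, so the wrong serializer was inferred for the xs field.
val ds = Seq(Rec(List(1, 2, 3))).toDS()
ds.collect()   // should round-trip the List correctly after the fix
```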

## How was this patch tested?

A new test case was added.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-15062

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12849.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12849


commit 5869b95b41e27b90a8bc64d774c93966659f9226
Author: bomeng <bm...@us.ibm.com>
Date:   2016-05-02T21:08:08Z

fix list type infer serializer issue







[GitHub] spark pull request: [SPARK-14955] [SQL] avoid stride value equals ...

2016-04-28 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12739#issuecomment-215450711
  
@srowen Please review again. Thanks.





[GitHub] spark pull request: [SPARK-14955] [SQL] avoid stride value equals ...

2016-04-27 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12739#discussion_r61342768
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
 ---
@@ -54,15 +54,22 @@ private[sql] object JDBCRelation {
   def columnPartition(partitioning: JDBCPartitioningInfo): 
Array[Partition] = {
 if (partitioning == null) return Array[Partition](JDBCPartition(null, 
0))
 
+// make sure the input is valid
+val lower = partitioning.lowerBound
+val upper = partitioning.upperBound
 val numPartitions = partitioning.numPartitions
 val column = partitioning.column
+require(lower < upper, "lower bound must be less than upper bound")
+require(numPartitions > 0, "number of partitions must be greater than zero")
+
 if (numPartitions == 1) return Array[Partition](JDBCPartition(null, 0))
-// Overflow and silliness can happen if you subtract then divide.
-// Here we get a little roundoff, but that's (hopefully) OK.
-val stride: Long = (partitioning.upperBound / numPartitions
-  - partitioning.lowerBound / numPartitions)
+
+val stride: Long = {
--- End diff --

Cool, I will update it shortly.





[GitHub] spark pull request: [SPARK-14955] [SQL] avoid stride value equals ...

2016-04-27 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/12739

[SPARK-14955] [SQL] avoid stride value equals to zero

## What changes were proposed in this pull request?

In the columnPartition() method of JDBCRelation, the stride is used for calculating the increment. In some cases this value can be zero; for example, with lowerBound=0, upperBound=7, and numPartitions=8, the stride is zero, which puts all the data into one partition (the last one).
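
A worked example of how the overflow-safe formula from the diff yields a zero stride:

```scala
// Reproducing the arithmetic from the description: integer division makes
// both terms zero, so the stride collapses to zero.
val lowerBound = 0L
val upperBound = 7L
val numPartitions = 8
val stride = upperBound / numPartitions - lowerBound / numPartitions
// stride == 0L: every partition boundary is the same value, so all rows
// fall into the last partition.
```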

This fix makes the stride calculation more robust. I have also added require() calls to validate the input, and overridden the parent's equals() method together with hashCode().

I have also fixed some text style, making keywords all uppercase.

## How was this patch tested?

New test cases were added to JDBCSuite.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-14955

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12739.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12739


commit b4df4b0626a4bab197eb249fea99283ba4afd293
Author: bomeng <bm...@us.ibm.com>
Date:   2016-04-27T18:31:28Z

fix stride







[GitHub] spark pull request: [SPARK-14806] [SQL] Alias original Hive option...

2016-04-26 Thread bomeng
Github user bomeng closed the pull request at:

https://github.com/apache/spark/pull/12607





[GitHub] spark pull request: [SPARK-14806] [SQL] Alias original Hive option...

2016-04-26 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12607#issuecomment-214958322
  
closing it. thanks.





[GitHub] spark pull request: [SPARK-14928] [SQL] support substitution in SE...

2016-04-26 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12709#issuecomment-214906151
  
Yes, I missed that. The parser already handles it.





[GitHub] spark pull request: [SPARK-14928] [SQL] support substitution in SE...

2016-04-26 Thread bomeng
Github user bomeng closed the pull request at:

https://github.com/apache/spark/pull/12709





[GitHub] spark pull request: [SPARK-14928] [SQL] support substitution in SE...

2016-04-26 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/12709

[SPARK-14928] [SQL] support substitution in SET key=value

## What changes were proposed in this pull request?

In the `SET key=value` command, the value can reference a variable and be replaced via substitution. Since `VARIABLE_SUBSTITUTE_ENABLED` and `VARIABLE_SUBSTITUTE_DEPTH` are already defined in SQLConf, it makes sense to honor them in the SET command.
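
A hedged usage sketch (assuming a sqlContext in scope and Hive-style `${...}` substitution syntax; the exact syntax is an assumption, not from this thread):

```scala
// With substitution enabled, the value side of SET can reference another
// variable and is expanded before being stored.
sqlContext.sql("SET spark.sql.variable.substitute=true")
sqlContext.sql("SET myvar=hello")
sqlContext.sql("SET greeting=${myvar}_world")   // stored as "hello_world"
```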

## How was this patch tested?

A new test case was added to test this function.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-14928

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12709.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12709


commit 98f148dc902713df747915ba32703ac6a262226c
Author: bomeng <bm...@us.ibm.com>
Date:   2016-04-26T20:03:36Z

support substitution







[GitHub] spark pull request: [SPARK-14806] [SQL] Alias original Hive option...

2016-04-26 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12607#issuecomment-214818178
  
@rxin I am open to your decision. I think it is still useful to let users drive the "SET" command via the spark.sql.variable.substitute configuration; currently, the "SET" command does not support that. Part of my code fixes this; do you think that part is still valid?





[GitHub] spark pull request: [SPARK-14441] [SQL] Consolidate DDL tests

2016-04-25 Thread bomeng
Github user bomeng closed the pull request at:

https://github.com/apache/spark/pull/12347





[GitHub] spark pull request: [SPARK-14441] [SQL] Consolidate DDL tests

2016-04-25 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12347#issuecomment-214518666
  
closing this PR. thanks.





[GitHub] spark pull request: [SPARK-14806] [SQL] Alias original Hive option...

2016-04-22 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12607#issuecomment-213614415
  
I think you mean setting the value of `spark.sql.variable.substitute` and reading `spark.sql.variable.substitute` as described above. I will post another try shortly.





[GitHub] spark pull request: [SPARK-14806] [SQL] support substitution in se...

2016-04-22 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12607#issuecomment-213533820
  
@rxin Just want to confirm: you want to let users run `SET hive.variable.substitute=true/false` in SQL? It will log a warning in the `setConfWithCheck()` method, and I just convert it there?





[GitHub] spark pull request: [SPARK-14806] [SQL] support substitution in se...

2016-04-22 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/12607

[SPARK-14806] [SQL] support substitution in set command

## What changes were proposed in this pull request?

Since we have spark.sql.variable.substitute as an alias of 
hive.variable.substitute, we will use it for the `SET` command.

## How was this patch tested?
A test was added to the existing suite to cover this new feature.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-14806

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12607.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12607


commit 982787607df966d98885eee56b636a9a9f9b208f
Author: bomeng <bm...@us.ibm.com>
Date:   2016-04-22T10:20:51Z

support substitution in set command

commit f306301148b957aeb1b48306bd06b4a65bdcc0b8
Author: bomeng <bm...@us.ibm.com>
Date:   2016-04-22T10:28:49Z

code improvement







[GitHub] spark pull request: [SPARK-14819] [SQL] Improve SET / SET -v comma...

2016-04-21 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12583#discussion_r60685224
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -716,4 +716,8 @@ class DDLSuite extends QueryTest with SharedSQLContext 
with BeforeAndAfterEach {
 }
   }
 
+  test("set / set -v") {
+checkExistence(sql("set"), true, "env:", "system:")
--- End diff --

Found out that the existing SQLQuerySuite and HiveQuerySuite test cases already have extensive coverage of the SET command.





[GitHub] spark pull request: [SPARK-14819] [SQL] Improve SET / SET -v comma...

2016-04-21 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12583#discussion_r60669363
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -716,4 +716,8 @@ class DDLSuite extends QueryTest with SharedSQLContext 
with BeforeAndAfterEach {
 }
   }
 
+  test("set / set -v") {
+checkExistence(sql("set"), true, "env:", "system:")
--- End diff --

SPARK_TESTING is in neither sys.env nor sys.props while the test cases run. Any suggestions on how to add test cases for that?





[GitHub] spark pull request: [SPARK-14819] [SQL] Improve SET / SET -v comma...

2016-04-21 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/12583

[SPARK-14819] [SQL] Improve SET / SET -v command

## What changes were proposed in this pull request?

Currently the `SET` and `SET -v` commands are similar to Hive's `SET` command, except for the following differences:
1. The result is not sorted;
2. In addition to the Hive-related properties, `SET` and `SET -v` also list all the system properties and environment properties, which is very useful in some cases.

This JIRA tries to make the current `SET` command's output more consistent with Hive's.
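
A hedged usage sketch of the goal described above (assuming a sqlContext in scope):

```scala
// After this change, SET should return sorted Hive/Spark properties, and
// SET -v should additionally include each property's description.
sqlContext.sql("SET").show(truncate = false)
sqlContext.sql("SET -v").show(truncate = false)
```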

## How was this patch tested?

A new test case was added to test the output of SET and SET -v.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-14819

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12583.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12583


commit 01d3e5a545eeced71b82d26a8407ea5e1d8f49ab
Author: bomeng <bm...@us.ibm.com>
Date:   2016-04-21T20:17:53Z

improve SET / SET -v command







[GitHub] spark pull request: [SPARK-14541] [SQL] SQL function: IFNULL, NULL...

2016-04-19 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12373#issuecomment-212169908
  
@rxin Could you please take a look if you get a chance? Thanks.





[GitHub] spark pull request: [SPARK-14398] [SQL] Audit non-reserved keyword...

2016-04-18 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12191#issuecomment-211543283
  
Yes, the reason for sorting the keywords is ease of searching. I have checked the generated code and see the switch/case for each non-reserved word. To my understanding, though, `case A: case B: ...` has no performance difference from `case A | B: ...`; this should be easily optimized by the compiler.





[GitHub] spark pull request: [SPARK-14541] [SQL] SQL function: IFNULL, NULL...

2016-04-15 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12373#discussion_r59928147
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
 ---
@@ -128,6 +128,143 @@ case class IsNaN(child: Expression) extends 
UnaryExpression
 }
 
 /**
+ * An Expression accepts two parameters and returns null if both 
parameters are equal.
+ * If they are not equal, the first parameter value is returned.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a,b) - Returns null if a equals b, or a otherwise.")
+case class NullIf(left: Expression, right: Expression) extends 
BinaryExpression {
+  override def nullable: Boolean = true
+  override def dataType: DataType = left.dataType
+
+  override def eval(input: InternalRow): Any = {
+val valueLeft = left.eval(input)
+val valueRight = right.eval(input)
+if (valueLeft.equals(valueRight)) {
+  null
+} else {
+  valueLeft
+}
+  }
+
+  override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
+val leftGen = left.gen(ctx)
+val rightGen = right.gen(ctx)
+s"""
+  ${leftGen.code}
+  ${rightGen.code}
+  boolean ${ev.isNull} = false;
+  ${ctx.javaType(dataType)} ${ev.value} = 
${ctx.defaultValue(dataType)};
+  if (${ctx.genEqual(dataType, leftGen.value, rightGen.value)}) {
+${ev.isNull} = true;
+  } else {
+${ev.value} = ${leftGen.value};
+  }
+"""
+  }
+}
+
+/**
+ * An Expression accepts two parameters and returns the second parameter 
if the value
+ * in the first parameter is null; if the first parameter is any value 
other than null,
+ * it is returned unchanged.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a,b) - Returns b if a is null, or a otherwise.")
+case class Nvl(left: Expression, right: Expression) extends 
BinaryExpression {
--- End diff --

Did not notice this. I will do it shortly.





[GitHub] spark pull request: [SPARK-14541] [SQL] SQL function: IFNULL, NULL...

2016-04-15 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12373#issuecomment-210605312
  
I have revisited the code and made it more robust. It is heavily tested against different data types by introducing testAllTypes2Values() with two different values. The PR description was updated and the Javadoc was fixed. Please leave comments!





[GitHub] spark pull request: [SPARK-14541] [SQL] SQL function: IFNULL, NULL...

2016-04-15 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12373#discussion_r59904696
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
 ---
@@ -128,6 +128,143 @@ case class IsNaN(child: Expression) extends 
UnaryExpression
 }
 
 /**
+ * An Expression accepts two parameters and returns null if both 
parameters are equal.
+ * If they are not equal, the first parameter value is returned.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a,b) - Returns null if a equals b, or a otherwise.")
+case class NullIf(left: Expression, right: Expression) extends 
BinaryExpression {
+  override def nullable: Boolean = true
+  override def dataType: DataType = left.dataType
+
+  override def eval(input: InternalRow): Any = {
+val valueLeft = left.eval(input)
+val valueRight = right.eval(input)
+if (valueLeft.equals(valueRight)) {
+  null
+} else {
+  valueLeft
+}
+  }
+
+  override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
+val leftGen = left.gen(ctx)
+val rightGen = right.gen(ctx)
+s"""
+  ${leftGen.code}
+  ${rightGen.code}
+  boolean ${ev.isNull} = false;
+  ${ctx.javaType(dataType)} ${ev.value} = 
${ctx.defaultValue(dataType)};
+  if (${ctx.genEqual(dataType, leftGen.value, rightGen.value)}) {
+${ev.isNull} = true;
+  } else {
+${ev.value} = ${leftGen.value};
+  }
+"""
+  }
+}
+
+/**
+ * An Expression accepts two parameters and returns the second parameter 
if the value
+ * in the first parameter is null; if the first parameter is any value 
other than null,
+ * it is returned unchanged.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a,b) - Returns b if a is null, or a otherwise.")
+case class Nvl(left: Expression, right: Expression) extends 
BinaryExpression {
--- End diff --

I will say, yes, kind of. Here is what I found: 
[difference](http://stackoverflow.com/questions/950084/oracle-differences-between-nvl-and-coalesce)
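
For readers not following the link, a hedged summary of the difference (Oracle semantics from the linked answer; Spark's behavior may differ, and this assumes the functions from this PR are registered and a sqlContext is in scope):

```scala
// NVL(a, b) takes exactly two arguments and evaluates both, while
// COALESCE(a, b, ...) is n-ary and stops at the first non-null argument.
sqlContext.sql("SELECT nvl(col, 0) FROM t")              // two-argument form
sqlContext.sql("SELECT coalesce(col, other, 0) FROM t")  // variadic form
```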





[GitHub] spark pull request: [SPARK-14541] [SQL] SQL function: IFNULL, NULL...

2016-04-15 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12373#issuecomment-210300785
  
I will address these issues tomorrow! Thank you all!





[GitHub] spark pull request: [SPARK-14460] [SQL] properly handling of colum...

2016-04-14 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12252#discussion_r59824077
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -246,13 +247,23 @@ object JdbcUtils extends Logging {
   }
 
   /**
+   * The utility to add quotes to the column name based on the dialect
+   * @param dialect the JDBC dialect
+   * @param columnName the input column name
+   * @return the quoted column name
+   */
+  private def quoteColumnName(dialect: JdbcDialect, columnName: String): 
String = {
+dialect.quoteIdentifier(columnName)
+  }
+
+  /**
* Compute the schema string for this RDD.
*/
-  def schemaString(df: DataFrame, url: String): String = {
+  def schemaString(dialect: JdbcDialect, df: DataFrame, url: String): 
String = {
 val sb = new StringBuilder()
 val dialect = JdbcDialects.get(url)
--- End diff --

Thanks for pointing that out. I've modified the code. Please check it out.





[GitHub] spark pull request: [SPARK-14460] [SQL] properly handling of colum...

2016-04-14 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12252#discussion_r59819746
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -246,13 +247,23 @@ object JdbcUtils extends Logging {
   }
 
   /**
+   * The utility to add quotes to the column name based on the dialect
+   * @param dialect the JDBC dialect
+   * @param columnName the input column name
+   * @return the quoted column name
+   */
+  private def quoteColumnName(dialect: JdbcDialect, columnName: String): 
String = {
+dialect.quoteIdentifier(columnName)
+  }
+
+  /**
* Compute the schema string for this RDD.
*/
-  def schemaString(df: DataFrame, url: String): String = {
+  def schemaString(dialect: JdbcDialect, df: DataFrame, url: String): 
String = {
 val sb = new StringBuilder()
 val dialect = JdbcDialects.get(url)
--- End diff --

The purpose of passing in the dialect is to get the proper quoting for columns based on the data source. Any suggestions?
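
A hedged illustration of why the dialect matters here (the URLs are placeholders):

```scala
import org.apache.spark.sql.jdbc.JdbcDialects

// Dialects quote identifiers differently: MySQL uses backticks, while the
// default dialect (used for Postgres here) uses double quotes.
JdbcDialects.get("jdbc:mysql://host/db").quoteIdentifier("order")       // `order`
JdbcDialects.get("jdbc:postgresql://host/db").quoteIdentifier("order")  // "order"
```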





[GitHub] spark pull request: [SPARK-14541] [SQL] [WIP] SQL function: IFNULL...

2016-04-13 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12373#discussion_r59659273
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
 ---
@@ -128,6 +128,58 @@ case class IsNaN(child: Expression) extends 
UnaryExpression
 }
 
 /**
+ * An Expression accepts two parameters and returns null if both 
parameters are equal.
+ * If they are not equal, the first parameter value is returned.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a,b) - Returns null if a equals b, or a otherwise.")
+case class NullIf(left: Expression, right: Expression) extends 
BinaryExpression {
+  override def nullable: Boolean = true
+  override def dataType: DataType = left.dataType
+
+  override def eval(input: InternalRow): Any = {
+val valueLeft = left.eval(input)
+val valueRight = right.eval(input)
+if (valueLeft.equals(valueRight)) {
+  null
+} else {
+  valueLeft
+}
+  }
+
+  override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
+val leftGen = left.gen(ctx)
+val rightGen = right.gen(ctx)
+dataType match {
--- End diff --

Thanks, @viirya ! That simplifies the logic!





[GitHub] spark pull request: [SPARK-14541] [SQL] [WIP] SQL function: IFNULL...

2016-04-13 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/12373

[SPARK-14541] [SQL] [WIP] SQL function: IFNULL, NULLIF, NVL and NVL2

## What changes were proposed in this pull request?
I am trying to implement the `NULLIF` function in this PR. The meaning of NULLIF can be found here:
[NULLIF()](https://oracle-base.com/articles/misc/null-related-functions#nullif)
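
A hedged sketch of the intended semantics, assuming a sqlContext in scope and the function registered by this PR:

```scala
// NULLIF(a, b) behaves like: if (a == b) null else a
sqlContext.sql("SELECT nullif(1, 1)").show()   // NULL
sqlContext.sql("SELECT nullif(1, 2)").show()   // 1
```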

## How was this patch tested?
Test cases were added.

## JIRA related
[SPARK-14541](https://issues.apache.org/jira/browse/SPARK-14541)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-14541

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12373.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12373


commit c479394e1a6aa1588544357a0aa76054cb813088
Author: bomeng <bm...@us.ibm.com>
Date:   2016-04-13T22:53:47Z

support of NULLIF()







[GitHub] spark pull request: [SPARK-14441] [SQL] Consolidate DDL tests

2016-04-13 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12347#issuecomment-209622770
  
Ok, not a problem. Thanks.




