[GitHub] spark pull request: [SPARK-5966]

2015-10-22 Thread kevinyu98
GitHub user kevinyu98 opened a pull request:

https://github.com/apache/spark/pull/9220

[SPARK-5966]

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark working_on_spark-5966

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9220.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9220


commit 396c66a3d65a417618e4ce28c548cca6f028abc0
Author: Kevin Yu <q...@us.ibm.com>
Date:   2015-10-22T07:06:13Z

[SPARK-5966]







[GitHub] spark pull request: [SPARK-5966][WIP]

2015-10-22 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/9220#discussion_r42811834
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/impatient.sc 
---
@@ -0,0 +1 @@
+1+1;
--- End diff --

Hello Josh: Sorry, that was unintentional. I have deleted the new file and 
pushed again. 





[GitHub] spark pull request: [SPARK-5966][WIP]

2015-10-22 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/9220#discussion_r42813223
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -266,6 +266,11 @@ object SparkSubmit {
   }
 }
 
+// SPARK-5966, check deployMode CLUSTER and master local
+if (clusterManager == LOCAL && deployMode == CLUSTER) {
+  printErrorAndExit("Cluster deploy mode is not compatible with master \"local\"")
+}
--- End diff --

Hello Andrew: Thanks for pointing this out. I have made the code changes, 
run the tests, and updated the pull request; can you help review? 
Kevin
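
For illustration, with this check in place a mismatched submission now fails fast, roughly like this (the class and jar names are placeholders, not from this PR):

$ spark-submit --master local --deploy-mode cluster --class com.example.App app.jar
Error: Cluster deploy mode is not compatible with master "local"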





[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

2015-11-15 Thread kevinyu98
GitHub user kevinyu98 opened a pull request:

https://github.com/apache/spark/pull/9720

[SPARK-11447][SQL] change NullType to StringType during binaryComparison 
between NullType and StringType

When the PromoteStrings rule executes, if one side of a binaryComparison is 
StringType and the other side is not, the current code promotes (casts) the 
StringType side to DoubleType; if the string does not contain a number, the 
cast yields null. So when doing <=> (null-safe equal) with NULL, nothing is 
filtered, which causes the problem reported by this JIRA.

I propose these changes through this PR; can you review my code changes? 

This problem only happens for <=>; other operators work fine. A sketch of the 
failing case follows the examples below.

scala> val filteredDF = df.filter(df("column") > (new Column(Literal(null))))
filteredDF: org.apache.spark.sql.DataFrame = [column: string]

scala> filteredDF.show
+------+
|column|
+------+
+------+

scala> val filteredDF = df.filter(df("column") === (new Column(Literal(null))))
filteredDF: org.apache.spark.sql.DataFrame = [column: string]

scala> filteredDF.show
+------+
|column|
+------+
+------+

scala> df.registerTempTable("DF")

scala> sqlContext.sql("select * from DF where 'column' = NULL")
res27: org.apache.spark.sql.DataFrame = [column: string]

scala> res27.show
+------+
|column|
+------+
+------+
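
For contrast, a minimal sketch of the failing <=> case described above (not part of the original report; it assumes the same df):

scala> val filteredDF = df.filter(df("column") <=> (new Column(Literal(null))))

Before this fix, PromoteStrings casts the string column to DoubleType to match the null literal; that cast yields null for non-numeric strings, so the null-safe comparison evaluates null <=> null and returns every row instead of only the rows where the column is NULL.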

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark working_on_spark-11447

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9720.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9720


commit b53b85cad4f5fced9ba003351d5a9af1eb5111fc
Author: Kevin Yu <q...@us.ibm.com>
Date:   2015-11-13T18:11:59Z

[SPARK-11447]Check NullType before Promote StringType

commit bb705cae18032fcee8f8a532be464f0a995b27cb
Author: Kevin Yu <q...@us.ibm.com>
Date:   2015-11-15T06:41:48Z

add testcase in ColumnExpressionSuite







[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

2015-11-17 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/9720#discussion_r45069001
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
 ---
@@ -280,6 +280,12 @@ object HiveTypeCoercion {
   case p @ BinaryComparison(left @ DateType(), right @ TimestampType()) =>
     p.makeCopy(Array(Cast(left, StringType), Cast(right, StringType)))

+  // Checking NullType
+  case p @ BinaryComparison(left @ StringType(), right @ NullType()) =>
+    p.makeCopy(Array(left, Literal.create(null, StringType)))
+  case p @ BinaryComparison(left @ NullType(), right @ StringType()) =>
+    p.makeCopy(Array(Literal.create(null, StringType), right))
+
   case p @ BinaryComparison(left @ StringType(), right) if right.dataType != StringType =>
     p.makeCopy(Array(Cast(left, DoubleType), right))
--- End diff --

@yhuai @cloud-fan : sure, I will not do that. I will run more tests to see 
if anything is broken.





[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

2015-11-16 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/9720#issuecomment-157287070
  
@cloud-fan and @marmbrus @yhuai @nongli @liancheng : thanks for reviewing 
the fix. 





[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

2015-11-16 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/9720#discussion_r45028090
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
 ---
@@ -280,6 +280,12 @@ object HiveTypeCoercion {
   case p @ BinaryComparison(left @ DateType(), right @ TimestampType()) =>
     p.makeCopy(Array(Cast(left, StringType), Cast(right, StringType)))

+  // Checking NullType
+  case p @ BinaryComparison(left @ StringType(), right @ NullType()) =>
+    p.makeCopy(Array(left, Literal.create(null, StringType)))
+  case p @ BinaryComparison(left @ NullType(), right @ StringType()) =>
+    p.makeCopy(Array(Literal.create(null, StringType), right))
+
   case p @ BinaryComparison(left @ StringType(), right) if right.dataType != StringType =>
     p.makeCopy(Array(Cast(left, DoubleType), right))
--- End diff --

@cloud-fan : do you want me to open a new JIRA to look into this? The new 
JIRA/PR will focus on the rules in PromoteStrings and ImplicitTypeCasts, as 
you suggested, to reduce the redundant rules in PromoteStrings. 






[GitHub] spark pull request: Working on spark 11827

2015-12-03 Thread kevinyu98
GitHub user kevinyu98 opened a pull request:

https://github.com/apache/spark/pull/10125

Working on spark 11827

Hello: Can you help check this PR? I am adding support for 
java.math.BigInteger in the Java bean code path. I saw that internally Spark 
converts BigInteger to BigDecimal in ColumnType.scala and 
CatalystRowConverter.scala. I use the same approach and convert the BigInteger 
to BigDecimal. 
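
A minimal sketch of the conversion idea (the helper name here is mine, not Spark's; Decimal is org.apache.spark.sql.types.Decimal):

import java.math.{BigDecimal => JavaBigDecimal, BigInteger}
import org.apache.spark.sql.types.Decimal

// Route BigInteger through a scale-0 BigDecimal, which Decimal already accepts.
def bigIntegerToDecimal(v: BigInteger): Decimal = Decimal(new JavaBigDecimal(v))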

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark working_on_spark-11827

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10125.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10125


commit a67722094e8a9d0689ba022eb4f923e28791503e
Author: Kevin Yu <q...@us.ibm.com>
Date:   2015-12-01T16:38:09Z

adding java.math.BigInteger support for java bean

commit a58d92cd85719c6112c5cb0162be9b6104f9ba00
Author: Kevin Yu <q...@us.ibm.com>
Date:   2015-12-02T05:37:56Z

adding test case

commit f400a825f38a2e3559e9b4f63b4e58bdd17c5e3b
Author: Kevin Yu <q...@us.ibm.com>
Date:   2015-12-03T07:38:15Z

modify the JavaDataFrameSuite

commit 3db875a7d9a331d3a200d26338c956d694001046
Author: Kevin Yu <q...@us.ibm.com>
Date:   2015-12-03T07:50:43Z

clean the JavaDataFrameSuite

commit 0807550ae396231a19648c2f4db7e8946544d4a2
Author: Kevin Yu <q...@us.ibm.com>
Date:   2015-12-03T07:57:20Z

working on the JavaDataFrameSuite







[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2015-12-03 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10125#issuecomment-161707755
  
Hello Sean: I am sorry, I forgot to update the title and description. I 
have made the changes; please let me know if anything else needs to be changed. 
Thanks.
Kevin





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-15 Thread kevinyu98
GitHub user kevinyu98 opened a pull request:

https://github.com/apache/spark/pull/10314

[SPARK-12317][SQL]Support configurable value in SQLConf file

Hello: this adds configurable values for AUTO_BROADCASTJOIN_THRESHOLD and 
DEFAULT_SIZE_IN_BYTES. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark working_on_spark-12317

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10314.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10314


commit 44e7fba419088a589cbeeb6cbf012f43ad49576c
Author: Kevin Yu <q...@us.ibm.com>
Date:   2015-12-15T19:25:28Z

fix spark jira 12317







[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2015-12-16 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10125#discussion_r47871490
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -69,6 +68,16 @@ final class Decimal extends Ordered[Decimal] with 
Serializable {
   }
 
   /**
+   * Set this Decimal to the given BigInt. Will have precision 38 and 
scale 0.
+   */
+  def set(intVal: BigInt): Decimal = {
+this.decimalVal = null
+this.longVal = intVal.toLong
--- End diff --

Hi Davies: Yes, we need to check the range; otherwise it will cause an 
overflow. Thanks. I will look into it.
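
A possible shape for that guard (a sketch only, not the final code):

def set(intVal: BigInt): Decimal = {
  // BigInt.toLong silently truncates values outside the Long range,
  // so reject anything that does not fit instead of overflowing.
  require(intVal.isValidLong, s"BigInt $intVal is out of Long range")
  this.decimalVal = null
  this.longVal = intVal.toLong
  this
}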





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2015-12-16 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10125#discussion_r47871566
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -362,6 +371,8 @@ object Decimal {
 
   def apply(value: java.math.BigDecimal): Decimal = new Decimal().set(value)

+  def apply(value: java.math.BigInteger): Decimal = new Decimal().set((value))
--- End diff --

Sorry, I forgot to remove the extra (). I will correct it. 





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2015-12-16 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10125#discussion_r47866879
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
 ---
@@ -326,6 +326,7 @@ object CatalystTypeConverters {
   val decimal = scalaValue match {
 case d: BigDecimal => Decimal(d)
 case d: JavaBigDecimal => Decimal(d)
+case d: BigInteger => Decimal(d)
--- End diff --

Hi Wenchen: Sure, I will add that.





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2015-12-16 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10125#discussion_r47866947
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
 ---
@@ -75,6 +75,7 @@ object JavaTypeInference {
   case c: Class[_] if c == classOf[java.lang.Boolean] => (BooleanType, true)

   case c: Class[_] if c == classOf[java.math.BigDecimal] => (DecimalType.SYSTEM_DEFAULT, true)
+  case c: Class[_] if c == classOf[java.math.BigInteger] => (DecimalType.SYSTEM_DEFAULT, true)
--- End diff --

Yes, I will use (38, 0) for BigInteger; there is no need for a scale. Thanks.
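
The resulting case would look roughly like this (a sketch; the final code may use a named constant for DecimalType(38, 0)):

  case c: Class[_] if c == classOf[java.math.BigInteger] => (DecimalType(38, 0), true)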





[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...

2015-12-14 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10299#issuecomment-164603371
  
Hello Michael: I fixed the Scala style issue; can you help re-run the tests? 
Thanks. 





[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...

2015-12-18 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10299#issuecomment-165710284
  
The failure is because of the changed projection; I will submit an updated 
patch tomorrow.





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-15 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10314#discussion_r47713626
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -93,7 +94,7 @@ private[spark] object SQLConf {
   isPublic: Boolean = true): SQLConfEntry[Int] =
   SQLConfEntry(key, defaultValue, { v =>
 try {
-  v.toInt
+  Utils.byteStringAsBytes(v).toInt
--- End diff --

Hello Sean: Thanks for your comment. Yes, you are right: there are other 
methods that are not meant to take memory sizes (like COLUMN_BATCH_SIZE, etc.). 
There are a couple of approaches; can you suggest which one is preferable for 
this problem, or suggest another way to fix it? (See the sketch of approach 2 
below.)
1. Document that [g|G|m|M|k|K] means a memory size.

2. Create a new intConf method for AUTO_BROADCASTJOIN_THRESHOLD.

3. Create a rule for parseByteString, e.g. K/KB/M/MB means 1024 and 
k/kb/m/mb means 1000.

Thanks.
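
For reference, approach 2 would add a dedicated constructor along these lines (a sketch only; the eventual signature may differ):

def intMemConf(
    key: String,
    defaultValue: Option[Int] = None,
    doc: String = "",
    isPublic: Boolean = true): SQLConfEntry[Int] =
  SQLConfEntry(key, defaultValue, { v =>
    // Parse memory-size strings such as "10m" into bytes, then narrow to Int.
    Utils.byteStringAsBytes(v).toInt
  }, _.toString, doc, isPublic)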






[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...

2015-12-14 Thread kevinyu98
GitHub user kevinyu98 opened a pull request:

https://github.com/apache/spark/pull/10299

[SPARK-12231][SQL]create a combineFilters' projection when we call 
buildPartitionedTableScan

Hello Michael & all: Here I am submitting another approach to solve this 
problem. Can you verify it? 

I think the problem is related to the change from SPARK-10829:

before that PR, the projects and filters were applied inside 
buildPartitionedTableScan.
With that PR, the filter expression is divided into three parts, and the 
filter left outside of the scan (combineFilters) needs a different projection. 

So the fix is to create a combined projection for the outside filter and 
beyond. 

Thanks for your comments.  

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark working_on_spark-12231

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10299.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10299


commit 2333c6d3dffd580529705e33f5ccdc8871670c0f
Author: Kevin Yu <q...@us.ibm.com>
Date:   2015-12-14T19:51:35Z

another approach to fix this problem







[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-15 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10314#issuecomment-165010421
  
@watermen thanks for your input. That is a good idea if we decide to go with 
approach 2 (create a new intConf method for AUTO_BROADCASTJOIN_THRESHOLD). 
If we decide to go with approach 3, then we may need to change the 
parseByteString part to distinguish lower and upper case. 

@srowen what do you think? Thanks.





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-15 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10314#issuecomment-165023639
  
@srowen Thanks Sean, I will create a new intConf method for 
AUTO_BROADCASTJOIN_THRESHOLD and will update the PR soon.





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2016-01-04 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10314#issuecomment-168854235
  
@srowen Hello Sean: Sorry for taking so long. Can you review the code? 
Thanks. Kevin





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurate value fo...

2016-01-05 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10314#issuecomment-169238048
  
@viirya @yhuai @srowen @marmbrus @concretevitamin: 
SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE was added by SPARK-9850 (PR #9276), so I 
am including Yin; 
Michael set autoBroadcastJoinThreshold to 10 * 1024 * 1024 through PR #3064; 
SPARK-2393 set autoBroadcastJoinThreshold to an Int.
I am not sure which one to choose either, so I am cc'ing the people who 
introduced these two fields. Thanks for your input. 





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurate value fo...

2016-01-05 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10314#issuecomment-169220626
  
Hello @viirya : Good point. I just tested it; -1 and -1g have different 
behavior: it will accept -1 but throw an IllegalArgumentException for -1g. Thanks.
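
A quick way to see the difference (Utils is org.apache.spark.util.Utils, the same helper this PR uses):

scala> Utils.byteStringAsBytes("1g")    // 1073741824
scala> Utils.byteStringAsBytes("-1g")   // throws, since "-1g" is not a valid byte string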





[GitHub] spark pull request: [SPARK-12317][SQL] Support units (m,k,g) in SQ...

2016-01-06 Thread kevinyu98
GitHub user kevinyu98 opened a pull request:

https://github.com/apache/spark/pull/10629

[SPARK-12317][SQL] Support units (m,k,g) in SQLConf

This PR continues from the previously closed PR 10314.

In this PR, SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE takes memory-string 
conventions as input.

For example, the user can now specify 10g for 
SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE in SQLConf. 
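
For example (a usage sketch; the property key is the one I believe SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE maps to):

scala> sqlContext.setConf("spark.sql.adaptive.shuffle.targetPostShuffleInputSize", "10g")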

@marmbrus @srowen : Can you help review these code changes? Thanks. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark spark-12317

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10629.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10629


commit a37a05e856b58a13ec13239ffc1a2050563102ea
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-01-07T04:20:27Z

Support units (m,k,g) in SQLConf







[GitHub] spark pull request: [SPARK-12317][SQL]Support units (m,k,g) in SQL...

2016-01-06 Thread kevinyu98
Github user kevinyu98 closed the pull request at:

https://github.com/apache/spark/pull/10314





[GitHub] spark pull request: [SPARK-12317][SQL] Support units (m,k,g) in SQ...

2016-01-07 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10629#issuecomment-169902816
  
@rxin I am so sorry that I didn't reply earlier. The code passed the style 
check; I copied it from the existing code and thought an indentation of 2 was 
fine, so I was not sure what changes to make. But I appreciate your help. 
Next time I will raise questions more quickly. 





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurate value fo...

2016-01-05 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10314#issuecomment-169173265
  
Hello @srowen @marmbrus @viirya : I have made the code changes and changed 
the title based on the comments. Can you help review the code? Thanks.





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2016-01-01 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10314#issuecomment-168317362
  
Hello Sean: Sorry for the delay. Yes, I have made most of the code changes, 
and I will try to finish them up soon and do more testing. I will keep you 
updated. Thanks, and Happy New Year!





[GitHub] spark pull request: [SPARK-12317][SQL]Support units (m,k,g) in SQL...

2016-01-06 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10314#issuecomment-169438242
  
Hello @marmbrus, thanks. So you mean I can remove the code change for 
intMemConf and keep the code for longMemConf for this JIRA? I will make the PR 
title and description changes. I need to close this PR and open another one; 
there seem to be some issues from my last git push.





[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...

2015-12-21 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10388#issuecomment-166407905
  
@marmbrus : Can you help take a look at this PR? Thanks for your review. 





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-22 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10314#issuecomment-166747545
  
@srowen Hello Sean: I have submitted the new code; can you help review? Thanks 
a lot.





[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...

2015-12-24 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10388#issuecomment-167179672
  
I deleted the test cases from DataFrameNaFunctionsSuite.scala. I checked the 
previous failure and am not sure why it failed; it worked when I ran the local 
test on my laptop. 
$ build/sbt "test-only org.apache.spark.sql.thriftserver"
..
[success] Total time: 296 s, completed Dec 24, 2015 6:11:20 PM

Then I re-ran the sql test bucket, and it seems fine.

$ build/sbt sql/test-only

[info] Passed: Total 1522, Failed 0, Errors 0, Passed 1522, Ignored 10
[success] Total time: 146 s, completed Dec 24, 2015 6:26:23 PM





[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...

2015-12-18 Thread kevinyu98
Github user kevinyu98 closed the pull request at:

https://github.com/apache/spark/pull/10299





[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...

2015-12-18 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10299#issuecomment-165919581
  
I will create a new PR. 





[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...

2015-12-18 Thread kevinyu98
GitHub user kevinyu98 opened a pull request:

https://github.com/apache/spark/pull/10388

[SPARK-12231][SQL]create a combineFilters' projection when we call 
buildPartitionedTableScan

Hello Michael & all: 

We had some issues submitting the new code in the other PR (#10299), so we 
closed that PR and opened this one with the fix. 

The reason for the previous failure is that the projection for the scan, 
when there is a filter that is not pushed down (the "left-over" filter), could 
be different, in elements or ordering, from the original projection.

With the new code, the approach to solve this problem is:

Insert a new Project if the "left-over" filter is nonempty and (the 
original projection is not empty and the projection for the scan has more than 
one element, which could otherwise cause a different ordering in the 
projection). A sketch in code follows below.

We created 3 test cases to cover the otherwise-failing cases.
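
In code form, the idea is roughly the following (a simplified sketch using the names from this PR's diff, not the exact patch):

// 1. Widen the scan's projection with the columns the left-over filter needs.
val combineFilter = combineFilters.reduceLeftOption(expressions.And)
val combinedProjects = combineFilter
  .map(_.references.toSet.union(projects.toSet).toSeq)
  .getOrElse(projects)
// 2. Apply the left-over filter on top of the widened scan, and
// 3. if the widened projection differs from the original, add a Project
//    on top to restore the original output columns and their ordering.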

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark spark-12231

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10388.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10388


commit 2d56ac02eaff10972e5bc46f3b57cff993d60e24
Author: Kevin Yu <q...@us.ibm.com>
Date:   2015-12-18T23:31:05Z

another approach to fix this problem

commit 305739f872ba90ba9ef4f3ef6c4f812b4024d8e9
Author: Kevin Yu <q...@us.ibm.com>
Date:   2015-12-18T23:46:37Z

update comments







[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-26 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10314#discussion_r48451909
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -100,6 +101,33 @@ private[spark] object SQLConf {
 }
   }, _.toString, doc, isPublic)
 
+def intMemConf(
--- End diff --

I will make the changes. I followed the code format and ran the scalastyle 
test, and thought it passed the style checks. I will look more carefully next 
time. Thanks.





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-26 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10314#discussion_r48451916
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -100,6 +101,33 @@ private[spark] object SQLConf {
 }
   }, _.toString, doc, isPublic)
 
+def intMemConf(
+key: String,
+defaultValue: Option[Int] = None,
+doc: String = "",
+isPublic: Boolean = true): SQLConfEntry[Int] =
+  SQLConfEntry(key, defaultValue, { v =>
+var isNegative: Boolean = false
+try {
+  isNegative = (v.toInt < 0)
+} catch {
+  case _: Throwable =>
+}
+if (!isNegative) {
--- End diff --

ok





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-26 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10314#discussion_r48451915
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -100,6 +101,33 @@ private[spark] object SQLConf {
 }
   }, _.toString, doc, isPublic)
 
+def intMemConf(
+key: String,
+defaultValue: Option[Int] = None,
+doc: String = "",
+isPublic: Boolean = true): SQLConfEntry[Int] =
+  SQLConfEntry(key, defaultValue, { v =>
+var isNegative: Boolean = false
+try {
+  isNegative = (v.toInt < 0)
+} catch {
+  case _: Throwable =>
--- End diff --

Yeah, I want to catch the exception and then do nothing. I will make the 
changes.





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-26 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10314#discussion_r48451911
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -100,6 +101,33 @@ private[spark] object SQLConf {
 }
   }, _.toString, doc, isPublic)
 
+def intMemConf(
+key: String,
+defaultValue: Option[Int] = None,
+doc: String = "",
+isPublic: Boolean = true): SQLConfEntry[Int] =
+  SQLConfEntry(key, defaultValue, { v =>
+var isNegative: Boolean = false
--- End diff --

ok, I will make the change.





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-26 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10314#discussion_r48451918
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -100,6 +101,33 @@ private[spark] object SQLConf {
 }
   }, _.toString, doc, isPublic)
 
+def intMemConf(
+key: String,
+defaultValue: Option[Int] = None,
+doc: String = "",
+isPublic: Boolean = true): SQLConfEntry[Int] =
+  SQLConfEntry(key, defaultValue, { v =>
+var isNegative: Boolean = false
+try {
+  isNegative = (v.toInt < 0)
+} catch {
+  case _: Throwable =>
+}
+if (!isNegative) {
+  if ((Utils.byteStringAsBytes(v) <= Int.MaxValue.toLong) &&
--- End diff --

I will put it in a variable. Thanks.





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-26 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10314#discussion_r48451947
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -107,7 +135,7 @@ private[spark] object SQLConf {
 isPublic: Boolean = true): SQLConfEntry[Long] =
   SQLConfEntry(key, defaultValue, { v =>
 try {
-  v.toLong
+  Utils.byteStringAsBytes(v)
--- End diff --

Sorry, I thought it was only one place, but actually there are two places. 
I will create a new method for SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE. Thanks.





[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...

2015-12-22 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10388#discussion_r48231314
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameNaFunctionsSuite.scala ---
@@ -194,4 +194,45 @@ class DataFrameNaFunctionsSuite extends QueryTest with 
SharedSQLContext {
 assert(out1(4) === Row("Amy", null, null))
 assert(out1(5) === Row(null, null, null))
   }
+
+  test("Spark-12231: dropna with partitionBy and groupBy") {
--- End diff --

You are right, this problem is not related to na.drop. At the time, I was 
not sure where I could put the test case, so I just kept it here. Thanks for the 
suggestion about putting it with the other DataSource tests; I will change the 
test case and look for a place around the DataSource tests. Thanks very much!





[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...

2015-12-22 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10388#discussion_r48230827
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ---
@@ -88,16 +88,27 @@ private[sql] object DataSourceStrategy extends Strategy 
with Logging {
 s"Selected $selected partitions out of $total, pruned 
$percentPruned% partitions."
   }
 
+  // need to add projections from combineFilters in
+  val combineFilter = combineFilters.reduceLeftOption(expressions.And)
--- End diff --

Will change the name.





[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...

2015-12-22 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10388#discussion_r48230905
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ---
@@ -88,16 +88,27 @@ private[sql] object DataSourceStrategy extends Strategy 
with Logging {
 s"Selected $selected partitions out of $total, pruned 
$percentPruned% partitions."
   }
 
+  // need to add projections from combineFilters in
+  val combineFilter = combineFilters.reduceLeftOption(expressions.And)
+  val combinedProjects = combineFilter.map(_.references.toSet.union(projects.toSet).toSeq)
+    .getOrElse(projects)
   val scan = buildPartitionedTableScan(
 l,
-projects,
+combinedProjects,
 pushedFilters,
 t.partitionSpec.partitionColumns,
 selectedPartitions)
 
-  combineFilters
-.reduceLeftOption(expressions.And)
-.map(execution.Filter(_, scan)).getOrElse(scan) :: Nil
+  // Add a Projection to guarantee the original projection:
+  // this is because "combinedProjects" may be different from the
+  // original "projects", in elements or their ordering
--- End diff --

Thanks for the suggestion. I thought == was a 'shallow' comparison, but it is 
actually a 'deep' equality check. Will make the changes.
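
For example:

scala> Seq("a", "b") == Seq("a", "b")   // true: element-wise ('deep') equality
scala> Seq("a", "b") eq Seq("a", "b")   // false: reference equality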





[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...

2015-12-22 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10388#discussion_r48230807
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameNaFunctionsSuite.scala ---
@@ -194,4 +194,45 @@ class DataFrameNaFunctionsSuite extends QueryTest with 
SharedSQLContext {
 assert(out1(4) === Row("Amy", null, null))
 assert(out1(5) === Row(null, null, null))
   }
+
+  test("Spark-12231: dropna with partitionBy and groupBy") {
+withTempPath { dir =>
+  val df = sqlContext.range(10)
+  val df1 = df.withColumn("a", $"id".cast("int"))
+  df1.write.partitionBy("id").parquet(dir.getCanonicalPath)
+  val df2 = sqlContext.read.parquet(dir.getCanonicalPath)
+  val group = df2.na.drop().groupBy().count().collect()
--- End diff --

Hi Michael: Sure, I will change the test case.





[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...

2015-12-22 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10388#discussion_r48230771
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ---
@@ -88,16 +88,27 @@ private[sql] object DataSourceStrategy extends Strategy 
with Logging {
 s"Selected $selected partitions out of $total, pruned 
$percentPruned% partitions."
   }
 
+  // need to add projections from combineFilters in
+  val combineFilter = combineFilters.reduceLeftOption(expressions.And)
+  val combinedProjects = combineFilter.map(_.references.toSet.union(projects.toSet).toSeq)
--- End diff --

Hi Michael: Sure, will make the changes.





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-22 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10314#issuecomment-166560467
  
@srowen Hello Sean: I am sorry that I haven't been able to submit the PR 
yet. I will continue working on it tomorrow. 





[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...

2015-12-21 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10314#issuecomment-166326525
  
@srowen Hello Sean, yes, sorry for the delay. I will submit the updated PR 
today.





[GitHub] spark pull request #13506: [SPARK-15763][SQL] Support DELETE FILE command na...

2016-06-04 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13506#discussion_r65805306
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1441,6 +1441,32 @@ class SparkContext(config: SparkConf) extends 
Logging with ExecutorAllocationCli
   }
 
   /**
+   * Delete a file to be downloaded with this Spark job on every node.
+   * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported
+   * filesystems), or an HTTP, HTTPS or FTP URI.  To access the file in Spark jobs,
+   * use `SparkFiles.get(fileName)` to find its download location.
+   *
+   */
+  def deleteFile(path: String): Unit = {
--- End diff --

Hello Reynold: Sorry, I am afraid I misunderstood your previous comments. 
Do you mean the user should take the path from the LIST FILE command output, 
and then use that path as the DELETE FILE command's path? If that is the case, 
the delete code will be much simpler. Thanks for your advice. 
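
If so, the flow would look like this (illustrative only: DELETE FILE is the command this PR proposes, and the path is a placeholder):

scala> spark.sql("ADD FILE /tmp/test.txt")
scala> spark.sql("LIST FILE").show(false)   // take the fully qualified path from this output
scala> spark.sql("DELETE FILE <path-from-LIST-FILE>")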





[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...

2016-06-08 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13555#discussion_r66366855
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 ---
@@ -625,6 +625,21 @@ class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedSQLContext
   }
 }
   }
+
+  test("SPARK-15804: write out the metadata to parquet file") {
+val df = Seq((1, "abc"), (2, "hello")).toDF("a", "b")
+val md = new MetadataBuilder().putString("key", "value").build()
+val dfWithmeta = df.select('a, 'b.as("b", md))
+
+withTempPath { dir =>
+  val path = s"${dir.getCanonicalPath}/data"
+  dfWithmeta.write.parquet(path)
+
+  readParquetFile(path) { df =>
+assert(df.schema.json.contains("\"key\":\"value\""))
--- End diff --

@cloud-fan Thanks for your comments, I have changed the test case. 





[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...

2016-06-08 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13555#discussion_r66366487
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 ---
@@ -625,6 +625,21 @@ class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedSQLContext
   }
 }
   }
+
+  test("SPARK-15804: write out the metadata to parquet file") {
+val df = Seq((1, "abc"), (2, "hello")).toDF("a", "b")
+val md = new MetadataBuilder().putString("key", "value").build()
+val dfWithmeta = df.select('a, 'b.as("b", md))
+
+withTempPath { dir =>
+  val path = s"${dir.getCanonicalPath}/data"
--- End diff --

ok, I will make the change





[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...

2016-06-08 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13555#discussion_r66366470
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 ---
@@ -625,6 +625,21 @@ class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedSQLContext
   }
 }
   }
+
+  test("SPARK-15804: write out the metadata to parquet file") {
+val df = Seq((1, "abc"), (2, "hello")).toDF("a", "b")
+val md = new MetadataBuilder().putString("key", "value").build()
+val dfWithmeta = df.select('a, 'b.as("b", md))
+
+withTempPath { dir =>
+  val path = s"${dir.getCanonicalPath}/data"
+  dfWithmeta.write.parquet(path)
+
+  readParquetFile(path) { df =>
+assert(df.schema.json.contains("\"key\":\"value\""))
--- End diff --

sure, I will do that





[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...

2016-06-08 Thread kevinyu98
GitHub user kevinyu98 opened a pull request:

https://github.com/apache/spark/pull/13555

[SPARK-15804][SQL]Include metadata in the toStructType 

## What changes were proposed in this pull request?
The helper function 'toStructType' in the AttributeSeq class doesn't include 
the metadata when it builds the StructField, so it causes this reported problem 
https://issues.apache.org/jira/browse/SPARK-15804?jql=project%20%3D%20SPARK 
when Spark writes the dataframe with the metadata to the parquet 
datasource.

The code path: when Spark writes the dataframe to the parquet datasource 
through the InsertIntoHadoopFsRelationCommand, it builds the 
WriteRelation container and calls the helper function 'toStructType' to 
create the StructType that contains the StructFields. It should include the 
metadata there; otherwise, we will lose the user-provided metadata. 
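
A minimal sketch of the described fix (using Catalyst's Attribute fields; illustrative, not the exact merged patch):

```scala
// Carry each attribute's metadata into its StructField instead of letting
// it default to Metadata.empty when building the StructType.
def toStructType(attrs: Seq[Attribute]): StructType =
  StructType(attrs.map(a => StructField(a.name, a.dataType, a.nullable, a.metadata)))
```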


## How was this patch tested?

added test case in ParquetQuerySuite.scala





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark spark-15804

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13555.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13555


commit 3b44c5978bd44db986621d3e8511e9165b66926b
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-20T18:06:30Z

adding testcase

commit 18b4a31c687b264b50aa5f5a74455956911f738a
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-22T21:48:00Z

Merge remote-tracking branch 'upstream/master'

commit 4f4d1c8f2801b1e662304ab2b33351173e71b427
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-23T16:50:19Z

Merge remote-tracking branch 'upstream/master'
get latest code from upstream

commit f5f0cbed1eb5754c04c36933b374c3b3d2ae4f4e
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-23T22:20:53Z

Merge remote-tracking branch 'upstream/master'
adding trim characters support

commit d8b2edbd13ee9a4f057bca7dcb0c0940e8e867b8
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-25T20:24:33Z

Merge remote-tracking branch 'upstream/master'
get latest code for pr12646

commit 196b6c66b0d55232f427c860c0e7c6876c216a67
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-25T23:45:57Z

Merge remote-tracking branch 'upstream/master'
merge latest code

commit f37a01e005f3e27ae2be056462d6eb6730933ba5
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-27T14:15:06Z

Merge remote-tracking branch 'upstream/master'
merge upstream/master

commit bb5b01fd3abeea1b03315eccf26762fcc23f80c0
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-30T23:49:31Z

Merge remote-tracking branch 'upstream/master'

commit bde5820a181cf84e0879038ad8c4cebac63c1e24
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-04T03:52:31Z

Merge remote-tracking branch 'upstream/master'

commit 5f7cd96d495f065cd04e8e4cc58461843e45bc8d
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-10T21:14:50Z

Merge remote-tracking branch 'upstream/master'

commit 893a49af0bfd153ccb59ba50b63a232660e0eada
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-13T18:20:39Z

Merge remote-tracking branch 'upstream/master'

commit 4bbe1fd4a3ebd50338ccbe07dc5887fe289cd53d
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-17T21:58:14Z

Merge remote-tracking branch 'upstream/master'

commit b2dd795e23c36cbbd022f07a10c0cf21c85eb421
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-18T06:37:13Z

Merge remote-tracking branch 'upstream/master'

commit 8c3e5da458dbff397ed60fcb68f2a46d87ab7ba4
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-18T16:18:16Z

Merge remote-tracking branch 'upstream/master'

commit a0eaa408e847fbdc3ac5b26348588ee0a1e276c7
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-19T04:28:20Z

Merge remote-tracking branch 'upstream/master'

commit d03c940ed89795fa7fe1d1e9f511363b22cdf19d
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-19T21:24:33Z

Merge remote-tracking branch 'upstream/master'

commit d728d5e002082e571ac47292226eb8b2614f479f
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-24T20:32:57Z

Merge remote-tracking branch 'upstream/master'

commit ea104ddfbf7d180ed1bc53dd9a1005010264aa1f
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-25T22:52:57Z

Merge remote-tracking branch 'upstream/master'

commit 6ab1215b781ad0cccf1752f3a625b4e4e371c38e
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-27T17:18:46Z

Merge remote-tracking branch 'upstream/master'

commit 0c566533705331697eb1b287b30c8b16111f6fa2
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-06-

[GitHub] spark pull request #13506: [SPARK-15763][SQL] Support DELETE FILE command na...

2016-06-03 Thread kevinyu98
GitHub user kevinyu98 opened a pull request:

https://github.com/apache/spark/pull/13506

[SPARK-15763][SQL] Support DELETE FILE command natively

## What changes were proposed in this pull request?
Hive supports these CLI commands to manage resources ([Hive 
Doc](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli)): 
`ADD/DELETE (FILE(s)|JAR(s))` 
`LIST (FILE(S) [filepath ...] | JAR(S) [jarpath ...])`

but Spark only supports two commands for now: 
`ADD (FILE <filepath> | JAR <jarpath>)` 
`LIST (FILE(S) [filepath ...] | JAR(S) [jarpath ...])`

This PR adds the DELETE FILE command to Spark SQL; I will submit 
another PR for DELETE JAR(s).

`DELETE FILE <filepath>`

## **Example:**
**DELETE FILE**
```
scala> spark.sql("add file /Users/qianyangyu/myfile.txt")
res0: org.apache.spark.sql.DataFrame = []

scala> spark.sql("add file /Users/qianyangyu/myfile2.txt")
res1: org.apache.spark.sql.DataFrame = []

scala> spark.sql("list file")
res2: org.apache.spark.sql.DataFrame = [Results: string]

scala> spark.sql("list file").show(false)
+----------------------------------+
|Results                           |
+----------------------------------+
|file:/Users/qianyangyu/myfile2.txt|
|file:/Users/qianyangyu/myfile.txt |
+----------------------------------+
scala> spark.sql("delete file /Users/qianyangyu/myfile.txt")
res4: org.apache.spark.sql.DataFrame = []

scala> spark.sql("list file").show(false)
+----------------------------------+
|Results                           |
+----------------------------------+
|file:/Users/qianyangyu/myfile2.txt|
+----------------------------------+


scala> spark.sql("delete file /Users/qianyangyu/myfile2.txt")
res6: org.apache.spark.sql.DataFrame = []

scala> spark.sql("list file").show(false)
+-------+
|Results|
+-------+
+-------+
```

## How was this patch tested?

Added test cases in the Spark SQL, Spark shell, and SparkContext suites.





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark spark-15763

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13506.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13506


commit 3b44c5978bd44db986621d3e8511e9165b66926b
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-20T18:06:30Z

adding testcase

commit 18b4a31c687b264b50aa5f5a74455956911f738a
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-22T21:48:00Z

Merge remote-tracking branch 'upstream/master'

commit 4f4d1c8f2801b1e662304ab2b33351173e71b427
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-23T16:50:19Z

Merge remote-tracking branch 'upstream/master'
get latest code from upstream

commit f5f0cbed1eb5754c04c36933b374c3b3d2ae4f4e
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-23T22:20:53Z

Merge remote-tracking branch 'upstream/master'
adding trim characters support

commit d8b2edbd13ee9a4f057bca7dcb0c0940e8e867b8
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-25T20:24:33Z

Merge remote-tracking branch 'upstream/master'
get latest code for pr12646

commit 196b6c66b0d55232f427c860c0e7c6876c216a67
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-25T23:45:57Z

Merge remote-tracking branch 'upstream/master'
merge latest code

commit f37a01e005f3e27ae2be056462d6eb6730933ba5
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-27T14:15:06Z

Merge remote-tracking branch 'upstream/master'
merge upstream/master

commit bb5b01fd3abeea1b03315eccf26762fcc23f80c0
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-30T23:49:31Z

Merge remote-tracking branch 'upstream/master'

commit bde5820a181cf84e0879038ad8c4cebac63c1e24
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-04T03:52:31Z

Merge remote-tracking branch 'upstream/master'

commit 5f7cd96d495f065cd04e8e4cc58461843e45bc8d
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-10T21:14:50Z

Merge remote-tracking branch 'upstream/master'

commit 893a49af0bfd153ccb59ba50b63a232660e0eada
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-13T18:20:39Z

Merge remote-tracking branch 'upstream/master'

commit 4bbe1fd4a3ebd50338ccbe07dc5887fe289cd53d
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-17T21:58:14Z

Merge remote-tracking branch 'upstream/master'

commit b2dd795e23c36cbbd022f07a10c0cf21c85eb421
Author: Kevin Yu <q

[GitHub] spark pull request #13506: [SPARK-15763][SQL] Support DELETE FILE command na...

2016-06-04 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13506#discussion_r65799162
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1441,6 +1441,32 @@ class SparkContext(config: SparkConf) extends 
Logging with ExecutorAllocationCli
   }
 
   /**
+   * Delete a file to be downloaded with this Spark job on every node.
+   * The `path` passed can be either a local file, a file in HDFS (or 
other Hadoop-supported
+   * filesystems), or an HTTP, HTTPS or FTP URI.  To access the file in 
Spark jobs,
+   * use `SparkFiles.get(fileName)` to find its download location.
+   *
+   */
+  def deleteFile(path: String): Unit = {
--- End diff --

Hi Reynold: Thanks very much for reviewing the code. 
Yes, it is deleting the path from the addedFiles hashmap; the path is used to 
generate the key stored in the map. 
addFile uses this logic to generate the key it stores in the hashmap, 
so in order to find the same key, I have to use the same logic to generate 
it. 
For example, for a local file, addFile will prepend a 'file' scheme to the 
path:

spark.sql("add file /Users/qianyangyu/myfile.txt")

scala> spark.sql("list file").show(false)
+----------------------------------+
|Results                           |
+----------------------------------+
|file:/Users/qianyangyu/myfile2.txt|
|file:/Users/qianyangyu/myfile.txt |
+----------------------------------+

But for a file in a remote location, it will just take the path as-is.

scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt")
res17: org.apache.spark.sql.DataFrame = []

scala> spark.sql("list file").show(false)
+---------------------------------------------+
|Results                                      |
+---------------------------------------------+
|file:/Users/qianyangyu/myfile.txt            |
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt|
+---------------------------------------------+

If the command is issued from a worker node to add a local file, the path 
will be added into the NettyStreamManager's hashmap, using that 
environment's path as the key stored in addedFiles. 
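
A minimal sketch of that key normalization (assumed behavior, not Spark's exact implementation):

```scala
import java.io.File
import java.net.URI

// Local paths (no scheme) gain a "file:" scheme; paths that already carry a
// scheme (hdfs://, http://, ftp://, ...) are kept verbatim as the key.
def normalizeFileKey(path: String): String = {
  val uri = new URI(path)
  Option(uri.getScheme) match {
    case None => new File(path).getCanonicalFile.toURI.toString  // "file:/..."
    case _    => path
  }
}
```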





[GitHub] spark pull request #13506: [SPARK-15763][SQL] Support DELETE FILE command na...

2016-06-06 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13506#discussion_r65981961
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1441,6 +1441,32 @@ class SparkContext(config: SparkConf) extends 
Logging with ExecutorAllocationCli
   }
 
   /**
+   * Delete a file to be downloaded with this Spark job on every node.
+   * The `path` passed can be either a local file, a file in HDFS (or 
other Hadoop-supported
+   * filesystems), or an HTTP, HTTPS or FTP URI.  To access the file in 
Spark jobs,
+   * use `SparkFiles.get(fileName)` to find its download location.
+   *
+   */
+  def deleteFile(path: String): Unit = {
--- End diff --

I have updated the deleteFile comments to make them clearer. Thanks for 
reviewing. 





[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...

2016-06-08 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13555#discussion_r66264754
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 ---
@@ -625,6 +625,22 @@ class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedSQLContext
   }
 }
   }
+
+  test("SPARK-15804: write out the metadata to parquet file") {
+val data = (1, "abc") ::(2, "helloabcde") :: Nil
+val df = spark.createDataFrame(data).toDF("a", "b")
+val md = new MetadataBuilder().putString("key", "value").build()
+val dfWithmeta = df.select(Column("a"), Column("b").as("b", md))
--- End diff --

I will change it. Thanks.





[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...

2016-06-08 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13555#discussion_r66264631
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 ---
@@ -625,6 +625,22 @@ class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedSQLContext
   }
 }
   }
+
+  test("SPARK-15804: write out the metadata to parquet file") {
+val data = (1, "abc") ::(2, "helloabcde") :: Nil
+val df = spark.createDataFrame(data).toDF("a", "b")
--- End diff --

sure, I will do that.





[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...

2016-06-08 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13555#discussion_r66265069
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 ---
@@ -625,6 +625,22 @@ class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedSQLContext
   }
 }
   }
+
+  test("SPARK-15804: write out the metadata to parquet file") {
+val data = (1, "abc") ::(2, "helloabcde") :: Nil
+val df = spark.createDataFrame(data).toDF("a", "b")
+val md = new MetadataBuilder().putString("key", "value").build()
+val dfWithmeta = df.select(Column("a"), Column("b").as("b", md))
+
+withTempPath { dir =>
+  val path = s"${dir.getCanonicalPath}/data"
+  dfWithmeta.write.parquet(path)
+
+  readParquetFile(path) { dfwithmeta2 =>
--- End diff --

ok.





[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...

2016-06-08 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13555#discussion_r66383008
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 ---
@@ -625,6 +625,21 @@ class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedSQLContext
   }
 }
   }
+
+  test("SPARK-15804: write out the metadata to parquet file") {
+val df = Seq((1, "abc"), (2, "hello")).toDF("a", "b")
+val md = new MetadataBuilder().putString("key", "value").build()
+val dfWithmeta = df.select('a, 'b.as("b", md))
+
+withTempPath { dir =>
+  val path = s"${dir.getCanonicalPath}"
--- End diff --

Done, Thanks very much.





[GitHub] spark pull request: [SPARK-10777] [SQL]avoid checking nullability ...

2016-02-12 Thread kevinyu98
GitHub user kevinyu98 opened a pull request:

https://github.com/apache/spark/pull/11184

[SPARK-10777] [SQL]avoid checking nullability for complex data type for 
typeSuffix code path

Hello: 

When we call the typeSuffix method, it calls dataType and gets 
LongType for the suffix ("L"). But in the complex data type (like CreateArray) 
cases, dataType will also evaluate the children's nullability, which is not 
necessary for typeSuffix. 

For the proposed fix, I will create a prettyDataType, in parallel to 
dataType in the expression. At the base, it defaults to dataType. It is 
only used by typeSuffix in NamedExpression for now.

So the main changes are in typeSuffix in NamedExpression and in 
complexTypeCreator. The rest of the files just override prettyDataType from the 
abstract class Expression when their "dataType" method relies on another 
expression's "dataType".

Then for those complex types, "prettyDataType" does not try to evaluate 
"nullable" but just passes a default that will not be used at all by the 
caller, "typeSuffix".
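
A self-contained mini-model of the proposal (toy types, not Spark's classes; illustrative only):

```scala
sealed trait DataType
case object LongType extends DataType
case object IntegerType extends DataType

abstract class Expression {
  def dataType: DataType                   // may evaluate children's nullability
  def prettyDataType: DataType = dataType  // cheap default; complex type
                                           // creators override it so children
                                           // are never evaluated just for this

  // typeSuffix only needs the type's shape, so it uses the cheap variant.
  def typeSuffix: String = prettyDataType match {
    case LongType => "L"
    case _        => ""
  }
}
```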
 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark working_on_spark-13253

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11184.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11184


commit dd82428191f8a312ec29f471a4230fa91212eadd
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-02-12T18:22:02Z

avoid checking nullability for complex data type







[GitHub] spark pull request: [SPARK-10777] [SQL] Resolve Aliases in the Gro...

2016-01-27 Thread kevinyu98
GitHub user kevinyu98 opened a pull request:

https://github.com/apache/spark/pull/10967

[SPARK-10777] [SQL] Resolve Aliases in the Group By clause 

@gatorsmile @yhuai @marmbrus @cloud-fan : Hello All, I tried to run the 
failing query with PR 10678 from SPARK-12705 and still got the same failure. 

Actually, for this JIRA's problem, I can recreate it without using ORDER BY 
or a window function. It just needs a SELECT of an aliased column plus an 
aggregate function, with a GROUP BY on the alias. 

The query looks like this:

select a r, sum(b) s FROM testData2 GROUP BY r

(if I replace r in the GROUP BY with a, it works)

I think this JIRA is different from Xiao's JIRA. 

For this JIRA, it looks like the aliases in the GROUP BY clause (r) can't 
be resolved by the rule ResolveReferences. 

Currently, ResolveReferences only handles an aggregate when 
its arguments contain stars; any other aggregate falls into 
the case `case q: LogicalPlan`, which tries to resolve attributes in the child. In 
this case, the GROUP BY contains the alias r while the child is a LogicalRDD 
containing columns a and b; that is why we can't find r in the child.

Here is what the plan looks like:

plan = {Aggregate@9173} "'Aggregate ['r], [a#4 AS r#43,(sum(cast(b#5 as 
bigint)),mode=Complete,isDistinct=false) AS s#44L]\n+- Subquery testData2\n   
+- LogicalRDD [a#4,b#5], MapPartitionsRDD[5] at beforeAll at 
BeforeAndAfterAll.scala:187\n"
 groupingExpressions = {$colon$colon@9176} "::" size = 1
  (0)  = {UnresolvedAttribute@9190} "'r"
 aggregateExpressions = {$colon$colon@9177} "::" size = 2
  (0)  = {Alias@9110} "a#4 AS r#43"
  (1)  = {Alias@9196} "(sum(cast(b#5 as 
bigint)),mode=Complete,isDistinct=false) AS s#44L"
 child = {Subquery@7456} "Subquery testData2\n+- LogicalRDD [a#4,b#5], 
MapPartitionsRDD[5] at beforeAll at BeforeAndAfterAll.scala:187\n"
  alias = {String@9201} "testData2"
  child = {LogicalRDD@9202} "LogicalRDD [a#4,b#5], MapPartitionsRDD[5] at 
beforeAll at BeforeAndAfterAll.scala:187\n"
  _analyzed = false
  resolved = true
  cleanArgs = null
  org$apache$spark$Logging$$log_ = null
  bitmap$0 = 1
  schema = null
  bitmap$0 = false
  origin = {Origin@9203} "Origin(Some(1),Some(27))"
  containsChild = {Set$Set1@9204} "Set$Set1" size = 1
  bitmap$0 = true
 resolved = false
 bitmap$0 = true
 _analyzed = false
 resolved = false

The proposed fix is to add another case for Aggregate: if 
there is an unresolved attribute in the groupingExpressions and all the 
attributes in the aggregateExpressions are resolved, we search for the 
unresolved attribute among the aggregateExpressions first; see the sketch below. 
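
A minimal sketch of that extra case (Catalyst names as in the plan dump above; illustrative, not the merged rule, and ignoring case sensitivity):

```scala
// Resolve a GROUP BY alias against the aliases defined in the SELECT list,
// e.g. resolve r in "GROUP BY r" via "a AS r" in the aggregate expressions.
case agg @ Aggregate(groups, aggs, _)
    if aggs.forall(_.resolved) && groups.exists(!_.resolved) =>
  val resolvedGroups = groups.map {
    case u: UnresolvedAttribute =>
      aggs.collectFirst { case a: Alias if a.name == u.name => a.toAttribute }
        .getOrElse(u)
    case other => other
  }
  agg.copy(groupingExpressions = resolvedGroups)
```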

Thanks for reviewing. 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark working_on_spark-10777

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10967.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10967


commit c2fcaa8e488d12419c7b7c5032ccadab38f20b68
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-01-10T03:21:14Z

window function: Sorting columns are not in Project

commit 5ca463035bc6eaebd15e7cf332faeea157e5593e
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-01-10T03:30:58Z

style fix.

commit da6baf25488767ce6e73538b03f9195bba92b84e
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-01-10T06:23:48Z

code cleaning and address comments.

commit b5de0799650a86b8479eb053d7e3e65b23e5d75b
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-01-10T16:31:09Z

Merge remote-tracking branch 'upstream/master' into sortWindows

commit d164342747502b09686c1802cf9d24d8ed4c899e
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-01-13T06:15:31Z

address comments.

commit 27fcaa5ad6a3b4228ef4fc46b963c1e818d2f5c4
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-01-13T08:30:12Z

address comments.

commit 7fc98e49a26fd03f398b2241b4cfd19e969b770e
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-01-17T05:03:23Z

added a support to more operators.

commit 03112397437cf0f49eea8a347383d9d642e0995b
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-01-17T05:24:14Z

Merge remote-tracking branch 'upstream/master' into sortWindows

commit 522626bbd483054f441d2ca49bc06512901258ea
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-01-17T05:25:56Z

style fix.

commit 26945fa63809a8671461404eb2e661e1605dc196
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-01-17T07:14:3

[GitHub] spark pull request: [SPARK-12987][SQL]Fixing the name resolution i...

2016-02-02 Thread kevinyu98
GitHub user kevinyu98 reopened a pull request:

https://github.com/apache/spark/pull/11009

[SPARK-12987][SQL]Fixing the name resolution in drop column

@marmbrus @cloud-fan @thomas @jayadevan : Hello All: Can you help review 
this code fix? 

This problem comes from drop column: after we drop the old column, we 
construct the new dataframe from the remaining columns. 

The new dataframe uses the schema information to construct the column 
names; in this case, the name is the string ‘a.c’. Since it comes from the 
schema, we should take the name as it is and not do any parsing on it. 
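
A hypothetical reproduction of the scenario (column names illustrative, not from the JIRA):

```scala
// A column whose name literally contains a dot:
val df = Seq((1, 2)).toDF("a.c", "b")

// drop("b") rebuilds the DataFrame by selecting the remaining schema names.
// If "a.c" is parsed as a nested field reference instead of being taken as
// a literal name, resolution fails; taking it verbatim makes this work:
df.drop("b").show()
```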



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark work_on_spark-12987

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11009.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11009


commit ff187c38234686ede0f859f4fe00a8013d8dc86f
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-02-02T00:08:01Z

Fixing the name resolution in drop column







[GitHub] spark pull request: [SPARK-12987][SQL]Fixing the name resolution i...

2016-02-01 Thread kevinyu98
GitHub user kevinyu98 opened a pull request:

https://github.com/apache/spark/pull/11009

[SPARK-12987][SQL]Fixing the name resolution in drop column

@marmbrus @cloud-fan @thomas @jayadevan : Hello All: Can you help review 
this code fix? 

This problem comes from drop column: after we drop the old column, we 
construct the new dataframe from the remaining columns. 

The new dataframe uses the schema information to construct the column 
names; in this case, the name is the string ‘a.c’. Since it comes from the 
schema, we should take the name as it is and not do any parsing on it. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark work_on_spark-12987

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11009.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11009


commit ff187c38234686ede0f859f4fe00a8013d8dc86f
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-02-02T00:08:01Z

Fixing the name resolution in drop column







[GitHub] spark pull request: [SPARK-12987][SQL]Fixing the name resolution i...

2016-02-03 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/11009#issuecomment-179478292
  
@marmbrus @cloud-fan @dilipbiswal @yzhou2001 : I have changed the code based 
on Michael's comments; can you help review it again? 

Not sure why the first test failed; I ran the SQL tests locally and they passed.

[info] Run completed in 3 minutes, 23 seconds.
[info] Total number of tests run: 1553
[info] Suites: completed 110, aborted 0
[info] Tests: succeeded 1553, failed 0, canceled 0, ignored 10, pending 0
[info] All tests passed.
[info] Passed: Total 1553, Failed 0, Errors 0, Passed 1553, Ignored 10







[GitHub] spark pull request: [SPARK-12987][SQL]Fixing the name resolution i...

2016-02-02 Thread kevinyu98
Github user kevinyu98 closed the pull request at:

https://github.com/apache/spark/pull/11009





[GitHub] spark pull request: [SPARK-12987][SQL]Fixing the name resolution i...

2016-02-02 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/11009#issuecomment-178712237
  
@marmbrus @cloud-fan @dilipbiswal @yzhou2001 : it seems this is a 
duplicate of [SPARK-12988][SQL] Can't drop columns that contain dots (#10943), 
so I will close this PR. Thanks.





[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...

2016-04-24 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/12646#issuecomment-214038288
  
Hello: I removed some invalid unit test cases and corrected the error 
messages in the remaining ones. The local tests passed. Can you retest? 
Thanks.





[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...

2016-04-25 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/12646#issuecomment-214465479
  
Hello Dongjoon: Thanks for your comments, I will make the changes. 





[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...

2016-04-25 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/12646#issuecomment-214521095
  
@dongjoon-hyun Hello Dongjoon: I have fixed the comments; let me know if you 
see anything else I need to change. Also, I did git fetch upstream and git merge 
upstream/master, and merged my branch with the latest master. Thanks. 





[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...

2016-04-26 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/12646#issuecomment-214921975
  
Retest please; I just did a rebase to resolve the conflicts. Thanks.





[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...

2016-04-28 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/12646#issuecomment-215444282
  
retest it please. 





[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...

2016-04-23 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/12646#issuecomment-213853674
  
@hvanhovell @yhuai @chenghao-intel @gatorsmile @dilipbiswal @viirya  
@xwu0226 can you help take a look at this PR? Thanks.





[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...

2016-04-23 Thread kevinyu98
GitHub user kevinyu98 opened a pull request:

https://github.com/apache/spark/pull/12646

[SPARK-14878][SQL] Trim characters string function support




## What changes were proposed in this pull request?

This PR enhances the TRIM function support in Spark SQL by allowing the 
specification of trim characters as per the SQL 2003 standard. Below is the 
SQL syntax:

``` SQL
<trim function> ::= TRIM <left paren> <trim operands> <right paren>
<trim operands> ::= [ [ <trim specification> ] [ <trim character> ] FROM ] <trim source>

<trim source> ::= <character value expression>
<trim specification> ::=
  LEADING
| TRAILING
| BOTH
<trim character> ::= <character value expression>
```
Here are the documentation link of support of this feature by other 
mainstream databases.
- **Oracle:** [TRIM 
function](http://docs.oracle.com/javadb/10.6.1.0/ref/rreftrimfunc.html)
- **DB2:** [TRIM scalar 
function](http://www.ibm.com/support/knowledgecenter/SSEPGG_9.8.0/com.ibm.db2.luw.sql.ref.doc/doc/r0023198.html)
- **MySQL:** [Trim 
function](http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_trim)

This PR implements the above enhancement. In the implementation, the 
design principle is to keep the changes to a minimum. Also, the existing trim 
functions (which handle a special case, i.e., trimming space characters) are 
kept unchanged for performance reasons. 
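
Illustrative usage of the proposed syntax (results assume this PR's semantics):

```scala
spark.sql("SELECT TRIM('  Spark  ')").show()                   // "Spark" (default: BOTH, space)
spark.sql("SELECT TRIM(LEADING 'x' FROM 'xxSparkxx')").show()  // "Sparkxx"
spark.sql("SELECT TRIM(TRAILING 'x' FROM 'xxSparkxx')").show() // "xxSpark"
spark.sql("SELECT TRIM(BOTH 'x' FROM 'xxSparkxx')").show()     // "Spark"
```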

## How was this patch tested?
The unit test cases are added in the following files:
- UTF8StringSuite.java
- StringExpressionsSuite.scala
- sql/SQLQuerySuite.scala
- StringFunctionsSuite.scala
- ExpressionToSQLSuite.scala
- execution/SQLQuerySuite.scala

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark spark-14878

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12646.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12646


commit ac718e268d6090fd788e5ec8addb10230cfae16b
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-06T21:38:53Z

draft of seq[expression]

commit c78ae966f30ac2437fe8292d9024adbef2f60860
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-07T02:20:23Z

trim with binaryExpression

commit c749691d532c0f09400c143379f1486c39fbaed8
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-08T06:11:00Z

utf8 string code change

commit 3c014a57daff15bb86995993de1bcdd0ab136fec
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-08T06:15:18Z

Merge branch 'trim-fun4' into trim-seqexp
I am using seq[expression] now

commit ae68402631b8325c5037fed8bf4b45599f8d3000
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-11T21:03:45Z

adding seq(expression)

commit 7bb9770a75ccddd69eaee4c06674aa64220d828b
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-11T22:19:36Z

fix2

commit 9525770c0e5bbba26b17d544653c8722ba261a37
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-15T23:12:51Z

trim character

commit 209bd195a9bc96889908b96ec26318b46a6d
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-16T00:13:26Z

fix style at utf8stringsuite

commit 4a49fcfa9ae102859ea78f0f4ec6d95a0d7855ed
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-16T17:10:16Z

simply trim method

commit 18c17b5bcb1e5574d58f46f1bf55defbbc1647ac
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-18T05:19:11Z

fixing style and simply code

commit 5833d26e8299efa6c47d4281eec7ea23f5dd3ec7
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-18T16:08:41Z

simply trimleft

commit 4e93a5032b352f3c0985d7e0fb362495077efdf7
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-18T16:26:02Z

fixing more styles

commit d6a1cb0dca88629d5d1d9ef8d05d08dcdb1089bc
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-19T16:12:04Z

fixing style3

commit 3b44c5978bd44db986621d3e8511e9165b66926b
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-20T18:06:30Z

adding testcase

commit 7dc5ecaf52936017ac739ba58fe4b7c9036570e6
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-22T01:44:26Z

fixing style 4

commit 25dbb2351bea034ffe300d94ea45c3277d399641
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-22T04:50:00Z

adding trim comments

commit 257303d5099dc405d5845bbcb9a5249d50aff018
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-22T06:42:04Z

fixing more style5

commit 11438c030a6066daf2caf6252b645ae6c464efee
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-22T16:27:50Z

fixing comments

commit de7bff8d1a654919a1f509aaf1c7a5799e1815b4
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-22T21:31:43Z

fixing more styles

commit 18b4a31c687b264b50aa5f5a74455956911f738a
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-22T21:48:00Z

Merge remote-tracking branch 'upstream/master'

commit 4f4d1c8f2801b1e662304ab2b33351173e71b427
Author: Ke

[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...

2016-04-30 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/12646#issuecomment-216018276
  
retest this please.





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-18 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10125#issuecomment-219945586
  
I just did a rebase to resolve the conflict. 






[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-18 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10125#issuecomment-219945597
  
retest it please.





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-19 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10125#issuecomment-220346292
  
retest it please.





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-19 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10125#discussion_r63880607
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -109,6 +109,7 @@ object DecimalType extends AbstractDataType {
   val MAX_SCALE = 38
   val SYSTEM_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 18)
   val USER_DEFAULT: DecimalType = DecimalType(10, 0)
+  val BIGINT_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 0)
--- End diff --

sure, I will do that.





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-19 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10125#discussion_r63900108
  
--- Diff: 
sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java ---
@@ -163,7 +168,9 @@ void validateDataFrameWithBeans(Bean bean, Dataset 
df) {
 Assert.assertEquals(
   new StructField("d", new ArrayType(DataTypes.StringType, true), 
true, Metadata.empty()),
   schema.apply("d"));
-Row first = df.select("a", "b", "c", "d").first();
+Assert.assertEquals(new StructField("e", 
DataTypes.createDecimalType(38,0), true, Metadata.empty()),
+  schema.apply("e"));
+Row first = df.select("a", "b", "c", "d","e").first();
--- End diff --

will add





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-19 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10125#discussion_r63900146
  
--- Diff: 
sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java ---
@@ -182,6 +189,8 @@ void validateDataFrameWithBeans(Bean bean, Dataset 
df) {
 for (int i = 0; i < d.length(); i++) {
   Assert.assertEquals(bean.getD().get(i), d.apply(i));
 }
+  // java.math.BigInteger is equivalent to Spark Decimal(38,0)
+Assert.assertEquals(new BigDecimal(bean.getE()), 
first.getDecimal(4).setScale(0));
--- End diff --

will remove that.





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-19 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10125#issuecomment-220361041
  
I will push the latest one after Jenkins finishes. Thanks very much!





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-19 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10125#issuecomment-220237290
  
@cloud-fan I tried, and it still fails. It didn't go through the 
createDataFrame you added in SparkSession. 
It went through createDataFrame(data: java.util.List[_], beanClass: 
Class[_]): DataFrame 
-> val rows = SQLContext.beansToRows(data.asScala.iterator, beanInfo, 
attrSeq)

beansToRows creates internal rows, and it lives in SQLContext. 

Should we add a RowEncoder into the beansToRows call or leave the code as it 
is? Thanks.

Here is the trace:

scala.MatchError: 1234567 (of class java.math.BigInteger)
at 
org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:326)
at 
org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:323)
at 
org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
at 
org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
at 
org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:892)
at 
org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:892)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at 
org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:892)
at 
org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:890)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.toStream(Iterator.scala:1322)
at scala.collection.AbstractIterator.toStream(Iterator.scala:1336)
at 
scala.collection.TraversableOnce$class.toSeq(TraversableOnce.scala:298)
at scala.collection.AbstractIterator.toSeq(Iterator.scala:1336)
at 
org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:373)
at 
test.org.apache.spark.sql.JavaDataFrameSuite.testCreateDataFrameFromLocalJavaBeans(JavaDataFrameSuite.java:200)
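
For reference, a simplified sketch of the converter branch that throws above (not the exact Spark source; the point is the missing java.math.BigInteger case):

```scala
// DecimalConverter's pattern match: without a BigInteger case, a bean
// property of type java.math.BigInteger falls through to MatchError.
def toCatalystDecimal(scalaValue: Any): Decimal = scalaValue match {
  case d: scala.math.BigDecimal => Decimal(d)
  case d: java.math.BigDecimal  => Decimal(d)
  case d: java.math.BigInteger  => Decimal(d)  // the case this PR adds
}
```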





[GitHub] spark pull request: [HOTFIX][SPARK-15445] Build fails for java 1.7...

2016-05-20 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/13223#issuecomment-220639711
  
@techaddict @srowen @cloud-fan @gatorsmile : Hi Sandeep, thanks for fixing 
this. I didn't realize the method is Java 1.8 only. The code looks good to me. 





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-17 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10125#discussion_r63608338
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
 ---
@@ -321,11 +323,13 @@ object CatalystTypeConverters {
   }
 
   private class DecimalConverter(dataType: DecimalType)
-extends CatalystTypeConverter[Any, JavaBigDecimal, Decimal] {
+  extends CatalystTypeConverter[Any, JavaBigDecimal, Decimal] {
--- End diff --

sure, I will take this out. Thanks.





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-14 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10125#discussion_r63284859
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -129,6 +129,23 @@ final class Decimal extends Ordered[Decimal] with 
Serializable {
   }
 
   /**
+   * Set this Decimal to the given BigInteger value. Will have precision 
38 and scale 0.
+   */
+  def set(BigIntVal: BigInteger): Decimal = {
--- End diff --

I will change it.
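
For reference, a sketch of how the renamed setter might look once the 
parameter follows lower camel case. This is a sketch only; it assumes 
Decimal's long-backed representation with the decimalVal, longVal, 
_precision, and _scale fields:

    /**
     * Set this Decimal to the given BigInteger value. Will have precision 38 and scale 0.
     */
    def set(bigintval: BigInteger): Decimal = {
      // longValueExact() throws ArithmeticException rather than silently
      // truncating when the value does not fit in a Long. (Note: it is a
      // Java 8-only method, the subject of the SPARK-15445 hotfix thread above.)
      this.decimalVal = null
      this.longVal = bigintval.longValueExact()
      this._precision = DecimalType.MAX_PRECISION  // 38
      this._scale = 0
      this
    }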





[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...

2016-05-13 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/12646#issuecomment-219164845
  
retest it.





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-18 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10125#discussion_r63655581
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
 ---
@@ -321,11 +323,13 @@ object CatalystTypeConverters {
   }
 
   private class DecimalConverter(dataType: DecimalType)
-extends CatalystTypeConverter[Any, JavaBigDecimal, Decimal] {
+  extends CatalystTypeConverter[Any, JavaBigDecimal, Decimal] {
--- End diff --

Hello Wenchen: I have to keep the case d: JavaBigInteger => Decimal(d) branch 
there; otherwise, this test case fails on java.math.BigInteger:

    @Test
    public void testCreateDataFrameFromLocalJavaBeans() {
      Bean bean = new Bean();
      List<Bean> data = Arrays.asList(bean);
      Dataset<Row> df = spark.createDataFrame(data, Bean.class);
      validateDataFrameWithBeans(bean, df);
    }

Here is the trace:

scala.MatchError: 1234567 (of class java.math.BigInteger)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:326)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:323)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:892)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:892)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:892)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:890)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.toStream(Iterator.scala:1322)
at scala.collection.AbstractIterator.toStream(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toSeq(TraversableOnce.scala:298)
at scala.collection.AbstractIterator.toSeq(Iterator.scala:1336)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:373)
at test.org.apache.spark.sql.JavaDataFrameSuite.testCreateDataFrameFromLocalJavaBeans(JavaDataFrameSuite.java:200)
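
To make the failure mode concrete, here is a standalone sketch of the 
converter's dispatch, with java.math.BigDecimal standing in for Spark's 
Decimal so it runs outside Spark. Dropping the JavaBigInteger case leaves 
BigInteger inputs unmatched, which raises exactly the scala.MatchError above:

    import java.math.{BigDecimal => JavaBigDecimal, BigInteger => JavaBigInteger}

    // Model of DecimalConverter's pattern match; there is no default case,
    // so any unhandled input type throws scala.MatchError, as in the trace.
    def toCatalystDecimal(value: Any): JavaBigDecimal = value match {
      case d: BigDecimal     => d.bigDecimal          // scala.math.BigDecimal
      case d: JavaBigDecimal => d
      case d: JavaBigInteger => new JavaBigDecimal(d) // the case that must stay
    }

    toCatalystDecimal(new JavaBigInteger("1234567"))  // succeeds with the case above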





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-18 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10125#issuecomment-220037011
  
@cloud-fan can you help take a look? I have made changes based on your 
comments. Thanks.





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-18 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10125#issuecomment-220078687
  
Sure, I will do that.





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-13 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10125#discussion_r63228007
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/ScalaReflectionRelationSuite.scala 
---
@@ -34,7 +34,13 @@ case class ReflectData(
 decimalField: java.math.BigDecimal,
 date: Date,
 timestampField: Timestamp,
-seqInt: Seq[Int])
+seqInt: Seq[Int],
+javaBigInt: java.math.BigInteger,
+scalaBigInt: scala.math.BigInt)
+
+case class ReflectData3(
+ scalaBigInt: scala.math.BigInt
+ )
--- End diff --

I just removed that code.





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-13 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10125#issuecomment-219120180
  
@srowen @davies @cloud-fan I updated the code; can you help review? Sorry 
for the delay. Thanks.





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-13 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10125#issuecomment-219136812
  
retest it please.





[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-13 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10125#issuecomment-219136788
  
I just ran ./dev/mima locally, and it works:

[info] Done packaging.
[info] spark-examples: previous-artifact not set, not analyzing binary compatibility
[info] spark-mllib: found 0 potential binary incompatibilities while checking against org.apache.spark:spark-mllib_2.11:1.6.0  (filtered 500)
[info] spark-sql: found 0 potential binary incompatibilities while checking against org.apache.spark:spark-sql_2.11:1.6.0  (filtered 752)
[success] Total time: 231 s, completed May 13, 2016 12:22:16 PM





[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...

2016-05-04 Thread kevinyu98
GitHub user kevinyu98 opened a pull request:

https://github.com/apache/spark/pull/12893

[Spark-15051] [SQL] Create a TypedColumn alias  

## What changes were proposed in this pull request?

Currently, when we try to create an alias for an aggregator TypedColumn, the 
call falls through to the alias function inherited from Column, which creates 
a column with a TypedAggregateExpression. That expression is unresolved 
because its inputDeserializer is not defined, and the aggregator injects the 
inputDeserializer back only when the column is a TypedColumn, so the 
TypedAggregateExpression remains unresolved. This causes the problem reported 
in [SPARK-15051](https://issues.apache.org/jira/browse/SPARK-15051?jql=project%20%3D%20SPARK).

This PR proposes giving TypedColumn its own alias function, which returns a 
TypedColumn; when that column is used with an aggregator function, the 
aggregator injects the inputDeserializer back.
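
As an illustration, here is a minimal, hypothetical reproduction of the 
scenario this alias fixes. The Data case class and SumA aggregator are 
invented for the sketch; only the TypedColumn alias behavior comes from this 
PR:

    import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
    import org.apache.spark.sql.expressions.Aggregator

    case class Data(a: Long)

    // A trivial typed aggregator, invented for this sketch.
    object SumA extends Aggregator[Data, Long, Long] {
      def zero: Long = 0L
      def reduce(b: Long, d: Data): Long = b + d.a
      def merge(b1: Long, b2: Long): Long = b1 + b2
      def finish(b: Long): Long = b
      def bufferEncoder: Encoder[Long] = Encoders.scalaLong
      def outputEncoder: Encoder[Long] = Encoders.scalaLong
    }

    val spark = SparkSession.builder().master("local").getOrCreate()
    import spark.implicits._

    val ds = Seq(Data(1), Data(2)).toDS()
    // Without this PR, as() falls back to Column.as and the
    // TypedAggregateExpression stays unresolved; with the TypedColumn
    // override, the column stays typed and the query resolves.
    ds.select(SumA.toColumn.as("total")).show()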
 
## How was this patch tested?

Added test cases in DatasetAggregatorSuite.scala and ran the SQL-related 
queries against this patch.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark spark-15051

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12893.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12893


commit 3b44c5978bd44db986621d3e8511e9165b66926b
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-20T18:06:30Z

adding testcase

commit 18b4a31c687b264b50aa5f5a74455956911f738a
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-22T21:48:00Z

Merge remote-tracking branch 'upstream/master'

commit 4f4d1c8f2801b1e662304ab2b33351173e71b427
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-23T16:50:19Z

Merge remote-tracking branch 'upstream/master'
get latest code from upstream

commit f5f0cbed1eb5754c04c36933b374c3b3d2ae4f4e
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-23T22:20:53Z

Merge remote-tracking branch 'upstream/master'
adding trim characters support

commit d8b2edbd13ee9a4f057bca7dcb0c0940e8e867b8
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-25T20:24:33Z

Merge remote-tracking branch 'upstream/master'
get latest code for pr12646

commit 196b6c66b0d55232f427c860c0e7c6876c216a67
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-25T23:45:57Z

Merge remote-tracking branch 'upstream/master'
merge latest code

commit f37a01e005f3e27ae2be056462d6eb6730933ba5
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-27T14:15:06Z

Merge remote-tracking branch 'upstream/master'
merge upstream/master

commit bb5b01fd3abeea1b03315eccf26762fcc23f80c0
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-04-30T23:49:31Z

Merge remote-tracking branch 'upstream/master'

commit 99027fa9cfd3e968bd5dc3808e8af7f8456e1f2d
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-04T03:51:36Z

fix

commit bde5820a181cf84e0879038ad8c4cebac63c1e24
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-04T03:52:31Z

Merge remote-tracking branch 'upstream/master'

commit cc8f34006c916d3a5deb50d3def9d6029b514683
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-04T03:53:53Z

Merge branch 'testing-jira' into spark-15051

commit 0a348415e708464ba101fb0eafa0306c01f23aee
Author: Kevin Yu <q...@us.ibm.com>
Date:   2016-05-04T07:54:00Z

fixing the typeColumn







[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...

2016-05-06 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/10125#issuecomment-217526107
  
@srowen: Sorry for the long delay. I will work on it now.





[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...

2016-05-06 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/12893#issuecomment-217443570
  
test please





[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...

2016-05-04 Thread kevinyu98
Github user kevinyu98 commented on the pull request:

https://github.com/apache/spark/pull/12893#issuecomment-216911919
  
@cloud-fan can you help take a look at this PR? Thanks very much!





[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...

2016-05-04 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12893#discussion_r62150990
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -68,6 +68,25 @@ class TypedColumn[-T, U](
 }
 new TypedColumn[T, U](newExpr, encoder)
   }
+
+  /** Creates a TypedColumn based on the given expression. */
+  private def withExpr(newExpr: Expression): TypedColumn[T, U] =
--- End diff --

Sure, I will remove it.





[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...

2016-05-05 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12893#discussion_r62151072
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -68,6 +68,25 @@ class TypedColumn[-T, U](
 }
 new TypedColumn[T, U](newExpr, encoder)
   }
+
+  /** Creates a TypedColumn based on the given expression. */
+  private def withExpr(newExpr: Expression): TypedColumn[T, U] =
+new TypedColumn[T, U](newExpr, encoder)
+
+  /**
+   * Gives the TypedColumn a name (alias).
+   * If the current TypedColumn has metadata associated with it, this 
metadata will be propagated
+   * to the new column.
+   *
+   * @group expr_ops
+   * @since 2.0.0
+   */
+  override def as(alias: String): TypedColumn[T, U] = withExpr {
--- End diff --

@rxin @cloud-fan: Thanks very much. I have made the changes based on your 
comments. Can you help check?




