[GitHub] spark pull request: [SPARK-5966]
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/9220 [SPARK-5966]

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kevinyu98/spark working_on_spark-5966

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9220.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9220

commit 396c66a3d65a417618e4ce28c548cca6f028abc0
Author: Kevin Yu <q...@us.ibm.com>
Date: 2015-10-22T07:06:13Z
[SPARK-5966]

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5966][WIP]
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/9220#discussion_r42811834

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/impatient.sc
@@ -0,0 +1 @@
+1+1;

Hello Josh: Sorry, it was unintentional. I have deleted the new file and pushed again.
[GitHub] spark pull request: [SPARK-5966][WIP]
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/9220#discussion_r42813223

Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -266,6 +266,11 @@ object SparkSubmit {
     }
   }
+  // SPARK-5966, check deployMode CLUSTER and master local
+  if (clusterManager == LOCAL && deployMode == CLUSTER) {
+    printErrorAndExit("Cluster deploy mode is not compatible with master \"local\"")
+  }

Hello Andrew: Thanks for pointing this out. I have made the code changes, run the tests, and submitted the pull request; can you help review? Kevin
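For illustration, the validation above can be modeled in a few lines of self-contained Scala (the enum-style constants and the Option return type here are simplifications for the sketch, not Spark's actual SparkSubmit internals, which use Int flags and printErrorAndExit):

```scala
// Simplified stand-ins for Spark's cluster-manager and deploy-mode flags
sealed trait ClusterManager
case object LOCAL extends ClusterManager
case object YARN extends ClusterManager

sealed trait DeployMode
case object CLIENT extends DeployMode
case object CLUSTER extends DeployMode

// Returns an error message when the combination is invalid, as in the PR's check
def validate(clusterManager: ClusterManager, deployMode: DeployMode): Option[String] =
  if (clusterManager == LOCAL && deployMode == CLUSTER)
    Some("Cluster deploy mode is not compatible with master \"local\"")
  else None
```

The point of the check is to fail fast at argument-parsing time rather than letting an impossible master/deploy-mode combination surface later as a confusing runtime error.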
[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/9720 [SPARK-11447][SQL] change NullType to StringType during binaryComparison between NullType and StringType

While executing the PromoteStrings rule, if one side of a binaryComparison is StringType and the other side is not, the current code promotes (casts) the StringType side to DoubleType; if the string does not contain a number, the cast yields a null value. So when doing <=> (null-safe equal) with NULL, it will not filter anything, causing the problem reported by this JIRA. I propose these changes through this PR; can you review my code changes? This problem only happens for <=>; other operators work fine.

scala> val filteredDF = df.filter(df("column") > (new Column(Literal(null
filteredDF: org.apache.spark.sql.DataFrame = [column: string]
scala> filteredDF.show
+--+
|column|
+--+
+--+

scala> val filteredDF = df.filter(df("column") === (new Column(Literal(null
filteredDF: org.apache.spark.sql.DataFrame = [column: string]
scala> filteredDF.show
+--+
|column|
+--+
+--+

scala> df.registerTempTable("DF")
scala> sqlContext.sql("select * from DF where 'column' = NULL")
res27: org.apache.spark.sql.DataFrame = [column: string]
scala> res27.show
+--+
|column|
+--+
+--+

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kevinyu98/spark working_on_spark-11447

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9720.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9720

commit b53b85cad4f5fced9ba003351d5a9af1eb5111fc
Author: Kevin Yu <q...@us.ibm.com>
Date: 2015-11-13T18:11:59Z
[SPARK-11447]Check NullType before Promote StringType

commit bb705cae18032fcee8f8a532be464f0a995b27cb
Author: Kevin Yu <q...@us.ibm.com>
Date: 2015-11-15T06:41:48Z
add testcase in ColumnExpressionSuite
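To see why the DoubleType promotion breaks null-safe equality, here is a hedged, self-contained Scala model (Option stands in for SQL NULL; this is an illustration of the three-valued-logic problem, not Spark's evaluation code):

```scala
// Casting a non-numeric string to a number yields NULL (modeled as None here)
def castToDouble(s: String): Option[Double] =
  try Some(s.toDouble) catch { case _: NumberFormatException => None }

// SQL's null-safe equal (<=>) treats two NULLs as equal, unlike plain '='
def nullSafeEqual(a: Option[Double], b: Option[Double]): Boolean = a == b

// After the unwanted cast, a non-numeric value like "abc" becomes None,
// so "abc" <=> NULL would wrongly evaluate to true and pass the filter
```

This mirrors the reported bug: promoting the string side to DoubleType first turns real values into NULLs, and <=> then matches them against a NULL literal.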
[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/9720#discussion_r45069001

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
@@ -280,6 +280,12 @@ object HiveTypeCoercion {
   case p @ BinaryComparison(left @ DateType(), right @ TimestampType()) =>
     p.makeCopy(Array(Cast(left, StringType), Cast(right, StringType)))
+  // Checking NullType
+  case p @ BinaryComparison(left @ StringType(), right @ NullType()) =>
+    p.makeCopy(Array(left, Literal.create(null, StringType)))
+  case p @ BinaryComparison(left @ NullType(), right @ StringType()) =>
+    p.makeCopy(Array(Literal.create(null, StringType), right))
+
   case p @ BinaryComparison(left @ StringType(), right) if right.dataType != StringType =>
     p.makeCopy(Array(Cast(left, DoubleType), right))

@yhuai @cloud-fan: sure, I will not do that. I will try to run more tests to see if anything is broken.
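The intent of the added cases can be sketched with a toy expression tree (the types and names below are invented for illustration and are far simpler than Catalyst's real Expression hierarchy):

```scala
sealed trait DataType
case object StringType extends DataType
case object NullType extends DataType
case object DoubleType extends DataType

case class Expr(dataType: DataType)
case class Comparison(left: Expr, right: Expr)

// Simplified version of the rule's ordering: a String-vs-Null comparison is
// rewritten to compare against a null String literal, BEFORE the general
// fallback that would cast the string side to Double
def promote(c: Comparison): Comparison = c match {
  case Comparison(l @ Expr(StringType), Expr(NullType)) =>
    Comparison(l, Expr(StringType)) // null literal kept as StringType
  case Comparison(Expr(NullType), r @ Expr(StringType)) =>
    Comparison(Expr(StringType), r)
  case Comparison(l @ Expr(StringType), r) if r.dataType != StringType =>
    Comparison(Expr(DoubleType), r) // existing fallback: cast the string side
  case other => other
}
```

Because pattern matching is ordered, placing the NullType cases above the generic StringType case is what prevents the lossy Double cast.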
[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/9720#issuecomment-157287070

@cloud-fan @marmbrus @yhuai @nongli @liancheng: thanks for reviewing the fix.
[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/9720#discussion_r45028090

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
@@ -280,6 +280,12 @@ object HiveTypeCoercion {
   case p @ BinaryComparison(left @ DateType(), right @ TimestampType()) =>
     p.makeCopy(Array(Cast(left, StringType), Cast(right, StringType)))
+  // Checking NullType
+  case p @ BinaryComparison(left @ StringType(), right @ NullType()) =>
+    p.makeCopy(Array(left, Literal.create(null, StringType)))
+  case p @ BinaryComparison(left @ NullType(), right @ StringType()) =>
+    p.makeCopy(Array(Literal.create(null, StringType), right))
+
   case p @ BinaryComparison(left @ StringType(), right) if right.dataType != StringType =>
     p.makeCopy(Array(Cast(left, DoubleType), right))

@cloud-fan: do you want me to open a new JIRA to look into this? The new JIRA/PR will focus on the rules in PromoteStrings and ImplicitTypeCasts, as you suggested, to reduce the redundant rules in PromoteStrings.
[GitHub] spark pull request: Working on spark 11827
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/10125 Working on spark 11827

Hello: Can you help check this PR? I am adding support for java.math.BigInteger in the Java bean code path. I saw that internally Spark converts BigInteger to BigDecimal in ColumnType.scala and CatalystRowConverter.scala. I use a similar approach and convert the BigInteger to BigDecimal.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kevinyu98/spark working_on_spark-11827

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10125.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10125

commit a67722094e8a9d0689ba022eb4f923e28791503e
Author: Kevin Yu <q...@us.ibm.com>
Date: 2015-12-01T16:38:09Z
adding java.math.BigInteger support for java bean

commit a58d92cd85719c6112c5cb0162be9b6104f9ba00
Author: Kevin Yu <q...@us.ibm.com>
Date: 2015-12-02T05:37:56Z
adding test case

commit f400a825f38a2e3559e9b4f63b4e58bdd17c5e3b
Author: Kevin Yu <q...@us.ibm.com>
Date: 2015-12-03T07:38:15Z
modify the JavaDataFrameSuite

commit 3db875a7d9a331d3a200d26338c956d694001046
Author: Kevin Yu <q...@us.ibm.com>
Date: 2015-12-03T07:50:43Z
clean the JavaDataFrameSuite

commit 0807550ae396231a19648c2f4db7e8946544d4a2
Author: Kevin Yu <q...@us.ibm.com>
Date: 2015-12-03T07:57:20Z
working on the JavaDataFrameSuite
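The conversion approach described above can be sketched in a couple of lines (a minimal illustration using the standard java.math constructor; this is not the actual ColumnType.scala or CatalystRowConverter.scala code):

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger}

// A BigInteger is an arbitrary-precision integer; wrapping it in a BigDecimal
// with scale 0 preserves the value exactly, which is why the BigDecimal code
// path can be reused for BigInteger bean fields
def toDecimal(v: BigInteger): JBigDecimal = new JBigDecimal(v)
```

The round trip is lossless: `toDecimal(v).toBigInteger` returns the original value.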
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10125#issuecomment-161707755

Hello Sean: I am sorry, I forgot to update the title and description. I have made the changes; please let me know if anything needs to be changed. Thanks. Kevin
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/10314 [SPARK-12317][SQL]Support configurable value in SQLConf file

Hello: adding configurable values for AUTO_BROADCASTJOIN_THRESHOLD and DEFAULT_SIZE_IN_BYTES.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kevinyu98/spark working_on_spark-12317

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10314.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10314

commit 44e7fba419088a589cbeeb6cbf012f43ad49576c
Author: Kevin Yu <q...@us.ibm.com>
Date: 2015-12-15T19:25:28Z
fix spark jira 12317
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10125#discussion_r47871490

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala
@@ -69,6 +68,16 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   }
 /**
+  * Set this Decimal to the given BigInt. Will have precision 38 and scale 0.
+  */
+ def set(intVal: BigInt): Decimal = {
+   this.decimalVal = null
+   this.longVal = intVal.toLong

Hi Davies: Yes, we need to check the range; otherwise it will cause an overflow. Thanks, I will look into it.
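A range-checked version of the conversion might look like the following sketch (the function name and the choice of IllegalArgumentException are assumptions for illustration; Spark's Decimal stores the value in its own internal representation):

```scala
// Hypothetical sketch: validate that a BigInt fits in a Long before storing it
// as an unscaled long value (precision 38, scale 0). BigInt.toLong silently
// truncates out-of-range values, which is the overflow being discussed.
def setBigInt(intVal: BigInt): Long = {
  require(
    intVal >= BigInt(Long.MinValue) && intVal <= BigInt(Long.MaxValue),
    s"BigInteger $intVal too large to fit in a long-backed Decimal")
  intVal.toLong
}
```

Without the check, `BigInt(Long.MaxValue) + 1` would wrap around to a negative number rather than fail.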
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10125#discussion_r47871566

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala
@@ -362,6 +371,8 @@ object Decimal {
   def apply(value: java.math.BigDecimal): Decimal = new Decimal().set(value)
+  def apply(value: java.math.BigInteger): Decimal = new Decimal().set((value))

Sorry, I forgot to remove the extra parentheses. I will correct it.
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10125#discussion_r47866879

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
@@ -326,6 +326,7 @@ object CatalystTypeConverters {
   val decimal = scalaValue match {
     case d: BigDecimal => Decimal(d)
     case d: JavaBigDecimal => Decimal(d)
+    case d: BigInteger => Decimal(d)

Hi Wenchen: Sure, I will add that.
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10125#discussion_r47866947

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
@@ -75,6 +75,7 @@ object JavaTypeInference {
   case c: Class[_] if c == classOf[java.lang.Boolean] => (BooleanType, true)
   case c: Class[_] if c == classOf[java.math.BigDecimal] => (DecimalType.SYSTEM_DEFAULT, true)
+  case c: Class[_] if c == classOf[java.math.BigInteger] => (DecimalType.SYSTEM_DEFAULT, true)

Yes, I will use (38,0) for the BigInteger; no scale is needed. Thanks.
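The class-to-decimal-type mapping being discussed can be sketched like so (the DecimalType case class and the precision values here are illustrative stand-ins, not Spark's actual API; the point is that BigInteger maps to scale 0 because it carries no fractional part):

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger}

// Illustrative stand-in for a Catalyst-style decimal type
case class DecimalType(precision: Int, scale: Int)

val SystemDefault = DecimalType(38, 18) // assumed default for BigDecimal
val BigIntDecimal = DecimalType(38, 0)  // BigInteger: integral, so scale 0

def inferDecimalType(c: Class[_]): Option[DecimalType] = c match {
  case _ if c == classOf[JBigDecimal] => Some(SystemDefault)
  case _ if c == classOf[BigInteger]  => Some(BigIntDecimal)
  case _ => None
}
```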
[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10299#issuecomment-164603371

Hello Michael: I fixed the Scala style issue; can you help re-run the tests? Thanks.
[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10299#issuecomment-165710284

The failure is because of the changed projection; I will submit an updated patch tomorrow.
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10314#discussion_r47713626

Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
@@ -93,7 +94,7 @@ private[spark] object SQLConf {
   isPublic: Boolean = true): SQLConfEntry[Int] =
   SQLConfEntry(key, defaultValue, { v =>
     try {
-      v.toInt
+      Utils.byteStringAsBytes(v).toInt

Hello Sean: Thanks for your comment. Yes, you are right; there are other settings that are not meant to be memory sizes (like COLUMN_BATCH_SIZE, etc.). There are a couple of approaches; can you suggest which one is preferable for this problem, or suggest a different fix?

1. Document that [g|G|m|M|k|K] means a memory size.
2. Create a new intConf method for AUTO_BROADCASTJOIN_THRESHOLD.
3. Add a rule to parseByteString, e.g. K/KB/M/MB mean 1024 and k/kb/m/mb mean 1000.

Thanks.
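For reference, a minimal sketch of byte-string parsing in the spirit of Utils.byteStringAsBytes (the suffix handling below is an assumption for illustration; the real utility supports more suffixes such as kb/mb/gb and t):

```scala
// Parse strings like "123", "10m", or "1G" into a byte count,
// treating k/m/g as binary (1024-based) units
def byteStringAsBytesSketch(s: String): Long = {
  val str = s.trim.toLowerCase
  val units = Map('k' -> 1024L, 'm' -> 1024L * 1024, 'g' -> 1024L * 1024 * 1024)
  units.get(str.last) match {
    case Some(mult) => str.init.toLong * mult // numeric part times the unit
    case None       => str.toLong             // no suffix: plain byte count
  }
}
```

Note that under this scheme a plain "-1" still parses (the sentinel used to disable broadcast joins), which is one reason the thread discusses a separate conf method for AUTO_BROADCASTJOIN_THRESHOLD.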
[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/10299 [SPARK-12231][SQL]create a combineFilters' projection when we call buildPartitionedTableScan

Hello Michael & All: Here I am submitting another approach to solve this problem; can you verify? I think the problem is related to the change from SPARK-10829. Before that PR, the projects and filters were done inside buildPartitionedTableScan. With that change, the filter expression is divided into three parts, and the filter left outside of the scan (combineFilters) needs a different projection. So the fix is to create a combined projection for the outside filter and beyond. Thanks for your comments.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kevinyu98/spark working_on_spark-12231

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10299.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10299

commit 2333c6d3dffd580529705e33f5ccdc8871670c0f
Author: Kevin Yu <q...@us.ibm.com>
Date: 2015-12-14T19:51:35Z
another approach to fix this problem
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10314#issuecomment-165010421

@watermen thanks for your input. It is a good idea if we decide to go with approach 2 (create a new intConf method for AUTO_BROADCASTJOIN_THRESHOLD). If we decide to go with approach 3, then we may need to change the parseByteString part to distinguish lower and upper case. @srowen what do you think? Thanks.
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10314#issuecomment-165023639

@srowen Thanks Sean, I will create a new method of intConf for AUTO_BROADCASTJOIN_THRESHOLD. Will update the PR soon.
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10314#issuecomment-168854235

@srowen Hello Sean: Sorry for taking so long. Can you review the code? Thanks. Kevin
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurate value fo...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10314#issuecomment-169238048

@viirya @yhuai @srowen @marmbrus @concretevitamin: SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE was added by SPARK-9850 (PR #9276), so I am including Yin. Michael set autoBroadcastJoinThreshold to 10 * 1024 * 1024 through PR #3064, and SPARK-2393 set autoBroadcastJoinThreshold to Int. I am not sure which one to choose either, so I cc the people who introduced these two fields. Thanks for your input.
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurate value fo...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10314#issuecomment-169220626

Hello @viirya: Good point. I just tested it; -1 and -1g have different behavior. It will accept -1, but throw IllegalArgumentException for -1g. Thanks.
[GitHub] spark pull request: [SPARK-12317][SQL] Support units (m,k,g) in SQ...
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/10629 [SPARK-12317][SQL] Support units (m,k,g) in SQLConf

This PR continues from the previously closed PR #10314. In this PR, SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE will accept memory-size string conventions as input. For example, the user can now specify 10g for SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE in the SQLConf file. @marmbrus @srowen: Can you help review these code changes? Thanks.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kevinyu98/spark spark-12317

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10629.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10629

commit a37a05e856b58a13ec13239ffc1a2050563102ea
Author: Kevin Yu <q...@us.ibm.com>
Date: 2016-01-07T04:20:27Z
Support units (m,k,g) in SQLConf
[GitHub] spark pull request: [SPARK-12317][SQL]Support units (m,k,g) in SQL...
Github user kevinyu98 closed the pull request at: https://github.com/apache/spark/pull/10314
[GitHub] spark pull request: [SPARK-12317][SQL] Support units (m,k,g) in SQ...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10629#issuecomment-169902816

@rxin I am so sorry that I didn't reply earlier. The code passed the style check; I copied it from the existing code and thought an indentation of 2 was fine, so I am not sure what to change. But I appreciate your help. Next time, I will raise questions more quickly.
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurate value fo...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10314#issuecomment-169173265

Hello @srowen @marmbrus @viirya: I have made the code changes and changed the title based on the comments. Can you help review the code? Thanks.
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10314#issuecomment-168317362

Hello Sean: Sorry for the delay. Yes, I have made most of the code changes, and I will try to finish up soon and do more testing. Will keep you updated. Thanks, Happy New Year!
[GitHub] spark pull request: [SPARK-12317][SQL]Support units (m,k,g) in SQL...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10314#issuecomment-169438242

Hello @marmbrus, thanks. So you mean I can remove the code change for intMemConf and keep the code for longMemConf for this JIRA? I will make the PR title and description changes. I need to close this PR and open another one; there seem to be some issues from my last git push.
[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10388#issuecomment-166407905 @marmbrus : Can you help take a look at this PR? Thanks for your review.
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10314#issuecomment-166747545 @srowen Hello Sean: I have submitted the new code; can you help review? Thanks a lot.
[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10388#issuecomment-167179672 I deleted the test cases from DataFrameNaFunctionsSuite.scala. I checked the previous failure but am not sure why it failed; it worked when I ran the test locally on my laptop. $ build/sbt "test-only org.apache.spark.sql.thriftserver" .. [success] Total time: 296 s, completed Dec 24, 2015 6:11:20 PM Then I re-ran the sql test bucket, and it seems fine. $ build/sbt sql/test-only [info] Passed: Total 1522, Failed 0, Errors 0, Passed 1522, Ignored 10 [success] Total time: 146 s, completed Dec 24, 2015 6:26:23 PM
[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...
Github user kevinyu98 closed the pull request at: https://github.com/apache/spark/pull/10299
[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10299#issuecomment-165919581 I will create a new PR.
[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/10388 [SPARK-12231][SQL]create a combineFilters' projection when we call buildPartitionedTableScan Hello Michael & All: We had some issues submitting the new code in the other PR (#10299), so we closed that PR and opened this one with the fix. The reason for the previous failure is that the projection for the scan, when there is a filter that is not pushed down (the "left-over" filter), can differ from the original projection in elements or ordering. The approach in this new code is: insert a new Project if the "left-over" filter is nonempty and the (nonempty) projection used for the scan differs from the original projection, which could otherwise produce a different ordering in the projection. We added 3 test cases to cover the previously failing cases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kevinyu98/spark spark-12231 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10388.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10388 commit 2d56ac02eaff10972e5bc46f3b57cff993d60e24 Author: Kevin Yu <q...@us.ibm.com> Date: 2015-12-18T23:31:05Z another approach to fix this problem commit 305739f872ba90ba9ef4f3ef6c4f812b4024d8e9 Author: Kevin Yu <q...@us.ibm.com> Date: 2015-12-18T23:46:37Z update comments
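The approach described in this PR can be illustrated with a minimal, self-contained sketch (not the actual DataSourceStrategy code; `Column` and the helper names here are hypothetical): the scan's projection is widened with the references of the "left-over" filter, and a final Project is needed whenever the widened projection differs from the original in elements or ordering.

```scala
// Simplified model of the projection-combining step. The scan must output
// every column the left-over filter references, so those columns are
// appended to the requested projection before building the scan.
case class Column(name: String)

def combinedProjects(projects: Seq[Column], leftOverFilterRefs: Seq[Column]): Seq[Column] =
  (projects ++ leftOverFilterRefs.filterNot(projects.contains)).distinct

// A final Project restoring the original projection is needed only when the
// left-over filter actually forced the scan projection to change.
def needsFinalProject(projects: Seq[Column], leftOverFilterRefs: Seq[Column]): Boolean =
  leftOverFilterRefs.nonEmpty && combinedProjects(projects, leftOverFilterRefs) != projects
```

For example, projecting only `a` while filtering on `b` widens the scan to `a, b`, so a trailing Project on `a` is required; if the filter only touches columns already projected, no extra Project is added.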
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10314#discussion_r48451909 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -100,6 +101,33 @@ private[spark] object SQLConf { } }, _.toString, doc, isPublic) +def intMemConf( --- End diff -- I will make the changes. I used the code formatter and ran the scalastyle check, and thought it passed the style rules. I will look more carefully next time. Thanks.
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10314#discussion_r48451916 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -100,6 +101,33 @@ private[spark] object SQLConf { } }, _.toString, doc, isPublic) +def intMemConf( +key: String, +defaultValue: Option[Int] = None, +doc: String = "", +isPublic: Boolean = true): SQLConfEntry[Int] = + SQLConfEntry(key, defaultValue, { v => +var isNegative: Boolean = false +try { + isNegative = (v.toInt < 0) +} catch { + case _: Throwable => +} +if (!isNegative) { --- End diff -- ok
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10314#discussion_r48451915 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -100,6 +101,33 @@ private[spark] object SQLConf { } }, _.toString, doc, isPublic) +def intMemConf( +key: String, +defaultValue: Option[Int] = None, +doc: String = "", +isPublic: Boolean = true): SQLConfEntry[Int] = + SQLConfEntry(key, defaultValue, { v => +var isNegative: Boolean = false +try { + isNegative = (v.toInt < 0) +} catch { + case _: Throwable => --- End diff -- Yes, I want to catch the exception and then do nothing. I will make the changes.
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10314#discussion_r48451911 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -100,6 +101,33 @@ private[spark] object SQLConf { } }, _.toString, doc, isPublic) +def intMemConf( +key: String, +defaultValue: Option[Int] = None, +doc: String = "", +isPublic: Boolean = true): SQLConfEntry[Int] = + SQLConfEntry(key, defaultValue, { v => +var isNegative: Boolean = false --- End diff -- ok, I will make the change.
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10314#discussion_r48451918 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -100,6 +101,33 @@ private[spark] object SQLConf { } }, _.toString, doc, isPublic) +def intMemConf( +key: String, +defaultValue: Option[Int] = None, +doc: String = "", +isPublic: Boolean = true): SQLConfEntry[Int] = + SQLConfEntry(key, defaultValue, { v => +var isNegative: Boolean = false +try { + isNegative = (v.toInt < 0) +} catch { + case _: Throwable => +} +if (!isNegative) { + if ((Utils.byteStringAsBytes(v) <= Int.MaxValue.toLong) && --- End diff -- I will put it in a variable. thanks.
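The validation being reviewed in this thread could be sketched roughly as below. This is a hedged, self-contained approximation: `byteStringAsBytes` here is a simplified stand-in for Spark's `Utils.byteStringAsBytes` (plain byte counts plus k/m/g binary suffixes), not the real implementation, and it reflects the reviewers' suggestions (no `catch { case _: Throwable }`, overflow check factored into a variable).

```scala
import scala.util.Try

// Simplified stand-in for Utils.byteStringAsBytes: accepts a plain byte
// count or a number with a k/m/g suffix (binary multiples).
def byteStringAsBytes(v: String): Long = {
  val s = v.trim.toLowerCase
  s.last match {
    case 'k' => s.init.toLong * 1024L
    case 'm' => s.init.toLong * 1024L * 1024L
    case 'g' => s.init.toLong * 1024L * 1024L * 1024L
    case _   => s.toLong
  }
}

// Parse an Int-sized memory config value: reject negative plain numbers
// up front, then reject anything that would overflow Int.
def intMemConf(v: String): Int = {
  if (Try(v.toLong).toOption.exists(_ < 0))
    throw new IllegalArgumentException(s"negative size: $v")
  val bytes = byteStringAsBytes(v) // hold in a variable, as suggested
  require(bytes <= Int.MaxValue.toLong, s"value too large for an Int: $v")
  bytes.toInt
}
```

With this sketch, `intMemConf("2m")` yields a valid Int, while `intMemConf("4g")` is rejected because it exceeds `Int.MaxValue`.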
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10314#discussion_r48451947 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -107,7 +135,7 @@ private[spark] object SQLConf { isPublic: Boolean = true): SQLConfEntry[Long] = SQLConfEntry(key, defaultValue, { v => try { - v.toLong + Utils.byteStringAsBytes(v) --- End diff -- Sorry, I thought it was only one place, but actually there are two places. I will create a new method for SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE. Thanks.
[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10388#discussion_r48231314 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameNaFunctionsSuite.scala --- @@ -194,4 +194,45 @@ class DataFrameNaFunctionsSuite extends QueryTest with SharedSQLContext { assert(out1(4) === Row("Amy", null, null)) assert(out1(5) === Row(null, null, null)) } + + test("Spark-12231: dropna with partitionBy and groupBy") { --- End diff -- You are right, this problem is not related to na.drop. At the time, I was not sure where I could put the test case, so I just kept it here. Thanks for the suggestion about putting it with the other DataSource tests; I will change the test case and look for a place near the DataSource tests. Thanks very much!
[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10388#discussion_r48230827 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -88,16 +88,27 @@ private[sql] object DataSourceStrategy extends Strategy with Logging { s"Selected $selected partitions out of $total, pruned $percentPruned% partitions." } + // need to add projections from combineFilters in + val combineFilter = combineFilters.reduceLeftOption(expressions.And) --- End diff -- Will change the name.
[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10388#discussion_r48230905 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -88,16 +88,27 @@ private[sql] object DataSourceStrategy extends Strategy with Logging { s"Selected $selected partitions out of $total, pruned $percentPruned% partitions." } + // need to add projections from combineFilters in + val combineFilter = combineFilters.reduceLeftOption(expressions.And) + val combinedProjects = combineFilter.map(_.references.toSet.union(projects.toSet).toSeq) +.getOrElse(projects) val scan = buildPartitionedTableScan( l, -projects, +combinedProjects, pushedFilters, t.partitionSpec.partitionColumns, selectedPartitions) - combineFilters -.reduceLeftOption(expressions.And) -.map(execution.Filter(_, scan)).getOrElse(scan) :: Nil + // Add a Projection to guarantee the original projection: + // this is because "combinedProjects" may be different from the + // original "projects", in elements or their ordering --- End diff -- Thanks for the suggestion. I thought == was a 'shallow' compare, but it is 'deep' equality checking. Will make the changes.
[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10388#discussion_r48230807 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameNaFunctionsSuite.scala --- @@ -194,4 +194,45 @@ class DataFrameNaFunctionsSuite extends QueryTest with SharedSQLContext { assert(out1(4) === Row("Amy", null, null)) assert(out1(5) === Row(null, null, null)) } + + test("Spark-12231: dropna with partitionBy and groupBy") { +withTempPath { dir => + val df = sqlContext.range(10) + val df1 = df.withColumn("a", $"id".cast("int")) + df1.write.partitionBy("id").parquet(dir.getCanonicalPath) + val df2 = sqlContext.read.parquet(dir.getCanonicalPath) + val group = df2.na.drop().groupBy().count().collect() --- End diff -- Hi Michael: Sure, will change the testcase.
[GitHub] spark pull request: [SPARK-12231][SQL]create a combineFilters' pro...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10388#discussion_r48230771 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -88,16 +88,27 @@ private[sql] object DataSourceStrategy extends Strategy with Logging { s"Selected $selected partitions out of $total, pruned $percentPruned% partitions." } + // need to add projections from combineFilters in + val combineFilter = combineFilters.reduceLeftOption(expressions.And) + val combinedProjects = combineFilter.map(_.references.toSet.union(projects.toSet).toSeq) --- End diff -- Hi Michael: Sure, will make the changes.
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10314#issuecomment-166560467 @srowen Hello Sean: I am sorry that I haven't been able to submit the PR yet. I will continue working on it tomorrow.
[GitHub] spark pull request: [SPARK-12317][SQL]Support configurable value i...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10314#issuecomment-166326525 @srowen Hello Sean, yes, sorry for the delay. I will submit the updated PR today.
[GitHub] spark pull request #13506: [SPARK-15763][SQL] Support DELETE FILE command na...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/13506#discussion_r65805306 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1441,6 +1441,32 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli } /** + * Delete a file to be downloaded with this Spark job on every node. + * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported + * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, + * use `SparkFiles.get(fileName)` to find its download location. + * + */ + def deleteFile(path: String): Unit = { --- End diff -- Hello Reynold: Sorry, I am afraid that I misunderstood your previous comments. Do you mean the user should take the path from the LIST FILE command output and then use that path as the DELETE FILE command's path? If that is the case, the delete code will be much simpler. Thanks for your advice.
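The interaction suggested in this review (the user takes a path from LIST FILE output and passes it verbatim to DELETE FILE) can be modeled with a toy registry. The registry and method names below are hypothetical stand-ins, not SparkContext's actual bookkeeping:

```scala
import scala.collection.mutable

// Hypothetical file registry: ADD FILE records a URI, LIST FILE returns
// the recorded URIs in insertion order, and DELETE FILE removes an entry
// by the exact URI that LIST FILE printed.
val addedFiles = mutable.LinkedHashSet[String]()

def addFile(path: String): Unit = addedFiles += path
def listFiles(): Seq[String] = addedFiles.toSeq
def deleteFile(path: String): Boolean = addedFiles.remove(path) // true if it was registered
```

Because deletion keys off the exact registered URI, no path resolution is needed in the delete code, which is the simplification discussed above.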
[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/13555#discussion_r66366855 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala --- @@ -625,6 +625,21 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext } } } + + test("SPARK-15804: write out the metadata to parquet file") { +val df = Seq((1, "abc"), (2, "hello")).toDF("a", "b") +val md = new MetadataBuilder().putString("key", "value").build() +val dfWithmeta = df.select('a, 'b.as("b", md)) + +withTempPath { dir => + val path = s"${dir.getCanonicalPath}/data" + dfWithmeta.write.parquet(path) + + readParquetFile(path) { df => +assert(df.schema.json.contains("\"key\":\"value\"")) --- End diff -- @cloud-fan Thanks for your comments, I have changed the test case.
[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/13555#discussion_r66366487 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala --- @@ -625,6 +625,21 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext } } } + + test("SPARK-15804: write out the metadata to parquet file") { +val df = Seq((1, "abc"), (2, "hello")).toDF("a", "b") +val md = new MetadataBuilder().putString("key", "value").build() +val dfWithmeta = df.select('a, 'b.as("b", md)) + +withTempPath { dir => + val path = s"${dir.getCanonicalPath}/data" --- End diff -- ok, I will make the change.
[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/13555#discussion_r66366470 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala --- @@ -625,6 +625,21 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext } } } + + test("SPARK-15804: write out the metadata to parquet file") { +val df = Seq((1, "abc"), (2, "hello")).toDF("a", "b") +val md = new MetadataBuilder().putString("key", "value").build() +val dfWithmeta = df.select('a, 'b.as("b", md)) + +withTempPath { dir => + val path = s"${dir.getCanonicalPath}/data" + dfWithmeta.write.parquet(path) + + readParquetFile(path) { df => +assert(df.schema.json.contains("\"key\":\"value\"")) --- End diff -- sure, I will do that
[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/13555 [SPARK-15804][SQL]Include metadata in the toStructType ## What changes were proposed in this pull request? The helper function 'toStructType' in the AttributeSeq class doesn't include the metadata when it builds each StructField, which causes the problem reported in https://issues.apache.org/jira/browse/SPARK-15804?jql=project%20%3D%20SPARK when Spark writes a dataframe with metadata to the parquet datasource. The code path: when Spark writes the dataframe to the parquet datasource through InsertIntoHadoopFsRelationCommand, it builds the WriteRelation container and calls the helper function 'toStructType' to create the StructType containing the StructFields; the metadata should be included there, otherwise the user-provided metadata is lost. ## How was this patch tested? Added a test case in ParquetQuerySuite.scala You can merge this pull request into a Git repository by running: $ git pull https://github.com/kevinyu98/spark spark-15804 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13555.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13555 commit 3b44c5978bd44db986621d3e8511e9165b66926b Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-20T18:06:30Z adding testcase commit 18b4a31c687b264b50aa5f5a74455956911f738a Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-22T21:48:00Z Merge remote-tracking branch 'upstream/master' commit 4f4d1c8f2801b1e662304ab2b33351173e71b427 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-23T16:50:19Z Merge remote-tracking branch 'upstream/master' get latest code from upstream commit f5f0cbed1eb5754c04c36933b374c3b3d2ae4f4e Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-23T22:20:53Z Merge
remote-tracking branch 'upstream/master' adding trim characters support commit d8b2edbd13ee9a4f057bca7dcb0c0940e8e867b8 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-25T20:24:33Z Merge remote-tracking branch 'upstream/master' get latest code for pr12646 commit 196b6c66b0d55232f427c860c0e7c6876c216a67 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-25T23:45:57Z Merge remote-tracking branch 'upstream/master' merge latest code commit f37a01e005f3e27ae2be056462d6eb6730933ba5 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-27T14:15:06Z Merge remote-tracking branch 'upstream/master' merge upstream/master commit bb5b01fd3abeea1b03315eccf26762fcc23f80c0 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-30T23:49:31Z Merge remote-tracking branch 'upstream/master' commit bde5820a181cf84e0879038ad8c4cebac63c1e24 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-05-04T03:52:31Z Merge remote-tracking branch 'upstream/master' commit 5f7cd96d495f065cd04e8e4cc58461843e45bc8d Author: Kevin Yu <q...@us.ibm.com> Date: 2016-05-10T21:14:50Z Merge remote-tracking branch 'upstream/master' commit 893a49af0bfd153ccb59ba50b63a232660e0eada Author: Kevin Yu <q...@us.ibm.com> Date: 2016-05-13T18:20:39Z Merge remote-tracking branch 'upstream/master' commit 4bbe1fd4a3ebd50338ccbe07dc5887fe289cd53d Author: Kevin Yu <q...@us.ibm.com> Date: 2016-05-17T21:58:14Z Merge remote-tracking branch 'upstream/master' commit b2dd795e23c36cbbd022f07a10c0cf21c85eb421 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-05-18T06:37:13Z Merge remote-tracking branch 'upstream/master' commit 8c3e5da458dbff397ed60fcb68f2a46d87ab7ba4 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-05-18T16:18:16Z Merge remote-tracking branch 'upstream/master' commit a0eaa408e847fbdc3ac5b26348588ee0a1e276c7 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-05-19T04:28:20Z Merge remote-tracking branch 'upstream/master' commit d03c940ed89795fa7fe1d1e9f511363b22cdf19d Author: Kevin Yu <q...@us.ibm.com> Date: 2016-05-19T21:24:33Z Merge remote-tracking 
branch 'upstream/master' commit d728d5e002082e571ac47292226eb8b2614f479f Author: Kevin Yu <q...@us.ibm.com> Date: 2016-05-24T20:32:57Z Merge remote-tracking branch 'upstream/master' commit ea104ddfbf7d180ed1bc53dd9a1005010264aa1f Author: Kevin Yu <q...@us.ibm.com> Date: 2016-05-25T22:52:57Z Merge remote-tracking branch 'upstream/master' commit 6ab1215b781ad0cccf1752f3a625b4e4e371c38e Author: Kevin Yu <q...@us.ibm.com> Date: 2016-05-27T17:18:46Z Merge remote-tracking branch 'upstream/master' commit 0c566533705331697eb1b287b30c8b16111f6fa2 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-06-
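The fix summarized in the PR description above can be modeled with plain case classes. These are simplified stand-ins for Spark's Attribute/StructField, not the real catalyst types; the point is only that the per-field metadata must be carried through the conversion rather than dropped:

```scala
// Simplified model: an attribute carries user-provided metadata, and the
// conversion to struct fields must preserve it.
case class Metadata(entries: Map[String, String])
case class Attribute(name: String, dataType: String, metadata: Metadata)
case class StructField(name: String, dataType: String, metadata: Metadata)

// Before the fix, the metadata argument was effectively omitted (an empty
// Metadata was used); passing a.metadata through preserves annotations.
def toStructType(attrs: Seq[Attribute]): Seq[StructField] =
  attrs.map(a => StructField(a.name, a.dataType, a.metadata))
```

With this, a column annotated with `"key" -> "value"` keeps that annotation in the resulting struct fields, which mirrors what the ParquetQuerySuite test checks for in the written schema.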
[GitHub] spark pull request #13506: [SPARK-15763][SQL] Support DELETE FILE command na...
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/13506 [SPARK-15763][SQL] Support DELETE FILE command natively ## What changes were proposed in this pull request? Hive supports these CLI commands to manage resources [Hive Doc](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli): `ADD/DELETE (FILE(s)|JAR(s))` `LIST (FILE(S) [filepath ...] | JAR(S) [jarpath ...])` but Spark only supports two commands for now: `ADD (FILE | JAR)` `LIST (FILE(S) [filepath ...] | JAR(S) [jarpath ...])` This PR adds the DELETE FILE command to Spark SQL; I will submit another PR for DELETE JAR(s). `DELETE FILE ` ## **Example:** **DELETE FILE** ``` scala> spark.sql("add file /Users/qianyangyu/myfile.txt") res0: org.apache.spark.sql.DataFrame = [] scala> spark.sql("add file /Users/qianyangyu/myfile2.txt") res1: org.apache.spark.sql.DataFrame = [] scala> spark.sql("list file") res2: org.apache.spark.sql.DataFrame = [Results: string] scala> spark.sql("list file").show(false) +--+ |Results | +--+ |file:/Users/qianyangyu/myfile2.txt| |file:/Users/qianyangyu/myfile.txt | +--+ scala> spark.sql("delete file /Users/qianyangyu/myfile.txt") res4: org.apache.spark.sql.DataFrame = [] scala> spark.sql("list file").show(false) +--+ |Results | +--+ |file:/Users/qianyangyu/myfile2.txt| +--+ scala> spark.sql("delete file /Users/qianyangyu/myfile2.txt") res6: org.apache.spark.sql.DataFrame = [] scala> spark.sql("list file").show(false) +---+ |Results| +---+ +---+ ``` ## How was this patch tested? Added test cases in the Spark SQL, Spark shell, and SparkContext suites.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/kevinyu98/spark spark-15763 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13506.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13506
commit 3b44c5978bd44db986621d3e8511e9165b66926b Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-20T18:06:30Z adding testcase
commit 18b4a31c687b264b50aa5f5a74455956911f738a Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-22T21:48:00Z Merge remote-tracking branch 'upstream/master'
commit 4f4d1c8f2801b1e662304ab2b33351173e71b427 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-23T16:50:19Z Merge remote-tracking branch 'upstream/master' get latest code from upstream
commit f5f0cbed1eb5754c04c36933b374c3b3d2ae4f4e Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-23T22:20:53Z Merge remote-tracking branch 'upstream/master' adding trim characters support
commit d8b2edbd13ee9a4f057bca7dcb0c0940e8e867b8 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-25T20:24:33Z Merge remote-tracking branch 'upstream/master' get latest code for pr12646
commit 196b6c66b0d55232f427c860c0e7c6876c216a67 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-25T23:45:57Z Merge remote-tracking branch 'upstream/master' merge latest code
commit f37a01e005f3e27ae2be056462d6eb6730933ba5 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-27T14:15:06Z Merge remote-tracking branch 'upstream/master' merge upstream/master
commit bb5b01fd3abeea1b03315eccf26762fcc23f80c0 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-30T23:49:31Z Merge remote-tracking branch 'upstream/master'
commit bde5820a181cf84e0879038ad8c4cebac63c1e24 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-05-04T03:52:31Z Merge remote-tracking branch 'upstream/master'
commit 5f7cd96d495f065cd04e8e4cc58461843e45bc8d Author: Kevin Yu <q...@us.ibm.com> Date: 2016-05-10T21:14:50Z Merge remote-tracking branch 'upstream/master'
commit 893a49af0bfd153ccb59ba50b63a232660e0eada Author: Kevin Yu <q...@us.ibm.com> Date: 2016-05-13T18:20:39Z Merge remote-tracking branch 'upstream/master'
commit 4bbe1fd4a3ebd50338ccbe07dc5887fe289cd53d Author: Kevin Yu <q...@us.ibm.com> Date: 2016-05-17T21:58:14Z Merge remote-tracking branch 'upstream/master'
commit b2dd795e23c36cbbd022f07a10c0cf21c85eb421 Author: Kevin Yu <q
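Commands like the one proposed above are typically wired into Spark SQL as runnable commands. The following is a hedged sketch, not the PR's actual code, of how a DELETE FILE command could be modeled; `SparkContext.deleteFile` is the counterpart to `addFile` that this PR proposes, and the class name here is hypothetical.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.execution.command.RunnableCommand

// Hypothetical sketch: DELETE FILE as a RunnableCommand. The parser would map
// "DELETE FILE <path>" to this node; run() delegates to the proposed
// SparkContext.deleteFile, mirroring how ADD FILE delegates to addFile.
case class DeleteFileCommand(path: String) extends RunnableCommand {
  override def run(sparkSession: SparkSession): Seq[Row] = {
    sparkSession.sparkContext.deleteFile(path) // proposed API, per this PR
    Seq.empty[Row]                             // DELETE FILE returns no rows
  }
}
```

The empty `Seq[Row]` matches the `res: org.apache.spark.sql.DataFrame = []` results shown in the example session above.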
[GitHub] spark pull request #13506: [SPARK-15763][SQL] Support DELETE FILE command na...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/13506#discussion_r65799162 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1441,6 +1441,32 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli } /** + * Delete a file that was added to be downloaded with this Spark job on every node. + * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported + * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, + * use `SparkFiles.get(fileName)` to find its download location. + * + */ + def deleteFile(path: String): Unit = { --- End diff -- Hi Reynold: Thanks very much for reviewing the code. Yes, it deletes the path from the addedFiles hashmap; the path is turned into the key stored in the map. addFile uses this logic to generate the key it stores in the hashmap, so in order to find the same key, deleteFile has to generate the key the same way. For example, for a local file, addFile prepends a `file:` scheme to the path:

```
scala> spark.sql("add file /Users/qianyangyu/myfile.txt")

scala> spark.sql("list file").show(false)
+----------------------------------+
|Results                           |
+----------------------------------+
|file:/Users/qianyangyu/myfile2.txt|
|file:/Users/qianyangyu/myfile.txt |
+----------------------------------+
```

but for a file in a remote location, it just takes the path as given:

```
scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt")
res17: org.apache.spark.sql.DataFrame = []

scala> spark.sql("list file").show(false)
+---------------------------------------------+
|Results                                      |
+---------------------------------------------+
|file:/Users/qianyangyu/myfile.txt            |
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt|
+---------------------------------------------+
```

If the command is issued from a worker node and adds a local file, the path is added into the NettyStreamManager's hashmap, and that environment's path is used as the key stored in addedFiles. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
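The key-matching behavior described in the comment above can be sketched in plain Scala; `fileKey` is a hypothetical helper written for illustration, not an actual Spark API, and it only mirrors the described behavior (schemeless local paths gain a `file:` prefix, URIs with a scheme are kept as-is).

```scala
import java.io.File
import java.net.URI

// Hedged sketch of the key normalization described above: a path with no URI
// scheme is treated as a local file and resolved to "file:" + absolute path,
// while a path that already carries a scheme (hdfs://, http://, ...) is kept
// unchanged. deleteFile must compute the same key addFile computed.
def fileKey(path: String): String = {
  val scheme = new URI(path).getScheme
  if (scheme == null) "file:" + new File(path).getAbsolutePath // local path
  else path                                                    // remote URI as-is
}
```

Under this sketch, `fileKey("hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt")` returns the path unchanged, matching the `list file` output shown above.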
[GitHub] spark pull request #13506: [SPARK-15763][SQL] Support DELETE FILE command na...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/13506#discussion_r65981961 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1441,6 +1441,32 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli } /** + * Delete a file that was added to be downloaded with this Spark job on every node. + * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported + * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, + * use `SparkFiles.get(fileName)` to find its download location. + * + */ + def deleteFile(path: String): Unit = { --- End diff -- I have updated the deleteFile comments to make them clearer. Thanks for reviewing.
[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/13555#discussion_r66264754 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala --- @@ -625,6 +625,22 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext } } } + + test("SPARK-15804: write out the metadata to parquet file") { +val data = (1, "abc") ::(2, "helloabcde") :: Nil +val df = spark.createDataFrame(data).toDF("a", "b") +val md = new MetadataBuilder().putString("key", "value").build() +val dfWithmeta = df.select(Column("a"), Column("b").as("b", md)) --- End diff -- I will change. Thanks.
[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/13555#discussion_r66264631 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala --- @@ -625,6 +625,22 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext } } } + + test("SPARK-15804: write out the metadata to parquet file") { +val data = (1, "abc") ::(2, "helloabcde") :: Nil +val df = spark.createDataFrame(data).toDF("a", "b") --- End diff -- sure, I will do that.
[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/13555#discussion_r66265069 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala --- @@ -625,6 +625,22 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext } } } + + test("SPARK-15804: write out the metadata to parquet file") { +val data = (1, "abc") ::(2, "helloabcde") :: Nil +val df = spark.createDataFrame(data).toDF("a", "b") +val md = new MetadataBuilder().putString("key", "value").build() +val dfWithmeta = df.select(Column("a"), Column("b").as("b", md)) + +withTempPath { dir => + val path = s"${dir.getCanonicalPath}/data" + dfWithmeta.write.parquet(path) + + readParquetFile(path) { dfwithmeta2 => --- End diff -- ok.
[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/13555#discussion_r66383008 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala --- @@ -625,6 +625,21 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext } } } + + test("SPARK-15804: write out the metadata to parquet file") { +val df = Seq((1, "abc"), (2, "hello")).toDF("a", "b") +val md = new MetadataBuilder().putString("key", "value").build() +val dfWithmeta = df.select('a, 'b.as("b", md)) + +withTempPath { dir => + val path = s"${dir.getCanonicalPath}" --- End diff -- Done, Thanks very much.
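The pattern under review in this test, attaching metadata to a column and expecting it to survive a Parquet round trip, can be sketched outside the test suite. This is a hedged sketch assuming a local SparkSession and a throwaway output path; it mirrors the test's intent rather than reproducing it.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.MetadataBuilder

// Hedged sketch: column metadata written to Parquet and read back.
// The path "/tmp/meta-demo" is an arbitrary example location.
val spark = SparkSession.builder().master("local[*]").appName("meta-demo").getOrCreate()
import spark.implicits._

val df = Seq((1, "abc"), (2, "hello")).toDF("a", "b")
val md = new MetadataBuilder().putString("key", "value").build()
val dfWithMeta = df.select($"a", $"b".as("b", md)) // attach metadata to column b

dfWithMeta.write.mode("overwrite").parquet("/tmp/meta-demo")

// With SPARK-15804's fix, the metadata should survive the round trip:
val readBack = spark.read.parquet("/tmp/meta-demo")
assert(readBack.schema("b").metadata.getString("key") == "value")
```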
[GitHub] spark pull request: [SPARK-10777] [SQL]avoid checking nullability ...
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/11184 [SPARK-10777] [SQL] Avoid checking nullability of complex data types on the typeSuffix code path Hello: When we call the typeSuffix method, it calls dataType and gets LongType for the suffix ("L"). But for complex data types (like CreateArray), dataType also evaluates the children's nullability, which is not necessary for typeSuffix. The proposed fix introduces a prettyDataType in the expression, in parallel to dataType. At the base it defaults to dataType, and for now it is only used by typeSuffix in NamedExpression. So the main changes are in typeSuffix in NamedExpression and in the complex type creators; the rest of the files just override prettyDataType from the abstract class Expression where their dataType method relies on other expressions' dataType. For those complex types, prettyDataType does not try to evaluate nullable but passes a default that is never used by the caller, typeSuffix. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kevinyu98/spark working_on_spark-13253 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11184.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11184 commit dd82428191f8a312ec29f471a4230fa91212eadd Author: Kevin Yu <q...@us.ibm.com> Date: 2016-02-12T18:22:02Z avoid checking nullability for complex data type
[GitHub] spark pull request: [SPARK-10777] [SQL] Resolve Aliases in the Gro...
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/10967 [SPARK-10777] [SQL] Resolve Aliases in the Group By clause @gatorsmile @yhuai @marmbrus @cloud-fan: Hello All, I tried to run the failing query with PR 10678 from SPARK-12705 and still got the same failure. Actually, this JIRA's problem can be recreated without using ORDER BY or a window function. It just needs a select of an aliased column plus an aggregate function, grouped by the alias. The query looks like this:

```sql
SELECT a r, sum(b) s FROM testData2 GROUP BY r
```

(if I replace r in the GROUP BY with a, it works). I think this JIRA is different from Xiao's JIRA. Here, the alias in the GROUP BY clause (r) can't be resolved by the rule ResolveReferences. Currently, ResolveReferences only handles an aggregate whose arguments contain stars; any other aggregate falls into the `case q: LogicalPlan` branch, which tries to resolve the attribute against the child. In this case the GROUP BY contains the alias r, while the child is a LogicalRDD containing columns a and b, which is why r can't be found in the child. Here is what the plan looks like:
```
plan = {Aggregate@9173} "'Aggregate ['r], [a#4 AS r#43,(sum(cast(b#5 as bigint)),mode=Complete,isDistinct=false) AS s#44L]
  +- Subquery testData2
     +- LogicalRDD [a#4,b#5], MapPartitionsRDD[5] at beforeAll at BeforeAndAfterAll.scala:187"
  groupingExpressions (size = 1)
    (0) = {UnresolvedAttribute@9190} "'r"
  aggregateExpressions (size = 2)
    (0) = {Alias@9110} "a#4 AS r#43"
    (1) = {Alias@9196} "(sum(cast(b#5 as bigint)),mode=Complete,isDistinct=false) AS s#44L"
  child = {Subquery@7456} "Subquery testData2"
    alias = "testData2"
    child = {LogicalRDD@9202} "LogicalRDD [a#4,b#5], MapPartitionsRDD[5] at beforeAll at BeforeAndAfterAll.scala:187"
  resolved = false
```

The proposed fix adds another case for aggregates: if there is an unresolved attribute in the groupingExpressions and all the attributes in the aggregateExpressions are resolved, we search for the unresolved attribute among the aggregateExpressions first. Thanks for reviewing.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/kevinyu98/spark working_on_spark-10777 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10967.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10967
commit c2fcaa8e488d12419c7b7c5032ccadab38f20b68 Author: gatorsmile <gatorsm...@gmail.com> Date: 2016-01-10T03:21:14Z window function: Sorting columns are not in Project
commit 5ca463035bc6eaebd15e7cf332faeea157e5593e Author: gatorsmile <gatorsm...@gmail.com> Date: 2016-01-10T03:30:58Z style fix.
commit da6baf25488767ce6e73538b03f9195bba92b84e Author: gatorsmile <gatorsm...@gmail.com> Date: 2016-01-10T06:23:48Z code cleaning and address comments.
commit b5de0799650a86b8479eb053d7e3e65b23e5d75b Author: gatorsmile <gatorsm...@gmail.com> Date: 2016-01-10T16:31:09Z Merge remote-tracking branch 'upstream/master' into sortWindows
commit d164342747502b09686c1802cf9d24d8ed4c899e Author: gatorsmile <gatorsm...@gmail.com> Date: 2016-01-13T06:15:31Z address comments.
commit 27fcaa5ad6a3b4228ef4fc46b963c1e818d2f5c4 Author: gatorsmile <gatorsm...@gmail.com> Date: 2016-01-13T08:30:12Z address comments.
commit 7fc98e49a26fd03f398b2241b4cfd19e969b770e Author: gatorsmile <gatorsm...@gmail.com> Date: 2016-01-17T05:03:23Z added a support to more operators.
commit 03112397437cf0f49eea8a347383d9d642e0995b Author: gatorsmile <gatorsm...@gmail.com> Date: 2016-01-17T05:24:14Z Merge remote-tracking branch 'upstream/master' into sortWindows
commit 522626bbd483054f441d2ca49bc06512901258ea Author: gatorsmile <gatorsm...@gmail.com> Date: 2016-01-17T05:25:56Z style fix.
commit 26945fa63809a8671461404eb2e661e1605dc196 Author: gatorsmile <gatorsm...@gmail.com> Date: 2016-01-17T07:14:3
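The extra resolution case proposed in the message above can be sketched roughly in Catalyst style. This is a simplified, hypothetical form written for illustration, not the PR's actual code: when every aggregate expression is resolved but a grouping expression is not, the rule looks the grouping name up among the select-list aliases first.

```scala
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
import org.apache.spark.sql.catalyst.expressions.Alias
import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan}

// Hedged sketch of the proposed extra case, not the PR's actual code.
object ResolveGroupByAlias {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    case agg @ Aggregate(groupingExprs, aggregateExprs, _)
        if aggregateExprs.forall(_.resolved) && groupingExprs.exists(!_.resolved) =>
      val newGrouping = groupingExprs.map {
        case u: UnresolvedAttribute =>
          // GROUP BY r, where the select list contains "a AS r":
          // substitute the aliased child expression a for r.
          aggregateExprs.collectFirst {
            case Alias(child, name) if name == u.name => child
          }.getOrElse(u)
        case other => other
      }
      agg.copy(groupingExpressions = newGrouping)
  }
}
```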
[GitHub] spark pull request: [SPARK-12987][SQL]Fixing the name resolution i...
GitHub user kevinyu98 reopened a pull request: https://github.com/apache/spark/pull/11009 [SPARK-12987][SQL] Fixing the name resolution in drop column @marmbrus @cloud-fan @thomas @jayadevan: Hello All: Can you help review this code fix? The problem comes from drop column: after we drop the old column, we construct the new dataframe from the remaining columns. The new dataframe uses the schema information to construct the column name; in this case the name is the string 'a.c'. Since the name comes from the schema, we should take it as-is and not do any parsing on it. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kevinyu98/spark work_on_spark-12987 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11009.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11009 commit ff187c38234686ede0f859f4fe00a8013d8dc86f Author: Kevin Yu <q...@us.ibm.com> Date: 2016-02-02T00:08:01Z Fixing the name resolution in drop column
[GitHub] spark pull request: [SPARK-12987][SQL]Fixing the name resolution i...
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/11009 [SPARK-12987][SQL] Fixing the name resolution in drop column @marmbrus @cloud-fan @thomas @jayadevan: Hello All: Can you help review this code fix? The problem comes from drop column: after we drop the old column, we construct the new dataframe from the remaining columns. The new dataframe uses the schema information to construct the column name; in this case the name is the string 'a.c'. Since the name comes from the schema, we should take it as-is and not do any parsing on it. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kevinyu98/spark work_on_spark-12987 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11009.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11009 commit ff187c38234686ede0f859f4fe00a8013d8dc86f Author: Kevin Yu <q...@us.ibm.com> Date: 2016-02-02T00:08:01Z Fixing the name resolution in drop column
[GitHub] spark pull request: [SPARK-12987][SQL]Fixing the name resolution i...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/11009#issuecomment-179478292 @marmbrus @cloud-fan @dilipbiswal @yzhou2001: I have changed the code based on Michael's comments; can you help review it again? Not sure why the first test failed; I ran the SQL tests locally and they passed:

[info] Run completed in 3 minutes, 23 seconds.
[info] Total number of tests run: 1553
[info] Suites: completed 110, aborted 0
[info] Tests: succeeded 1553, failed 0, canceled 0, ignored 10, pending 0
[info] All tests passed.
[info] Passed: Total 1553, Failed 0, Errors 0, Passed 1553, Ignored 10
[GitHub] spark pull request: [SPARK-12987][SQL]Fixing the name resolution i...
Github user kevinyu98 closed the pull request at: https://github.com/apache/spark/pull/11009
[GitHub] spark pull request: [SPARK-12987][SQL]Fixing the name resolution i...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/11009#issuecomment-178712237 @marmbrus @cloud-fan @dilipbiswal @yzhou2001: It seems this is a duplicate of [SPARK-12988][SQL] Can't drop columns that contain dots (#10943), so I will close this PR. Thanks.
[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/12646#issuecomment-214038288 Hello: I removed some invalid unit test cases and corrected the error messages in the unit test cases. They pass the local tests. Can you retest? Thanks.
[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/12646#issuecomment-214465479 Hello Dongjoon: Thanks for your comments, I will make the changes.
[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/12646#issuecomment-214521095 @dongjoon-hyun Hello Dongjoon: I have fixed the comments; let me know if you see anything else I need to change. Also, I did git fetch upstream and git merge upstream/master to merge my branch with the latest master. Thanks.
[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/12646#issuecomment-214921975 Retest please; I just did a rebase to resolve the conflicts. Thanks.
[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/12646#issuecomment-215444282 retest it please.
[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/12646#issuecomment-213853674 @hvanhovell @yhuai @chenghao-intel @gatorsmile @dilipbiswal @viirya @xwu0226 can you help take a look at this PR? Thanks.
[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/12646 [SPARK-14878][SQL] Trim characters string function support What changes were proposed in this pull request? This PR enhances the TRIM function support in Spark SQL by allowing the specification of trim characters as per the SQL 2003 standard. Below is the SQL syntax:

``` SQL
<trim function>      ::= TRIM ( [ [ <trim specification> ] [ <trim character> ] FROM ] <trim source> )
<trim specification> ::= LEADING | TRAILING | BOTH
<trim character>     ::= <character value expression>
<trim source>        ::= <character value expression>
```

Here are the documentation links for support of this feature by other mainstream databases. - **Oracle:** [TRIM function](http://docs.oracle.com/javadb/10.6.1.0/ref/rreftrimfunc.html) - **DB2:** [TRIM scalar function](http://www.ibm.com/support/knowledgecenter/SSEPGG_9.8.0/com.ibm.db2.luw.sql.ref.doc/doc/r0023198.html) - **MySQL:** [TRIM function](http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_trim) This PR implements the above enhancement. In the implementation, the design principle is to keep the changes to a minimum. Also, the existing trim functions (which handle a special case, i.e., trimming space characters) are kept unchanged for performance reasons. How was this patch tested?
The unit test cases are added in the following files:
- UTF8StringSuite.java
- StringExpressionsSuite.scala
- sql/SQLQuerySuite.scala
- StringFunctionsSuite.scala
- ExpressionToSQLSuite.scala
- execution/SQLQuerySuite.scala

You can merge this pull request into a Git repository by running: $ git pull https://github.com/kevinyu98/spark spark-14878 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12646.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12646
commit ac718e268d6090fd788e5ec8addb10230cfae16b Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-06T21:38:53Z draft of seq[expression]
commit c78ae966f30ac2437fe8292d9024adbef2f60860 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-07T02:20:23Z trim with binaryExpression
commit c749691d532c0f09400c143379f1486c39fbaed8 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-08T06:11:00Z utf8 string code change
commit 3c014a57daff15bb86995993de1bcdd0ab136fec Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-08T06:15:18Z Merge branch 'trim-fun4' into trim-seqexp I am using seq[expression] now
commit ae68402631b8325c5037fed8bf4b45599f8d3000 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-11T21:03:45Z adding seq(expression)
commit 7bb9770a75ccddd69eaee4c06674aa64220d828b Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-11T22:19:36Z fix2
commit 9525770c0e5bbba26b17d544653c8722ba261a37 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-15T23:12:51Z trim character
commit 209bd195a9bc96889908b96ec26318b46a6d Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-16T00:13:26Z fix style at utf8stringsuite
commit 4a49fcfa9ae102859ea78f0f4ec6d95a0d7855ed Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-16T17:10:16Z simply trim method
commit 18c17b5bcb1e5574d58f46f1bf55defbbc1647ac Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-18T05:19:11Z fixing style and simply code
commit 5833d26e8299efa6c47d4281eec7ea23f5dd3ec7 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-18T16:08:41Z simply trimleft
commit 4e93a5032b352f3c0985d7e0fb362495077efdf7 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-18T16:26:02Z fixing more styles
commit d6a1cb0dca88629d5d1d9ef8d05d08dcdb1089bc Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-19T16:12:04Z fixing style3
commit 3b44c5978bd44db986621d3e8511e9165b66926b Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-20T18:06:30Z adding testcase
commit 7dc5ecaf52936017ac739ba58fe4b7c9036570e6 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-22T01:44:26Z fixing style 4
commit 25dbb2351bea034ffe300d94ea45c3277d399641 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-22T04:50:00Z adding trim comments
commit 257303d5099dc405d5845bbcb9a5249d50aff018 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-22T06:42:04Z fixing more style5
commit 11438c030a6066daf2caf6252b645ae6c464efee Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-22T16:27:50Z fixing comments
commit de7bff8d1a654919a1f509aaf1c7a5799e1815b4 Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-22T21:31:43Z fixing more styles
commit 18b4a31c687b264b50aa5f5a74455956911f738a Author: Kevin Yu <q...@us.ibm.com> Date: 2016-04-22T21:48:00Z Merge remote-tracking branch 'upstream/master'
commit 4f4d1c8f2801b1e662304ab2b33351173e71b427 Author: Ke
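With the enhancement described in this PR, the trim-character forms become usable from the SQL interface. A brief hedged sketch, assuming a running SparkSession with this patch applied; the expected values in the comments follow SQL 2003 TRIM semantics rather than observed Spark output.

```scala
// Hedged sketch of the SQL-2003 TRIM forms this PR proposes to support.
spark.sql("SELECT TRIM(BOTH 'x' FROM 'xxSparkxx')").show()     // expect: Spark
spark.sql("SELECT TRIM(LEADING 'x' FROM 'xxSparkxx')").show()  // expect: Sparkxx
spark.sql("SELECT TRIM(TRAILING 'x' FROM 'xxSparkxx')").show() // expect: xxSpark

// The existing space-trimming form is kept unchanged for performance:
spark.sql("SELECT TRIM('  hello  ')").show()                   // expect: hello
```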
[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/12646#issuecomment-216018276 retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10125#issuecomment-219945586 I just rebased to resolve the conflict.
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10125#issuecomment-219945597 retest it please.
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10125#issuecomment-220346292 retest it please.
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10125#discussion_r63880607

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -109,6 +109,7 @@ object DecimalType extends AbstractDataType {
 val MAX_SCALE = 38
 val SYSTEM_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 18)
 val USER_DEFAULT: DecimalType = DecimalType(10, 0)
+ val BIGINT_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 0)
--- End diff --

sure, I will do that.
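The new BIGINT_DEFAULT constant maps a java.math.BigInteger to a decimal with maximum precision and zero scale. A minimal self-contained sketch of the reasoning (plain Scala, not Spark code; BigIntDefaultSketch and maxPrecision are illustrative stand-ins for DecimalType.MAX_PRECISION):

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger}

object BigIntDefaultSketch {
  val maxPrecision = 38 // stand-in for DecimalType.MAX_PRECISION

  def main(args: Array[String]): Unit = {
    // A BigInteger carries no fractional part, so wrapping it in a
    // BigDecimal always yields scale 0 -- hence DecimalType(MAX_PRECISION, 0).
    val big = new BigInteger("12345678901234567890")
    val asDecimal = new JBigDecimal(big)
    assert(asDecimal.scale == 0)
    assert(asDecimal.precision <= maxPrecision)
    println(s"scale=${asDecimal.scale}, precision=${asDecimal.precision}")
  }
}
```

Any integer value of up to 38 digits therefore fits a (MAX_PRECISION, 0) decimal exactly.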
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10125#discussion_r63900108

--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java ---
@@ -163,7 +168,9 @@ void validateDataFrameWithBeans(Bean bean, Dataset df) {
 Assert.assertEquals(
 new StructField("d", new ArrayType(DataTypes.StringType, true), true, Metadata.empty()),
 schema.apply("d"));
-Row first = df.select("a", "b", "c", "d").first();
+Assert.assertEquals(new StructField("e", DataTypes.createDecimalType(38,0), true, Metadata.empty()),
+ schema.apply("e"));
+Row first = df.select("a", "b", "c", "d","e").first();
--- End diff --

will add
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10125#discussion_r63900146

--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java ---
@@ -182,6 +189,8 @@ void validateDataFrameWithBeans(Bean bean, Dataset df) {
 for (int i = 0; i < d.length(); i++) {
 Assert.assertEquals(bean.getD().get(i), d.apply(i));
 }
+ // java.math.BigInteger is equivalent to Spark Decimal(38,0)
+Assert.assertEquals(new BigDecimal(bean.getE()), first.getDecimal(4).setScale(0));
--- End diff --

will remove that.
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10125#issuecomment-220361041 I will push the latest version after Jenkins finishes. Thanks very much!
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10125#issuecomment-220237290

@cloud-fan I tried, and it still fails. It didn't go through the createDataFrame you added in SparkSession; it went through createDataFrame(data: java.util.List[_], beanClass: Class[_]): DataFrame -> val rows = SQLContext.beansToRows(data.asScala.iterator, beanInfo, attrSeq). beansToRows builds the internal rows, and it belongs to SQLContext. Should we add a RowEncoder into the beansToRows call, or leave the code as it is? Thanks.

Here is the trace:

scala.MatchError: 1234567 (of class java.math.BigInteger)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:326)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:323)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:892)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:892)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:892)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:890)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.toStream(Iterator.scala:1322)
at scala.collection.AbstractIterator.toStream(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toSeq(TraversableOnce.scala:298)
at scala.collection.AbstractIterator.toSeq(Iterator.scala:1336)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:373)
at test.org.apache.spark.sql.JavaDataFrameSuite.testCreateDataFrameFromLocalJavaBeans(JavaDataFrameSuite.java:200)
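The scala.MatchError in the trace above comes from a type dispatch that has no clause for java.math.BigInteger. A minimal stand-in (plain Scala, NOT Spark's actual DecimalConverter; the object and method names here are illustrative) reproduces the failure mode:

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger}

object ConverterSketch {
  // A match that only handles java.math.BigDecimal: handing it a
  // BigInteger falls through every case and raises scala.MatchError.
  def toCatalystImpl(value: Any): JBigDecimal = value match {
    case d: JBigDecimal => d
    // no `case d: BigInteger => ...` clause, hence the MatchError
  }

  def main(args: Array[String]): Unit = {
    try toCatalystImpl(new BigInteger("1234567"))
    catch {
      // prints: scala.MatchError: 1234567 (of class java.math.BigInteger)
      case e: MatchError => println(s"scala.MatchError: ${e.getMessage}")
    }
  }
}
```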
[GitHub] spark pull request: [HOTFIX][SPARK-15445] Build fails for java 1.7...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/13223#issuecomment-220639711 @techaddict @srowen @cloud-fan @gatorsmile: Hi Sandeep, thanks for fixing this. I didn't realize the method is Java 1.8 only. The code looks good to me.
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10125#discussion_r63608338

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala ---
@@ -321,11 +323,13 @@ object CatalystTypeConverters {
 }
 private class DecimalConverter(dataType: DecimalType)
-extends CatalystTypeConverter[Any, JavaBigDecimal, Decimal] {
+ extends CatalystTypeConverter[Any, JavaBigDecimal, Decimal] {
--- End diff --

sure, I will take this out. Thanks.
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10125#discussion_r63284859

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -129,6 +129,23 @@ final class Decimal extends Ordered[Decimal] with Serializable {
 }
 /**
+ * Set this Decimal to the given BigInteger value. Will have precision 38 and scale 0.
+ */
+ def set(BigIntVal: BigInteger): Decimal = {
--- End diff --

I will change it.
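The setter under review can be sketched on a toy class (not Spark's Decimal; ToyDecimal and its members are hypothetical, and the parameter is renamed to the camelCase form the review asks for) to show the intended behavior, scale fixed at 0:

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger}

// Toy Decimal-like holder, illustrating the setter only; Spark's real
// Decimal stores values differently and enforces precision limits.
class ToyDecimal {
  private var value: JBigDecimal = JBigDecimal.ZERO

  /** Set this decimal to the given BigInteger value; the result has scale 0. */
  def set(bigIntVal: BigInteger): ToyDecimal = {
    value = new JBigDecimal(bigIntVal) // a BigInteger has no fraction, so scale == 0
    this
  }

  def toJavaBigDecimal: JBigDecimal = value
}
```

Usage: new ToyDecimal().set(new BigInteger("1234567")).toJavaBigDecimal has scale 0 and precision 7.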
[GitHub] spark pull request: [SPARK-14878][SQL] Trim characters string func...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/12646#issuecomment-219164845 retest it.
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10125#discussion_r63655581

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala ---
@@ -321,11 +323,13 @@ object CatalystTypeConverters {
 }
 private class DecimalConverter(dataType: DecimalType)
-extends CatalystTypeConverter[Any, JavaBigDecimal, Decimal] {
+ extends CatalystTypeConverter[Any, JavaBigDecimal, Decimal] {
--- End diff --

Hello Wenchen: I have to keep case d: JavaBigInteger => Decimal(d) there; otherwise this test case fails with java.math.BigInteger:

@Test
public void testCreateDataFrameFromLocalJavaBeans() {
  Bean bean = new Bean();
  List data = Arrays.asList(bean);
  Dataset df = spark.createDataFrame(data, Bean.class);
  validateDataFrameWithBeans(bean, df);
}

Here is the trace:

scala.MatchError: 1234567 (of class java.math.BigInteger)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:326)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:323)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:892)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:892)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:892)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:890)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.toStream(Iterator.scala:1322)
at scala.collection.AbstractIterator.toStream(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toSeq(TraversableOnce.scala:298)
at scala.collection.AbstractIterator.toSeq(Iterator.scala:1336)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:373)
at test.org.apache.spark.sql.JavaDataFrameSuite.testCreateDataFrameFromLocalJavaBeans(JavaDataFrameSuite.java:200)
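The case clause being defended above can be sketched in isolation (a stand-in for Spark's DecimalConverter with illustrative names, not the real class): with a BigInteger clause present, the bean field converts instead of raising MatchError.

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger}

object ConverterWithBigInteger {
  def toCatalystImpl(value: Any): JBigDecimal = value match {
    case d: JBigDecimal => d
    // the kept clause: a java.math.BigInteger is accepted and widened
    case d: BigInteger  => new JBigDecimal(d)
    case other => throw new IllegalArgumentException(s"cannot convert $other")
  }

  def main(args: Array[String]): Unit =
    // The value from the failing bean test now converts cleanly.
    println(toCatalystImpl(new BigInteger("1234567"))) // prints 1234567
}
```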
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10125#issuecomment-220037011 @cloud-fan can you help take a look? I have made changes based on your comments. Thanks.
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10125#issuecomment-220078687 sure, I will do that.
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10125#discussion_r63228007

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/ScalaReflectionRelationSuite.scala ---
@@ -34,7 +34,13 @@ case class ReflectData(
 decimalField: java.math.BigDecimal,
 date: Date,
 timestampField: Timestamp,
-seqInt: Seq[Int])
+seqInt: Seq[Int],
+javaBigInt: java.math.BigInteger,
+scalaBigInt: scala.math.BigInt)
+
+case class ReflectData3(
+ scalaBigInt: scala.math.BigInt
+ )
--- End diff --

I just removed that code.
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10125#issuecomment-219120180 @srowen @davies @cloud-fan I updated the code, can you help review? Sorry for the delay. Thanks.
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10125#issuecomment-219136812 retest it please.
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10125#issuecomment-219136788

I just ran ./dev/mima locally; it works:

[info] Done packaging.
[info] spark-examples: previous-artifact not set, not analyzing binary compatibility
[info] spark-mllib: found 0 potential binary incompatibilities while checking against org.apache.spark:spark-mllib_2.11:1.6.0 (filtered 500)
[info] spark-sql: found 0 potential binary incompatibilities while checking against org.apache.spark:spark-sql_2.11:1.6.0 (filtered 752)
[success] Total time: 231 s, completed May 13, 2016 12:22:16 PM
[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/12893 [Spark-15051] [SQL] Create a TypedColumn alias

## What changes were proposed in this pull request?

Currently, when we create an alias on an aggregator TypedColumn, the call resolves to the alias function inherited from Column, which builds a plain column around a TypedAggregateExpression. That expression stays unresolved because its inputDeserializer is not defined, and the aggregator function injects the inputDeserializer back only when it receives a TypedColumn. As a result the TypedAggregateExpression remains unresolved, causing the problem reported in [15051](https://issues.apache.org/jira/browse/SPARK-15051?jql=project%20%3D%20SPARK). This PR proposes giving TypedColumn its own alias function that returns a TypedColumn, so that an aggregator function used with the aliased column can still inject the inputDeserializer.

## How was this patch tested?

Added test cases in DatasetAggregatorSuite.scala and ran the SQL-related queries against this patch.
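The shape of the fix described above can be sketched with a toy class pair (not Spark's real Column/TypedColumn; all names and the string-based expression are illustrative): overriding the alias method in the subclass to return the subclass keeps TypedColumn-specific code paths reachable after aliasing.

```scala
// Toy model only: a base class whose `as` returns the base type, and a
// subclass whose `as` override narrows the return type (covariant return
// types are legal in Scala), preserving the subtype through aliasing.
class Column(val expr: String) {
  def as(alias: String): Column = new Column(expr + " AS " + alias)
}

class TypedColumn(e: String) extends Column(e) {
  override def as(alias: String): TypedColumn = new TypedColumn(expr + " AS " + alias)
}

object AliasSketch {
  // True when aliasing preserved the TypedColumn subtype.
  def preservesType(c: Column): Boolean = c.as("total").isInstanceOf[TypedColumn]

  def main(args: Array[String]): Unit =
    println(preservesType(new TypedColumn("agg(value)"))) // prints true
}
```

In the toy model, code that pattern-matches on TypedColumn (standing in for the aggregator's inputDeserializer injection) still fires after the alias, which is the behavior the PR restores.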
You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark spark-15051

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12893.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #12893

Commits (all by Kevin Yu <q...@us.ibm.com>):

commit 3b44c5978bd44db986621d3e8511e9165b66926b (2016-04-20T18:06:30Z): adding testcase
commit 18b4a31c687b264b50aa5f5a74455956911f738a (2016-04-22T21:48:00Z): Merge remote-tracking branch 'upstream/master'
commit 4f4d1c8f2801b1e662304ab2b33351173e71b427 (2016-04-23T16:50:19Z): Merge remote-tracking branch 'upstream/master'; get latest code from upstream
commit f5f0cbed1eb5754c04c36933b374c3b3d2ae4f4e (2016-04-23T22:20:53Z): Merge remote-tracking branch 'upstream/master'; adding trim characters support
commit d8b2edbd13ee9a4f057bca7dcb0c0940e8e867b8 (2016-04-25T20:24:33Z): Merge remote-tracking branch 'upstream/master'; get latest code for pr12646
commit 196b6c66b0d55232f427c860c0e7c6876c216a67 (2016-04-25T23:45:57Z): Merge remote-tracking branch 'upstream/master'; merge latest code
commit f37a01e005f3e27ae2be056462d6eb6730933ba5 (2016-04-27T14:15:06Z): Merge remote-tracking branch 'upstream/master'; merge upstream/master
commit bb5b01fd3abeea1b03315eccf26762fcc23f80c0 (2016-04-30T23:49:31Z): Merge remote-tracking branch 'upstream/master'
commit 99027fa9cfd3e968bd5dc3808e8af7f8456e1f2d (2016-05-04T03:51:36Z): fix
commit bde5820a181cf84e0879038ad8c4cebac63c1e24 (2016-05-04T03:52:31Z): Merge remote-tracking branch 'upstream/master'
commit cc8f34006c916d3a5deb50d3def9d6029b514683 (2016-05-04T03:53:53Z): Merge branch 'testing-jira' into spark-15051
commit 0a348415e708464ba101fb0eafa0306c01f23aee (2016-05-04T07:54:00Z): fixing the typeColumn
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10125#issuecomment-217526107 @srowen: sorry for the long delay. I will work on it now.
[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/12893#issuecomment-217443570 test please
[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/12893#issuecomment-216911919 @cloud-fan can you help take a look at this pr? Thanks very much!
[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12893#discussion_r62150990

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -68,6 +68,25 @@ class TypedColumn[-T, U](
 }
 new TypedColumn[T, U](newExpr, encoder)
 }
+
+ /** Creates a TypedColumn based on the given expression. */
+ private def withExpr(newExpr: Expression): TypedColumn[T, U] =
--- End diff --

Sure, I will remove it.
[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12893#discussion_r62151072

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -68,6 +68,25 @@ class TypedColumn[-T, U](
 }
 new TypedColumn[T, U](newExpr, encoder)
 }
+
+ /** Creates a TypedColumn based on the given expression. */
+ private def withExpr(newExpr: Expression): TypedColumn[T, U] =
+new TypedColumn[T, U](newExpr, encoder)
+
+ /**
+ * Gives the TypedColumn a name (alias).
+ * If the current TypedColumn has metadata associated with it, this metadata will be propagated
+ * to the new column.
+ *
+ * @group expr_ops
+ * @since 2.0.0
+ */
+ override def as(alias: String): TypedColumn[T, U] = withExpr {
--- End diff --

@rxin @cloud-fan: Thanks very much. I have made the changes based on your comments. Can you help check?