[GitHub] spark pull request #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/14426 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14426 Thank you for the guide.
[GitHub] spark pull request #15546: [SPARK-17982][SQL] SQLBuilder should wrap the gen...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15546#discussion_r86680401 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala --- @@ -45,7 +45,7 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils { // Used for generating new query answer files by saving private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1" - private val goldenSQLPath = getTestResourcePath("sqlgen") + private val goldenSQLPath = "src/test/resources/sqlgen/" --- End diff -- Sure. I'll make a PR for this.
[GitHub] spark issue #15546: [SPARK-17982][SQL] SQLBuilder should wrap the generated ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15546 Thank you for the review, @gatorsmile. I'll add the failed plans, too.
[GitHub] spark pull request #15789: [SPARK-18292][SQL] LogicalPlanToSQLSuite should n...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/15789 [SPARK-18292][SQL] LogicalPlanToSQLSuite should not use resource dependent path for golden file generation

## What changes were proposed in this pull request?

`LogicalPlanToSQLSuite` uses the following command to update the existing answer files.

```bash
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "hive/test-only *LogicalPlanToSQLSuite"
```

However, after introducing `getTestResourcePath`, it fails to update the previous golden answer files in the predefined directory. This issue aims to fix that by restoring the original path.

```scala
- private val goldenSQLPath = getTestResourcePath("sqlgen")
+ private val goldenSQLPath = "src/test/resources/sqlgen/"
```

## How was this patch tested?

This is a test-suite update, verified manually.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-18292 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15789.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15789 commit 725cd454fb44e19f773934724420bb01adf98a28 Author: Dongjoon Hyun Date: 2016-11-06T07:59:49Z [SPARK-18292][SQL] LogicalPlanToSQLSuite should not use resource dependent path for golden file generation
[GitHub] spark issue #15789: [SPARK-18292][SQL] LogicalPlanToSQLSuite should not use ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15789 Hi, @gatorsmile. I created a PR for the `LogicalPlanToSQLSuite` part of #15546.
[GitHub] spark issue #15789: [SPARK-18292][SQL] LogicalPlanToSQLSuite should not use ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15789 Oh, you mean the following. I'll fix this PR again. Sorry, @gatorsmile.

```scala
private val baseResourcePath = {
  // If regenerateGoldenFiles is true, we must be running this in SBT and we use hard-coded
  // relative path. Otherwise, we use classloader's getResource to find the location.
  if (regenerateGoldenFiles) {
    java.nio.file.Paths.get("src", "test", "resources", "sql-tests").toFile
  } else {
    val res = getClass.getClassLoader.getResource("sql-tests")
    new File(res.getFile)
  }
}
```
[GitHub] spark issue #15789: [SPARK-18292][SQL] LogicalPlanToSQLSuite should not use ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15789 It's fixed like `SQLQueryTestSuite`.
[GitHub] spark issue #15789: [SPARK-18292][SQL] LogicalPlanToSQLSuite should not use ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15789 Four `randomized aggregation test` failures occurred due to `OutOfMemoryError`, but they seem unrelated to this change. I'll wait for the next test run.

```
[info] - randomized aggregation test - [typed, with partial + safe] - with grouping keys - with non-empty input *** FAILED *** (1 second, 512 milliseconds)
[info] org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 1897.0 failed 1 times, most recent failure: Lost task 4.0 in stage 1897.0 (TID 6408, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
```
[GitHub] spark issue #15789: [SPARK-18292][SQL] LogicalPlanToSQLSuite should not use ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15789 Retest this please
[GitHub] spark issue #15789: [SPARK-18292][SQL] LogicalPlanToSQLSuite should not use ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15789 Retest this please
[GitHub] spark issue #15789: [SPARK-18292][SQL] LogicalPlanToSQLSuite should not use ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15789 Retest this please.
[GitHub] spark issue #15789: [SPARK-18292][SQL] LogicalPlanToSQLSuite should not use ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15789 Retest this please
[GitHub] spark issue #15546: [SPARK-17982][SQL] SQLBuilder should wrap the generated ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15546 Hi, @gatorsmile. The PR has been updated according to your advice.
- Removed the `goldenSQLPath` change from this PR.
- Updated the PR description with the target fixed cases and the detailed plans, including the `SubqueryAlias` cases.
[GitHub] spark pull request #15789: [SPARK-18292][SQL] LogicalPlanToSQLSuite should n...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15789#discussion_r86691156 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala --- @@ -46,7 +46,15 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils { // Used for generating new query answer files by saving private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1" - private val goldenSQLPath = getTestResourcePath("sqlgen") + private val goldenSQLPath = { +// If regenerateGoldenFiles is true, we must be running this in SBT and we use hard-coded +// relative path. Otherwise, we use classloader's getResource to find the location. +if (regenerateGoldenFiles) { + java.nio.file.Paths.get("src", "test", "resources", "sqlgen").toFile.getCanonicalPath --- End diff -- Sure. I tested this locally, too. - Changed the query to verify that regeneration happens in the correct directory.
[GitHub] spark pull request #15789: [SPARK-18292][SQL] LogicalPlanToSQLSuite should n...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15789#discussion_r86691196 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala --- @@ -46,7 +46,15 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils { // Used for generating new query answer files by saving private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1" - private val goldenSQLPath = getTestResourcePath("sqlgen") + private val goldenSQLPath = { +// If regenerateGoldenFiles is true, we must be running this in SBT and we use hard-coded +// relative path. Otherwise, we use classloader's getResource to find the location. +if (regenerateGoldenFiles) { + java.nio.file.Paths.get("src", "test", "resources", "sqlgen").toFile.getCanonicalPath --- End diff -- Thank you for the review, @srowen. For this one, there was a request in #15546 to follow the approach of `SQLQueryTestSuite`. - https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala#L88-L93
[GitHub] spark issue #15789: [SPARK-18292][SQL] LogicalPlanToSQLSuite should not use ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15789 Thank you, @srowen and @gatorsmile. I replied to those two comments. For `java.nio.file.Paths.get("src", "test", "resources", "sqlgen").toFile.getCanonicalPath`, I was a little confused, so I haven't updated it yet. Could you advise me again on how to finalize it?
[GitHub] spark pull request #15789: [SPARK-18292][SQL] LogicalPlanToSQLSuite should n...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15789#discussion_r86691489 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala --- @@ -46,7 +46,15 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils { // Used for generating new query answer files by saving private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1" - private val goldenSQLPath = getTestResourcePath("sqlgen") + private val goldenSQLPath = { +// If regenerateGoldenFiles is true, we must be running this in SBT and we use hard-coded +// relative path. Otherwise, we use classloader's getResource to find the location. +if (regenerateGoldenFiles) { + java.nio.file.Paths.get("src", "test", "resources", "sqlgen").toFile.getCanonicalPath --- End diff -- Yep. I tested this in my private branch in a separate folder. When I changed the query for `range`, the output file changed as follows.

```bash
SPARK-18292-SQLGEN:SPARK-18292$ pwd
/Users/dhyun/SPARK-18292-SQLGEN
SPARK-18292-SQLGEN:SPARK-18292$ git branch
* SPARK-18292
  master
SPARK-18292-SQLGEN:SPARK-18292$ pwd
/Users/dhyun/SPARK-18292-SQLGEN
SPARK-18292-SQLGEN:SPARK-18292$ git status
On branch SPARK-18292
Changes not staged for commit:
  (use "git add ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)
  modified: sql/hive/src/test/resources/sqlgen/range.sql
  modified: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala
```
[GitHub] spark issue #15789: [SPARK-18292][SQL] LogicalPlanToSQLSuite should not use ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15789 Thank you, @srowen and @gatorsmile. I replied to the comment. For `java.nio.file.Paths.get("src", "test", "resources", "sqlgen").toFile.getCanonicalPath`, could you advise me again on whether I need to change it?
[GitHub] spark issue #15789: [SPARK-18292][SQL] LogicalPlanToSQLSuite should not use ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15789 Thank you, @srowen!
[GitHub] spark issue #15789: [SPARK-18292][SQL] LogicalPlanToSQLSuite should not use ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15789 Hi, @rxin. Yep. The actual issue is that the current code generates the new golden files in the following folder, so we cannot identify which files changed and cannot easily commit the newly generated files.

```
.../spark/sql/hive/target/scala-2.11/test-classes/sqlgen
```

When we added this module initially, we faced exactly the same issue and decided to use the absolute path and to assume the following command is run from the source tree root.

```
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "hive/test-only *LogicalPlanToSQLSuite"
```
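The path-selection logic discussed above can be sketched as follows. This is an illustrative Python sketch, not Spark's actual Scala code: the function name and the `classpath_root` default are hypothetical stand-ins, and the classloader lookup is simplified to a plain path join. It only shows why regeneration must target the source tree rather than the compiled resources.

```python
import os

def golden_sql_path(regenerate, classpath_root="target/test-classes"):
    """Pick where golden files live: source tree when regenerating, classpath otherwise."""
    if regenerate:
        # Hard-coded source-tree path, so regenerated files show up in `git status`
        # and can be committed (run from the source tree root).
        return os.path.join("src", "test", "resources", "sqlgen")
    # Normal test runs read the copy that the build placed on the classpath;
    # writing there would hide changes under target/ where git ignores them.
    return os.path.join(classpath_root, "sqlgen")

# Regeneration writes into the source tree...
assert golden_sql_path(True) == os.path.join("src", "test", "resources", "sqlgen")
# ...while ordinary runs resolve against the compiled test resources.
assert golden_sql_path(False).endswith(os.path.join("test-classes", "sqlgen"))
```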
[GitHub] spark issue #15789: [SPARK-18292][SQL] LogicalPlanToSQLSuite should not use ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15789 Thank you, @srowen!
[GitHub] spark issue #16579: [SPARK-19218][SQL] SET command should show a result sort...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 Retest this please
[GitHub] spark pull request #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16605#discussion_r96469606 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -3073,7 +3073,12 @@ object functions { */ def udf[RT: TypeTag, A1: TypeTag](f: Function1[A1, RT]): UserDefinedFunction = { val inputTypes = Try(ScalaReflection.schemaFor(typeTag[A1]).dataType :: Nil).toOption +val inputConverters = Try( + ScalaReflection.scalaConverterFor(typeTag[A1]) :: + Nil +).toOption --- End diff -- Please update the template in the [comment](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3035-L3054) and make `val inputConverters` a single line, like `val inputTypes` on line 3075.
[GitHub] spark pull request #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16605#discussion_r96470295 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -137,7 +137,11 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends def register[RT: TypeTag, A1: TypeTag](name: String, func: Function1[A1, RT]): UserDefinedFunction = { val dataType = ScalaReflection.schemaFor[RT].dataType val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: Nil).toOption -def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes.getOrElse(Nil)) +val inputConverters = Try( --- End diff -- Please insert `inputConverters` into the template [comment](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L80-L117) and make `inputConverters` a single line, like line 139.
[GitHub] spark pull request #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16605#discussion_r96473118 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala --- @@ -84,7 +86,9 @@ case class ScalaUDF( case 1 => val func = function.asInstanceOf[(Any) => Any] val child0 = children(0) - lazy val converter0 = CatalystTypeConverters.createToScalaConverter(child0.dataType) + lazy val converter0 = inputConverters.map { --- End diff -- Also, please update the template [comment](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala#L52-L71) and follow similar syntax.
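The converter-selection pattern under review — use an explicitly supplied per-argument converter when one exists, otherwise fall back to the type-derived default — can be sketched in plain Python. This is a hypothetical illustration of the idea, not Spark's `ScalaUDF` implementation; `pick_converter` and `default_converter` are invented names, and the default is simplified to a pass-through.

```python
def default_converter(value):
    # Stand-in for the type-derived default (CatalystTypeConverters-style);
    # simplified here to a pass-through.
    return value

def pick_converter(input_converters, index):
    """Return the explicit converter for argument `index`, or the default."""
    if input_converters is not None and index < len(input_converters):
        return input_converters[index]
    return default_converter

# With an explicit converter, the supplied function is applied to the argument.
to_list = lambda xs: list(xs)
assert pick_converter([to_list], 0)((1, 2)) == [1, 2]
# Without explicit converters, the default fallback applies unchanged.
assert pick_converter(None, 0)((1, 2)) == (1, 2)
```

The `lazy val converter0 = inputConverters.map { ... }` line in the diff follows the same shape: map over the optional converters and fall back to `createToScalaConverter` when none were registered.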
[GitHub] spark pull request #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16605#discussion_r96466671 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1428,6 +1428,134 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { checkAnswer(df.select(primitiveUDF($"age")), Row(44) :: Row(null) :: Nil) } + test("SPARK-18884 correctly handle array inputs in functions.udf") { +Seq("true", "false").foreach { codegenEnabled => + withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> codegenEnabled) { +// scalastyle:off line.size.limit +Seq(( +udf { (ar1: Array[Int]) => ar1.sum }, +udf { (ar1: Array[Int], ar2: Array[Int]) => (ar1 ++ ar2).sum }, +udf { (ar1: Array[Int], ar2: Array[Int], ar3: Array[Int]) => (ar1 ++ ar2 ++ ar3).sum }, +udf { (ar1: Array[Int], ar2: Array[Int], ar3: Array[Int], ar4: Array[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4).sum }, +udf { (ar1: Array[Int], ar2: Array[Int], ar3: Array[Int], ar4: Array[Int], ar5: Array[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5).sum }, +udf { (ar1: Array[Int], ar2: Array[Int], ar3: Array[Int], ar4: Array[Int], ar5: Array[Int], ar6: Array[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6).sum }, +udf { (ar1: Array[Int], ar2: Array[Int], ar3: Array[Int], ar4: Array[Int], ar5: Array[Int], ar6: Array[Int], ar7: Array[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6 ++ ar7).sum }, +udf { (ar1: Array[Int], ar2: Array[Int], ar3: Array[Int], ar4: Array[Int], ar5: Array[Int], ar6: Array[Int], ar7: Array[Int], ar8: Array[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6 ++ ar7 ++ ar8).sum }, +udf { (ar1: Array[Int], ar2: Array[Int], ar3: Array[Int], ar4: Array[Int], ar5: Array[Int], ar6: Array[Int], ar7: Array[Int], ar8: Array[Int], ar9: Array[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6 ++ ar7 ++ ar8 ++ ar9).sum }, +udf { (ar1: Array[Int], ar2: Array[Int], ar3: Array[Int], ar4: Array[Int], ar5: Array[Int], ar6: Array[Int], ar7: Array[Int], ar8: Array[Int], ar9: 
Array[Int], ar10: Array[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6 ++ ar7 ++ ar8 ++ ar9 ++ ar10).sum } + ), ( +udf { (ar1: Seq[Int]) => ar1.sum }, +udf { (ar1: Seq[Int], ar2: Seq[Int]) => (ar1 ++ ar2).sum }, +udf { (ar1: Seq[Int], ar2: Seq[Int], ar3: Seq[Int]) => (ar1 ++ ar2 ++ ar3).sum }, +udf { (ar1: Seq[Int], ar2: Seq[Int], ar3: Seq[Int], ar4: Seq[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4).sum }, +udf { (ar1: Seq[Int], ar2: Seq[Int], ar3: Seq[Int], ar4: Seq[Int], ar5: Seq[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5).sum }, +udf { (ar1: Seq[Int], ar2: Seq[Int], ar3: Seq[Int], ar4: Seq[Int], ar5: Seq[Int], ar6: Seq[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6).sum }, +udf { (ar1: Seq[Int], ar2: Seq[Int], ar3: Seq[Int], ar4: Seq[Int], ar5: Seq[Int], ar6: Seq[Int], ar7: Seq[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6 ++ ar7).sum }, +udf { (ar1: Seq[Int], ar2: Seq[Int], ar3: Seq[Int], ar4: Seq[Int], ar5: Seq[Int], ar6: Seq[Int], ar7: Seq[Int], ar8: Seq[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6 ++ ar7 ++ ar8).sum }, +udf { (ar1: Seq[Int], ar2: Seq[Int], ar3: Seq[Int], ar4: Seq[Int], ar5: Seq[Int], ar6: Seq[Int], ar7: Seq[Int], ar8: Seq[Int], ar9: Seq[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6 ++ ar7 ++ ar8 ++ ar9).sum }, +udf { (ar1: Seq[Int], ar2: Seq[Int], ar3: Seq[Int], ar4: Seq[Int], ar5: Seq[Int], ar6: Seq[Int], ar7: Seq[Int], ar8: Seq[Int], ar9: Seq[Int], ar10: Seq[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6 ++ ar7 ++ ar8 ++ ar9 ++ ar10).sum } + ) +).map { case (udf1, udf2, udf3, udf4, udf5, udf6, udf7, udf8, udf9, udf10) => + val arVal = functions.array(lit(1), lit(1)) --- End diff -- Could you change this to access the column value instead of `Literal`?
[GitHub] spark issue #16579: [WIP][SPARK-19218][SQL] SET command should show a result...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 The root cause is not in the newly added code.

```
org.apache.spark.sql.SQLQuerySuite.SET -v test 43 ms 1
org.apache.spark.sql.SQLQuerySuite.`SET -v` commands should return a list sorted by key
```

The current `set -v` implementation seems to have an issue, according to the error message. I'll make another PR to clarify that.

```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 480.0 failed 1 times, most recent failure: Lost task 0.0 in stage 480.0 (TID 1470, localhost, executor driver): java.lang.NullPointerException
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
  at org.apache.spark.scheduler.Task.run(Task.scala:114)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:313)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
```
[GitHub] spark pull request #16579: [WIP][SPARK-19218][SQL] SET command should show a...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r96509362 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/SetCommand.scala ---
```
@@ -79,7 +79,7 @@ case class SetCommand(kv: Option[(String, Option[String])]) extends RunnableComm
     // Queries all key-value pairs that are set in the SQLConf of the sparkSession.
     case None =>
       val runFunc = (sparkSession: SparkSession) => {
-        sparkSession.conf.getAll.map { case (k, v) => Row(k, v) }.toSeq
+        sparkSession.conf.getAll.map { case (k, v) => Row(k, v) }.toSeq.sortBy(_.getString(0))
```
--- End diff -- Oh, @gatorsmile . I missed your comment. The return value from `sql("set -v")` does not seem to be safe; I think there may be a synchronization issue here. I'll create a separate PR for that.
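The fix above sorts the result rows by their first (key) column. A minimal, self-contained sketch of that pattern, using a hypothetical `Row` stand-in rather than Spark's actual `Row` class:

```scala
// Hypothetical stand-in for Spark's Row(k, v); only getString(0) is needed here.
final case class Row(key: String, value: String) {
  def getString(i: Int): String = if (i == 0) key else value
}

object SortedSetOutputDemo {
  def main(args: Array[String]): Unit = {
    val conf = Map("test.key3" -> "1", "test.key1" -> "3", "test.key2" -> "2")
    // Mirrors the patched line: build the Rows, then sort by the key column.
    val rows = conf.map { case (k, v) => Row(k, v) }.toSeq.sortBy(_.getString(0))
    assert(rows.map(_.key) == Seq("test.key1", "test.key2", "test.key3"))
    println(rows.mkString(", "))
  }
}
```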
[GitHub] spark pull request #16624: [WIP] Add two test cases for `SET -v`.
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/16624 [WIP] Add two test cases for `SET -v`.

## What changes were proposed in this pull request?

This is to investigate the current `SET -v` behavior.

## How was this patch tested?

Add two test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark setv

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16624.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16624

commit 60953bf1f1ba144e709fdae3903a390ff9479fd0
Author: Dongjoon Hyun
Date: 2017-01-17T21:14:38Z

    Add two test cases for `SET -v`.
[GitHub] spark issue #16624: [WIP] Add two test cases for `SET -v`.
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16624 This is just to see the isolated result of `SET -v`. After resolving the failure, this will be closed and merged into #16579 .
[GitHub] spark issue #16624: [WIP] Add two test cases for `SET -v`.
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16624 Retest this please.
[GitHub] spark issue #16624: [WIP] Add two test cases for `SET -v`.
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16624 Retest this please
[GitHub] spark pull request #16625: [SPARK-17874][core] Add SSL port configuration.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16625#discussion_r96544456 --- Diff: core/src/main/scala/org/apache/spark/SSLOptions.scala ---
```
@@ -164,6 +167,11 @@ private[spark] object SSLOptions extends Logging {
   def parse(conf: SparkConf, ns: String, defaults: Option[SSLOptions] = None): SSLOptions = {
     val enabled = conf.getBoolean(s"$ns.enabled", defaultValue = defaults.exists(_.enabled))
+    val port = conf.getOption(s"$ns.port").map(_.toInt)
+    port.foreach { p =>
+      require(p >= 0, "Port number must be a positive value.")
```
--- End diff -- nit. `positive` -> `non-negative`?
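The `require` being discussed can be exercised in isolation. Below is a hedged sketch of the parse-and-validate pattern as a standalone helper (not the actual `SSLOptions.parse`), using the reviewer's suggested "non-negative" wording:

```scala
object SslPortParseDemo {
  // Parse an optional "<ns>.port" setting, rejecting negative values.
  def parsePort(settings: Map[String, String], ns: String): Option[Int] = {
    val port = settings.get(s"$ns.port").map(_.toInt)
    port.foreach { p =>
      require(p >= 0, "Port number must be a non-negative value.")
    }
    port
  }

  def main(args: Array[String]): Unit = {
    assert(parsePort(Map("spark.ssl.port" -> "8443"), "spark.ssl") == Some(8443))
    assert(parsePort(Map.empty, "spark.ssl").isEmpty)
    // Port 0 ("pick any free port") is allowed; only negatives fail fast.
    assert(parsePort(Map("spark.ssl.port" -> "0"), "spark.ssl") == Some(0))
    try {
      parsePort(Map("spark.ssl.port" -> "-1"), "spark.ssl")
      assert(false, "expected IllegalArgumentException")
    } catch { case _: IllegalArgumentException => () }
  }
}
```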
[GitHub] spark issue #16624: [WIP] Add two test cases for `SET -v`.
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16624 Finally, we get the expected failures. (The first MiMa failure and the second `HiveSparkSubmitSuite` failure were just flaky.)
```
org.apache.spark.sql.SQLQuerySuite.`SET -v` collect test 16 ms 1
org.apache.spark.sql.SQLQuerySuite.`SET -v` key collect test
```
Hi, @gatorsmile and @srowen . The current `SET -v` command has a problem, as we can see in these test failures. I will fix this first here.
[GitHub] spark pull request #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16605#discussion_r96550175 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
```
@@ -1428,6 +1428,134 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     checkAnswer(df.select(primitiveUDF($"age")), Row(44) :: Row(null) :: Nil)
   }
+
+  test("SPARK-18884 correctly handle array inputs in functions.udf") {
+    Seq("true", "false").foreach { codegenEnabled =>
+      withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> codegenEnabled) {
+        // scalastyle:off line.size.limit
+        Seq((
+          udf { (ar1: Array[Int]) => ar1.sum },
+          udf { (ar1: Array[Int], ar2: Array[Int]) => (ar1 ++ ar2).sum },
+          udf { (ar1: Array[Int], ar2: Array[Int], ar3: Array[Int]) => (ar1 ++ ar2 ++ ar3).sum },
+          udf { (ar1: Array[Int], ar2: Array[Int], ar3: Array[Int], ar4: Array[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4).sum },
+          udf { (ar1: Array[Int], ar2: Array[Int], ar3: Array[Int], ar4: Array[Int], ar5: Array[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5).sum },
+          udf { (ar1: Array[Int], ar2: Array[Int], ar3: Array[Int], ar4: Array[Int], ar5: Array[Int], ar6: Array[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6).sum },
+          udf { (ar1: Array[Int], ar2: Array[Int], ar3: Array[Int], ar4: Array[Int], ar5: Array[Int], ar6: Array[Int], ar7: Array[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6 ++ ar7).sum },
+          udf { (ar1: Array[Int], ar2: Array[Int], ar3: Array[Int], ar4: Array[Int], ar5: Array[Int], ar6: Array[Int], ar7: Array[Int], ar8: Array[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6 ++ ar7 ++ ar8).sum },
+          udf { (ar1: Array[Int], ar2: Array[Int], ar3: Array[Int], ar4: Array[Int], ar5: Array[Int], ar6: Array[Int], ar7: Array[Int], ar8: Array[Int], ar9: Array[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6 ++ ar7 ++ ar8 ++ ar9).sum },
+          udf { (ar1: Array[Int], ar2: Array[Int], ar3: Array[Int], ar4: Array[Int], ar5: Array[Int], ar6: Array[Int], ar7: Array[Int], ar8: Array[Int], ar9: Array[Int], ar10: Array[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6 ++ ar7 ++ ar8 ++ ar9 ++ ar10).sum }
+        ), (
+          udf { (ar1: Seq[Int]) => ar1.sum },
+          udf { (ar1: Seq[Int], ar2: Seq[Int]) => (ar1 ++ ar2).sum },
+          udf { (ar1: Seq[Int], ar2: Seq[Int], ar3: Seq[Int]) => (ar1 ++ ar2 ++ ar3).sum },
+          udf { (ar1: Seq[Int], ar2: Seq[Int], ar3: Seq[Int], ar4: Seq[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4).sum },
+          udf { (ar1: Seq[Int], ar2: Seq[Int], ar3: Seq[Int], ar4: Seq[Int], ar5: Seq[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5).sum },
+          udf { (ar1: Seq[Int], ar2: Seq[Int], ar3: Seq[Int], ar4: Seq[Int], ar5: Seq[Int], ar6: Seq[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6).sum },
+          udf { (ar1: Seq[Int], ar2: Seq[Int], ar3: Seq[Int], ar4: Seq[Int], ar5: Seq[Int], ar6: Seq[Int], ar7: Seq[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6 ++ ar7).sum },
+          udf { (ar1: Seq[Int], ar2: Seq[Int], ar3: Seq[Int], ar4: Seq[Int], ar5: Seq[Int], ar6: Seq[Int], ar7: Seq[Int], ar8: Seq[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6 ++ ar7 ++ ar8).sum },
+          udf { (ar1: Seq[Int], ar2: Seq[Int], ar3: Seq[Int], ar4: Seq[Int], ar5: Seq[Int], ar6: Seq[Int], ar7: Seq[Int], ar8: Seq[Int], ar9: Seq[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6 ++ ar7 ++ ar8 ++ ar9).sum },
+          udf { (ar1: Seq[Int], ar2: Seq[Int], ar3: Seq[Int], ar4: Seq[Int], ar5: Seq[Int], ar6: Seq[Int], ar7: Seq[Int], ar8: Seq[Int], ar9: Seq[Int], ar10: Seq[Int]) => (ar1 ++ ar2 ++ ar3 ++ ar4 ++ ar5 ++ ar6 ++ ar7 ++ ar8 ++ ar9 ++ ar10).sum }
+        )
+      ).map { case (udf1, udf2, udf3, udf4, udf5, udf6, udf7, udf8, udf9, udf10) =>
+        val arVal = functions.array(lit(1), lit(1))
```
--- End diff -- +1. Yes.
[GitHub] spark pull request #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16605#discussion_r96691353 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
```
@@ -84,7 +86,9 @@ case class ScalaUDF(
     case 1 =>
       val func = function.asInstanceOf[(Any) => Any]
       val child0 = children(0)
-      lazy val converter0 = CatalystTypeConverters.createToScalaConverter(child0.dataType)
+      lazy val converter0 = inputConverters.map {
```
--- End diff -- Hi, I think you missed this comment.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16605 Hi, @maropu . In the current master with your example, we can do the following. What do you think about this?
```scala
scala> import scala.collection.mutable.WrappedArray
import scala.collection.mutable.WrappedArray

scala> val testArrayUdf = udf { (ar: WrappedArray[Int]) => ar.sum }
testArrayUdf: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(,IntegerType,Some(List(ArrayType(IntegerType,false

scala> df.select(testArrayUdf($"ar")).show
+-------+
|UDF(ar)|
+-------+
|      1|
+-------+
```
[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16621#discussion_r96726793 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
```
@@ -118,6 +118,14 @@ class SessionCatalog(
   }

   /**
+   * A cache of qualified table name to table relation plan.
+   */
+  val tableRelationCache: Cache[QualifiedTableName, LogicalPlan] = {
+    // TODO: create a config instead of hardcode 1000 here.
```
--- End diff -- Hi, @cloud-fan . Why not make this a config in this PR? It seems easy.
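The TODO in the diff above asks for the cache's maximum size to come from a configuration value instead of the hardcoded `1000`. A minimal, hypothetical sketch of that idea (the config key and key/value types are illustrative, not Spark's actual `SessionCatalog` code; assumes Guava is on the classpath, as it is in Spark):

```scala
import com.google.common.cache.{Cache, CacheBuilder}

object RelationCacheDemo {
  // Build a bounded cache whose capacity comes from a setting, not a literal.
  def buildCache(maxSize: Long): Cache[String, String] =
    CacheBuilder.newBuilder().maximumSize(maxSize).build[String, String]()

  def main(args: Array[String]): Unit = {
    // Illustrative config key; a real patch would add an entry to SQLConf.
    val cacheSize = sys.props.getOrElse("demo.catalog.cacheSize", "1000").toLong
    val cache = buildCache(cacheSize)
    cache.put("default.t1", "relation plan for default.t1")
    assert(cache.getIfPresent("default.t1") != null)
    assert(cache.getIfPresent("default.missing") == null)
  }
}
```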
[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16621#discussion_r96785980 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
```
@@ -118,6 +118,14 @@ class SessionCatalog(
   }

   /**
+   * A cache of qualified table name to table relation plan.
+   */
+  val tableRelationCache: Cache[QualifiedTableName, LogicalPlan] = {
+    // TODO: create a config instead of hardcode 1000 here.
```
--- End diff -- Yep. Sure~
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16605 Sure, @maropu . `WrappedArray` is not documented for now. Hi, @gatorsmile and @cloud-fan . Could you review this PR?
[GitHub] spark issue #16579: [WIP][SPARK-19218][SQL] SET command should show a result...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 Hi, sorry for the delay. As you can see in https://github.com/apache/spark/pull/16624 , it's still an unknown issue in the existing `SET -v`. In fact, that is orthogonal to this PR. If we remove the following, this PR will pass. I'll try again today to fix that over there. BTW, if you don't mind, I will remove these test cases for now.
```
+  test("SET -v test") {
+    sql("SET -v").map(_.getString(0)).collect()
+  }
+
+  test("`SET -v` commands should return a list sorted by key") {
+    val result = sql("SET -v").map(_.getString(0)).collect()
+    assert(result === result.sorted)
+  }
```
[GitHub] spark issue #16579: [WIP][SPARK-19218][SQL] SET command should show a result...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 Retest this please
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97209831 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/SetCommand.scala ---
```
@@ -79,16 +79,17 @@ case class SetCommand(kv: Option[(String, Option[String])]) extends RunnableComm
     // Queries all key-value pairs that are set in the SQLConf of the sparkSession.
     case None =>
       val runFunc = (sparkSession: SparkSession) => {
-        sparkSession.conf.getAll.map { case (k, v) => Row(k, v) }.toSeq
+        sparkSession.conf.getAll.toSeq.sorted.map { case (k, v) => Row(k, v) }
       }
       (keyValueOutput, runFunc)

     // Queries all properties along with their default values and docs that are defined in the
     // SQLConf of the sparkSession.
     case Some(("-v", None)) =>
       val runFunc = (sparkSession: SparkSession) => {
-        sparkSession.sessionState.conf.getAllDefinedConfs.map { case (key, defaultValue, doc) =>
-          Row(key, defaultValue, doc)
+        sparkSession.sessionState.conf.getAllDefinedConfs.sorted.map {
+          case (key, defaultValue, doc) =>
+            Row(key, if (defaultValue == null) "null" else defaultValue, doc)
```
--- End diff -- This was the root cause of the exceptions raised during decoding.
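The two changes in this diff — sorting the defined configs by key and replacing a `null` default before it reaches a string column — can be sketched standalone, using plain tuples in place of Spark's conf entries:

```scala
object NullDefaultDemo {
  def main(args: Array[String]): Unit = {
    // (key, defaultValue, doc) triples; one default is null, as with some
    // YARN-module stringConf entries mentioned in the discussion.
    val defined: Seq[(String, String, String)] = Seq(
      ("spark.b.conf", null, "doc for b"),
      ("spark.a.conf", "10", "doc for a"))
    // Sort by key, then map the null default to a printable placeholder so
    // that downstream string decoding never sees a null.
    val rows = defined.sortBy(_._1).map { case (key, default, doc) =>
      (key, if (default == null) "null" else default, doc)
    }
    assert(rows.map(_._1) == Seq("spark.a.conf", "spark.b.conf"))
    assert(rows.forall(_._2 != null))
  }
}
```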
[GitHub] spark issue #16624: [WIP] Fix `SET -v` not to raise exceptions for configs w...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16624 Retest this please
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 Hi, @srowen and @gatorsmile . Finally, this PR resolved all issues. Could you review this again?
[GitHub] spark issue #16624: [WIP] Fix `SET -v` not to raise exceptions for configs w...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16624 The final failure, `HiveSparkSubmitSuite.dir`, is irrelevant to this PR.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 For `SET -v` without sorting, please refer to #16624 , too.
[GitHub] spark issue #16624: [WIP] Fix `SET -v` not to raise exceptions for configs w...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16624 Hi, @gatorsmile . I tested it here and applied it to #16579 . PR #16579 has two fixes. After merging #16579 , I'm going to close this one.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 Hi, @gatorsmile . This is the original PR, which now has the two fixes together.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 Thank you for your continued attention and for the approval again, @srowen !
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97230236 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/SetCommand.scala ---
```
@@ -79,16 +79,17 @@ case class SetCommand(kv: Option[(String, Option[String])]) extends RunnableComm
     // Queries all key-value pairs that are set in the SQLConf of the sparkSession.
     case None =>
       val runFunc = (sparkSession: SparkSession) => {
-        sparkSession.conf.getAll.map { case (k, v) => Row(k, v) }.toSeq
+        sparkSession.conf.getAll.toSeq.sorted.map { case (k, v) => Row(k, v) }
      }
       (keyValueOutput, runFunc)

     // Queries all properties along with their default values and docs that are defined in the
     // SQLConf of the sparkSession.
     case Some(("-v", None)) =>
       val runFunc = (sparkSession: SparkSession) => {
-        sparkSession.sessionState.conf.getAllDefinedConfs.map { case (key, defaultValue, doc) =>
-          Row(key, defaultValue, doc)
+        sparkSession.sessionState.conf.getAllDefinedConfs.sorted.map {
+          case (key, defaultValue, doc) =>
+            Row(key, if (defaultValue == null) "null" else defaultValue, doc)
```
--- End diff -- It's easy to change to that, but that seems to be for `Option`-type configs. Is it better to use the same ``?
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 Yes. It was because the `stringConf` with a default of `null` is in the YARN module, so when I run a unit test or the `sql` module tests, it doesn't hit this. Yesterday, I reproduced this locally by adding a `stringConf` with a default of `null` in `SQLConf.scala`, and fixed it.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 Yes. BTW, to do that, we need to add the test case to the `SET -v` unit test. Is that okay?
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 Yep!
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97230429 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/SetCommand.scala ---
```
@@ -79,16 +79,17 @@ case class SetCommand(kv: Option[(String, Option[String])]) extends RunnableComm
     // Queries all key-value pairs that are set in the SQLConf of the sparkSession.
     case None =>
       val runFunc = (sparkSession: SparkSession) => {
-        sparkSession.conf.getAll.map { case (k, v) => Row(k, v) }.toSeq
+        sparkSession.conf.getAll.toSeq.sorted.map { case (k, v) => Row(k, v) }
       }
       (keyValueOutput, runFunc)

     // Queries all properties along with their default values and docs that are defined in the
     // SQLConf of the sparkSession.
     case Some(("-v", None)) =>
       val runFunc = (sparkSession: SparkSession) => {
-        sparkSession.sessionState.conf.getAllDefinedConfs.map { case (key, defaultValue, doc) =>
-          Row(key, defaultValue, doc)
+        sparkSession.sessionState.conf.getAllDefinedConfs.sorted.map {
+          case (key, defaultValue, doc) =>
+            Row(key, if (defaultValue == null) "null" else defaultValue, doc)
```
--- End diff -- Oh, I see. Thank you. I'll update it like that.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 @gatorsmile . I updated it with `` and added the test case. If you revert the change in `SetCommand.scala`, this test case will fail.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97239778 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/SetCommand.scala ---
```
@@ -79,16 +79,17 @@ case class SetCommand(kv: Option[(String, Option[String])]) extends RunnableComm
     // Queries all key-value pairs that are set in the SQLConf of the sparkSession.
     case None =>
       val runFunc = (sparkSession: SparkSession) => {
-        sparkSession.conf.getAll.map { case (k, v) => Row(k, v) }.toSeq
+        sparkSession.conf.getAll.toSeq.sorted.map { case (k, v) => Row(k, v) }
       }
       (keyValueOutput, runFunc)

     // Queries all properties along with their default values and docs that are defined in the
     // SQLConf of the sparkSession.
     case Some(("-v", None)) =>
       val runFunc = (sparkSession: SparkSession) => {
-        sparkSession.sessionState.conf.getAllDefinedConfs.map { case (key, defaultValue, doc) =>
-          Row(key, defaultValue, doc)
+        sparkSession.sessionState.conf.getAllDefinedConfs.sorted.map {
+          case (key, defaultValue, doc) =>
+            Row(key, if (defaultValue == null) "" else defaultValue, doc)
```
--- End diff -- Yep. It looks much better.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97240545 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -982,6 +982,28 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { spark.sessionState.conf.clear() } + test("SPARK-19218 Fix SET command to show a result correctly and in a sorted order") { +val overrideConfs = sql("SET").collect() +sql(s"SET test.key3=1") +sql(s"SET test.key2=2") +sql(s"SET test.key1=3") +val result = sql("SET").collect() +assert(result === + (overrideConfs ++ Seq( +Row("test.key1", "3"), +Row("test.key2", "2"), +Row("test.key3", "1"))).sortBy(_.getString(0)) +) + +// Previously, `SET -v` fails with NPE during decoding for null value. +import SQLConf._ + SQLConfigBuilder("spark.test").doc("doc").stringConf.createWithDefault(null) + +val result2 = sql("SET -v").collect() +assert(result2 === result2.sortBy(_.getString(0))) +spark.sessionState.conf.clear() --- End diff -- +1. Thank you, @gatorsmile .
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 Thank you, @viirya . I noticed that `spark.sessionState.conf.clear()` is unnecessary. I removed it.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249218 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -982,6 +982,33 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { spark.sessionState.conf.clear() } + test("SPARK-19218 SET command should show a result in a sorted order") { +val overrideConfs = sql("SET").collect() +sql(s"SET test.key3=1") +sql(s"SET test.key2=2") +sql(s"SET test.key1=3") +val result = sql("SET").collect() +assert(result === + (overrideConfs ++ Seq( +Row("test.key1", "3"), +Row("test.key2", "2"), +Row("test.key3", "1"))).sortBy(_.getString(0)) +) +spark.sessionState.conf.clear() + } + + test("SPARK-19218 `SET -v` should not fail with null value configuration") { +import SQLConf._ +val confEntry = SQLConfigBuilder("spark.test").doc("doc").stringConf.createWithDefault(null) + +try { + val result = sql("SET -v").collect() + assert(result === result.sortBy(_.getString(0))) --- End diff -- Ah, I see what you meant. Actually, `SET -v` previously raised exceptions, so this case uses `try` and `catch`. But, as you mentioned, it no longer does.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249317 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above) --- End diff -- However, IMO, it's needed in case some regression occurs here in the future.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249644 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above) --- End diff -- Yes, but we need to clean up `spark.test` so as not to interfere with the other test cases here.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249959 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above) --- End diff -- The whole Jenkins test run does not fail. You can see the test report in the PR description, here: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71539/testReport/
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97250196 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above) --- End diff -- :) The point is `the other test cases` are still running.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97250343 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above) --- End diff -- Maybe we are confused about *terms*. - You meant the other test *statements*. - I meant the other test *cases*.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97250719 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above) --- End diff -- Oh, I understand. Thanks. :)
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 Retest this please.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 The only failure is irrelevant to this PR. ``` [info] - set spark.sql.warehouse.dir *** FAILED *** (5 minutes, 0 seconds) [info] Timeout of './bin/spark-submit' '--class' 'org.apache.spark.sql.hive.SetWarehouseLocationTest' '--name' 'SetSparkWarehouseLocationTest' '--master' 'local-cluster[2,1,1024]' '--conf' ```
[GitHub] spark pull request #16624: [WIP] Fix `SET -v` not to raise exceptions for co...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/16624
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 Thank you, @gatorsmile , @srowen , and @viirya .
[GitHub] spark pull request #16684: [SPARK-16101][HOTFIX] Fix the build with Scala 2....
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16684#discussion_r97446421 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -62,7 +62,7 @@ private[csv] class UnivocityParser( } else { requiredSchema } -fields.map(schema.indexOf).toArray --- End diff -- Thanks, @HyukjinKwon !
[GitHub] spark pull request #16701: [SPARK-18909][SQL] The error messages in `Express...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/16701 [SPARK-18909][SQL] The error messages in `ExpressionEncoder.toRow/fromRow` are too verbose ## What changes were proposed in this pull request? In `ExpressionEncoder.toRow` and `fromRow`, we will catch the exception and put the treeString of serializer/deserializer expressions in the error message. However, encoders can be very complex, and the serializer/deserializer expressions can be very large trees that blow up the log files (e.g., generating over 500 MB of logs for this single error message). We should simplify it. **BEFORE** ``` scala> :paste // Entering paste mode (ctrl-D to finish) case class TestCaseClass(value: Int) import spark.implicits._ Seq(TestCaseClass(1)).toDS().collect() // Exiting paste mode, now interpreting. java.lang.RuntimeException: Error while decoding: java.lang.NullPointerException newInstance(class TestCaseClass) +- assertnotnull(input[0, int, false], - field (class: "scala.Int", name: "value"), - root class: "TestCaseClass") +- input[0, int, false] at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:303) ... ``` **AFTER** ``` ... // Exiting paste mode, now interpreting. java.lang.RuntimeException: Error while decoding: java.lang.NullPointerException newInstance(class TestCaseClass) at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:303) ... ``` ## How was this patch tested? Manual. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-18909-EXPR-ERROR Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16701.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16701 commit 207a7472837e8110bdfa1a7b4b0e66a75ce29d73 Author: Dongjoon Hyun Date: 2017-01-25T09:27:26Z [SPARK-18909][SQL] The error messages in `ExpressionEncoder.toRow/fromRow` are too verbose
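The verbosity problem described in this PR comes from embedding a full expression tree dump in an error message. The toy Scala sketch below (illustrative class names, not Spark's actual Catalyst types) shows why a `treeString`-style dump grows with the tree while a `simpleString`-style summary stays one line:

```scala
// Toy expression tree; `Leaf`/`Node` are illustrative stand-ins, not Catalyst classes.
sealed trait Expr {
  def children: Seq[Expr]
  // One-line summary, independent of tree size (analogous to simpleString).
  def simpleString: String = getClass.getSimpleName
  // Full indented dump, one line per node (analogous to treeString).
  def treeString: String = {
    def loop(e: Expr, depth: Int): String =
      ("  " * depth) + e.simpleString + "\n" + e.children.map(loop(_, depth + 1)).mkString
    loop(this, 0)
  }
}
case class Leaf() extends Expr { def children: Seq[Expr] = Nil }
case class Node(children: Seq[Expr]) extends Expr

object EncoderMessageSketch {
  def main(args: Array[String]): Unit = {
    // A deeply nested tree: treeString emits one line per node ...
    val deep = (1 to 1000).foldLeft(Leaf(): Expr)((e, _) => Node(Seq(e)))
    assert(deep.treeString.split("\n").length > 1000)
    // ... while simpleString remains a single short line regardless of depth.
    assert(deep.simpleString == "Node")
  }
}
```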
[GitHub] spark pull request #16701: [SPARK-18909][SQL] The error messages in `Express...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16701#discussion_r97739693 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala --- @@ -288,7 +288,7 @@ case class ExpressionEncoder[T]( } catch { case e: Exception => throw new RuntimeException( -s"Error while encoding: $e\n${serializer.map(_.treeString).mkString("\n")}", e) +s"Error while encoding: $e\n${serializer.map(_.simpleString).mkString("\n")}", e) --- End diff -- This is a first attempt at this issue. We can improve it if needed.
[GitHub] spark issue #16701: [SPARK-18909][SQL] The error messages in `ExpressionEnco...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16701 Thank you for the review, @srowen . Yes, these two are within the scope of the issue description of SPARK-18909. @cloud-fan , did I understand correctly?
[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(l, r)` ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16719 Hi, @lw-lin . Thank you for pinging me. I'll take a look.
[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98278557 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala --- @@ -75,10 +107,14 @@ class ExpressionSetSuite extends SparkFunSuite { setTest(1, aUpper >= bUpper, bUpper <= aUpper) // `Not` canonicalization - setTest(1, Not(aUpper > 1), aUpper <= 1, Not(Literal(1) < aUpper), Literal(1) >= aUpper) - setTest(1, Not(aUpper < 1), aUpper >= 1, Not(Literal(1) > aUpper), Literal(1) <= aUpper) - setTest(1, Not(aUpper >= 1), aUpper < 1, Not(Literal(1) <= aUpper), Literal(1) > aUpper) - setTest(1, Not(aUpper <= 1), aUpper > 1, Not(Literal(1) >= aUpper), Literal(1) < aUpper) + setTest(1, Not(maxHash > 1), maxHash <= 1, Not(Literal(1) < maxHash), Literal(1) >= maxHash) + setTest(1, Not(minHash > 1), minHash <= 1, Not(Literal(1) < minHash), Literal(1) >= minHash) + setTest(1, Not(maxHash < 1), maxHash >= 1, Not(Literal(1) > maxHash), Literal(1) <= maxHash) + setTest(1, Not(minHash < 1), minHash >= 1, Not(Literal(1) > minHash), Literal(1) <= minHash) + setTest(1, Not(maxHash >= 1), maxHash < 1, Not(Literal(1) <= maxHash), Literal(1) > maxHash) + setTest(1, Not(minHash >= 1), minHash < 1, Not(Literal(1) <= minHash), Literal(1) > minHash) + setTest(1, Not(maxHash <= 1), maxHash > 1, Not(Literal(1) >= maxHash), Literal(1) < maxHash) + setTest(1, Not(minHash <= 1), minHash > 1, Not(Literal(1) >= minHash), Literal(1) < minHash) --- End diff -- These test cases were already covered correctly before. Actually, this PR only simplifies the logic. Am I right?
[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98280054 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala --- @@ -78,14 +78,18 @@ object Canonicalize extends { case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l) case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l) -case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l) -case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r) -case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, l) -case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r) -case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l) -case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r) -case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l) -case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r) +case Not(GreaterThan(l, r)) => + assert(l.hashCode() <= r.hashCode()) --- End diff -- Can we remove these `assert`s? It seems to be verified with your test cases now.
[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(l, r)` ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16719 The original logic was designed to be safe against changes in the caller's bottom-up code, [here](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L154-L157). ```scala lazy val canonicalized: Expression = { val canonicalizedChildren = children.map(_.canonicalized) Canonicalize.execute(withNewChildren(canonicalizedChildren)) } ``` But I agree that it's safe to simplify it with @lw-lin's new test cases. For the `assert` statements, I think @cloud-fan and @gatorsmile can give more insightful advice. For me, LGTM except that.
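The simplification discussed here can be modeled in a few lines. In this stand-alone Scala sketch (the case classes are simplified stand-ins for Catalyst's comparison expressions, not Spark's real API), children are assumed to be canonicalized bottom-up first, so by the time a `Not(...)` is rewritten its comparison child already has `l.hashCode <= r.hashCode` and a single rule per operator suffices:

```scala
// Simplified model of the canonicalization rules; illustrative names only.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(v: Int) extends Expr
case class GT(l: Expr, r: Expr) extends Expr   // l > r
case class LT(l: Expr, r: Expr) extends Expr   // l < r
case class GTE(l: Expr, r: Expr) extends Expr  // l >= r
case class LTE(l: Expr, r: Expr) extends Expr  // l <= r
case class Not(child: Expr) extends Expr

object CanonicalizeSketch {
  def rewrite(e: Expr): Expr = e match {
    // Comparisons: put the operand with the smaller hashCode on the left.
    case GT(l, r) if l.hashCode > r.hashCode => LT(r, l)
    case LT(l, r) if l.hashCode > r.hashCode => GT(r, l)
    // Not(comparison): because the child was already canonicalized bottom-up,
    // l.hashCode <= r.hashCode holds here, so one rule per operator is enough.
    case Not(GT(l, r))  => LTE(l, r)
    case Not(LT(l, r))  => GTE(l, r)
    case Not(GTE(l, r)) => LT(l, r)
    case Not(LTE(l, r)) => GT(l, r)
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val a = Attr("a")
    // NOT(a > 1) canonicalizes to a <= 1.
    assert(rewrite(Not(GT(a, Lit(1)))) == LTE(a, Lit(1)))
  }
}
```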
[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98360513 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala --- @@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite { val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil) + // An [AttributeReference] with almost the maximum hashcode, to make testing canonicalize rules + // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier + val maxHash = +Canonicalize.ignoreNamesTypes( + AttributeReference("maxHash", IntegerType)(exprId = +new ExprId(4, NamedExpression.jvmId) { + // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this + // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost + // `Int.MaxValue` + override def hashCode: Int = 826929706 --- End diff -- Great!
[GitHub] spark pull request #16734: [SPARK-19396] [DOC] JDBC Options are Case In-sens...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16734#discussion_r98368350 --- Diff: docs/sql-programming-guide.md --- @@ -1135,7 +1135,7 @@ Tables from the remote database can be loaded as a DataFrame or Spark SQL tempor the Data Sources API. Users can specify the JDBC connection properties in the data source options. user and password are normally provided as connection properties for logging into the data sources. In addition to the connection properties, Spark also supports -the following case-sensitive options: +the following case-insensitive options: --- End diff -- +1, LGTM.
[GitHub] spark issue #16737: [SPARK-19397] [SQL] Make option names of LIBSVM and TEXT...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16737 Oh, then, is this the last piece of the case-sensitive options?
[GitHub] spark pull request #16737: [SPARK-19397] [SQL] Make option names of LIBSVM a...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16737#discussion_r98602527 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMOptions.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.source.libsvm + +import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap + +/** + * Options for the Text data source. --- End diff -- `Options for the LibSVM data source.`?
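The pattern this PR introduces — wrapping raw options in a case-insensitive map so that `numFeatures`, `NUMFEATURES`, etc. all resolve — can be sketched as below. Both classes here are simplified stand-ins (not Spark's actual `CaseInsensitiveMap` or `LibSVMOptions`); the option name `numFeatures` is used purely for illustration:

```scala
// Simplified stand-in for Spark's CaseInsensitiveMap: keys are lowercased on the
// way in and on lookup, so option names match regardless of case.
class CaseInsensitiveOptions(base: Map[String, String]) {
  private val lowered = base.map { case (k, v) => (k.toLowerCase, v) }
  def get(key: String): Option[String] = lowered.get(key.toLowerCase)
}

// Hypothetical options holder mirroring the shape of an options class like
// LibSVMOptions: parse typed values out of the case-insensitive map once.
class LibSVMOptionsSketch(parameters: Map[String, String]) {
  private val params = new CaseInsensitiveOptions(parameters)
  val numFeatures: Option[Int] = params.get("numFeatures").map(_.toInt)
}

object OptionsSketch {
  def main(args: Array[String]): Unit = {
    // Any casing of the option name resolves to the same value.
    assert(new LibSVMOptionsSketch(Map("NUMFEATURES" -> "10")).numFeatures.contains(10))
    assert(new LibSVMOptionsSketch(Map("numfeatures" -> "10")).numFeatures.contains(10))
  }
}
```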
[GitHub] spark issue #16737: [SPARK-19397] [SQL] Make option names of LIBSVM and TEXT...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16737 +1. LGTM except one typo! @gatorsmile
[GitHub] spark issue #16726: [SPARK-19390][SQL] Replace the unnecessary usages of hiv...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16726 +1 for reducing the dependency!
[GitHub] spark pull request #16746: [SPARK-15648][SQL] Add teradataDialect for JDBC c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16746#discussion_r98607321

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.sql.Types
+import org.apache.spark.sql.types._
+
+
+private case object TeradataDialect extends JdbcDialect {
+
+  override def canHandle(url: String): Boolean = { url.startsWith("jdbc:teradata") }
+
+  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
+    case StringType => Some(JdbcType("VARCHAR(255)", java.sql.Types.VARCHAR))
+    case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
+    case _ => None
+  }
--- End diff --

Hi, @klinvill. According to the description and the initial PR in SPARK-15648, Teradata didn't support the `LIMIT` query at that time. Does it support `LIMIT` now?
[GitHub] spark pull request #16746: [SPARK-15648][SQL] Add teradataDialect for JDBC c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16746#discussion_r98607602

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.sql.Types
--- End diff --

A blank line is needed here. You can run the following command line to check that, and to confirm after fixing.
```
$ dev/lint-scala
```
[GitHub] spark pull request #16746: [SPARK-15648][SQL] Add teradataDialect for JDBC c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16746#discussion_r98608480

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.sql.Types
+import org.apache.spark.sql.types._
+
+
+private case object TeradataDialect extends JdbcDialect {
+
+  override def canHandle(url: String): Boolean = { url.startsWith("jdbc:teradata") }
+
+  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
+    case StringType => Some(JdbcType("VARCHAR(255)", java.sql.Types.VARCHAR))
+    case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
+    case _ => None
+  }
--- End diff --

Could you verify whether we need to override the following together?
```scala
override def quoteIdentifier(colName: String): String = ...
override def getTableExistsQuery(table: String): String = ...
override def isCascadingTruncateTable(): Option[Boolean] = ...
```
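To make the suggestion above concrete, the three overrides might look like the sketch below. This is a self-contained stand-in (a local `DialectSketch` trait instead of Spark's actual `JdbcDialect` base class), and the Teradata-specific choices — double-quoted identifiers, a `SELECT 1` existence probe, and non-cascading truncation — are assumptions for illustration, not verified Teradata behavior:

```scala
// Illustrative sketch of the suggested overrides, using a stand-in trait
// rather than Spark's JdbcDialect. All Teradata specifics are assumptions.
trait DialectSketch {
  def canHandle(url: String): Boolean
  def quoteIdentifier(colName: String): String = colName
  def getTableExistsQuery(table: String): String = s"SELECT * FROM $table WHERE 1=0"
  def isCascadingTruncateTable(): Option[Boolean] = None
}

case object TeradataDialectSketch extends DialectSketch {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:teradata")

  // Assumption: SQL-standard double-quoted identifiers.
  override def quoteIdentifier(colName: String): String = s""""$colName""""

  // A cheap existence probe; SELECT 1 avoids fetching column metadata.
  override def getTableExistsQuery(table: String): String =
    s"SELECT 1 FROM $table WHERE 1=0"

  // Assumption: table truncation does not cascade by default.
  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
}
```

Whether these defaults are actually correct for Teradata is exactly the question raised in the review thread.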
[GitHub] spark issue #16746: [SPARK-15648][SQL] Add teradataDialect for JDBC connecti...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16746 BTW, @klinvill, do you use a real instance? Could you advise how others like me can verify your PR on Teradata? Maybe we can check Teradata Express or the AWS Marketplace?
[GitHub] spark pull request #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8....
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/16751 [SPARK-19409][BUILD] Bump parquet version to 1.8.2

## What changes were proposed in this pull request?

Apache Parquet 1.8.2 was officially released last week, on 26 Jan. https://lists.apache.org/thread.html/af0c813f1419899289a336d96ec02b3bbeecaea23aa6ef69f435c142@%3Cdev.parquet.apache.org%3E

This PR only aims to bump the Parquet version to 1.8.2. It does not touch any other code.

## How was this patch tested?

Pass the existing tests and also manually by doing `./dev/test-dependencies.sh`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-19409

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16751.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16751

commit 92dc3e50f136be088357aa7b477ffd79f138be0e
Author: Dongjoon Hyun
Date: 2017-01-31T08:41:46Z

    [SPARK-19409][BUILD] Bump parquet version to 1.8.2
[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16281 Hi, all. Now, I'm trying to upgrade Apache Spark to Parquet 1.8.2.
[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16751 Thank you for review and merging, @viirya , @srowen , and @rxin !
[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16281 Thank you for sharing that information, @mallman .
[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16751 Ya, @mallman. However, for the same reason, I concluded to keep them out of here. Exactly the opposite direction from your opinion. If we try to fix all of them in a single shot, it will not be merged for a long time. At least, you can see #16756 has started already. I think those workarounds are going to be cleaned up soon, if these commits are not reverted for some reason. :)
[GitHub] spark pull request #16746: [SPARK-15648][SQL] Add teradataDialect for JDBC c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16746#discussion_r98750209

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.sql.Types
+import org.apache.spark.sql.types._
+
+
+private case object TeradataDialect extends JdbcDialect {
+
+  override def canHandle(url: String): Boolean = { url.startsWith("jdbc:teradata") }
+
+  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
+    case StringType => Some(JdbcType("VARCHAR(255)", java.sql.Types.VARCHAR))
+    case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
+    case _ => None
+  }
--- End diff --

+1
[GitHub] spark pull request #16746: [SPARK-15648][SQL] Add teradataDialect for JDBC c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16746#discussion_r98751002

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.sql.Types
+import org.apache.spark.sql.types._
+
+
+private case object TeradataDialect extends JdbcDialect {
+
+  override def canHandle(url: String): Boolean = { url.startsWith("jdbc:teradata") }
+
+  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
+    case StringType => Some(JdbcType("VARCHAR(255)", java.sql.Types.VARCHAR))
+    case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
+    case _ => None
+  }
--- End diff --

What about `isCascadingTruncateTable`? Could you check whether Teradata truncates cascadingly by default for the `TRUNCATE TABLE` statement?
[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16751 Hi, @rxin . Sure, I'll try to put them in a single PR, except the ongoing one. BTW, every time, I notice that committers have a better and broader perspective than me. Do you have something more in mind besides the issues mentioned in #16281 and here?
[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16751 Thank you for informing that, @robbinspg . Could you make a JIRA issue to keep track? I'll investigate there.
[GitHub] spark issue #16701: [SPARK-18909][SQL] The error messages in `ExpressionEnco...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16701 Thank you for review and merging, @cloud-fan !
[GitHub] spark issue #12583: [SPARK-14819] [SQL] Improve SET / SET -v command
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/12583 Hi, @bomeng , do you have something to refresh here? I think we can close this for now.