[GitHub] spark pull request #22707: [SPARK-25717][SQL] Insert overwrite a recreated e...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22707#discussion_r240073151 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala --- @@ -774,4 +774,23 @@ class InsertSuite extends QueryTest with TestHiveSingleton with BeforeAndAfter } } } + + test("SPARK-25717: Insert overwrite a recreated external and partitioned table " --- End diff -- How about `SPARK-25717: Insert overwrites remove old partition dirs correctly`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22707: [SPARK-25717][SQL] Insert overwrite a recreated e...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22707#discussion_r240002357 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala --- @@ -227,18 +227,22 @@ case class InsertIntoHiveTable( // Newer Hive largely improves insert overwrite performance. As Spark uses older Hive // version and we may not want to catch up new Hive version every time. We delete the // Hive partition first and then load data file into the Hive partition. - if (oldPart.nonEmpty && overwrite) { -oldPart.get.storage.locationUri.foreach { uri => - val partitionPath = new Path(uri) - val fs = partitionPath.getFileSystem(hadoopConf) - if (fs.exists(partitionPath)) { -if (!fs.delete(partitionPath, true)) { - throw new RuntimeException( -"Cannot remove partition directory '" + partitionPath.toString) -} -// Don't let Hive do overwrite operation since it is slower. -doHiveOverwrite = false + if (overwrite) { +val oldPartitionPath = oldPart.flatMap(_.storage.locationUri.map(new Path(_))) + .getOrElse { +ExternalCatalogUtils.generatePartitionPath( + partitionSpec, + partitionColumnNames, + new Path(table.location)) --- End diff -- Do we still need to consider the old path, `oldPart`? Can't we write this instead?; ``` val oldPartitionPath = ExternalCatalogUtils.generatePartitionPath( partitionSpec, partitionColumnNames, new Path(table.location)) ``` Also, can you write a comment about how this issue is solved, both here and in the PR description?
[GitHub] spark pull request #22707: [SPARK-25717][SQL] Insert overwrite a recreated e...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22707#discussion_r239997805 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala --- @@ -774,4 +774,23 @@ class InsertSuite extends QueryTest with TestHiveSingleton with BeforeAndAfter } } } + + test("SPARK-25717: Insert overwrite a recreated external and partitioned table " ++ "should remove the historical partition first") { +withTempDir { tmpDir => + withTable("test_table") { +(0 until 3).foreach { _ => + sql("DROP TABLE IF EXISTS test_table") + sql( +s""" + |CREATE EXTERNAL TABLE test_table (key int) + |PARTITIONED BY (p int) + |LOCATION '${tmpDir.toURI.toString.stripSuffix("/")}/test_table' + """.stripMargin) + sql("INSERT OVERWRITE TABLE test_table PARTITION(p=1) SELECT 1") +} +checkAnswer(sql("SELECT COUNT(*) FROM test_table"), Row(1)) + } +} --- End diff -- How about writing tests like this?; ``` withTempDir { tmpDir => withTable("test_table") { // Prepare table data sql( s""" |CREATE EXTERNAL TABLE test_table(key int) |PARTITIONED BY (p int) |LOCATION '${tmpDir.toURI}' """.stripMargin) sql("INSERT INTO test_table PARTITION(p=1) SELECT 1") checkAnswer(sql("SELECT COUNT(*) FROM test_table"), Row(1)) // Run the test... sql("DROP TABLE test_table") sql( s""" |CREATE EXTERNAL TABLE test_table(key int) |PARTITIONED BY (p int) |LOCATION '${tmpDir.toURI}' """.stripMargin) sql("INSERT OVERWRITE TABLE test_table PARTITION(p=1) SELECT 1") checkAnswer(sql("SELECT COUNT(*) FROM test_table"), Row(1)) } } ```
[GitHub] spark pull request #22707: [SPARK-25717][SQL] Insert overwrite a recreated e...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22707#discussion_r239996822 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala --- @@ -774,4 +774,23 @@ class InsertSuite extends QueryTest with TestHiveSingleton with BeforeAndAfter } } } + + test("SPARK-25717: Insert overwrite a recreated external and partitioned table " ++ "should remove the historical partition first") { +withTempDir { tmpDir => + withTable("test_table") { +(0 until 3).foreach { _ => + sql("DROP TABLE IF EXISTS test_table") + sql( +s""" + |CREATE EXTERNAL TABLE test_table (key int) + |PARTITIONED BY (p int) + |LOCATION '${tmpDir.toURI.toString.stripSuffix("/")}/test_table' --- End diff -- nit: `|LOCATION '${tmpDir.toURI}'`?
[GitHub] spark pull request #22707: [SPARK-25717][SQL] Insert overwrite a recreated e...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22707#discussion_r239996528 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala --- @@ -774,4 +774,23 @@ class InsertSuite extends QueryTest with TestHiveSingleton with BeforeAndAfter } } } + + test("SPARK-25717: Insert overwrite a recreated external and partitioned table " --- End diff -- Can you shorten the title so it fits on a single line?
[GitHub] spark issue #22707: [SPARK-25717][SQL] Insert overwrite a recreated external...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22707 ok to test
[GitHub] spark pull request #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reser...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23259#discussion_r239994423 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -769,7 +774,7 @@ nonReserved | REVOKE | GRANT | LOCK | UNLOCK | MSCK | REPAIR | RECOVER | EXPORT | IMPORT | LOAD | VALUES | COMMENT | ROLE | ROLES | COMPACTIONS | PRINCIPALS | TRANSACTIONS | INDEX | INDEXES | LOCKS | OPTION | LOCAL | INPATH | ASC | DESC | LIMIT | RENAME | SETS -| AT | NULLS | OVERWRITE | ALL | ANY | ALTER | AS | BETWEEN | BY | CREATE | DELETE +| AT | NULLS | OVERWRITE | ANY | ALTER | AS | BETWEEN | BY | CREATE | DELETE --- End diff -- yea, thanks. you're right.
[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23259 To discuss this topic smoothly, I made this pr. Any comments/suggestions are welcome. cc: @gatorsmile @cloud-fan @viirya
[GitHub] spark pull request #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reser...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/23259 [SPARK-26215][SQL][WIP] Define reserved/non-reserved keywords based on the ANSI SQL standard ## What changes were proposed in this pull request? This pr aims to define reserved/non-reserved keywords based on the ANSI SQL standard. TODO: - Where should we handle reserved keywords? - Which SQL standard does Spark SQL follow (e.g., 2011 or 2016)? - Where should we document the list of reserved/non-reserved keywords? - Others? ## How was this patch tested? Added tests in `TableIdentifierParserSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/spark SPARK-26215-WIP Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23259.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23259 commit 01bc38347496a6194b46ace0feb7d2cd1adb614e Author: Takeshi Yamamuro Date: 2018-12-06T08:04:49Z WIP: SQL Reserved/Non-Reserved Key Words
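The practical difference between reserved and non-reserved keywords is whether the parser accepts them as plain identifiers. A minimal sketch of the kind of check `TableIdentifierParserSuite` performs (the specific identifiers tested here are illustrative, not necessarily the ones in the PR):

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// A non-reserved keyword such as OVERWRITE can still be used as a table name:
val id = CatalystSqlParser.parseTableIdentifier("overwrite")
assert(id.table == "overwrite")

// A keyword made reserved under the ANSI rules this PR proposes (e.g. SELECT)
// would instead be rejected with a ParseException:
// CatalystSqlParser.parseTableIdentifier("select")  // throws ParseException
```

This is why the `nonReserved` rule in `SqlBase.g4` matters: every keyword listed there is routed back into the identifier grammar.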
[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23258#discussion_r239993889 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared } test("Sort metrics") { -// Assume the execution plan is -// WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1)) -val ds = spark.range(10).sort('id) -testSparkPlanMetrics(ds.toDF(), 2, Map.empty) +// Assume the execution plan with node id is +// Sort(nodeId = 0) +// Exchange(nodeId = 1) +// Range(nodeId = 2) +val df = spark.range(9, -1, -1).sort('id).toDF() --- End diff -- Do we need to use `range` here? How about just writing `Seq(1, 3, 2, ...).toDF("id")`?
[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23258#discussion_r239993718 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared } test("Sort metrics") { -// Assume the execution plan is -// WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1)) -val ds = spark.range(10).sort('id) -testSparkPlanMetrics(ds.toDF(), 2, Map.empty) +// Assume the execution plan with node id is +// Sort(nodeId = 0) +// Exchange(nodeId = 1) +// Range(nodeId = 2) +val df = spark.range(9, -1, -1).sort('id).toDF() +testSparkPlanMetrics(df, 2, Map.empty) + df.queryExecution.executedPlan.find(_.isInstanceOf[SortExec]).getOrElse(assert(false)) --- End diff -- `assert(df.queryExecution.executedPlan.find(_.isInstanceOf[SortExec]).isDefined)`?
[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23258#discussion_r239993561 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -26,7 +26,7 @@ import org.apache.spark.SparkFunSuite import org.apache.spark.sql._ import org.apache.spark.sql.catalyst.expressions.aggregate.{Final, Partial} import org.apache.spark.sql.catalyst.plans.logical.LocalRelation -import org.apache.spark.sql.execution.{FilterExec, RangeExec, SparkPlan, WholeStageCodegenExec} +import org.apache.spark.sql.execution._ --- End diff -- nit: unfold this wildcard import?
[GitHub] spark issue #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metrics whi...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23258 cc: @mgaido91
[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22141 nvm ;) we still have much time until the next release.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23213 retest this please
[GitHub] spark issue #23206: [SPARK-26249][SQL] Add ability to inject a rule in order...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23206 What's a concrete example? IMHO the currently proposed API is somewhat complicated/cumbersome for users, and I feel it's error-prone.
[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22141 Any update?
[GitHub] spark issue #21777: [WIP][SPARK-24498][SQL] Add JDK compiler for runtime cod...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21777 @kiszk Can you close this?
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23213 Anyway, if we can accept the additional test time, I think it is best to run the tests on all 4 patterns above for strict checks.
[GitHub] spark issue #23206: [SPARK-26249][SQL] Add ability to inject a rule in order...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23206 cc: @gatorsmile
[GitHub] spark issue #23206: [SPARK-26249][SQL] Add ability to inject a rule in order...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23206 Is the current post hook not enough for the use case you assume?
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23213 In short, we need to run the tests on the three patterns below, right? - wholeStage=true, factoryMode=CODEGEN_ONLY (default behaviour in Spark) - wholeStage=false, factoryMode=CODEGEN_ONLY - wholeStage=false, factoryMode=NO_CODEGEN
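The three patterns above map onto two SQL configs. A minimal sketch of how they could be expressed in the Array-of-config-pairs shape that `SQLQueryTestSuite` already feeds into each test run (the variable name and exact shape here are illustrative, not necessarily the PR's code):

```scala
import org.apache.spark.sql.internal.SQLConf

// Each element is one configuration combination to run the whole query
// file suite under: (whole-stage codegen on/off, expression codegen mode).
val codegenConfigSets = Array(
  ("true", "CODEGEN_ONLY"),   // default behaviour in Spark
  ("false", "CODEGEN_ONLY"),  // fall back to per-expression codegen only
  ("false", "NO_CODEGEN")     // fully interpreted mode
).map { case (wholeStageEnabled, factoryMode) =>
  Array(
    SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> wholeStageEnabled,
    SQLConf.CODEGEN_FACTORY_MODE.key -> factoryMode)
}
```

Note that `wholeStage=true` with `factoryMode=NO_CODEGEN` is the fourth combination discussed below; whether it adds coverage beyond these three is the open question in this thread.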
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23213 yea, I think they're not totally the same..., but I'm not sure that the test run (`wholeStage=false, factoryMode=CODEGEN_ONLY`) is worth the time cost.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23213 Sorry, my bad; it was longer than the current master by ~2 times. That's because the current master already runs two config set patterns (`wholeStage=true,factoryMode=CODEGEN_ONLY` and `wholeStage=true,factoryMode=NO_CODEGEN`) in `SQLQueryTestSuite`. The second test run (`wholeStage=true,factoryMode=NO_CODEGEN`) was introduced in my previous pr (#22512). IMHO the two config set patterns below could cover most code paths in Spark: - wholeStage=true, factoryMode=CODEGEN_ONLY - wholeStage=false, factoryMode=NO_CODEGEN In this case, there is little change in the test time; ``` // the current master === Codegen/Interpreter Time Metrics === Total time: 358.584989321 seconds Configs Run Time spark.sql.codegen.wholeStage=true,spark.sql.codegen.factoryMode=NO_CODEGEN 165961038511 spark.sql.codegen.wholeStage=true,spark.sql.codegen.factoryMode=CODEGEN_ONLY 192623950810 // with this pr === Codegen/Interpreter Time Metrics === Total time: 345.468455247 seconds Configs Run Time spark.sql.codegen.wholeStage=true,spark.sql.codegen.factoryMode=CODEGEN_ONLY 196572976377 spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=NO_CODEGEN 148895478870 ``` WDYT?
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23213 yea, it seems it's longer by ~4 times; ``` 23:25:43.880 WARN org.apache.spark.sql.SQLQueryTestSuite: === Codegen/Interpreter Time Metrics === Total time: 602.64531157 seconds Configs Run Time spark.sql.codegen.wholeStage=true,spark.sql.codegen.factoryMode=NO_CODEGEN 156414789416 spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=CODEGEN_ONLY 138343055840 spark.sql.codegen.wholeStage=true,spark.sql.codegen.factoryMode=CODEGEN_ONLY 171905020550 spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=NO_CODEGEN 135982445764 ``` https://github.com/apache/spark/commit/7a69e0b6700fc5c7ad3acef35137f220b8804fd6
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23213 I'm looking into that now ;) Just give me more time to check.
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixe...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238905795 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala --- @@ -53,6 +55,133 @@ class ExplainSuite extends QueryTest with SharedSQLContext { checkKeywordsExistsInExplain(df, keywords = "InMemoryRelation", "StorageLevel(disk, memory, deserialized, 1 replicas)") } + + test("optimized plan should show the rewritten aggregate expression") { +withTempView("test_agg") { + sql( +""" + |CREATE TEMPORARY VIEW test_agg AS SELECT * FROM VALUES + | (1, true), (1, false), + | (2, true), + | (3, false), (3, null), + | (4, null), (4, null), + | (5, null), (5, true), (5, false) AS test_agg(k, v) +""".stripMargin) + + // simple explain of queries having every/some/any aggregates. Optimized + // plan should show the rewritten aggregate expression. + val df = sql("SELECT k, every(v), some(v), any(v) FROM test_agg GROUP BY k") + checkKeywordsExistsInExplain(df, +"Aggregate [k#x], [k#x, min(v#x) AS every(v)#x, max(v#x) AS some(v)#x, " + --- End diff -- I forgot to set `extended = true` in explain...
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixe...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238899777 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2899,6 +2899,144 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + private def checkKeywordsExistsInExplain(df: DataFrame, keywords: String*): Unit = { +val output = new java.io.ByteArrayOutputStream() +Console.withOut(output) { + df.explain(extended = true) +} +val normalizedOutput = output.toString.replaceAll("#\\d+", "#x") +for (key <- keywords) { + assert(normalizedOutput.contains(key)) +} + } + + test("optimized plan should show the rewritten aggregate expression") { --- End diff -- updated! Thanks, guys! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238895286 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -144,9 +144,10 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { val (comments, code) = input.split("\n").partition(_.startsWith("--")) // Runs all the tests on both codegen-only and interpreter modes -val codegenConfigSets = Array(CODEGEN_ONLY, NO_CODEGEN).map { - case codegenFactoryMode => -Array(SQLConf.CODEGEN_FACTORY_MODE.key -> codegenFactoryMode.toString) +val codegenConfigSets = Array(("false", "NO_CODEGEN"), ("true", "CODEGEN_ONLY")).map { --- End diff -- I will check the time later, too.
[GitHub] spark pull request #22468: [SPARK-25374][SQL] SafeProjection supports fallba...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22468#discussion_r238894837 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/UnsafeRowConverterSuite.scala --- @@ -535,4 +535,98 @@ class UnsafeRowConverterSuite extends SparkFunSuite with Matchers with PlanTestB assert(unsafeRow.getSizeInBytes == 8 + 8 * 2 + roundedSize(field1.getSizeInBytes) + roundedSize(field2.getSizeInBytes)) } + + testBothCodegenAndInterpreted("SPARK-25374 converts back into safe representation") { +def convertBackToInternalRow(inputRow: InternalRow, fields: Array[DataType]): InternalRow = { + val unsafeProj = UnsafeProjection.create(fields) + val unsafeRow = unsafeProj(inputRow) + val safeProj = SafeProjection.create(fields) + safeProj(unsafeRow) +} + +// Simple tests +val inputRow = InternalRow.fromSeq(Seq( + false, 3.toByte, 15.toShort, -83, 129L, 1.0f, 8.0, UTF8String.fromString("test"), + Decimal(255), CalendarInterval.fromString("interval 1 day"), Array[Byte](1, 2) +)) +val fields1 = Array( + BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, + DoubleType, StringType, DecimalType.defaultConcreteType, CalendarIntervalType, + BinaryType) + +assert(convertBackToInternalRow(inputRow, fields1) === inputRow) + +// Array tests +val arrayRow = InternalRow.fromSeq(Seq( + createArray(1, 2, 3), + createArray( +createArray(Seq("a", "b", "c").map(UTF8String.fromString): _*), +createArray(Seq("d").map(UTF8String.fromString): _*)) +)) +val fields2 = Array[DataType]( + ArrayType(IntegerType), + ArrayType(ArrayType(StringType))) + +assert(convertBackToInternalRow(arrayRow, fields2) === arrayRow) + +// Struct tests +val structRow = InternalRow.fromSeq(Seq( + InternalRow.fromSeq(Seq[Any](1, 4.0)), + InternalRow.fromSeq(Seq( +UTF8String.fromString("test"), +InternalRow.fromSeq(Seq( + 1, + createArray(Seq("2", "3").map(UTF8String.fromString): _*) +)) + )) +)) +val fields3 = Array[DataType]( + StructType( 
+StructField("c0", IntegerType) :: +StructField("c1", DoubleType) :: +Nil), + StructType( +StructField("c2", StringType) :: +StructField("c3", StructType( + StructField("c4", IntegerType) :: + StructField("c5", ArrayType(StringType)) :: + Nil)) :: +Nil)) + +assert(convertBackToInternalRow(structRow, fields3) === structRow) + +// Map tests +val mapRow = InternalRow.fromSeq(Seq( + createMap(Seq("k1", "k2").map(UTF8String.fromString): _*)(1, 2), + createMap( +createMap(3, 5)(Seq("v1", "v2").map(UTF8String.fromString): _*), +createMap(7, 9)(Seq("v3", "v4").map(UTF8String.fromString): _*) + )( +createMap(Seq("k3", "k4").map(UTF8String.fromString): _*)(3.toShort, 4.toShort), +createMap(Seq("k5", "k6").map(UTF8String.fromString): _*)(5.toShort, 6.toShort) + ))) +val fields4 = Array[DataType]( + MapType(StringType, IntegerType), + MapType(MapType(IntegerType, StringType), MapType(StringType, ShortType))) + +val mapResultRow = convertBackToInternalRow(mapRow, fields4).toSeq(fields4) +val mapExpectedRow = mapRow.toSeq(fields4) +// Since `ArrayBasedMapData` does not override `equals` and `hashCode`, --- End diff -- Aha, thanks. I remember that it's related to SPARK-18134.
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23194 thanks, merging to master!
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238630406 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2899,6 +2899,144 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + private def checkKeywordsExistsInExplain(df: DataFrame, keywords: String*): Unit = { +val output = new java.io.ByteArrayOutputStream() +Console.withOut(output) { + df.explain(extended = true) +} +val normalizedOutput = output.toString.replaceAll("#\\d+", "#x") +for (key <- keywords) { + assert(normalizedOutput.contains(key)) +} + } + + test("optimized plan should show the rewritten aggregate expression") { --- End diff -- ok --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23213 yea, it's similar, but I personally think it's orthogonal to SPARK-24562. This pr only targets a default config set for codegen-only and interpreter mode tests.
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238625915 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -144,9 +144,10 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { val (comments, code) = input.split("\n").partition(_.startsWith("--")) // Runs all the tests on both codegen-only and interpreter modes -val codegenConfigSets = Array(CODEGEN_ONLY, NO_CODEGEN).map { - case codegenFactoryMode => -Array(SQLConf.CODEGEN_FACTORY_MODE.key -> codegenFactoryMode.toString) +val codegenConfigSets = Array(("false", "NO_CODEGEN"), ("true", "CODEGEN_ONLY")).map { --- End diff -- ok
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238625477 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2899,6 +2899,144 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + private def checkKeywordsExistsInExplain(df: DataFrame, keywords: String*): Unit = { +val output = new java.io.ByteArrayOutputStream() +Console.withOut(output) { + df.explain(extended = true) +} +val normalizedOutput = output.toString.replaceAll("#\\d+", "#x") +for (key <- keywords) { + assert(normalizedOutput.contains(key)) +} + } + + test("optimized plan should show the rewritten aggregate expression") { --- End diff -- all the tests? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20433 cc: @gatorsmile
[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23213 cc: @cloud-fan @mgaido91
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22468 retest this please
[GitHub] spark pull request #22468: [SPARK-25374][SQL] SafeProjection supports fallba...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22468#discussion_r238543369 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/UnsafeRowConverterSuite.scala --- @@ -535,4 +535,98 @@ class UnsafeRowConverterSuite extends SparkFunSuite with Matchers with PlanTestB assert(unsafeRow.getSizeInBytes == 8 + 8 * 2 + roundedSize(field1.getSizeInBytes) + roundedSize(field2.getSizeInBytes)) } + + testBothCodegenAndInterpreted("SPARK-25374 converts back into safe representation") { +def convertBackToInternalRow(inputRow: InternalRow, fields: Array[DataType]): InternalRow = { + val unsafeProj = UnsafeProjection.create(fields) + val unsafeRow = unsafeProj(inputRow) + val safeProj = SafeProjection.create(fields) + safeProj(unsafeRow) +} + +// Simple tests +val inputRow = InternalRow.fromSeq(Seq( + false, 3.toByte, 15.toShort, -83, 129L, 1.0f, 8.0, UTF8String.fromString("test"), + Decimal(255), CalendarInterval.fromString("interval 1 day"), Array[Byte](1, 2) +)) +val fields1 = Array( + BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, + DoubleType, StringType, DecimalType.defaultConcreteType, CalendarIntervalType, + BinaryType) + +assert(convertBackToInternalRow(inputRow, fields1) === inputRow) + +// Array tests +val arrayRow = InternalRow.fromSeq(Seq( + createArray(1, 2, 3), + createArray( +createArray(Seq("a", "b", "c").map(UTF8String.fromString): _*), +createArray(Seq("d").map(UTF8String.fromString): _*)) +)) +val fields2 = Array[DataType]( + ArrayType(IntegerType), + ArrayType(ArrayType(StringType))) + +assert(convertBackToInternalRow(arrayRow, fields2) === arrayRow) + +// Struct tests +val structRow = InternalRow.fromSeq(Seq( + InternalRow.fromSeq(Seq[Any](1, 4.0)), + InternalRow.fromSeq(Seq( +UTF8String.fromString("test"), +InternalRow.fromSeq(Seq( + 1, + createArray(Seq("2", "3").map(UTF8String.fromString): _*) +)) + )) +)) +val fields3 = Array[DataType]( + StructType( 
+StructField("c0", IntegerType) :: +StructField("c1", DoubleType) :: +Nil), + StructType( +StructField("c2", StringType) :: +StructField("c3", StructType( + StructField("c4", IntegerType) :: + StructField("c5", ArrayType(StringType)) :: + Nil)) :: +Nil)) + +assert(convertBackToInternalRow(structRow, fields3) === structRow) + +// Map tests +val mapRow = InternalRow.fromSeq(Seq( + createMap(Seq("k1", "k2").map(UTF8String.fromString): _*)(1, 2), + createMap( +createMap(3, 5)(Seq("v1", "v2").map(UTF8String.fromString): _*), +createMap(7, 9)(Seq("v3", "v4").map(UTF8String.fromString): _*) + )( +createMap(Seq("k3", "k4").map(UTF8String.fromString): _*)(3.toShort, 4.toShort), +createMap(Seq("k5", "k6").map(UTF8String.fromString): _*)(5.toShort, 6.toShort) + ))) +val fields4 = Array[DataType]( + MapType(StringType, IntegerType), + MapType(MapType(IntegerType, StringType), MapType(StringType, ShortType))) + +val mapResultRow = convertBackToInternalRow(mapRow, fields4).toSeq(fields4) +val mapExpectedRow = mapRow.toSeq(fields4) +// Since `ArrayBasedMapData` does not override `equals` and `hashCode`, --- End diff -- fixed code to use `ExpressionEvalHelper.checkResult`. I don't remember correctly though, we might have some historical reasons about that; `ArrayBasedMapData` has no `hashCode` and `equals`. Probably, somebody might know this... cc: @hvanhovell @viirya --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
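The equality problem discussed above can be reproduced without Spark: a class that, like `ArrayBasedMapData`, keeps its entries in arrays and does not override `equals`/`hashCode` falls back to reference equality. A minimal sketch (the `ArrayBackedMap` class and `toComparable` helper below are hypothetical stand-ins, not Spark's actual classes):

```scala
// A stand-in for ArrayBasedMapData: entries held in arrays, and no
// equals/hashCode override, so `==` falls back to reference equality.
class ArrayBackedMap(val keys: Array[Any], val values: Array[Any])

// Comparing via the backing sequences restores structural equality,
// which is essentially what the test's toComparable workaround does.
def toComparable(m: ArrayBackedMap): (Seq[Any], Seq[Any]) =
  (m.keys.toSeq, m.values.toSeq)

val m1 = new ArrayBackedMap(Array[Any]("k1", "k2"), Array[Any](1, 2))
val m2 = new ArrayBackedMap(Array[Any]("k1", "k2"), Array[Any](1, 2))

val directlyEqual     = m1 == m2                              // false: different references
val structurallyEqual = toComparable(m1) == toComparable(m2)  // true
```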
[GitHub] spark pull request #22468: [SPARK-25374][SQL] SafeProjection supports fallba...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22468#discussion_r238530264 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/UnsafeRowConverterSuite.scala --- @@ -535,4 +535,100 @@ class UnsafeRowConverterSuite extends SparkFunSuite with Matchers with PlanTestB assert(unsafeRow.getSizeInBytes == 8 + 8 * 2 + roundedSize(field1.getSizeInBytes) + roundedSize(field2.getSizeInBytes)) } + + testBothCodegenAndInterpreted("SPARK-25374 converts back into safe representation") { +def convertBackToInternalRow(inputRow: InternalRow, fields: Array[DataType]): InternalRow = { + val unsafeProj = UnsafeProjection.create(fields) + val unsafeRow = unsafeProj(inputRow) + val safeProj = SafeProjection.create(fields) + safeProj(unsafeRow) +} + +// Simple tests +val inputRow = InternalRow.fromSeq(Seq( + false, 3.toByte, 15.toShort, -83, 129L, 1.0f, 8.0, UTF8String.fromString("test"), + Decimal(255), CalendarInterval.fromString("interval 1 day"), Array[Byte](1, 2) +)) +val fields1 = Array( + BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, + DoubleType, StringType, DecimalType.defaultConcreteType, CalendarIntervalType, + BinaryType) + +assert(convertBackToInternalRow(inputRow, fields1) === inputRow) + +// Array tests +val arrayRow = InternalRow.fromSeq(Seq( + createArray(1, 2, 3), + createArray( +createArray(Seq("a", "b", "c").map(UTF8String.fromString): _*), +createArray(Seq("d").map(UTF8String.fromString): _*)) +)) +val fields2 = Array[DataType]( + ArrayType(IntegerType), + ArrayType(ArrayType(StringType))) + +assert(convertBackToInternalRow(arrayRow, fields2) === arrayRow) + +// Struct tests +val structRow = InternalRow.fromSeq(Seq( + InternalRow.fromSeq(Seq[Any](1, 4.0)), + InternalRow.fromSeq(Seq( +UTF8String.fromString("test"), +InternalRow.fromSeq(Seq( + 1, + createArray(Seq("2", "3").map(UTF8String.fromString): _*) +)) + )) +)) +val fields3 = Array[DataType]( + StructType( 
+StructField("c0", IntegerType) :: +StructField("c1", DoubleType) :: +Nil), + StructType( +StructField("c2", StringType) :: +StructField("c3", StructType( + StructField("c4", IntegerType) :: + StructField("c5", ArrayType(StringType)) :: + Nil)) :: +Nil)) + +assert(convertBackToInternalRow(structRow, fields3) === structRow) + +// Map tests +val mapRow = InternalRow.fromSeq(Seq( + createMap(Seq("k1", "k2").map(UTF8String.fromString): _*)(1, 2), + createMap( +createMap(3, 5)(Seq("v1", "v2").map(UTF8String.fromString): _*), +createMap(7, 9)(Seq("v3", "v4").map(UTF8String.fromString): _*) + )( +createMap(Seq("k3", "k4").map(UTF8String.fromString): _*)(3.toShort, 4.toShort), +createMap(Seq("k5", "k6").map(UTF8String.fromString): _*)(5.toShort, 6.toShort) + ))) +val fields4 = Array[DataType]( + MapType(StringType, IntegerType), + MapType(MapType(IntegerType, StringType), MapType(StringType, ShortType))) + +// Since `ArrayBasedMapData` does not override `equals` and `hashCode`, +// we need to take care of it to compare rows. +def toComparable(d: Any): Any = d match { --- End diff -- Since we cannot compare `ArrayBasedMapData`s directly (that is, `assert(mapResultRow === mapExpectedRow)` fails), I just converted them into the `Seq`s of keys/values by this method. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/23213 [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE_CODEGEN_ENABLED=false ## What changes were proposed in this pull request? For better test coverage, this pr sets `WHOLESTAGE_CODEGEN_ENABLED` to `false` so that `SQLQueryTestSuite` also runs its tests in interpreter mode. This pr also moves the existing tests into `SQLQuerySuite` because explain output differs between codegen-only and interpreter modes. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/spark InterpreterModeTest Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23213.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23213 commit 2ced0cab0c16ee7a2400035a5a7794033eae3ed9 Author: Takeshi Yamamuro Date: 2018-12-04T04:17:48Z Fix
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23194 LGTM except for minor comments
[GitHub] spark pull request #23194: [MINOR][SQL] Combine the same codes in test cases
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23194#discussion_r238526892 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -377,41 +377,41 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } - test("CTAS a managed table with the existing empty directory") { -val tableLoc = new File(spark.sessionState.catalog.defaultTablePath(TableIdentifier("tab1"))) + protected def withEmptyDirInTablePath(dirName: String)(f: File => Unit): Unit = { --- End diff -- private?
[GitHub] spark pull request #23212: [SPARK-25498][SQL][FOLLOW-UP] Return an empty con...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/23212 [SPARK-25498][SQL][FOLLOW-UP] Return an empty config set when regenerating the golden files ## What changes were proposed in this pull request? This pr returns an empty config set when regenerating the golden files in `SQLQueryTestSuite`. This is a follow-up of #22512. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/spark SPARK-25498-FOLLOWUP Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23212.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23212 commit ca09bb82d49d29a5d6f088d2250e980bab64a8b7 Author: Takeshi Yamamuro Date: 2018-12-04T00:39:46Z Fix
[GitHub] spark issue #23212: [SPARK-25498][SQL][FOLLOW-UP] Return an empty config set...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23212 cc: @cloud-fan
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20433 retest this please
[GitHub] spark pull request #22468: [SPARK-25374][SQL] SafeProjection supports fallba...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22468#discussion_r238490121 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala --- @@ -157,4 +157,22 @@ object InternalRow { getValueNullSafe } } + + /** + * Returns a writer for an `InternalRow` with given data type. + */ + def getWriter(ordinal: Int, dt: DataType): (InternalRow, Any) => Unit = dt match { --- End diff -- ok
[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22512#discussion_r238489997 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -148,12 +156,25 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { }) // When we are regenerating the golden files we don't need to run all the configs as they // all need to return the same result - if (regenerateGoldenFiles && configs.nonEmpty) { -configs.take(1) + if (regenerateGoldenFiles) { +if (configs.nonEmpty) { + configs.take(1) +} else { + Array.empty[Array[(String, String)]] --- End diff -- ok
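The branch being reviewed in this diff can be sketched as a small self-contained function (the `configsToRun` name and `Configs` alias below are hypothetical, introduced only to make the control flow standalone):

```scala
type Configs = Array[Array[(String, String)]]

// When regenerating golden files, run at most one config set; if no
// configs were specified, explicitly run with none at all rather than
// relying on Array.empty.take(1) returning an empty array.
def configsToRun(regenerateGoldenFiles: Boolean, configs: Configs): Configs =
  if (regenerateGoldenFiles) {
    if (configs.nonEmpty) configs.take(1)
    else Array.empty[Array[(String, String)]]
  } else {
    configs
  }
```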
[GitHub] spark pull request #23194: [MINOR][SQL] Combine the same codes in test cases
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23194#discussion_r238256775 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -439,31 +440,22 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { }.getMessage assert(ex.contains(exMsgWithDefaultDB)) } - } finally { -waitForTasksToFinish() -Utils.deleteRecursively(tableLoc) } } } test("rename a managed table with existing empty directory") { -val tableLoc = new File(spark.sessionState.catalog.defaultTablePath(TableIdentifier("tab2"))) -try { +withTableLocation("tab2") { tableLoc => withTable("tab1") { sql(s"CREATE TABLE tab1 USING $dataSource AS SELECT 1, 'a'") -tableLoc.mkdir() val ex = intercept[AnalysisException] { sql("ALTER TABLE tab1 RENAME TO tab2") }.getMessage val expectedMsg = "Can not rename the managed table('`tab1`'). The associated location" assert(ex.contains(expectedMsg)) } -} finally { - waitForTasksToFinish() - Utils.deleteRecursively(tableLoc) } } - --- End diff -- nit: revert this --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23194: [MINOR][SQL] Combine the same codes in test cases
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23194#discussion_r238256611 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -377,41 +377,42 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } - test("CTAS a managed table with the existing empty directory") { -val tableLoc = new File(spark.sessionState.catalog.defaultTablePath(TableIdentifier("tab1"))) -try { + protected def withTableLocation(tableNames: String)(f: File => Unit): Unit = { +val tableLoc = + new File(spark.sessionState.catalog.defaultTablePath(TableIdentifier(tableNames))) +try --- End diff -- nit `try {`
[GitHub] spark pull request #23194: [MINOR][SQL] Combine the same codes in test cases
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23194#discussion_r238256563 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -377,41 +377,42 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } - test("CTAS a managed table with the existing empty directory") { -val tableLoc = new File(spark.sessionState.catalog.defaultTablePath(TableIdentifier("tab1"))) --- End diff -- How about `withEmptyDirInTablePath(dirName: String)`?
[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22318 In the example @viirya described above (https://github.com/apache/spark/pull/22318#issuecomment-426317617), I think the interpretation is unclear to most users and I'm fairly concerned that it could be error-prone...
[GitHub] spark pull request #23094: [SPARK-26077][SQL] Reserved SQL words are not esc...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23094#discussion_r238239581 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala --- @@ -87,11 +87,32 @@ abstract class JdbcDialect extends Serializable { def getJDBCType(dt: DataType): Option[JdbcType] = None /** - * Quotes the identifier. This is used to put quotes around the identifier in case the column - * name is a reserved keyword, or in case it contains characters that require quotes (e.g. space). + * Gets the character used for identifier quoting. + */ + def getIdentifierQuoteCharacter: String = """"""" --- End diff -- btw, I feel we need more general name handling here, like `UnresolvedAttribute.parseAttributeName` https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala#L151 cc: @gatorsmile
[GitHub] spark pull request #23094: [SPARK-26077][SQL] Reserved SQL words are not esc...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23094#discussion_r238239195 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala --- @@ -87,11 +87,32 @@ abstract class JdbcDialect extends Serializable { def getJDBCType(dt: DataType): Option[JdbcType] = None /** - * Quotes the identifier. This is used to put quotes around the identifier in case the column - * name is a reserved keyword, or in case it contains characters that require quotes (e.g. space). + * Gets the character used for identifier quoting. + */ + def getIdentifierQuoteCharacter: String = """"""" --- End diff -- I like a simpler API design; how about splitting an identifier into its two parts (db and table names) outside `JdbcDialect`, and then applying `quoteIdentifier` to each name part?
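The simpler design suggested here — split the qualified name outside the dialect, then quote each part — can be sketched in plain Scala (the helper names below are hypothetical illustrations, not the actual `JdbcDialect` API):

```scala
// Quote one identifier part, doubling any embedded quote characters
// (the usual SQL escaping rule), then wrap it in the quote character.
def quoteIdentifier(part: String, quoteChar: String = "\""): String =
  quoteChar + part.replace(quoteChar, quoteChar * 2) + quoteChar

// Split a possibly qualified name (db.table) and quote each part, so
// reserved words like `order` are safe in generated SQL.
def quoteQualifiedName(name: String, quoteChar: String = "\""): String =
  name.split('.').map(p => quoteIdentifier(p, quoteChar)).mkString(".")
```

For example, `quoteQualifiedName("mydb.order")` yields `"mydb"."order"`, and a MySQL dialect could pass a backquote as `quoteChar` instead.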
[GitHub] spark pull request #23094: [SPARK-26077][SQL] Reserved SQL words are not esc...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23094#discussion_r238232050 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala --- @@ -87,11 +87,32 @@ abstract class JdbcDialect extends Serializable { def getJDBCType(dt: DataType): Option[JdbcType] = None /** - * Quotes the identifier. This is used to put quotes around the identifier in case the column - * name is a reserved keyword, or in case it contains characters that require quotes (e.g. space). + * Gets the character used for identifier quoting. + */ + def getIdentifierQuoteCharacter: String = """"""" --- End diff -- Do we need this new API instead of `quoteIdentifier`?
[GitHub] spark issue #23094: [SPARK-26077][SQL] Reserved SQL words are not escaped by...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23094 Also, can you add tests in MySQLIntegrationSuite, too?
[GitHub] spark issue #23094: [SPARK-26077][SQL] Reserved SQL words are not escaped by...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23094 @golovan thanks for your work. Can you make the title complete (... -> for table names)?
[GitHub] spark issue #23094: [SPARK-26077][SQL] Reserved SQL words are not escaped by...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23094 ok to test
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22468 retest this please
[GitHub] spark issue #22204: [SPARK-25196][SQL] Extends Analyze commands for cached t...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22204 retest this please
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22198 retest this please
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20433 retest this please
[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22512#discussion_r238177033 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -148,12 +156,21 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { }) // When we are regenerating the golden files we don't need to run all the configs as they // all need to return the same result - if (regenerateGoldenFiles && configs.nonEmpty) { + if (regenerateGoldenFiles) { configs.take(1) --- End diff -- For better readability, fixed in https://github.com/apache/spark/pull/22512/commits/4cdc5040feb3da1e4cf9efcf434138d5873fae04
[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22512#discussion_r238176184 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala --- @@ -64,7 +85,7 @@ class InterpretedMutableProjection(expressions: Seq[Expression]) extends Mutable i = 0 while (i < validExprs.length) { val (_, ordinal) = validExprs(i) - mutableRow(ordinal) = buffer(ordinal) + fieldWriters(i)(buffer(ordinal)) --- End diff -- fixed in https://github.com/apache/spark/pull/22512/commits/95411c8b8b76503dff93756482642083a694b0b7
[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22512#discussion_r238175630 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -148,12 +156,21 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { }) // When we are regenerating the golden files we don't need to run all the configs as they // all need to return the same result - if (regenerateGoldenFiles && configs.nonEmpty) { + if (regenerateGoldenFiles) { configs.take(1) --- End diff -- Actually, it returns an empty array? ``` scala> Array.empty.take(1) res0: Array[Nothing] = Array() scala> Seq.empty.take(1) res1: Seq[Nothing] = List() ```
[GitHub] spark issue #22204: [SPARK-25196][SQL] Extends Analyze commands for cached t...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22204 retest this please
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22198 retest this please
[GitHub] spark pull request #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20433#discussion_r238152087 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -335,6 +335,12 @@ object SQLConf { .booleanConf .createWithDefault(true) + val ANSI_SQL_PARSER = +buildConf("spark.sql.parser.ansi.enabled") + .doc("When true, tries to conform to ANSI SQL syntax.") + .booleanConf + .createWithDefault(false) --- End diff -- updated
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20433 retest this please
[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22512#discussion_r238151565 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala --- @@ -64,7 +85,7 @@ class InterpretedMutableProjection(expressions: Seq[Expression]) extends Mutable i = 0 while (i < validExprs.length) { val (_, ordinal) = validExprs(i) - mutableRow(ordinal) = buffer(ordinal) + fieldWriters(i)(buffer(ordinal)) --- End diff -- ah, sounds reasonable. I'll update later.
[GitHub] spark issue #22999: [SPARK-20319][SQL] Already quoted identifiers are gettin...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22999 Is this only an Oracle dialect issue? What about other dialects?
[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22318 @peter-toth can't we fix the issue in the description without changing [the existing behaviour](https://github.com/apache/spark/pull/22318#issuecomment-426317617)?
[GitHub] spark issue #23032: [WIP][SPARK-26061][SQL][MINOR] Reduce the number of unus...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23032 Can't we fix the root cause in `GenerateUnsafeProjection.createCode()`? Since we don't check whether all the mutable variables are used in the generated code, a similar issue can easily happen in other places...
[GitHub] spark pull request #23199: [SPARK-26245][SQL] Add Float literal
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23199#discussion_r238130900 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -1045,6 +1046,11 @@ DOUBLE_LITERAL | DECIMAL_DIGITS EXPONENT? 'D' {isValidDecimal()}? ; +FLOAT_LITERAL +: DIGIT+ EXPONENT? 'F' +| DECIMAL_DIGITS EXPONENT? 'F' {isValidDecimal()}? --- End diff -- This is implementation-specific; ``` hive> create temporary table t1 as select 1.0F; hive> describe t1; OK f decimal(1,0) hive> create temporary table t2 as select 1.0D; hive> describe t2; OK _c0 double ``` ``` postgres=# create temporary table t1 as select 1.0F; postgres=# \d t1 f | numeric | postgres=# create temporary table t2 as select 1.0D; postgres=# \d t2 d | numeric | ``` ``` mysql> create temporary table t1 as select 1.0F; mysql> describe t1; +---+--+--+-+-+---+ | Field | Type | Null | Key | Default | Extra | +---+--+--+-+-+---+ | F | decimal(2,1) | NO | | 0.0 | NULL | +---+--+--+-+-+---+ mysql> create temporary table t2 as select 1.0D; mysql> describe t2; +---+--+--+-+-+---+ | Field | Type | Null | Key | Default | Extra | +---+--+--+-+-+---+ | D | decimal(2,1) | NO | | 0.0 | NULL | +---+--+--+-+-+---+ ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22512 Yea, I'll update in a few days.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22512 I'm looking into the failure reason... (it passed locally, but failed on Jenkins...)
[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22512#discussion_r227626505 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MutableProjectionSuite.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.Row +import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow} +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.CalendarInterval + +class MutableProjectionSuite extends SparkFunSuite with ExpressionEvalHelper { + + private def createMutableProjection(dataTypes: Array[DataType]): MutableProjection = { +MutableProjection.create(dataTypes.zipWithIndex.map(x => BoundReference(x._2, x._1, true))) + } + + testBothCodegenAndInterpreted("fixed-length types") { +val fixedLengthTypes = Array[DataType]( + BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, + DateType, TimestampType) +val proj = createMutableProjection(fixedLengthTypes) +val inputRow = InternalRow.fromSeq( + Seq(false, 3.toByte, 15.toShort, -83, 129L, 1.0f, 5.0, 100, 200L)) +assert(proj(inputRow) === inputRow) + +// Use UnsafeRow as buffer +val numBytes = UnsafeRow.calculateBitSetWidthInBytes(fixedLengthTypes.length) +val unsafeBuffer = UnsafeRow.createFromByteArray(numBytes, fixedLengthTypes.length) +val projUnsafeRow = proj.target(unsafeBuffer)(inputRow) +assert(FromUnsafeProjection(fixedLengthTypes)(projUnsafeRow) === inputRow) + } + + testBothCodegenAndInterpreted("variable-length types") { +val variableLengthTypes = Array( + StringType, DecimalType.defaultConcreteType, CalendarIntervalType, BinaryType, + ArrayType(StringType), MapType(IntegerType, StringType), + StructType.fromDDL("a INT, b STRING"), ObjectType(classOf[java.lang.Integer])) +val proj = createMutableProjection(variableLengthTypes) --- End diff -- sure. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22512#discussion_r227626456 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala --- @@ -143,4 +144,25 @@ object InternalRow { case u: UserDefinedType[_] => getAccessor(u.sqlType) case _ => (input, ordinal) => input.get(ordinal, dataType) } + + /** + * Returns a writer for an `InternalRow` with given data type. + */ + def getWriter(ordinal: Int, dt: DataType): (InternalRow, Any) => Unit = dt match { +case BooleanType => (input, v) => input.setBoolean(ordinal, v.asInstanceOf[Boolean]) +case ByteType => (input, v) => input.setByte(ordinal, v.asInstanceOf[Byte]) +case ShortType => (input, v) => input.setShort(ordinal, v.asInstanceOf[Short]) +case IntegerType | DateType => (input, v) => input.setInt(ordinal, v.asInstanceOf[Int]) +case LongType | TimestampType => (input, v) => input.setLong(ordinal, v.asInstanceOf[Long]) +case FloatType => (input, v) => input.setFloat(ordinal, v.asInstanceOf[Float]) +case DoubleType => (input, v) => input.setDouble(ordinal, v.asInstanceOf[Double]) +case DecimalType.Fixed(precision, _) => + (input, v) => input.setDecimal(ordinal, v.asInstanceOf[Decimal], precision) +case CalendarIntervalType | BinaryType | _: ArrayType | StringType | _: StructType | + _: MapType | _: ObjectType => + (input, v) => input.update(ordinal, v) +case udt: UserDefinedType[_] => getWriter(ordinal, udt.sqlType) +case NullType => (input, _) => input.setNullAt(ordinal) +case _ => throw new SparkException(s"Unsupported data type $dt") --- End diff -- ok --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
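The pattern in this diff — resolving a writer closure per field type once, instead of dispatching on the data type for every row — can be illustrated without Spark's types (the miniature `DataType` hierarchy below is a hypothetical stand-in for Catalyst's, and `Array[Any]` stands in for `InternalRow`):

```scala
// A miniature stand-in for Catalyst's DataType hierarchy.
sealed trait DataType
case object IntType extends DataType
case object StrType extends DataType

// Resolve a writer closure per field once, as InternalRow.getWriter does;
// each returned closure just casts and stores, so the type dispatch is
// not repeated for every row.
def getWriter(ordinal: Int, dt: DataType): (Array[Any], Any) => Unit = dt match {
  case IntType => (row, v) => row(ordinal) = v.asInstanceOf[Int]
  case StrType => (row, v) => row(ordinal) = v.asInstanceOf[String]
}

// Precompute one writer per schema field, then reuse them per row.
val writers = Seq(IntType, StrType).zipWithIndex.map { case (dt, i) => getWriter(i, dt) }

val row: Array[Any] = {
  val r = new Array[Any](2)
  writers(0)(r, 42)
  writers(1)(r, "a")
  r
}
```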
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/21860

LGTM, pending Jenkins
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/21860

retest this please
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/21860

ok to test
[GitHub] spark issue #22755: [SPARK-25755][SQL][Test] Supplementation of non-CodeGen ...
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22755

It looks good to improve the test coverage. But it seems hard to wrap every test case with `withSQLConf`. So, how about adding a helper function for turning codegen on/off (`WHOLESTAGE_CODEGEN_ENABLED` and `CODEGEN_FACTORY_MODE`), like `testBothCodegenAndInterpreted`? @cloud-fan @gatorsmile
https://github.com/apache/spark/blob/584e767d372d41071c3436f9ad4bf77a820f12b4/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/UnsafeRowConverterSuite.scala#L38
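Such a helper could look roughly like the following. This is a sketch against Spark's test utilities: `CodegenObjectFactoryMode` and `SQLConf.CODEGEN_FACTORY_MODE` are taken from the linked suite, while the helper name is the one proposed in the comment.

```scala
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenObjectFactoryMode
import org.apache.spark.sql.internal.SQLConf

// Registers the same test body twice: once per codegen factory mode.
protected def testBothCodegenAndInterpreted(name: String)(f: => Unit): Unit = {
  val modes = Seq(CodegenObjectFactoryMode.CODEGEN_ONLY, CodegenObjectFactoryMode.NO_CODEGEN)
  for (fallbackMode <- modes) {
    test(s"$name with $fallbackMode") {
      withSQLConf(SQLConf.CODEGEN_FACTORY_MODE.key -> fallbackMode.toString) {
        f
      }
    }
  }
}
```

Each call site then exercises both the generated and the interpreted code paths without repeating the `withSQLConf` boilerplate.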
[GitHub] spark issue #17174: [SPARK-19145][SQL] Timestamp to String casting is slowin...
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/17174

@dongjoon-hyun Thanks for your checks!! I'm still waiting for feedback from other developers. If the feedback on the approach is positive, I'll make a pr. Other ideas to solve this are also welcome. cc: @gatorsmile @hvanhovell @viirya
[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22512#discussion_r227609747

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -143,4 +144,24 @@ object InternalRow {
     case u: UserDefinedType[_] => getAccessor(u.sqlType)
     case _ => (input, ordinal) => input.get(ordinal, dataType)
   }
+
+  /**
+   * Returns a writer for an `InternalRow` with given data type.
+   */
+  def getWriter(ordinal: Int, dt: DataType): (InternalRow, Any) => Unit = dt match {
+    case BooleanType => (input, v) => input.setBoolean(ordinal, v.asInstanceOf[Boolean])
+    case ByteType => (input, v) => input.setByte(ordinal, v.asInstanceOf[Byte])
+    case ShortType => (input, v) => input.setShort(ordinal, v.asInstanceOf[Short])
+    case IntegerType | DateType => (input, v) => input.setInt(ordinal, v.asInstanceOf[Int])
+    case LongType | TimestampType => (input, v) => input.setLong(ordinal, v.asInstanceOf[Long])
+    case FloatType => (input, v) => input.setFloat(ordinal, v.asInstanceOf[Float])
+    case DoubleType => (input, v) => input.setDouble(ordinal, v.asInstanceOf[Double])
+    case DecimalType.Fixed(precision, _) =>
+      (input, v) => input.setDecimal(ordinal, v.asInstanceOf[Decimal], precision)
+    case CalendarIntervalType | BinaryType | _: ArrayType | StringType | _: StructType |
+        _: MapType | _: ObjectType | _: UserDefinedType[_] =>
+      (input, v) => input.update(ordinal, v)
+    case NullType => (input, v) => {}
--- End diff --

ok
[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22512#discussion_r227609600

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -143,4 +144,24 @@ object InternalRow {
     case u: UserDefinedType[_] => getAccessor(u.sqlType)
     case _ => (input, ordinal) => input.get(ordinal, dataType)
   }
+
+  /**
+   * Returns a writer for an `InternalRow` with given data type.
+   */
+  def getWriter(ordinal: Int, dt: DataType): (InternalRow, Any) => Unit = dt match {
+    case BooleanType => (input, v) => input.setBoolean(ordinal, v.asInstanceOf[Boolean])
+    case ByteType => (input, v) => input.setByte(ordinal, v.asInstanceOf[Byte])
+    case ShortType => (input, v) => input.setShort(ordinal, v.asInstanceOf[Short])
+    case IntegerType | DateType => (input, v) => input.setInt(ordinal, v.asInstanceOf[Int])
+    case LongType | TimestampType => (input, v) => input.setLong(ordinal, v.asInstanceOf[Long])
+    case FloatType => (input, v) => input.setFloat(ordinal, v.asInstanceOf[Float])
+    case DoubleType => (input, v) => input.setDouble(ordinal, v.asInstanceOf[Double])
+    case DecimalType.Fixed(precision, _) =>
+      (input, v) => input.setDecimal(ordinal, v.asInstanceOf[Decimal], precision)
+    case CalendarIntervalType | BinaryType | _: ArrayType | StringType | _: StructType |
+        _: MapType | _: ObjectType | _: UserDefinedType[_] =>
--- End diff --

ok
[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22512#discussion_r227226902

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
@@ -140,6 +141,14 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
     val input = fileToString(new File(testCase.inputFile))

     val (comments, code) = input.split("\n").partition(_.startsWith("--"))
+
+    // Runs all the tests on both codegen-only and interpreter modes. Since explain results differ
+    // when `WHOLESTAGE_CODEGEN_ENABLED` disabled, we don't run these tests now.
+    val codegenConfigSets = Array(("false", "NO_CODEGEN"), ("true", "CODEGEN_ONLY")).map {
+      case (wholeStageCodegenEnabled, codegenFactoryMode) =>
+        Array( // SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> wholeStageCodegenEnabled,
--- End diff --

ok
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22512

ok, I'll add tests.
[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22512#discussion_r227224209

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala ---
@@ -49,10 +51,54 @@ class InterpretedMutableProjection(expressions: Seq[Expression]) extends Mutable
   def currentValue: InternalRow = mutableRow

   override def target(row: InternalRow): MutableProjection = {
+    // If `mutableRow` is `UnsafeRow`, `MutableProjection` accepts fixed-length types only
+    assert(!row.isInstanceOf[UnsafeRow] ||
+      validExprs.forall { case (e, _) => UnsafeRow.isFixedLength(e.dataType) })
     mutableRow = row
     this
   }

+  private[this] val fieldWriters = validExprs.map { case (e, i) =>
+    val writer = generateRowWriter(i, e.dataType)
+    if (!e.nullable) {
+      (v: Any) => writer(v)
+    } else {
+      (v: Any) => {
+        if (v == null) {
+          mutableRow.setNullAt(i)
+        } else {
+          writer(v)
+        }
+      }
+    }
+  }
+
+  private def generateRowWriter(ordinal: Int, dt: DataType): Any => Unit = dt match {
--- End diff --

oh, yes! yea, I will.
[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22512#discussion_r227224030

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala ---
@@ -49,10 +51,54 @@ class InterpretedMutableProjection(expressions: Seq[Expression]) extends Mutable
   def currentValue: InternalRow = mutableRow

   override def target(row: InternalRow): MutableProjection = {
+    // If `mutableRow` is `UnsafeRow`, `MutableProjection` accepts fixed-length types only
+    assert(!row.isInstanceOf[UnsafeRow] ||
+      validExprs.forall { case (e, _) => UnsafeRow.isFixedLength(e.dataType) })
     mutableRow = row
     this
   }

+  private[this] val fieldWriters = validExprs.map { case (e, i) =>
+    val writer = generateRowWriter(i, e.dataType)
+    if (!e.nullable) {
+      (v: Any) => writer(v)
+    } else {
+      (v: Any) => {
+        if (v == null) {
+          mutableRow.setNullAt(i)
+        } else {
+          writer(v)
+        }
+      }
+    }
+  }
+
+  private def generateRowWriter(ordinal: Int, dt: DataType): Any => Unit = dt match {
+    case BooleanType =>
+      v => mutableRow.setBoolean(ordinal, v.asInstanceOf[Boolean])
+    case ByteType =>
+      v => mutableRow.setByte(ordinal, v.asInstanceOf[Byte])
+    case ShortType =>
+      v => mutableRow.setShort(ordinal, v.asInstanceOf[Short])
+    case IntegerType | DateType =>
+      v => mutableRow.setInt(ordinal, v.asInstanceOf[Int])
+    case LongType | TimestampType =>
+      v => mutableRow.setLong(ordinal, v.asInstanceOf[Long])
+    case FloatType =>
+      v => mutableRow.setFloat(ordinal, v.asInstanceOf[Float])
+    case DoubleType =>
+      v => mutableRow.setDouble(ordinal, v.asInstanceOf[Double])
+    case DecimalType.Fixed(precision, _) =>
+      v => mutableRow.setDecimal(ordinal, v.asInstanceOf[Decimal], precision)
+    case CalendarIntervalType | BinaryType | _: ArrayType | StringType | _: StructType |
+        _: MapType | _: UserDefinedType[_] =>
+      v => mutableRow.update(ordinal, v)
+    case NullType =>
+      v => {}
--- End diff --

We need to take care of `e.nullable && e.dataType == NullType` here?
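The nullable wrapper discussed in the diff can be illustrated without Spark: the null check is composed around the type-specific writer once, at projection-construction time, rather than on every row. A simplified sketch (the `MiniRow` type is a stand-in for `InternalRow`, not Spark's actual class):

```scala
// Stand-in for InternalRow, for illustration only.
class MiniRow(val values: Array[Any]) {
  def setNullAt(i: Int): Unit = values(i) = null
  def setLong(i: Int, v: Long): Unit = values(i) = v
}

// Builds the field writer once; the null branch is only present for nullable fields.
def longWriterFor(ordinal: Int, nullable: Boolean): (MiniRow, Any) => Unit = {
  val write: (MiniRow, Any) => Unit =
    (row, v) => row.setLong(ordinal, v.asInstanceOf[Long])
  if (!nullable) {
    write
  } else {
    (row, v) => if (v == null) row.setNullAt(ordinal) else write(row, v)
  }
}

val writer = longWriterFor(0, nullable = true)
val row = new MiniRow(new Array[Any](1))
writer(row, 129L)
writer(row, null) // nullable path: sets the slot to null instead of casting
```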
[GitHub] spark pull request #22755: [SPARK-25755][SQL][Test] Supplementation of non-C...
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22755#discussion_r227215773

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/ExistenceJoinSuite.scala ---
@@ -122,19 +122,22 @@ class ExistenceJoinSuite extends SparkPlanTest with SharedSQLContext {

   test(s"$testName using BroadcastHashJoin") {
     extractJoinParts().foreach { case (_, leftKeys, rightKeys, boundCondition, _, _) =>
-      withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "1") {
-        checkAnswer2(leftRows, rightRows, (left: SparkPlan, right: SparkPlan) =>
-          EnsureRequirements(left.sqlContext.sessionState.conf).apply(
-            BroadcastHashJoinExec(
-              leftKeys, rightKeys, joinType, BuildRight, boundCondition, left, right)),
-          expectedAnswer,
-          sortAnswers = true)
-        checkAnswer2(leftRows, rightRows, (left: SparkPlan, right: SparkPlan) =>
-          EnsureRequirements(left.sqlContext.sessionState.conf).apply(
-            createLeftSemiPlusJoin(BroadcastHashJoinExec(
-              leftKeys, rightKeys, leftSemiPlus, BuildRight, boundCondition, left, right))),
-          expectedAnswer,
-          sortAnswers = true)
+      Seq("false", "true").foreach { v =>
--- End diff --

nit: v -> codegenEnabled
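With the suggested rename applied, the loop wrapping the join checks would read roughly like this (a sketch of the reviewed test structure; the body is elided and the exact config keys are those toggled in the PR):

```scala
// Run the join assertions once with whole-stage codegen off, once with it on.
Seq("false", "true").foreach { codegenEnabled =>
  withSQLConf(
      SQLConf.SHUFFLE_PARTITIONS.key -> "1",
      SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> codegenEnabled) {
    // ... the two checkAnswer2 calls from the diff above ...
  }
}
```

Naming the loop variable `codegenEnabled` makes the intent of the `"false"`/`"true"` pair obvious at the use site.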
[GitHub] spark issue #22755: [SPARK-25755][SQL][Test] Supplementation of non-CodeGen ...
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22755

ok to test
[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22778#discussion_r227215135

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala ---
@@ -33,23 +34,44 @@ class RewriteSubquerySuite extends PlanTest {
       Batch("Rewrite Subquery", FixedPoint(1),
         RewritePredicateSubquery,
         ColumnPruning,
+        InferFiltersFromConstraints,
+        PushDownPredicate,
         CollapseProject,
         RemoveRedundantProject) :: Nil
   }

   test("Column pruning after rewriting predicate subquery") {
-    val relation = LocalRelation('a.int, 'b.int)
-    val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
+    withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
+      val relation = LocalRelation('a.int, 'b.int)
+      val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)

-    val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
+      val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)

-    val optimized = Optimize.execute(query.analyze)
-    val correctAnswer = relation
-      .select('a)
-      .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
-      .analyze
+      val optimized = Optimize.execute(query.analyze)
+      val correctAnswer = relation
+        .select('a)
+        .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
+        .analyze

-    comparePlans(optimized, correctAnswer)
+      comparePlans(optimized, correctAnswer)
+    }
+  }
+
+  test("Infer filters and push down predicate after rewriting predicate subquery") {
--- End diff --

How about making the test title simple, then leaving clear comments here about what's tested?
[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22778#discussion_r227214593

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala ---
@@ -33,23 +34,44 @@ class RewriteSubquerySuite extends PlanTest {
       Batch("Rewrite Subquery", FixedPoint(1),
         RewritePredicateSubquery,
         ColumnPruning,
+        InferFiltersFromConstraints,
+        PushDownPredicate,
         CollapseProject,
         RemoveRedundantProject) :: Nil
   }

   test("Column pruning after rewriting predicate subquery") {
-    val relation = LocalRelation('a.int, 'b.int)
-    val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
+    withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
+      val relation = LocalRelation('a.int, 'b.int)
+      val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)

-    val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
+      val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)

-    val optimized = Optimize.execute(query.analyze)
-    val correctAnswer = relation
-      .select('a)
-      .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
-      .analyze
+      val optimized = Optimize.execute(query.analyze)
+      val correctAnswer = relation
+        .select('a)
+        .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
+        .analyze

-    comparePlans(optimized, correctAnswer)
+      comparePlans(optimized, correctAnswer)
+    }
+  }
+
+  test("Infer filters and push down predicate after rewriting predicate subquery") {
--- End diff --

Need the column pruning in the test title?
[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22778#discussion_r227214404

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala ---
@@ -33,23 +34,44 @@ class RewriteSubquerySuite extends PlanTest {
       Batch("Rewrite Subquery", FixedPoint(1),
         RewritePredicateSubquery,
         ColumnPruning,
+        InferFiltersFromConstraints,
+        PushDownPredicate,
         CollapseProject,
         RemoveRedundantProject) :: Nil
   }

   test("Column pruning after rewriting predicate subquery") {
-    val relation = LocalRelation('a.int, 'b.int)
-    val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
+    withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
--- End diff --

Ah, I see. Thanks.
[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22778#discussion_r227205054

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala ---
@@ -33,23 +34,44 @@ class RewriteSubquerySuite extends PlanTest {
       Batch("Rewrite Subquery", FixedPoint(1),
         RewritePredicateSubquery,
         ColumnPruning,
+        InferFiltersFromConstraints,
+        PushDownPredicate,
         CollapseProject,
         RemoveRedundantProject) :: Nil
   }

   test("Column pruning after rewriting predicate subquery") {
-    val relation = LocalRelation('a.int, 'b.int)
-    val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
+    withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
--- End diff --

We need to modify this existing test?
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22778

Also, to make sure there is no performance regression in the optimizer, can you check the optimizer statistics on TPCDS by running `TPCDSQuerySuite`, too?
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22778

Can you put a concrete example of the missing case you described into the PR description?
[GitHub] spark pull request #22712: [SPARK-25724] Add sorting functionality in MapTyp...
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22712#discussion_r227201626

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ordering.scala ---
@@ -53,6 +53,10 @@ class InterpretedOrdering(ordering: Seq[SortOrder]) extends Ordering[InternalRow
           a.interpretedOrdering.asInstanceOf[Ordering[Any]].compare(left, right)
         case a: ArrayType if order.direction == Descending =>
           a.interpretedOrdering.asInstanceOf[Ordering[Any]].reverse.compare(left, right)
+        case m: MapType if m.isOrdered && order.direction == Ascending =>
+          m.interpretedOrdering.asInstanceOf[Ordering[Any]].compare(left, right)
+        case m: MapType if m.isOrdered && order.direction == Descending =>
+          m.interpretedOrdering.asInstanceOf[Ordering[Any]].reverse.compare(left, right)
--- End diff --

You mean we can't remove this? If it's not necessary, it's better to remove it.
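The Ascending/Descending split in the diff relies on `scala.math.Ordering#reverse`, which flips the sign of every comparison. A self-contained plain-Scala illustration of that equivalence:

```scala
// Descending order is just the reverse of the ascending Ordering.
val asc: Ordering[Int] = Ordering.Int
val desc: Ordering[Int] = asc.reverse

assert(asc.compare(1, 2) < 0)   // 1 sorts before 2 ascending
assert(desc.compare(1, 2) > 0)  // 1 sorts after 2 descending
```

This is why `InterpretedOrdering` only needs one `interpretedOrdering` per type and can derive the descending case from it.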
[GitHub] spark issue #22219: [SPARK-25224][SQL] Improvement of Spark SQL ThriftServer...
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22219

ok to test
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/20433

retest this please