[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20705
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r174637563

Diff: `sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala`

```diff
@@ -526,7 +526,7 @@ object SQLConf {
   val DEFAULT_DATA_SOURCE_NAME = buildConf("spark.sql.sources.default")
     .doc("The default data source to use in input/output.")
     .stringConf
-    .createWithDefault("parquet")
+    .createWithDefault("orc")
```

Yep. It's back now, @gatorsmile.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r174019305

Diff: `sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala`

```diff
@@ -526,7 +526,7 @@ object SQLConf {
   val DEFAULT_DATA_SOURCE_NAME = buildConf("spark.sql.sources.default")
     .doc("The default data source to use in input/output.")
     .stringConf
-    .createWithDefault("parquet")
+    .createWithDefault("orc")
```

Can you change it back?
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r173358166

Diff: `sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala`

```diff
@@ -852,52 +846,52 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
       (from to to).map(i => i -> s"str$i").toDF("c1", "c2")
     }

-    withTable("insertParquet") {
-      createDF(0, 9).write.format("parquet").saveAsTable("insertParquet")
+    withTable("t") {
+      createDF(0, 9).write.saveAsTable("t")
       checkAnswer(
-        sql("SELECT p.c1, p.c2 FROM insertParquet p WHERE p.c1 > 5"),
+        sql("SELECT p.c1, p.c2 FROM t p WHERE p.c1 > 5"),
         (6 to 9).map(i => Row(i, s"str$i")))

       intercept[AnalysisException] {
-        createDF(10, 19).write.format("parquet").saveAsTable("insertParquet")
+        createDF(10, 19).write.saveAsTable("t")
       }

-      createDF(10, 19).write.mode(SaveMode.Append).format("parquet").saveAsTable("insertParquet")
+      createDF(10, 19).write.mode(SaveMode.Append).saveAsTable("t")
       checkAnswer(
-        sql("SELECT p.c1, p.c2 FROM insertParquet p WHERE p.c1 > 5"),
+        sql("SELECT p.c1, p.c2 FROM t p WHERE p.c1 > 5"),
         (6 to 19).map(i => Row(i, s"str$i")))

-      createDF(20, 29).write.mode(SaveMode.Append).format("parquet").saveAsTable("insertParquet")
+      createDF(20, 29).write.mode(SaveMode.Append).saveAsTable("t")
       checkAnswer(
-        sql("SELECT p.c1, c2 FROM insertParquet p WHERE p.c1 > 5 AND p.c1 < 25"),
+        sql("SELECT p.c1, c2 FROM t p WHERE p.c1 > 5 AND p.c1 < 25"),
         (6 to 24).map(i => Row(i, s"str$i")))

       intercept[AnalysisException] {
-        createDF(30, 39).write.saveAsTable("insertParquet")
+        createDF(30, 39).write.saveAsTable("t")
       }

-      createDF(30, 39).write.mode(SaveMode.Append).saveAsTable("insertParquet")
+      createDF(30, 39).write.mode(SaveMode.Append).saveAsTable("t")
       checkAnswer(
-        sql("SELECT p.c1, c2 FROM insertParquet p WHERE p.c1 > 5 AND p.c1 < 35"),
+        sql("SELECT p.c1, c2 FROM t p WHERE p.c1 > 5 AND p.c1 < 35"),
         (6 to 34).map(i => Row(i, s"str$i")))

-      createDF(40, 49).write.mode(SaveMode.Append).insertInto("insertParquet")
+      createDF(40, 49).write.mode(SaveMode.Append).insertInto("t")
       checkAnswer(
-        sql("SELECT p.c1, c2 FROM insertParquet p WHERE p.c1 > 5 AND p.c1 < 45"),
+        sql("SELECT p.c1, c2 FROM t p WHERE p.c1 > 5 AND p.c1 < 45"),
         (6 to 44).map(i => Row(i, s"str$i")))

-      createDF(50, 59).write.mode(SaveMode.Overwrite).saveAsTable("insertParquet")
+      createDF(50, 59).write.mode(SaveMode.Overwrite).saveAsTable("t")
       checkAnswer(
-        sql("SELECT p.c1, c2 FROM insertParquet p WHERE p.c1 > 51 AND p.c1 < 55"),
+        sql("SELECT p.c1, c2 FROM t p WHERE p.c1 > 51 AND p.c1 < 55"),
         (52 to 54).map(i => Row(i, s"str$i")))

-      createDF(60, 69).write.mode(SaveMode.Ignore).saveAsTable("insertParquet")
+      createDF(60, 69).write.mode(SaveMode.Ignore).saveAsTable("t")
       checkAnswer(
-        sql("SELECT p.c1, c2 FROM insertParquet p"),
+        sql("SELECT p.c1, c2 FROM t p"),
         (50 to 59).map(i => Row(i, s"str$i")))

-      createDF(70, 79).write.mode(SaveMode.Overwrite).insertInto("insertParquet")
+      createDF(70, 79).write.mode(SaveMode.Overwrite).insertInto("t")
       checkAnswer(
-        sql("SELECT p.c1, c2 FROM insertParquet p"),
+        sql("SELECT p.c1, c2 FROM t p"),
         (70 to 79).map(i => Row(i, s"str$i")))
```

That is because this PR minimally changed only the test cases causing failures. We cannot generalize all test cases in one one-shot huge PR for all modules; that would make it difficult to backport the other commits. The main goal of this PR is improving testability for new data sources.

For example, `SPARK-8156:create table to specific database by 'use dbname'` writes to parquet, but reads with SQL, not with `read.parquet`. So, it doesn't fail.
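To make the point concrete, a test with that shape is format-agnostic on the read side. A minimal sketch (not the actual SPARK-8156 test; it assumes a `QueryTest`-style suite where `spark.implicits._`, `sql`, and `checkAnswer` are already in scope):

```scala
import org.apache.spark.sql.Row

// The write side pins the format explicitly, so it ignores
// spark.sql.sources.default entirely...
Seq(1, 2, 3).toDF("i").write.format("parquet").saveAsTable("t")

// ...and the read side resolves the table through the metastore via SQL,
// not through spark.read.parquet, so flipping the default cannot break it.
checkAnswer(sql("SELECT i FROM t"), Seq(Row(1), Row(2), Row(3)))
```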
Github user bersprockets commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r173331220

Diff: `sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala`

```diff
@@ -591,7 +591,7 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
   }

   test("Pre insert nullability check (ArrayType)") {
-    withTable("arrayInParquet") {
+    withTable("array") {
```

It would be good, maybe in a future cleanup, to replace all these repeated string literals (e.g., "array" 7 times, "map" 7 times) with a variable name.
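A minimal sketch of the suggested cleanup (the value name is hypothetical, not code from the PR):

```scala
// Hoist the repeated literal into one value so a later rename touches one line.
val tableName = "array"
withTable(tableName) {
  // ...every occurrence of the literal "array" in the test body
  // becomes a reference to tableName instead.
}
```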
Github user bersprockets commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r173332327

Diff: `sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala`

```diff
@@ -852,52 +846,52 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
       (from to to).map(i => i -> s"str$i").toDF("c1", "c2")
     }

-    withTable("insertParquet") {
-      createDF(0, 9).write.format("parquet").saveAsTable("insertParquet")
+    withTable("t") {
+      createDF(0, 9).write.saveAsTable("t")
       checkAnswer(
-        sql("SELECT p.c1, p.c2 FROM insertParquet p WHERE p.c1 > 5"),
+        sql("SELECT p.c1, p.c2 FROM t p WHERE p.c1 > 5"),
         (6 to 9).map(i => Row(i, s"str$i")))

       intercept[AnalysisException] {
-        createDF(10, 19).write.format("parquet").saveAsTable("insertParquet")
+        createDF(10, 19).write.saveAsTable("t")
       }

-      createDF(10, 19).write.mode(SaveMode.Append).format("parquet").saveAsTable("insertParquet")
+      createDF(10, 19).write.mode(SaveMode.Append).saveAsTable("t")
       checkAnswer(
-        sql("SELECT p.c1, p.c2 FROM insertParquet p WHERE p.c1 > 5"),
+        sql("SELECT p.c1, p.c2 FROM t p WHERE p.c1 > 5"),
         (6 to 19).map(i => Row(i, s"str$i")))

-      createDF(20, 29).write.mode(SaveMode.Append).format("parquet").saveAsTable("insertParquet")
+      createDF(20, 29).write.mode(SaveMode.Append).saveAsTable("t")
       checkAnswer(
-        sql("SELECT p.c1, c2 FROM insertParquet p WHERE p.c1 > 5 AND p.c1 < 25"),
+        sql("SELECT p.c1, c2 FROM t p WHERE p.c1 > 5 AND p.c1 < 25"),
         (6 to 24).map(i => Row(i, s"str$i")))

       intercept[AnalysisException] {
-        createDF(30, 39).write.saveAsTable("insertParquet")
+        createDF(30, 39).write.saveAsTable("t")
       }

-      createDF(30, 39).write.mode(SaveMode.Append).saveAsTable("insertParquet")
+      createDF(30, 39).write.mode(SaveMode.Append).saveAsTable("t")
       checkAnswer(
-        sql("SELECT p.c1, c2 FROM insertParquet p WHERE p.c1 > 5 AND p.c1 < 35"),
+        sql("SELECT p.c1, c2 FROM t p WHERE p.c1 > 5 AND p.c1 < 35"),
         (6 to 34).map(i => Row(i, s"str$i")))

-      createDF(40, 49).write.mode(SaveMode.Append).insertInto("insertParquet")
+      createDF(40, 49).write.mode(SaveMode.Append).insertInto("t")
       checkAnswer(
-        sql("SELECT p.c1, c2 FROM insertParquet p WHERE p.c1 > 5 AND p.c1 < 45"),
+        sql("SELECT p.c1, c2 FROM t p WHERE p.c1 > 5 AND p.c1 < 45"),
         (6 to 44).map(i => Row(i, s"str$i")))

-      createDF(50, 59).write.mode(SaveMode.Overwrite).saveAsTable("insertParquet")
+      createDF(50, 59).write.mode(SaveMode.Overwrite).saveAsTable("t")
       checkAnswer(
-        sql("SELECT p.c1, c2 FROM insertParquet p WHERE p.c1 > 51 AND p.c1 < 55"),
+        sql("SELECT p.c1, c2 FROM t p WHERE p.c1 > 51 AND p.c1 < 55"),
         (52 to 54).map(i => Row(i, s"str$i")))

-      createDF(60, 69).write.mode(SaveMode.Ignore).saveAsTable("insertParquet")
+      createDF(60, 69).write.mode(SaveMode.Ignore).saveAsTable("t")
       checkAnswer(
-        sql("SELECT p.c1, c2 FROM insertParquet p"),
+        sql("SELECT p.c1, c2 FROM t p"),
         (50 to 59).map(i => Row(i, s"str$i")))

-      createDF(70, 79).write.mode(SaveMode.Overwrite).insertInto("insertParquet")
+      createDF(70, 79).write.mode(SaveMode.Overwrite).insertInto("t")
       checkAnswer(
-        sql("SELECT p.c1, c2 FROM insertParquet p"),
+        sql("SELECT p.c1, c2 FROM t p"),
         (70 to 79).map(i => Row(i, s"str$i")))
```

Curious about why the test named "SPARK-8156:create table to specific database by 'use dbname'" still has a hard-coded format of parquet. Is it testing functionality that is orthogonal to the format, maybe? I changed the hard-coded format to json, orc, and csv, and the test passed each time.

Similarly with the suite `org.apache.spark.sql.sources.SaveLoadSuite`, test "SPARK-23459: Improve error message when specified unknown column in partition columns".
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r173281238

Diff: `sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala`

```diff
@@ -2476,7 +2477,7 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     withTempDir { dir =>
       val parquetDir = new File(dir, "parquet").getCanonicalPath
       spark.range(10).withColumn("_col", $"id").write.partitionBy("_col").save(parquetDir)
```

Thank you for the review, @bersprockets.
Github user bersprockets commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r173275865

Diff: `sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala`

```diff
@@ -2476,7 +2477,7 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     withTempDir { dir =>
       val parquetDir = new File(dir, "parquet").getCanonicalPath
       spark.range(10).withColumn("_col", $"id").write.partitionBy("_col").save(parquetDir)
```

Since the data format may not be parquet, maybe the directory name should be more generic, like `dataDir`.
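Something like the following, keeping the quoted write unchanged (a sketch; `dataDir` is the reviewer's proposed name):

```scala
import java.io.File

withTempDir { dir =>
  // Generic name: the directory holds whatever spark.sql.sources.default produces.
  val dataDir = new File(dir, "data").getCanonicalPath
  spark.range(10).withColumn("_col", $"id").write.partitionBy("_col").save(dataDir)
  // ...assertions on dataDir follow
}
```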
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r172024213

Diff: `sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala`

```diff
@@ -2150,7 +2150,8 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
   test("data source table created in InMemoryCatalog should be able to read/write") {
     withTable("tbl") {
-      sql("CREATE TABLE tbl(i INT, j STRING) USING parquet")
+      val provider = spark.sessionState.conf.defaultDataSourceName
```

So far, the purpose of this PR is **setting once in `SQLConf.scala`** in order to test a new data source and find out its limitations, instead of **touching every data suite**. BTW, `spark.sql.sources.default=parquet` doesn't help this existing code because the SQL has the fixed string `USING parquet`.
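The quoted hunk cuts off right after the `provider` line; presumably the test then interpolates it into the DDL, along these lines (a sketch, not quoted from the PR):

```scala
val provider = spark.sessionState.conf.defaultDataSourceName
// USING picks up whatever the session's default data source is,
// instead of the previously hard-coded "parquet".
sql(s"CREATE TABLE tbl(i INT, j STRING) USING $provider")
```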
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r172024043

Diff: `sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala`

```diff
@@ -2150,7 +2150,8 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
   test("data source table created in InMemoryCatalog should be able to read/write") {
     withTable("tbl") {
-      sql("CREATE TABLE tbl(i INT, j STRING) USING parquet")
+      val provider = spark.sessionState.conf.defaultDataSourceName
```

This is `SQLQuerySuite`. The test case is correctly testing its **purpose**. Every data source has its own capabilities and limitations. Your example is only the `text` data source's limitation of supporting a single-column schema, isn't it? The other `csv/json/orc/parquet` sources will pass this specific test.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r172023909

Diff: `python/pyspark/sql/readwriter.py`

```diff
@@ -147,6 +147,7 @@ def load(self, path=None, format=None, schema=None, **options):
             or a DDL-formatted string (For example ``col0 INT, col1 DOUBLE``).
         :param options: all other string options
+        >>> spark.conf.set("spark.sql.sources.default", "parquet")
```

Yep. That was my first commit [here](https://github.com/apache/spark/pull/20705#discussion-diff-171729865R150). I'll roll this back.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r172022980

Diff: `sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala`

```diff
@@ -2150,7 +2150,8 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
   test("data source table created in InMemoryCatalog should be able to read/write") {
     withTable("tbl") {
-      sql("CREATE TABLE tbl(i INT, j STRING) USING parquet")
+      val provider = spark.sessionState.conf.defaultDataSourceName
```

Hm .. how about just explicitly setting `spark.sql.sources.default` to `parquet` in all places rather than relying on the default? If it's set to, for example, `text`, this test fails. I thought it's a bit odd for a test to depend on a default value.
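What the reviewer describes would look roughly like this with the existing `withSQLConf` helper from `SQLTestUtils` (a sketch under that assumption):

```scala
import org.apache.spark.sql.internal.SQLConf

// Pin the default for just this test, so a changed global default
// cannot make the test pass or fail by accident.
withSQLConf(SQLConf.DEFAULT_DATA_SOURCE_NAME.key -> "parquet") {
  sql("CREATE TABLE tbl(i INT, j STRING) USING parquet")
  // ...rest of the test body
}
```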
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r172022384

Diff: `python/pyspark/sql/readwriter.py`

```diff
@@ -147,6 +147,7 @@ def load(self, path=None, format=None, schema=None, **options):
             or a DDL-formatted string (For example ``col0 INT, col1 DOUBLE``).
         :param options: all other string options
+        >>> spark.conf.set("spark.sql.sources.default", "parquet")
```

Can we just call `format('parquet')` like the doctest for JSON below?
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r172007674

Diff: `sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala`

```diff
@@ -57,6 +57,16 @@ class ParquetPartitionDiscoverySuite extends QueryTest with ParquetTest with Sha
   val timeZone = TimeZone.getDefault()
   val timeZoneId = timeZone.getID

+  protected override def beforeAll(): Unit = {
+    super.beforeAll()
+    spark.conf.set(SQLConf.DEFAULT_DATA_SOURCE_NAME.key, "parquet")
```

Since this is `ParquetPartitionDiscoverySuite`, the test cases' assumption is legitimate.
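The quoted hunk only shows the `beforeAll` side; a suite that overrides a session conf would normally restore it afterwards, roughly like this (a sketch; whether the PR includes such a cleanup is not visible in the quote):

```scala
protected override def afterAll(): Unit = {
  try {
    // Undo the suite-local override so later suites see the global default again.
    spark.conf.unset(SQLConf.DEFAULT_DATA_SOURCE_NAME.key)
  } finally {
    super.afterAll()
  }
}
```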
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r172007654

Diff: `python/pyspark/sql/readwriter.py`

```diff
@@ -147,6 +147,7 @@ def load(self, path=None, format=None, schema=None, **options):
             or a DDL-formatted string (For example ``col0 INT, col1 DOUBLE``).
         :param options: all other string options
+        >>> spark.conf.set("spark.sql.sources.default", "parquet")
         >>> df = spark.read.load('python/test_support/sql/parquet_partitioned', opt1=True,
```

The built-in test data is `parquet`.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r171729865

Diff: `python/pyspark/sql/readwriter.py`

```diff
@@ -147,8 +147,8 @@ def load(self, path=None, format=None, schema=None, **options):
             or a DDL-formatted string (For example ``col0 INT, col1 DOUBLE``).
         :param options: all other string options
-        >>> df = spark.read.load('python/test_support/sql/parquet_partitioned', opt1=True,
-        ...     opt2=1, opt3='str')
+        >>> df = spark.read.format("parquet").load('python/test_support/sql/parquet_partitioned',
+        ...     opt1=True, opt2=1, opt3='str')
```

Unlike the other changes, this one differs slightly from the original semantics. As another approach, we can add the following while keeping the original `spark.read.load`:

```python
spark.conf.set("spark.sql.sources.default", "parquet")
```
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r171668693

Diff: `sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala`

```diff
@@ -516,24 +516,19 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
   test("CTAS with default fileformat") {
     val table = "ctas1"
     val ctas = s"CREATE TABLE IF NOT EXISTS $table SELECT key k, value FROM src"
-    withSQLConf(SQLConf.CONVERT_CTAS.key -> "true") {
-      withSQLConf("hive.default.fileformat" -> "textfile") {
+    Seq("orc", "parquet").foreach { dataSourceFormat =>
+      withSQLConf(
+        SQLConf.CONVERT_CTAS.key -> "true",
+        SQLConf.DEFAULT_DATA_SOURCE_NAME.key -> dataSourceFormat,
+        "hive.default.fileformat" -> "textfile") {
         withTable(table) {
           sql(ctas)
-          // We should use parquet here as that is the default datasource fileformat. The default
-          // datasource file format is controlled by `spark.sql.sources.default` configuration.
+          // The default datasource file format is controlled by `spark.sql.sources.default`.
           // This testcase verifies that setting `hive.default.fileformat` has no impact on
           // the target table's fileformat in case of CTAS.
-          assert(sessionState.conf.defaultDataSourceName === "parquet")
-          checkRelation(tableName = table, isDataSourceTable = true, format = "parquet")
+          checkRelation(tableName = table, isDataSourceTable = true, format = dataSourceFormat)
         }
       }
```

Previously, `spark.sql.sources.default=orc` with `hive.default.fileformat=textfile` was not tested properly.
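The loop extends naturally if more defaults should be exercised, e.g. adding `json` (hypothetical; the quoted diff tests only `orc` and `parquet` here):

```scala
Seq("orc", "parquet", "json").foreach { dataSourceFormat =>
  withSQLConf(
    SQLConf.CONVERT_CTAS.key -> "true",
    SQLConf.DEFAULT_DATA_SOURCE_NAME.key -> dataSourceFormat,
    "hive.default.fileformat" -> "textfile") {
    withTable(table) {
      sql(ctas)
      // CTAS must follow spark.sql.sources.default, not hive.default.fileformat.
      checkRelation(tableName = table, isDataSourceTable = true, format = dataSourceFormat)
    }
  }
}
```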
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r171668051

Diff: `sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala`

```diff
@@ -739,15 +739,15 @@ class ParquetPartitionDiscoverySuite extends QueryTest with ParquetTest with Sha
     withTempPath { dir =>
       df.write.format("parquet").partitionBy(partitionColumns.map(_.name): _*).save(dir.toString)
       val fields = schema.map(f => Column(f.name).cast(f.dataType))
-      checkAnswer(spark.read.load(dir.toString).select(fields: _*), row)
+      checkAnswer(spark.read.parquet(dir.toString).select(fields: _*), row)
```

Since this is `ParquetPartitionDiscoverySuite`, `parquet` is more appropriate than `load`.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r171657406

Diff: `sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala`

```diff
@@ -526,7 +526,7 @@ object SQLConf {
   val DEFAULT_DATA_SOURCE_NAME = buildConf("spark.sql.sources.default")
     .doc("The default data source to use in input/output.")
     .stringConf
-    .createWithDefault("parquet")
+    .createWithDefault("orc")
```

This is for testing purposes during review.
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/20705

[SPARK-23553][TESTS] Tests should not assume the default value of `spark.sql.sources.default`

## What changes were proposed in this pull request?

Currently, some tests assume that `spark.sql.sources.default=parquet`. That assumption is correct today, but it makes it difficult to test a new data source format. This PR makes the test suites more robust and makes it easier to test new data sources. As an example, the PR uses `spark.sql.sources.default=orc` during review. This PR also aims to test the new native ORC data source with the full existing Apache Spark test coverage.

## How was this patch tested?

Pass the Jenkins with updated tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-23553

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20705.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20705

commit 427b6f01a34938487f162ddf0a38d9217bfb4ece
Author: Dongjoon Hyun
Date: 2018-01-11T05:01:21Z

    [SPARK-23553][TESTS] Tests should not assume the default value of `spark.sql.sources.default`
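To reproduce the review setup in a single session without editing `SQLConf.scala`, one can flip the default at runtime (a sketch; the output path is arbitrary):

```scala
// Route default-format writes and reads through ORC for this session.
spark.conf.set("spark.sql.sources.default", "orc")
spark.range(10).write.save("/tmp/spark-23553-check")   // written as ORC
spark.read.load("/tmp/spark-23553-check").count()      // read back via the default
```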