[GitHub] spark pull request #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest tes...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/20331
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20331#discussion_r173071130

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonHadoopFsRelationSuite.scala ---
@@ -110,14 +113,16 @@ class JsonHadoopFsRelationSuite extends HadoopFsRelationTest {
   test("invalid json with leading nulls - from file (multiLine=true)") {
     import testImplicits._
-    withTempDir { tempDir =>
-      val path = tempDir.getAbsolutePath
-      Seq(badJson, """{"a":1}""").toDS().write.mode("overwrite").text(path)
-      val expected = s"""$badJson\n{"a":1}\n"""
-      val schema = new StructType().add("a", IntegerType).add("_corrupt_record", StringType)
-      val df =
-        spark.read.format(dataSourceName).option("multiLine", true).schema(schema).load(path)
-      checkAnswer(df, Row(null, expected))
+    withSQLConf(SQLConf.MAX_RECORDS_PER_FILE.key -> "2") {
--- End diff --

I think the default value won't be less than 2; we don't need to be so careful...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20331#discussion_r163369167

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonHadoopFsRelationSuite.scala ---
@@ -110,14 +113,16 @@ class JsonHadoopFsRelationSuite extends HadoopFsRelationTest {
   test("invalid json with leading nulls - from file (multiLine=true)") {
     import testImplicits._
-    withTempDir { tempDir =>
-      val path = tempDir.getAbsolutePath
-      Seq(badJson, """{"a":1}""").toDS().write.mode("overwrite").text(path)
-      val expected = s"""$badJson\n{"a":1}\n"""
-      val schema = new StructType().add("a", IntegerType).add("_corrupt_record", StringType)
-      val df =
-        spark.read.format(dataSourceName).option("multiLine", true).schema(schema).load(path)
-      checkAnswer(df, Row(null, expected))
+    withSQLConf(SQLConf.MAX_RECORDS_PER_FILE.key -> "2") {
--- End diff --

The test will fail if `SQLConf.MAX_RECORDS_PER_FILE` is set to a value less than 2.
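[Editor's note: a minimal standalone sketch of the behavior being guarded against; the object name and temp-dir handling are illustrative assumptions, while `spark.sql.files.maxRecordsPerFile` is the key behind `SQLConf.MAX_RECORDS_PER_FILE`. With the limit set below the number of records, the writer rolls over to a new file mid-batch, so a multiLine JSON read no longer sees both lines as one document.]

    import org.apache.spark.sql.SparkSession

    object MaxRecordsPerFileSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
        import spark.implicits._

        val path = java.nio.file.Files.createTempDirectory("sketch").toString

        // Force a rollover after every record: the two text lines land in two
        // separate files instead of one, so a multiLine JSON read would parse
        // them independently rather than as a single document.
        spark.conf.set("spark.sql.files.maxRecordsPerFile", "1")
        Seq("""not a json record""", """{"a":1}""").toDS()
          .coalesce(1) // one task, so any split is due to the record limit
          .write.mode("overwrite").text(path)

        val dataFiles = new java.io.File(path).listFiles()
          .filter(f => !f.getName.startsWith("_") && f.getName.endsWith(".txt"))
        println(s"data files written: ${dataFiles.length}") // 2 here; 1 under the default (0 = no limit)

        spark.stop()
      }
    }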
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20331#discussion_r163354261

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonHadoopFsRelationSuite.scala ---
@@ -110,14 +113,16 @@ class JsonHadoopFsRelationSuite extends HadoopFsRelationTest {
   test("invalid json with leading nulls - from file (multiLine=true)") {
     import testImplicits._
-    withTempDir { tempDir =>
-      val path = tempDir.getAbsolutePath
-      Seq(badJson, """{"a":1}""").toDS().write.mode("overwrite").text(path)
-      val expected = s"""$badJson\n{"a":1}\n"""
-      val schema = new StructType().add("a", IntegerType).add("_corrupt_record", StringType)
-      val df =
-        spark.read.format(dataSourceName).option("multiLine", true).schema(schema).load(path)
-      checkAnswer(df, Row(null, expected))
+    withSQLConf(SQLConf.MAX_RECORDS_PER_FILE.key -> "2") {
--- End diff --

Just curious, why this change?
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20331#discussion_r162838622

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcHadoopFsRelationSuite.scala ---
@@ -82,44 +80,4 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest {
       }
     }
   }
-
-  test("SPARK-13543: Support for specifying compression codec for ORC via option()") {
-    withTempPath { dir =>
-      val path = s"${dir.getCanonicalPath}/table1"
-      val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
-      df.write
-        .option("compression", "ZlIb")
-        .orc(path)
-
-      // Check if this is compressed as ZLIB.
-      val maybeOrcFile = new File(path).listFiles().find { f =>
-        !f.getName.startsWith("_") && f.getName.endsWith(".zlib.orc")
-      }
-      assert(maybeOrcFile.isDefined)
-      val orcFilePath = maybeOrcFile.get.toPath.toString
-      val expectedCompressionKind =
-        OrcFileOperator.getFileReader(orcFilePath).get.getCompression
-      assert("ZLIB" === expectedCompressionKind.name())
-
-      val copyDf = spark
-        .read
-        .orc(path)
-      checkAnswer(df, copyDf)
-    }
-  }
-
-  test("Default compression codec is snappy for ORC compression") {
-    withTempPath { file =>
-      spark.range(0, 10).write
-        .orc(file.getCanonicalPath)
-      val expectedCompressionKind =
-        OrcFileOperator.getFileReader(file.getCanonicalPath).get.getCompression
--- End diff --

@gatorsmile, this test case should also be run against the `native` implementation; `HiveOrcHadoopFsRelationSuite` only covers the `hive` implementation.
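[Editor's note: one way such assertions could be made implementation-agnostic is a hedged sketch like the following, which reads the compression kind from the ORC file footer through the Apache ORC reader API directly and so avoids the sql/hive-only `OrcFileOperator`. The helper name is an illustrative assumption; the `OrcFile` calls are standard Apache ORC API.]

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.orc.OrcFile

    // Returns the compression kind recorded in an ORC file's footer, e.g.
    // "ZLIB" or "SNAPPY", with no dependency on sql/hive classes.
    def orcCompressionKind(orcFilePath: String): String = {
      val reader = OrcFile.createReader(
        new Path(orcFilePath), OrcFile.readerOptions(new Configuration()))
      reader.getCompressionKind.name()
    }

An assertion such as `assert(orcCompressionKind(orcFilePath) === "ZLIB")` would then hold under both the native and hive readers, since it inspects the written file itself.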
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20331#discussion_r162838329

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcHadoopFsRelationSuite.scala ---
@@ -82,44 +80,4 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest {
       }
     }
   }
-
-  test("SPARK-13543: Support for specifying compression codec for ORC via option()") {
-    withTempPath { dir =>
-      val path = s"${dir.getCanonicalPath}/table1"
-      val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
-      df.write
-        .option("compression", "ZlIb")
-        .orc(path)
-
-      // Check if this is compressed as ZLIB.
-      val maybeOrcFile = new File(path).listFiles().find { f =>
-        !f.getName.startsWith("_") && f.getName.endsWith(".zlib.orc")
-      }
-      assert(maybeOrcFile.isDefined)
-      val orcFilePath = maybeOrcFile.get.toPath.toString
-      val expectedCompressionKind =
-        OrcFileOperator.getFileReader(orcFilePath).get.getCompression
-      assert("ZLIB" === expectedCompressionKind.name())
-
-      val copyDf = spark
-        .read
-        .orc(path)
-      checkAnswer(df, copyDf)
-    }
-  }
-
-  test("Default compression codec is snappy for ORC compression") {
-    withTempPath { file =>
-      spark.range(0, 10).write
-        .orc(file.getCanonicalPath)
-      val expectedCompressionKind =
-        OrcFileOperator.getFileReader(file.getCanonicalPath).get.getCompression
-      assert("SNAPPY" === expectedCompressionKind.name())
-    }
-  }
-}
-
-class HiveOrcHadoopFsRelationSuite extends OrcHadoopFsRelationSuite {
--- End diff --

Thank you!
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20331#discussion_r162683746

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcHadoopFsRelationSuite.scala ---
@@ -82,44 +80,4 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest {
       }
     }
   }
-
-  test("SPARK-13543: Support for specifying compression codec for ORC via option()") {
-    withTempPath { dir =>
-      val path = s"${dir.getCanonicalPath}/table1"
-      val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
-      df.write
-        .option("compression", "ZlIb")
-        .orc(path)
-
-      // Check if this is compressed as ZLIB.
-      val maybeOrcFile = new File(path).listFiles().find { f =>
-        !f.getName.startsWith("_") && f.getName.endsWith(".zlib.orc")
-      }
-      assert(maybeOrcFile.isDefined)
-      val orcFilePath = maybeOrcFile.get.toPath.toString
-      val expectedCompressionKind =
-        OrcFileOperator.getFileReader(orcFilePath).get.getCompression
--- End diff --

The same here.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20331#discussion_r162683705

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcHadoopFsRelationSuite.scala ---
@@ -82,44 +80,4 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest {
       }
     }
   }
-
-  test("SPARK-13543: Support for specifying compression codec for ORC via option()") {
-    withTempPath { dir =>
-      val path = s"${dir.getCanonicalPath}/table1"
-      val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
-      df.write
-        .option("compression", "ZlIb")
-        .orc(path)
-
-      // Check if this is compressed as ZLIB.
-      val maybeOrcFile = new File(path).listFiles().find { f =>
-        !f.getName.startsWith("_") && f.getName.endsWith(".zlib.orc")
-      }
-      assert(maybeOrcFile.isDefined)
-      val orcFilePath = maybeOrcFile.get.toPath.toString
-      val expectedCompressionKind =
-        OrcFileOperator.getFileReader(orcFilePath).get.getCompression
-      assert("ZLIB" === expectedCompressionKind.name())
-
-      val copyDf = spark
-        .read
-        .orc(path)
-      checkAnswer(df, copyDf)
-    }
-  }
-
-  test("Default compression codec is snappy for ORC compression") {
-    withTempPath { file =>
-      spark.range(0, 10).write
-        .orc(file.getCanonicalPath)
-      val expectedCompressionKind =
-        OrcFileOperator.getFileReader(file.getCanonicalPath).get.getCompression
--- End diff --

`OrcFileOperator` is defined in `sql/hive`.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20331#discussion_r162683627

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcHadoopFsRelationSuite.scala ---
@@ -82,44 +80,4 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest {
       }
     }
   }
-
-  test("SPARK-13543: Support for specifying compression codec for ORC via option()") {
-    withTempPath { dir =>
-      val path = s"${dir.getCanonicalPath}/table1"
-      val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
-      df.write
-        .option("compression", "ZlIb")
-        .orc(path)
-
-      // Check if this is compressed as ZLIB.
-      val maybeOrcFile = new File(path).listFiles().find { f =>
-        !f.getName.startsWith("_") && f.getName.endsWith(".zlib.orc")
-      }
-      assert(maybeOrcFile.isDefined)
-      val orcFilePath = maybeOrcFile.get.toPath.toString
-      val expectedCompressionKind =
-        OrcFileOperator.getFileReader(orcFilePath).get.getCompression
-      assert("ZLIB" === expectedCompressionKind.name())
-
-      val copyDf = spark
-        .read
-        .orc(path)
-      checkAnswer(df, copyDf)
-    }
-  }
-
-  test("Default compression codec is snappy for ORC compression") {
-    withTempPath { file =>
-      spark.range(0, 10).write
-        .orc(file.getCanonicalPath)
-      val expectedCompressionKind =
-        OrcFileOperator.getFileReader(file.getCanonicalPath).get.getCompression
-      assert("SNAPPY" === expectedCompressionKind.name())
-    }
-  }
-}
-
-class HiveOrcHadoopFsRelationSuite extends OrcHadoopFsRelationSuite {
--- End diff --

This is Hive only.
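[Editor's note: the split under discussion usually comes down to which implementation `dataSourceName` points at. A hedged sketch of the resulting layout follows; the class and member names match the pattern in this PR's diffs, but the exact format class names should be treated as assumptions.]

    // Lives in sql/core: exercises the native ORC implementation.
    class OrcHadoopFsRelationSuite extends HadoopFsRelationTest {
      override val dataSourceName: String =
        classOf[org.apache.spark.sql.execution.datasources.orc.OrcFileFormat].getCanonicalName
      // shared ORC tests ...
    }

    // Lives in sql/hive: reruns the same tests against the Hive ORC
    // implementation, which is where Hive-only checks such as the
    // OrcFileOperator-based assertions belong.
    class HiveOrcHadoopFsRelationSuite extends OrcHadoopFsRelationSuite {
      override val dataSourceName: String =
        classOf[org.apache.spark.sql.hive.orc.OrcFileFormat].getCanonicalName
    }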
GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/20331

    [SPARK-23158] [SQL] Move HadoopFsRelationTest test suites from sql/hive to sql/core

## What changes were proposed in this pull request?

The test suites that extend HadoopFsRelationTest are not in sql/hive packages, but their source directories sit under sql/hive. We should move them to sql/core.

## How was this patch tested?

The existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark moveTests

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20331.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20331

commit f7693f0abfe0923868c1918ddcaeaece2c107c5d
Author: gatorsmile
Date:   2018-01-19T16:57:50Z

    fix
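[Editor's note: to make the mismatch concrete, a sketch of the move for one suite. The "before" path is an assumption reconstructed from the description; the "after" path appears verbatim in the diff headers earlier in this thread.]

    Before (module and package disagree; this path is assumed):
        sql/hive/src/test/scala/org/apache/spark/sql/sources/JsonHadoopFsRelationSuite.scala

    After (per the diff headers above):
        sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonHadoopFsRelationSuite.scala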