[GitHub] spark pull request #20479: [SPARK-23305][SQL][TEST] Add `spark.sql.files.ign...

2018-02-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20479#discussion_r165579611
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
@@ -92,4 +96,37 @@ class FileBasedDataSourceSuite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  allFileBasedDataSources.foreach { format =>
--- End diff --

Thank you so much, @HyukjinKwon !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20479: [SPARK-23305][SQL][TEST] Add `spark.sql.files.ign...

2018-02-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20479#discussion_r165578302
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
@@ -92,4 +96,39 @@ class FileBasedDataSourceSuite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  // Only ORC/Parquet support this.
--- End diff --

Ur, let me check back.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20479: [SPARK-23305][SQL][TEST] Add `spark.sql.files.ign...

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20479#discussion_r165577600
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
@@ -92,4 +96,39 @@ class FileBasedDataSourceSuite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  // Only ORC/Parquet support this.
--- End diff --

Wait .. don't other sources support this option?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20479: [SPARK-23305][SQL][TEST] Add `spark.sql.files.ign...

2018-02-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20479#discussion_r165576607
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
@@ -92,4 +96,39 @@ class FileBasedDataSourceSuite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  // Only ORC/Parquet support this.
+  Seq("orc", "parquet").foreach { format =>
--- End diff --

Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20479: [SPARK-23305][SQL][TEST] Add `spark.sql.files.ign...

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20479#discussion_r165575853
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
@@ -92,4 +96,39 @@ class FileBasedDataSourceSuite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  // Only ORC/Parquet support this.
+  Seq("orc", "parquet").foreach { format =>
--- End diff --

Ah, yeah. This sounds better.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20479: [SPARK-23305][SQL][TEST] Add `spark.sql.files.ign...

2018-02-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20479#discussion_r165559636
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
 ---
@@ -655,4 +655,35 @@ class OrcQuerySuite extends OrcQueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  testQuietly("Enabling/disabling ignoreMissingFiles") {
--- End diff --

+1. Which suite is proper for that base test class?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20479: [SPARK-23305][SQL][TEST] Add `spark.sql.files.ign...

2018-02-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20479#discussion_r165557435
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
 ---
@@ -655,4 +655,35 @@ class OrcQuerySuite extends OrcQueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  testQuietly("Enabling/disabling ignoreMissingFiles") {
--- End diff --

Basically, this is copied from Parquet. To avoid duplicate codes, create a 
common base test class for parquet and orc? Then, we can deduplicate the codes? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20479: [SPARK-23305][SQL][TEST] Add `spark.sql.files.ign...

2018-02-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20479#discussion_r165537214
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
 ---
@@ -655,4 +655,35 @@ class OrcQuerySuite extends OrcQueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  testQuietly("Enabling/disabling ignoreMissingFiles") {
+def testIgnoreMissingFiles(): Unit = {
+  withTempDir { dir =>
+val basePath = dir.getCanonicalPath
+spark.range(1).toDF("a").write.orc(new Path(basePath, 
"first").toString)
+spark.range(1, 2).toDF("a").write.orc(new Path(basePath, 
"second").toString)
+val thirdPath = new Path(basePath, "third")
+spark.range(2, 3).toDF("a").write.orc(thirdPath.toString)
+val df = spark.read.orc(
+  new Path(basePath, "first").toString,
+  new Path(basePath, "second").toString,
+  new Path(basePath, "third").toString)
+
+val fs = 
thirdPath.getFileSystem(spark.sparkContext.hadoopConfiguration)
+fs.delete(thirdPath, true)
--- End diff --

Sure.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20479: [SPARK-23305][SQL][TEST] Add `spark.sql.files.ign...

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20479#discussion_r165528709
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
 ---
@@ -655,4 +655,35 @@ class OrcQuerySuite extends OrcQueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  testQuietly("Enabling/disabling ignoreMissingFiles") {
+def testIgnoreMissingFiles(): Unit = {
+  withTempDir { dir =>
+val basePath = dir.getCanonicalPath
+spark.range(1).toDF("a").write.orc(new Path(basePath, 
"first").toString)
+spark.range(1, 2).toDF("a").write.orc(new Path(basePath, 
"second").toString)
+val thirdPath = new Path(basePath, "third")
+spark.range(2, 3).toDF("a").write.orc(thirdPath.toString)
+val df = spark.read.orc(
+  new Path(basePath, "first").toString,
+  new Path(basePath, "second").toString,
+  new Path(basePath, "third").toString)
+
+val fs = 
thirdPath.getFileSystem(spark.sparkContext.hadoopConfiguration)
+fs.delete(thirdPath, true)
--- End diff --

Shall we assert `true`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20479: [SPARK-23305][SQL][TEST] Add `spark.sql.files.ign...

2018-02-01 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/20479

[SPARK-23305][SQL][TEST] Add `spark.sql.files.ignoreMissingFiles` test case 
for ORC

## What changes were proposed in this pull request?

Like Parquet, Apache Spark ORC already handles 
`spark.sql.files.ignoreMissingFiles` correctly. We had better have a test 
coverage for feature parity.

## How was this patch tested?

Pass Jenkins with a newly added test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-23305

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20479.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20479


commit 7fdfc35e5fc0f92b4bf2f687606bb48ad6fa68e6
Author: Dongjoon Hyun 
Date:   2018-02-01T18:23:00Z

[SPARK-23305][SQL][TEST] Add `spark.sql.files.ignoreMissingFiles` test case 
for ORC




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org