MaxGekk commented on a change in pull request #27302: [SPARK-30506][SQL][DOC] Document for generic file source options/configs
URL: https://github.com/apache/spark/pull/27302#discussion_r368845030
 
 

 ##########
 File path: examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala
 ##########
 @@ -40,6 +41,54 @@ object SQLDataSourceExample {
     spark.stop()
   }
 
+  private def runGenericFileSourceOptionsExample(spark: SparkSession): Unit = {
+    // $example on:ignore_corrupt_files$
+    // enable ignore corrupt files
+    spark.sql("set spark.sql.files.ignoreCorruptFiles=true")
+    // dir1/file3.json is corrupt from parquet's view
+    val testCorruptDF = spark.read.parquet(
+      "examples/src/main/resources/dir1/",
+      "examples/src/main/resources/dir1/dir2/")
+    testCorruptDF.show()
+    // +-------------+
+    // |         file|
+    // +-------------+
+    // |file1.parquet|
+    // |file2.parquet|
+    // +-------------+
+    // $example off:ignore_corrupt_files$
+    // $example on:ignore_missing_files$
+    // enable ignore missing files
+    spark.sql("set spark.sql.files.ignoreMissingFiles=true")
+    val testMissingDF = spark.read.parquet("examples/src/main/resources/dir1/dir2/")
+    testMissingDF.show()
+    // +-------------+
+    // |         file|
+    // +-------------+
+    // |file2.parquet|
+    // +-------------+
+    // $example off:ignore_missing_files$
+    spark.sql("set spark.sql.files.ignoreMissingFiles=false")
+    // $example on:load_with_path_glob_filter$
+    val partitionedUsersDF = spark.read.format("orc")
 
 Review comment:
   The `Avro` format would be much more useful to users in this example because:
   1. We point out the option in the Avro docs, see https://github.com/apache/spark/pull/27194/files#diff-a083509bb52aa58580f177d4ed67ebd4R233
   2. Users encounter Avro files without extensions more often in the wild.
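
   A rough sketch of what an Avro-based variant of the path glob filter example might look like (this is not code from the PR). It assumes Spark 3.0+, where the generic `pathGlobFilter` option is available, and that the external `spark-avro` module is on the classpath. The object name and the input directory `path/to/avro/dir` are hypothetical placeholders.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object AvroPathGlobFilterSketch {

  // Loads only the files whose names match "*.avro" from the given directory.
  // Files without the extension are skipped by the glob filter.
  def loadAvroWithGlob(spark: SparkSession, dir: String): DataFrame = {
    spark.read
      .format("avro")                     // requires the spark-avro module (assumption)
      .option("pathGlobFilter", "*.avro") // generic file source option
      .load(dir)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AvroPathGlobFilterSketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical path: replace with a directory that actually contains Avro files.
    val avroDF = loadAvroWithGlob(spark, "path/to/avro/dir")
    avroDF.show()

    spark.stop()
  }
}
```

   Dropping the `pathGlobFilter` option, or using a broader pattern such as `*`, would also pick up Avro files that carry no extension, which is the case mentioned in point 2.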

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 


With regards,
Apache Git Services
