[GitHub] [spark] cloud-fan commented on a change in pull request #27415: [SPARK-30506][SQL][DOC] Document for generic file source options/configs

GitBox Fri, 31 Jan 2020 00:51:18 -0800

cloud-fan commented on a change in pull request #27415: [SPARK-30506][SQL][DOC] 
Document for generic file source options/configs
URL: https://github.com/apache/spark/pull/27415#discussion_r373370720


 ##########
 File path: 
examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
 ##########
 @@ -106,6 +107,43 @@ public static void main(String[] args) {
     spark.stop();
   }
 
+  private static void runGenericFileSourceOptionsExample(SparkSession spark) {
+    // $example on:ignore_corrupt_files$
+    // enable ignore corrupt files
+    spark.sql("set spark.sql.files.ignoreCorruptFiles=true");
+    // dir1/file3.json is corrupt from parquet's view
+    Dataset<Row> testCorruptDF = spark.read().parquet(
+            "examples/src/main/resources/dir1/",
+            "examples/src/main/resources/dir1/dir2/");
+    testCorruptDF.show();
+    // +-------------+
+    // |         file|
+    // +-------------+
+    // |file1.parquet|
+    // |file2.parquet|
+    // +-------------+
+    // $example off:ignore_corrupt_files$
+    spark.sql("set spark.sql.files.ignoreMissingFiles=false");
+    // $example on:load_with_path_glob_filter$
+    Dataset<Row> partitionedUsersDF = spark.read().format("orc")
+            .option("pathGlobFilter", "*.orc")
+            .load("examples/src/main/resources/partitioned_users.orc");
 
 Review comment:
  Shall we show the result here, as in the previous example?
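  The comment asks for the output of the `pathGlobFilter` load. Here is a rough, Spark-free sketch of the filtering step alone. It keeps only files whose name (the last path component) matches `*.orc`, which is how the Spark data source docs describe `pathGlobFilter`. Assumptions: Spark applies Hadoop's `org.apache.hadoop.fs.GlobFilter`, while this sketch substitutes `java.nio`'s glob `PathMatcher`, whose syntax is similar but not guaranteed identical. The class `GlobFilterSketch` and the sample paths are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper approximating pathGlobFilter with java.nio globs
// (a stand-in for Hadoop's GlobFilter, not Spark's actual code path).
class GlobFilterSketch {

    // Keeps the paths whose final name component matches the glob.
    static List<String> filter(String glob, List<String> paths) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            Path fileName = Paths.get(p).getFileName();
            // Only the file name is tested, so directory names never need to match.
            if (fileName != null && matcher.matches(fileName)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical directory listing, not the real contents of the resource dir.
        List<String> files = List.of(
                "partitioned_users.orc/favorite_color=red/users.orc",
                "partitioned_users.orc/favorite_color=red/_SUCCESS",
                "partitioned_users.orc/notes.txt");
        System.out.println(filter("*.orc", files));
    }
}
```

  Running `main` prints only the `.orc` entry; the `_SUCCESS` marker and the `.txt` file are dropped. Because only the final name component is tested, a directory such as the hypothetical `favorite_color=red` does not have to match `*.orc`.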

