[GitHub] [spark] itholic commented on a change in pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

GitBox Tue, 06 Apr 2021 22:23:05 -0700


itholic commented on a change in pull request #32053:
URL: https://github.com/apache/spark/pull/32053#discussion_r608347743




##########
File path: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
##########
@@ -389,6 +392,67 @@ private static void runCsvDatasetExample(SparkSession spark) {
     // $example off:csv_dataset$
   }
 
+  private static void runTextDatasetExample(SparkSession spark) {
+    // $example on:text_dataset$
+    // A text dataset is pointed to by path.
+    // The path can be either a single text file or a directory of text files
+    String path = "examples/src/main/resources/people.text";
+
+    Dataset<Row> df1 = spark.read().text(path);
+    df1.show();
+    // +-----------+
+    // |      value|
+    // +-----------+
+    // |Michael, 29|
+    // |   Andy, 30|
+    // | Justin, 19|
+    // +-----------+
+
+    // You can use the 'lineSep' option to define the line separator.
+    // If it is not set, `\r`, `\r\n` and `\n` are all treated as line separators (default).
+    Dataset<Row> df2 = spark.read().option("lineSep", ",").text(path);
+    df2.show();
+    // +-----------+
+    // |      value|
+    // +-----------+
+    // |    Michael|
+    // |   29\nAndy|
+    // | 30\nJustin|
+    // |       19\n|
+    // +-----------+
+
+    // You can also use the 'wholetext' option to read each input file as a single row.
+    Dataset<Row> df3 = spark.read().option("wholetext", "true").text(path);
+    df3.show();
+    // +--------------------+
+    // |               value|
+    // +--------------------+
+    // |Michael, 29\nAndy...|
+    // +--------------------+
+
+    // "output" is a folder which contains multiple text files and a _SUCCESS file.
+    df1.write().text("output");
+
+    // You can specify the compression format using the 'compression' option.
+    df1.write().option("compression", "gzip").text("output_compressed");
+
+    // Read all files in a folder.
+    String folderPath = "examples/src/main/resources";
+    Dataset<Row> df = spark.read().text(folderPath);
+    df.show();
+    // +-----------+
+    // |      value|
+    // +-----------+
+    // |238val_238|

Review comment:
   Thanks!
   I just removed this from the examples block and added more comments to the main contents block instead
   (because we already have a case above that reads a single proper text file).
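
For readers following the `lineSep` case in the diff above, here is a minimal sketch in plain Java (no Spark dependency) of why a `,` separator yields those four rows. The class `LineSepSketch`, the method `splitRecords`, and the assumed file contents are hypothetical illustrations, not Spark's implementation; the file is assumed to end with a trailing newline, as the `19\n` row suggests.

```java
import java.util.ArrayList;
import java.util.List;

public class LineSepSketch {
    // Hypothetical helper: split content on a literal separator,
    // dropping a trailing empty record. Not Spark's actual code.
    static List<String> splitRecords(String content, String lineSep) {
        List<String> records = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = content.indexOf(lineSep, start)) >= 0) {
            records.add(content.substring(start, idx));
            start = idx + lineSep.length();
        }
        if (start < content.length()) {
            records.add(content.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        // Assumed contents of the example text file.
        String content = "Michael, 29\nAndy, 30\nJustin, 19\n";
        for (String record : splitRecords(content, ",")) {
            // Escape newlines so each record prints on one line.
            System.out.println("[" + record.replace("\n", "\\n") + "]");
        }
    }
}
```

Because the separator is only the comma, the space after each comma and the original newlines stay inside the records; `show()` right-aligns values, so the leading spaces are not visible in the table above.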




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
