[GitHub] [spark] HyukjinKwon commented on a change in pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

GitBox Mon, 05 Apr 2021 23:41:15 -0700


HyukjinKwon commented on a change in pull request #32053:
URL: https://github.com/apache/spark/pull/32053#discussion_r607561917




##########
File path: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
##########
@@ -389,6 +392,67 @@ private static void runCsvDatasetExample(SparkSession spark) {
     // $example off:csv_dataset$
   }
 
+  private static void runTextDatasetExample(SparkSession spark) {
+    // $example on:text_dataset$
+    // A text dataset is pointed to by path.
+    // The path can be either a single text file or a directory of text files
+    String path = "examples/src/main/resources/people.text";
+
+    Dataset<Row> df1 = spark.read().text(path);
+    df1.show();
+    // +-----------+
+    // |      value|
+    // +-----------+
+    // |Michael, 29|
+    // |   Andy, 30|
+    // | Justin, 19|
+    // +-----------+
+
+    // You can use 'lineSep' option to define the line separator.
+    // If None is set, it covers all `\r`, `\r\n` and `\n` (default).
+    Dataset<Row> df2 = spark.read().option("lineSep", ",").text(path);
+    df2.show();
+    // +-----------+
+    // |      value|
+    // +-----------+
+    // |    Michael|
+    // |   29\nAndy|
+    // | 30\nJustin|
+    // |       19\n|
+    // +-----------+
+
+    // You can also use 'wholetext' option to read each input file as a single row.
+    Dataset<Row> df3 = spark.read().option("wholetext", "true").text(path);
+    df3.show();
+    //  +--------------------+
+    //  |               value|
+    //  +--------------------+
+    //  |Michael, 29\nAndy...|
+    //  +--------------------+
+
+    // "output" is a folder which contains multiple text files and a _SUCCESS file.
+    df1.write().text("output");
+
+    // You can specify the compression format using the 'compression' option.
+    df1.write().option("compression", "gzip").text("output_compressed");
+
+    // Read all files in a folder.
+    String folderPath = "examples/src/main/resources";
+    Dataset<Row> df = spark.read().text(folderPath);
+    df.show();
+    // +-----------+
+    // |      value|
+    // +-----------+
+    // |238val_238|

Review comment:
       The indentation looks odd here. Was it really printed out like this?
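
       For context on why the row looks off: `show()` right-aligns each cell and
       pads it to the column width by default. The sketch below is plain Java,
       not Spark code, and only imitates that padding. The class name
       `ShowPaddingSketch` and the helper `cell` are hypothetical, and the width
       of 11 is assumed from the `+-----------+` border in the quoted output; the
       real width depends on the rows actually displayed.

```java
public class ShowPaddingSketch {
    // Hypothetical helper: right-align a value inside a column of the given
    // width and wrap it in pipes, imitating the default cell layout of show().
    static String cell(String value, int width) {
        return "|" + String.format("%" + width + "s", value) + "|";
    }

    public static void main(String[] args) {
        // Assumed width: the 11 dashes in the quoted "+-----------+" border.
        int width = "+-----------+".length() - 2;
        System.out.println(cell("Michael, 29", width)); // fills the column exactly
        System.out.println(cell("238val_238", width));  // 10 chars, so one pad space
    }
}
```

       Under that assumption a 10-character value such as `238val_238` would
       carry one leading space, so an unpadded `|238val_238|` under an
       11-character border suggests the output was edited by hand.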

##########
File path: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
##########
@@ -389,6 +392,67 @@ private static void runCsvDatasetExample(SparkSession spark) {
     // $example off:csv_dataset$
   }
 
+  private static void runTextDatasetExample(SparkSession spark) {
+    // $example on:text_dataset$
+    // A text dataset is pointed to by path.
+    // The path can be either a single text file or a directory of text files
+    String path = "examples/src/main/resources/people.text";

Review comment:
       `people.text` -> `people.txt`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


