itholic commented on a change in pull request #32053:
URL: https://github.com/apache/spark/pull/32053#discussion_r607572400
##########
File path:
examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
##########
@@ -389,6 +392,67 @@ private static void runCsvDatasetExample(SparkSession spark) {
// $example off:csv_dataset$
}
+ private static void runTextDatasetExample(SparkSession spark) {
+ // $example on:text_dataset$
+ // A text dataset is pointed to by path.
+ // The path can be either a single text file or a directory of text files
+ String path = "examples/src/main/resources/people.text";
+
+ Dataset<Row> df1 = spark.read().text(path);
+ df1.show();
+ // +-----------+
+ // |      value|
+ // +-----------+
+ // |Michael, 29|
+ // |   Andy, 30|
+ // | Justin, 19|
+ // +-----------+
+
+ // You can use 'lineSep' option to define the line separator.
+ // If None is set, it covers all `\r`, `\r\n` and `\n` (default).
+ Dataset<Row> df2 = spark.read().option("lineSep", ",").text(path);
+ df2.show();
+ // +-----------+
+ // |      value|
+ // +-----------+
+ // |    Michael|
+ // |   29\nAndy|
+ // | 30\nJustin|
+ // |       19\n|
+ // +-----------+
+
+ // You can also use 'wholetext' option to read each input file as a single row.
+ Dataset<Row> df3 = spark.read().option("wholetext", "true").text(path);
+ df3.show();
+ // +--------------------+
+ // |               value|
+ // +--------------------+
+ // |Michael, 29\nAndy...|
+ // +--------------------+
+
+ // "output" is a folder which contains multiple text files and a _SUCCESS file.
+ df1.write().text("output");
+
+ // You can specify the compression format using the 'compression' option.
+ df1.write().option("compression", "gzip").text("output_compressed");
+
+ // Read all files in a folder.
+ String folderPath = "examples/src/main/resources";
+ Dataset<Row> df = spark.read().text(folderPath);
+ df.show();
+ // +-----------+
+ // |      value|
+ // +-----------+
+ // |238val_238|
Review comment:
Yeah, the output really does look like that, as shown below. Should we align the
output anyway?
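
One possible explanation for the narrow-looking row, offered as an assumption and not confirmed in this thread: if the line read from the resources folder contains a non-printing separator (for example `\u0001` between `238` and `val_238`), the cell is 11 characters long for padding purposes but only 10 of them are visible. The sketch below uses plain Java with no Spark dependency; `ShowPaddingSketch`, `leftPad` and `visibleLength` are hypothetical names, and the padding logic is a simplified stand-in for what `show()` does, not Spark's actual code.

```java
class ShowPaddingSketch {
    // Right-align a cell by prepending spaces up to the given width.
    // Simplified stand-in for the padding applied when printing a table.
    static String leftPad(String cell, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = cell.length(); i < width; i++) {
            sb.append(' ');
        }
        return sb.append(cell).toString();
    }

    // Count the characters that are not ISO control characters,
    // i.e. roughly the ones a terminal would actually draw.
    static int visibleLength(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isISOControl(s.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // ASSUMPTION: the row holds a \u0001 separator; this is not stated in the PR.
        String cell = "238\u0001val_238";
        int width = 11; // matches the 11 dashes in the border above

        String border = "+-----------+";
        String row = "|" + leftPad(cell, width) + "|";

        System.out.println("cell length:    " + cell.length());
        System.out.println("visible length: " + visibleLength(cell));
        System.out.println("border visible: " + visibleLength(border));
        System.out.println("row visible:    " + visibleLength(row));
    }
}
```

Under that assumption the row needs no padding because its length already equals the column width, yet it renders one character narrower than the border. Hand-aligning the comment in the example would then make it differ from what `show()` really prints.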

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.