[GitHub] [spark] twoentartian commented on a change in pull request #31827: [SPARK-34492][DOCS] Add "CSV Files" page for Data Source documents.

GitBox Thu, 18 Mar 2021 11:59:35 -0700


twoentartian commented on a change in pull request #31827:
URL: https://github.com/apache/spark/pull/31827#discussion_r597158695




##########
File path: examples/src/main/python/sql/datasource.py
##########
@@ -234,6 +234,52 @@ def json_dataset_example(spark):
     # $example off:json_dataset$
 
 
+def csv_dataset_example(spark):
+    # $example on:csv_dataset$
+    # spark is from the previous example
+    sc = spark.sparkContext
+
+    path = "examples/src/main/resources/people.csv"
+
+    df = spark.read.csv(path)
+    df.show()
+    # +------------------+
+    # |               _c0|
+    # +------------------+
+    # |      name;age;job|
+    # |Jorge;30;Developer|
+    # |  Bob;32;Developer|
+    # +------------------+
+
+    # Read a CSV with a custom delimiter; the default delimiter is ","
+    df2 = spark.read.option("delimiter", ";").csv(path)
+    df2.show()
+    # +-----+---+---------+
+    # |  _c0|_c1|      _c2|
+    # +-----+---+---------+
+    # | name|age|      job|
+    # |Jorge| 30|Developer|
+    # |  Bob| 32|Developer|
+    # +-----+---+---------+
+
+    # Read a csv with delimiter and a header
+    df3 = spark.read.option("delimiter", ";").option("header", True).csv(path)
+    df3.show()
+    # +-----+---+---------+
+    # | name|age|      job|
+    # +-----+---+---------+
+    # |Jorge| 30|Developer|
+    # |  Bob| 32|Developer|
+    # +-----+---+---------+
+
+    # You can also use options() to set multiple options at once
+    df4 = spark.read.options(delimiter=";", header=True).csv(path)
+
+    df3.write.csv("output")
+    # "output" is a folder which contains multiple csv files and a _SUCCESS file.

Review comment:
       Fixed.
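
       For reference, here is a minimal sketch of why the first read in the
       diff yields a single `_c0` column while the later reads yield three.
       It uses Python's standard `csv` module rather than Spark, so it only
       illustrates the delimiter and header behaviour and is not the Spark
       reader itself. The inline `PEOPLE_CSV` string and the helper names
       `parse_rows` and `parse_with_header` are assumptions made for this
       illustration; the rows are copied from the `show()` output above.

```python
import csv
import io

# Assumed contents of people.csv, reconstructed from the show() output in the diff.
PEOPLE_CSV = "name;age;job\nJorge;30;Developer\nBob;32;Developer\n"


def parse_rows(text, delimiter=","):
    """Split every line into fields using the given delimiter (hypothetical helper)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def parse_with_header(text, delimiter=","):
    """Use the first line as column names and return one mapping per data row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# With the default "," delimiter each line stays a single field.
default_rows = parse_rows(PEOPLE_CSV)

# With ";" each line splits into three fields, header line included.
split_rows = parse_rows(PEOPLE_CSV, ";")

# With ";" and a header, the first line becomes the column names.
records = parse_with_header(PEOPLE_CSV, ";")
```

       The three results loosely mirror `df`, `df2`, and `df3` in the diff:
       one unsplit field per line, three fields per line, and three named
       fields with the header line consumed.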




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



