[GitHub] [spark] itholic commented on a diff in pull request #37009: [SPARK-38292][PYTHON]na_filter added to csv

GitBox Tue, 05 Jul 2022 00:07:55 -0700


itholic commented on code in PR #37009:
URL: https://github.com/apache/spark/pull/37009#discussion_r913436566



##########
python/pyspark/pandas/namespace.py:
##########
@@ -285,6 +286,9 @@ def read_csv(
         Indicates the encoding to read file
     options : dict
         All other options passed directly into Spark's data source.
+    na_filter : bool

Review Comment:
   Maybe `na_filter : bool, default True` ??



##########
python/pyspark/sql/readwriter.py:
##########
@@ -550,6 +552,7 @@ def func(iterator):
             # There aren't any jvm api for creating a dataframe from rdd 
storing csv.
             # We can do it through creating a jvm dataset firstly and using 
the jvm api
             # for creating a dataframe from dataset storing csv.
+

Review Comment:
   Let's revert the unrelated change ?



##########
python/pyspark/pandas/namespace.py:
##########
@@ -285,6 +286,9 @@ def read_csv(
         Indicates the encoding to read file
     options : dict
         All other options passed directly into Spark's data source.
+    na_filter : bool
+        If na_filter is false missing values will remain as is otherwise it
+        will be converted to None. By default it will True

Review Comment:
   Let's follow the description from pandas as below:
   
   ![Screen Shot 2022-07-05 at 3 43 02 
PM](https://user-images.githubusercontent.com/44108233/177265854-56e8d26f-cb5e-48c9-ac21-9fd91f3ec7ba.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] itholic commented on a diff in pull request #37009: [SPARK-38292][PYTHON]na_filter added to csv

Reply via email to