[GitHub] [spark] pralabhkumar commented on a diff in pull request #37009: [SPARK-38292][PYTHON]Support na_filter for pyspark.pandas.read_csv

GitBox Wed, 06 Jul 2022 20:33:05 -0700


pralabhkumar commented on code in PR #37009:
URL: https://github.com/apache/spark/pull/37009#discussion_r915427993



##########
python/pyspark/pandas/namespace.py:
##########
@@ -285,6 +286,9 @@ def read_csv(
         Indicates the encoding to read file
     options : dict
         All other options passed directly into Spark's data source.
+    na_filter : bool
+        If na_filter is false missing values will remain as is otherwise it
+        will be converted to None. By default it will True

Review Comment:
   I have made the changes. However, IMO there would be no performance
improvement in Spark's case, since it uses the Univocity parsers, which handle
nulls while reading the data. I have therefore not added anything
performance-related to the documentation.
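
   For readers following along, here is a rough sketch of the semantics the
docstring describes: with na_filter enabled, missing fields become None, and
with it disabled they are left as is. It uses only Python's standard csv
module. The helper name `read_rows` and the sample data are hypothetical, and
this is not how pyspark.pandas or Univocity implement it.

```python
import csv
import io


def read_rows(text, na_filter=True):
    """Hypothetical helper, for illustration only.

    Parses CSV text into a list of dicts. When na_filter is true, empty
    fields are converted to None; when false, they stay as empty strings.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for record in reader:
        if na_filter:
            # Treat an empty field as a missing value.
            record = [None if field == "" else field for field in record]
        rows.append(dict(zip(header, record)))
    return rows


# Made-up sample data: the first row has no value for "city".
sample = "name,city\nalice,\nbob,Pune\n"

filtered = read_rows(sample)               # missing value becomes None
raw = read_rows(sample, na_filter=False)   # missing value stays ""
```

In this sketch the only difference between the two modes is the per-field
conversion, which is consistent with the point above that the flag changes
the result rather than the parsing cost.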



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

