[GitHub] [spark] pralabhkumar opened a new pull request, #37009: [SPARK-38292][PYTHON]na_filter added to csv

GitBox Mon, 27 Jun 2022 10:27:03 -0700


pralabhkumar opened a new pull request, #37009:
URL: https://github.com/apache/spark/pull/37009


   ### What changes were proposed in this pull request?
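   Adds an `na_filter` option to `read_csv` in the pandas API on Spark (`pyspark.pandas`). It mirrors the `na_filter` option of `pandas.read_csv`. By default, empty fields are read as missing values (`None`). With `na_filter=False`, they are kept as empty strings.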
   
   Given the file data.csv:
   A,B,C
   ,val1,val2
   val3
   
   from pyspark import pandas as ps
   import pandas as pd
   ps.read_csv("data.csv")
   
         A     B     C
   0  None  val1  val2
   1  val3  None  None
   
   
   ps.read_csv("data.csv", na_filter=False)
   
         A     B     C
   0        val1  val2
   1  val3            
     
   
   Plain pandas gives the same result for the same file:
   
   pd.read_csv("data.csv", na_filter=False)
   
         A     B     C
   0        val1  val2
   1  val3            
   
   With the default (na_filter=True), pandas shows missing values as NaN, where pandas API on Spark shows None:
   
   pd.read_csv("data.csv")
   
         A     B     C
   0   NaN  val1  val2
   1  val3   NaN   NaN
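   To make the semantics concrete, here is a minimal stdlib-only sketch. It is not the Spark or pandas implementation. `read_rows` is a hypothetical helper. It assumes that short rows such as `val3` are padded with empty fields up to the header width, which matches the outputs shown above. Empty fields then become `None` unless `na_filter=False`.

```python
import csv
import io
from typing import List, Optional


def read_rows(text: str, na_filter: bool = True) -> List[List[Optional[str]]]:
    """Hypothetical helper illustrating na_filter semantics on CSV text.

    Assumption: short rows are padded with empty fields to the header width.
    With na_filter=True, empty fields become None (missing values);
    with na_filter=False, they are kept as empty strings.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows: List[List[Optional[str]]] = []
    for raw in reader:
        # Pad short rows (e.g. "val3") with empty fields up to the header width.
        padded: List[Optional[str]] = list(raw) + [""] * (len(header) - len(raw))
        if na_filter:
            # Treat empty fields as missing values.
            padded = [None if field == "" else field for field in padded]
        rows.append(padded)
    return rows


data = "A,B,C\n,val1,val2\nval3\n"
print(read_rows(data))                   # [[None, 'val1', 'val2'], ['val3', None, None]]
print(read_rows(data, na_filter=False))  # [['', 'val1', 'val2'], ['val3', '', '']]
```

   The two printed results line up with the default and `na_filter=False` tables above. The only difference is that pandas renders missing values as NaN.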
   
   ### Why are the changes needed?
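   `pyspark.pandas.read_csv` did not support the `na_filter` option of `pandas.read_csv`, so empty fields were always read as missing values. This PR adds the option for parity with pandas.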
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. `pyspark.pandas.read_csv` accepts a new `na_filter` parameter.
   
   
   ### How was this patch tested?
   Added unit tests.
   

