zero323 commented on pull request #35296:
URL: https://github.com/apache/spark/pull/35296#issuecomment-1021692805


   > Yes, about testing: the only test I can find for IO in pandas is for 
csv. I was thinking about making a test where I use the shape function to 
compare a file before and after writing.
   
   One shouldn't really depend on schema inference and reader behavior to test 
a writer for formats which, like JSON lines, provide no schema and / or are less 
expressive than Spark SQL types. In the general case, irrespective of options, the 
following
   
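   ```python
   from pyspark.sql import DataFrame, SparkSession

   # Placeholders: any session, any DataFrame, any output path.
   spark: SparkSession
   df: DataFrame
   path: str
   
   df.write.json(path)
   assert spark.read.json(path).schema == df.schema
   ```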
   
   is not, and cannot be, guaranteed.
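
The same effect can be seen at a smaller scale without Spark. The sketch below is only an analogy that uses Python's standard `json` module, not Spark's reader or writer; the sample rows and the helper names (`write_json_lines`, `read_json_lines`, `type_names`) are made up for illustration. It writes rows containing a date to a JSON lines file, reads them back, and shows that the date comes back as a plain string, so a round-trip comparison fails even though the writer did its job. Comparing the raw written lines against an expected value is one possible way to check the writer alone.

```python
import datetime
import json
import tempfile
from pathlib import Path

# Hypothetical sample rows; the values are chosen only to show type loss.
rows = [
    {"id": 1, "day": datetime.date(2022, 1, 25), "score": 1.0},
    {"id": 2, "day": datetime.date(2022, 1, 26), "score": 2.5},
]


def write_json_lines(path, records):
    # JSON has no date type, so dates are serialized as ISO strings via str().
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, default=str) + "\n")


def read_json_lines(path):
    # The reader sees only JSON values and cannot know "day" was a date.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def type_names(record):
    # Map each field to the name of its Python type, a stand-in for a schema.
    return {key: type(value).__name__ for key, value in record.items()}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "rows.jsonl"
    write_json_lines(path, rows)
    restored = read_json_lines(path)
    raw_lines = path.read_text(encoding="utf-8").splitlines()

original_types = type_names(rows[0])
restored_types = type_names(restored[0])

print(original_types)
print(restored_types)
```

Here `restored != rows` holds because of the `day` field, while `json.loads(raw_lines[0])` can still be compared against the expected serialized form to check what was actually written.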
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


