zero323 commented on pull request #35296: URL: https://github.com/apache/spark/pull/35296#issuecomment-1021692805
> Yes, about testing the only test I find for IO in pandas is for testing csv. I was thinking about making a test where I use shape function to test a file before and after writing and write.

One shouldn't really depend on schema inference and reader behavior to test the writer for formats which, like JSON Lines, provide no schema and/or are less expressive than Spark SQL types. In the general case, irrespective of options, the following

```python
from pyspark.sql import DataFrame, SparkSession

spark: SparkSession
df: DataFrame
path: str

df.write.json(path)
# Not guaranteed to hold: the schema is re-inferred from the JSON files on read
assert spark.read.json(path).schema == df.schema
```

is not, and cannot be, guaranteed.
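To illustrate why, here is a plain-Python analogy using only the standard `json` module. It is not Spark's JSON writer, and the helper names (`write_json_lines`, `read_json_lines`, `apply_schema`) are hypothetical. Once values without a native JSON type are serialized, reading the lines back yields only JSON types, so a naive round-trip comparison fails. A test that supplies the expected schema itself, instead of relying on inference, recovers the original values. The analogous step in Spark would be giving the reader a schema (`DataFrameReader.schema`) rather than inferring one, although the exact resulting schema (e.g. nullability) is not asserted here.

```python
import json
from datetime import date
from decimal import Decimal

# Records whose Python types have no native JSON counterpart.
records = [
    {"id": 1, "day": date(2022, 1, 25), "price": Decimal("9.99")},
    {"id": 2, "day": date(2022, 1, 26), "price": Decimal("10.50")},
]


def write_json_lines(rows):
    """Hypothetical writer: one JSON object per line; non-JSON types are stringified."""
    return "\n".join(json.dumps(row, default=str) for row in rows)


def read_json_lines(text):
    """Hypothetical reader: no schema is available, so only JSON types come back."""
    return [json.loads(line) for line in text.splitlines()]


def apply_schema(rows, schema):
    """Hypothetical helper: cast each field with a converter supplied by the test."""
    return [{name: schema[name](value) for name, value in row.items()} for row in rows]


text = write_json_lines(records)
round_tripped = read_json_lines(text)

# Without a schema the round trip is lossy: date and Decimal come back as str.
print(type(round_tripped[0]["day"]).__name__)  # str
print(round_tripped == records)  # False

# With an explicit schema chosen by the test, the original values are recovered.
schema = {"id": int, "day": date.fromisoformat, "price": Decimal}
print(apply_schema(round_tripped, schema) == records)  # True
```

The design choice mirrors the point above: the test, not the reader, is the source of truth for the expected types.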
