zhengruifeng opened a new pull request, #53880:
URL: https://github.com/apache/spark/pull/53880
### What changes were proposed in this pull request?

Refactor the tests for Python UDF return type coercion.

### Why are the changes needed?

1. Respect `SPARK_GENERATE_GOLDEN_FILES`, which is already used in multiple places, so that the golden files can be regenerated easily: `SPARK_GENERATE_GOLDEN_FILES=1 python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.tests.coercion.test_python_udf_return_type'`
2. Move the tests to a new directory, since we may add more non-UDF coercion cases in the future (e.g. `createDataFrame`/`toPandas`/`toArrow`).
3. Use `pandas` to read and write the golden files as `csv`; the existing raw test-file processing is too complex and seems unnecessary.
4. With a pandas DataFrame, the expected results can be adjusted programmatically for different test environments, e.g. depending on whether `np.float128` is supported.
5. Also generate markdown files for review; with `pandas`, switching the golden file format is easy.
6. Additionally, set up the timezone to make the results deterministic. The original tests actually depend on the timezone but never set it, so they produce different results in different timezones.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

CI.

### Was this patch authored or co-authored using generative AI tooling?

No.
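For reviewers unfamiliar with the pattern, here is a minimal sketch of the regenerate-or-compare flow around `SPARK_GENERATE_GOLDEN_FILES`, with pandas CSV golden files and a per-environment adjustment of the expected results. It is not the code in `pyspark.sql.tests.coercion`; the helper names (`should_regenerate`, `adjust_for_env`, `check_golden`), the `type` column, and the file path are hypothetical, and only the environment variable name comes from this PR.

```python
import os

import numpy as np
import pandas as pd

# Only this variable name comes from the PR; everything else is illustrative.
GOLDEN_ENV_VAR = "SPARK_GENERATE_GOLDEN_FILES"


def should_regenerate() -> bool:
    """Return True when the caller asked to rewrite golden files instead of comparing."""
    return os.environ.get(GOLDEN_ENV_VAR, "0") == "1"


def adjust_for_env(expected: pd.DataFrame) -> pd.DataFrame:
    """Drop expectations for types this NumPy build lacks (hypothetical 'type' column)."""
    if not hasattr(np, "float128"):
        expected = expected[expected["type"] != "float128"]
    return expected.reset_index(drop=True)


def check_golden(actual: pd.DataFrame, path: str) -> None:
    """Write `actual` to the CSV golden file, or compare it against the stored copy."""
    # Store everything as text so the CSV round trip is lossless and diffs stay readable.
    actual = actual.astype(str).reset_index(drop=True)
    if should_regenerate():
        actual.to_csv(path, index=False)
        return
    # keep_default_na=False keeps strings such as "None" or "" from being parsed as NaN.
    expected = pd.read_csv(path, dtype=str, keep_default_na=False)
    pd.testing.assert_frame_equal(actual, adjust_for_env(expected))
```

A markdown copy for review could be written from the same DataFrame with `DataFrame.to_markdown`, which relies on the optional `tabulate` package.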
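The timezone point can be illustrated without Spark: the same naive wall-clock timestamp maps to different instants depending on which zone it is interpreted in, so any expected value derived from it changes from machine to machine unless the zone is pinned. The PR does not say which setting it pins; in PySpark one common knob is the `spark.sql.session.timeZone` config. The fixed-offset zones below are hypothetical stand-ins for two differently configured CI hosts.

```python
from datetime import datetime, timedelta, timezone

# A naive wall-clock value, like a timestamp returned without tzinfo.
NAIVE = datetime(2024, 1, 1, 0, 0, 0)


def to_epoch_seconds(naive: datetime, tz: timezone) -> int:
    """Interpret a naive datetime in `tz` and return POSIX seconds."""
    return int(naive.replace(tzinfo=tz).timestamp())


# Two hosts with different zones (hypothetical fixed offsets, no tz database needed).
UTC = timezone.utc
PST = timezone(timedelta(hours=-8))

utc_epoch = to_epoch_seconds(NAIVE, UTC)  # 1704067200
pst_epoch = to_epoch_seconds(NAIVE, PST)  # 8 hours later

# Calling NAIVE.timestamp() directly would use the host's local zone,
# which is exactly the kind of hidden dependency that makes golden files flaky.
```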
