zhengruifeng opened a new pull request, #53880:
URL: https://github.com/apache/spark/pull/53880

   ### What changes were proposed in this pull request?
   Refactor the tests for Python UDF return type coercion.
   
   
   ### Why are the changes needed?
   0, respect `SPARK_GENERATE_GOLDEN_FILES`, which is already used in multiple 
places, to make regeneration easy: 
`SPARK_GENERATE_GOLDEN_FILES=1 python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.tests.coercion.test_python_udf_return_type'`;
   1, move the tests to a new directory, since we might add more non-UDF 
coercion cases (e.g. createDataFrame/toPandas/toArrow/etc.) in the future;
   2, use `pandas` to read/write `csv` files as the golden files; the existing 
raw test file processing is too complex and seems unnecessary;
   3, with a pandas DataFrame, we can programmatically change the expected 
results for different testing envs, e.g. depending on whether `np.float128` 
is supported;
   4, also generate markdown files for review; with `pandas` we can easily 
switch the golden file format;
   5, additionally set the timezone to make the results deterministic; the 
original tests actually depend on the timezone, but it was not set, so they 
output different results in different timezones;
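
A minimal sketch of the golden-file pattern from items 0 and 2 above, not the code in this PR. The helper name `check_golden` and the column names in the usage note are hypothetical; only the `SPARK_GENERATE_GOLDEN_FILES` switch and the use of `pandas` with `csv` files come from the description. It assumes every cell is stored as a string.

```python
import os

import pandas as pd


def check_golden(actual: pd.DataFrame, golden_csv: str) -> None:
    """Compare `actual` with the golden CSV, or rewrite the CSV on request.

    Hypothetical helper for illustration; the PR's real implementation may differ.
    """
    # Regeneration switch named in the PR description. Treating only "1" as
    # "regenerate" is an assumption of this sketch.
    if os.environ.get("SPARK_GENERATE_GOLDEN_FILES") == "1":
        actual.to_csv(golden_csv, index=False)
        return

    # Read every cell back as a plain string. keep_default_na=False stops
    # pandas from turning literal cells such as "None" or "NaN" into missing values.
    expected = pd.read_csv(golden_csv, dtype=str, keep_default_na=False)

    # Raises AssertionError with a readable diff when any cell differs.
    pd.testing.assert_frame_equal(actual, expected, check_dtype=False)
```

For example, calling `check_golden(df, "golden.csv")` with the variable set to `1` writes the file, and calling it again without the variable compares `df` against it. Because the expected results are an ordinary DataFrame at that point, a test could adjust individual cells per environment before comparing, as item 3 suggests.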
   
   
   ### Does this PR introduce _any_ user-facing change?
   no, test-only
   
   
   ### How was this patch tested?
   CI
   
   ### Was this patch authored or co-authored using generative AI tooling?
   no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

