Yicong-Huang commented on PR #54010: URL: https://github.com/apache/spark/pull/54010#issuecomment-3808562097
> @Yicong-Huang @fangchenli I am thinking whether we should use golden files instead. If we use golden files, we don't need to touch the test files too much; we just need to regenerate a file, or compare against a new one, with the new conditions (`safe=False` in this PR).

Hmm, I am really not a fan of golden files, especially for this kind of type-related test. To compare in a text-based format, every Python object needs to be serialized in some way (e.g., `str(object)` or `repr(object)`), which could cause edge cases to be missed. For example, a decimal128 and a decimal256 value may hold different digits but serialize to the same string (I am making this up as an example). We would need to make sure we don't introduce potential problems like this. This particular test for `pa.Array.cast` may be sensitive to an extra serialization step.

Maybe we can try the golden-file practice in future or other tests?
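
To make the serialization concern concrete, here is a minimal standard-library sketch. It is not the PR's test code and does not use pyarrow; `golden_line` is a hypothetical serializer, and `float` versus `Decimal` stands in for any pair of types whose text forms can coincide.

```python
from decimal import Decimal


def golden_line(values):
    """Hypothetical golden-file serializer: one str() per value, comma-joined."""
    return ",".join(str(v) for v in values)


# Two results that differ only in element type (illustrative values).
as_float = [1.5, 2.0]
as_decimal = [Decimal("1.5"), Decimal("2.0")]

# A text comparison sees identical lines for both results.
same_text = golden_line(as_float) == golden_line(as_decimal)

# Comparing the types of the Python objects still tells them apart.
same_types = [type(v) for v in as_float] == [type(v) for v in as_decimal]

print(same_text, same_types)
```

In this sketch the text comparison passes even though the element types differ, which is the kind of gap a golden file would have to guard against, for instance by recording type information next to each value.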
