maropu edited a comment on pull request #32243: URL: https://github.com/apache/spark/pull/32243#issuecomment-824827326
~Ur..., I noticed that the generated data differ between the GA env (Ubuntu) and my env (macOS) when following [the same workflow](https://github.com/apache/spark/pull/32243/files#diff-48c0ee97c53013d18d6bbae44648f7fab9af2e0bf5b0dc1ca761e18ec5c478f2R524-R535). Probably, the generator's behaviour depends on the implementation of random functions. I'm currently not sure that we can generate the same data across different Linux distros, so I need more work (e.g., adding a script to generate data in a Docker env) to make it easy for developers to generate data/golden files....~

My bad. I misunderstood it; I just used different seeds (`RNGSEED`) when generating the data. I've checked that the current code generates the same TPC-DS data across different envs (macOS, Ubuntu, Amazon Linux 2, ...).

> @maropu would you mind porting the TPC-DS spec and query update to spark-sql-perf when you find some time?

Of course not. Probably, I have time to work on it tomorrow.
