[GitHub] [spark] maropu edited a comment on pull request #32243: [WIP][SPARK-35192][SQL][TESTS] Port minimal TPC-DS datagen code from databricks/spark-sql-perf

GitBox Fri, 30 Apr 2021 06:16:58 -0700


maropu edited a comment on pull request #32243:
URL: https://github.com/apache/spark/pull/32243#issuecomment-824827326



   ~Ur..., I noticed that the generated data differ between the GA 
env (Ubuntu) and my env (macOS) when following [the same 
workflow](https://github.com/apache/spark/pull/32243/files#diff-48c0ee97c53013d18d6bbae44648f7fab9af2e0bf5b0dc1ca761e18ec5c478f2R524-R535).
 The generator's behaviour probably depends on the implementation of 
random functions. I'm currently not sure that we can generate the same data 
across different Linux distros, so I need more work (e.g., adding a script to 
generate data in a Docker env) to make it easy for developers to generate 
data/golden files....~
   
   My bad. I misunderstood it; I had just used different `RNGSEED` seeds when 
generating data. I've checked that the current code generates the same 
TPC-DS data across different envs (macOS, Ubuntu, Amazon Linux 2, ...).
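
   A minimal sketch of one way such a cross-env check could be done, assuming 
the generated files are written to a local directory in each env. It is not 
part of this PR, and `checksum_dir` / `diff_checksums` are hypothetical helper 
names used only for illustration:

```python
import hashlib
from pathlib import Path


def checksum_dir(data_dir):
    """Map each file's path (relative to data_dir) to its SHA-256 hex digest."""
    root = Path(data_dir)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large data files need not fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def diff_checksums(a, b):
    """Return sorted relative paths whose digests differ or that exist on one side only."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

   Running `checksum_dir` in each env and comparing the results with 
`diff_checksums` returns an empty list only when every generated file is 
byte-identical, which is what one would expect when the same seed is used.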
   
   > @maropu would you mind porting the TPC-DS spec and query update to 
spark-sql-perf when you find some time?
   
   Of course not. I'll probably have time to work on it tomorrow.




