That is fine, but you have to design it so that the generated rows are written in large blocks for optimal performance. The trickiest part of data generation is the conceptual part, such as choosing the probability distributions etc. You also have to check that you use a good random number generator; for some cases the one built into Java might not be good enough.
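
To make the two points above concrete, here is a minimal sketch in plain Python (no Spark dependency) of what the work for one partition could look like. All names (`generate_partition`, `write_in_blocks`, `base_seed`, `block_size`) and the normal distribution are assumptions for illustration only, and deriving a distinct seed per partition is a simple convention rather than a guarantee of statistically independent streams.

```python
import io
import random


def generate_partition(partition_index, rows_per_partition, base_seed=42):
    """Yield simulated rows for one partition (hypothetical helper)."""
    # Assumption: a distinct, reproducible seed per partition, so that
    # partitions do not all replay the same random sequence.
    rng = random.Random(base_seed * 1_000_003 + partition_index)
    for row_id in range(rows_per_partition):
        # Assumption: a standard normal value; use whatever distribution
        # the simulation actually needs.
        value = rng.gauss(0.0, 1.0)
        yield (partition_index, row_id, value)


def write_in_blocks(rows, sink, block_size=10_000):
    """Write rows to sink in blocks of block_size lines.

    Returns the number of write calls that were made.
    """
    buffer = []
    writes = 0
    for partition_index, row_id, value in rows:
        buffer.append(f"{partition_index},{row_id},{value:.6f}\n")
        if len(buffer) >= block_size:
            # One large write instead of one write per row.
            sink.write("".join(buffer))
            writes += 1
            buffer.clear()
    if buffer:
        # Flush the final, possibly smaller, block.
        sink.write("".join(buffer))
        writes += 1
    return writes


if __name__ == "__main__":
    sink = io.StringIO()
    n_writes = write_in_blocks(generate_partition(0, 25), sink, block_size=10)
    print(n_writes, "writes,", len(sink.getvalue().splitlines()), "rows")
```

In PySpark a generator like this could probably be plugged into `RDD.mapPartitionsWithIndex`, which passes the partition index and an iterator to your function; the block-wise writing would then mostly be handled by the output format you choose.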
> On 20. Jun 2017, at 16:04, Esa Heikkinen <esa.heikki...@student.tut.fi> wrote:
>
> Hi
>
> Spark is a data analyzer, but would it be possible to use Spark as a data
> generator or simulator ?
>
> My simulation can be very huge and i think a parallelized simulation using by
> Spark (cloud) could work.
>
> Is that good or bad idea ?
>
> Regards
> Esa Heikkinen