I have already seen one example where data is generated using Spark, and as far
as I know there is no reason to think it is a bad idea.
You can check the code here. I am not entirely sure, but I think it includes
something that generates data for the TPC-DS benchmark, and you can specify how
much data you want in the tables, from 1 GB upwards.
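
To make the idea concrete, here is a minimal sketch in PySpark of generating
simulated data in parallel. It is not taken from the code mentioned above; the
function names, column names, and the Gaussian "value" column are assumptions
chosen only for illustration. Each partition gets its own seed, so the output
is reproducible and partitions do not repeat each other's random numbers.

```python
import random


def simulate_partition(partition_id, rows_per_partition, base_seed=42):
    """Yield synthetic rows for one partition (hypothetical schema).

    The random generator is seeded from the partition id, so every run
    produces the same rows and each partition has its own random stream.
    """
    rng = random.Random(base_seed * 1_000_003 + partition_id)
    for i in range(rows_per_partition):
        event_id = partition_id * rows_per_partition + i
        yield (event_id, partition_id, rng.gauss(0.0, 1.0))


def generate_with_spark(num_partitions=8, rows_per_partition=1000):
    """Run the generator on Spark, one seed value per partition."""
    # Imported here so simulate_partition can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("simulator-sketch").getOrCreate()
    rdd = (
        spark.sparkContext
        # One element (the partition id) per partition.
        .parallelize(range(num_partitions), num_partitions)
        # Each partition id expands into rows_per_partition generated rows.
        .flatMap(lambda pid: simulate_partition(pid, rows_per_partition))
    )
    return rdd.toDF(["event_id", "partition_id", "value"])
```

Calling generate_with_spark() returns a DataFrame that can be written out in
the usual way. The amount of data scales with num_partitions and
rows_per_partition, and no large input has to be shipped from the driver,
because only the partition ids are distributed.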

From: Esa Heikkinen [mailto:esa.heikki...@student.tut.fi]
Sent: Tuesday, June 20, 2017 7:34 PM
To: user@spark.apache.org
Subject: Using Spark as a simulator

Hi

Spark is a data analyzer, but would it be possible to use Spark as a data
generator or simulator?

My simulation can be very large, and I think a parallelized simulation using
Spark (in the cloud) could work.

Is that a good or a bad idea?

Regards

Esa Heikkinen
