Do you just want to generate some data in Spark, or ingest a large dataset
from outside of Spark? What's the ultimate goal you're pursuing?
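
If the goal is only synthetic test data, here is a minimal PySpark sketch of
one common approach: start from spark.range, which creates the ids in
parallel across partitions, and derive the other columns from the id, rather
than unioning small DataFrames in a loop. The function names, column names,
and partition count below are assumptions made up for illustration, not
something from your job.

```python
def expected_row(i):
    """Plain-Python mirror of the column expressions below (hypothetical
    columns), handy for spot-checking a few generated rows."""
    return (i, "user_" + str(i % 1000), (i * 37) % 100)


def build_test_df(spark, n_rows=100_000_000, num_partitions=200):
    """Build a synthetic DataFrame whose rows are generated on the executors.

    `spark` is an existing SparkSession. `n_rows` and `num_partitions` are
    illustrative defaults; tune them for your cluster.
    """
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql import functions as F

    # spark.range yields a single `id` column, split across partitions,
    # so each task produces its own slice of the ids.
    ids = spark.range(0, n_rows, 1, num_partitions)

    # Derive the remaining columns from `id` with column expressions.
    return ids.select(
        F.col("id"),
        F.concat(F.lit("user_"), (F.col("id") % 1000).cast("string")).alias("name"),
        ((F.col("id") * 37) % 100).alias("score"),
    )
```

Calling build_test_df(spark) only defines the plan; the rows are produced
when you run an action such as count() or write the result out.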

jg


> On Dec 13, 2018, at 21:38, lk_spark <[email protected]> wrote:
> 
> Hi all,
>     I want to generate some test data containing about one hundred
> million rows.
>     I created a dataset with ten rows and call df.union in a 'for'
> loop, but this causes the operation to happen only on the driver node.
>     How can I do it on the whole cluster?
>  
> 2018-12-14
> lk_spark
