Do you just want to generate some data in Spark, or ingest a large dataset from outside of Spark? What’s the ultimate goal you’re pursuing?
jg

> On Dec 13, 2018, at 21:38, lk_spark <[email protected]> wrote:
>
> Hi all,
> I want to generate some test data containing about one hundred million rows.
> I created a dataset with ten rows and call df.union in a 'for' loop, but this causes the operation to happen only on the driver node.
> How can I do it on the whole cluster?
>
> 2018-12-14
> lk_spark
