generate some data in Spark.

2018-12-14
lk_spark

From: Jean Georges Perrin <[email protected]>
Sent: 2018-12-14 11:10
Subject: Re: how to generate a larg dataset paralleled
To: "lk_spark" <[email protected]>
Cc: "user.spark" <[email protected]>

Do you just want to generate some data in Spark, or ingest a large dataset from outside Spark? What's the ultimate goal you're pursuing?

jg

On Dec 13, 2018, at 21:38, lk_spark <[email protected]> wrote:

Hi all,

I want to generate some test data containing about one hundred million rows. I created a dataset with ten rows and called df.union repeatedly in a 'for' loop, but this causes the operation to run only on the driver node. How can I do it on the whole cluster?

2018-12-14
lk_spark
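
One possible way to spread the generation across the cluster, sketched in PySpark, is to start from a distributed range of ids and derive each row from its id on the executors, rather than unioning small DataFrames in a driver-side loop. The function names (make_row, generate_test_data), the three-column schema, and the partition count are all assumptions made for illustration; they do not come from the thread.

```python
def make_row(i):
    """Derive one synthetic row from its index.

    The (id, name, score) layout is a hypothetical schema for illustration.
    """
    return (i, "user_%d" % (i % 1000), (i * 37) % 100)


def generate_test_data(spark, num_rows=100_000_000, num_partitions=200):
    """Build test rows on the executors instead of on the driver.

    `spark` is an existing SparkSession; `num_partitions` is an assumed
    value that should be tuned to the cluster.
    """
    # spark.range produces a single-column DataFrame ("id") whose values
    # are already split across `num_partitions` partitions.
    ids = spark.range(0, num_rows, 1, num_partitions)

    # Each partition maps its own ids to rows, so the work is distributed.
    rows = ids.rdd.map(lambda r: make_row(r[0]))

    # Column names match the hypothetical schema in make_row.
    return rows.toDF(["id", "name", "score"])
```

With an active SparkSession, generate_test_data(spark).count() would trigger the generation as a distributed job. Mapping through a Python function keeps the row logic easy to change; expressing the same columns with built-in DataFrame functions would likely run faster because it avoids Python serialization.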
