Sorry, this is what I can do for now:
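// Read the seed data and collapse it into a single partition.
var df5 = spark.read.parquet("/user/devuser/testdata/df1").coalesce(1)
// Each run of this line multiplies the row count by five.
df5 = df5.union(df5).union(df5).union(df5).union(df5)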
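A possible alternative to chaining unions, as a minimal sketch only: pair every seed row with the ids produced by spark.range, which Spark generates across several partitions. The names ReplicateRows, replicate, copies, numPartitions and copy_id are hypothetical and not from this thread, and the sketch assumes the seed has no column already called copy_id and is small enough to broadcast.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.broadcast

object ReplicateRows {
  // Hypothetical helper: returns the rows of `seed` repeated `copies` times.
  // Assumes `seed` is small and has no column named "copy_id".
  def replicate(seed: DataFrame, copies: Long, numPartitions: Int): DataFrame = {
    val spark = seed.sparkSession
    // spark.range splits the ids 0 until `copies` across `numPartitions` partitions.
    val copyIds = spark.range(0L, copies, 1L, numPartitions).toDF("copy_id")
    // Pair every id with every seed row, then drop the helper column.
    copyIds.crossJoin(broadcast(seed)).drop("copy_id")
  }
}
```

With a ten-row seed, something like ReplicateRows.replicate(df, 10000000L, 200) would give about one hundred million rows; the partition count of 200 is an arbitrary value to tune for the cluster.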
2018-12-14
lk_spark
From: 15313776907 <15313776...@163.com>
Sent: 2018-12-14 16:39
Subject: Re: how to generat
I have this problem too and hope it can be solved here. Thank you.
On 12/14/2018 10:38, lk_spark wrote:
Hi all,
I want to generate some test data containing about one hundred
million rows.
I created a dataset with ten rows and ran df.union in a 'for'
loop, but
generate some data in Spark .
2018-12-14
lk_spark
From: Jean Georges Perrin
Sent: 2018-12-14 11:10
Subject: Re: how to generate a larg dataset paralleled
To: "lk_spark"
Cc: "user.spark"
Do you just want to generate some data in Spark, or ingest a large dataset from
outside of Spark? What’s the ultimate goal you’re pursuing?
jg
> On Dec 13, 2018, at 21:38, lk_spark wrote:
>
> Hi all,
> I want to generate some test data containing about one hundred
> million rows.
>