Re: how to merge dataframe write output files

2016-11-10 Thread Jorge Sánchez
Do you have the logs of the containers? This looks like a memory issue.

2016-11-10 7:28 GMT+00:00 lk_spark:
> Hi all,
> when I call the API df.write.parquet, there are a lot of small files. How
> can I merge them into one file? I tried df.coalesce(1).write.parquet, but
> it

RE: how to merge dataframe write output files

2016-11-09 Thread Shreya Agarwal
Is there a reason you want to merge the files? The reason you are getting errors (AFAIK) is that when you coalesce to a single partition and then write, you force all the content to reside on one executor, and the size of the data exceeds the storage memory available on that executor, hence
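
A possible middle ground between many small files and a single file, sketched here as an illustration and not taken from the thread: choose a partition count from the approximate data size, so that no single executor has to hold everything. The helper names `num_output_files` and `write_compacted`, the 128 MiB target file size, and the caller-supplied `total_bytes` estimate are all assumptions made for this example. Only `repartition` and `write.parquet` are Spark DataFrame calls.

```python
def num_output_files(total_bytes, target_file_bytes=128 * 1024 * 1024):
    """Return how many output files to aim for (hypothetical helper).

    total_bytes is a rough estimate of the dataset size supplied by the
    caller; target_file_bytes is an assumed target size per file.
    """
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Integer ceiling division, with a floor of one partition.
    return max(1, (total_bytes + target_file_bytes - 1) // target_file_bytes)


def write_compacted(df, path, total_bytes):
    """Write df as Parquet using a size-based partition count (sketch).

    df is expected to be a Spark DataFrame. repartition(n) shuffles the
    rows into n partitions, so the write produces about n files and the
    work is spread across executors instead of landing on one.
    """
    n = num_output_files(total_bytes)
    df.repartition(n).write.parquet(path)
    return n
```

With these assumptions, roughly 10 GiB of data would be written as 80 files of about 128 MiB each, instead of one file produced by a single task.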