Hi, I observed that if I use a subset of the same dataset, or if the dataset is small, it writes to many part files. As the dataset grows, it writes to only one part file and all the other part files are empty.
Thanks,
Divya

On 25 April 2016 at 23:15, nguyen duc tuan <newvalu...@gmail.com> wrote:

> Maybe the problem is the data itself. For example, the first dataframe
> might have common keys in only one part of the second dataframe. I think you
> can verify whether you are in this situation by repartitioning one dataframe
> and then joining it. If this is the real reason, you should see the result
> distributed more evenly.
>
> 2016-04-25 9:34 GMT+07:00 Divya Gehlot <divya.htco...@gmail.com>:
>
>> Hi,
>>
>> After joining two dataframes, I am saving the result dataframe using Spark CSV.
>> But all the result data is being written to only one part file, whereas
>> 200 part files are being created; the remaining 199 part files are empty.
>>
>> What is the cause of the uneven partitioning? How can I distribute
>> the data evenly?
>> I would really appreciate the help.
>>
>>
>> Thanks,
>> Divya
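The data-skew explanation in the reply can be illustrated with a small simulation. This is a pure-Python sketch, not Spark code, and it rests on a few assumptions: the join shuffles rows into partitions by hashing the join key, the partition count is Spark's default spark.sql.shuffle.partitions of 200 (matching the 200 part files), zlib.crc32 stands in for Spark's own Murmur3 hash, and the round-robin helper only mimics the spirit of calling repartition(n) without columns. The helper names hash_partition and round_robin_partition are hypothetical.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 200  # assumed to mirror Spark's default spark.sql.shuffle.partitions


def hash_partition(keys, num_partitions=NUM_PARTITIONS):
    """Count rows per partition when each row is placed by hashing its key.

    Hypothetical helper: crc32 is a stand-in for Spark's Murmur3 hashing.
    Only partitions that receive at least one row appear in the Counter.
    """
    counts = Counter()
    for key in keys:
        counts[zlib.crc32(str(key).encode()) % num_partitions] += 1
    return counts


def round_robin_partition(n_rows, num_partitions=NUM_PARTITIONS):
    """Count rows per partition when rows are dealt out in turn, ignoring keys.

    Hypothetical helper, similar in spirit to repartition(n) with no columns.
    """
    counts = Counter()
    for i in range(n_rows):
        counts[i % num_partitions] += 1
    return counts


# Skewed join output: every row carries the same join key.
skewed = hash_partition(["K1"] * 10_000)
print(len(skewed), max(skewed.values()))  # 1 10000 -> one full part file, 199 empty

# Output with many distinct keys spreads across many partitions.
distinct = hash_partition([f"id-{i}" for i in range(10_000)])
print(len(distinct))

# Ignoring the key fills every partition evenly.
even = round_robin_partition(10_000)
print(len(even), max(even.values()))  # 200 50
```

Under these assumptions, the same mechanism would also fit the observation that small subsets write to many part files: if the subset happens to contain more distinct join keys relative to its size, the hashed rows land in more partitions. In Spark itself, calling repartition on the joined dataframe (either with a partition count or on a higher-cardinality column) before writing is one way to break up such a hot partition.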