Re: [Spark 1.5.2] All data being written to only one part file; remaining part files are empty

Divya Gehlot Thu, 28 Apr 2016 23:23:32 -0700

Hi,

I observed that if I use a subset of the same dataset, or if the dataset is
small, the output is written to many part files.
If the dataset grows, it is written to only one part file and all the
remaining part files are empty.



Thanks,
Divya

On 25 April 2016 at 23:15, nguyen duc tuan <newvalu...@gmail.com> wrote:

> Maybe the problem is the data itself. For example, the first dataframe
> might have common keys in only one part of the second dataframe. I think
> you can verify whether you are in this situation by repartitioning one
> dataframe before joining. If this is the real reason, you should see the
> result distributed more evenly.
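
To illustrate the key-skew explanation above, here is a minimal Python sketch (not Spark code) that simulates hash-partitioning join output into 200 partitions. 200 is the default value of Spark SQL's `spark.sql.shuffle.partitions`, which would account for the 200 part files. The sketch assumes Python's built-in `hash` as a stand-in for Spark's own partitioning hash, and the key lists are hypothetical. When every row shares one join key, all rows land in a single partition; a key-independent spread (assumed here to be roughly what `DataFrame.repartition(n)` without key columns does) fills every partition.

```python
from collections import Counter

# Matches Spark SQL's default spark.sql.shuffle.partitions.
NUM_PARTITIONS = 200


def hash_partition(keys, num_partitions=NUM_PARTITIONS):
    """Count rows per partition when each row is placed by hashing its join key.

    Python's hash() is only a stand-in for Spark's real partitioning hash.
    """
    return Counter(hash(k) % num_partitions for k in keys)


def round_robin_partition(n_rows, num_partitions=NUM_PARTITIONS):
    """Count rows per partition when rows are dealt out in turn, ignoring keys."""
    return Counter(i % num_partitions for i in range(n_rows))


def non_empty(counts):
    """Number of partitions (i.e. part files) that received at least one row."""
    return sum(1 for c in counts.values() if c > 0)


# Hypothetical skewed join output: 10,000 rows that all share the key 42.
skewed_keys = [42] * 10_000
# Hypothetical well-spread join output: 10,000 distinct keys.
spread_keys = list(range(10_000))

skewed = hash_partition(skewed_keys)
spread = hash_partition(spread_keys)
rr = round_robin_partition(len(skewed_keys))

print("one shared key -> non-empty partitions:", non_empty(skewed))  # 1
print("distinct keys  -> non-empty partitions:", non_empty(spread))  # 200
print("round robin    -> non-empty partitions:", non_empty(rr))      # 200
```

In Spark itself, the corresponding experiment would be calling `repartition(...)` on the joined DataFrame before writing it; whether that helps in practice depends on how skewed the real join keys are.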
>
> 2016-04-25 9:34 GMT+07:00 Divya Gehlot <divya.htco...@gmail.com>:
>
>> Hi,
>>
>> After joining two dataframes, I save the resulting dataframe using Spark CSV.
>> However, all the result data is written to only one part file, even though
>> 200 part files are created; the remaining 199 part files are empty.
>>
>> What is the cause of this uneven partitioning? How can I distribute the
>> data evenly?
>> I would really appreciate the help.
>>
>>
>> Thanks,
>> Divya
>>
>
>
