On Saturday 05 March 2016 02:39 AM,
Jelez Raditchkov wrote:
My streaming job is creating files on S3. The problem is that those files end up
very small if I just write them to S3 directly. This is why I use coalesce() to
reduce the number of files and make them larger.
However, coalesce shuffles data and my job processing time ends up higher than
sparkBatc