Re: Best way to merge files from streaming jobs

2016-03-08 Thread Sumedh Wale

On Saturday 05 March 2016 02:39 AM, Jelez Raditchkov wrote: My streaming job is creating files on S3. The problem is that those files end up very small if I just write them to S3 directly. This is why I use coalesce() to reduce the n

Best way to merge files from streaming jobs

2016-03-04 Thread Jelez Raditchkov

My streaming job is creating files on S3. The problem is that those files end up very small if I just write them to S3 directly. This is why I use coalesce() to reduce the number of files and make them larger. However, coalesce shuffles data and my job processing time ends up higher than sparkBatc