I am running into serious performance problems with my Spark 1.6 Streaming app: the longer it runs, the slower it gets.
My app is simple:

* It receives fairly large, complex JSON files (Twitter data).
* It converts each RDD to a DataFrame.
* It splits the DataFrame into roughly 20 different data sets.
* It writes each data set as JSON to S3.

Writing to S3 is really slow, so I use an ExecutorService to run the writes in parallel. I found many error messages like the following in my Spark Streaming executor log files:

```
16/07/11 14:53:49 WARN FileOutputCommitter: Failed to delete the temporary output directory of task: attempt_201607111453_128606_m_000000_0 - s3n://com.xxx/json/yyy/2016-07-11/1468244820000/_temporary/_attempt_201607111453_128606_m_000000_0
```

Any suggestions?

Thanks,
Andy
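For reference, here is roughly what my write path looks like. This is a simplified sketch, not my real code: the bucket/path, the pool size, and the `writeSplits`/`splits` names are placeholders.

```scala
import java.util.concurrent.Executors

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

import org.apache.spark.sql.DataFrame

object ParallelS3Writer {

  // Shared bounded pool so only a fixed number of S3 writes run at once
  // (the pool size of 8 is illustrative).
  implicit val writePool: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(8))

  // Hypothetical helper: `splits` maps a category name to one of the ~20
  // per-category DataFrames; `batchTime` is the batch timestamp used in the path.
  def writeSplits(splits: Map[String, DataFrame], batchTime: Long): Unit = {
    val futures = splits.map { case (name, df) =>
      Future {
        // Each write launches its own Spark job; the jobs run concurrently.
        df.write.json(s"s3n://my-bucket/json/$name/$batchTime")
      }
    }
    // Block until every write for this batch has finished.
    futures.foreach(Await.result(_, Duration.Inf))
  }
}
```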