We just deployed our first streaming apps. The next step is making sure they run reliably.
We have spent a lot of time reading the various programming guides and looking at the standalone cluster manager and application performance web pages. The Streaming tab and the Stages tab have been the most helpful in tuning our app. However, we do not understand how memory and the number of cores affect throughput and performance. Usually adding memory is the cheapest way to improve performance.

We have a single receiver and call spark-submit with --total-executor-cores 2. Changing this value does not seem to change throughput. Our bottleneck was S3 write time, i.e. saveAsTextFile(). Reducing the number of partitions dramatically reduces S3 write times. Adding memory also does not improve performance.

I would have thought that adding more cores would allow more tasks to run concurrently, and that reducing the number of partitions would therefore slow things down, not speed things up.

What are best practices?

Kind regards

Andy
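For context, this is roughly how we launch the app (a sketch only — the master URL, memory value, and app name are illustrative, not our exact settings):

```shell
# Illustrative spark-submit invocation for a standalone cluster.
# --total-executor-cores caps cores across all executors;
# --executor-memory sets heap per executor.
spark-submit \
  --master spark://master:7077 \
  --total-executor-cores 2 \
  --executor-memory 4g \
  my_streaming_app.py
```

Varying either --total-executor-cores or --executor-memory in this command is what we have been experimenting with, without seeing a throughput change.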