On Wed, Oct 5, 2016 at 7:08 PM, Tarandeep Singh <tarand...@gmail.com> wrote:
> @Stephan my flink cluster setup- 5 nodes, each running 1 TaskManager. Slots
> per task manager: 2-4 (I tried varying this to see if this has any impact).
> Network buffers: 5k - 20k (tried different values for it).

Could you run the job first on a single task manager to see if the error occurs even if no network shuffle is involved? That should be less overhead for you than running the custom build (which might be buggy ;)).

– Ufuk