Hi Annemarie,

could you please share your topology? If you have a shuffle, your job needs two slots per unit of parallelism, so you would only be able to scale up to 384/2 = 192.
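The arithmetic in the reply can be written out as a small helper. This is a hypothetical sketch: `max_parallelism` and its `slots_per_instance` parameter are illustrative names, not Flink settings. A factor of 2 encodes the rule of thumb above for a job with a shuffle. A factor of 1 corresponds to Flink's default slot sharing, where the documentation says a job needs as many slots as its highest operator parallelism.

```python
def max_parallelism(total_slots: int, slots_per_instance: int) -> int:
    """Largest parallelism whose slot demand fits into total_slots.

    slots_per_instance is an assumption about the job: how many slots
    each unit of parallelism occupies (2 under the rule of thumb for a
    job with a shuffle, 1 with full slot sharing).
    """
    if total_slots < 0 or slots_per_instance <= 0:
        raise ValueError("need total_slots >= 0 and slots_per_instance > 0")
    # Integer division: a partial unit of parallelism cannot be scheduled.
    return total_slots // slots_per_instance


if __name__ == "__main__":
    # 384 slots, as reported by the web interface in the question.
    print(max_parallelism(384, 2))  # 192
    print(max_parallelism(384, 1))  # 384
```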
On Tue, Jul 28, 2020 at 6:32 PM Robert Metzger <rmetz...@apache.org> wrote:
> Ah, the good old cloud-11 cluster at DIMA. I used that one as well in 2014
> to test Flink there :)
>
> Now regarding your question: Is it possible that
> "Experiments.Experiment1(Experiments.java:42)" depends on the parallelism,
> and that it is doing a lot more work than expected because of that?
>
> On Mon, Jul 27, 2020 at 9:50 PM Annemarie Burger <annemarie.bur...@campus.tu-berlin.de> wrote:
>
>> Hi,
>>
>> I am running Flink on a cluster with 24 workers, each with 16 cores.
>> Starting the cluster works fine and the web interface confirms there are
>> 384 slots available. Executing my code with parallelism 24 works fine, but
>> when I try a higher parallelism, e.g. 384, the job never finishes
>> submitting. Submitting from the web interface does not start the job
>> either, nor does it give any errors. I also tried starting four 1-slot
>> TaskManagers on each machine and executing with parallelism 96, but hit the
>> same problem. The code is not very complicated; the logical graph has only
>> 3 steps.
>> Attached is a file with the jstacks of the CliFrontend (which is using CPU)
>> and the StandaloneSessionClusterEntrypoint, as well as the jstack of the
>> TaskManagerRunner on a remote machine (cloud-12). The jstacks are all from
>> this last scenario, when executing from the command line.
>>
>> My relevant conf is as follows:
>>
>> queryable-state.enable: true
>> jobmanager.rpc.address: cloud-11
>> jobmanager.rpc.port: 6123
>> taskmanager.heap.mb: 28672
>> jobmanager.heap.mb: 14240
>> taskmanager.memory.fraction: 0.7
>> taskmanager.network.numberOfBuffers: 16384
>> taskmanager.network.bufferSizeInBytes: 16384
>> taskmanager.memory.task.off-heap.size: 4000m
>> taskmanager.memory.managed.size: 10000m
>> #taskmanager.numberOfTaskSlots: 16 # for the normal setup
>> taskmanager.numberOfTaskSlots: 1 # for when starting multiple TaskManagers per machine
>>
>> Am I doing something wrong?
>>
>> Thanks in advance!
>>
>> jstack.jstack
>> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t2502/jstack.jstack>

-- 
Arvid Heise | Senior Java Developer
<https://www.ververica.com/>
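The slot totals in the two setups from the question follow from multiplying machines, TaskManagers per machine, and taskmanager.numberOfTaskSlots. A minimal sketch: `ClusterLayout` is a hypothetical helper, and one TaskManager per machine in the 16-slot setup is an assumption consistent with the 384 slots the web interface reported. It ignores memory limits entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterLayout:
    """Hypothetical description of a standalone Flink cluster layout."""

    machines: int
    taskmanagers_per_machine: int
    slots_per_taskmanager: int  # value of taskmanager.numberOfTaskSlots

    def total_slots(self) -> int:
        # Every TaskManager on every machine contributes its configured slots.
        return (self.machines
                * self.taskmanagers_per_machine
                * self.slots_per_taskmanager)


# The two setups described in the question.
normal = ClusterLayout(machines=24, taskmanagers_per_machine=1, slots_per_taskmanager=16)
split = ClusterLayout(machines=24, taskmanagers_per_machine=4, slots_per_taskmanager=1)

if __name__ == "__main__":
    print(normal.total_slots())  # 384
    print(split.total_slots())   # 96
```

In both setups the requested parallelism (384 and 96) equals the total slot count, so any job that needs more than one slot per unit of parallelism cannot be fully scheduled.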