Hi Annemarie,

Could you please share your topology? If it contains a shuffle, your job
needs 2 slots per unit of parallelism. With 384 slots you would then only be
able to scale up to a parallelism of 384/2 = 192.
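
To make the arithmetic concrete, here is a minimal sketch. It assumes the
2-slots-per-parallel-instance rule above actually applies to your job, and
the class and method names (SlotBudget, maxParallelism) are made up for
illustration; they are not part of any Flink API.

```java
class SlotBudget {
    // Largest parallelism that fits into the cluster, assuming every
    // parallel instance of the job occupies `slotsPerInstance` slots.
    // Hypothetical helper for illustration, not a Flink API.
    static int maxParallelism(int totalSlots, int slotsPerInstance) {
        if (totalSlots < 0 || slotsPerInstance <= 0) {
            throw new IllegalArgumentException("slot counts must be positive");
        }
        return totalSlots / slotsPerInstance; // integer division rounds down
    }

    public static void main(String[] args) {
        int totalSlots = 24 * 16; // 24 workers with 16 slots each = 384
        // Assumption: a shuffle makes the job need 2 slots per instance.
        System.out.println(maxParallelism(totalSlots, 2));
    }
}
```

Under the same assumption, the setup with 4 one-slot taskmanagers on each of
the 24 machines (96 slots) would top out at a parallelism of 48, which would
fit with parallelism 96 not starting either.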

On Tue, Jul 28, 2020 at 6:32 PM Robert Metzger <rmetz...@apache.org> wrote:

> Ah, the good old cloud-11 cluster at DIMA. I used that one as well in 2014
> to test Flink there :)
>
> Now regarding your question: Is it possible that
> "Experiments.Experiment1(Experiments.java:42)" depends on the parallelism,
> and it is doing a lot more work than expected because of that?
>
> On Mon, Jul 27, 2020 at 9:50 PM Annemarie Burger <
> annemarie.bur...@campus.tu-berlin.de> wrote:
>
>> Hi,
>>
>> I am running Flink on a cluster with 24 workers, each with 16 cores.
>> Starting the cluster works fine and the web interface confirms there are
>> 384 slots working. Executing my code with parallelism 24 works fine, but
>> when I try a higher parallelism, e.g. 384, the job never succeeds in
>> submitting. Submitting from the web interface does not start the job
>> either, nor does it give any errors. I also tried starting 4 1-slot
>> taskmanagers on each machine and executing with parallelism 96, but I hit
>> the same problem. The code is not very complicated, with the logical graph
>> having only 3 steps.
>> Attached is a file with the jstacks of the CliFrontend that is using CPU
>> and of the StandaloneSessionClusterEntrypoint, as well as the jstack of the
>> TaskManagerRunner on a remote machine (cloud-12). The jstacks are all from
>> this last scenario, when executing from the command line.
>>
>> My relevant conf is as follows:
>>
>> queryable-state.enable: true
>> jobmanager.rpc.address: cloud-11
>> jobmanager.rpc.port: 6123
>> taskmanager.heap.mb: 28672
>> jobmanager.heap.mb: 14240
>> taskmanager.memory.fraction: 0.7
>> taskmanager.network.numberOfBuffers: 16384
>> taskmanager.network.bufferSizeInBytes: 16384
>> taskmanager.memory.task.off-heap.size: 4000m
>> taskmanager.memory.managed.size: 10000m
>> #taskmanager.numberOfTaskSlots: 16 #for normal setup
>> taskmanager.numberOfTaskSlots: 1 #for when setting multiple taskmanagers per machine
>>
>> Am I doing something wrong?
>> Thanks in advance!
>>
>> jstack.jstack
>> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t2502/jstack.jstack>
>

-- 

Arvid Heise | Senior Java Developer

