Hi guys,

We are having a slight issue running Flink 1.1.3 (we also observed the problem with 1.0.2) on YARN 2.4.0. Whenever a TaskManager restarts, YARN seems to reserve memory during the restart and never free it again. We are using the CapacityScheduler with 2 queues, where the queue in which our Flink YARN session runs has a guaranteed capacity of 75%. What we are seeing is that the amount of reserved memory is exactly the amount of memory available in the queue after the TaskManager has crashed.
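For anyone who wants to watch the same numbers, here is a minimal sketch of how the reserved memory could be tracked through the ResourceManager REST API. It assumes the cluster metrics endpoint (`/ws/v1/cluster/metrics`) is reachable; the host `rm-host:8088` and the helper names are hypothetical. Note that these metrics are cluster-wide, not per queue, so they only approximate what the scheduler UI shows for a single queue.

```python
import json
from urllib.request import urlopen

# Memory-related fields of the "clusterMetrics" object returned by the
# ResourceManager REST API.
MEMORY_FIELDS = ("totalMB", "allocatedMB", "reservedMB", "availableMB")


def fetch_cluster_metrics(rm_url):
    """Fetch the raw cluster metrics JSON from the ResourceManager.

    rm_url is the ResourceManager web address, e.g. the hypothetical
    "http://rm-host:8088".
    """
    url = rm_url.rstrip("/") + "/ws/v1/cluster/metrics"
    with urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def memory_summary(payload):
    """Reduce a metrics payload to its memory fields (values in MB)."""
    metrics = payload["clusterMetrics"]
    return {field: metrics[field] for field in MEMORY_FIELDS}


def reserved_delta(before, after):
    """Change in reserved memory (MB) between two summaries."""
    return after["reservedMB"] - before["reservedMB"]


# Usage (requires a reachable ResourceManager; the address is an assumption):
#   before = memory_summary(fetch_cluster_metrics("http://rm-host:8088"))
#   ... kill a TaskManager and wait for the restart ...
#   after = memory_summary(fetch_cluster_metrics("http://rm-host:8088"))
#   print(reserved_delta(before, after), after["availableMB"])
```

Taking one snapshot before killing a TaskManager and one after the restart shows whether `reservedMB` grows and stays there while `availableMB` drops.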
On our test system, further TaskManager restarts were able to get rid of the reservation again. When I tried to replicate this on our production system, I was not successful. One difference is that in prod I killed a TaskManager with no used slots, while on the test system jobs were restarted. Unfortunately, there is nothing enlightening in the logs.

Is this something that anyone has experienced so far?

Cheers,
Michael

--
Michael Pisula * michael.pis...@tngtech.com * +49-174-3180084
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Managing Directors: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Registered office: Unterföhring * Amtsgericht München * HRB 135082