Piyush Narang created FLINK-14158:
-------------------------------------

             Summary: Update Mesos configs to add leaseOfferExpiration and 
declinedOfferRefuse durations
                 Key: FLINK-14158
                 URL: https://issues.apache.org/jira/browse/FLINK-14158
             Project: Flink
          Issue Type: Bug
            Reporter: Piyush Narang


While debugging Flink-on-Mesos scheduling issues (tied to our use of Mesos 
quotas), we found that we fairly often receive skewed offers that are useless. 
Because we do not reject these offers fast enough, and do not tell Mesos to 
hold off re-sending them for long enough, we can be unable to schedule our 
job (~30 Mesos containers) for upwards of an hour. 

Fenzo's default is to reject expired and unused Mesos offers after 120s; this 
can be overridden using its TaskScheduler builder. Additionally, Mesos allows 
us to override how long it withholds a declined offer before re-sending it 
(the default is 5s). We found that rejecting more aggressively (after 1s 
instead of 120s) and keeping rejected offers away for longer (60s instead of 
5s) dramatically increases our chances of scheduling our jobs on Mesos. 
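
To make the proposal concrete, here is a minimal sketch of how the two 
durations might be read from configuration. It is not the actual Flink change: 
the class name OfferTuning and both property keys are hypothetical, and only 
the defaults (120s and 5s) and the tuned values (1s and 60s) come from the 
description above.

```java
import java.time.Duration;
import java.util.Properties;

/** Hypothetical holder for the two offer-handling durations discussed above. */
final class OfferTuning {
    // Hypothetical property keys, chosen for illustration only.
    static final String EXPIRATION_KEY = "mesos.offer.lease-expiration-seconds";
    static final String REFUSE_KEY = "mesos.offer.declined-refuse-seconds";

    // Defaults quoted in the issue: unused offers are rejected after 120s,
    // and Mesos withholds a declined offer for 5s.
    static final Duration DEFAULT_EXPIRATION = Duration.ofSeconds(120);
    static final Duration DEFAULT_REFUSE = Duration.ofSeconds(5);

    final Duration leaseOfferExpiration;
    final Duration declinedOfferRefuse;

    OfferTuning(Duration leaseOfferExpiration, Duration declinedOfferRefuse) {
        if (leaseOfferExpiration.isNegative() || declinedOfferRefuse.isNegative()) {
            throw new IllegalArgumentException("durations must not be negative");
        }
        this.leaseOfferExpiration = leaseOfferExpiration;
        this.declinedOfferRefuse = declinedOfferRefuse;
    }

    /** Reads both durations (whole seconds), falling back to the defaults. */
    static OfferTuning fromProperties(Properties props) {
        return new OfferTuning(
                seconds(props, EXPIRATION_KEY, DEFAULT_EXPIRATION),
                seconds(props, REFUSE_KEY, DEFAULT_REFUSE));
    }

    private static Duration seconds(Properties props, String key, Duration fallback) {
        String raw = props.getProperty(key);
        return raw == null ? fallback : Duration.ofSeconds(Long.parseLong(raw.trim()));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(EXPIRATION_KEY, "1");   // reject unused offers after 1s
        props.setProperty(REFUSE_KEY, "60");      // keep declined offers away for 60s
        OfferTuning tuning = fromProperties(props);
        // In a real resource manager, the first value would be handed to Fenzo's
        // TaskScheduler builder and the second to the refuse filter sent to Mesos
        // when declining an offer. That wiring is omitted here.
        System.out.println("expiration=" + tuning.leaseOfferExpiration.getSeconds()
                + "s, refuse=" + tuning.declinedOfferRefuse.getSeconds() + "s");
    }
}
```

Keeping both values in one small immutable object means an unset key simply 
falls back to the current behaviour, so existing deployments would be 
unaffected unless they opt in.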



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
