[
https://issues.apache.org/jira/browse/MYRIAD-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939010#comment-14939010
]
Santosh Marella commented on MYRIAD-137:
Fine-grained scaling (FGS) caches resource offers from Mesos. When an FGS NM
(zero profile) dies, either accidentally or via an explicit flexdown request,
Myriad should decline any outstanding offers in the FGS queue for that NM.
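
A minimal sketch of that cleanup, assuming cached offers are tracked per host.
OfferCache, addOffer, declineOutstandingOffers and the decliner callback are
hypothetical names for illustration only, not Myriad's actual classes; the
callback stands in for whatever call returns an offer to Mesos.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: OfferCache is an illustrative name, not a Myriad class.
final class OfferCache {
    // Outstanding (cached) offer IDs, grouped by the host they were offered on.
    private final Map<String, List<String>> offersByHost = new HashMap<>();
    // Assumed stand-in for the call that hands an offer back to Mesos.
    private final Consumer<String> decliner;

    OfferCache(Consumer<String> decliner) {
        this.decliner = decliner;
    }

    // Remember an offer so a later task on this host can use it.
    synchronized void addOffer(String host, String offerId) {
        offersByHost.computeIfAbsent(host, h -> new ArrayList<>()).add(offerId);
    }

    // Call when the NM on this host dies or is flexed down.
    // Declines every cached offer for the host and returns how many were declined.
    synchronized int declineOutstandingOffers(String host) {
        List<String> offers = offersByHost.remove(host);
        if (offers == null) {
            return 0;
        }
        for (String offerId : offers) {
            decliner.accept(offerId);
        }
        return offers.size();
    }

    // Number of offers still cached for the host.
    synchronized int outstanding(String host) {
        List<String> offers = offersByHost.get(host);
        return offers == null ? 0 : offers.size();
    }
}
```

Removing the host's entry before declining means a second cleanup call for the
same NM is a no-op, so both the accidental-death path and the flexdown path
could call it safely.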
> Resources offered by mesos are blocked with Myriad FWK on
> NullPointerException and FlexDown FGS NM.
> ---
>
> Key: MYRIAD-137
> URL: https://issues.apache.org/jira/browse/MYRIAD-137
> Project: Myriad
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: Myriad 0.1.0
> Reporter: Sarjeet Singh
> Assignee: Santosh Marella
>
> Observed this issue on two instances when I flexed down an FGS NM. On
> another instance, it happened when a NullPointerException occurred (JIRA
> MYRIAD-135).
> In the Mesos UI, no resources were left to offer even though nothing was
> running in the cluster except 3 NMs (2 MP, 1 ZP).
> While debugging the RM logs, I found the NullPointerException, which caused
> the OfferEventHandler thread to exit; after that, no more offers reached
> Myriad from Mesos.
> I then restarted the RM, and the resources were returned to Mesos :)
> Next, I ran a few MapReduce jobs and reproduced the issue by flexing down an
> FGS NM: all resources offered to Myriad were blocked, and Myriad did not
> release any of them afterwards.
> So it seems the flex-down procedure only cleans up the active containers and
> the NM itself, but does not clean up outstanding offers that were saved to
> OfferLifeCycle for future tasks by FGS NMs.
> Resources (From mesos-master UI)
> ================================
>            CPUs                       Mem
> Total      84                         253.9 GB
> Used       3.300                      6.1 GB
> Offered    80.700                     247.8 GB
> Idle       -1.4210854715202004e-14    0 B        <--- No Resources available.
> Here are the active offers (*blocked*) shown in the Mesos UI:
> Offers
> ======
> ID                  Framework     Host          CPUs     Mem
> …5050-3270-O4151    MyriadAlpha   node101-116   0.5      64 MB
> …5050-3270-O4149    MyriadAlpha   node101-116   0.200    282 MB
> …5050-3270-O4147    MyriadAlpha   node101-116   1        1.0 GB
> …5050-3270-O4145    MyriadAlpha   node101-116   1        1.0 GB
> …5050-3270-O4143    MyriadAlpha   node101-116   1        1.0 GB
> …5050-3270-O4141    MyriadAlpha   node101-116   1        1.0 GB
> …5050-3270-O4139    MyriadAlpha   node101-117   24.5     87.8 GB
> …5050-3270-O4137    MyriadAlpha   node101-116   22.9     87.4 GB
> …5050-3270-O4135    MyriadAlpha   node101-117   3        3.0 GB
> …5050-3270-O4134    MyriadAlpha   node101-137   25.6     65.2 GB
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)