Hello Harjinder,

We had this same problem some time ago, but saw additional behavior during the "Queued" period:

- Other frameworks also stopped getting new offers from the Mesos master, not only Chronos. We have multiple frameworks connected to the same Mesos cluster.
- In the Mesos UI, on the Offers tab, *all* offers were stuck with Chronos itself; that is, Chronos held every offer and neither declined nor accepted them.

To mitigate the problem, we set the `--offer_timeout` master option[1] to 1 minute. According to the docs, this option has no timeout by default, so offers can stay with a misbehaving framework forever. Maybe you could check whether this is the case for you too.

Hope this helps you mitigate the problem.

[1] http://mesos.apache.org/documentation/latest/configuration/master/#offer_timeout

Thanks,

On Wed, Dec 25, 2019 at 13:49, Vinod Kone <[email protected]> wrote:

> The suggested info would be needed to triage this.
>
> Thanks,
> Vinod
>
> On Dec 24, 2019, at 11:32 PM, Harjinder Singh Mistry <[email protected]> wrote:
>
> We have been encountering an *intermittent* issue where Chronos stops getting
> resource offers from Mesos master and the scheduled jobs get stuck in 'Queued'
> state at Chronos.
>
> The sequence of observed events is as follows:
> 1. Chronos jobs are not executed by Mesos and status of jobs on Chronos
>    dashboard is 'Queued'.
> 2. Mesos master dashboard no longer shows agents i.e. slaves.
> 3. Mesos master logs show that master has not been sending resource offers to
>    framework i.e. Chronos. But master keeps getting update from slaves for old
>    tasks.
> 4. Zookeeper and slaves are not down. They are working fine.
> 5. After restarting Zookeeper, the system starts working fine. Chronos jobs
>    start getting executed.
>
> Please suggest a solution if this problem is known.
>
> Can you please help us with the steps/info required for investigation? We plan
> to collect following when the issue happens next time:
>
> 1. Logs from Chronos, Mesos Master, Mesos Slaves and Zookeeper nodes.
> 2. Check Mesos UI: http://mesos-master:5050 and see if any agents are listed
>    and note status of jobs.
> 3. Hit the endpoint http://mesos-master:5050/state and save its output.
> 4. Check if Mesos masters and Zookeeper nodes are reachable (i.e. ping) from
>    Mesos slaves.
> 5. From output of step 3, determine the leader in Mesos master and check if it
>    is sending offers:
>    tail -f /var/log/mesos-log/mesos-master.INFO | grep -i sending
>
> Thanks,
> Harjinder
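The check described in the reply at the top of this thread (all offers stuck with one framework) could also be run against the saved output of the `/state` endpoint, instead of reading the Offers tab by eye. The following is only a minimal sketch: it assumes each entry under `frameworks` in the `/state` JSON carries a `name` and an `offers` list, which should be verified against the Mesos version in use. The function names `offers_by_framework` and `hoarders` are hypothetical helpers, not part of Mesos or Chronos.

```python
import json
import sys
from urllib.request import urlopen


def offers_by_framework(state):
    """Map each framework name to the number of offers it currently holds.

    Assumes the /state JSON has a top-level "frameworks" list whose entries
    carry "name" and "offers" keys (an assumption; check your Mesos version).
    """
    counts = {}
    for fw in state.get("frameworks", []):
        name = fw.get("name", fw.get("id", "<unknown>"))
        counts[name] = len(fw.get("offers", []))
    return counts


def hoarders(state):
    """Return frameworks that hold every outstanding offer while others hold none."""
    counts = offers_by_framework(state)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return []
    return [name for name, count in counts.items() if count == total]


if __name__ == "__main__":
    # The URL is the one mentioned in the thread; pass another as the first argument.
    url = sys.argv[1] if len(sys.argv) > 1 else "http://mesos-master:5050/state"
    with urlopen(url, timeout=10) as resp:
        state = json.load(resp)
    for name, count in sorted(offers_by_framework(state).items()):
        print(f"{name}: {count} outstanding offer(s)")
    for name in hoarders(state):
        print(f"WARNING: {name} holds all outstanding offers")
```

If the warning fires repeatedly for the same framework while jobs sit in 'Queued', that matches the situation described in the reply, and setting `--offer_timeout` on the master is the mitigation suggested there.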

