Hello Harjinder,

We had this same problem some time ago, but we also saw some additional
behavior during the "Queued" period:

- Other frameworks also stopped getting new offers from the Mesos master,
not only Chronos. We have multiple frameworks connected to the same Mesos
cluster;
- In the Mesos UI, on the Offers tab, *all* offers were stuck with Chronos
itself; that is, Chronos held every offer and neither declined nor accepted
any of them.

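What we did to mitigate the problem was to set the `--offer_timeout` master
option[1] to 1 minute (e.g. `--offer_timeout=1mins`). According to the docs,
this option is unset by default, which means offers never time out and can
stay forever with a misbehaving framework. With a timeout set, the master
rescinds offers that a framework holds for too long, so they become
available to the other frameworks again.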

Maybe you could check if this is the case for you too.
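
A minimal Python sketch of that check, working on the `/state` output you
plan to save. It assumes that each entry under `frameworks` carries a `name`
and an `offers` list; please confirm those field names against the output of
your own Mesos version. The function name and the sample data are made up for
illustration.

```python
import json
from typing import Dict


def offers_held_by_framework(state: dict) -> Dict[str, int]:
    """Count outstanding offers per framework in a parsed /state document.

    Assumption: state["frameworks"] is a list of objects that each have a
    "name" (or at least an "id") and an "offers" list.
    """
    held: Dict[str, int] = {}
    for fw in state.get("frameworks", []):
        # Fall back to the framework id when no name is present.
        label = fw.get("name") or fw.get("id", "<unknown>")
        held[label] = held.get(label, 0) + len(fw.get("offers", []))
    return held


# Made-up sample in the assumed layout, not real cluster output.
# For a saved file, use something like: json.load(open("state.json"))
sample_state = json.loads("""
{"frameworks": [
  {"id": "fw-1", "name": "chronos", "offers": [{"id": "o1"}, {"id": "o2"}]},
  {"id": "fw-2", "name": "marathon", "offers": []}
]}
""")

if __name__ == "__main__":
    for name, count in sorted(offers_held_by_framework(sample_state).items()):
        print(f"{name}: {count} outstanding offer(s)")
```

If one framework shows all the offers while the others show none, that
matches what we saw on the Offers tab.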

Hope this helps you mitigate this problem.

[1]
http://mesos.apache.org/documentation/latest/configuration/master/#offer_timeout

Thanks,

On Wed, Dec 25, 2019 at 13:49, Vinod Kone <[email protected]> wrote:

> The suggested info would be needed to triage this.
>
> Thanks,
> Vinod
>
> On Dec 24, 2019, at 11:32 PM, Harjinder Singh Mistry <
> [email protected]> wrote:
>
> 
> We have been encountering an *intermittent* issue where Chronos stops
> getting resource offers from Mesos master and the scheduled jobs get stuck
> in 'Queued' state at Chronos.
>
> The sequence of observed events is as follows:
> 1. Chronos jobs are not executed by Mesos and status of jobs on Chronos
>    dashboard is ‘Queued’.
> 2. Mesos master dashboard no longer shows agents i.e. slaves.
> 3. Mesos master logs show that master has not been sending resource offers
>    to framework i.e. Chronos. But master keeps getting update from slaves
>    for old tasks.
> 4. Zookeeper and slaves are not down. They are working fine.
> 5. After restarting Zookeeper, the system starts working fine. Chronos jobs
>    start getting executed.
>
> Please suggest a solution if this problem is known.
>
> Can you please help us with the steps/info required for investigation? We
> plan to collect following when the issue happens next time:
>
> 1. Logs from Chronos, Mesos Master, Mesos Slaves and Zookeeper nodes.
> 2. Check Mesos UI: http://mesos-master:5050 and see if any agents are
>    listed and note status of jobs.
> 3. Hit the endpoint http://mesos-master:5050/state and save its output.
> 4. Check if Mesos masters and Zookeeper nodes are reachable (i.e. ping)
>    from Mesos slaves.
> 5. From output of step 3, determine the leader in Mesos master and check
>    if it is sending offers:
>    tail -f /var/log/mesos-log/mesos-master.INFO | grep -i sending
>
> Thanks,
> Harjinder
>

