[jira] [Commented] (MESOS-9936) Slave recovery is very slow with high local volume persistant ( marathon app )

2019-08-15 Thread Andrei Budnik (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908065#comment-16908065
 ] 

Andrei Budnik commented on MESOS-9936:
--

How can the issue be reproduced? Could you please share an app definition or 
provide steps to reproduce it?

Also, there should be more log lines between "Recovering provisioner" and 
"Finished recovering all containerizers", at least "Provisioner recovery 
complete". Is there anything else between these two log lines?

> Slave recovery is very slow with high local volume persistant ( marathon app )
> --
>
> Key: MESOS-9936
> URL: https://issues.apache.org/jira/browse/MESOS-9936
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.8.1
>Reporter: Frédéric Comte
>Priority: Major
>
> I run some applications with local persistent volumes.
> After an unplanned shutdown of nodes running this kind of application, I 
> see that the Mesos recovery process takes a very long time (more than 8 
> hours).
> The duration depends on the amount of data in those volumes.
> What does Mesos do during this process?
> {code:java}
> Jul 08 07:40:44 boss1 mesos-agent[13345]: I0708 07:40:44.771447 13370 docker.cpp:890] Recovering Docker containers
> Jul 08 07:40:44 boss1 mesos-agent[13345]: I0708 07:40:44.783957 13375 containerizer.cpp:801] Recovering Mesos containers
> Jul 08 07:40:44 boss1 mesos-agent[13345]: I0708 07:40:44.799252 13373 linux_launcher.cpp:286] Recovering Linux launcher
> Jul 08 07:40:44 boss1 mesos-agent[13345]: I0708 07:40:44.810429 13375 containerizer.cpp:1127] Recovering isolators
> Jul 08 07:40:44 boss1 mesos-agent[13345]: I0708 07:40:44.817328 13389 containerizer.cpp:1166] Recovering provisioner
> Jul 08 14:42:10 boss1 mesos-agent[13345]: I0708 14:42:10.928683 13373 composing.cpp:339] Finished recovering all containerizers
> Jul 08 14:42:10 boss1 mesos-agent[13345]: I0708 14:42:10.950503 13354 status_update_manager_process.hpp:314] Recovering operation status update manager
> Jul 08 14:42:10 boss1 mesos-agent[13345]: I0708 14:42:10.957418 13399 slave.cpp:7729] Recovering executors
> {code}
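The slowest recovery stage in a log like the one above can be located mechanically by measuring the time between consecutive entries. What follows is a minimal Python sketch, not part of Mesos: the helper names `parse` and `slowest_gap` are hypothetical, the regular expression assumes the glog line prefix visible in the excerpt (severity letter, MMDD date, time with microseconds, thread id, `file:line]`), entries are assumed to be in chronological order, and the year, which glog omits, is assumed to be 2019.

```python
import re
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Assumed glog prefix: severity, MMDD, HH:MM:SS.ffffff, thread id, file:line]
GLOG_LINE = re.compile(
    r"([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+\s+(\S+:\d+)\]\s*(.*)"
)

Entry = Tuple[datetime, str, str]  # (timestamp, source file:line, message)


def parse(lines: List[str], year: int = 2019) -> List[Entry]:
    """Extract glog entries; search() also skips a journald prefix if present."""
    entries = []
    for line in lines:
        m = GLOG_LINE.search(line)
        if not m:
            continue  # not a glog line (e.g. a wrapped continuation)
        _, mmdd, hms, src, msg = m.groups()
        # glog omits the year, so it is supplied by the caller (assumption).
        ts = datetime.strptime(f"{year}{mmdd} {hms}", "%Y%m%d %H:%M:%S.%f")
        entries.append((ts, src, msg.strip()))
    return entries


def slowest_gap(entries: List[Entry]) -> Optional[Tuple[timedelta, str, str]]:
    """Return (gap, message before, message after) for the longest gap."""
    if len(entries) < 2:
        return None
    gaps = [(b[0] - a[0], a[2], b[2]) for a, b in zip(entries, entries[1:])]
    return max(gaps, key=lambda g: g[0])


# The glog part of the eight log entries quoted above.
SAMPLE = [
    "I0708 07:40:44.771447 13370 docker.cpp:890] Recovering Docker containers",
    "I0708 07:40:44.783957 13375 containerizer.cpp:801] Recovering Mesos containers",
    "I0708 07:40:44.799252 13373 linux_launcher.cpp:286] Recovering Linux launcher",
    "I0708 07:40:44.810429 13375 containerizer.cpp:1127] Recovering isolators",
    "I0708 07:40:44.817328 13389 containerizer.cpp:1166] Recovering provisioner",
    "I0708 14:42:10.928683 13373 composing.cpp:339] Finished recovering all containerizers",
    "I0708 14:42:10.950503 13354 status_update_manager_process.hpp:314] "
    "Recovering operation status update manager",
    "I0708 14:42:10.957418 13399 slave.cpp:7729] Recovering executors",
]

if __name__ == "__main__":
    gap, before, after = slowest_gap(parse(SAMPLE))
    print(gap, repr(before), "->", repr(after))
    # prints: 7:01:26.111355 'Recovering provisioner' -> 'Finished recovering all containerizers'
```

On these entries the longest gap, just over seven hours, falls between "Recovering provisioner" and "Finished recovering all containerizers", which is the interval asked about in the comment above. Feeding it the full agent log, including any lines logged inside that interval, would narrow the stall down further.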



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (MESOS-9936) Slave recovery is very slow with high local volume persistant ( marathon app )

2019-08-13 Thread Frédéric Comte (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906328#comment-16906328
 ] 

Frédéric Comte commented on MESOS-9936:
---

I am on CoreOS, and I don't know how to do that.



[jira] [Commented] (MESOS-9936) Slave recovery is very slow with high local volume persistant ( marathon app )

2019-08-13 Thread Vinod Kone (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906276#comment-16906276
 ] 

Vinod Kone commented on MESOS-9936:
---

[~Fcomte] That's pretty weird and unexpected. Can you share a gdb stack trace 
taken during one of these long recovery periods?



[jira] [Commented] (MESOS-9936) Slave recovery is very slow with high local volume persistant ( marathon app )

2019-08-13 Thread Frédéric Comte (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906181#comment-16906181
 ] 

Frédéric Comte commented on MESOS-9936:
---

I am using DC/OS 1.13.3, so Mesos is 1.8.1.



[jira] [Commented] (MESOS-9936) Slave recovery is very slow with high local volume persistant ( marathon app )

2019-08-13 Thread Andrei Budnik (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906165#comment-16906165
 ] 

Andrei Budnik commented on MESOS-9936:
--

[~Fcomte]
What version of Mesos are you using?
