Re: How about disable the irc ASFBot to flood the irc channel?
Can't you just '/ignore' the IRC bot if it bothers you?

On 17 April 2014 03:01, Chengwei Yang chengwei.yang...@gmail.com wrote:

Hi All,

I am an IRC guy, and maybe you are too. However, I found that there are two bots for JIRA: one for the mesos-dev mailing list and one for the IRC channel.

I think the bot for the mailing list is fine: it pushes each JIRA message into a mail thread, so it comes with full context and is readable and easy to follow.

The IRC channel, however, is a room for human beings to chat with each other, and I don't think it is suitable for it to be flooded by the JIRA bot. I find it very hard to figure out what the humans are talking about amid the ASFBot flood.

Could we just kill ASFBot for the IRC channel, but leave the one for the mesos-dev mailing list?

--
Thanks,
Chengwei

footnote: I have to Cc myself, otherwise Gmail doesn't mark the email as unread, so I can't pull it into my mutt mbox.
Re: How about disable the irc ASFBot to flood the irc channel?
On Thu, Apr 17, 2014 at 09:31:43AM +0100, Dick Davies wrote:

Can't you just '/ignore' the IRC bot if it bothers you?

Thanks for your tip. IMHO, I don't think I'm the first one to have that idea. I also think this is one reason why there isn't much discussion in the IRC channel: any discussion is likely to be disturbed by the JIRA bot.

--
Thanks,
Chengwei
What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?
I don't recall the exact timeout of framework IDs, but what I'm wondering is what happens if a scheduler tries to failover, but the failover grace period has elapsed? Does it fail to register, or does it successfully register and all the old executors are just gone?
Trying to get task reconciliation to work
Hello,

I don't seem to have reconcileTasks() working for me and was wondering if I am either using it incorrectly or hitting a problem. Here's what's happening:

1. There's one Mesos (0.18) master, one slave, one framework, all running on Ubuntu 12.04
2. Mesos master and slave come up fine (using Zookeeper, but that isn't relevant here, I'd think)
3. My framework registers and gets offers
4. Two tasks are launched, both start running fine on the single available slave
5. I restart my framework. During restart my framework knows that it had previously launched two tasks that were last known to be in running state. Therefore, upon getting the registered() callback, it calls driver.reconcileTasks() for the two tasks. In actuality, the tasks are still running fine. I see this in mesos master logs:
   I0417 12:26:27.207361 27301 master.cpp:2154] Performing task state reconciliation for framework MyFramework
   But, no other logs about reconciliation.
6. My framework gets no callback about status of tasks that it requested reconciliation on.

At this point, I am not sure if the lack of a callback for status update is due to
a) the fact that my framework asked for reconciliation on running state, which Mesos also knows to be true, therefore, no status update
b) Or, if the reconcile is not working. (hopefully this; reason (a) would be problematic)

So, I then proceed to another test:

7. kill my framework and mesos master
8. Then, kill the slave (as an aside, this seems to have killed the tasks as well)
9. Restart mesos master
10. Restart my framework. Now, again the reconciliation is requested.
11. Still no callback. At this time, mesos master doesn't know about the slave because it hasn't returned since master restarted. What is the expected behavior for reconciliation under these circumstances?
12. Restarted slave
13. Killed and restarted my framework.
14. Still no callback for reconciliation.

Given these results, I can't see how reconciliation is working at all. I did try this with Mesos 0.16 first and then upgraded to 0.18 to see if it makes a difference.

Thank you for any ideas on getting this resolved.

Sharma
Re: Trying to get task reconciliation to work
Should've looked at the code before sending the previous email... master/main.cpp confirmed what I needed to know. It doesn't look like I will be able to use reconcileTasks the way I thought I could. Effectively, a lack of callback could mean either that the master agrees with the requested reconcile task state, or that the task and/or slave is currently unknown, which makes it an unreliable source of data. I understand this is expected to improve later by leveraging the registrar, but I suspect there's more to it. I take it, then, that individual frameworks need to have their own mechanisms to ascertain the state of their tasks.
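One way a framework might make the ambiguity described above explicit is to remember which tasks it asked to reconcile and flag the ones that never got a status update. The following is only a minimal sketch under stated assumptions: `ReconciliationTracker` and its methods are hypothetical names, not part of the Mesos API. The framework would call `requested()` when it calls driver.reconcileTasks() and `status_update()` from its own statusUpdate() callback. It does not resolve the ambiguity; it only lists the tasks whose state still has to be verified by some other, framework-specific means.

```python
import time


class ReconciliationTracker:
    """Hypothetical helper: tracks tasks awaiting a status update
    after a reconcile request. Not part of the Mesos API."""

    def __init__(self, timeout_secs, clock=time.monotonic):
        self.timeout_secs = timeout_secs
        self.clock = clock
        # task_id -> time at which reconciliation was requested
        self._pending = {}

    def requested(self, task_ids):
        """Record that reconciliation was requested for these tasks."""
        now = self.clock()
        for task_id in task_ids:
            self._pending[task_id] = now

    def status_update(self, task_id):
        """Record that a status update arrived for this task."""
        self._pending.pop(task_id, None)

    def unconfirmed(self):
        """Tasks with no status update within the timeout.

        Silence is ambiguous: the master may agree with the state the
        framework sent, or it may not know the task or slave at all.
        """
        now = self.clock()
        return sorted(
            task_id
            for task_id, since in self._pending.items()
            if now - since >= self.timeout_secs
        )
```

The clock is injectable so the timeout logic can be exercised without waiting; a real framework would use the default.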
Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?
David, did you see Vinod's response to your (identical) question on dev@? http://www.mail-archive.com/dev@mesos.apache.org/msg11634.html
Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?
I did not, thank you! I reposted when I didn't see a response and couldn't find it in the dev archive--I thought maybe it had gotten blackholed because I don't subscribe to dev.
Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?
My follow-up question is this--is there a way to tell whether I'm outside of the timeout window? I'd like to have my framework check ZK and determine whether it's w/in the framework timeout or not, so that it can make the correct call.
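A framework-side estimate of the check being asked about could look like the sketch below. It assumes the framework itself persisted the time it was last connected (for example in ZK, as suggested above) and knows the failover timeout it registered with. The function name and the `margin_secs` parameter are hypothetical. The result is only an estimate, since the master's own timer may start at a different moment than the timestamp the framework recorded.

```python
def within_failover_window(last_connected, failover_timeout_secs, now,
                           margin_secs=0.0):
    """Estimate whether the failover timeout has not yet elapsed.

    last_connected and now are timestamps in seconds from the same
    clock; margin_secs shortens the window to stay on the safe side.
    """
    elapsed = now - last_connected
    return elapsed < (failover_timeout_secs - margin_secs)
```

For example, with a 60-second timeout and a timestamp recorded 30 seconds ago the function returns True; after 70 seconds it returns False.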
Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?
On Thu, Apr 17, 2014 at 2:56 PM, David Greenberg dsg123456...@gmail.com wrote:

My follow-up question is this--is there a way to tell whether I'm outside of the timeout window? I'd like to have my framework check ZK and determine whether it's w/in the framework timeout or not, so that it can make the correct call.

Hey David,

Currently, the only signal you can get is by hitting the /state.json endpoint on the master. The framework should've been moved to 'completed_frameworks' after the failover timeout. Of course, if a master fails over, this information is lost, so you can't reliably depend on it.

When the master starts storing persistent state about frameworks (likely a couple of releases away), a re-registration attempt in such a case would be denied by the master. So that could be your signal. Alternatively, with persistence, you could also more reliably depend on /state.json to get this info.

To take a step back, what is the problem you are trying to solve?

Thanks,
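The /state.json check described above might be sketched as follows. This is an illustration under assumptions, not a definitive client: 'completed_frameworks' comes from the message, while the "frameworks" key, the per-framework "id" field, and the helper names are assumptions about the JSON layout. As noted above, the answer is lost when the master fails over, so "unknown" does not prove the framework ID was never registered.

```python
import json
from urllib.request import urlopen


def framework_status(state, framework_id):
    """Classify a framework ID against a parsed /state.json document.

    Returns "registered", "completed", or "unknown".
    """
    def ids(key):
        # Assumed layout: each entry is an object with an "id" field.
        return {fw.get("id") for fw in state.get(key, [])}

    if framework_id in ids("frameworks"):
        return "registered"
    if framework_id in ids("completed_frameworks"):
        return "completed"
    return "unknown"


def fetch_state(master_url):
    """Fetch and parse /state.json; master_url is supplied by the caller."""
    with urlopen(master_url.rstrip("/") + "/state.json", timeout=10) as resp:
        return json.load(resp)
```

Keeping the classification separate from the HTTP fetch lets the logic be checked against a hand-written dictionary without a running master.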