Re: How about disable the irc ASFBot to flood the irc channel?

2014-04-17 Thread Dick Davies
Can't you just '/ignore' the IRC bot if it bothers you?

On 17 April 2014 03:01, Chengwei Yang chengwei.yang...@gmail.com wrote:
 Hi All,

 I am an IRC user, as many of you probably are. However, I found that
 there are two bots for JIRA: one for the mesos-dev mailing list and one
 for the IRC channel.

 I generally think the bot for the mailing list is fine: it pushes a JIRA
 message into a mail thread, so it comes with full context and is
 readable and easy to understand.

 However, the IRC channel is a room for human beings to chat with each
 other, so I don't think it is suitable for it to be flooded by the JIRA
 bot. I find it very hard to figure out what the humans are talking about
 amid the ASFBot flooding.

 Could we just kill ASFBot for the IRC channel, but leave the one for the
 mesos-dev mailing list?

 --
 Thanks,
 Chengwei


 footnote: I have to Cc myself; otherwise Gmail doesn't mark the email
 as unread, so I can't pull it into my mutt mbox.


Re: How about disable the irc ASFBot to flood the irc channel?

2014-04-17 Thread Chengwei Yang
On Thu, Apr 17, 2014 at 09:31:43AM +0100, Dick Davies wrote:
 Can't you just '/ignore' the IRC bot if it bothers you?

Thanks for your tip.

IMHO, I'm not the first one to have that idea. I also think this is a
reason why there isn't much discussion in the IRC channel: discussion
may always be disturbed by the JIRA bot.

--
Thanks,
Chengwei

 


What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-17 Thread David Greenberg
I don't recall the exact timeout of framework IDs, but what I'm wondering
is what happens if a scheduler tries to failover, but the failover grace
period has elapsed? Does it fail to register, or does it successfully
register and all the old executors are just gone?


Trying to get task reconciliation to work

2014-04-17 Thread Sharma Podila
Hello,

I don't seem to have reconcileTasks() working for me and was wondering if I
am either using it incorrectly or hitting a problem. Here's what's
happening:

1. There's one Mesos (0.18) master, one slave, one framework, all running
on Ubuntu 12.04
2. Mesos master and slave come up fine (using Zookeeper, but that isn't
relevant here, I'd think)
3. My framework registers and gets offers
4. Two tasks are launched, both start running fine on the single available
slave
5. I restart my framework. During restart my framework knows that it had
previously launched two tasks that were last known to be in running state.
Therefore, upon getting the registered() callback, it calls
driver.reconcileTasks() for the two tasks. In actuality, the tasks are
still running fine. I see this in mesos master logs:

I0417 12:26:27.207361 27301 master.cpp:2154] Performing task state
reconciliation for framework MyFramework

But, no other logs about reconciliation.

6. My framework gets no callback about status of tasks that it requested
reconciliation on.

At this point, I am not sure if the lack of a callback for status update is
due to
  a) the fact that my framework asked for reconciliation on running state,
which Mesos also knows to be true, therefore, no status update
  b) Or, if the reconcile is not working. (hopefully this; reason (a) would
be problematic)

So, I then proceed to another test:

7. kill my framework and mesos master
8. Then, kill the slave (as an aside, this seems to have killed the tasks
as well)
9. Restart mesos master
10. Restart my framework. Now, again the reconciliation is requested.
11. Still no callback.

At this time, mesos master doesn't know about the slave because it hasn't
returned since master restarted.
What is the expected behavior for reconciliation under these circumstances?

12. Restarted slave
13. Killed and restarted my framework.
14. Still no callback for reconciliation.

Given these results, I can't see how reconciliation is working at all. I
did try this with Mesos 0.16 first and then upgraded to 0.18 to see if it
makes a difference.

Thank you for any ideas on getting this resolved.

Sharma


Re: Trying to get task reconciliation to work

2014-04-17 Thread Sharma Podila
Should've looked at the code before sending the previous email...
master/main.cpp confirmed what I needed to know. It doesn't look like I
will be able to use reconcileTasks the way I thought I could. Effectively,
a lack of callback could either mean that the master agrees with the
requested reconcile task state, or that the task and/or slave is currently
unknown. Which makes it an unreliable source of data. I understand this is
expected to improve later by leveraging the registrar, but, I suspect
there's more to it.

I take it then that individual frameworks need to have their own mechanisms
to ascertain the state of their tasks.
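
One way a framework might cope with this ambiguity is to keep its own record of each reconciliation request and treat prolonged silence as "unknown" rather than as confirmation. The following is only a minimal sketch in Python: `ReconcileTracker`, its method names and the 30-second deadline are hypothetical and not part of the Mesos API. The assumption is that the scheduler calls `requested()` when it calls reconcileTasks() and `status_update()` from its status update callback.

```python
import time


class ReconcileTracker:
    """Hypothetical helper: remembers which tasks still await an answer."""

    def __init__(self, deadline_secs=30.0, clock=time.monotonic):
        self.deadline_secs = deadline_secs  # assumed value, tune per framework
        self.clock = clock                  # injectable so it can be tested
        self.pending = {}                   # task_id -> time the request was sent
        self.known = {}                     # task_id -> last state reported by Mesos

    def requested(self, task_id):
        """Record that reconciliation was requested for task_id."""
        self.pending[task_id] = self.clock()

    def status_update(self, task_id, state):
        """Record a status update; the task is no longer pending."""
        self.known[task_id] = state
        self.pending.pop(task_id, None)

    def unanswered(self):
        """Return task IDs whose request has had no answer within the deadline."""
        now = self.clock()
        return sorted(
            task_id
            for task_id, sent in self.pending.items()
            if now - sent >= self.deadline_secs
        )
```

Tasks returned by `unanswered()` are exactly the ambiguous ones: the master may agree with the requested state, or it may not know the task or slave at all. The framework would have to verify those through some mechanism of its own.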






Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-17 Thread Adam Bordelon
David, did you see Vinod's response to your (identical) question on dev@?
http://www.mail-archive.com/dev@mesos.apache.org/msg11634.html






Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-17 Thread David Greenberg
I did not, thank you! I reposted when I didn't see a response and couldn't
find it in the dev archive--I thought maybe it had gotten blackholed
because I don't subscribe to dev.






Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-17 Thread David Greenberg
My follow-up question is this--is there a way to tell whether I'm outside
of the timeout window? I'd like to have my framework check ZK and determine
whether it's w/in the framework timeout or not, so that it can make the
correct call.
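
A rough sketch of the kind of check described above, in Python. It assumes the framework itself periodically writes a "last alive" timestamp (for example to ZK) while the scheduler is running; that bookkeeping, the function name and the parameter names are all hypothetical, and nothing here is provided by Mesos.

```python
import time


def within_failover_window(last_heartbeat, failover_timeout_secs, now=None):
    """Return True if less than failover_timeout_secs has passed since the
    scheduler last recorded that it was alive.

    last_heartbeat: Unix timestamp stored by the framework (assumed bookkeeping).
    failover_timeout_secs: the failover timeout the framework registered with.
    now: current Unix time; defaults to time.time().
    """
    if now is None:
        now = time.time()
    return (now - last_heartbeat) < failover_timeout_secs
```

For a 48-hour timeout this would be called as `within_failover_window(stored_timestamp, 48 * 3600)`. The last heartbeat is never later than the moment the master noticed the disconnect, so a True result errs on the safe side, clock skew aside. A False result may be pessimistic.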







Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-17 Thread Vinod Kone
On Thu, Apr 17, 2014 at 2:56 PM, David Greenberg dsg123456...@gmail.com wrote:

 My follow-up question is this--is there a way to tell whether I'm outside
 of the timeout window? I'd like to have my framework check ZK and determine
 whether it's w/in the framework timeout or not, so that it can make the
 correct call.


Hey David,

Currently, the only signal you can get is by hitting the /state.json
endpoint on the master. The framework should've been moved to
'completed_frameworks' after the failover timeout. Of course, if a master
fails over, this information is lost, so you can't reliably depend on it.

When the master starts storing persistent state about frameworks (likely a
couple of releases away), a re-registration attempt in such a case would be
denied by the master. So that could be your signal. Alternatively, with
persistence, you could also more reliably depend on /state.json to get
this info.
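
As an illustration of the /state.json check, here is a minimal Python sketch. It assumes the JSON document has top-level 'frameworks' and 'completed_frameworks' lists whose entries carry an 'id' field; the function names and return values are hypothetical, and the master URL has to be supplied by the caller.

```python
import json
from urllib.request import urlopen


def fetch_state(url, timeout=5.0):
    """Fetch and parse the master's state document.

    url is supplied by the caller, e.g. "http://<master-host>:<port>/state.json".
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def framework_status(state, framework_id):
    """Classify framework_id against a parsed state document.

    Returns "registered" if the master still lists the framework,
    "completed" if it was moved to 'completed_frameworks', and
    "unknown" if the master has no record of it (for example after
    a master failover, when this information is lost).
    """
    for framework in state.get("frameworks", []):
        if framework.get("id") == framework_id:
            return "registered"
    for framework in state.get("completed_frameworks", []):
        if framework.get("id") == framework_id:
            return "completed"
    return "unknown"
```

Because of the master failover caveat, "unknown" cannot be read as proof either way. Only "completed" is a positive signal that the failover timeout has elapsed.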

To take a step back, what is the problem you are trying to solve?

Thanks,