Re: Review Request 37531: MESOS-3070 (Master CHECK failure if a framework uses duplicated task id)
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/#review100737
---

Patch looks great!

Reviews applied: [37531]

All tests passed.

- Mesos ReviewBot


On Sept. 26, 2015, 2:52 a.m., Klaus Ma wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37531/
> ---
>
> (Updated Sept. 26, 2015, 2:52 a.m.)
>
>
> Review request for mesos, Jie Yu and Vinod Kone.
>
>
> Bugs: MESOS-3070
>     https://issues.apache.org/jira/browse/MESOS-3070
>
>
> Repository: mesos
>
>
> Description
> ---
>
> __Phenomenon:__
> The master crashes because of a duplicated task ID.
>
> __Root Cause:__
> Task IDs are stored on the slave (agent). After a master failover, there is a
> time window in which a new slave can launch a task with the same task ID; if
> the old task is then re-registered, the master crashes because of the
> duplicated task ID.
>
> __Solution:__
> Store task info in Master::Framework keyed by SlaveID to avoid the duplicate.
>
>
> Diffs
> ---
>
>   src/master/http.cpp cd37c91
>   src/master/master.hpp 4bb65f0
>   src/master/master.cpp 6bee4f3
>   src/tests/master_tests.cpp ee24739
>
> Diff: https://reviews.apache.org/r/37531/diff/
>
>
> Testing
> ---
>
> make
> make check
>
>
> Thanks,
>
> Klaus Ma
>
>
Re: Review Request 37531: MESOS-3070 (Master CHECK failure if a framework uses duplicated task id)
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/
---

(Updated Sept. 26, 2015, 2:52 a.m.)


Review request for mesos, Jie Yu and Vinod Kone.


Changes
---

Merged the patch with the latest code and re-checked it for potential issues.
I'll add more unit test cases for "kill duplicated tasks" and "show duplicated
tasks in metrics".


Bugs: MESOS-3070
    https://issues.apache.org/jira/browse/MESOS-3070


Repository: mesos


Description
---

__Phenomenon:__
The master crashes because of a duplicated task ID.

__Root Cause:__
Task IDs are stored on the slave (agent). After a master failover, there is a
time window in which a new slave can launch a task with the same task ID; if
the old task is then re-registered, the master crashes because of the
duplicated task ID.

__Solution:__
Store task info in Master::Framework keyed by SlaveID to avoid the duplicate.


Diffs (updated)
---

  src/master/http.cpp cd37c91
  src/master/master.hpp 4bb65f0
  src/master/master.cpp 6bee4f3
  src/tests/master_tests.cpp ee24739

Diff: https://reviews.apache.org/r/37531/diff/


Testing
---

make
make check


Thanks,

Klaus Ma
Re: Review Request 37531: MESOS-3070 (Master CHECK failure if a framework uses duplicated task id)
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/
---

(Updated Sept. 5, 2015, 3:27 a.m.)


Review request for mesos and Vinod Kone.


Changes
---

Added summary and description.


Summary (updated)
---

MESOS-3070 (Master CHECK failure if a framework uses duplicated task id)


Bugs: MESOS-3070
    https://issues.apache.org/jira/browse/MESOS-3070


Repository: mesos


Description (updated)
---

__Phenomenon:__
The master crashes because of a duplicated task ID.

__Root Cause:__
Task IDs are stored on the slave (agent). After a master failover, there is a
time window in which a new slave can launch a task with the same task ID; if
the old task is then re-registered, the master crashes because of the
duplicated task ID.

__Solution:__
Store task info in Master::Framework keyed by SlaveID to avoid the duplicate.


Diffs
---

  src/master/http.cpp 37d76ee
  src/master/master.hpp 36c6759
  src/master/master.cpp 95207d2
  src/tests/master_tests.cpp 8a6b98b

Diff: https://reviews.apache.org/r/37531/diff/


Testing
---

make
make check


Thanks,

Klaus Ma
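
For readers who want a concrete picture of the "Solution" in the description, the following is a minimal, standalone sketch of task bookkeeping keyed by SlaveID and then by task ID. It is not the code from the patch: the string aliases for SlaveID and TaskID, the Task and Framework structs, and the addTask, removeTask and countTasks helpers are all hypothetical simplifications made up for illustration, and the real Master::Framework in src/master/master.hpp differs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-ins for the Mesos ID types (assumption: plain strings).
using SlaveID = std::string;
using TaskID = std::string;

// Hypothetical, heavily simplified task record.
struct Task {
  TaskID id;
  SlaveID slaveId;
  std::string state;
};

// Illustrative sketch only; not the actual Master::Framework.
struct Framework {
  // Tasks indexed first by the slave running them, then by task ID, so the
  // same task ID on two different slaves does not collide.
  std::map<SlaveID, std::map<TaskID, Task>> tasks;

  // Returns false only if this slave already holds a task with this ID.
  bool addTask(const Task& task) {
    return tasks[task.slaveId].emplace(task.id, task).second;
  }

  // Removes one task from one slave; returns false if it was not tracked.
  bool removeTask(const SlaveID& slaveId, const TaskID& taskId) {
    auto it = tasks.find(slaveId);
    if (it == tasks.end()) {
      return false;
    }
    const bool erased = it->second.erase(taskId) > 0;
    if (it->second.empty()) {
      tasks.erase(it);  // Drop the empty per-slave map.
    }
    return erased;
  }

  // Counts how many slaves currently hold a task with this ID.
  std::size_t countTasks(const TaskID& taskId) const {
    std::size_t n = 0;
    for (const auto& entry : tasks) {
      n += entry.second.count(taskId);
    }
    return n;
  }
};

// Usage: the old task re-registers after a new slave reused its task ID.
inline void example() {
  Framework framework;
  const bool fresh = framework.addTask(Task{"t1", "slave-new", "TASK_RUNNING"});
  const bool old = framework.addTask(Task{"t1", "slave-old", "TASK_RUNNING"});
  assert(fresh && old);                      // Both are accepted.
  assert(framework.countTasks("t1") == 2);   // The duplicate is visible.
}
```

With a single map keyed by task ID alone, the second addTask in example() would hit an existing entry, which is the situation the description says leads to the CHECK failure. Keying by SlaveID first keeps both records, so the duplicate can be reported or killed per slave.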